ERQA+
dataset
An enhanced benchmark for embodied reasoning that extends Google's ERQA benchmark. It focuses on egocentric scenes adapted specifically to the perspective of an embodied robot. All samples are newly created by manually annotating recent robot videos rather than reused from older VQA datasets. It offers a finer-grained taxonomy covering planning, prediction, perception, and spatial reasoning, each with detailed subcategories. Multi-stage filtering removes trivial samples so that the evaluation stays challenging.