SEED-Bench
Benchmark for evaluating multimodal large language models across multiple dimensions. SEED-Bench-2 expanded the benchmark to 24K multiple-choice questions covering 27 evaluation dimensions. Published at CVPR 2024.
Evaluation Details
Questions 19,242
Tasks 12
Domains 2
Scoring multiple-choice accuracy (4 options; likelihood-based answer ranking, no human or GPT judging)
Domains: image comprehension, video comprehension
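The scoring rule in Evaluation Details (likelihood-based ranking over 4 options, reported as accuracy) can be illustrated with a minimal sketch. It assumes a model has already produced one log-likelihood per answer option. The function names `pick_answer` and `accuracy` and all numeric scores are hypothetical and chosen for illustration; they are not taken from the SEED-Bench code.

```python
from typing import Sequence

NUM_OPTIONS = 4  # each question has 4 answer options


def pick_answer(option_log_likelihoods: Sequence[float]) -> int:
    """Return the index of the option with the highest log-likelihood.

    Ties resolve to the lowest index (an assumption of this sketch).
    """
    if len(option_log_likelihoods) != NUM_OPTIONS:
        raise ValueError(f"expected {NUM_OPTIONS} option scores")
    return max(range(NUM_OPTIONS), key=lambda i: option_log_likelihoods[i])


def accuracy(all_scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of questions whose top-ranked option matches the gold index."""
    if len(all_scores) != len(gold):
        raise ValueError("scores and gold labels must have the same length")
    if not gold:
        raise ValueError("no questions to score")
    correct = sum(pick_answer(s) == g for s, g in zip(all_scores, gold))
    return correct / len(gold)


# Hypothetical log-likelihoods for three questions (illustrative values only).
scores = [
    [-3.2, -1.1, -4.0, -2.5],
    [-0.9, -2.2, -1.5, -3.1],
    [-2.0, -2.1, -0.4, -5.0],
]
gold = [1, 2, 2]  # hypothetical gold option indices

print(f"accuracy = {accuracy(scores, gold):.3f}")
```

Because the prediction is read directly from the model's option likelihoods, no free-form answer has to be parsed and no human or GPT judge is involved.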