Comprehensive benchmark for vision-language models.
benchmark multimodal evaluation