Comprehensive benchmark for long-context understanding in LLMs.

Dataset

GitHub Repository

benchmark, scaling