Benchmark for evaluating the "context learning" abilities of long-context models.
benchmark, scaling