Benchmark for evaluating an agent's ability to browse the web and synthesize information from diverse sources.
benchmark, agentic, evaluation

Notes

Date approximate.