AI Lab Tracker
AutoCodeBench
dataset
2025-08-13
Tencent
Large-scale benchmark for evaluating agentic coding models. Includes AutoCodeInstruct with distilled answers from DeepSeek-V3.
Paper (arXiv)
GitHub
HuggingFace
Project Page
benchmark
coding
agentic
Notes
V2 released in 2026.