Open-source framework of 1,000+ domain-diverse, verifiable task environments for LLM reasoning research. Covers 8 domains, including algorithms, cryptography, science, logical reasoning, and puzzles. Demonstrates "Task Scaling": broadening the variety of verifiable reasoning tasks during RL training significantly improves both performance and efficiency. A 32B model trained with InternBootcamp achieved state-of-the-art results on Bootcamp-EVAL.
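To make "verifiable task environment" concrete, here is a minimal Python sketch of the general idea: a generator produces a task instance with a known answer, and a verifier scores a model response automatically. The names `Task`, `generate_task`, and `verify` are hypothetical and chosen for illustration; this is not InternBootcamp's actual API.

```python
import random
import re
from dataclasses import dataclass


@dataclass
class Task:
    """A single task instance (hypothetical structure for illustration)."""
    prompt: str
    answer: int


def generate_task(rng: random.Random) -> Task:
    """Sample a multiplication problem whose answer is known exactly."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    prompt = f"Compute {a} * {b}. Give the final answer as 'Answer: <number>'."
    return Task(prompt=prompt, answer=a * b)


def verify(task: Task, response: str) -> float:
    """Return 1.0 if the last 'Answer: N' in the response is correct, else 0.0."""
    matches = re.findall(r"Answer:\s*(-?\d+)", response)
    if not matches:
        return 0.0  # no parseable answer counts as incorrect
    return 1.0 if int(matches[-1]) == task.answer else 0.0
```

Because the score comes from a programmatic check rather than human judgment, it can serve directly as an RL reward signal. In this framing, broadening task variety means adding more generator/verifier pairs of this kind across different domains.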

Library

GitHub Repository

reasoning, reinforcement-learning, evaluation, open-source

Related