Introduces tool-integrated reasoning agents that interleave natural language reasoning with program-based tool use (computation libraries, symbolic solvers) for mathematical problem solving. The agent decides when to write code, execute it, and interpret results within a single reasoning chain — an early and influential demonstration of the "reasoning + code execution" paradigm that later became standard in frontier models.
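
The interleaving described above can be sketched as a small loop: the model writes a rationale and a program, the program is executed, and its output is appended to the context before the model continues. This is only an illustrative sketch, not the paper's implementation. The names `run_program`, `scripted_policy`, and `solve` are hypothetical, the language model is replaced by a hard-coded stub, and the executor has no sandboxing.

```python
import contextlib
import io


def run_program(code: str) -> str:
    """Execute a snippet and return what it printed, or the error message."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # fresh namespace; NOT sandboxed (assumption: trusted code)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def scripted_policy(question, history):
    """Hypothetical stand-in for the language model.

    Returns a dict with a rationale plus either a program to run
    or a final answer, depending on whether tool output exists yet.
    """
    outputs = [text for kind, text in history if kind == "output"]
    if not outputs:
        return {
            "rationale": "Sum the integers from 1 to 100 with a short program.",
            "program": "print(sum(range(1, 101)))",
        }
    return {
        "rationale": f"The program printed {outputs[-1]}, so that is the sum.",
        "answer": outputs[-1],
    }


def solve(question, policy, max_rounds=3):
    """Alternate rationale -> program -> output until the policy answers."""
    history = [("question", question)]
    for _ in range(max_rounds):
        step = policy(question, history)
        history.append(("rationale", step["rationale"]))
        if "answer" in step:
            return step["answer"], history
        output = run_program(step["program"])
        history.append(("program", step["program"]))
        history.append(("output", output))
    return None, history  # no answer within the round budget


answer, history = solve("What is 1 + 2 + ... + 100?", scripted_policy)
print(answer)
```

In a real system the stub would be a model call conditioned on the full history, and the executor would run in an isolated environment with time and resource limits.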

The training set, ToRA-Corpus, contains 16K interactive tool-use trajectories built via diverse self-sampling with GPT-4 correction. Models at 7B, 13B, 34B, and 70B parameters (LLaMA-2 and CodeLLaMA bases) achieve average absolute improvements of 13-19% over prior open-source models across 10 math datasets. ToRA-Code-34B reaches 50.8% on MATH (vs. 42.5% for GPT-4 with chain-of-thought prompting), and ToRA-70B reaches 84.3% on GSM8K. Published at ICLR 2024. Authors: Gou, Shao, Gong, Shen, Yang, Huang, Duan, Chen (Tsinghua + Microsoft).

Venue: ICLR 2024
foundational, reasoning, coding, math