Understanding Tool-Integrated Reasoning
Abstract
We study why Tool-Integrated Reasoning (TIR) makes Large Language Models (LLMs) more capable. While LLMs integrated with tools such as Python code interpreters show great promise, a principled theory explaining why this paradigm is effective has been missing. We provide the first formal proof that TIR fundamentally expands an LLM's capabilities, enabling previously impossible reasoning paths (Support Expansion) and making complex strategies practical within a finite token budget (Feasible Support). We conduct comprehensive experiments on challenging mathematical benchmarks, using a Python interpreter as the external tool. Our results show that TIR models solve a class of problems that is fundamentally out of reach for pure-text models, even on tasks requiring deep abstract insight rather than mere calculation. We further identify emergent cognitive patterns that illustrate how models learn to think with tools. To guide model behavior stably, we introduce Advantage Shaping Policy Optimization (ASPO), a novel algorithm that modifies the advantage directly, encouraging desired tool-use behaviors without the training instability and performance loss of traditional reward shaping. Overall, our work offers the first principled explanation for TIR's success, shifting the focus from the mere fact that tools work to why and how they enable more powerful reasoning.