Adaptive Control for Test-time Scaling
Taneesh Gupta · Rahul Madhavan · Rishabh Tiwari · Xuchao Zhang · Chetan Bansal · Saravan Rajmohan · Kurt Keutzer
Abstract
We introduce Adaptive Control Token Sampling (ACTS), a framework of policies that leverage probability signals from specific tokens in the LLM vocabulary to dynamically regulate optimal stopping during generation. ACTS combats over-thinking and under-thinking in LLMs by exploiting adaptive signals about the generation trace at test time, yielding superior test-time scaling properties. Our experiments show that ACTS effectively mitigates under-thinking on complex reasoning tasks using adaptive stopping-time policies. Furthermore, we propose an Adaptive Self-Critique Sampler that uses end-of-thinking spikes as triggers for self-evaluation, boosting reasoning accuracy by up to ~9.8% on MATH-500. On instruction-following tasks, ACTS leverages end-of-sequence spikes to improve the quality-efficiency trade-off. Finally, we use spikes to propose a novel parallel sampling technique that initiates high-quality parallel reasoning trajectories from a shared, sequentially generated thinking trace. Our work establishes control-token probabilities as a powerful, untapped signal for creating more robust and efficient inference policies, offering a new paradigm for controlling test-time scaling.
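The core mechanism the abstract describes, watching the probability a model assigns to a control token at each decoding step and acting when it spikes, can be illustrated with a minimal sketch. This is not the authors' implementation: the "</think>" control token, the placeholder model name, the spike threshold, and the greedy decoding loop are all assumptions chosen for concreteness.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model (assumption): any chat model exposing an end-of-thinking token.
MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Control token whose probability we monitor (assumption: "</think>").
ctrl_id = tokenizer.convert_tokens_to_ids("</think>")
SPIKE_THRESHOLD = 0.2  # assumed trigger level; a tuned hyperparameter in practice

def generate_with_spike_monitor(prompt: str, max_new_tokens: int = 256) -> str:
    """Greedy decoding that flags steps where the control-token probability
    spikes above SPIKE_THRESHOLD -- the signal an ACTS-style policy acts on."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids).logits[:, -1, :]  # next-token logits
        probs = torch.softmax(logits, dim=-1)
        if probs[0, ctrl_id].item() > SPIKE_THRESHOLD:
            # A policy would intervene here: stop early, branch a parallel
            # trajectory, or inject a self-critique prompt.
            print(f"control-token spike: p={probs[0, ctrl_id].item():.3f}")
        next_id = probs.argmax(dim=-1, keepdim=True)  # greedy choice
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
```

In this sketch the spike is merely logged; the different ACTS policies named above would differ in what happens at that point (terminating the trace, forking parallel continuations from the shared prefix, or appending a self-evaluation prompt).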