A Control-Theoretic Account of Cognitive Effort in Language Models
Pranjal Garg
Abstract
We study how post-training reshapes the control geometry of large language models. Treating the residual stream as the state of a time-varying linear system, we fit local layer-to-layer maps, build finite-horizon controllability Gramians, and quantify (i) geometric difficulty via the minimal end-to-end control energy $E_{\min}$ and (ii) efficiency $\eta = E_{\min}/E_{\text{actual}}$ along realized trajectories. Across four post-training stages (Baseline $\rightarrow$ SFT $\rightarrow$ DPO $\rightarrow$ Instruct (RLVR)), the Gramian spectrum compresses (fewer large-eigenvalue "easy" directions) and $E_{\min}$ rises monotonically. Principal-angle analyses show that fine-tuning rotates both "easy" and "hard" subspaces relative to Baseline, while off-manifold occupancy increases. Surprisingly, under a shared PCA, conversational prompts are geometrically harder than math prompts (higher $E_{\min}$, lower $\eta$), revealing a divergence between human-intuitive difficulty and language-model control geometry. These results recast well-known post-training trade-offs as changes in controllability: steering remains possible, but "cheap" directions become scarce, implying larger control energy unless interventions target the new post-training control axes.
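For concreteness, the quantities above admit a direct computation under the abstract's time-varying linear model $x_{t+1} = A_t x_t + u_t$. The following is a minimal sketch, not the paper's code: it assumes full actuation ($B = I$, controls injected directly into the residual stream), fits each $A_t$ by least squares, accumulates the finite-horizon Gramian $W = \sum_{t=0}^{T-1} \Phi(T, t{+}1)\,\Phi(T, t{+}1)^{\top}$, and evaluates $E_{\min} = d^{\top} W^{+} d$ (with $d = x_T - \Phi(T,0)\,x_0$) and $\eta$ with per-layer residuals standing in for the realized controls. All function names and the toy data are illustrative.

```python
import numpy as np

def fit_layer_maps(X):
    """Fit local layer-to-layer maps A_t with x_{t+1} ~ A_t x_t by least
    squares, from residual-stream states X of shape (T+1, n, d):
    T layer transitions, n prompts, d residual-stream dimensions."""
    # lstsq solves X[t] @ M = X[t+1] row-wise, so A_t = M.T
    return [np.linalg.lstsq(X[t], X[t + 1], rcond=None)[0].T
            for t in range(X.shape[0] - 1)]

def gramian_and_transition(A_list):
    """Finite-horizon controllability Gramian
    W = sum_{t=0}^{T-1} Phi(T, t+1) Phi(T, t+1)^T  (full actuation B = I),
    plus the end-to-end state-transition matrix Phi(T, 0)."""
    d = A_list[0].shape[0]
    Phi = np.eye(d)             # running Phi(T, t+1), starts at Phi(T, T) = I
    W = np.zeros((d, d))
    for A in reversed(A_list):  # t = T-1 down to 0
        W += Phi @ Phi.T        # energy contribution of a control at step t
        Phi = Phi @ A           # advance to Phi(T, t)
    return W, Phi               # Phi is now Phi(T, 0)

def min_energy(W, Phi_T0, x0, xT):
    """Minimal end-to-end control energy E_min = dev^T W^+ dev, where
    dev = xT - Phi(T,0) x0 is the deviation from the free trajectory.
    The pseudoinverse handles rank-deficient (hard-to-control) Gramians."""
    dev = xT - Phi_T0 @ x0
    return float(dev @ np.linalg.pinv(W) @ dev)

def efficiency(A_list, x):
    """eta = E_min / E_actual for one realized trajectory x of shape (T+1, d),
    treating per-layer residuals u_t = x_{t+1} - A_t x_t as the controls."""
    W, Phi_T0 = gramian_and_transition(A_list)
    E_actual = sum(np.sum((x[t + 1] - A @ x[t]) ** 2)
                   for t, A in enumerate(A_list))
    return min_energy(W, Phi_T0, x[0], x[-1]) / E_actual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, n, d = 8, 64, 16                  # layers, prompts, residual dims (toy)
    X = rng.standard_normal((T + 1, n, d))
    A_list = fit_layer_maps(X)
    W, Phi_T0 = gramian_and_transition(A_list)
    x = X[:, 0]                          # one realized trajectory
    print("E_min:", min_energy(W, Phi_T0, x[0], x[-1]))
    print("eta  :", efficiency(A_list, x))
```

The backward accumulation avoids rebuilding each $\Phi(T, t)$ from scratch, and the pseudoinverse reflects that, per the abstract's findings, the Gramian may be nearly rank-deficient along "hard" directions after post-training.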