Aryabhata: An exam-focused language model for JEE Math
Ritvik Rastogi · Sachin Dharashivkar · Sandeep Penmetsa
Abstract
We present $\textbf{Aryabhata 1.0}$, a 7B-parameter math reasoning model optimized for the Indian Joint Entrance Examination (JEE). While recent LLMs have advanced mathematical reasoning, many remain unsuitable for high-stakes educational use. Our model is created by merging strong open-weight reasoning backbones, followed by supervised fine-tuning with curriculum learning on verified chain-of-thought (CoT) traces obtained through best-of-$n$ rejection sampling. We further enhance performance via reinforcement learning with verifiable rewards (RLVR), using an A2C objective with group-relative advantage estimation along with novel exploration strategies, including $\textit{Adaptive Group Resizing}$ and $\textit{Temperature Scaling}$. Evaluated on in-distribution (JEE Main 2025) and out-of-distribution (MATH, GSM8K) benchmarks, the model surpasses comparable baselines in accuracy and efficiency, while producing pedagogically useful step-by-step reasoning. This work demonstrates that compact, exam-focused language models can deliver both strong performance and practical usability for educational contexts.
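The RLVR recipe hinges on group-relative advantage estimation: instead of learning a value baseline, each sampled response is scored against the other responses drawn for the same problem, with a verifier supplying the reward. Below is a minimal, self-contained sketch of that baseline computation in the common GRPO-style form; the function name, epsilon term, and toy rewards are our own illustrative choices, not the authors' implementation.

```python
# Illustrative sketch of group-relative advantage estimation as commonly
# used in RLVR pipelines; not the paper's code.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each sampled response relative to its own group:
    A_i = (r_i - mean(group)) / (std(group) + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy example: binary verifiable rewards (1 = final answer verified correct)
# for a group of 8 responses sampled for the same JEE problem.
rewards = [1, 0, 0, 1, 1, 0, 0, 0]
print(group_relative_advantages(rewards))
# Correct responses receive positive advantages; incorrect ones negative.
```

The paper's exploration strategies, $\textit{Adaptive Group Resizing}$ and $\textit{Temperature Scaling}$, presumably act on top of this computation by varying the group size $n$ and the sampling temperature, though the abstract does not specify their details.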