Learning Modular Exponentiation with Transformers
David Demitri Africa · Sara Kapoor · Simon Sorg · Challenger Mishra
Abstract
Modular exponentiation ($a^b \equiv d \bmod c$) is crucial to number theory and cryptography, yet remains largely unexplored from a mechanistic interpretability standpoint. We train compact 4‑layer encoder–decoder Transformers to predict $d$ and analyze how they come to solve the task. We compare principled sampling schemes for $(a,b,c,d)$, probe the learned token embeddings, and use causal interventions (activation patching) to localize the computation inside the network. Sampling $a$ and $b$ log‑uniformly (reciprocal sampling) removes severe output imbalance and yields large accuracy gains, with abrupt, synchronized jumps in accuracy that simultaneously cover families of related moduli (e.g., multiples of 23). Causal analysis shows that, on instances without reduction ($c > a^b$), a small circuit consisting only of final‑layer attention heads reproduces full‑model behavior, indicating functional specialization. These results suggest that Transformers can internalize modular arithmetic via compact, specialized circuits, and that data distribution strongly shapes both learning dynamics and generalization.
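To make the reciprocal (log-uniform) sampling scheme concrete, the sketch below draws $(a, b, c, d)$ tuples with $a$ and $b$ log-uniform and computes the target $d = a^b \bmod c$. This is an illustrative sketch only, not the authors' data pipeline: the bounds `A_MAX`, `B_MAX`, `C_MAX` and the uniform choice of modulus `c` are assumptions made here for demonstration.

```python
# Minimal sketch (assumed, not the paper's released code): reciprocal /
# log-uniform sampling of the base and exponent for modular exponentiation
# data, so that a^b mod c does not collapse onto a few small residues.
import math
import random

A_MAX, B_MAX, C_MAX = 100, 100, 100  # hypothetical bounds, not from the paper


def log_uniform_int(low: int, high: int) -> int:
    """Draw an integer whose logarithm is uniform on [log(low), log(high)]."""
    return int(round(math.exp(random.uniform(math.log(low), math.log(high)))))


def sample_example():
    a = log_uniform_int(1, A_MAX)   # reciprocal sampling of the base
    b = log_uniform_int(1, B_MAX)   # reciprocal sampling of the exponent
    c = random.randint(2, C_MAX)    # modulus; uniform here for simplicity
    d = pow(a, b, c)                # target residue: a^b mod c
    return a, b, c, d


if __name__ == "__main__":
    print([sample_example() for _ in range(5)])
```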