Reinforcement Learning for Hierarchical Proof Generation in Lean 4
Abstract
Scaling formal theorem proving to more complex problems requires more efficient inference methods than standard whole-proof generation, whose success rate decreases exponentially with proof length. In this work, we systematically study how online reinforcement learning can be coupled with a hierarchical (lemma-based) style of proof generation, which has recently gained popularity in inference-time methods. We show that the two interact fruitfully in both directions: reinforcement learning makes it possible to successfully train proof-decomposition policies, and hierarchical inference helps reinforcement learning overcome plateaus, since its richer output distribution encourages exploration and the generated lemmas can be understood as an online synthetic data generation technique. Overall, hierarchical inference trained with reinforcement learning achieves strong evaluation results, even at the 7B parameter scale, and outperforms standard whole-proof generation in sample efficiency and scalability, making it well suited to necessarily data-constrained formalization research efforts.
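To make the contrast concrete, the following toy Lean 4 sketch (our own illustration, not an example from the paper; the theorem names and statements are hypothetical) shows the lemma-based style: instead of emitting one monolithic proof, the prover first proposes intermediate lemmas and then closes each independently, so the final proof reduces to composing them.

```lean
-- Hypothetical intermediate lemmas a decomposition policy might propose,
-- each provable in isolation by a short proof.
theorem helper1 (n : Nat) : n + 0 = n := Nat.add_zero n
theorem helper2 (n : Nat) : 0 + n = n := Nat.zero_add n

-- The original goal is then closed by stitching the lemmas together,
-- rather than by a single whole-proof generation attempt.
theorem target (n : Nat) : n + 0 = 0 + n := by
  rw [helper1, helper2]
```

Each lemma is a shorter, independently checkable generation target, which is what makes the success rate degrade more gracefully with overall proof length than in the whole-proof setting.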