Learning to Solve and Verify: A Self-Play Framework for Mutually Improving Code and Test Generation
Abstract
Recent breakthroughs in Large Language Models (LLMs) have significantly advanced code generation. However, further progress is increasingly constrained by the limited availability of high-quality supervised data. Synthetic data generation via self-instruction shows potential, but naive approaches often suffer from error accumulation and generalization collapse, underscoring the critical need for robust quality control. This paper introduces Sol-Ver, a novel self-play framework in which an LLM simultaneously acts as a solver (generating code) and a verifier (generating tests). The two capabilities reinforce each other: improved tests lead to better code, which in turn enables the generation of more discerning tests. Sol-Ver iteratively refines code solutions and their corresponding unit tests, improving both capabilities without requiring human annotations or larger, more capable teacher models. Our experiments with Llama 3.1 8B demonstrate substantial gains, with average relative improvements of 19.63% in code generation (pass@1) and 17.49% in test generation accuracy on the MBPP and LiveCodeBench benchmarks.
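To make the solver/verifier loop described above concrete, the following Python sketch illustrates one plausible self-play iteration. It is an illustration under stated assumptions, not the paper's exact procedure: the callables generate_solutions, generate_tests, passes, and finetune are hypothetical stand-ins for the model's solver role, its verifier role, a sandboxed test runner, and the training step, and the agreement filter shown is only one possible quality-control rule.

```python
from typing import Callable, Iterable, List, Tuple


def sol_ver_iteration(
    generate_solutions: Callable[[str], List[str]],  # solver role: problem -> candidate programs
    generate_tests: Callable[[str], List[str]],      # verifier role: problem -> candidate unit tests
    passes: Callable[[str, str], bool],              # runs one test against one program (sandboxed)
    finetune: Callable[[List[Tuple[str, str, List[str]]]], None],  # training step on kept pairs
    problems: Iterable[str],
) -> None:
    """One self-play round (sketch): the same model writes code and tests,
    pairs that agree under execution are kept, and the model is fine-tuned
    on its own verified outputs."""
    verified: List[Tuple[str, str, List[str]]] = []
    for problem in problems:
        tests = generate_tests(problem)
        for code in generate_solutions(problem):
            # Keep the first candidate solution that satisfies every
            # model-written test, along with the tests that selected it.
            if tests and all(passes(code, test) for test in tests):
                verified.append((problem, code, tests))
                break
    # Training on mutually verified (code, test) pairs is what allows the
    # solver and verifier roles to improve each other across iterations.
    finetune(verified)
```

Keeping only code-test pairs that agree under execution acts as the quality-control filter the abstract alludes to: no human labels or stronger teacher model are consulted, only the model's own outputs cross-checked by running the tests.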