Advancing Environment Setup LLMs through Online Reinforcement Learning
Abstract
Environment setup—the process of configuring a system to work with a specific software project—remains a persistent challenge in Software Engineering (SE). Automated environment setup methods could assist developers by providing fully configured environments for arbitrary repositories without manual effort, and could help SE researchers scale execution-based benchmarks. However, recent studies show that even state-of-the-art Large Language Models (LLMs) achieve limited success at automating this task. To address this limitation, we apply online Reinforcement Learning with Verifiable Rewards to improve the environment setup capabilities of LLMs. Because outcome-based rewards for environment setup require containerisation of each sample and are computationally expensive, we leverage lightweight proxy rewards. On EnvBench-Python, our method enables Qwen3-8B (a model runnable on consumer hardware) to successfully set up 15.8 of 329 repositories on average across five runs. This is a +690% gain over the base model and +58% over GPT-4o-mini at comparable cost. Our replication package with training code and trained model checkpoints is available online: https://github.com/envsetup-rl-dl4c/envsetup-rl.
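The abstract does not specify the proxy reward, so the following is only a minimal sketch of one plausible lightweight signal: scoring a candidate environment by how many of the repository's imports statically resolve, instead of executing the project inside a per-sample container. The function name proxy_reward and the import-resolution heuristic are illustrative assumptions, not the paper's actual reward; the real design is in the linked training code.

import ast
import importlib.util
from pathlib import Path

def proxy_reward(repo_root: str) -> float:
    """Fraction of top-level imports in the repository that resolve
    under the current interpreter environment, in [0, 1].

    Assumes this runs with the candidate environment's interpreter,
    so find_spec() sees the packages that the setup script installed.
    """
    resolved, total = 0, 0
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8", errors="ignore"))
        except SyntaxError:
            continue  # unparsable files contribute nothing to the score
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names = [node.module]  # skip relative imports
            else:
                continue
            for name in names:
                total += 1
                try:
                    # Resolve only the top-level package; cheap and container-free.
                    if importlib.util.find_spec(name.split(".")[0]) is not None:
                        resolved += 1
                except (ImportError, ValueError):
                    pass
    return resolved / total if total else 0.0

Because such a check touches only the filesystem and the interpreter's import machinery, it avoids building a container per rollout, which is what makes proxy rewards cheap enough for online RL at scale.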