Advancing Environment Setup LLMs through Online Reinforcement Learning
Abstract
Environment setup—the process of configuring a system to work with a specific software project—remains a persistent challenge in Software Engineering (SE). Automated environment setup methods could assist developers by providing fully configured environments for arbitrary repositories without manual effort, and could help SE researchers scale execution-based benchmarks. However, recent studies show that even state-of-the-art Large Language Models (LLMs) achieve limited success at automating this task. To address this limitation, we apply online Reinforcement Learning with Verifiable Rewards to improve the environment setup capabilities of LLMs. Because outcome-based rewards for environment setup require containerisation of each sample and are computationally expensive, we leverage lightweight proxy rewards. On EnvBench-Python, our method enables Qwen3-8B (a model runnable on consumer hardware) to successfully set up 15.8 of 329 repositories on average across five runs. This is a +690% gain over the base model and +58% over GPT-4o-mini at comparable cost. Our replication package with training code and trained model checkpoints is available online: https://github.com/envsetup-rl-dl4c/envsetup-rl.
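The abstract does not specify the proxy reward, so the following is only a minimal sketch of one plausible lightweight signal: scoring a candidate environment by how many of the repository's imports statically resolve, instead of executing the project inside a per-sample container. The function name proxy_reward and the import-resolution heuristic are illustrative assumptions, not the paper's actual reward; the real design is in the linked training code.

import ast
import importlib.util
from pathlib import Path

def proxy_reward(repo_root: str) -> float:
    """Fraction of top-level imports in the repository that resolve
    under the current interpreter environment, in [0, 1].

    Assumes this runs with the candidate environment's interpreter,
    so find_spec() sees the packages that the setup script installed.
    """
    resolved, total = 0, 0
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8", errors="ignore"))
        except SyntaxError:
            continue  # unparsable files contribute nothing to the score
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names = [node.module]  # skip relative imports
            else:
                continue
            for name in names:
                total += 1
                try:
                    # Resolve only the top-level package; cheap and container-free.
                    if importlib.util.find_spec(name.split(".")[0]) is not None:
                        resolved += 1
                except (ImportError, ValueError):
                    pass
    return resolved / total if total else 0.0

Because such a check touches only the filesystem and the interpreter's import machinery, it avoids building a container per rollout, which is what makes proxy rewards cheap enough for online RL at scale.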