Automating High Energy Physics Data Analysis with LLM-Powered Agents
Eli Gendreau-Distler · Luc Le Pottier · Chengxi Yang · Dongwon Kim · Joshua Ho · Haichen Wang
Abstract
We present a proof-of-principle study demonstrating the use of large language model (LLM) agents to automate a representative high energy physics (HEP) analysis. Using the Higgs boson diphoton cross-section measurement as a case study with ATLAS Open Data, we design a hybrid system that combines an LLM-based supervisor–coder agent with the Snakemake workflow manager. In this architecture, the workflow manager enforces reproducibility and determinism, while the agent autonomously generates, executes, and iteratively corrects analysis code in response to user instructions. We define evaluation metrics, including success rate and efficiency, and use them to evaluate the performance of the $\texttt{gemini-2.5-pro}$ model. We also analyze the errors that arise when the agent fails to complete the task. This work establishes a first LLM-agent-driven automated data analysis workflow, enabling future studies of the capabilities and limitations of LLM agents in HEP.
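The generate, execute, and correct cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`run_script`, `generate_execute_correct`, `stub_coder`), the retry limit, and the feedback format are all hypothetical. The LLM is replaced by a stub, and the Snakemake layer is omitted.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_script(code: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Write the code to a temporary file and run it in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "step.py"
        path.write_text(code)
        return subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )


def generate_execute_correct(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> dict:
    """Ask the coder for code, run it, and feed any error back until it succeeds."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        result = run_script(code)
        if result.returncode == 0:
            return {"success": True, "attempts": attempt, "stdout": result.stdout}
        feedback = result.stderr  # the error text becomes context for the next draft
    return {"success": False, "attempts": max_attempts, "stdout": ""}


def stub_coder(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: the first draft is buggy, the retry is fixed."""
    if feedback is None:
        return "print(undefined_name)"  # raises NameError when executed
    return "print('ok')"


if __name__ == "__main__":
    print(generate_execute_correct("demo task", stub_coder))
```

In this sketch, a trial counts as a success if any attempt exits cleanly, and the number of attempts used gives one rough proxy for efficiency. The paper's own metric definitions may differ.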