AI Agents For Decision-Making in Climate Governance Using Policy Benchmarks
Abstract
Climate change governance requires navigating complex policy documents, including treaties, regulations, and socio-political frameworks. Understanding these texts is critical for evidence-based decision-making, yet it remains challenging due to their complexity and domain specificity. In this work, we evaluate the ability of GPT-powered AI agents to reason over climate policy benchmarks, with a particular focus on benchmarking LLMs in dynamic climate governance scenarios. Using datasets such as Climate-FEVER (fact-checking climate claims), LegalBench (legal reasoning tasks), and PolicyQA (policy text question answering), we assess agents' performance in treaty interpretation, climate socio-political reasoning, adaptation policy assessment, and long-term policy forecasting. We apply a structured evaluation protocol grounded in both expert evaluations and interdisciplinary feedback, identifying the strengths and limitations of LLMs in supporting climate governance tasks.