Spotlight in Workshop: Tackling Climate Change with Machine Learning
ATLAS: A spend classification benchmark for estimating scope 3 carbon emissions
Andrew Dumit · Krishna Rao · Travis Kwee · Varsha Gopalakrishnan · Katherine Tsai · Sangwon Suh
Sun 15 Dec 8:15 a.m. PST — 5:30 p.m. PST
The majority (70%) of companies reporting their value chain emissions rely on financial spend ledgers and per-dollar emissions factors. Accurately mapping expenditures to emissions factors is critical but complex, given the sheer number of line items and the diversity of ways in which they are categorized and described. This is an area where Large Language Models (LLMs) can play a key role. However, there is currently no benchmark dataset for evaluating the performance of LLM-based solutions. Here, we introduce the Aggregate Transaction Ledgers for Accounting Sustainability (ATLAS) dataset, along with initial evaluation results for four models on it. ATLAS is the first spend classification benchmark, comprising 10,000 labeled, de-identified spend items derived from human experts' classifications of spend items for company scope 3 emissions inventories. Among the four baseline models we evaluate, the best achieves a top-1 accuracy of 50% and a top-3 accuracy of 61%. ATLAS enables systematic evaluation of LLMs for spend classification, and our results provide a starting point for advancing automated carbon accounting and sustainability reporting for spend-based emissions.
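To make the two ideas in the abstract concrete (spend-based emissions as spend multiplied by a per-dollar factor, and top-k accuracy over ranked category predictions), here is a minimal Python sketch. It assumes each spend item receives a ranked list of candidate categories from a model. The function names, category names, and per-dollar factors are hypothetical illustrations, not ATLAS data, its label taxonomy, or its official evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of items whose true category appears in the first k ranked predictions."""
    if len(ranked_predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labeled item")
    hits = sum(1 for preds, label in zip(ranked_predictions, labels) if label in preds[:k])
    return hits / len(labels)


def spend_based_emissions(spend_usd: float, factor_kg_per_usd: float) -> float:
    """Spend-based estimate in kg CO2e: spend times an emissions factor per dollar."""
    return spend_usd * factor_kg_per_usd


# Hypothetical ground-truth categories for four spend line items.
labels = ["air travel", "software", "office supplies", "catering"]

# Hypothetical ranked model outputs (best guess first) for the same items.
ranked_predictions = [
    ["air travel", "ground transport", "hotels"],    # correct at rank 1
    ["it services", "software", "hardware"],         # correct at rank 2
    ["furniture", "paper products", "printing"],     # true label not in top 3
    ["catering", "restaurants", "groceries"],        # correct at rank 1
]

# Hypothetical per-dollar factors (kg CO2e / USD), chosen only to show that
# a misclassification changes the emissions estimate for the same spend.
factors = {"software": 0.25, "it services": 0.5}

if __name__ == "__main__":
    print("top-1:", top_k_accuracy(ranked_predictions, labels, k=1))
    print("top-3:", top_k_accuracy(ranked_predictions, labels, k=3))
    # The same $1,000 line item under the true vs. the top-1 predicted category.
    print("true category:", spend_based_emissions(1000.0, factors["software"]))
    print("predicted category:", spend_based_emissions(1000.0, factors["it services"]))
```

With this toy data, top-1 accuracy is 0.5 and top-3 accuracy is 0.75, and the misclassified item's estimate doubles from 250 to 500 kg CO2e, which illustrates why classification accuracy matters for spend-based inventories.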