NeurIPS ATLAS: A spend classification benchmark for estimating scope 3 carbon emissions

Spotlight in Workshop: Tackling Climate Change with Machine Learning

Andrew Dumit · Krishna Rao · Travis Kwee · Varsha Gopalakrishnan · Katherine Tsai · Sangwon Suh

presentation: Tackling Climate Change with Machine Learning
Sun 15 Dec 8:15 a.m. PST — 5:30 p.m. PST

Abstract:

The majority (70%) of companies reporting their value chain emissions rely on financial spend ledgers and emissions factors per dollar. Accurate classification of expenditures to emissions factors is critical but complex, given the sheer number of line items and the diversity of ways in which they are categorized and described. This is an area where Large Language Models (LLMs) can play a key role. However, there is currently no benchmark dataset to evaluate the performance of LLM-based solutions. Here, we introduce the Aggregate Transaction Ledgers for Accounting Sustainability dataset, or ATLAS, and the initial evaluation results of four models using ATLAS. ATLAS is the first spend classification benchmark, comprising 10,000 labeled and de-identified spend items derived from human experts classifying spend items for company scope 3 emissions inventories. We evaluate four baseline models, with the best model achieving a top-1 accuracy of 50% and a top-3 accuracy of 61%. ATLAS enables systematic evaluation of LLMs for spend classification, and our results provide a starting point for advancing automated carbon accounting and sustainability reporting for spend-based emissions.
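
To make the two quantities in the abstract concrete, the following is a minimal sketch of how top-k accuracy and a spend-based emissions estimate could be computed. It is not the authors' evaluation code: the function names, the category labels, the ranked predictions, and the emissions factor value are all hypothetical and do not come from ATLAS.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of items whose true label is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one labeled item is required")
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, true_labels)
        if truth in preds[:k]
    )
    return hits / len(true_labels)


def spend_based_emissions(spend_usd, factor_kg_co2e_per_usd):
    """Spend-based estimate: spend multiplied by an emissions factor per dollar."""
    return spend_usd * factor_kg_co2e_per_usd


# Hypothetical ranked category predictions for four spend line items.
ranked = [
    ["air_travel", "hotels", "car_rental"],
    ["software", "it_services", "telecom"],
    ["office_supplies", "furniture", "paper"],
    ["legal", "consulting", "accounting"],
]
# Hypothetical expert-assigned categories for the same items.
truth = ["air_travel", "it_services", "catering", "legal"]

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)

# Hypothetical factor of 0.5 kg CO2e per dollar applied to a 1,000 dollar line item.
estimate_kg = spend_based_emissions(1000.0, 0.5)
```

In this toy data, two of four items are correct at rank 1 and three of four within the top 3. A misclassified line item is matched to the wrong emissions factor, which is why classification accuracy directly affects the resulting scope 3 estimate.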
