NeurIPS Poster The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators

Spotlight Poster

The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators

Tzu-Heng Huang · Catherine Cao · Vaishnavi Bhargava · Frederic Sala

East Exhibit Hall A-C #1903

[ Abstract ] [ Project Page ]

[ Paper] [ Slides] [ Poster] [ OpenReview]

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system,

Alchemist

$\textbf{Alchemist}$ , obtains comparable to or better performance than large language model-based annotation in a range of tasks for a fraction of the cost: on average, improvements amount to a

12.9

$\textbf{12.9}$ % enhancement while the total labeling costs across all datasets are reduced by a factor of approximately

500 \times

$\textbf{500}\times$ .

Chat is not available.