NeurIPS Competition LLM Merging: Building LLMs Efficiently through Merging

Margaret Li · Jiacheng Zhu · Rickard Brüel Gabrielsson · Derek Tam · Mikhail Yurochkin · Colin Raffel · Leshem Choshen

West Meeting Room 210

Sun 15 Dec 9 a.m. PST — 11:50 a.m. PST

Abstract:

Training high-performing large language models (LLMs) from scratch is a notoriously expensive and difficult task, costing hundreds of millions of dollars in compute alone. These pretrained LLMs, however, can be cheaply and easily adapted to new tasks via fine-tuning, leading to a proliferation of models that suit specific use cases. Recent work has shown that specialized fine-tuned models can be rapidly merged to combine capabilities and generalize to new skills. This raises the question: given a new suite of desired skills and design parameters, is it necessary to fine-tune or train yet another LLM from scratch, or can similar existing models be repurposed for a new task with the right selection or merging procedure? The LLM Merging challenge aims to spur the development and evaluation of methods for merging and reusing existing models to form stronger new models without needing additional training. Specifically, the competition focuses on merging existing publicly released expert models from Hugging Face, using only minimal compute and additional parameters. The goal will be to develop merged models that outperform existing models and existing merging baselines. Submissions will be judged on their efficiency and on their average accuracy on a set of held-out multiple-choice evaluation tasks. To make the competition as accessible as possible and ensure that the merging procedures are more efficient than fine-tuning, we will enforce a compute budget and focus on merging models with fewer than 8B parameters. A starter kit with all necessary materials (baseline implementations, requirements, the evaluation script, etc.) will be released on May 1st.
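To make the idea of merging without additional training concrete, here is a minimal sketch of one of the simplest possible merging procedures: weighted averaging of parameters across models that share the same architecture. This is an illustration only, not the competition's baseline or any participant's method. The `StateDict` type, the `average_merge` function, and the toy parameter names are all hypothetical, and plain Python lists stand in for real parameter tensors.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a model's named parameters (assumption: real
# checkpoints hold tensors; flat lists of floats are used here for clarity).
StateDict = Dict[str, List[float]]


def average_merge(models: List[StateDict],
                  weights: Optional[List[float]] = None) -> StateDict:
    """Merge models with identical parameter names and sizes by weighted averaging."""
    if not models:
        raise ValueError("need at least one model to merge")
    if weights is None:
        # Default to a uniform average over all models.
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    names = models[0].keys()
    if any(m.keys() != names for m in models):
        raise ValueError("all models must share the same parameter names")

    merged: StateDict = {}
    for name in names:
        params = [m[name] for m in models]
        size = len(params[0])
        if any(len(p) != size for p in params):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise weighted mean; no gradient steps or training data involved.
        merged[name] = [
            sum(w * p[i] for w, p in zip(weights, params)) / total
            for i in range(size)
        ]
    return merged


# Toy usage with two hypothetical "expert" models.
expert_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
expert_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = average_merge([expert_a, expert_b])
```

Averaging needs only a single pass over the parameters, which is why procedures of this kind can fit a tight compute budget. The methods the competition asks for are expected to outperform simple baselines, so this sketch should be read as a starting point rather than a competitive entry.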

Schedule

Sun 9:05 a.m. - 9:10 a.m.	Welcome (Intro)	Margaret Li
Sun 9:10 a.m. - 9:20 a.m.	Writeup Winners talk (Prize Winner Presentation)	Siddharth Gupta
Sun 9:20 a.m. - 9:30 a.m.	Efficiency Winners talk (Prize Winners Presentation)	yang ding
Sun 9:30 a.m. - 10:15 a.m.	Invited Talk: Modular Deep Learning (Jonas Pfeiffer) (Invited Talk)	Jonas Pfeiffer
Sun 10:15 a.m. - 10:30 a.m.	Break	Margaret Li
Sun 10:30 a.m. - 10:40 a.m.	3rd place talk (Prize Winner Presentation)	Zixiang Di
Sun 10:40 a.m. - 10:50 a.m.	2nd place talk (Prize Winner Presentation)	Yinuo Zhang
Sun 10:50 a.m. - 11:00 a.m.	1st place talk (Prize Winner Presentation)	Jisheng Fang
Sun 11:00 a.m. - 11:45 a.m.	Invited Talk: Decoding-time experts for language model adaptation (Alisa Liu) (Invited Talk)	Alisa Liu
Sun 11:45 a.m. - 11:50 a.m.	Closing remarks (Closing)	Margaret Li