Competition
LLM Merging: Building LLMs Efficiently through Merging
Margaret Li · Jiacheng Zhu · Rickard Brüel Gabrielsson · Derek Tam · Mikhail Yurochkin · Colin Raffel · Leshem Choshen
West Meeting Room 210
Training high-performing large language models (LLMs) from scratch is a notoriously expensive and difficult task, costing hundreds of millions of dollars in compute alone. These pretrained LLMs, however, can cheaply and easily be adapted to new tasks via fine-tuning, leading to a proliferation of models that suit specific use cases. Recent work has shown that specialized fine-tuned models can be rapidly merged to combine capabilities and generalize to new skills. This raises the question: given a new suite of desired skills and design parameters, is it necessary to fine-tune or train yet another LLM from scratch, or can similar existing models be re-purposed for a new task with the right selection or merging procedure? The LLM Merging challenge aims to spur the development and evaluation of methods for merging and reusing existing models to form stronger new models without needing additional training. Specifically, the competition focuses on merging existing publicly-released expert models from Hugging Face, using only minimal compute and additional parameters. The goal will be to develop merged models that outperform existing models and existing merging baselines. Submissions will be judged based on the average accuracy on a set of held-out multiple-choice evaluation tasks and their efficiency. To make the competition as accessible as possible and ensure that the merging procedures are more efficient than fine-tuning, we will enforce a compute budget and focus on merging models with fewer than 8B parameters. A starter kit with all necessary materials (baseline implementations, requirements, the evaluation script, etc.) will be released on May 1st.
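To make the merging setting concrete, the sketch below shows the simplest common baseline: uniform parameter averaging ("model souping") of two fine-tuned checkpoints that share the same base architecture. This is only an illustrative example, not the competition's official baseline implementation, and the repository names are hypothetical placeholders.

```python
# Minimal sketch of uniform weight averaging between two fine-tuned experts.
# Assumes both checkpoints share the same architecture and parameter names.
import torch
from transformers import AutoModelForCausalLM

# Hypothetical Hugging Face repo ids; substitute real expert checkpoints.
model_a = AutoModelForCausalLM.from_pretrained("org/expert-model-a")
model_b = AutoModelForCausalLM.from_pretrained("org/expert-model-b")

state_a, state_b = model_a.state_dict(), model_b.state_dict()

merged_state = {}
with torch.no_grad():
    for name, param_a in state_a.items():
        # Uniform average; weighted averages or task-vector arithmetic
        # are common alternatives explored by merging methods.
        merged_state[name] = (param_a + state_b[name]) / 2.0

model_a.load_state_dict(merged_state)
model_a.save_pretrained("merged-model")
```

Because this requires only a few forward-pass-free tensor operations, it illustrates why merging can stay well under the compute budget that fine-tuning would require.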
Schedule
Sun 9:05 a.m. - 9:10 a.m. | Welcome (Intro) | Margaret Li
Sun 9:10 a.m. - 9:20 a.m. | Writeup Winners Talk (Prize Winner Presentation) | Siddharth Gupta
Sun 9:20 a.m. - 9:30 a.m. | Efficiency Winners Talk (Prize Winner Presentation) | Yang Ding
Sun 9:30 a.m. - 10:15 a.m. | Invited Talk: Modular Deep Learning | Jonas Pfeiffer
Sun 10:15 a.m. - 10:30 a.m. | Break | Margaret Li
Sun 10:30 a.m. - 10:40 a.m. | 3rd Place Talk (Prize Winner Presentation) | Zixiang Di
Sun 10:40 a.m. - 10:50 a.m. | 2nd Place Talk (Prize Winner Presentation) | Yinuo Zhang
Sun 10:50 a.m. - 11:00 a.m. | 1st Place Talk (Prize Winner Presentation) | Jisheng Fang
Sun 11:00 a.m. - 11:45 a.m. | Invited Talk: Decoding-time Experts for Language Model Adaptation | Alisa Liu
Sun 11:45 a.m. - 11:50 a.m. | Closing Remarks (Closing) | Margaret Li