Optimizing Data Use for Efficient Pre-training
Danqi Chen
2024 Keynote Talk
in
Workshop: The Fourth Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV): Highlighting New Architectures for Future Foundation Models
Abstract
Training large language models relies heavily on the quality and composition of data, yet optimizing data selection and utilization remains a significant challenge in the field. In this talk, I will outline several key ideas to enhance training efficiency through better data use and cover several findings from my lab on selecting high-quality datasets and optimizing data compositions. I will also introduce a simple yet powerful pre-training approach that conditions on meta-data information associated with training data. This approach is remarkably straightforward to implement, incurs minimal computational overhead, and yields significant efficiency gains.
Speaker
Danqi Chen