An Information Theory of Compute-Optimal Size Scaling, Emergence, and Plateaus in Language Models

Anuj Keshava Nayak · Lav Varshney
Poster

Abstract
