Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem

Robin Yadav · Frederik Kunstner · Mark Schmidt · Alberto Bietti

Abstract
