

Muon: Training and Trade-offs with Latent Attention and MoE

Sushant Mehta ⋅ Raj Dandekar ⋅ Rajat Dandekar ⋅ Sreedath Panat

Abstract
