DenseMixer: Improving MoE Post-Training with Precise Router Gradient

Feng Yao ⋅ Junxia Cui ⋅ Ruohan Zhang ⋅ Liyuan Liu ⋅ Shibo Hao ⋅ Li Zhang ⋅ Chengyu Dong ⋅ Shuohang Wang ⋅ Yelong Shen ⋅ Jianfeng Gao ⋅ Jingbo Shang

Abstract
