DenseMixer: Improving MoE Post-Training with Precise Router Gradient

Feng Yao · Junxia Cui · Ruohan Zhang · Liyuan Liu · Shibo Hao · Li Zhang · Chengyu Dong · Shuohang Wang · Yelong Shen · Jianfeng Gao · Jingbo Shang

Abstract
