

Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks

Jinze Zhao · Junjie Yang · Peihao Wang · Yingbin Liang · Zhangyang "Atlas" Wang

Abstract
