Poster

Dense Backpropagation Improves Training for Sparse Mixture-of-Experts

Ashwinee Panda ⋅ Vatsal Baherwani ⋅ Zain Sarwar ⋅ Benjamin Thérien ⋅ Sambit Sahu ⋅ Tom Goldstein ⋅ Supriyo Chakraborty
2025 Poster

Abstract
