

Poster in Workshop: Optimization for ML Workshop

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Yingyu Liang ⋅ Zhizhou Sha ⋅ Zhenmei Shi ⋅ Zhao Song ⋅ Yufa Zhou

Abstract
