Poster in Workshop: Optimization for ML Workshop

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Yingyu Liang · Zhizhou Sha · Zhenmei Shi · Zhao Song · Yufa Zhou
