Towards Low-bit Communication for Tensor Parallel LLM Inference

Harry Dong ⋅ Tyler Johnson ⋅ Minsik Cho ⋅ Emad Soroush

