Towards Low-bit Communication for Tensor Parallel LLM Inference

Harry Dong ⋅ Tyler Johnson ⋅ Minsik Cho ⋅ Emad Soroush

Abstract
