Poster

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Coleman Hooper ⋅ Sehoon Kim ⋅ Hiva Mohammadzadeh ⋅ Michael Mahoney ⋅ Sophia Shao ⋅ Kurt Keutzer ⋅ Amir Gholami
2024 Poster

Abstract
