

Poster in Workshop: ML for Systems

QAQ: Query-adaptive Mixed-precision Quantization for Large Language Models

Shuxing Li ⋅ Huanrong Liu ⋅ Zelin Wang ⋅ Ruoyang Du ⋅ S Lee ⋅ Chunlin Tian ⋅ Qingbiao Li

Abstract
