CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng ⋅ Yixiao Chen ⋅ Weitong Zhang ⋅ Souvik Kundu ⋅ Yun Li ⋅ Zhengzhong Liu ⋅ Eric Xing ⋅ Hongyi Wang ⋅ Huaxiu Yao

Abstract
