Poster in Workshop: Machine Learning for Systems

Efficient Prompt Caching for Large Language Model Inference via Embedding Similarity

Hanlin Zhu · Banghua Zhu · Jiantao Jiao

Abstract
