How to make your AI models faster, smaller, cheaper, and greener?
Bertrand Charpentier
Abstract
As AI models become more complex, the cost of inference, in both computation and energy, continues to rise. In this talk, we will explore how combining compression techniques such as quantization, pruning, caching, and distillation can significantly optimize model performance at inference time. Applied together, these methods make it possible to reduce model size and computational load while maintaining quality, thus making AI more accessible and environmentally sustainable.
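As a rough illustration of how two of the named techniques compose, the sketch below prunes and then dynamically quantizes a toy PyTorch model. It is a minimal example, not material from the talk: the model architecture, the 50% sparsity level, and the int8 dtype are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small stand-in model; a real workload would use a trained network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Pruning: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the weights

# Dynamic quantization: store Linear weights in int8 and run those layers
# with int8 kernels at inference, shrinking the model and speeding up CPU use.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Distillation would typically come first in such a pipeline (training a smaller student against the original teacher), after which pruning and quantization further shrink the student.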