How to make your AI models faster, smaller, cheaper, and greener?
Bertrand Charpentier
Abstract
As AI models become more complex, the cost of inference, in both computation and energy, continues to rise. In this talk, we will explore how combining compression techniques such as quantization, pruning, caching, and distillation can significantly optimize model performance at inference time. Applied together, these methods make it possible to reduce model size and computational load while maintaining quality, thus making AI more accessible and environmentally sustainable.
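As a rough illustration of how two of the named techniques compose, the sketch below prunes and then dynamically quantizes a toy PyTorch model. It is a minimal example, not material from the talk: the model architecture, the 50% sparsity level, and the int8 dtype are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small stand-in model; a real workload would use a trained network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Pruning: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the weights

# Dynamic quantization: store Linear weights in int8 and run those layers
# with int8 kernels at inference, shrinking the model and speeding up CPU use.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Distillation would typically come first in such a pipeline (training a smaller student against the original teacher), after which pruning and quantization further shrink the student.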