Contributing to an Efficient and Democratized Large Model Era
James Demmel · Yang You
La Nouvelle Orleans Ballroom A-C (level 2)
The success of the Transformer model has pushed the limits of deep learning to the scale of trillions of parameters. This growth in model size has outpaced advances in hardware, resulting in an urgent need to distribute enormous models across multiple GPUs. Despite this trend, best practices for choosing an optimal strategy are still lacking, because of the breadth of knowledge required across both deep learning and parallel computing.
This drives researchers to ask pressing questions: How can we improve the training and inference efficiency of large models to reduce costs? Can we accommodate larger models with limited resources? What can we do to enable more members of the AI community to access big models easily? In this tutorial, we investigate efforts to solve these problems. A diverse set of parallelism techniques is an important tool for improving the efficiency of large model training and inference. Heterogeneous memory management can enhance the model capacity that processors (e.g., GPUs) can accommodate. Further, deep learning systems designed for large AI models significantly reduce the specialized background knowledge required of users, allowing them to get started with larger models quickly. We believe that, with the benefit of these effective and broadly applicable techniques, an efficient and democratized big model era is within reach. We will provide participants with a systematic open-source solution and practical demonstrations for big models, in the hope of encouraging more practitioners to apply these techniques in their own practice.
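To make the idea of model parallelism concrete, the following is a minimal sketch of tensor (intra-layer) parallelism, one of the strategies covered in the tutorial: the weight matrix of a linear layer is split column-wise across workers, each worker computes a slice of the output, and the slices are gathered. The two-way split and variable names here are illustrative assumptions, not an API from the tutorial's software.

```python
import numpy as np

# Illustrative sketch of tensor parallelism for a linear layer y = x @ W.
# In a real system each shard would live on a separate GPU and the final
# concatenation would be an all-gather collective; here we simulate both
# "devices" in one process.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # batch of input activations
W = rng.standard_normal((8, 16))      # full weight of the linear layer

W0, W1 = np.hsplit(W, 2)              # column-split: each worker holds half
y0 = x @ W0                           # partial output on "device 0"
y1 = x @ W1                           # partial output on "device 1"
y = np.concatenate([y0, y1], axis=1)  # gather the output slices

# The sharded computation matches the unsharded layer exactly.
assert np.allclose(y, x @ W)
```

Each worker stores only half of the weights, which is the memory saving that lets a model too large for one device fit across several.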