Invited Speaker 2: Song Han
in Workshop: Machine Learning for Systems
Abstract

In the post-Moore's Law era, the amount of computation available per unit cost and power is no longer increasing at its historic rate. In the post-ImageNet era, researchers are tackling more complicated AI problems with larger data sets, which drives the demand for more computation. This mismatch between the supply of and demand for computation highlights the need to co-design efficient machine learning algorithms and domain-specific hardware architectures. Such algorithm-hardware co-design opens up a much larger design space, which requires expertise on both sides (ML and systems), and human heuristics can be sub-optimal for exploring this vast design space.

I'll introduce three of our recent works that use machine learning to optimize machine learning systems: learning the optimal pruning strategy (AMC) and quantization strategy (HAQ) on the target hardware, rather than relying on rule-based strategies; learning a neural network architecture specialized for a target hardware architecture, optimizing both accuracy and latency (ProxylessNAS), rather than using one generic neural network architecture across all hardware; and learning to optimize analog circuit parameters, rather than relying on experienced analog engineers to tune those transistors.

On the other side of the loop (designing hardware-friendly machine learning algorithms), I'll introduce the temporal shift module (TSM), which offers 8x lower latency and 12x higher throughput than 3D-convolution-based methods while ranking first on both the Something-Something V1 and V2 leaderboards. I'll conclude the talk with an outlook on design automation for efficient machine learning systems.
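To make the first idea concrete: AMC and HAQ replace hand-written compression rules with a learned policy (a DDPG agent in the papers) that picks a per-layer pruning ratio or bitwidth and is rewarded with direct accuracy and hardware feedback. The toy sketch below substitutes random search for the RL agent and a synthetic reward for real hardware measurements; the layer sizes, reward shape, and constants are all hypothetical.

```python
import random

# Toy stand-in for an AMC/HAQ-style search over per-layer settings.
# Real AMC/HAQ use a DDPG agent, measured accuracy, and latency/energy
# feedback from the target hardware; everything here is illustrative.

layer_params = [1_000, 5_000, 2_000]   # assumed per-layer parameter counts
bit_choices = [2, 4, 8]                # candidate bitwidths per layer

def reward(bits):
    # Synthetic proxy: trade an accuracy stand-in (higher precision is
    # better) against total bit-weighted model size (smaller is better).
    accuracy_proxy = sum(bits) / (8 * len(bits))
    cost = sum(b * p for b, p in zip(bits, layer_params))
    return accuracy_proxy - 2e-5 * cost

best_bits, best_r = None, float("-inf")
for _ in range(500):                   # random search replaces the RL agent
    bits = [random.choice(bit_choices) for _ in layer_params]
    r = reward(bits)
    if r > best_r:
        best_bits, best_r = bits, r

print("learned per-layer bitwidths:", best_bits)
```

The point of the learned approach is exactly what the toy shows: the best bitwidth differs per layer, and a policy driven by hardware feedback finds such non-uniform assignments that uniform, rule-based strategies miss.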
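ProxylessNAS makes latency part of the training objective by estimating a candidate architecture's expected latency as the probability-weighted sum of per-op latencies measured on the target device. A minimal PyTorch sketch of that penalty, with made-up latency numbers and an assumed penalty weight lam:

```python
import torch

# Two layers, three candidate ops each; latencies (ms) would come from a
# lookup table profiled on the target hardware (values here are made up).
op_latency = torch.tensor([[1.0, 2.5, 4.0],
                           [0.8, 1.9, 3.2]])
arch_logits = torch.zeros(2, 3, requires_grad=True)  # architecture params

def expected_latency(logits):
    probs = torch.softmax(logits, dim=-1)  # per-layer op probabilities
    return (probs * op_latency).sum()      # differentiable latency estimate

task_loss = torch.tensor(0.0)  # placeholder for the cross-entropy loss
lam = 0.1                      # assumed weight on the latency penalty
loss = task_loss + lam * expected_latency(arch_logits)
loss.backward()                # latency gradients flow into the arch params
print(arch_logits.grad)
```

Because the latency estimate is differentiable in the architecture parameters, accuracy and latency can be optimized jointly, yielding a different specialized network for each hardware target.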
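TSM gets temporal modeling almost for free: it shifts a fraction of the channels one step along the time axis, forward and backward, so that a plain 2D convolution can mix information across neighboring frames without the cost of 3D convolution. A minimal sketch of the shift operation, following the paper's default of moving 1/8 of the channels in each direction; the wrapper and test tensor are illustrative:

```python
import torch

def temporal_shift(x, fold_div=8):
    # x: activations shaped (batch, time, channels, height, width).
    # Shift 1/fold_div of the channels one step backward in time and
    # another 1/fold_div one step forward, zero-padding at the clip
    # boundaries; the remaining channels pass through unchanged.
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift toward past
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift toward future
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # no shift
    return out

frames = torch.randn(1, 4, 16, 8, 8)  # toy clip: 4 frames, 16 channels
print(temporal_shift(frames).shape)   # same shape; the shift itself adds
                                      # zero FLOPs and zero parameters
```

In the full model this shift is inserted inside residual branches of a 2D CNN, which is what makes it hardware-friendly: the temporal mixing costs only data movement, not extra computation.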