In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of datatypes developed for production cloud-scale inferencing on custom hardware. Through the co-evolution of hardware design and algorithms, MSFP16 incurs 3x lower cost than Bfloat16, and MSFP12 incurs 4x lower cost than INT8, while delivering comparable or better accuracy. MSFP has a negligible impact on accuracy (<1%), requires no changes to the model topology, and is integrated with a mature cloud production pipeline. MSFP supports various classes of deep learning models, including CNNs, RNNs, and Transformers, without modification. Finally, we characterize the accuracy and implementation of MSFP and demonstrate its efficacy on a number of production scenarios, including models that power major online services such as web search, question-answering, and image classification.
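The abstract names MSFP12 and MSFP16 but does not spell out the format. MSFP is a block floating-point format: a group of values shares a single exponent, and each value keeps only a sign and a short mantissa. The Python sketch below illustrates generic block floating-point quantization under stated assumptions: an 8-bit shared exponent, a block of 16 values, and an "MSFPn" name meaning 8 exponent bits + 1 sign bit + (n - 9) mantissa bits, so MSFP12 has 3 mantissa bits and MSFP16 has 7. The block size, the round-to-nearest rule, and the helper names `bfp_quantize` and `bits_per_element` are illustrative assumptions, not the authors' hardware implementation, and exponent range limits are ignored.

```python
import math
from typing import List


def bfp_quantize(values: List[float], mantissa_bits: int = 3,
                 block_size: int = 16) -> List[float]:
    """Generic block floating-point sketch (hypothetical helper, not the MSFP hardware).

    Each block of `block_size` values shares one exponent, taken from the
    largest magnitude in the block. Each value keeps a sign and an unsigned
    integer mantissa of `mantissa_bits` bits, rounded to nearest.
    """
    out: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0.0:
            out.extend([0.0] * len(block))
            continue
        # frexp returns (m, e) with m in [0.5, 1), so max_abs < 2**e.
        _, shared_exp = math.frexp(max_abs)
        # Spacing between representable magnitudes within this block.
        step = 2.0 ** (shared_exp - mantissa_bits)
        max_code = 2 ** mantissa_bits - 1
        for v in block:
            # Small values relative to the block maximum round to zero.
            code = min(round(abs(v) / step), max_code)
            out.append(math.copysign(code * step, v))
    return out


def bits_per_element(mantissa_bits: int, exp_bits: int = 8,
                     block_size: int = 16) -> float:
    """Storage per value: sign + mantissa, plus the shared exponent amortized over the block."""
    return 1 + mantissa_bits + exp_bits / block_size


if __name__ == "__main__":
    print(bfp_quantize([0.1, -0.5, 1.0, 3.0], mantissa_bits=3))
    print(bits_per_element(3), bits_per_element(7))
```

Under these assumptions, quantizing `[0.1, -0.5, 1.0, 3.0]` with 3 mantissa bits gives `[0.0, -0.5, 1.0, 3.0]`: the block maximum 3.0 fixes a step of 0.5, so 0.1 falls below the smallest nonzero level. Amortizing the 8-bit exponent over 16 values keeps the per-value storage near the mantissa width, about 4.5 bits for the MSFP12-like setting and 8.5 bits for the MSFP16-like setting.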
Author Information
Bita Darvish Rouhani (Microsoft)
Daniel Lo (Microsoft)
Ritchie Zhao (Microsoft)
Ming Liu (Microsoft)
Jeremy Fowers (Microsoft)
Kalin Ovtcharov (Microsoft)
Anna Vinogradsky (Caltech)
Sarah Massengill (Microsoft)
Lita Yang (Microsoft)
Ray Bittner (Microsoft Research)
Alessandro Forin (Microsoft)
Haishan Zhu (Microsoft)
Taesik Na (Microsoft)
Prerak Patel (Microsoft)
Shuai Che (Microsoft)
Lok Chand Koppaka (Microsoft)
Xia Song (Microsoft)
Subhojit Som (Microsoft)
Kaustav Das (Microsoft)
Saurabh K T (Microsoft Corporation)
Steve Reinhardt (Microsoft)
Sitaram Lanka (Microsoft)
Eric Chung (Microsoft)
Doug Burger (Microsoft Research)
More from the Same Authors
2021: Constraints with Doug Burger, Alysson Muotri, Ralph Etienne-Cummings, Florian Engert
Doug Burger · Florian Engert · Ralph Etienne-Cummings · Soledad Villar · Teresa Huang
2022 Poster: On the Representation Collapse of Sparse Mixture of Experts
Zewen Chi · Li Dong · Shaohan Huang · Damai Dai · Shuming Ma · Barun Patra · Saksham Singhal · Payal Bajaj · Xia Song · Xian-Ling Mao · Heyan Huang · Furu Wei
2023 Poster: Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang · Li Dong · Wenhui Wang · Yaru Hao · Saksham Singhal · Shuming Ma · Tengchao Lv · Lei Cui · Owais Khan Mohammed · Barun Patra · Qiang Liu · Kriti Aggarwal · Zewen Chi · Nils Bjorck · Vishrav Chaudhary · Subhojit Som · Xia Song · Furu Wei
2022 Poster: VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao · Wenhui Wang · Li Dong · Qiang Liu · Owais Khan Mohammed · Kriti Aggarwal · Subhojit Som · Songhao Piao · Furu Wei
2021 Poster: COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Yu Meng · Chenyan Xiong · Payal Bajaj · Saurabh Tiwary · Paul Bennett · Jiawei Han · Xia Song
2019: Configurable Cloud-Scale DNN Processor for Real-Time AI
Bita Darvish Rouhani
2017: Invited Talk: Accelerating Persistent Neural Networks at Datacenter Scale, Daniel Lo, Microsoft Research
Daniel Lo