NeurIPS Poster Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point

Bita Darvish Rouhani · Daniel Lo · Ritchie Zhao · Ming Liu · Jeremy Fowers · Kalin Ovtcharov · Anna Vinogradsky · Sarah Massengill · Lita Yang · Ray Bittner · Alessandro Forin · Haishan Zhu · Taesik Na · Prerak Patel · Shuai Che · Lok Chand Koppaka · Xia Song · Subhojit Som · Kaustav Das · Saurabh K T · Steve Reinhardt · Sitaram Lanka · Eric Chung · Doug Burger

Poster Session 1 #532

Abstract:

In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of datatypes developed for production cloud-scale inferencing on custom hardware. Through the co-evolution of hardware design and algorithms, MSFP16 incurs 3x lower cost than Bfloat16, and MSFP12 incurs 4x lower cost than INT8, while delivering comparable or better accuracy. MSFP has a negligible impact on accuracy (<1%), requires no changes to the model topology, and is integrated with a mature cloud production pipeline. It supports various classes of deep learning models, including CNNs, RNNs, and Transformers, without modification. Finally, we characterize the accuracy and implementation of MSFP and demonstrate its efficacy in a number of production scenarios, including models that power major online services such as web search, question-answering, and image classification.
