Neural language models (NLMs) have recently gained renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks. However, NLMs are very computationally demanding, largely due to the cost of the decoding process, which consists of a softmax layer over a large vocabulary. We observe that in the decoding of many NLP tasks, only the probabilities of the top-K hypotheses need to be calculated precisely, and K is often much smaller than the vocabulary size. This paper proposes a novel softmax layer approximation algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a given context, a set of K words that are most likely to occur according to an NLM. We demonstrate that FGD reduces the decoding time by an order of magnitude while attaining close to the full softmax baseline accuracy on neural machine translation and language modeling tasks. We also prove a theoretical guarantee on the softmax approximation quality.
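The observation that only the top-K probabilities matter can be illustrated with a minimal sketch. This is not FGD itself: the abstract does not describe how FGD finds its candidate set, so the sketch below substitutes a brute-force selection with `heapq.nlargest`, which still scans the whole vocabulary. The function names, the vocabulary size, and the synthetic logits are all assumptions made for illustration.

```python
import heapq
import math
import random


def full_softmax(logits):
    """Exact softmax over the whole vocabulary (the costly baseline)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_softmax(logits, k):
    """Softmax normalized over only the k largest logits.

    Returns {word_id: probability}. The candidates are found here by brute
    force; a fast approximate search would take the place of this step.
    """
    top = heapq.nlargest(k, range(len(logits)), key=logits.__getitem__)
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Synthetic logits for a hypothetical 10,000-word vocabulary.
random.seed(0)
vocab_size, k = 10_000, 8
logits = [random.gauss(0.0, 1.0) for _ in range(vocab_size)]
logits[42] = 12.0  # make two words clearly dominant
logits[7] = 11.0

exact = full_softmax(logits)
approx = top_k_softmax(logits, k)
best = max(approx, key=approx.get)
```

Because the restricted normalizer sums over fewer terms, each top-K probability in `approx` is at least as large as its exact counterpart in `exact`, while the ranking among the K candidates is unchanged.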
Author Information
Minjia Zhang (Microsoft)
Wenhan Wang (Microsoft)
Xiaodong Liu (Microsoft)
Jianfeng Gao (Microsoft Research, Redmond, WA)
Yuxiong He (Microsoft)
More from the Same Authors
-
2021 Spotlight: Focal Attention for Long-Range Interactions in Vision Transformers »
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao -
2021 : Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models »
Boxin Wang · Chejian Xu · Shuohang Wang · Zhe Gan · Yu Cheng · Jianfeng Gao · Ahmed Awadallah · Bo Li -
2021 : Few-Shot Learning Evaluation in Natural Language Understanding »
Subhabrata Mukherjee · Xiaodong Liu · Guoqing Zheng · Saghar Hosseini · Hao Cheng · Ge Yang · Christopher Meek · Ahmed Awadallah · Jianfeng Gao -
2023 : MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts »
Pan Lu · Hritik Bansal · Tanglin Xia · Jiacheng Liu · Chunyuan Li · Hannaneh Hajishirzi · Hao Cheng · Kai-Wei Chang · Michel Galley · Jianfeng Gao -
2023 : Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models »
Pan Lu · Baolin Peng · Hao Cheng · Michel Galley · Kai-Wei Chang · Ying Nian Wu · Song-Chun Zhu · Jianfeng Gao -
2023 : Explaining black box text modules in natural language with language models »
Chandan Singh · Aliyah Hsu · Richard Antonello · Shailee Jain · Alexander Huth · Bin Yu · Jianfeng Gao -
2023 : DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies »
Shuaiwen Song · Bonnie Kruft · Minjia Zhang · Conglong Li · Shiyang Chen · Chengming Zhang · Masahiro Tanaka · Xiaoxia Wu · Mohammed AlQuraishi · Gustaf Ahdritz · Christina Floristean · Rick Stevens · Venkatram Vishwanath · Arvind Ramanathan · Sam Foreman · Kyle Hippe · Prasanna Balaprakash · Yuxiong He -
2023 : Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks »
Xiaodong Yu · Hao Cheng · Xiaodong Liu · Dan Roth · Jianfeng Gao -
2023 : Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs »
Qingru Zhang · Chandan Singh · Liyuan Liu · Xiaodong Liu · Bin Yu · Jianfeng Gao · Tuo Zhao -
2023 : An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models »
Yadong Lu · Chunyuan Li · Haotian Liu · Jianwei Yang · Jianfeng Gao · yelong shen -
2023 : Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs »
Suyu Ge · Yunan Zhang · Liyuan Liu · Minjia Zhang · Jiawei Han · Jianfeng Gao -
2023 : Sparse Backpropagation for MoE Training »
Liyuan Liu · Jianfeng Gao · Weizhu Chen -
2023 : Fast-ELECTRA for Efficient Pre-training »
Chengyu Dong · Liyuan Liu · Hao Cheng · Jingbo Shang · Jianfeng Gao · Xiaodong Liu -
2023 : DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing »
Conglong Li · Zhewei Yao · Xiaoxia Wu · Minjia Zhang · Connor Holmes · Cheng Li · Yuxiong He -
2023 : Interactive Panel Discussion »
Nazneen Rajani · Tanya Roosta · Tim Dettmers · Minjia Zhang -
2023 Poster: LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day »
Chunyuan Li · Cliff Wong · Sheng Zhang · Naoto Usuyama · Haotian Liu · Jianwei Yang · Tristan Naumann · Hoifung Poon · Jianfeng Gao -
2023 Poster: Localized Symbolic Knowledge Distillation for Visual Commonsense Models »
Jae Sung Park · Jack Hessel · Khyathi Chandu · Paul Pu Liang · Ximing Lu · Peter West · Youngjae Yu · Qiuyuan Huang · Jianfeng Gao · Ali Farhadi · Yejin Choi -
2023 Poster: Guiding Large Language Models via Directional Stimulus Prompting »
Zekun Li · Baolin Peng · Pengcheng He · Michel Galley · Jianfeng Gao · Xifeng Yan -
2023 Poster: Segment Everything Everywhere All at Once »
Xueyan Zou · Jianwei Yang · Hao Zhang · Feng Li · Linjie Li · Jianfeng Wang · Lijuan Wang · Jianfeng Gao · Yong Jae Lee -
2023 Poster: Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models »
Pan Lu · Baolin Peng · Hao Cheng · Michel Galley · Kai-Wei Chang · Ying Nian Wu · Song-Chun Zhu · Jianfeng Gao -
2023 Poster: Bridging Discrete and Backpropagation: Straight-Through and Beyond »
Liyuan Liu · Chengyu Dong · Xiaodong Liu · Bin Yu · Jianfeng Gao -
2023 Poster: Augmenting Language Models with Long-Term Memory »
Weizhi Wang · Li Dong · Hao Cheng · Xiaodong Liu · Xifeng Yan · Jianfeng Gao · Furu Wei -
2023 Oral: Bridging Discrete and Backpropagation: Straight-Through and Beyond »
Liyuan Liu · Chengyu Dong · Xiaodong Liu · Bin Yu · Jianfeng Gao -
2022 Spotlight: ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers »
Zhewei Yao · Reza Yazdani Aminabadi · Minjia Zhang · Xiaoxia Wu · Conglong Li · Yuxiong He -
2022 Spotlight: Lightning Talks 5B-2 »
Conglong Li · Mohammad Azizmalayeri · Mojan Javaheripi · Pratik Vaishnavi · Jon Hasselgren · Hao Lu · Kevin Eykholt · Arshia Soltani Moakhar · Wenze Liu · Gustavo de Rosa · Nikolai Hofmann · Minjia Zhang · Zixuan Ye · Jacob Munkberg · Amir Rahmati · Arman Zarei · Subhabrata Mukherjee · Yuxiong He · Shital Shah · Reihaneh Zohrabi · Hongtao Fu · Tomasz Religa · Yuliang Liu · Mohammad Manzuri · Mohammad Hossein Rohban · Zhiguo Cao · Caio Cesar Teodoro Mendes · Sebastien Bubeck · Farinaz Koushanfar · Debadeepta Dey -
2022 Spotlight: The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models »
Conglong Li · Minjia Zhang · Yuxiong He -
2022 Spotlight: Focal Modulation Networks »
Jianwei Yang · Chunyuan Li · Xiyang Dai · Jianfeng Gao -
2022 Spotlight: ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models »
Chunyuan Li · Haotian Liu · Liunian Li · Pengchuan Zhang · Jyoti Aneja · Jianwei Yang · Ping Jin · Houdong Hu · Zicheng Liu · Yong Jae Lee · Jianfeng Gao -
2022 Panel: Panel 2B-4: Extreme Compression for… & Exploring Length Generalization… »
Cem Anil · Minjia Zhang -
2022 Spotlight: Fault-Aware Neural Code Rankers »
Jeevana Priya Inala · Chenglong Wang · Mei Yang · Andres Codas · Mark Encarnación · Shuvendu Lahiri · Madanlal Musuvathi · Jianfeng Gao -
2022 Poster: K-LITE: Learning Transferable Visual Models with External Knowledge »
Sheng Shen · Chunyuan Li · Xiaowei Hu · Yujia Xie · Jianwei Yang · Pengchuan Zhang · Zhe Gan · Lijuan Wang · Lu Yuan · Ce Liu · Kurt Keutzer · Trevor Darrell · Anna Rohrbach · Jianfeng Gao -
2022 Poster: ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers »
Zhewei Yao · Reza Yazdani Aminabadi · Minjia Zhang · Xiaoxia Wu · Conglong Li · Yuxiong He -
2022 Poster: Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone »
Zi-Yi Dou · Aishwarya Kamath · Zhe Gan · Pengchuan Zhang · Jianfeng Wang · Linjie Li · Zicheng Liu · Ce Liu · Yann LeCun · Nanyun Peng · Jianfeng Gao · Lijuan Wang -
2022 Poster: ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models »
Chunyuan Li · Haotian Liu · Liunian Li · Pengchuan Zhang · Jyoti Aneja · Jianwei Yang · Ping Jin · Houdong Hu · Zicheng Liu · Yong Jae Lee · Jianfeng Gao -
2022 Poster: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models »
Dongkuan (DK) Xu · Subhabrata Mukherjee · Xiaodong Liu · Debadeepta Dey · Wenhui Wang · Xiang Zhang · Ahmed Awadallah · Jianfeng Gao -
2022 Poster: The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models »
Conglong Li · Minjia Zhang · Yuxiong He -
2022 Poster: Focal Modulation Networks »
Jianwei Yang · Chunyuan Li · Xiyang Dai · Jianfeng Gao -
2022 Poster: Fault-Aware Neural Code Rankers »
Jeevana Priya Inala · Chenglong Wang · Mei Yang · Andres Codas · Mark Encarnación · Shuvendu Lahiri · Madanlal Musuvathi · Jianfeng Gao -
2022 Poster: GLIPv2: Unifying Localization and Vision-Language Understanding »
Haotian Zhang · Pengchuan Zhang · Xiaowei Hu · Yen-Chun Chen · Liunian Li · Xiyang Dai · Lijuan Wang · Lu Yuan · Jenq-Neng Hwang · Jianfeng Gao -
2022 Poster: XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient »
Xiaoxia Wu · Zhewei Yao · Minjia Zhang · Conglong Li · Yuxiong He -
2021 Poster: NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM »
Connor Holmes · Minjia Zhang · Yuxiong He · Bo Wu -
2021 Poster: Focal Attention for Long-Range Interactions in Vision Transformers »
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao -
2021 Poster: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer »
Ge Yang · Edward Hu · Igor Babuschkin · Szymon Sidor · Xiaodong Liu · David Farhi · Nick Ryder · Jakub Pachocki · Weizhu Chen · Jianfeng Gao -
2021 : WebQA Competition + Q&A »
Yingshan CHANG · Yonatan Bisk · Mridu Narang · Levi Melnick · Jianfeng Gao · Hisami Suzuki · Guihong Cao -
2020 Poster: HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory »
Jie Ren · Minjia Zhang · Dong Li -
2020 Poster: Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping »
Minjia Zhang · Yuxiong He -
2020 Poster: AdaTune: Adaptive Tensor Program Compilation Made Efficient »
Menghao Li · Minjia Zhang · Chi Wang · Mingqin Li -
2019 Poster: Unified Language Model Pre-training for Natural Language Understanding and Generation »
Li Dong · Nan Yang · Wenhui Wang · Furu Wei · Xiaodong Liu · Yu Wang · Jianfeng Gao · Ming Zhou · Hsiao-Wuen Hon -
2018 Poster: M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search »
Yelong Shen · Jianshu Chen · Po-Sen Huang · Yuqing Guo · Jianfeng Gao -
2018 Poster: Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization »
Yizhe Zhang · Michel Galley · Jianfeng Gao · Zhe Gan · Xiujun Li · Chris Brockett · Bill Dolan -
2017 : Invited Talk: Microsoft (Asli and Jianfeng) »
Jianfeng Gao -
2015 Poster: End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture »
Jianshu Chen · Ji He · Yelong Shen · Lin Xiao · Xiaodong He · Jianfeng Gao · Xinying Song · Li Deng