
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Xiang Li · Tao Qin · Jian Yang · Xiaolin Hu · Tie-Yan Liu

Tue Dec 06 09:00 AM -- 12:30 PM (PST) @ Area 5+6+7+8 #66
Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model can become very large (possibly exceeding the memory capacity of a GPU device), and its training becomes very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use a 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we need only $2 \sqrt{|V|}$ vectors to represent a vocabulary of $|V|$ unique words, far fewer than the $|V|$ vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it on the language modeling task using several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up training without sacrificing accuracy (it achieves similar, if not better, perplexity compared to state-of-the-art language models). Remarkably, on the One Billion Word benchmark dataset, our algorithm achieves perplexity comparable to previous language models while reducing the model size by a factor of 40 to 100 and speeding up training by a factor of 2. We name our proposed algorithm LightRNN to reflect its very small model size and very high training speed.
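The 2-Component shared embedding described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `TwoComponentEmbedding` is hypothetical, and placing words into a ceil(sqrt(|V|)) × ceil(sqrt(|V|)) table in row-major order by word id is an assumption, because the abstract does not say how LightRNN allocates words to table positions. The sketch shows that each word maps to one row vector and one column vector, so only 2·ceil(sqrt(|V|)) vectors are stored.

```python
import math
import random


class TwoComponentEmbedding:
    """Hypothetical sketch of a 2-Component (2C) shared embedding table.

    Assumption: words are placed in a ceil(sqrt(V)) x ceil(sqrt(V)) table
    in row-major order by word id. This naive layout is for illustration
    only; the abstract does not specify LightRNN's allocation scheme.
    """

    def __init__(self, vocab_size, dim, seed=0):
        self.vocab_size = vocab_size
        self.side = math.ceil(math.sqrt(vocab_size))  # number of rows == number of columns
        rng = random.Random(seed)
        # One shared vector per row and one shared vector per column.
        self.row_vecs = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(self.side)]
        self.col_vecs = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(self.side)]

    def position(self, word_id):
        """Map a word id to its (row, column) cell in the table."""
        if not 0 <= word_id < self.vocab_size:
            raise IndexError(f"word id {word_id} out of range")
        return divmod(word_id, self.side)

    def lookup(self, word_id):
        """Return the two components (row vector, column vector) of a word."""
        r, c = self.position(word_id)
        return self.row_vecs[r], self.col_vecs[c]

    def num_vectors(self):
        """Total number of stored embedding vectors: 2 * ceil(sqrt(V))."""
        return len(self.row_vecs) + len(self.col_vecs)


if __name__ == "__main__":
    emb = TwoComponentEmbedding(vocab_size=10_000, dim=4)
    print(emb.num_vectors())   # 200, versus 10,000 vectors for a standard table
    print(emb.position(4321))  # (43, 21)
    # Words 4300 and 4399 sit in the same row, so they share one row vector.
    print(emb.lookup(4300)[0] is emb.lookup(4399)[0])  # True
```

With a 10,000-word vocabulary the sketch stores 200 vectors instead of 10,000; for $|V| = 10^6$ it would store 2,000. How the two components are fed to the RNN and how words are arranged in the table are design choices that this abstract leaves to the full paper.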

Author Information

Xiang Li (NJUST)
Tao Qin (Microsoft)
Jian Yang (Facebook Inc.)
Xiaolin Hu (Tsinghua University)
Tie-Yan Liu (Microsoft Research)

Tie-Yan Liu is an assistant managing director of Microsoft Research Asia, leading the machine learning research area. He is well known for his pioneering work on learning to rank and computational advertising, and his recent research interests include deep learning, reinforcement learning, and distributed machine learning. Many of his technologies have been transferred to Microsoft’s products and online services (such as Bing, Microsoft Advertising, Windows, Xbox, and Azure), and open-sourced through Microsoft Cognitive Toolkit (CNTK), Microsoft Distributed Machine Learning Toolkit (DMTK), and Microsoft Graph Engine. He has also been an active contributor to the academic community. He is an adjunct/honorary professor at Carnegie Mellon University (CMU), the University of Nottingham, and several other universities in China. He has published more than 200 papers in refereed conferences and journals, with over 17,000 citations. He has received numerous awards, including the best student paper award at SIGIR (2008), the most cited paper award of the Journal of Visual Communication and Image Representation (2004-2006), the research breakthrough award (2012) and research-team-of-the-year award (2017) at Microsoft Research, Top-10 Springer Computer Science Books by Chinese Authors (2015), and Most Cited Chinese Researcher by Elsevier (2017). He has been invited to serve as general chair, program committee chair, local chair, or area chair for a dozen top conferences, including SIGIR, WWW, KDD, ICML, NIPS, IJCAI, AAAI, ACL, and ICTIR, and as associate editor of ACM Transactions on Information Systems, ACM Transactions on the Web, and Neurocomputing. Tie-Yan Liu is a fellow of the IEEE and a distinguished member of the ACM.
