
MomentumRNN: Integrating Momentum into Recurrent Neural Networks
Tan Nguyen · Richard Baraniuk · Andrea Bertozzi · Stanley Osher · Bao Wang

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #760

Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradient with restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance.
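
The abstract does not spell out the cell update, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes a momentum-style recurrence of the form v_t = mu * v_{t-1} + s * W x_t followed by h_t = tanh(U h_{t-1} + v_t), where the velocity v plays the role of the momentum buffer in GD with momentum. The function names (`matvec`, `momentum_rnn_step`, `run_momentum_rnn`), the default values of `mu` and `s`, and the choice of tanh are all assumptions made for illustration.

```python
import math


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def momentum_rnn_step(x, h_prev, v_prev, U, W, mu=0.6, s=0.6):
    """One step of a momentum-style recurrent cell (illustrative sketch).

    Assumed update, not taken verbatim from the source:
        v_t = mu * v_{t-1} + s * (W @ x_t)   # velocity, as in GD with momentum
        h_t = tanh(U @ h_{t-1} + v_t)        # hidden state driven by velocity
    """
    Wx = matvec(W, x)
    v = [mu * vp + s * wx for vp, wx in zip(v_prev, Wx)]
    Uh = matvec(U, h_prev)
    h = [math.tanh(uh + vi) for uh, vi in zip(Uh, v)]
    return h, v


def run_momentum_rnn(xs, U, W, mu=0.6, s=0.6):
    """Run the sketch cell over a sequence, starting from zero state."""
    n = len(U)
    h = [0.0] * n
    v = [0.0] * n
    for x in xs:
        h, v = momentum_rnn_step(x, h, v, U, W, mu, s)
    return h, v
```

Under these assumptions, setting `mu=0` and `s=1` collapses the velocity to `W @ x_t`, so the step reduces to a plain tanh RNN update. This makes the momentum term easy to ablate when comparing the two cells.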

Author Information

Tan Nguyen (Rice University/UCLA)

I am currently a postdoctoral scholar in the Department of Mathematics at the University of California, Los Angeles, working with Dr. Stanley J. Osher. I obtained my Ph.D. in Machine Learning from Rice University, where I was advised by Dr. Richard G. Baraniuk. My research focuses on the intersection of Deep Learning, Probabilistic Modeling, Optimization, and ODEs/PDEs. I gave an invited talk in the Deep Learning Theory Workshop at NeurIPS 2018 and organized the 1st Workshop on Integration of Deep Neural Models and Differential Equations at ICLR 2020. I also had two awesome long internships with Amazon AI and NVIDIA Research, during which I worked with Dr. Anima Anandkumar. I am the recipient of the prestigious Computing Innovation Postdoctoral Fellowship (CIFellows) from the Computing Research Association (CRA), the NSF Graduate Research Fellowship, and the IGERT Neuroengineering Traineeship. I received my MSEE and BSEE from Rice in May 2018 and May 2014, respectively.

Richard Baraniuk (Rice University)
Andrea Bertozzi (UCLA)
Stanley Osher (UCLA)
Bao Wang (UCLA)

More from the Same Authors