Timezone: »
Poster
Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
Behnam Neyshabur · Yuhuai Wu · Russ Salakhutdinov · Nati Srebro
We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.
Author Information
Behnam Neyshabur (TTI-Chicago)
Yuhuai Wu (University of Toronto)
Russ Salakhutdinov (University of Toronto)
Nati Srebro (TTI-Chicago)
More from the Same Authors
-
2020 Poster: Weakly-Supervised Reinforcement Learning for Controllable Behavior »
Lisa Lee · Ben Eysenbach · Russ Salakhutdinov · Shixiang (Shane) Gu · Chelsea Finn -
2020 Poster: On Uniform Convergence and Low-Norm Interpolation Learning »
Lijia Zhou · Danica Sutherland · Nati Srebro -
2020 Poster: Reducing Adversarially Robust Learning to Non-Robust PAC Learning »
Omar Montasser · Steve Hanneke · Nati Srebro -
2020 Spotlight: On Uniform Convergence and Low-Norm Interpolation Learning »
Lijia Zhou · Danica Sutherland · Nati Srebro -
2020 Poster: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Poster: Minibatch vs Local SGD for Heterogeneous Distributed Learning »
Blake Woodworth · Kumar Kshitij Patel · Nati Srebro -
2020 Spotlight: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Poster: Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement »
Ben Eysenbach · XINYANG GENG · Sergey Levine · Russ Salakhutdinov -
2020 Poster: A Closer Look at Accuracy vs. Robustness »
Yao-Yuan Yang · Cyrus Rashtchian · Hongyang Zhang · Russ Salakhutdinov · Kamalika Chaudhuri -
2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards »
Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang -
2020 Oral: Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement »
Ben Eysenbach · XINYANG GENG · Sergey Levine · Russ Salakhutdinov -
2020 Poster: On Reward-Free Reinforcement Learning with Linear Function Approximation »
Ruosong Wang · Simon Du · Lin Yang · Russ Salakhutdinov -
2020 Poster: Object Goal Navigation using Goal-Oriented Semantic Exploration »
Devendra Singh Chaplot · Dhiraj Prakashchand Gandhi · Abhinav Gupta · Russ Salakhutdinov -
2020 Poster: Neural Methods for Point-wise Dependency Estimation »
Yao-Hung Hubert Tsai · Han Zhao · Makoto Yamada · Louis-Philippe Morency · Russ Salakhutdinov -
2020 Poster: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension »
Ruosong Wang · Russ Salakhutdinov · Lin Yang -
2020 Spotlight: Neural Methods for Point-wise Dependency Estimation »
Yao-Hung Hubert Tsai · Han Zhao · Makoto Yamada · Louis-Philippe Morency · Russ Salakhutdinov -
2019 Workshop: Sets and Partitions »
Nicholas Monath · Manzil Zaheer · Andrew McCallum · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov -
2019 Workshop: Learning with Rich Experience: Integration of Learning Paradigms »
Zhiting Hu · Andrew Wilson · Chelsea Finn · Lisa Lee · Taylor Berg-Kirkpatrick · Ruslan Salakhutdinov · Eric Xing -
2019 Poster: XLNet: Generalized Autoregressive Pretraining for Language Understanding »
Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le -
2019 Oral: XLNet: Generalized Autoregressive Pretraining for Language Understanding »
Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le -
2019 Poster: Learning Neural Networks with Adaptive Regularization »
Han Zhao · Yao-Hung Hubert Tsai · Russ Salakhutdinov · Geoffrey Gordon -
2019 Poster: Search on the Replay Buffer: Bridging Planning and Reinforcement Learning »
Ben Eysenbach · Russ Salakhutdinov · Sergey Levine -
2019 Poster: Learning Data Manipulation for Augmentation and Weighting »
Zhiting Hu · Bowen Tan · Russ Salakhutdinov · Tom Mitchell · Eric Xing -
2019 Poster: Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels »
Simon Du · Kangcheng Hou · Russ Salakhutdinov · Barnabas Poczos · Ruosong Wang · Keyulu Xu -
2019 Poster: Mixtape: Breaking the Softmax Bottleneck Efficiently »
Zhilin Yang · Thang Luong · Russ Salakhutdinov · Quoc V Le -
2019 Poster: Deep Gamblers: Learning to Abstain with Portfolio Theory »
Liu Ziyin · Zhikang Wang · Paul Pu Liang · Russ Salakhutdinov · Louis-Philippe Morency · Masahito Ueda -
2019 Poster: Multiple Futures Prediction »
Charlie Tang · Russ Salakhutdinov -
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2018 Poster: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization »
Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro -
2018 Spotlight: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization »
Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro -
2018 Poster: How Many Samples are Needed to Estimate a Convolutional Neural Network? »
Simon Du · Yining Wang · Xiyu Zhai · Sivaraman Balakrishnan · Russ Salakhutdinov · Aarti Singh -
2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks »
Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro -
2018 Poster: The Everlasting Database: Statistical Validity at a Fair Price »
Blake Woodworth · Vitaly Feldman · Saharon Rosset · Nati Srebro -
2018 Poster: On preserving non-discrimination when combining expert advice »
Avrim Blum · Suriya Gunasekar · Thodoris Lykouris · Nati Srebro -
2018 Poster: Deep Generative Models with Learnable Knowledge Constraints »
Zhiting Hu · Zichao Yang · Russ Salakhutdinov · LIANHUI Qin · Xiaodan Liang · Haoye Dong · Eric Xing -
2018 Poster: GLoMo: Unsupervised Learning of Transferable Relational Graphs »
Zhilin Yang · Jake Zhao · Bhuwan Dhingra · Kaiming He · William Cohen · Russ Salakhutdinov · Yann LeCun -
2018 Poster: The Importance of Sampling inMeta-Reinforcement Learning »
Bradly Stadie · Ge Yang · Rein Houthooft · Peter Chen · Yan Duan · Yuhuai Wu · Pieter Abbeel · Ilya Sutskever -
2017 Workshop: Deep Learning: Bridging Theory and Practice »
Sanjeev Arora · Maithra Raghu · Russ Salakhutdinov · Ludwig Schmidt · Oriol Vinyals -
2017 Oral: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Ashia C Wilson · Becca Roelofs · Mitchell Stern · Nati Srebro · Benjamin Recht -
2017 Poster: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation »
Yuhuai Wu · Elman Mansimov · Roger Grosse · Shun Liao · Jimmy Ba -
2017 Poster: Good Semi-supervised Learning That Requires a Bad GAN »
Zihang Dai · Zhilin Yang · Fan Yang · William Cohen · Ruslan Salakhutdinov -
2017 Poster: Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference »
Geoffrey Roeder · Yuhuai Wu · David Duvenaud -
2017 Spotlight: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation »
Yuhuai Wu · Elman Mansimov · Roger Grosse · Shun Liao · Jimmy Ba -
2017 Oral: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Ashia C Wilson · Becca Roelofs · Mitchell Stern · Nati Srebro · Benjamin Recht -
2017 Poster: Stochastic Approximation for Canonical Correlation Analysis »
Raman Arora · Teodor Vanislavov Marinov · Poorya Mianjy · Nati Srebro -
2017 Poster: Exploring Generalization in Deep Learning »
Behnam Neyshabur · Srinadh Bhojanapalli · David Mcallester · Nati Srebro -
2017 Poster: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2017 Spotlight: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2016 Poster: Tight Complexity Bounds for Optimizing Composite Objectives »
Blake Woodworth · Nati Srebro -
2016 Poster: Architectural Complexity Measures of Recurrent Neural Networks »
Saizheng Zhang · Yuhuai Wu · Tong Che · Zhouhan Lin · Roland Memisevic · Russ Salakhutdinov · Yoshua Bengio -
2016 Poster: Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis »
Weiran Wang · Jialei Wang · Dan Garber · Dan Garber · Nati Srebro -
2016 Poster: Iterative Refinement of the Approximate Posterior for Directed Belief Networks »
R Devon Hjelm · Russ Salakhutdinov · Kyunghyun Cho · Nebojsa Jojic · Vince Calhoun · Junyoung Chung -
2016 Poster: Global Optimality of Local Search for Low Rank Matrix Recovery »
Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2016 Poster: On Multiplicative Integration with Recurrent Neural Networks »
Yuhuai Wu · Saizheng Zhang · Ying Zhang · Yoshua Bengio · Russ Salakhutdinov -
2016 Poster: Equality of Opportunity in Supervised Learning »
Moritz Hardt · Eric Price · Eric Price · Nati Srebro -
2016 Poster: Review Networks for Caption Generation »
Zhilin Yang · Ye Yuan · Yuexin Wu · William Cohen · Russ Salakhutdinov -
2016 Poster: Stochastic Variational Deep Kernel Learning »
Andrew Wilson · Zhiting Hu · Russ Salakhutdinov · Eric Xing -
2016 Poster: Normalized Spectral Map Synchronization »
Yanyao Shen · Qixing Huang · Nati Srebro · Sujay Sanghavi -
2015 Poster: Skip-Thought Vectors »
Jamie Kiros · Yukun Zhu · Russ Salakhutdinov · Richard Zemel · Raquel Urtasun · Antonio Torralba · Sanja Fidler -
2015 Poster: Learning Wake-Sleep Recurrent Attention Models »
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey -
2015 Spotlight: Learning Wake-Sleep Recurrent Attention Models »
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey -
2015 Poster: Path-SGD: Path-Normalized Optimization in Deep Neural Networks »
Behnam Neyshabur · Russ Salakhutdinov · Nati Srebro -
2014 Poster: Learning Generative Models with Visual Attention »
Charlie Tang · Nitish Srivastava · Russ Salakhutdinov -
2014 Poster: A Multiplicative Model for Learning Distributed Text-Based Attribute Representations »
Jamie Kiros · Richard Zemel · Russ Salakhutdinov -
2014 Poster: Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm »
Deanna Needell · Rachel Ward · Nati Srebro -
2014 Demonstration: Toronto Deep Learning »
Jamie Kiros · Russ Salakhutdinov · Nitish Srivastava · Charlie Tang -
2014 Oral: Learning Generative Models with Visual Attention »
Charlie Tang · Nitish Srivastava · Russ Salakhutdinov -
2013 Workshop: Learning Faster From Easy Data »
Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck -
2013 Workshop: Deep Learning »
Yoshua Bengio · Hugo Larochelle · Russ Salakhutdinov · Tomas Mikolov · Matthew D Zeiler · David Mcallester · Nando de Freitas · Josh Tenenbaum · Jian Zhou · Volodymyr Mnih -
2013 Workshop: Large Scale Matrix Analysis and Inference »
Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht -
2013 Poster: One-shot learning by inverting a compositional causal process »
Brenden M Lake · Russ Salakhutdinov · Josh Tenenbaum -
2013 Poster: Learning Stochastic Feedforward Neural Networks »
Charlie Tang · Russ Salakhutdinov -
2013 Poster: Stochastic Optimization of PCA with Capped MSG »
Raman Arora · Andrew Cotter · Nati Srebro -
2013 Poster: Discriminative Transfer Learning with Tree-based Priors »
Nitish Srivastava · Russ Salakhutdinov -
2013 Poster: Annealing between distributions by averaging moments »
Roger Grosse · Chris Maddison · Russ Salakhutdinov -
2013 Oral: Annealing between distributions by averaging moments »
Roger Grosse · Chris Maddison · Russ Salakhutdinov -
2013 Poster: Auditing: Active Learning with Outcome-Dependent Query Costs »
Sivan Sabato · Anand D Sarwate · Nati Srebro -
2013 Poster: The Power of Asymmetry in Binary Hashing »
Behnam Neyshabur · Nati Srebro · Russ Salakhutdinov · Yury Makarychev · Payman Yadollahpour -
2012 Poster: Hamming Distance Metric Learning »
Mohammad Norouzi · Russ Salakhutdinov · David Fleet -
2012 Poster: Sparse Prediction with the $k$-Support Norm »
Andreas Argyriou · Rina Foygel · Nati Srebro -
2012 Spotlight: Sparse Prediction with the $k$-Support Norm »
Andreas Argyriou · Rina Foygel · Nati Srebro -
2012 Poster: Matrix reconstruction with the local max norm »
Rina Foygel · Nati Srebro · Russ Salakhutdinov -
2012 Poster: Multimodal Learning with Deep Boltzmann Machines »
Nitish Srivastava · Russ Salakhutdinov -
2012 Poster: A Better Way to Pre-Train Deep Boltzmann Machines »
Russ Salakhutdinov · Geoffrey E Hinton -
2012 Oral: Multimodal Learning with Deep Boltzmann Machines »
Nitish Srivastava · Russ Salakhutdinov -
2012 Poster: Cardinality Restricted Boltzmann Machines »
Kevin Swersky · Daniel Tarlow · Ilya Sutskever · Richard Zemel · Russ Salakhutdinov · Ryan Adams -
2011 Workshop: Challenges in Learning Hierarchical Models: Transfer Learning and Optimization »
Quoc V. Le · Marc'Aurelio Ranzato · Russ Salakhutdinov · Josh Tenenbaum · Andrew Y Ng -
2011 Poster: Beating SGD: Learning SVMs in Sublinear Time »
Elad Hazan · Tomer Koren · Nati Srebro -
2011 Poster: Learning to Learn with Compound HD Models »
Russ Salakhutdinov · Josh Tenenbaum · Antonio Torralba -
2011 Spotlight: Learning to Learn with Compound HD Models »
Russ Salakhutdinov · Josh Tenenbaum · Antonio Torralba -
2011 Poster: Better Mini-Batch Algorithms via Accelerated Gradient Methods »
Andrew Cotter · Ohad Shamir · Nati Srebro · Karthik Sridharan -
2011 Poster: On the Universality of Online Mirror Descent »
Nati Srebro · Karthik Sridharan · Ambuj Tewari -
2011 Poster: Learning with the weighted trace-norm under arbitrary sampling distributions »
Rina Foygel · Russ Salakhutdinov · Ohad Shamir · Nati Srebro -
2011 Poster: Transfer Learning by Borrowing Examples »
Joseph Lim · Russ Salakhutdinov · Antonio Torralba -
2010 Workshop: Transfer Learning Via Rich Generative Models. »
Russ Salakhutdinov · Ryan Adams · Josh Tenenbaum · Zoubin Ghahramani · Tom Griffiths -
2010 Session: Spotlights Session 11 »
Nati Srebro -
2010 Session: Oral Session 13 »
Nati Srebro -
2010 Poster: Tight Sample Complexity of Large-Margin Learning »
Sivan Sabato · Nati Srebro · Naftali Tishby -
2010 Poster: Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm »
Russ Salakhutdinov · Nati Srebro -
2010 Poster: Practical Large-Scale Optimization for Max-norm Regularization »
Jason D Lee · Benjamin Recht · Russ Salakhutdinov · Nati Srebro · Joel A Tropp -
2010 Poster: Smoothness, Low Noise and Fast Rates »
Nati Srebro · Karthik Sridharan · Ambuj Tewari -
2009 Workshop: Approximate Learning of Large Scale Graphical Models »
Russ Salakhutdinov · Amir Globerson · David Sontag -
2009 Workshop: Understanding Multiple Kernel Learning Methods »
Brian McFee · Gert Lanckriet · Francis Bach · Nati Srebro -
2009 Poster: Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data »
Boaz Nadler · Nati Srebro · Xueyuan Zhou -
2009 Poster: Replicated Softmax: an Undirected Topic Model »
Russ Salakhutdinov · Geoffrey E Hinton -
2009 Spotlight: Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data »
Boaz Nadler · Nati Srebro · Xueyuan Zhou -
2009 Poster: Learning in Markov Random Fields using Tempered Transitions »
Russ Salakhutdinov -
2009 Poster: Modelling Relational Data using Bayesian Clustered Tensor Factorization »
Ilya Sutskever · Russ Salakhutdinov · Josh Tenenbaum -
2008 Poster: Fast Rates for Regularized Objectives »
Karthik Sridharan · Shai Shalev-Shwartz · Nati Srebro -
2008 Poster: Evaluating probabilities under high-dimensional latent variable models »
Iain Murray · Russ Salakhutdinov -
2008 Spotlight: Evaluating probabilities under high-dimensional latent variable models »
Iain Murray · Russ Salakhutdinov -
2007 Poster: Probabilistic Matrix Factorization »
Russ Salakhutdinov · Andriy Mnih -
2007 Oral: Probabilistic Matrix Factorization »
Russ Salakhutdinov · Andriy Mnih -
2007 Poster: Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes »
Russ Salakhutdinov · Geoffrey E Hinton