Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing an adaptive and data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior (on weights) whose covariance matrix has a Kronecker product structure. This structure is designed to capture the correlations in neurons through backpropagation. Under the assumption of this Kronecker factorization, the prior encourages neurons to borrow statistical strength from one another. Hence, it leads to an adaptive and data-dependent regularization when training networks on small datasets. To optimize the model, we present an efficient block coordinate descent algorithm with analytical solutions. Empirically, we demonstrate that the proposed method helps networks converge to local optima with smaller stable ranks and spectral norms. These properties suggest better generalization, and we present empirical results to support this expectation. We also verify the effectiveness of the approach on multiclass classification and multitask regression problems with various network structures. Our code is publicly available at https://github.com/yaohungt/Adaptive-Regularization-Neural-Network.
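As an illustration only, and not the authors' released implementation, the following NumPy sketch shows the quadratic penalty implied by a zero-mean matrix-variate normal prior on a weight matrix W whose covariance factorizes as a Kronecker product, and checks that the trace form agrees with the vectorized form. It also computes the stable rank, taken here as the squared Frobenius norm divided by the squared spectral norm. The function names, the shapes, and the row and column precision matrices `omega_r` and `omega_c` (inverses of the two covariance factors) are assumptions made for this example.

```python
import numpy as np


def random_spd(n, rng):
    """Return a random symmetric positive-definite n x n matrix (illustrative helper)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)


def kron_prior_penalty(W, omega_r, omega_c):
    """Quadratic term tr(omega_r W omega_c W^T) of the negative log-density
    of a zero-mean matrix-variate normal prior on W (up to a factor 1/2 and
    additive constants). W is d x p, omega_r is d x d, omega_c is p x p."""
    return float(np.trace(omega_r @ W @ omega_c @ W.T))


def kron_prior_penalty_vec(W, omega_r, omega_c):
    """Same quantity written as vec(W)^T (omega_c kron omega_r) vec(W),
    where vec stacks the columns of W (column-major order)."""
    w = W.flatten(order="F")
    return float(w @ np.kron(omega_c, omega_r) @ w)


def stable_rank(W):
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    fro_sq = np.linalg.norm(W, "fro") ** 2
    spec_sq = np.linalg.norm(W, 2) ** 2
    return float(fro_sq / spec_sq)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, p = 4, 3  # hypothetical layer shape
    W = rng.standard_normal((d, p))
    omega_r, omega_c = random_spd(d, rng), random_spd(p, rng)
    print("trace form:     ", kron_prior_penalty(W, omega_r, omega_c))
    print("vectorized form:", kron_prior_penalty_vec(W, omega_r, omega_c))
    print("stable rank:    ", stable_rank(W))
```

With identity precision matrices the penalty reduces to the squared Frobenius norm of W, that is, ordinary weight decay; non-identity factors are what allow correlated rows or columns to share strength. The trace form avoids materializing the dp x dp Kronecker product, which is why it is the practical choice for larger layers.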
Author Information
Han Zhao (Carnegie Mellon University)
Yao-Hung Hubert Tsai (Carnegie Mellon University)
Russ Salakhutdinov (Carnegie Mellon University)
Geoffrey Gordon (MSR Montréal & CMU)
Dr. Gordon is an Associate Research Professor in the Department of Machine Learning at Carnegie Mellon University, and co-director of the Department's Ph.D. program. He works on multi-robot systems, statistical machine learning, game theory, and planning in probabilistic, adversarial, and general-sum domains. His previous appointments include Visiting Professor at the Stanford Computer Science Department and Principal Scientist at Burning Glass Technologies in San Diego. Dr. Gordon received his B.A. in Computer Science from Cornell University in 1991, and his Ph.D. in Computer Science from Carnegie Mellon University in 1999.
More from the Same Authors
-
2020 Poster: Weakly-Supervised Reinforcement Learning for Controllable Behavior »
Lisa Lee · Ben Eysenbach · Russ Salakhutdinov · Shixiang (Shane) Gu · Chelsea Finn -
2020 Poster: Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation »
Han Zhao · Jianfeng Chi · Yuan Tian · Geoffrey Gordon -
2020 Poster: Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift »
Remi Tachet des Combes · Han Zhao · Yu-Xiang Wang · Geoffrey Gordon -
2020 Poster: Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement »
Ben Eysenbach · XINYANG GENG · Sergey Levine · Russ Salakhutdinov -
2020 Poster: A Closer Look at Accuracy vs. Robustness »
Yao-Yuan Yang · Cyrus Rashtchian · Hongyang Zhang · Russ Salakhutdinov · Kamalika Chaudhuri -
2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards »
Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang -
2020 Oral: Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement »
Ben Eysenbach · XINYANG GENG · Sergey Levine · Russ Salakhutdinov -
2020 Poster: On Reward-Free Reinforcement Learning with Linear Function Approximation »
Ruosong Wang · Simon Du · Lin Yang · Russ Salakhutdinov -
2020 Poster: Object Goal Navigation using Goal-Oriented Semantic Exploration »
Devendra Singh Chaplot · Dhiraj Prakashchand Gandhi · Abhinav Gupta · Russ Salakhutdinov -
2020 Poster: Neural Methods for Point-wise Dependency Estimation »
Yao-Hung Hubert Tsai · Han Zhao · Makoto Yamada · Louis-Philippe Morency · Russ Salakhutdinov -
2020 Poster: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension »
Ruosong Wang · Russ Salakhutdinov · Lin Yang -
2020 Spotlight: Neural Methods for Point-wise Dependency Estimation »
Yao-Hung Hubert Tsai · Han Zhao · Makoto Yamada · Louis-Philippe Morency · Russ Salakhutdinov -
2019 Workshop: Sets and Partitions »
Nicholas Monath · Manzil Zaheer · Andrew McCallum · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov -
2019 Workshop: Learning with Rich Experience: Integration of Learning Paradigms »
Zhiting Hu · Andrew Wilson · Chelsea Finn · Lisa Lee · Taylor Berg-Kirkpatrick · Ruslan Salakhutdinov · Eric Xing -
2019 Poster: XLNet: Generalized Autoregressive Pretraining for Language Understanding »
Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le -
2019 Poster: Inherent Tradeoffs in Learning Fair Representations »
Han Zhao · Geoff Gordon -
2019 Oral: XLNet: Generalized Autoregressive Pretraining for Language Understanding »
Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le -
2019 Poster: Towards modular and programmable architecture search »
Renato Negrinho · Matthew Gormley · Geoffrey Gordon · Darshan Patil · Nghia Le · Daniel Ferreira -
2019 Poster: Search on the Replay Buffer: Bridging Planning and Reinforcement Learning »
Ben Eysenbach · Russ Salakhutdinov · Sergey Levine -
2019 Poster: Learning Data Manipulation for Augmentation and Weighting »
Zhiting Hu · Bowen Tan · Russ Salakhutdinov · Tom Mitchell · Eric Xing -
2019 Poster: Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels »
Simon Du · Kangcheng Hou · Russ Salakhutdinov · Barnabas Poczos · Ruosong Wang · Keyulu Xu -
2019 Poster: Mixtape: Breaking the Softmax Bottleneck Efficiently »
Zhilin Yang · Thang Luong · Russ Salakhutdinov · Quoc V Le -
2019 Poster: Deep Gamblers: Learning to Abstain with Portfolio Theory »
Liu Ziyin · Zhikang Wang · Paul Pu Liang · Russ Salakhutdinov · Louis-Philippe Morency · Masahito Ueda -
2019 Poster: Multiple Futures Prediction »
Charlie Tang · Russ Salakhutdinov -
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2018 Poster: Learning Beam Search Policies via Imitation Learning »
Renato Negrinho · Matthew Gormley · Geoffrey Gordon -
2018 Poster: How Many Samples are Needed to Estimate a Convolutional Neural Network? »
Simon Du · Yining Wang · Xiyu Zhai · Sivaraman Balakrishnan · Russ Salakhutdinov · Aarti Singh -
2018 Poster: Dual Policy Iteration »
Wen Sun · Geoffrey Gordon · Byron Boots · J. Bagnell -
2018 Poster: Deep Generative Models with Learnable Knowledge Constraints »
Zhiting Hu · Zichao Yang · Russ Salakhutdinov · LIANHUI Qin · Xiaodan Liang · Haoye Dong · Eric Xing -
2018 Poster: Adversarial Multiple Source Domain Adaptation »
Han Zhao · Shanghang Zhang · Guanhang Wu · José M. F. Moura · Joao P Costeira · Geoffrey Gordon -
2018 Poster: GLoMo: Unsupervised Learning of Transferable Relational Graphs »
Zhilin Yang · Jake Zhao · Bhuwan Dhingra · Kaiming He · William Cohen · Russ Salakhutdinov · Yann LeCun -
2017 Workshop: Deep Learning: Bridging Theory and Practice »
Sanjeev Arora · Maithra Raghu · Russ Salakhutdinov · Ludwig Schmidt · Oriol Vinyals -
2017 Oral: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: Good Semi-supervised Learning That Requires a Bad GAN »
Zihang Dai · Zhilin Yang · Fan Yang · William Cohen · Ruslan Salakhutdinov -
2017 Poster: Linear Time Computation of Moments in Sum-Product Networks »
Han Zhao · Geoffrey Gordon -
2017 Poster: Predictive State Recurrent Neural Networks »
Carlton Downey · Ahmed Hefny · Byron Boots · Geoffrey Gordon · Boyue Li -
2016 Poster: Architectural Complexity Measures of Recurrent Neural Networks »
Saizheng Zhang · Yuhuai Wu · Tong Che · Zhouhan Lin · Roland Memisevic · Russ Salakhutdinov · Yoshua Bengio -
2016 Poster: A Unified Approach for Learning the Parameters of Sum-Product Networks »
Han Zhao · Pascal Poupart · Geoffrey Gordon -
2016 Poster: Iterative Refinement of the Approximate Posterior for Directed Belief Networks »
R Devon Hjelm · Russ Salakhutdinov · Kyunghyun Cho · Nebojsa Jojic · Vince Calhoun · Junyoung Chung -
2016 Poster: Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations »
Behnam Neyshabur · Yuhuai Wu · Russ Salakhutdinov · Nati Srebro -
2016 Poster: On Multiplicative Integration with Recurrent Neural Networks »
Yuhuai Wu · Saizheng Zhang · Ying Zhang · Yoshua Bengio · Russ Salakhutdinov -
2016 Poster: Review Networks for Caption Generation »
Zhilin Yang · Ye Yuan · Yuexin Wu · William Cohen · Russ Salakhutdinov -
2016 Poster: Stochastic Variational Deep Kernel Learning »
Andrew Wilson · Zhiting Hu · Russ Salakhutdinov · Eric Xing -
2015 Poster: Skip-Thought Vectors »
Jamie Kiros · Yukun Zhu · Russ Salakhutdinov · Richard Zemel · Raquel Urtasun · Antonio Torralba · Sanja Fidler -
2015 Poster: Learning Wake-Sleep Recurrent Attention Models »
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey -
2015 Spotlight: Learning Wake-Sleep Recurrent Attention Models »
Jimmy Ba · Russ Salakhutdinov · Roger Grosse · Brendan J Frey -
2015 Poster: Path-SGD: Path-Normalized Optimization in Deep Neural Networks »
Behnam Neyshabur · Russ Salakhutdinov · Nati Srebro -
2015 Poster: Supervised Learning for Dynamical System Learning »
Ahmed Hefny · Carlton Downey · Geoffrey Gordon -
2014 Session: Oral Session 7 »
Geoffrey Gordon -
2014 Poster: Learning Generative Models with Visual Attention »
Charlie Tang · Nitish Srivastava · Russ Salakhutdinov -
2014 Poster: A Multiplicative Model for Learning Distributed Text-Based Attribute Representations »
Jamie Kiros · Richard Zemel · Russ Salakhutdinov -
2014 Demonstration: Toronto Deep Learning »
Jamie Kiros · Russ Salakhutdinov · Nitish Srivastava · Charlie Tang -
2014 Oral: Learning Generative Models with Visual Attention »
Charlie Tang · Nitish Srivastava · Russ Salakhutdinov -
2013 Workshop: Deep Learning »
Yoshua Bengio · Hugo Larochelle · Russ Salakhutdinov · Tomas Mikolov · Matthew D Zeiler · David Mcallester · Nando de Freitas · Josh Tenenbaum · Jian Zhou · Volodymyr Mnih -
2013 Poster: One-shot learning by inverting a compositional causal process »
Brenden M Lake · Russ Salakhutdinov · Josh Tenenbaum -
2013 Poster: Learning Stochastic Feedforward Neural Networks »
Charlie Tang · Russ Salakhutdinov -
2013 Poster: Discriminative Transfer Learning with Tree-based Priors »
Nitish Srivastava · Russ Salakhutdinov -
2013 Poster: Annealing between distributions by averaging moments »
Roger Grosse · Chris Maddison · Russ Salakhutdinov -
2013 Oral: Annealing between distributions by averaging moments »
Roger Grosse · Chris Maddison · Russ Salakhutdinov -
2013 Poster: The Power of Asymmetry in Binary Hashing »
Behnam Neyshabur · Nati Srebro · Russ Salakhutdinov · Yury Makarychev · Payman Yadollahpour -
2012 Poster: Hamming Distance Metric Learning »
Mohammad Norouzi · Russ Salakhutdinov · David Fleet -
2012 Poster: Matrix reconstruction with the local max norm »
Rina Foygel · Nati Srebro · Russ Salakhutdinov -
2012 Poster: Multimodal Learning with Deep Boltzmann Machines »
Nitish Srivastava · Russ Salakhutdinov -
2012 Poster: A Better Way to Pre-Train Deep Boltzmann Machines »
Russ Salakhutdinov · Geoffrey E Hinton -
2012 Oral: Multimodal Learning with Deep Boltzmann Machines »
Nitish Srivastava · Russ Salakhutdinov -
2012 Poster: Cardinality Restricted Boltzmann Machines »
Kevin Swersky · Daniel Tarlow · Ilya Sutskever · Richard Zemel · Russ Salakhutdinov · Ryan Adams -
2012 Tutorial: Machine Learning for Student Learning »
Emma Brunskill · Geoffrey Gordon -
2011 Workshop: Challenges in Learning Hierarchical Models: Transfer Learning and Optimization »
Quoc V. Le · Marc'Aurelio Ranzato · Russ Salakhutdinov · Josh Tenenbaum · Andrew Y Ng -
2011 Poster: Learning to Learn with Compound HD Models »
Russ Salakhutdinov · Josh Tenenbaum · Antonio Torralba -
2011 Spotlight: Learning to Learn with Compound HD Models »
Russ Salakhutdinov · Josh Tenenbaum · Antonio Torralba -
2011 Poster: Learning with the weighted trace-norm under arbitrary sampling distributions »
Rina Foygel · Russ Salakhutdinov · Ohad Shamir · Nati Srebro -
2011 Poster: Transfer Learning by Borrowing Examples »
Joseph Lim · Russ Salakhutdinov · Antonio Torralba -
2010 Workshop: Transfer Learning Via Rich Generative Models. »
Russ Salakhutdinov · Ryan Adams · Josh Tenenbaum · Zoubin Ghahramani · Tom Griffiths -
2010 Poster: Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm »
Russ Salakhutdinov · Nati Srebro -
2010 Poster: Practical Large-Scale Optimization for Max-norm Regularization »
Jason D Lee · Benjamin Recht · Russ Salakhutdinov · Nati Srebro · Joel A Tropp -
2010 Poster: Predictive State Temporal Difference Learning »
Byron Boots · Geoffrey Gordon -
2009 Workshop: Approximate Learning of Large Scale Graphical Models »
Russ Salakhutdinov · Amir Globerson · David Sontag -
2009 Poster: Replicated Softmax: an Undirected Topic Model »
Russ Salakhutdinov · Geoffrey E Hinton -
2009 Poster: Learning in Markov Random Fields using Tempered Transitions »
Russ Salakhutdinov -
2009 Poster: Modelling Relational Data using Bayesian Clustered Tensor Factorization »
Ilya Sutskever · Russ Salakhutdinov · Josh Tenenbaum -
2008 Poster: Evaluating probabilities under high-dimensional latent variable models »
Iain Murray · Russ Salakhutdinov -
2008 Spotlight: Evaluating probabilities under high-dimensional latent variable models »
Iain Murray · Russ Salakhutdinov -
2007 Oral: A Constraint Generation Approach to Learning Stable Linear Dynamical Systems »
Sajid M Siddiqi · Byron Boots · Geoffrey Gordon -
2007 Poster: A Constraint Generation Approach to Learning Stable Linear Dynamical Systems »
Sajid M Siddiqi · Byron Boots · Geoffrey Gordon -
2007 Poster: Probabilistic Matrix Factorization »
Russ Salakhutdinov · Andriy Mnih -
2007 Oral: Probabilistic Matrix Factorization »
Russ Salakhutdinov · Andriy Mnih -
2007 Poster: Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes »
Russ Salakhutdinov · Geoffrey E Hinton -
2006 Poster: No-regret algorithms for Online Convex Programs »
Geoffrey Gordon -
2006 Talk: No-regret algorithms for Online Convex Programs »
Geoffrey Gordon -
2006 Poster: Multi-Robot Negotiation: Approximating the Set of Subgame Perfect Equilibria in General Sum Stochastic Games »
Chris D Murray · Geoffrey Gordon