Timezone: »

Learning Neural Networks with Adaptive Regularization
Han Zhao · Yao-Hung Hubert Tsai · Russ Salakhutdinov · Geoffrey Gordon

Thu Dec 12 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #167

Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing an adaptive and data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior (on weights) whose covariance matrix has a Kronecker product structure. This structure is designed to capture the correlations in neurons through backpropagation. Under the assumption of this Kronecker factorization, the prior encourages neurons to borrow statistical strength from one another. Hence, it leads to an adaptive and data-dependent regularization when training networks on small datasets. To optimize the model, we present an efficient block coordinate descent algorithm with analytical solutions. Empirically, we demonstrate that the proposed method helps networks converge to local optima with smaller stable ranks and spectral norms. These properties suggest better generalizations and we present empirical results to support this expectation. We also verify the effectiveness of the approach on multiclass classification and multitask regression problems with various network structures. Our code is publicly available at:~\url{https://github.com/yaohungt/Adaptive-Regularization-Neural-Network}.

Author Information

Han Zhao (Carnegie Mellon University)
Yao-Hung Hubert Tsai (Carnegie Mellon University)
Russ Salakhutdinov (Carnegie Mellon University)
Geoffrey Gordon (MSR Montréal & CMU)

Dr. Gordon is an Associate Research Professor in the Department of Machine Learning at Carnegie Mellon University, and co-director of the Department's Ph. D. program. He works on multi-robot systems, statistical machine learning, game theory, and planning in probabilistic, adversarial, and general-sum domains. His previous appointments include Visiting Professor at the Stanford Computer Science Department and Principal Scientist at Burning Glass Technologies in San Diego. Dr. Gordon received his B.A. in Computer Science from Cornell University in 1991, and his Ph.D. in Computer Science from Carnegie Mellon University in 1999.

More from the Same Authors