Multilayer Perceptrons (MLPs) define a fundamental model class that forms the backbone of many modern deep learning architectures. Despite their universality guarantees, practical training via stochastic gradient descent often fails to attain theoretical error bounds due to issues including (but not limited to) frequency bias, vanishing gradients, and stiff gradient flows. In this work we postulate that many of these issues originate in the initialization of the network's parameters. While the initialization schemes proposed by Glorot {\it et al.} and He {\it et al.} have become the de facto choices among practitioners, their goal of preserving the variance of forward- and backward-propagated signals is achieved mainly under linearity assumptions, which the presence of nonlinear activation functions may partially undermine. Here, we revisit the initialization of MLPs from a dynamical systems viewpoint to examine why and how, even under these classical schemes, an MLP can fail at the very start of training. Drawing inspiration from classical numerical methods for differential equations that leverage orthogonal feature representations, we propose a novel initialization scheme that promotes orthogonality in the features of the last hidden layer, ultimately leading to more diverse and localized features. Our results demonstrate that network initialization alone can be sufficient to mitigate frequency bias, yielding competitive results for high-frequency function approximation and image regression tasks without any additional modifications to the network architecture or activation functions.
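The abstract does not spell out the proposed scheme in detail, so the sketch below is only a minimal illustration of the general idea: initialize the final hidden layer so that its feature directions start out mutually orthogonal, while earlier layers keep a standard Glorot initialization. The layer sizes, the tanh activation, and the use of torch.nn.init.orthogonal_ / xavier_normal_ are all illustrative assumptions, not the authors' exact method.

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=1, hidden=128, depth=4, out_dim=1):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(depth):
            layers.append(nn.Linear(d, hidden))
            d = hidden
        self.hidden_layers = nn.ModuleList(layers)
        self.out = nn.Linear(hidden, out_dim)
        self._init_weights()

    def _init_weights(self):
        # Standard Glorot (Xavier) initialization for all but the last hidden layer.
        for layer in self.hidden_layers[:-1]:
            nn.init.xavier_normal_(layer.weight)
            nn.init.zeros_(layer.bias)
        # Orthogonal initialization for the last hidden layer, so that its rows
        # (and hence, to first order, the features it produces) begin mutually
        # orthogonal rather than merely variance-preserving.
        nn.init.orthogonal_(self.hidden_layers[-1].weight)
        nn.init.zeros_(self.hidden_layers[-1].bias)
        nn.init.xavier_normal_(self.out.weight)
        nn.init.zeros_(self.out.bias)

    def forward(self, x):
        for layer in self.hidden_layers:
            x = torch.tanh(layer(x))
        return self.out(x)

Under this assumed setup, only the initialization changes; the architecture, activation function, and training loop are left untouched, mirroring the claim that initialization alone can influence frequency bias.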
Author Information
Hanwen Wang (University of Pennsylvania)
Paris Perdikaris (University of Pennsylvania)