Normalization operations are widely used to train deep neural networks, and they can improve both convergence and generalization on most tasks. Theories explaining normalization's effectiveness, together with new forms of normalization, remain active research topics. To better understand normalization, a natural question is whether normalization is indispensable for training deep neural networks. In this paper, we study what happens when normalization layers are removed from a network, and show how to train deep neural networks without normalization layers and without performance degradation. Our proposed method achieves the same or even slightly better performance across a variety of tasks: image classification on ImageNet, object detection and segmentation on MS-COCO, video classification on Kinetics, and machine translation on WMT English-German. Our study may help better explain the role of normalization layers, and our method can serve as a competitive alternative to them. Code is available.
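For context, the snippet below is a minimal PyTorch sketch of the setting the abstract describes: a standard residual block in which the normalization layers can be toggled off. It is only an illustration of what "removing normalization" means, not the paper's proposed training method, which the abstract does not specify; the `ResidualBlock` class and the `use_norm` flag are hypothetical names introduced here.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block; use_norm=False drops the BatchNorm layers.

    Hypothetical illustration only: this is NOT the method proposed in
    the paper, whose details the abstract above does not give.
    """

    def __init__(self, channels: int, use_norm: bool = True):
        super().__init__()
        # With BatchNorm, conv biases are redundant; without it, keep them.
        norm = (lambda: nn.BatchNorm2d(channels)) if use_norm else (lambda: nn.Identity())
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=not use_norm),
            norm(),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=not use_norm),
            norm(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.body(x))

x = torch.randn(2, 64, 8, 8)
print(ResidualBlock(64, use_norm=True)(x).shape)   # torch.Size([2, 64, 8, 8])
print(ResidualBlock(64, use_norm=False)(x).shape)  # same shape, no norm layers
```

Naively dropping the normalization layers this way tends to hurt optimization in deep stacks, since activation and gradient scales can grow or shrink with depth; that degradation is exactly what the paper claims its method avoids.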
Author Information
Jie Shao (Fudan University)
Kai Hu (Carnegie Mellon University)
Changhu Wang (ByteDance Inc.)
Xiangyang Xue (Fudan University)
Bhiksha Raj (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Poster: Is normalization indispensable for training deep neural network?
  Fri. Dec 11th 05:00 -- 07:00 AM · Poster Session 6 #1887
More from the Same Authors
- 2022 Poster: USB: A Unified Semi-supervised Learning Benchmark for Classification
  Yidong Wang · Hao Chen · Yue Fan · Wang SUN · Ran Tao · Wenxin Hou · Renjie Wang · Linyi Yang · Zhi Zhou · Lan-Zhe Guo · Heli Qi · Zhen Wu · Yu-Feng Li · Satoshi Nakamura · Wei Ye · Marios Savvides · Bhiksha Raj · Takahiro Shinozaki · Bernt Schiele · Jindong Wang · Xing Xie · Yue Zhang
- 2021 Poster: Directed Graph Contrastive Learning
  Zekun Tong · Yuxuan Liang · Henghui Ding · Yongxing Dai · Xinke Li · Changhu Wang
- 2021 Poster: Adaptive Data Augmentation on Temporal Graphs
  Yiwei Wang · Yujun Cai · Yuxuan Liang · Henghui Ding · Changhu Wang · Siddharth Bhatia · Bryan Hooi
- 2021 Poster: Progressive Coordinate Transforms for Monocular 3D Object Detection
  Li Wang · Li Zhang · Yi Zhu · Zhi Zhang · Tong He · Mu Li · Xiangyang Xue
- 2021 Poster: The Image Local Autoregressive Transformer
  Chenjie Cao · Yuxin Hong · Xiang Li · Chengrong Wang · Chengming Xu · Yanwei Fu · Xiangyang Xue
- 2021: HEAR 2021: Holistic Evaluation of Audio Representations + Q&A
  Joseph Turian · Jordan Shier · Bhiksha Raj · Bjoern Schuller · Christian Steinmetz · George Tzanetakis · Gissel Velarde · Kirk McNally · Max Henry · Nicolas Pinto · Yonatan Bisk · George Tzanetakis · Camille Noufi · Dorien Herremans · Jesse Engel · Justin Salamon · Prany Manocha · Philippe Esling · Shinji Watanabe
- 2020 Poster: Improving Generalization in Reinforcement Learning with Mixture Regularization
  KAIXIN WANG · Bingyi Kang · Jie Shao · Jiashi Feng
- 2019 Poster: Face Reconstruction from Voice using Generative Adversarial Networks
  Yandong Wen · Bhiksha Raj · Rita Singh
- 2017: Poster Session: Music and environmental sounds
  Oriol Nieto · Jordi Pons · Bhiksha Raj · Tycho Tax · Benjamin Elizalde · Juhan Nam · Anurag Kumar
- 2012 Poster: Unsupervised Structure Discovery for Semantic Analysis of Audio
  Sourish Chaudhuri · Bhiksha Raj
- 2010 Poster: Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
  Manas A Pathak · Shantanu Rane · Bhiksha Raj
- 2009 Poster: A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
  Paris Smaragdis · Madhusudana Shashanka · Bhiksha Raj