As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation at each hidden layer. However, the effectiveness of BN diminishes in the micro-batch setting (e.g., fewer than 4 samples per mini-batch), since statistics estimated from so few samples are unreliable. This limits BN's applicability to training larger models for segmentation, detection, and video-related problems, which require small batches due to memory constraints. In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly in the micro-batch setting. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, KN treats all layers in a network as a whole system and estimates the statistics of a given layer by also considering the distributions of all its preceding layers, mimicking the merits of Kalman Filtering. On ResNet50 trained on ImageNet, KN achieves 3.4% lower error than its BN counterpart with a batch size of 4; even with typical batch sizes, KN maintains an advantage over BN while other BN variants suffer performance degradation. Moreover, KN naturally generalizes to many existing normalization variants, e.g., equipping Group Normalization to obtain Group Kalman Normalization (GKN). KN also outperforms BN and its variants on large-scale object detection and segmentation on COCO 2017.
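For illustration only, below is a minimal PyTorch-style sketch of the fusion idea described in the abstract: a layer's noisy micro-batch statistics are combined with statistics propagated from the preceding layer through a learnable transition and a gain, in the spirit of a Kalman update. The module name KalmanNorm2d, the linear transition, and the scalar gains are assumptions made for this sketch; they are not the paper's exact formulation or the authors' implementation.

```python
import torch
import torch.nn as nn

class KalmanNorm2d(nn.Module):
    """Illustrative sketch (not the authors' code): fuse the current layer's
    mini-batch statistics with statistics carried over from the previous layer."""
    def __init__(self, num_features, prev_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.ones(num_features))   # affine scale
        self.bias = nn.Parameter(torch.zeros(num_features))    # affine shift
        # hypothetical learnable transition from the previous layer's statistics
        self.transition = nn.Linear(prev_features, num_features, bias=False)
        # hypothetical scalar gains playing the role of the Kalman gain
        self.gain_mean = nn.Parameter(torch.tensor(0.0))
        self.gain_var = nn.Parameter(torch.tensor(0.0))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x, prev_mean, prev_var):
        # x: (N, C, H, W); prev_mean / prev_var: statistics from the previous layer
        if self.training:
            obs_mean = x.mean(dim=(0, 2, 3))                    # noisy micro-batch observation
            obs_var = x.var(dim=(0, 2, 3), unbiased=False)
            pred_mean = self.transition(prev_mean)              # prediction from preceding layers
            pred_var = self.transition(prev_var).abs()          # abs() keeps the sketch nonnegative
            g_m = torch.sigmoid(self.gain_mean)                 # gains constrained to (0, 1)
            g_v = torch.sigmoid(self.gain_var)
            mean = g_m * obs_mean + (1 - g_m) * pred_mean       # Kalman-style fusion
            var = g_v * obs_var + (1 - g_v) * pred_var
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(mean.detach(), alpha=self.momentum)
                self.running_var.mul_(1 - self.momentum).add_(var.detach(), alpha=self.momentum)
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + self.eps)
        out = x_hat * self.weight[None, :, None, None] + self.bias[None, :, None, None]
        # return the fused statistics so the next layer can use them as its prediction
        return out, mean.detach(), var.detach()
```

In this sketch the learned gains decide how much to trust the micro-batch observation versus the prediction carried over from earlier layers; each layer passes its fused statistics forward, while the very first normalized layer could simply fall back to its own batch statistics.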
Author Information
Guangrun Wang (Sun Yat-sen University)
Jiefeng Peng (Sun Yat-sen University)
Ping Luo (The Chinese University of Hong Kong)
Xinjiang Wang (SenseTime Group Ltd.)
Liang Lin (Sun Yat-sen University)
More from the Same Authors
- 2021: Geometric Question Answering Towards Multimodal Numerical Reasoning
  Jiaqi Chen · Jianheng Tang · Jinghui Qin · Xiaodan Liang · Lingbo Liu · Eric Xing · Liang Lin
- 2022 Spotlight: Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning
  Ziyi Zhang · Weikai Chen · Hui Cheng · Zhen Li · Siyuan Li · Liang Lin · Guanbin Li
- 2022 Poster: Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning
  Ziyi Zhang · Weikai Chen · Hui Cheng · Zhen Li · Siyuan Li · Liang Lin · Guanbin Li
- 2022 Poster: Structure-Preserving 3D Garment Modeling with Neural Sewing Machines
  Xipeng Chen · Guangrun Wang · Dizhong Zhu · Xiaodan Liang · Philip Torr · Liang Lin
- 2021 Poster: Rethinking the Pruning Criteria for Convolutional Neural Network
  Zhongzhan Huang · Wenqi Shao · Xinjiang Wang · Liang Lin · Ping Luo
- 2020 Poster: Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation
  Yangxin Wu · Gengwei Zhang · Hang Xu · Xiaodan Liang · Liang Lin
- 2018 Poster: Symbolic Graph Reasoning Meets Convolutions
  Xiaodan Liang · Zhiting Hu · Hao Zhang · Liang Lin · Eric Xing
- 2018 Poster: Hybrid Knowledge Routed Modules for Large-scale Object Detection
  ChenHan Jiang · Hang Xu · Xiaodan Liang · Liang Lin
- 2014 Poster: Deep Joint Task Learning for Generic Object Extraction
  Xiaolong Wang · Liliang Zhang · Liang Lin · Zhujin Liang · Wangmeng Zuo