On Depth Efficiency of Convolutional Networks: The Use of Hierarchical Tensor Decomposition for Network Design and Analysis
In: Workshop "Learning with Tensors: Why Now and How?"
Abstract
Our formal understanding of the inductive bias that drives the success of deep convolutional networks on computer vision tasks is limited. In particular, it is unclear what makes hypothesis spaces born from convolution and pooling operations so suitable for natural images.
I will present recent work that derives an equivalence between convolutional networks and hierarchical tensor decompositions. Under this equivalence, the structure of a network corresponds to the type of decomposition, and the network weights correspond to the decomposition parameters. This allows analyzing hypothesis spaces of networks by studying tensor spaces of corresponding decompositions, facilitating the use of algebraic and measure-theoretic tools.
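To illustrate the correspondence, here is a schematic sketch in the spirit of the underlying papers (the symbols $f_{\theta_d}$, $\mathcal{A}^y$ and the sizes $M$, $N$, $Z$ are my notation, not taken from this abstract): the score the network computes for class $y$ over input patches $\mathbf{x}_1,\ldots,\mathbf{x}_N$ takes the form

\[
h_y(\mathbf{x}_1,\ldots,\mathbf{x}_N) \;=\; \sum_{d_1,\ldots,d_N=1}^{M} \mathcal{A}^y_{d_1,\ldots,d_N} \prod_{i=1}^{N} f_{\theta_{d_i}}(\mathbf{x}_i),
\]

where the coefficient tensor $\mathcal{A}^y$ holds the network's weights. A shallow network then corresponds to a rank-$Z$ CP decomposition of $\mathcal{A}^y$, while a deep network corresponds to a Hierarchical Tucker decomposition whose tree structure mirrors the network's pooling hierarchy.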
Specifically, the results I will present include: how exponential depth efficiency is achieved in a family of deep networks called Convolutional Arithmetic Circuits (CACs); how CACs are equivalent to SimNets; how their depth efficiency is superior to that of conventional ConvNets; and how the inductive bias is tied to correlations between regions of the input image. In particular,
correlations are formalized through the notion of separation rank, which, for a given input partition, measures how far a function is from being separable.
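For concreteness, separation rank can be sketched as follows (following the papers' definition; notation mine): for a function $h$ over inputs indexed by $[N]$, and a partition of $[N]$ into disjoint sets $A$ and $B$,

\[
\mathrm{sep}(h; A, B) \;:=\; \min\Big\{ K \;:\; h(\mathbf{x}_1,\ldots,\mathbf{x}_N) = \sum_{k=1}^{K} g_k(\mathbf{x}_A)\, g'_k(\mathbf{x}_B) \Big\},
\]

where $\mathbf{x}_A$ and $\mathbf{x}_B$ collect the inputs indexed by $A$ and $B$. A separable function, exhibiting no correlation between the two sides, has separation rank 1; higher ranks correspond to stronger correlations.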
I will show that a polynomially sized deep network supports exponentially high separation ranks for certain input partitions, while being limited to polynomial separation ranks for others.
The network's pooling geometry effectively determines which input partitions are favored, and thus serves as a means of controlling the inductive bias.
Contiguous pooling windows, as commonly employed in practice, favor interleaved partitions over coarse ones, orienting the inductive bias towards the statistics of natural images.
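To make the role of pooling geometry tangible, below is a minimal toy sketch (not the authors' implementation; all function names, dimensions, and weights are hypothetical) of a depth-2 convolutional arithmetic circuit over four input patches, where a single flag switches between contiguous and interleaved product-pooling windows:

    import numpy as np

    def cac_score(patches, rep_w, conv1_w, conv2_w, out_w, pooling="contiguous"):
        """Toy depth-2 convolutional arithmetic circuit on 4 input patches.

        patches: (4, s) input patches; rep_w: (s, M) representation layer;
        conv1_w: (M, r1) and conv2_w: (r1, r2): 1x1 "conv" weights; out_w: (r2,).
        All shapes are hypothetical, chosen only for illustration.
        """
        rep = patches @ rep_w            # (4, M)  representation layer f_d(x_i)
        h = rep @ conv1_w                # (4, r1) 1x1 conv (linear activation)
        if pooling == "contiguous":      # pooling windows {1,2} and {3,4}
            pooled = h[[0, 2]] * h[[1, 3]]
        else:                            # interleaved windows {1,3} and {2,4}
            pooled = h[[0, 1]] * h[[2, 3]]
        h2 = pooled @ conv2_w            # (2, r2) second 1x1 conv
        top = h2[0] * h2[1]              # (r2,)   global product pooling
        return float(top @ out_w)        # scalar class score

    # The same weights induce different functions under the two geometries.
    rng = np.random.default_rng(0)
    s, M, r1, r2 = 5, 6, 4, 3
    args = (rng.standard_normal((4, s)), rng.standard_normal((s, M)),
            rng.standard_normal((M, r1)), rng.standard_normal((r1, r2)),
            rng.standard_normal(r2))
    print(cac_score(*args, pooling="contiguous"),
          cac_score(*args, pooling="interleaved"))

The only change between the two variants is which patches get merged early by product pooling, which is exactly the lever that determines which input partitions the network can correlate strongly.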
In addition to analyzing deep networks, I will show that shallow ones support only linear separation ranks, thereby gaining insight into the benefit brought forth by depth: deep networks are able to efficiently model strong correlations under favored partitions of the input.
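The shallow case admits a one-line illustration in the sketch notation above: if $\mathcal{A}^y = \sum_{z=1}^{Z} \lambda_z\, \mathbf{a}^{z,1} \otimes \cdots \otimes \mathbf{a}^{z,N}$ is a rank-$Z$ CP decomposition, then each summand is separable with respect to any partition $(A, B)$, so $\mathrm{sep}(h_y; A, B) \le Z$: the separation rank of a shallow network is at most linear in its size, regardless of the partition.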
This work covers material recently presented at COLT, ICML and CVPR, including recent arXiv submissions. The work was done jointly with doctoral students Nadav Cohen and Or Sharir.