NeurIPS Poster Neural Network Architecture Beyond Width and Depth

Poster

Neural Network Architecture Beyond Width and Depth

Shijun Zhang · Zuowei Shen · Haizhao Yang

Hall J (level 1) #825

Keywords: [ Neural Network Approximation ] [ Nested Architecture ] [ Function Composition ] [ Parameter Sharing ]

Abstract: This paper proposes a new neural network architecture by introducing an additional dimension called height beyond width and depth. Neural network architectures with height, width, and depth as hyper-parameters are called three-dimensional architectures. It is shown that neural networks with three-dimensional architectures are significantly more expressive than the ones with two-dimensional architectures (those with only width and depth as hyper-parameters), e.g., standard fully connected networks. The new network architecture is constructed recursively via a nested structure, and hence we call a network with the new architecture nested network (NestNet). A NestNet of height $s$ is built with each hidden neuron activated by a NestNet of height $\le s-1$. When $s=1$, a NestNet degenerates to a standard network with a two-dimensional architecture. It is proved by construction that height-$s$ ReLU NestNets with $\mathcal{O}(n)$ parameters can approximate $1$-Lipschitz continuous functions on $[0,1]^d$ with an error $\mathcal{O}(n^{-(s+1)/d})$, while the optimal approximation error of standard ReLU networks with $\mathcal{O}(n)$ parameters is $\mathcal{O}(n^{-2/d})$. Furthermore, such a result is extended to generic continuous functions on $[0,1]^d$ with the approximation error characterized by the modulus of continuity. Finally, we use numerical experimentation to show the advantages of the super-approximation power of ReLU NestNets.
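The recursive definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' construction: the names `make_nestnet` and `rate`, the random weight initialization, and the choice of a single scalar sub-network shared by all hidden neurons at each level are assumptions made only for illustration. The sketch shows how a height-$s$ network uses a height-$(s-1)$ network as its activation, how height $1$ falls back to a plain ReLU network, and how the stated rate $n^{-(s+1)/d}$ compares across heights when constants are ignored.

```python
import random


def relu(z):
    return max(z, 0.0)


def make_nestnet(height, in_dim, width, depth, rng):
    """Build a random-weight NestNet sketch mapping a list of in_dim floats to one float."""
    if height < 1:
        raise ValueError("height must be >= 1")
    if height == 1:
        # Height 1: a standard fully connected ReLU network.
        activation = relu
    else:
        # Assumption: one scalar sub-network of height - 1 is shared by
        # every hidden neuron at this level and plays the role of the activation.
        inner = make_nestnet(height - 1, 1, width, depth, rng)

        def activation(z):
            return inner([z])

    # Layer sizes: input -> `depth` hidden layers of size `width` -> scalar output.
    sizes = [in_dim] + [width] * depth + [1]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, fan_in ** -0.5) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))

    def affine(weights, biases, h):
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(weights, biases)]

    def forward(x):
        h = list(x)
        for weights, biases in layers[:-1]:
            h = [activation(z) for z in affine(weights, biases, h)]
        weights, biases = layers[-1]
        return affine(weights, biases, h)[0]

    return forward


def rate(n, height, d):
    """Error rate n^{-(height+1)/d} from the abstract, with constants ignored."""
    return n ** (-(height + 1) / d)


if __name__ == "__main__":
    net = make_nestnet(height=3, in_dim=2, width=4, depth=2, rng=random.Random(0))
    print(net([0.3, 0.7]))
    print(rate(10_000, 1, 4), rate(10_000, 3, 4))
```

With $n = 10^4$ and $d = 4$, the rate for height $1$ is $n^{-2/4} = 10^{-2}$, which matches the standard-network rate $n^{-2/d}$, while height $3$ gives $n^{-4/4} = 10^{-4}$. These figures only compare exponents; the constants hidden in the $\mathcal{O}(\cdot)$ bounds are not modeled here.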
