Poster
Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?
Francesco Innocenti · El Mehdi Achour · Ryan Singh · Christopher L Buckley
West Ballroom A-D #5707
Predictive coding (PC) is a brain-inspired learning algorithm that performs local updates of both network activities and weights. Recent work has begun to study the properties of PC compared to backpropagation (BP) with gradient descent (GD), but its training dynamics remain poorly understood. It is well known that the loss landscape of deep neural networks abounds with "non-strict" saddle points, where the Hessian is positive semidefinite; these can lead to vanishing gradients and exponentially slow GD convergence. Here, we present theoretical and empirical evidence that the PC energy at the equilibrium of the network activities has only "strict" saddles with negative curvature. For deep linear networks, we prove that the saddle at the origin of the energy is strict, in contrast to the mean squared error (MSE) loss, where it is non-strict for any network with more than one hidden layer. We support our theory with experiments on both linear and non-linear networks, showing that when initialised close to the origin, PC converges substantially faster than BP with stochastic GD. In addition, we prove that a set of non-strict saddles of the MSE other than the origin becomes strict in the equilibrated energy. Overall, these results highlight the greater robustness of PC to initialisation and raise further questions about the relationship between the loss and the energy landscapes.
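The sketch below illustrates the kind of comparison described in the abstract: a small deep linear network initialised close to the origin, trained either by backpropagation with SGD on the MSE or by predictive coding, where activities are first relaxed on a standard layer-wise energy before each weight update. This is not the authors' code; the layer sizes, learning rates, number of inference steps, initialisation scale and teacher data are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): compare BP/SGD on the MSE with
# predictive coding on a deep linear network initialised near the origin.
import numpy as np

rng = np.random.default_rng(0)
dims = [10, 10, 10, 10, 1]        # 3 hidden layers: the MSE saddle at the origin is non-strict
N, T_infer, lr_w, lr_z = 64, 50, 0.05, 0.1
init_scale = 1e-3                 # weights start close to the origin

X = rng.normal(size=(dims[0], N))
Y = rng.normal(size=(dims[-1], dims[0])) @ X   # targets from a random linear teacher

def mse(Ws):
    H = X
    for W in Ws:
        H = W @ H
    return 0.5 * np.mean(np.sum((Y - H) ** 2, axis=0))

def bp_step(Ws):
    """One SGD step on the MSE via standard backpropagation."""
    Hs = [X]
    for W in Ws:
        Hs.append(W @ Hs[-1])
    delta = (Hs[-1] - Y) / N
    for l in reversed(range(len(Ws))):
        grad = delta @ Hs[l].T
        delta = Ws[l].T @ delta
        Ws[l] -= lr_w * grad

def pc_step(Ws):
    """One PC step: relax hidden activities on the energy, then update weights.

    Energy: E = 0.5 * sum_l ||z_l - W_l z_{l-1}||^2, with z_0 = X and z_L = Y clamped.
    """
    Zs = [X]
    for W in Ws:
        Zs.append(W @ Zs[-1])     # feedforward initialisation of activities
    Zs[-1] = Y.copy()             # clamp the output layer to the targets
    for _ in range(T_infer):      # gradient descent on the free (hidden) activities
        eps = [Zs[l + 1] - Ws[l] @ Zs[l] for l in range(len(Ws))]
        for l in range(1, len(Ws)):
            Zs[l] -= lr_z * (eps[l - 1] - Ws[l].T @ eps[l])
    # Weight update with activities (approximately) at their equilibrium.
    eps = [Zs[l + 1] - Ws[l] @ Zs[l] for l in range(len(Ws))]
    for l in range(len(Ws)):
        Ws[l] += lr_w * eps[l] @ Zs[l].T / N

Ws_bp = [init_scale * rng.normal(size=(dims[l + 1], dims[l])) for l in range(len(dims) - 1)]
Ws_pc = [W.copy() for W in Ws_bp]  # identical near-origin initialisation for both methods
for t in range(201):
    if t % 50 == 0:
        print(f"step {t:3d}  BP/MSE loss {mse(Ws_bp):.4f}  PC loss {mse(Ws_pc):.4f}")
    bp_step(Ws_bp)
    pc_step(Ws_pc)
```

Monitoring the MSE of both runs gives a rough sense of how quickly each method escapes the near-zero initialisation, in the spirit of the experiments summarised above; it is only a toy illustration and not a substitute for the paper's analysis of the Hessian at the origin.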