Deep networks can be learned efficiently from unlabeled data. The layers of representation are learned one at a time using a simple learning module that has only one layer of latent variables. The values of the latent variables of one module form the data for training the next module. The most commonly used modules are Restricted Boltzmann Machines or autoencoders with a sparsity penalty on the hidden activities. Although deep networks have been quite successful for tasks such as object recognition, information retrieval, and modeling motion capture data, the simple learning modules do not have multiplicative interactions which are very useful for some types of data.
The talk will show how a third-order energy function can be factorized to yield a simple learning module that retains advantageous properties of a Restricted Boltzmann Machine such as very simple exact inference and a very simple learning rule based on pair-wise statistics. The new module contains multiplicative interactions that are useful for a variety of unsupervised learning tasks. Researchers at the University of Toronto have been using this type of module to extract oriented energy from image patches and dense flow fields from image sequences. The new module can also be used to allow the style of a motion to blend autoregressive models of motion capture data. Finally, the new module can be used to combine an eye-position with a feature-vector to allow a system that has a variable resolution retina to integrate information about shape over many fixations.