Learning with Tensors: Why Now and How?
Anima Anandkumar · Rong Ge · Yan Liu · Maximilian Nickel · Qi (Rose) Yu

Fri Dec 09 11:00 PM -- 09:30 AM (PST) @ Area 5 + 6
Event URL: http://tensor-learn.org/

Real-world data in many domains, such as healthcare, social media, and climate science, is multimodal and heterogeneous. Tensors, as generalizations of vectors and matrices, provide a natural and scalable framework for handling data with inherent structure and complex dependencies. The recent renaissance of tensor methods in machine learning ranges from academic research on scalable algorithms for tensor operations and novel models built on tensor representations, to industry solutions including Google TensorFlow and the Tensor Processing Unit (TPU). In particular, scalable tensor methods have attracted a considerable amount of attention, with successes in a series of learning tasks, such as learning latent variable models [Anandkumar et al., 2014; Huang et al., 2015; Ge et al., 2015], relational learning [Nickel et al., 2011, 2014, 2016], spatio-temporal forecasting [Yu et al., 2014, 2015, 2016], and training deep neural networks [Alexander et al., 2015].
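As a minimal illustration of the low-rank tensor structure these methods exploit (a sketch of my own, not workshop material, assuming only NumPy): a third-order tensor of CP rank r is a sum of r outer products, and its mode unfoldings then have matrix rank at most r.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a rank-3 third-order tensor as a sum of outer products:
# T[i, j, l] = sum_k A[i, k] * B[j, k] * C[l, k]  (the CP form)
r, d = 3, 5
A, B, C = rng.standard_normal((3, d, r))
T = np.einsum('ik,jk,lk->ijl', A, B, C)
print(T.shape)  # (5, 5, 5)

# The mode-0 unfolding is a d x d^2 matrix of rank at most r;
# for generic random factors the rank equals r.
unfold0 = T.reshape(d, -1)
print(np.linalg.matrix_rank(unfold0))  # 3
```

The low rank of every unfolding is what makes compact storage and provable recovery possible even though the tensor itself has d^3 entries.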

This progress triggers new directions and problems for tensor methods in machine learning. The workshop aims to foster discussion, discovery, and dissemination of research activities and outcomes in this area and to encourage breakthroughs. We will bring together researchers in theory and applications who are interested in tensor analysis and the development of tensor-based algorithms. We will also invite researchers from related areas, such as numerical linear algebra, high-performance computing, deep learning, statistics, data analysis, and many others, to contribute to this workshop. We believe that this workshop can foster new directions, closer collaborations, and novel applications. We also expect a deeper conversation about why learning with tensors is important at the current stage, where it is useful, which tensor computation software and hardware work well in practice, and how we can progress further with interesting research directions and open problems.

Fri 11:30 p.m. - 11:40 p.m.
Opening Remarks
Fri 11:40 p.m. - 12:20 a.m.

Our formal understanding of the inductive bias that drives the success of deep convolutional networks on computer vision tasks is limited. In particular, it is unclear what makes hypothesis spaces born from convolution and pooling operations so suitable for natural images. I will present recent work that derives an equivalence between convolutional networks and hierarchical tensor decompositions. Under this equivalence, the structure of a network corresponds to the type of decomposition, and the network weights correspond to the decomposition parameters. This allows analyzing hypothesis spaces of networks by studying tensor spaces of the corresponding decompositions, facilitating the use of algebraic and measure-theoretical tools.
Specifically, the results I will present include showing how exponential depth efficiency is achieved in a family of deep networks called Convolutional Arithmetic Circuits (CACs), that CACs are equivalent to SimNets, that their depth efficiency is superior to that of conventional ConvNets, and how the inductive bias is tied to correlations between regions of the input image. In particular, correlations are formalized through the notion of separation rank, which, for a given input partition, measures how far a function is from being separable. I will show that a polynomially sized deep network supports exponentially high separation ranks for certain input partitions, while being limited to polynomial separation ranks for others. The network's pooling geometry effectively determines which input partitions are favored, and thus serves as a means for controlling the inductive bias. Contiguous pooling windows, as commonly employed in practice, favor interleaved partitions over coarse ones, orienting the inductive bias towards the statistics of natural images. In addition to analyzing deep networks, I will show that shallow ones support only linear separation ranks, and thereby gain insight into the benefit of functions brought forth by depth -- they are able to efficiently model strong correlation under favored partitions of the input.

This work covers material recently presented at COLT, ICML, and CVPR, including recent arXiv submissions. The work was jointly done with doctoral students Nadav Cohen and Or Sharir.
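To build intuition for separation rank (an illustration of my own, not the speakers' code): for a function evaluated on a grid over a two-part input partition, the separation rank with respect to that partition is the rank of the resulting matrix, since f(x, y) = sum_r g_r(x) h_r(y) is exactly a rank-R matrix factorization.

```python
import numpy as np

# Separation rank w.r.t. a partition (x, y): the minimal R with
# f(x, y) = sum_{r=1}^R g_r(x) * h_r(y). On a grid, this is the
# rank of the matrix F[i, j] = f(x_i, y_j).
x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 50)
X, Y = np.meshgrid(x, y, indexing='ij')

F_sep = np.exp(X) * np.sin(Y)   # separable: separation rank 1
F_cor = np.exp(X * Y)           # x and y interact: higher rank

print(np.linalg.matrix_rank(F_sep))  # 1
print(np.linalg.matrix_rank(F_cor))  # noticeably larger
```

A function modeling strong correlation between the two parts needs a high separation rank, which is precisely what deep networks can support efficiently under favored partitions.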

Sat 12:20 a.m. - 1:00 a.m.
Contributed Talks (Talk)
Sat 1:00 a.m. - 1:30 a.m.
Poster Spotlight 1 (Poster)
Sat 1:30 a.m. - 2:00 a.m.
Coffee Break and Poster Session 1 (Break)
Sat 2:00 a.m. - 2:40 a.m.
Tensor Network Ranks (Keynote)
Sat 2:40 a.m. - 3:20 a.m.
Computational Phenotyping using Tensor Factorization (Keynote Speech by Jimeng Sun)
Sat 3:20 a.m. - 5:00 a.m.
Lunch (Break)
Sat 5:00 a.m. - 5:40 a.m.

From a theoretical perspective, low-rank tensor factorization is an algorithmic miracle, allowing for (provably correct) reconstruction and learning in a number of settings. From a practical standpoint, we still lack sufficiently robust, versatile, and efficient tensor factorization algorithms, particularly for large-scale problems. Many of the algorithms with provable guarantees suffer from an expensive initialization step and require the iterative removal of rank-1 factors, destroying any sparsity that might be present in the original tensor. On the other hand, the most commonly used algorithm in practice is "alternating least squares" (ALS), which iteratively fixes all but one mode and optimizes the remaining mode. This algorithm is extremely efficient, but often converges to bad local optima, particularly when the weights of the factors are non-uniform. In this work, we propose a modification of the ALS approach that enjoys practically viable efficiency, as well as provable recovery (assuming the factors are random or have small pairwise inner products) even for highly non-uniform weights. We demonstrate the significant superiority of our recovery algorithm over the traditional ALS on both random synthetic data and on computing word embeddings from a third-order word tri-occurrence tensor.

This is based on joint work with Vatsal Sharan.
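For reference, a minimal NumPy sketch of the plain ALS baseline described above (my own sketch, not the speaker's modified algorithm): each step fixes two factor matrices and solves a linear least-squares problem for the third via a Khatri-Rao product.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (rows(B)*rows(C)) x r."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def cp_als(T, r, n_iter=50, seed=0):
    """Plain alternating least squares for a third-order CP model."""
    rng = np.random.default_rng(seed)
    d0, d1, d2 = T.shape
    A = rng.standard_normal((d0, r))
    B = rng.standard_normal((d1, r))
    C = rng.standard_normal((d2, r))
    for _ in range(n_iter):
        # Fix two modes, solve least squares for the remaining one.
        A = np.linalg.lstsq(khatri_rao(B, C),
                            T.reshape(d0, -1).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C),
                            np.moveaxis(T, 1, 0).reshape(d1, -1).T,
                            rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B),
                            np.moveaxis(T, 2, 0).reshape(d2, -1).T,
                            rcond=None)[0].T
    return A, B, C

# Sanity check: recover an exactly rank-1 tensor.
rng = np.random.default_rng(1)
a, b, c = rng.standard_normal((3, 6))
T = np.einsum('i,j,l->ijl', a, b, c)
A, B, C = cp_als(T, r=1, n_iter=25)
err = np.linalg.norm(T - np.einsum('ik,jk,lk->ijl', A, B, C)) \
      / np.linalg.norm(T)
# err should be near zero for an exactly low-rank tensor
```

On easy instances like this, plain ALS converges quickly; the abstract's point is that on harder instances with highly non-uniform factor weights it can stall in bad local optima, which the proposed modification addresses.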

Sat 5:40 a.m. - 6:00 a.m.
Poster Spotlight 2 (Poster)
Sat 6:00 a.m. - 6:30 a.m.
Coffee Break and Poster Session 2 (Break)
Sat 6:30 a.m. - 7:10 a.m.

Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning.

Using tensors in the era of big data poses the challenge of scalability and efficiency. In this talk, I will discuss recent techniques on tackling this challenge by parallelizing and speeding up tensor decompositions, especially for very sparse datasets (such as the ones encountered for example in online social network analysis).

In addition to scalability, I will also touch upon the challenge of unsupervised quality assessment, where, in the absence of ground truth, we seek to automatically select the decomposition model that best captures the structure in our data.

The talk will conclude with a discussion on future research directions and open problems in tensors for big data analytics.
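To make the sparse setting concrete (an illustration of my own, not the speaker's systems): very sparse tensors are typically stored in coordinate (COO) form, and the core kernel of sparse CP-ALS, the matricized tensor times Khatri-Rao product (MTTKRP), can be computed by touching only the nonzeros, which is also what makes it easy to parallelize.

```python
import numpy as np

def mttkrp_mode0(idx, vals, B, C, d0):
    """Mode-0 MTTKRP from COO data: for each nonzero (i, j, l, v),
    accumulate M[i, :] += v * (B[j, :] * C[l, :])."""
    M = np.zeros((d0, B.shape[1]))
    np.add.at(M, idx[:, 0], vals[:, None] * B[idx[:, 1]] * C[idx[:, 2]])
    return M

rng = np.random.default_rng(0)
d, r, nnz = 100, 4, 500

# COO storage: an nnz x 3 index array plus an nnz-vector of values.
idx = rng.integers(0, d, size=(nnz, 3))
vals = rng.standard_normal(nnz)
B, C = rng.standard_normal((2, d, r))

M = mttkrp_mode0(idx, vals, B, C, d)
print(M.shape)  # (100, 4)
```

The cost is O(nnz * r) rather than O(d^3 * r), which is the gap that scalable sparse tensor systems exploit on data such as social network interactions.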

Sat 7:10 a.m. - 8:00 a.m.
PhD Symposium (Talk)
Sat 8:00 a.m. - 9:00 a.m.
Panel Discussion and Closing Remarks (Panel)

Author Information

Anima Anandkumar (Caltech)
Rong Ge (Princeton University)
Yan Liu (DiDi AI Labs)
Maximilian Nickel (Facebook AI Research)
Rose Yu (University of Southern California)
