Workshop
Second Workshop on Quantum Tensor Networks in Machine Learning
XiaoYang Liu · Qibin Zhao · Ivan Oseledets · Yufei Ding · Guillaume Rabusseau · Jean Kossaifi · Khadijeh Najafi · Anwar Walid · Andrzej Cichocki · Masashi Sugiyama
Quantum tensor networks in machine learning (QTNML) are envisioned to have great potential to advance AI technologies. Quantum machine learning [1][2] promises quantum advantages (potentially exponential speedups in training [3], quadratic improvements in learning efficiency [4]) over classical machine learning, while tensor networks provide powerful simulations of quantum machine learning algorithms on classical computers. As a rapidly growing interdisciplinary area, QTNML may serve as an amplifier for computational intelligence, a transformer for machine learning innovations, and a propeller for AI industrialization.
Tensor networks [5], a contracted network of factor core tensors, have arisen independently in several areas of science and engineering. Such networks appear in the description of physical processes and an accompanying collection of numerical techniques have elevated the use of quantum tensor networks into a variational model of machine learning. These techniques have recently proven ripe to apply to many traditional problems faced in deep learning [6,7,8]. More potential QTNML technologies are rapidly emerging, such as approximating probability functions, and probabilistic graphical models [9,10,11,12]. Quantum algorithms are typically described by quantum circuits (quantum computational networks) that are indeed a class of tensor networks, creating an evident interplay between classical tensor network contraction algorithms and executing tensor contractions on quantum processors. The modern field of quantum enhanced machine learning has started to utilize several tools from tensor network theory to create new quantum models of machine learning and to better understand existing ones. However, the topic of QTNML is relatively young and many open problems are still to be explored.
Schedule
Tue 6:00 a.m.  6:05 a.m.

Opening Remarks
(
Opening
)
>
SlidesLive Video Greeting. 
XiaoYang Liu 🔗 
Tue 6:05 a.m.  6:35 a.m.

Efficient Quantum Optimization via MultiBasis Encodings and Tensor Rings
(
Talk
)
>
Despite extensive research efforts, few quantum algorithms for classical optimization demonstrate an advantage that is realizable on nearterm devices. The utility of many quantum algorithms is limited by high requisite circuit depth and nonconvex optimization landscapes. We tackle these challenges by introducing a new variational quantum algorithm that utilizes multibasis graph encodings and nonlinear activation functions. Our technique results in increased optimization performance, a factor of two increase in effective quantum resources, and a quadratic reduction in measurement complexity. Further, we construct exact circuit representations using factorized tensor rings. This enables us to successfully optimize the MaxCut of the nonlocal 512vertex DIMACS library graphs on a single A100 GPU using only shallow circuits. We further provide efficient distributed implementation via the TensorlyQuantum library. 
Anima Anandkumar 🔗 
Tue 6:35 a.m.  6:45 a.m.

Anima Anandkumar
(
Q&A
)
>

🔗 
Tue 6:45 a.m.  7:15 a.m.

High Performance Computation for Tensor Networks Learning
(
Talk
)
>
SlidesLive Video In this talk, we study high performance computation for tensor networks to address time and space complexities that grow rapidly with the tensor size. We propose efficient primitives that exploit parallelism in tensor learning for efficient implementation on GPU. 
Anwar Walid · XiaoYang Liu 🔗 
Tue 7:15 a.m.  7:25 a.m.

Anwar Walid
(
Q&A
)
>

Anwar Walid 🔗 
Tue 7:25 a.m.  7:55 a.m.

Multigraph Tensor Networks: Big Data Analytics on Irregular Domains
(
Talk
)
>
SlidesLive Video The current availability of powerful computers and huge data sets is creating new opportunities in computational mathematics to bring together concepts and tools from tensor algebra, graph theory, machine learning and signal processing. In discrete mathematics, a tensor is merely a collection of points (nodes in a graph) which are arranged as a multiway arrray. The power of such tensors lies in the fact that they can represent entities as diverse as the users of social networks or financial market data, and that these can be transformed into lowdimensional signals which can be analyzed using data analytics tools. In this talk, we aim to provide a comprehensive introduction to advanced data analytics on graphs using tensor. We will then establish a relationship between tensors and graphs, in order to move beyond the standard regular sampling in time and space. This facilitates modelling in many important areas, including communication networks, computer science, linguistics, social sciences, biology, physics, chemistry, transport, town planning, financial systems, personal health and many others. The tensor and graph topologies will be revisited from a modern data analytics point of view, and we will then proceed to establish a taxonomy of graph tensor networks. With this as a basis, we show such a framework allows for even the most challenging machine learning tasks, such as clustering, being performed in an intuitive and physically meaningful way. Unique aspects of the multigraph tensor networks (MGTN) framework will be outlined, such as their benefits for processing data acquired on irregular domains, their ability to finelytune statistical learning procedures through local information processing, the concepts of random signals on graphs and tensors, learning of graph topology from data observed on graphs, and confluence with deep neural networksnd Big Data. Extensive examples are included to render the concepts more concrete and to facilitate a greater understanding of the underlying principles. 
Danilo Mandic 🔗 
Tue 7:55 a.m.  8:05 a.m.

Danilo P. Mandic
(
Q&A
)
>

Danilo Mandic 🔗 
Tue 8:05 a.m.  8:35 a.m.

Implicit Regularization in Quantum Tensor Networks
(
Talk
)
>
SlidesLive Video The mysterious ability of neural networks to generalize is believed to stem from an implicit regularization, a tendency of gradientbased optimization to fit training data with predictors of low “complexity.” Despite vast efforts, a satisfying formalization of this intuition is lacking. In this talk I will present a series of works theoretically analyzing the implicit regularization in quantum tensor networks, known to be equivalent to certain (nonlinear) neural networks. Through dynamical characterizations, I will establish an implicit regularization towards low tensor ranks, different from any type of norm minimization, in contrast to prior beliefs. I will then discuss implications of this finding to both theory (potential explanation for generalization over natural data) and practice (compression of neural network layers, novel regularization schemes). An underlying theme of the talk will be the potential of quantum tensor networks to unravel mysteries behind deep learning. Works covered in the talk were in collaboration with Sanjeev Arora, Wei Hu, Yuping Luo, Asaf Maman and Noam Razin. 
Nadav Cohen 🔗 
Tue 8:35 a.m.  8:45 a.m.

Nadav Cohen
(
Q&A
)
>

Nadav Cohen 🔗 
Tue 8:45 a.m.  9:15 a.m.

Stefanos Kourtis
(
Talk
)
>
SlidesLive Video 
Stefanos Kourtis 🔗 
Tue 9:15 a.m.  9:25 a.m.

Stefanos Kourtis
(
Q&A
)
>

🔗 
Tue 9:30 a.m.  11:30 a.m.

Coffee Break + Poster Session (GatherTown) ( poster session ) > link  🔗 
Tue 11:30 a.m.  11:35 a.m.

Model based multiagent reinforcement learning with tensor decompositions
(
Oral
)
>
SlidesLive Video A challenge in multiagent reinforcement learning is to be able to generalize over intractable stateaction spaces. This work achieves generalisation in stateaction space over unexplored stateaction pairs by modelling the transition and reward functions as tensors of low CPrank. Initial experiments show that using tensor decompositions in a modelbased reinforcement learning algorithm can lead to much faster convergence if the true transition and reward functions are indeed of low rank. 
Pascal van der Vaart · Anuj Mahajan · Shimon Whiteson 🔗 
Tue 11:35 a.m.  11:40 a.m.

Improvements to gradient descent methods for quantum tensor network machine learning
(
Oral
)
>
SlidesLive Video Tensor networks have demonstrated significant value for machine learning in a myriad of different applications. However, optimizing tensor networks using standard gradient descent has proven to be difficult in practice. Tensor networks suffer from initialization problems resulting in exploding or vanishing gradients and require extensive hyperparameter tuning. Efforts to overcome these problems usually depend on specific network architectures, or ad hoc prescriptions. In this paper we address the problems of initialization and hyperparameter tuning, making it possible to train tensor networks using established machine learning techniques. We introduce a `copy node' method that successfully initializes arbitrary tensor networks, in addition to a gradient based regularization technique for bond dimensions. We present numerical results that show that the combination of techniques presented here produces quantuminspired tensor network models with far fewer parameters, while improving generalization performance. 
James Dborin 🔗 
Tue 11:40 a.m.  11:45 a.m.

Tensor Rings for Learning Circular Hidden Markov Models
(
Oral
)
>
SlidesLive Video Temporal hidden Markov models (HMMs) and tensor trains are two powerful mathematical tools for learning and representing highdimensional largescale data. However, both HMMs and tensor trains have temporal topology but are not ergodic. Ergodicity occurs in a broad range of systems in physics and in geometry, and many instances of realworld data originate from an ergodic system that exhibits temporal behavior. To overcome the fundamental limitations of temporal HMMs and tensor trains, i.e., ergodicity deficiency, we propose a new tensor topology inspired by tensor rings and circular HMMs. Our new tensor ring models namely Circular Matrix Product State (CMPS), Circular Born Machine (CBM), and Circular Locally Purified State (CLPS) are both temporal and ergodic. Then, we show the relationship between our tensor ring models and circular hidden Markov models/hidden quantum Markov models. Our findings through numerical experiments indicate that our tensor ring models have significant advantages over tensor trains in varied situations including cases where ergodicity is not required. 
Mohammad Ali Javidian · Vaneet Aggarwal · Zubin Jacob 🔗 
Tue 11:45 a.m.  11:50 a.m.

ContracTN: A Tensor Network Library Designed for Machine Learning
(
Oral
)
>
SlidesLive Video Although first developed for the needs of quantum manybody physics and quantum computing, tensor networks (TNs) are increasingly being deployed to solve a wide range of problems in machine learning, optimization, and applied mathematics. Inspired by the distinct implementation challenges of TN methods in these new settings, we present ContracTN, a lightweight Python library for generalpurpose TN calculations. Beyond the use of the dense tensor cores supported in standard TN libraries, ContracTN also supports the use of copy tensors, parameterfree objects which allow diverse concepts like batch computation, elementwise multiplication, and summation to be expressed entirely in the language of TN diagrams. The contraction engine of ContracTN also implements a novel form of stabilization, which largely mitigates the issue of numerical overflow arising from the use of lowprecision machine learning libraries for TN contraction. Overall, we wish to popularize a collection of methods which have proven invaluable in implementing efficient and robust TN models, in the hope that this can help catalyze the wider adoption of TN methods for problems in machine learning. 
Jacob Miller · Guillaume Rabusseau 🔗 
Tue 11:50 a.m.  11:55 a.m.

Tensor Ring Parametrized Variational Quantum Circuits for Large Scale Quantum Machine Learning
(
Oral
)
>
SlidesLive Video Quantum Machine Learning (QML) is an emerging research area advocating the use of quantum computing for advancement in machine learning. Since the discovery of the capability of Parametrized Variational Quantum Circuits (VQC) to replace Artificial Neural Networks, they have been widely adopted to different tasks in Quantum Machine Learning. However, despite their potential to outperform neural networks, VQCs are limited to small scale applications given the challenges in scalability of quantum circuits. To address this shortcoming, we propose an algorithm that compresses the quantum state within the circuit using a tensor ring representation. Using the input qubit state in the tensor ring representation, single qubit gates maintain the tensor ring representation. However, the same is not true for two qubit gates in general, where an approximation is used to have the output as a tensor ring representation. Using this approximation, the storage and computational time increases linearly in the number of qubits and number of layers, as compared to the exponential increase with exact simulation algorithms. This approximation is used to implement the tensor ring VQC. The training of the parameters of tensor ring VQC is performed using a gradient descent based algorithm, where efficient approaches for backpropagation are used. The proposed approach is evaluated on two datasets: Iris and MNIST for the classification task to show the improved accuracy using more number of qubits. We achieve a test accuracy of 83.33\% on Iris dataset and 79.57\% and 74.08\% on binary and ternary classification of MNIST dataset using a 4 qubit circuit. A test accuracy of 90.01\% with binary classification and 76.31\% with ternary classification is obtained on MNIST data using an 8 qubit circuit. These results outperform the results on VQCs implemented on Qiskit, and being scalable, demonstrates the potential for VQCs to be used for large scale Quantum Machine Learning applications. 
Dheeraj Peddireddy · Vipul Bansal · Zubin Jacob · Vaneet Aggarwal 🔗 
Tue 11:55 a.m.  12:00 p.m.

Nonparametric tensor estimation with unknown permutations
(
Oral
)
>
SlidesLive Video
We consider the problem of structured tensor denoising in the presence of unknown permutations. Such data problems arise commonly in recommendation system, neuroimaging, community detection, and multiway comparison applications. Here, we develop a general family of smooth tensors up to arbitrarily index permutations; the model incorporates the popular block models and graphon models. We show that a constrained leastsquares estimate in the blockwise polynomial family achieves the minimax error bound. A phase transition phenomenon is revealed with respect to the smoothness threshold needed for optimal recovery. In particular, we find that a polynomial of degree of $m(m1)/2$ is sufficient for accurate recovery of order$m$ tensors, whereas higher smoothness exhibits no further benefits. Furthermore, we provide an efficient polynomialtime Borda count algorithm that provably achieves optimal rate under monotonicity assumptions. The efficacy of our procedure is demonstrated through both simulations and Chicago crime date analysis.

Chanwoo Lee · Miaoyan Wang 🔗 
Tue 12:00 p.m.  12:05 p.m.

Bayesian Tensor Networks
(
Oral
)
>
SlidesLive Video Tensor network (TN) methods have proven their considerable potential in deterministic regression and classification related paradigms, but remain underexplored in probabilistic settings. To this end, we introduce a variational inference TN framework for supervised learning, referred to as the Bayesian Tensor Network (BTN).This is achieved by making use of the multilinear nature of tensor networks to construct a structured variational model which scales linearly with data dimensionality.The so imposed low rank structure on the tensor mean and Kronecker separability on the local covariances, make it possible to efficiently induce weight dependencies in the posterior distribution, thus enhancing model expressiveness at a drastically lower parameter complexity compared to the standard meanfield approach. A comprehensive validation of the proposed approach indicates the competitiveness of BTNs against modern structured Bayesian neural network approaches, while exhibiting enhanced interpretability and efficiency 
Kriton Konstantinidis · Yao Lei Xu · Qibin Zhao · Danilo Mandic 🔗 
Tue 12:05 p.m.  12:10 p.m.

A Tensorized Spectral Attention Mechanism for Efficient Natural Language Processing
(
Oral
)
>
SlidesLive Video The attention mechanism is at the core of stateoftheart Natural Language Processing (NLP) models, owing to its ability to focus on the most contextually relevant part of a sequence. However, current attention models rely on "flatview" matrix methods to process sequence of tokens embedded in vector spaces, resulting in exceedingly high parameter complexity for practical applications. To this end, we introduce a novel Tensorized Spectral Attention (TSA) mechanism, which leverages on the Graph Tensor Network (GTN) framework to efficiently process tensorized token embeddings via attention based spectral graph filters. By virtue of multilinear algebra, such tensorized token embeddings are shown to effectively bypass the Curse of Dimensionality, reducing the parameter complexity of the attention mechanism from exponential to linear in the weight matrix dimensions. Furthermore, the graph formulation of the attention domain enables the processing of tensorized embeddings through spectral graph convolution filters, which further increases its expressive power. The benefits of the TSA are demonstrated through five benchmark NLP experiments, where the proposed mechanism is shown to achieve better or comparable results against traditional attention models, while incurring drastically lower parameter complexity. 
Yao Lei Xu · Kriton Konstantinidis · Shengxi Li · Danilo Mandic 🔗 
Tue 12:10 p.m.  12:15 p.m.

Rademacher Random Projections with Tensor Networks
(
Oral
)
>
SlidesLive Video Random projection~(RP) have recently emerged as popular techniques in the machine learning community for their ability in reducing the dimension of very highdimensional tensors. Following the work in \cite{rakhshan2020tensorized}, we consider a tensorized random projection relying on Tensor Train (TT) decomposition where each element of the core tensors is drawn from a Rademacher distribution. Our theoretical results reveal that the Gaussian lowrank tensor represented in compressed form in TT format in \cite{rakhshan2020tensorized} can be replaced by a TT tensor with core elements drawn from a Rademacher distribution with the same embedding size. Experiments on synthetic data demonstrate that tensorized Rademacher RP can outperform the tensorized Gaussian RP studied in \cite{rakhshan2020tensorized}. In addition, we show both theoretically and experimentally, that the tensorized RP in the Matrix Product Operator~(MPO) format proposed in \cite{batselier2018computing} for performing SVD on large matrices is not a JohnsonLindenstrauss transform~(JLT) and therefore not a wellsuited random projection map. 
Beheshteh Rakhshan · Guillaume Rabusseau 🔗 
Tue 12:15 p.m.  12:20 p.m.

Reinforcement Learning in Factored Action Spaces using Tensor Decompositions
(
Oral
)
>
SlidesLive Video We present an extended abstract for the previously published work Tesseract, Mahajan et al. 2021, which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions. The goal of this abstract is twofold: (1) To garner greater interest amongst the tensor research community for creating methods and analysis for approximate RL, (2) To elucidate the generalised setting of factored action spaces where tensor decompositions can be used. We use cooperative multiagent reinforcement learning scenario as the exemplary setting where the action space is naturally factored across agents and learning becomes intractable without resorting to approximation on the underlying hypothesis space for candidate solutions. 
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviichuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar 🔗 
Tue 12:20 p.m.  12:25 p.m.

Towards a TracePreserving Tensor Network Representation of Quantum Channels
(
Oral
)
>
SlidesLive Video The problem of characterizing quantum channels arises in a number of contexts such as quantum process tomography and quantum error correction.
However, direct approaches to parameterizing and optimizing the Choi matrix representation of quantum channels face a curse of dimensionality: the number of parameters scales exponentially in the number of qubits. Recently, Torlai et al. [2020] proposed using locally purified density operators (LPDOs), a tensor network representation of Choi matrices, to overcome the unfavourable scaling in parameters. While the LPDO structure allows it to satisfy a 
Siddarth Srinivasan · Sandesh Adhikary · Jacob Miller · Guillaume Rabusseau · Byron Boots 🔗 
Tue 12:25 p.m.  12:30 p.m.

Distributive Pretraining of Generative Modeling Using Matrix Product States
(
Oral
)
>
SlidesLive Video Tensor networks have recently found applications in machine learning for both supervised learning and unsupervised learning. The most common approaches for training these models are gradient descent methods. In this work, we consider an alternative training scheme utilizing basic tensor network operations, summation and compression. The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation. The algorithm could be parallelized easily and only iterate through the dataset once. Hence, it serves as a pretraining algorithm. We demonstrate the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks. Furthermore, we provide an interpretation of the algorithm as a compressed quantum kernel density estimation for the probability amplitude of input data. 
ShengHsuan Lin 🔗 
Tue 12:30 p.m.  2:30 p.m.

Discussion Pannel
(
Discussion Pannel
)
>
SlidesLive Video 
XiaoYang Liu · Qibin Zhao · Chao Li · Guillaume Rabusseau 🔗 
Tue 2:30 p.m.  2:35 p.m.

Closing Remarks
(
Closing
)
>
SlidesLive Video 
🔗 


Bayesian Tensor Networks
(
Poster
)
>
Tensor network (TN) methods have proven their considerable potential in deterministic regression and classification related paradigms, but remain underexplored in probabilistic settings. To this end, we introduce a variational inference TN framework for supervised learning, referred to as the Bayesian Tensor Network (BTN).This is achieved by making use of the multilinear nature of tensor networks to construct a structured variational model which scales linearly with data dimensionality.The so imposed low rank structure on the tensor mean and Kronecker separability on the local covariances, make it possible to efficiently induce weight dependencies in the posterior distribution, thus enhancing model expressiveness at a drastically lower parameter complexity compared to the standard meanfield approach. A comprehensive validation of the proposed approach indicates the competitiveness of BTNs against modern structured Bayesian neural network approaches, while exhibiting enhanced interpretability and efficiency 
Kriton Konstantinidis · Yao Lei Xu · Qibin Zhao · Danilo Mandic 🔗 


A Tensorized Spectral Attention Mechanism for Efficient Natural Language Processing
(
Poster
)
>
The attention mechanism is at the core of stateoftheart Natural Language Processing (NLP) models, owing to its ability to focus on the most contextually relevant part of a sequence. However, current attention models rely on "flatview" matrix methods to process sequence of tokens embedded in vector spaces, resulting in exceedingly high parameter complexity for practical applications. To this end, we introduce a novel Tensorized Spectral Attention (TSA) mechanism, which leverages on the Graph Tensor Network (GTN) framework to efficiently process tensorized token embeddings via attention based spectral graph filters. By virtue of multilinear algebra, such tensorized token embeddings are shown to effectively bypass the Curse of Dimensionality, reducing the parameter complexity of the attention mechanism from exponential to linear in the weight matrix dimensions. Furthermore, the graph formulation of the attention domain enables the processing of tensorized embeddings through spectral graph convolution filters, which further increases its expressive power. The benefits of the TSA are demonstrated through five benchmark NLP experiments, where the proposed mechanism is shown to achieve better or comparable results against traditional attention models, while incurring drastically lower parameter complexity. 
Yao Lei Xu · Kriton Konstantinidis · Shengxi Li · Danilo Mandic 🔗 


Model based multiagent reinforcement learning with tensor decompositions
(
Poster
)
>
A challenge in multiagent reinforcement learning is to be able to generalize over intractable stateaction spaces. This work achieves generalisation in stateaction space over unexplored stateaction pairs by modelling the transition and reward functions as tensors of low CPrank. Initial experiments show that using tensor decompositions in a modelbased reinforcement learning algorithm can lead to much faster convergence if the true transition and reward functions are indeed of low rank. 
Pascal van der Vaart · Anuj Mahajan · Shimon Whiteson 🔗 


Improvements to gradient descent methods for quantum tensor network machine learning
(
Poster
)
>
Tensor networks have demonstrated significant value for machine learning in a myriad of different applications. However, optimizing tensor networks using standard gradient descent has proven to be difficult in practice. Tensor networks suffer from initialization problems resulting in exploding or vanishing gradients and require extensive hyperparameter tuning. Efforts to overcome these problems usually depend on specific network architectures, or ad hoc prescriptions. In this paper we address the problems of initialization and hyperparameter tuning, making it possible to train tensor networks using established machine learning techniques. We introduce a `copy node' method that successfully initializes arbitrary tensor networks, in addition to a gradient based regularization technique for bond dimensions. We present numerical results that show that the combination of techniques presented here produces quantuminspired tensor network models with far fewer parameters, while improving generalization performance. 
James Dborin 🔗 


Rademacher Random Projections with Tensor Networks
(
Poster
)
>
Random projection~(RP) have recently emerged as popular techniques in the machine learning community for their ability in reducing the dimension of very highdimensional tensors. Following the work in \cite{rakhshan2020tensorized}, we consider a tensorized random projection relying on Tensor Train (TT) decomposition where each element of the core tensors is drawn from a Rademacher distribution. Our theoretical results reveal that the Gaussian lowrank tensor represented in compressed form in TT format in \cite{rakhshan2020tensorized} can be replaced by a TT tensor with core elements drawn from a Rademacher distribution with the same embedding size. Experiments on synthetic data demonstrate that tensorized Rademacher RP can outperform the tensorized Gaussian RP studied in \cite{rakhshan2020tensorized}. In addition, we show both theoretically and experimentally, that the tensorized RP in the Matrix Product Operator~(MPO) format proposed in \cite{batselier2018computing} for performing SVD on large matrices is not a JohnsonLindenstrauss transform~(JLT) and therefore not a wellsuited random projection map. 
Beheshteh Rakhshan · Guillaume Rabusseau 🔗 


Reinforcement Learning in Factored Action Spaces using Tensor Decompositions
(
Poster
)
>
We present an extended abstract for the previously published work Tesseract, Mahajan et al. 2021, which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions. The goal of this abstract is twofold: (1) To garner greater interest amongst the tensor research community for creating methods and analysis for approximate RL, (2) To elucidate the generalised setting of factored action spaces where tensor decompositions can be used. We use cooperative multiagent reinforcement learning scenario as the exemplary setting where the action space is naturally factored across agents and learning becomes intractable without resorting to approximation on the underlying hypothesis space for candidate solutions. 
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviichuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar 🔗 


Tensor Rings for Learning Circular Hidden Markov Models
(
Poster
)
>
Temporal hidden Markov models (HMMs) and tensor trains are two powerful mathematical tools for learning and representing highdimensional largescale data. However, both HMMs and tensor trains have temporal topology but are not ergodic. Ergodicity occurs in a broad range of systems in physics and in geometry, and many instances of realworld data originate from an ergodic system that exhibits temporal behavior. To overcome the fundamental limitations of temporal HMMs and tensor trains, i.e., ergodicity deficiency, we propose a new tensor topology inspired by tensor rings and circular HMMs. Our new tensor ring models namely Circular Matrix Product State (CMPS), Circular Born Machine (CBM), and Circular Locally Purified State (CLPS) are both temporal and ergodic. Then, we show the relationship between our tensor ring models and circular hidden Markov models/hidden quantum Markov models. Our findings through numerical experiments indicate that our tensor ring models have significant advantages over tensor trains in varied situations including cases where ergodicity is not required. 
Mohammad Ali Javidian · Vaneet Aggarwal · Zubin Jacob 🔗 


Towards a TracePreserving Tensor Network Representation of Quantum Channels
(
Poster
)
>
The problem of characterizing quantum channels arises in a number of contexts such as quantum process tomography and quantum error correction.
However, direct approaches to parameterizing and optimizing the Choi matrix representation of quantum channels face a curse of dimensionality: the number of parameters scales exponentially in the number of qubits. Recently, Torlai et al. [2020] proposed using locally purified density operators (LPDOs), a tensor network representation of Choi matrices, to overcome the unfavourable scaling in parameters. While the LPDO structure allows it to satisfy a 
Siddarth Srinivasan · Sandesh Adhikary · Jacob Miller · Guillaume Rabusseau · Byron Boots 🔗 


Distributive Pretraining of Generative Modeling Using Matrix Product States
(
Poster
)
>
Tensor networks have recently found applications in machine learning for both supervised learning and unsupervised learning. The most common approaches for training these models are gradient descent methods. In this work, we consider an alternative training scheme utilizing basic tensor network operations, summation and compression. The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation. The algorithm could be parallelized easily and only iterate through the dataset once. Hence, it serves as a pretraining algorithm. We demonstrate the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks. Furthermore, we provide an interpretation of the algorithm as a compressed quantum kernel density estimation for the probability amplitude of input data. 
ShengHsuan Lin 🔗 


ContracTN: A Tensor Network Library Designed for Machine Learning
(
Poster
)
>
Although first developed for the needs of quantum manybody physics and quantum computing, tensor networks (TNs) are increasingly being deployed to solve a wide range of problems in machine learning, optimization, and applied mathematics. Inspired by the distinct implementation challenges of TN methods in these new settings, we present ContracTN, a lightweight Python library for generalpurpose TN calculations. Beyond the use of the dense tensor cores supported in standard TN libraries, ContracTN also supports the use of copy tensors, parameterfree objects which allow diverse concepts like batch computation, elementwise multiplication, and summation to be expressed entirely in the language of TN diagrams. The contraction engine of ContracTN also implements a novel form of stabilization, which largely mitigates the issue of numerical overflow arising from the use of lowprecision machine learning libraries for TN contraction. Overall, we wish to popularize a collection of methods which have proven invaluable in implementing efficient and robust TN models, in the hope that this can help catalyze the wider adoption of TN methods for problems in machine learning. 
Jacob Miller · Guillaume Rabusseau 🔗 


Tensor Ring Parametrized Variational Quantum Circuits for Large Scale Quantum Machine Learning
(
Poster
)
>
Quantum Machine Learning (QML) is an emerging research area advocating the use of quantum computing for advancement in machine learning. Since the discovery of the capability of Parametrized Variational Quantum Circuits (VQC) to replace Artificial Neural Networks, they have been widely adopted to different tasks in Quantum Machine Learning. However, despite their potential to outperform neural networks, VQCs are limited to small scale applications given the challenges in scalability of quantum circuits. To address this shortcoming, we propose an algorithm that compresses the quantum state within the circuit using a tensor ring representation. Using the input qubit state in the tensor ring representation, single qubit gates maintain the tensor ring representation. However, the same is not true for two qubit gates in general, where an approximation is used to have the output as a tensor ring representation. Using this approximation, the storage and computational time increases linearly in the number of qubits and number of layers, as compared to the exponential increase with exact simulation algorithms. This approximation is used to implement the tensor ring VQC. The training of the parameters of tensor ring VQC is performed using a gradient descent based algorithm, where efficient approaches for backpropagation are used. The proposed approach is evaluated on two datasets: Iris and MNIST for the classification task to show the improved accuracy using more number of qubits. We achieve a test accuracy of 83.33\% on Iris dataset and 79.57\% and 74.08\% on binary and ternary classification of MNIST dataset using a 4 qubit circuit. A test accuracy of 90.01\% with binary classification and 76.31\% with ternary classification is obtained on MNIST data using an 8 qubit circuit. These results outperform the results on VQCs implemented on Qiskit, and being scalable, demonstrates the potential for VQCs to be used for large scale Quantum Machine Learning applications. 
Dheeraj Peddireddy · Vipul Bansal · Zubin Jacob · Vaneet Aggarwal 🔗 


Nonparametric tensor estimation with unknown permutations
(
Poster
)
>
We consider the problem of structured tensor denoising in the presence of unknown permutations. Such data problems arise commonly in recommendation system, neuroimaging, community detection, and multiway comparison applications. Here, we develop a general family of smooth tensors up to arbitrarily index permutations; the model incorporates the popular block models and graphon models. We show that a constrained leastsquares estimate in the blockwise polynomial family achieves the minimax error bound. A phase transition phenomenon is revealed with respect to the smoothness threshold needed for optimal recovery. In particular, we find that a polynomial of degree of $m(m1)/2$ is sufficient for accurate recovery of order$m$ tensors, whereas higher smoothness exhibits no further benefits. Furthermore, we provide an efficient polynomialtime Borda count algorithm that provably achieves optimal rate under monotonicity assumptions. The efficacy of our procedure is demonstrated through both simulations and Chicago crime date analysis.

Chanwoo Lee · Miaoyan Wang 🔗 


LowRank Tensor Completion via Coupled Framelet Transform
(
Poster
)
>
How to represent a lowrank structure embedded in the underlying data is the key issue in tensor completion. In this work, we suggest a novel lowrank tensor representation based on coupled transform, which fully exploits the spatial multiscale nature and redundancy in spatial and spectral/temporal dimensions, leading to a better low tensor multirank approximation. More precisely, this representation is achieved by using twodimensional framelet transform for the two spatial dimensions, one/twodimensional Fourier transform for the temporal/spectral dimension, and then Karhunen–Loeve transform (via singular value decomposition) for the transformed tensor. Based on this lowrank tensor representation, we formulate a novel lowrank tensor completion model for recovering missing information in multidimensional visual data. To tackle the proposed model, we develop the alternating directional method of multipliers (ADMM) algorithm tailored for the structured optimization problem. 
JianLi Wang · TingZhu Huang · XiLe Zhao · TaiXiang Jiang · Michael Ng 🔗 


Matrix product state for quantuminspired feature extraction and compressed sensing
(
Poster
)
>
Improving the interpretability and efficiency of machine learning methods are challenging and important issues. The concept of entanglement, which is a quantity from quantum information science, may contribute to these issues. In this extended abstract, we introduce two tensornetwork (TN) machine learning methods, which concerns feature extraction and compressed sensing, respectively, based on entanglement. For the former [Front. Appl. Math. Stat. 7, 716044 (2021)], the entanglement obtained from matrix product state (MPS) is used as a measure of the importance of the features in the reallife datasets. An entanglementbased feature extraction algorithm is proposed, and used to improve the efficiency of TN machine learning. In the latter [Phys. Rev. Research. 2, 033293 (2020)], TN states are used to realize efficient compressed sampling by entanglementguided projective measurements. This scheme can be applied in the future to compress and communicate the reallife data by entangled qubits. 
WenJun Li · ZhengZhi Sun · ShiJu Ran · Gang Su 🔗 


Bayesian Latent Factor Model for Higherorder Data: an Extended Abstract
(
Poster
)
>
Latent factor models are canonical tools to learn lowdimensional and linear embedding of original data. Traditional latent factor models are based on lowrank matrix factorization of covariance matrices. However, for higherorder data with multiple modes, i.e., tensors, this simple treatment fails to take into account the modespecific relations. This ignorance leads to inefficiency in analysis of complex structures as well as poor data compression ability. In this paper, unlike covariance matrices, we investigate highorder covariance tensor directly by exploiting tensor ring (TR) format and propose the Bayesian TR latent factor model, which can represent complex multilinear correlations and achieves efficient data compression. To overcome the difficulty of finding the optimal TRranks and simultaneously imposing sparsity on loading coefficients, a multiplicative Gamma process (MGP) prior is adopted to automatically infer the ranks and obtain sparsity. Then, we establish efficient parameterexpanded EM algorithm to learn the maximum a posteriori (MAP) estimate of model parameters. 
Zerui Tao · Xuyang ZHAO · Toshihisa Tanaka · Qibin Zhao 🔗 


Is Rank Minimization of the Essence to Learn Tensor Network Structure?
(
Poster
)
>
Structure learning for tensor network (TN) representation is to select the optimal network for TN contraction to fit a tensor. In the existing literature and view of many tensor researchers, this task is widely considered as the same to learning tensor network (model) ranks. In the manuscript, we briefly analyze the relation of these two critical tasks in a rigorous fashion, stating that rank minimization is actually a subtopic of the structure learning for TN, where the graph essence of TN structures is ignored in rank minimization. On one hand, we agree that the two tasks are identical to each other if the TN structure is not constrained on graph. We propose, on the other hand, an open problem called permutation learning in structure learning, i.e., learning the optimal matching between tensor modes and vertices in TN, to point out rank minimization would be failure in structure learning in this case, due to its limitation of exploring graph spaces. We last focus on permutation learning and give several preliminary results to help understand this open problem. 
Chao Li · Qibin Zhao 🔗 


Born Machines for Periodic and Open XY Quantum Spin Chains
(
Poster
)
>
Quantum phase transitions are ubiquitous in quantum many body systems. The quantum fluctuations that occur at very low temperatures are known to be responsible for driving the system across different phases as a function of an external control parameter. The XY Hamiltonian with a transverse field is a basic model that manifests two distinct quantum phase transitions, including spontaneous Z2 symmetry breaking from an ordered to a disordered state. While programmable quantum devices have shown great success in investigating the various exotic quantum phases of matter, in parallel, the quest for harnessing machine learning tools in learning quantum phases of matter is ongoing. In this paper, we present a numerical study of the power of a quantuminspired generative model known as the Born machine in learning quantum phases of matter. Data obtained from the system under open and periodic boundary conditions is considered. Our results indicate that a Born machine based on matrix product states can successfully capture the quantum state across various phases of the XY Hamiltonian and close to a critical point, despite the existence of longrange correlations. We further impose boundary conditions on the Born machine and show that matching the boundary condition of the Born machine and that of the training data improves performance when limited data is available and a small bond dimension is employed. 
Abigail McClain Gomez · Susanne Yelin · Khadijeh Najafi 🔗 


Probabilistic Graphical Models and Tensor Networks: A Hybrid Framework
(
Poster
)
>
We investigate a correspondence between two formalisms for discrete probabilistic modeling: probabilistic graphical models (PGMs) and tensor networks (TNs), a powerful modeling framework for simulating complex quantum systems. The graphical calculus of PGMs and TNs exhibits many similarities, with discrete undirected graphical models (UGMs) being a special case of TNs. However, more general probabilistic TN models such as Born machines (BMs) employ complexvalued hidden states to produce novel forms of correlation among the probabilities. While representing a new modeling resource for capturing structure in discrete probability distributions, this behavior also renders the direct application of standard PGM tools impossible. We aim to bridge this gap by introducing a hybrid PGMTN formalism that integrates quantumlike correlations into PGM models in a principled manner, using the physicallymotivated concept of decoherence. We first prove that applying decoherence to the entirety of a BM model converts it into a discrete UGM, and conversely, that any subgraph of a discrete UGM can be represented as a decohered BM. This method allows a broad family of probabilistic TN models to be encoded as partially decohered BMs, a fact we leverage to combine the representational strengths of both model families. We experimentally verify the performance of such hybrid models in a sequential modeling task, and identify promising uses of our method within the context of existing applications of graphical models. 
Jacob Miller · Geoffrey Roeder 🔗 


QTNVQC: An EndtoEnd Learning Framework for Quantum Neural Networks
(
Poster
)
>
The advent of noisy intermediatescale quantum (NISQ) computers raises a crucial challenge to design quantum neural networks for fully quantum learning tasks. To bridge the gap, this work proposes an endtoend learning framework named QTNVQC, by introducing a trainable quantum tensor network (QTN) for quantum embedding on a variational quantum circuit (VQC). The architecture of QTN is composed of a parametric tensortrain network for feature extraction and a tensor product encoding for quantum embedding. We highlight the QTN for quantum embedding in terms of two perspectives: (1) we theoretically characterize QTN by analyzing its representation power of input features; (2) QTN enables an endtoend parametric model pipeline, namely QTNVQC, from the generation of quantum embedding to the output measurement. Our experiments on the MNIST dataset demonstrate the advantages of QTN for quantum embedding over other quantum embedding approaches. 
Jun Qi · Huck Yang · PinYu Chen 🔗 


Multiway Spherical Clustering via DegreeCorrected Tensor Block Models
(
Poster
)
>
We consider the problem of multiway clustering in the presence of unknown degree heterogeneity. Such data problems arise commonly in applications such as recommendation system, neuroimaging, community detection, and hypergraph partitions in social networks. The allowance of degree heterogeneity provides great flexibility in clustering models, but the extra complexity poses significant challenges in both statistics and computation. Here, we develop a degreecorrected tensor block model with estimation accuracy guarantees. We present the phase transition of clustering performance based on the notion of angle separability, and we characterize three signaltonoise regimes corresponding to different statisticalcomputational behaviors. In particular, we demonstrate that an intrinsic statisticaltocomputational gap emerges only for tensors of order three or greater. Further, we develop an efficient polynomialtime algorithm that provably achieves exact clustering under mild signal conditions. The efficacy of our procedure is demonstrated through both simulations and analyses of Peru Legislation dataset. 
Jiaxin Hu · Miaoyan Wang 🔗 


GraphTensor Singular Value Decomposition for Data Recovery
(
Poster
)
>
In realworld scenarios, data are commonly generated with graph structures, e.g., sensory data in transportation networks, user profiles in social networks, and traffic trace data in Internet. Incomplete graph data limits further data analysis. Hence recovering missing graph data becomes crucial. In this work, we exploit graphtensor decomposition strategies for data recovery. Experimental results show that the recovery performance of graph data can be significantly improved by adopting graphtensor decomposition. 
Lei Deng · Haifeng Zheng · XiaoYang Liu 🔗 


DTAE: Deep Tensor Autoencoder for 3D Seismic Data Interpolation
(
Poster
)
>
The core challenge of seismic data interpolation is how to capture latent spatialtemporal relationships between unknown and known traces in 3D space. The prevailing tensorbased interpolation schemes seek a globally lowrank approximation to mine the highdimensional relationships hidden in 3D seismic data. However, when the lowrank assumption is violated for data involving complex geological structures, the existing interpolation schemes fail to precisely capture the trace relationships, which may influence the interpolation results. As an alternative, this article presents a basic deep tensor autoencoder (DTAE) and two variants to implicitly learn a datadriven, nonlinear, and highdimensional mapping to explore the complicated relationship among traces without the need for any underlying assumption. Then, tensor backpropagation (TBP), which can be essentially viewed as a tensor version of traditional backpropagation (BP), is introduced to solve for the new model parameters. For ease of implementation, a mathematical relationship between tensor and matrix autoencoders is constructed by taking advantage of the properties of a tensortensor product. Based on the derived relationship, the DTAE weight parameters are inferred by applying a matrix autoencoder to each frontal slice in the discrete cosine transform (DCT) domain, and this process is further summarized into a general theoretical and practical framework. Finally, the performance benefits of the proposed DTAEbased method are demonstrated in experiments with both synthetic and real field seismic data. 
Feng Qian 🔗 


High Performance Hierarchical Tucker Tensor Learning Using GPU Tensor Cores
(
Poster
)
>
We present efficient HT tensor learning primitives using GPU tensor cores and demonstrate three applications,namely thirdorder and highorder HT decompositions, a HT tensor layer for deep neural network and a HT tensortree structure quantum machine learning algorithm TTN. Experimental results demonstrated that our optimized algorithms have higher efficiency. 
hao huang · XiaoYang Liu · Weiqin Tong · Tao Zhang · Anwar Walid 🔗 


FullyConnected Tensor Network Decomposition
(
Poster
)
>
The popular tensor train (TT) and tensor ring (TR) decompositions have achieved promising results in science and engineering. However, TT and TR decompositions only establish an operation between adjacent two factors and are highly sensitive to the permutation of tensor modes, leading to an inadequate and inflexible representation. In this paper, we propose a generalized tensor decomposition, which decomposes an $N$thorder tensor into a set of $N$thorder factors and establishes an operation between any two factors. Since it can be graphically interpreted as a fullyconnected network, we named it fullyconnected tensor network (FCTN) decomposition. The superiorities of the FCTN decomposition lie in the outstanding capability for characterizing adequately the intrinsic correlations between any two modes of tensors and the essential invariance for transposition. Furthermore, we employ the FCTN decomposition to one representative task, i.e., tensor completion, and develop an efficient solving algorithm based on proximal alternating minimization. Theoretically, we prove the convergence of the developed algorithm, i.e., the sequence obtained by it globally converges to a critical point. Experimental results substantiate that the proposed method compares favorably to the stateoftheart methods based on other tensor decompositions.

YuBang Zheng · TingZhu Huang · XiLe Zhao · Qibin Zhao · TaiXiang Jiang 🔗 


Codee: A Tensor Embedding Scheme for Binary Code Search
(
Poster
)
>
Given a target binary function, the binary code search retrieves topK similar functions in the repository, and similar functions represent that they are compiled from the same source codes. Searching binary code is particularly challenging due to large variations of compiler toolchains and options and CPU architectures, as well as thousands of binary codes. Furthermore, there are some pivotal issues in current binary code search schemes, including inaccurate textbased or tokenbased analysis, slow graph matching, or complex deep learning processes. In this paper, we present an unsupervised tensor embedding scheme, Codee, to carry out code search efficiently and accurately at the binary function level. First, we use an NLPbased neural network to generate the semanticaware token embedding. Second, we propose an efficient basic block embedding generation algorithm based on the network representation learning model. We learn both the semantic information of instructions and the control flow structural information to generate the basic block embedding. Then we use all basic block embeddings in a function to obtain a variablelength function feature vector. Third, we build a tensor to generate function embeddings based on the tensor singular value decomposition, which compresses the variablelength vectors into short fixedlength vectors to facilitate efficient search afterward. We further propose a dynamic tensor compression algorithm to incrementally update the function embedding database. Finally, we use the local sensitive hash method to find the topK similar matching functions in the repository. Compared with stateoftheart crossplatform and crossoptimizationlevel code search schemes,our scheme achieves higher average search accuracy, shorter feature vectors, and faster feature generation performance using four datasets, OpenSSL, Coreutils, libgmp and libcurl. 
Jia Yang · Cai Fu · XiaoYang Liu 🔗 


Deep variational reinforcement learning by optimizing Hamiltonian equation
(
Poster
)
>
Deep variational reinforcement learning by optimizing Hamiltonian equation is a novel training method in reinforcement learning. Liu \cite{liu2020vrl} proposed to maximize the Hamiltonian equation to obtain the policy network. In this poster, we apply the massively parallel simulation to sample trajectories (collecting information of the reward tensor) and train the deep policy network by maximizing a partial Hamiltonian equation. On the FrozenLake $8\times8$ and GridWorld $10\times10$ examples, we verify the theory in \cite{liu2020vrl} by showing that deep Hamiltonian network (DHN) for variational reinforcement learning is more stable and efficient than DQN \cite{mnih2013playing}. Our codes are available at:\href{https://github.com/AI4FinanceFoundation/QuantumTensorNetworksforVariationalReinforcementLearningNeurIPS2020}.

Zeliang Zhang · XiaoYang Liu 🔗 


Quantum Machine Learning for Earth Observation Images
(
Poster
)
>
In this work, we present our first study on Quantum Machine Learning (QML) applied to the Earth Observation (EO) domain, which has consistently leveraged stateoftheart advances in Machine Learning for imagery, including recent researches on QML based techniques. Based on Ref. [1], we suggest an alternative approach for Quantum Convolutional Neural Networks (QCNN) and test its performance as image classifiers with different singlequbit and twoqubit gates for the EuroSAT dataset. As the original image of size 64x64 is too large for the current quantum hardware, we reduce its dimension through classical feature extractions methods, PCA, and convolutional autoencoder. We start with a binary classification of real EuroSAT features and fake data which are sampled from a uniform distribution. Then, we perform a set of binary classification between different couples of EuroSAT classes to investigate its capability for more detailed classification. Finally, we use the QCNN architecture as a 4class classifier for MNIST and EuroSAT datasets and compare their results. Although improvements are still required in some areas, especially for the EuroSAT dataset, our preliminary work shows the potentialities of using QCNN as a quantum classifier in the context of computer vision and in particular Earth observation. Our ultimate goal is to extend this QCNN to generative models, including Generative Adversarial Networks (GAN), by replacing the generative part, and possibly the discriminative part, with quantum circuits. [1] T. Hur, L. Kim, and D. K. Park. Quantum convolutional neural network for classical data classification, 2021. 
Su Yeon Chang · Bertrand Le Saux · SOFIA VALLECORSA 🔗 


Spectral Tensor Layer for ModelParallel Deep Neural Networks
(
Poster
)
>
We propose a novel spectral tensor layer for modelparallel deep neural networks, which can be trainedin the spectrum domain. Spectral tensor neural network is trained by training a number of small,weak subnetworks which can be trained in asynchronous parallel. The output of the spectral tensorlayer neural network is the ensemble of the subnetworks’ outputs. Compared with conventional neuralnetworks, spectral tensor neural network has intrinsical data parallelism as well as model parallelism,which is very suitable for distributed training, and hascompetitive ensemble results. 
Zhiyuan Wang · XiaoYang Liu 🔗 