Workshop
Second Workshop on Quantum Tensor Networks in Machine Learning
Xiao-Yang Liu · Qibin Zhao · Ivan Oseledets · Yufei Ding · Guillaume Rabusseau · Khadijeh Najafi · Andrzej Cichocki · Masashi Sugiyama · Anwar Walid

Tue Dec 14 06:00 AM -- 02:35 PM (PST)
Event URL: https://tensorworkshop.github.io/NeurIPS2021/index.html

Quantum tensor networks in machine learning (QTNML) hold great potential to advance AI technologies. Quantum machine learning [1][2] promises quantum advantages over classical machine learning (potentially exponential speedups in training [3] and quadratic improvements in learning efficiency [4]), while tensor networks provide powerful simulations of quantum machine learning algorithms on classical computers. As a rapidly growing interdisciplinary area, QTNML may serve as an amplifier for computational intelligence, a transformer for machine learning innovations, and a propeller for AI industrialization.

Tensor networks [5], contracted networks of factor core tensors, have arisen independently in several areas of science and engineering. Such networks appear in the description of physical processes, and an accompanying collection of numerical techniques has elevated quantum tensor networks into a variational model class for machine learning. These techniques have recently proven ripe for application to many traditional problems in deep learning [6,7,8]. Further QTNML technologies are rapidly emerging, such as approximating probability functions and probabilistic graphical models [9,10,11,12]. Quantum algorithms are typically described by quantum circuits (quantum computational networks), which are themselves a class of tensor networks, creating an evident interplay between classical tensor network contraction algorithms and the execution of tensor contractions on quantum processors. The modern field of quantum-enhanced machine learning has started to utilize several tools from tensor network theory to create new quantum models of machine learning and to better understand existing ones. However, the topic of QTNML is relatively young, and many open problems are still to be explored.
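To make the circuit/tensor-network interplay concrete, here is a minimal NumPy sketch (an illustration added for readers, not part of any workshop submission) that contracts a two-qubit circuit, a Hadamard followed by a CNOT, as a tensor network and recovers the Bell-state amplitudes.

```python
import numpy as np

# Gates as tensors.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard, shape (2, 2)
CNOT = np.zeros((2, 2, 2, 2))                  # (out_ctrl, out_tgt, in_ctrl, in_tgt)
for c in (0, 1):
    for t in (0, 1):
        CNOT[c, (t + c) % 2, c, t] = 1.0

# Input product state |00>, one order-1 tensor per qubit.
q0 = np.array([1.0, 0.0])
q1 = np.array([1.0, 0.0])

# Contract the circuit (H on qubit 0, then CNOT) as a single tensor network.
psi = np.einsum('abcd,ce,e,d->ab', CNOT, H, q0, q1)

print(psi.reshape(-1))  # amplitudes of the Bell state (|00> + |11>)/sqrt(2)
```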

Tue 6:00 a.m. - 6:05 a.m.

Greeting.

Xiao-Yang Liu
Tue 6:05 a.m. - 6:35 a.m.
Anima Anandkumar (Talk)
Anima Anandkumar
Tue 6:35 a.m. - 6:45 a.m.
Anima Anandkumar (Q&A)
Tue 6:45 a.m. - 7:15 a.m.

In this talk, we study high-performance computation for tensor networks to address time and space complexities that grow rapidly with tensor size. We propose efficient primitives that exploit parallelism in tensor learning for efficient implementation on GPUs.

Anwar Walid
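As a rough illustration of the kind of GPU-parallel primitive the talk refers to (a generic PyTorch sketch with assumed shapes, not the speaker's implementation), many small tensor contractions can be fused into one batched einsum so the GPU parallelizes over the batch:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# A batch of 4096 independent small contractions: for each sample,
# contract an (r, n) core with an (n,) feature vector and an (r, r) bond matrix.
batch, r, n = 4096, 16, 32
cores = torch.randn(batch, r, n, device=device)
feats = torch.randn(batch, n, device=device)
bonds = torch.randn(batch, r, r, device=device)

# One fused, batched einsum instead of a Python loop of 4096 small contractions.
out = torch.einsum('brn,bn,brs->bs', cores, feats, bonds)
print(out.shape)  # torch.Size([4096, 16])
```

The single fused call replaces a Python-level loop over many tiny contractions, which is typically where most of the GPU speedup comes from.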
Tue 7:15 a.m. - 7:25 a.m.
Anwar Walid (Q&A)
Anwar Walid
Tue 7:25 a.m. - 7:55 a.m.

The current availability of powerful computers and huge data sets is creating new opportunities in computational mathematics to bring together concepts and tools from tensor algebra, graph theory, machine learning and signal processing. In discrete mathematics, a tensor is merely a collection of points (nodes in a graph) arranged as a multiway array. The power of such tensors lies in the fact that they can represent entities as diverse as the users of social networks or financial market data, and that these can be transformed into low-dimensional signals which can be analyzed using data analytics tools. In this talk, we aim to provide a comprehensive introduction to advanced data analytics on graphs using tensors. We will then establish a relationship between tensors and graphs, in order to move beyond the standard regular sampling in time and space. This facilitates modelling in many important areas, including communication networks, computer science, linguistics, social sciences, biology, physics, chemistry, transport, town planning, financial systems, personal health and many others. The tensor and graph topologies will be revisited from a modern data analytics point of view, and we will then proceed to establish a taxonomy of graph tensor networks. With this as a basis, we show that such a framework allows even the most challenging machine learning tasks, such as clustering, to be performed in an intuitive and physically meaningful way. Unique aspects of the multi-graph tensor network (MGTN) framework will be outlined, such as its benefits for processing data acquired on irregular domains, its ability to finely tune statistical learning procedures through local information processing, the concepts of random signals on graphs and tensors, learning of graph topology from data observed on graphs, and confluence with deep neural networks and Big Data. Extensive examples are included to render the concepts more concrete and to facilitate a greater understanding of the underlying principles.

Danilo Mandic
Tue 7:55 a.m. - 8:05 a.m.
Danilo P. Mandic (Q&A)
Danilo Mandic
Tue 8:05 a.m. - 8:35 a.m.
Nadav Cohen (Talk)
Nadav Cohen
Tue 8:35 a.m. - 8:45 a.m.
Nadav Cohen (Q&A)
Nadav Cohen
Tue 8:45 a.m. - 9:15 a.m.
Stefanos Kourtis (Talk)   
Stefanos Kourtis
Tue 9:15 a.m. - 9:25 a.m.
Stefanos Kourtis (Q&A)
Tue 9:30 a.m. - 11:30 a.m.
Coffee Break + Poster Session (GatherTown) (poster session)
Tue 11:30 a.m. - 11:35 a.m.

A challenge in multi-agent reinforcement learning is to be able to generalize over intractable state-action spaces. This work achieves generalisation in state-action space over unexplored state-action pairs by modelling the transition and reward functions as tensors of low CP-rank. Initial experiments show that using tensor decompositions in a model-based reinforcement learning algorithm can lead to much faster convergence if the true transition and reward functions are indeed of low rank.

Pascal van der Vaart · Anuj Mahajan · Shimon Whiteson
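To illustrate the underlying primitive, the following NumPy sketch fits a low CP-rank approximation to a toy transition tensor with alternating least squares; it is a generic illustration (sizes, rank, and iteration count are arbitrary assumptions), not the authors' model-based RL algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, R = 8, 4, 3                          # toy state/action sizes and assumed CP rank
T = rng.random((S, A, S))                  # observed transition tensor T[s, a, s']

# CP model: T[i, j, k] is approximated by sum_r U[i, r] * V[j, r] * W[k, r]
U = rng.standard_normal((S, R))
V = rng.standard_normal((A, R))
W = rng.standard_normal((S, R))

def als_step(X, B, C, eq):
    """Least-squares update of one CP factor with the other two (B, C) held fixed."""
    G = (B.T @ B) * (C.T @ C)              # Gram matrix of the Khatri-Rao product of B and C
    M = np.einsum(eq, X, B, C)             # matricized tensor times Khatri-Rao product
    return M @ np.linalg.pinv(G)

for _ in range(100):                       # alternating least squares
    U = als_step(T, V, W, 'ijk,jr,kr->ir')
    V = als_step(T, U, W, 'ijk,ir,kr->jr')
    W = als_step(T, U, V, 'ijk,ir,jr->kr')

T_hat = np.einsum('ir,jr,kr->ijk', U, V, W)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))   # relative fit error
```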
Tue 11:35 a.m. - 11:40 a.m.
  

Tensor networks have demonstrated significant value for machine learning in a myriad of different applications. However, optimizing tensor networks using standard gradient descent has proven to be difficult in practice. Tensor networks suffer from initialization problems resulting in exploding or vanishing gradients and require extensive hyperparameter tuning. Efforts to overcome these problems usually depend on specific network architectures, or ad hoc prescriptions. In this paper we address the problems of initialization and hyperparameter tuning, making it possible to train tensor networks using established machine learning techniques. We introduce a 'copy node' method that successfully initializes arbitrary tensor networks, in addition to a gradient-based regularization technique for bond dimensions. We present numerical results that show that the combination of techniques presented here produces quantum-inspired tensor network models with far fewer parameters, while improving generalization performance.

James Dborin
Tue 11:40 a.m. - 11:45 a.m.

Temporal hidden Markov models (HMMs) and tensor trains are two powerful mathematical tools for learning and representing high-dimensional large-scale data. However, both HMMs and tensor trains have temporal topology but are not ergodic. Ergodicity occurs in a broad range of systems in physics and in geometry, and many instances of real-world data originate from an ergodic system that exhibits temporal behavior. To overcome this fundamental limitation of temporal HMMs and tensor trains, i.e., ergodicity deficiency, we propose a new tensor topology inspired by tensor rings and circular HMMs. Our new tensor ring models, namely the Circular Matrix Product State (CMPS), Circular Born Machine (CBM), and Circular Locally Purified State (CLPS), are both temporal and ergodic. We then show the relationship between our tensor ring models and circular hidden Markov models/hidden quantum Markov models. Our findings through numerical experiments indicate that our tensor ring models have significant advantages over tensor trains in varied situations, including cases where ergodicity is not required.

Mohammad Ali Javidian · Vaneet Aggarwal · Zubin Jacob
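The topological difference between the two boundary conditions can be seen in a few lines of NumPy (a generic sketch with made-up core shapes, not the authors' models): an open chain is capped by boundary vectors, while a ring is closed by a trace.

```python
import numpy as np

rng = np.random.default_rng(1)
L, d, r = 6, 2, 4                       # chain length, local dimension, bond dimension
cores = [rng.random((r, d, r)) for _ in range(L)]
x = rng.integers(0, d, size=L)          # one discrete sequence, e.g. spins or symbols

# Tensor train / open boundary: cap the chain with boundary vectors.
left, right = np.ones(r), np.ones(r)
vec = left
for core, xi in zip(cores, x):
    vec = vec @ core[:, xi, :]          # multiply the slice selected by each symbol
tt_amplitude = vec @ right

# Tensor ring / periodic boundary: multiply the same slices and close with a trace.
mat = np.eye(r)
for core, xi in zip(cores, x):
    mat = mat @ core[:, xi, :]
tr_amplitude = np.trace(mat)

print(tt_amplitude, tr_amplitude)
```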
Tue 11:45 a.m. - 11:50 a.m.
  

Although first developed for the needs of quantum many-body physics and quantum computing, tensor networks (TNs) are increasingly being deployed to solve a wide range of problems in machine learning, optimization, and applied mathematics. Inspired by the distinct implementation challenges of TN methods in these new settings, we present ContracTN, a lightweight Python library for general-purpose TN calculations. Beyond the use of the dense tensor cores supported in standard TN libraries, ContracTN also supports the use of copy tensors, parameter-free objects which allow diverse concepts like batch computation, elementwise multiplication, and summation to be expressed entirely in the language of TN diagrams. The contraction engine of ContracTN also implements a novel form of stabilization, which largely mitigates the issue of numerical overflow arising from the use of low-precision machine learning libraries for TN contraction. Overall, we wish to popularize a collection of methods which have proven invaluable in implementing efficient and robust TN models, in the hope that this can help catalyze the wider adoption of TN methods for problems in machine learning.

Jacob Miller · Guillaume Rabusseau
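The role of copy tensors is easy to demonstrate without the library itself (the sketch below uses plain NumPy, not ContracTN's API): a three-legged copy tensor contracted with two vectors reproduces their elementwise product, which is how a TN diagram expresses feeding the same index into several tensors.

```python
import numpy as np

d = 5
delta = np.zeros((d, d, d))
for i in range(d):                      # copy (hyperedge) tensor: 1 iff all indices agree
    delta[i, i, i] = 1.0

a = np.arange(1.0, d + 1)
b = np.linspace(0.5, 2.5, d)

# Contracting two legs of the copy tensor with a and b leaves their elementwise product.
prod_via_tn = np.einsum('ijk,j,k->i', delta, a, b)
assert np.allclose(prod_via_tn, a * b)
print(prod_via_tn)
```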
Tue 11:50 a.m. - 11:55 a.m.
  

Quantum Machine Learning (QML) is an emerging research area advocating the use of quantum computing for advances in machine learning. Since the discovery of the capability of Parametrized Variational Quantum Circuits (VQC) to replace Artificial Neural Networks, they have been widely adopted for different tasks in Quantum Machine Learning. However, despite their potential to outperform neural networks, VQCs are limited to small-scale applications given the challenges in scalability of quantum circuits. To address this shortcoming, we propose an algorithm that compresses the quantum state within the circuit using a tensor ring representation. With the input qubit state in the tensor ring representation, single-qubit gates maintain that representation. The same is not true for two-qubit gates in general, where an approximation is used to keep the output in tensor ring form. With this approximation, storage and computational time increase linearly in the number of qubits and the number of layers, compared to the exponential increase of exact simulation algorithms. This approximation is used to implement the tensor ring VQC. The parameters of the tensor ring VQC are trained using a gradient-descent-based algorithm with efficient approaches for backpropagation. The proposed approach is evaluated on two classification datasets, Iris and MNIST, to show the improved accuracy obtained with a larger number of qubits. We achieve a test accuracy of 83.33% on the Iris dataset, and 79.57% and 74.08% on binary and ternary classification of the MNIST dataset, using a 4-qubit circuit. A test accuracy of 90.01% for binary classification and 76.31% for ternary classification is obtained on MNIST data using an 8-qubit circuit. These results outperform those of VQCs implemented in Qiskit and, being scalable, demonstrate the potential for VQCs to be used in large-scale Quantum Machine Learning applications.

Dheeraj Peddireddy · Vipul Bansal · Zubin Jacob · Vaneet Aggarwal
Tue 11:55 a.m. - 12:00 p.m.
  
We consider the problem of structured tensor denoising in the presence of unknown permutations. Such data problems arise commonly in recommendation systems, neuroimaging, community detection, and multiway comparison applications. Here, we develop a general family of smooth tensors up to arbitrary index permutations; the model incorporates the popular block models and graphon models. We show that a constrained least-squares estimate in the block-wise polynomial family achieves the minimax error bound. A phase transition phenomenon is revealed with respect to the smoothness threshold needed for optimal recovery. In particular, we find that a polynomial of degree $m(m-1)/2$ is sufficient for accurate recovery of order-$m$ tensors, whereas higher smoothness exhibits no further benefits. Furthermore, we provide an efficient polynomial-time Borda count algorithm that provably achieves the optimal rate under monotonicity assumptions. The efficacy of our procedure is demonstrated through both simulations and Chicago crime data analysis.
Chanwoo Lee · Miaoyan Wang
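The Borda count step amounts to ranking indices by marginal sums of the observed tensor; the toy NumPy sketch below shows that idea only (sizes, noise level, and the smooth signal are invented for illustration, and this is not the authors' full estimator).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
grid = np.linspace(0, 1, n)

# A smooth order-3 signal tensor, observed under an unknown permutation of mode 1 plus noise.
signal = np.exp(-((grid[:, None, None] - grid[None, :, None]) ** 2
                  + (grid[None, None, :] - 0.5) ** 2))
perm = rng.permutation(n)
Y = signal[perm] + 0.01 * rng.standard_normal((n, n, n))

# Borda count: score each mode-1 index by its marginal sum, then sort.
est_order = np.argsort(Y.sum(axis=(1, 2)))
true_order = np.argsort(signal[perm].sum(axis=(1, 2)))
print(np.mean(est_order == true_order))   # fraction of positions where the two orderings agree
```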
Tue 12:00 p.m. - 12:05 p.m.

Tensor network (TN) methods have proven their considerable potential in deterministic regression and classification paradigms, but remain underexplored in probabilistic settings. To this end, we introduce a variational inference TN framework for supervised learning, referred to as the Bayesian Tensor Network (BTN). This is achieved by making use of the multi-linear nature of tensor networks to construct a structured variational model which scales linearly with data dimensionality. The low-rank structure imposed on the tensor mean and the Kronecker separability of the local covariances make it possible to efficiently induce weight dependencies in the posterior distribution, thus enhancing model expressiveness at a drastically lower parameter complexity compared to the standard mean-field approach. A comprehensive validation of the proposed approach indicates the competitiveness of BTNs against modern structured Bayesian neural network approaches, while exhibiting enhanced interpretability and efficiency.

Kriton Konstantinidis · Yao Lei Xu · Qibin Zhao · Danilo Mandic
Tue 12:05 p.m. - 12:10 p.m.
  

The attention mechanism is at the core of state-of-the-art Natural Language Processing (NLP) models, owing to its ability to focus on the most contextually relevant part of a sequence. However, current attention models rely on "flat-view" matrix methods to process sequences of tokens embedded in vector spaces, resulting in exceedingly high parameter complexity for practical applications. To this end, we introduce a novel Tensorized Spectral Attention (TSA) mechanism, which leverages the Graph Tensor Network (GTN) framework to efficiently process tensorized token embeddings via attention-based spectral graph filters. By virtue of multi-linear algebra, such tensorized token embeddings are shown to effectively bypass the Curse of Dimensionality, reducing the parameter complexity of the attention mechanism from exponential to linear in the weight matrix dimensions. Furthermore, the graph formulation of the attention domain enables the processing of tensorized embeddings through spectral graph convolution filters, which further increases its expressive power. The benefits of TSA are demonstrated through five benchmark NLP experiments, where the proposed mechanism achieves better or comparable results against traditional attention models, while incurring drastically lower parameter complexity.

Yao Lei Xu · Kriton Konstantinidis · Shengxi Li · Danilo Mandic
Tue 12:10 p.m. - 12:15 p.m.
  

Random projections (RP) have recently emerged as popular techniques in the machine learning community for their ability to reduce the dimension of very high-dimensional tensors. Following the work of Rakhshan and Rabusseau [2020], we consider a tensorized random projection relying on the Tensor Train (TT) decomposition where each element of the core tensors is drawn from a Rademacher distribution. Our theoretical results reveal that the Gaussian low-rank tensor represented in compressed TT form in Rakhshan and Rabusseau [2020] can be replaced by a TT tensor with core elements drawn from a Rademacher distribution with the same embedding size. Experiments on synthetic data demonstrate that the tensorized Rademacher RP can outperform the tensorized Gaussian RP studied in Rakhshan and Rabusseau [2020]. In addition, we show, both theoretically and experimentally, that the tensorized RP in the Matrix Product Operator (MPO) format proposed by Batselier et al. [2018] for performing SVD on large matrices is not a Johnson-Lindenstrauss transform (JLT) and therefore not a well-suited random projection map.

Beheshteh Rakhshan · Guillaume Rabusseau
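A simplified NumPy sketch of the construction (core shapes, ranks, and the 1/sqrt(k) scaling are illustrative assumptions; the paper's exact normalization and analysis are not reproduced): each embedding coordinate is the inner product of the input tensor with an independent TT tensor whose cores have i.i.d. plus/minus one entries.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, r, k = 4, 5, 3, 10                  # input order, mode size, TT rank, embedding dim
X = rng.standard_normal((d,) * N)         # the high-dimensional input tensor

def rademacher_tt_cores():
    """TT cores with i.i.d. +/-1 entries (boundary bond dimensions equal to 1)."""
    ranks = [1] + [r] * (N - 1) + [1]
    return [rng.choice([-1.0, 1.0], size=(ranks[i], d, ranks[i + 1])) for i in range(N)]

def tt_inner(X, cores):
    """Contract X against a TT tensor, core by core."""
    M = X[None, ...]                      # add a dummy bond index of size 1
    for G in cores:
        # Sum over the current bond index and the next mode of X.
        M = np.tensordot(G, M, axes=([0, 1], [0, 1]))   # shape: (r_out, remaining modes)
    return float(M.squeeze())

# Each embedding coordinate is the inner product with an independent Rademacher TT tensor.
embedding = np.array([tt_inner(X, rademacher_tt_cores()) for _ in range(k)]) / np.sqrt(k)
print(embedding.shape)                    # (10,)
```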
Tue 12:15 p.m. - 12:20 p.m.

We present an extended abstract for the previously published work Tesseract (Mahajan et al., 2021), which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions. The goal of this abstract is twofold: (1) to garner greater interest amongst the tensor research community for creating methods and analysis for approximate RL; (2) to elucidate the generalised setting of factored action spaces where tensor decompositions can be used. We use the cooperative multi-agent reinforcement learning scenario as an exemplary setting, where the action space is naturally factored across agents and learning becomes intractable without resorting to approximation on the underlying hypothesis space for candidate solutions.

Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviichuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar
Tue 12:20 p.m. - 12:25 p.m.
  

The problem of characterizing quantum channels arises in a number of contexts such as quantum process tomography and quantum error correction. However, direct approaches to parameterizing and optimizing the Choi matrix representation of quantum channels face a curse of dimensionality: the number of parameters scales exponentially in the number of qubits. Recently, Torlai et al. [2020] proposed using locally purified density operators (LPDOs), a tensor network representation of Choi matrices, to overcome the unfavourable scaling in parameters. While the LPDO structure allows it to satisfy a 'complete positivity' (CP) constraint required of physically valid quantum channels, it makes no guarantees about a similarly required 'trace preservation' (TP) constraint. In practice, the TP constraint is violated, and the learned quantum channel may even be trace-increasing, which is non-physical. In this work, we present the problem of optimizing over TP LPDOs, discuss two approaches to characterizing the TP constraints on LPDOs, and outline the next steps for developing an optimization scheme.

Siddarth Srinivasan · · Jacob Miller · Guillaume Rabusseau · Byron Boots
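For readers unfamiliar with the TP condition: it requires the partial trace of the Choi matrix over the output system to equal the identity on the input system. Below is a small self-contained NumPy check of that condition (an illustrative aside on the constraint itself, not the LPDO parameterization).

```python
import numpy as np

def choi_from_kraus(kraus_ops, d):
    """Choi matrix J = sum_{i,j} |i><j| (x) E(|i><j|) for a channel with given Kraus operators."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            Eij = np.zeros((d, d), dtype=complex)
            Eij[i, j] = 1.0
            out = sum(K @ Eij @ K.conj().T for K in kraus_ops)
            J += np.kron(Eij, out)
    return J

def is_trace_preserving(J, d, tol=1e-9):
    """TP holds iff tracing out the output system of the Choi matrix gives the identity."""
    J4 = J.reshape(d, d, d, d)                      # indices: (in, out, in', out')
    reduced = np.einsum('iaja->ij', J4)             # partial trace over the output system
    return np.allclose(reduced, np.eye(d), atol=tol)

# Example: the single-qubit amplitude-damping channel (gamma = 0.3) is trace preserving.
g = 0.3
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
print(is_trace_preserving(choi_from_kraus([K0, K1], d=2), d=2))   # True
```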
Tue 12:25 p.m. - 12:30 p.m.
  

Tensor networks have recently found applications in machine learning for both supervised and unsupervised learning. The most common approaches for training these models are gradient descent methods. In this work, we consider an alternative training scheme utilizing basic tensor network operations: summation and compression. The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation. The algorithm can easily be parallelized and iterates through the dataset only once; hence, it serves as a pre-training algorithm. We demonstrate the algorithm on the MNIST dataset and show reasonable results for generating new images and for classification tasks. Furthermore, we provide an interpretation of the algorithm as a compressed quantum kernel density estimation for the probability amplitude of input data.

Sheng-Hsuan Lin
Tue 12:30 p.m. - 2:30 p.m.
Discussion Panel
Tue 2:30 p.m. - 2:35 p.m.
Closing Remarks (Closing)
-
[ Visit Poster at Spot B2 in Virtual World ]

How to represent a low-rank structure embedded in the underlying data is the key issue in tensor completion. In this work, we suggest a novel low-rank tensor representation based on coupled transform, which fully exploits the spatial multi-scale nature and redundancy in spatial and spectral/temporal dimensions, leading to a better low tensor multi-rank approximation. More precisely, this representation is achieved by using two-dimensional framelet transform for the two spatial dimensions, one/two-dimensional Fourier transform for the temporal/spectral dimension, and then Karhunen–Loeve transform (via singular value decomposition) for the transformed tensor. Based on this low-rank tensor representation, we formulate a novel low-rank tensor completion model for recovering missing information in multi-dimensional visual data. To tackle the proposed model, we develop the alternating directional method of multipliers (ADMM) algorithm tailored for the structured optimization problem.

Jian-Li Wang · Ting-Zhu Huang · Xi-Le Zhao · Tai-Xiang Jiang · Michael Ng
-
[ Visit Poster at Spot D1 in Virtual World ]

Improving the interpretability and efficiency of machine learning methods are two challenging and important issues. The concept of entanglement, a quantity from quantum information science, may contribute to both. In this extended abstract, we introduce two tensor-network (TN) machine learning methods, concerning feature extraction and compressed sensing respectively, that are based on entanglement. For the former [Front. Appl. Math. Stat. 7, 716044 (2021)], the entanglement obtained from a matrix product state (MPS) is used as a measure of the importance of features in real-life datasets. An entanglement-based feature extraction algorithm is proposed and used to improve the efficiency of TN machine learning. In the latter [Phys. Rev. Research 2, 033293 (2020)], TN states are used to realize efficient compressed sampling by entanglement-guided projective measurements. This scheme could be applied in the future to compress and communicate real-life data using entangled qubits.

Wen-Jun Li · Zheng-Zhi Sun · Shi-Ju Ran · Gang Su
-
[ Visit Poster at Spot D2 in Virtual World ]

Latent factor models are canonical tools to learn low-dimensional and linear embeddings of original data. Traditional latent factor models are based on low-rank matrix factorization of covariance matrices. However, for higher-order data with multiple modes, i.e., tensors, this simple treatment fails to take into account the mode-specific relations. This ignorance leads to inefficiency in the analysis of complex structures as well as poor data compression ability. In this paper, unlike covariance matrices, we investigate the high-order covariance tensor directly by exploiting the tensor ring (TR) format and propose the Bayesian TR latent factor model, which can represent complex multi-linear correlations and achieve efficient data compression. To overcome the difficulty of finding the optimal TR-ranks while simultaneously imposing sparsity on the loading coefficients, a multiplicative Gamma process (MGP) prior is adopted to automatically infer the ranks and obtain sparsity. We then establish an efficient parameter-expanded EM algorithm to learn the maximum a posteriori (MAP) estimate of the model parameters.

Zerui Tao · Xuyang ZHAO · Toshihisa Tanaka · Qibin Zhao
-
[ Visit Poster at Spot B1 in Virtual World ]

Structure learning for tensor network (TN) representations is the task of selecting the optimal network topology with which a TN contraction fits a given tensor. In the existing literature, and in the view of many tensor researchers, this task is widely considered to be the same as learning tensor network (model) ranks. In this manuscript, we briefly analyze the relation between these two critical tasks in a rigorous fashion, showing that rank minimization is actually a subtopic of structure learning for TNs, in which the graph essence of TN structures is ignored. On the one hand, we agree that the two tasks are identical if the TN structure is not constrained to a graph. On the other hand, we propose an open problem in structure learning called permutation learning, i.e., learning the optimal matching between tensor modes and vertices of the TN, and point out that rank minimization fails for structure learning in this case, due to its limited ability to explore graph spaces. We finally focus on permutation learning and give several preliminary results to help understand this open problem.

Chao Li · Qibin Zhao
-
[ Visit Poster at Spot D3 in Virtual World ]

Quantum phase transitions are ubiquitous in quantum many-body systems. The quantum fluctuations that occur at very low temperatures are known to be responsible for driving a system across different phases as a function of an external control parameter. The XY Hamiltonian with a transverse field is a basic model that manifests two distinct quantum phase transitions, including spontaneous Z2 symmetry breaking from an ordered to a disordered state. While programmable quantum devices have shown great success in investigating the various exotic quantum phases of matter, the parallel quest to harness machine learning tools for learning quantum phases of matter is ongoing. In this paper, we present a numerical study of the power of a quantum-inspired generative model known as the Born machine in learning quantum phases of matter. Data obtained from the system under open and periodic boundary conditions are considered. Our results indicate that a Born machine based on matrix product states can successfully capture the quantum state across various phases of the XY Hamiltonian and close to a critical point, despite the existence of long-range correlations. We further impose boundary conditions on the Born machine and show that matching the boundary condition of the Born machine to that of the training data improves performance when limited data are available and a small bond dimension is employed.

Abigail McClain Gomez · Susanne Yelin · Khadijeh Najafi
-
[ Visit Poster at Spot B0 in Virtual World ]

We investigate a correspondence between two formalisms for discrete probabilistic modeling: probabilistic graphical models (PGMs) and tensor networks (TNs), a powerful modeling framework for simulating complex quantum systems. The graphical calculus of PGMs and TNs exhibits many similarities, with discrete undirected graphical models (UGMs) being a special case of TNs. However, more general probabilistic TN models such as Born machines (BMs) employ complex-valued hidden states to produce novel forms of correlation among the probabilities. While representing a new modeling resource for capturing structure in discrete probability distributions, this behavior also renders the direct application of standard PGM tools impossible. We aim to bridge this gap by introducing a hybrid PGM-TN formalism that integrates quantum-like correlations into PGM models in a principled manner, using the physically-motivated concept of decoherence. We first prove that applying decoherence to the entirety of a BM model converts it into a discrete UGM, and conversely, that any subgraph of a discrete UGM can be represented as a decohered BM. This method allows a broad family of probabilistic TN models to be encoded as partially decohered BMs, a fact we leverage to combine the representational strengths of both model families. We experimentally verify the performance of such hybrid models in a sequential modeling task, and identify promising uses of our method within the context of existing applications of graphical models.

Jacob Miller · Geoffrey Roeder
-
[ Visit Poster at Spot D4 in Virtual World ]

The advent of noisy intermediate-scale quantum (NISQ) computers raises a crucial challenge to design quantum neural networks for fully quantum learning tasks. To bridge the gap, this work proposes an end-to-end learning framework named QTN-VQC, by introducing a trainable quantum tensor network (QTN) for quantum embedding on a variational quantum circuit (VQC). The architecture of QTN is composed of a parametric tensor-train network for feature extraction and a tensor product encoding for quantum embedding. We highlight the QTN for quantum embedding in terms of two perspectives: (1) we theoretically characterize QTN by analyzing its representation power of input features; (2) QTN enables an end-to-end parametric model pipeline, namely QTN-VQC, from the generation of quantum embedding to the output measurement. Our experiments on the MNIST dataset demonstrate the advantages of QTN for quantum embedding over other quantum embedding approaches.

Jun Qi · Huck Yang · Pin-Yu Chen
-
[ Visit Poster at Spot A6 in Virtual World ]

We consider the problem of multiway clustering in the presence of unknown degree heterogeneity. Such data problems arise commonly in applications such as recommendation systems, neuroimaging, community detection, and hypergraph partitioning in social networks. The allowance of degree heterogeneity provides great flexibility in clustering models, but the extra complexity poses significant challenges in both statistics and computation. Here, we develop a degree-corrected tensor block model with estimation accuracy guarantees. We present the phase transition of clustering performance based on the notion of angle separability, and we characterize three signal-to-noise regimes corresponding to different statistical-computational behaviors. In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater. Further, we develop an efficient polynomial-time algorithm that provably achieves exact clustering under mild signal conditions. The efficacy of our procedure is demonstrated through both simulations and analyses of the Peru Legislation dataset.

Jiaxin Hu · Miaoyan Wang
-
[ Visit Poster at Spot A5 in Virtual World ]

In real-world scenarios, data are commonly generated with graph structures, e.g., sensory data in transportation networks, user profiles in social networks, and traffic trace data on the Internet. Incomplete graph data limit further data analysis, so recovering missing graph data becomes crucial. In this work, we exploit graph-tensor decomposition strategies for data recovery. Experimental results show that the recovery performance of graph data can be significantly improved by adopting graph-tensor decomposition.

Lei Deng · Haifeng Zheng · Xiao-Yang Liu
-
[ Visit Poster at Spot A4 in Virtual World ]

The core challenge of seismic data interpolation is how to capture latent spatial-temporal relationships between unknown and known traces in 3-D space. The prevailing tensor-based interpolation schemes seek a globally low-rank approximation to mine the high-dimensional relationships hidden in 3-D seismic data. However, when the low-rank assumption is violated for data involving complex geological structures, the existing interpolation schemes fail to precisely capture the trace relationships, which may influence the interpolation results. As an alternative, this article presents a basic deep tensor autoencoder (DTAE) and two variants to implicitly learn a data-driven, nonlinear, and high-dimensional mapping that explores the complicated relationships among traces without the need for any underlying assumption. Tensor backpropagation (TBP), which can essentially be viewed as a tensor version of traditional backpropagation (BP), is then introduced to solve for the new model parameters. For ease of implementation, a mathematical relationship between tensor and matrix autoencoders is constructed by taking advantage of the properties of the tensor-tensor product. Based on the derived relationship, the DTAE weight parameters are inferred by applying a matrix autoencoder to each frontal slice in the discrete cosine transform (DCT) domain, and this process is further summarized into a general theoretical and practical framework. Finally, the performance benefits of the proposed DTAE-based method are demonstrated in experiments with both synthetic and real field seismic data.

Feng Qian
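The bridge the abstract relies on, doing matrix operations on each frontal slice in the DCT domain, is the transform-domain tensor-tensor product; a generic SciPy/NumPy sketch of that product (illustrative sizes only, not the authors' DTAE code) is:

```python
import numpy as np
from scipy.fft import dct, idct

def t_product_dct(A, B):
    """Tensor-tensor product of A (m x p x n) and B (p x q x n) under the DCT along the 3rd mode."""
    Ah = dct(A, axis=2, norm='ortho')
    Bh = dct(B, axis=2, norm='ortho')
    Ch = np.einsum('ipk,pjk->ijk', Ah, Bh)     # independent matrix product per frontal slice
    return idct(Ch, axis=2, norm='ortho')

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 8))
B = rng.standard_normal((5, 3, 8))
print(t_product_dct(A, B).shape)               # (4, 3, 8)
```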
-
[ Visit Poster at Spot A3 in Virtual World ]

We present efficient hierarchical Tucker (HT) tensor learning primitives using GPU tensor cores and demonstrate three applications: third-order and high-order HT decompositions, an HT tensor layer for deep neural networks, and an HT tensor-tree-structured quantum machine learning algorithm (TTN). Experimental results demonstrate that our optimized algorithms achieve higher efficiency.

hao huang · Xiao-Yang Liu · Weiqin Tong · Tao Zhang · Anwar Walid
-
[ Visit Poster at Spot A2 in Virtual World ]
The popular tensor train (TT) and tensor ring (TR) decompositions have achieved promising results in science and engineering. However, TT and TR decompositions only establish an operation between two adjacent factors and are highly sensitive to the permutation of tensor modes, leading to an inadequate and inflexible representation. In this paper, we propose a generalized tensor decomposition, which decomposes an $N$th-order tensor into a set of $N$th-order factors and establishes an operation between any two factors. Since it can be graphically interpreted as a fully-connected network, we name it the fully-connected tensor network (FCTN) decomposition. The superiority of the FCTN decomposition lies in its outstanding capability to adequately characterize the intrinsic correlations between any two modes of a tensor and its essential invariance under transposition. Furthermore, we apply the FCTN decomposition to one representative task, i.e., tensor completion, and develop an efficient solving algorithm based on proximal alternating minimization. Theoretically, we prove the convergence of the developed algorithm, i.e., the sequence obtained by it globally converges to a critical point. Experimental results substantiate that the proposed method compares favorably to state-of-the-art methods based on other tensor decompositions.
Yu-Bang Zheng · Ting-Zhu Huang · Xi-Le Zhao · Qibin Zhao · Tai-Xiang Jiang
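For a third-order tensor, the FCTN format has three factors, each carrying one data mode plus one bond to every other factor; the NumPy sketch below (illustrative sizes and ranks, showing only the decomposition format rather than the completion algorithm) contracts such factors back into the full tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3 = 6, 7, 8              # data mode sizes
r12, r13, r23 = 3, 3, 3           # one bond between every pair of factors

# For N = 3, every factor is an order-3 tensor: one data mode plus two bonds.
G1 = rng.standard_normal((I1, r12, r13))
G2 = rng.standard_normal((r12, I2, r23))
G3 = rng.standard_normal((r13, r23, I3))

# Fully-connected contraction: X[i, j, k] = sum over all bonds of G1 * G2 * G3.
X = np.einsum('iab,ajc,bck->ijk', G1, G2, G3)
print(X.shape)                    # (6, 7, 8)
```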
-
[ Visit Poster at Spot A1 in Virtual World ]

Given a target binary function, binary code search retrieves the top-K most similar functions from a repository, where similar functions are those compiled from the same source code. Searching binary code is particularly challenging due to large variations in compiler tool-chains, options, and CPU architectures, as well as the sheer number of binary codes in a repository. Furthermore, current binary code search schemes have pivotal shortcomings, including inaccurate text-based or token-based analysis, slow graph matching, and complex deep learning pipelines. In this paper, we present an unsupervised tensor embedding scheme, Codee, to carry out code search efficiently and accurately at the binary function level. First, we use an NLP-based neural network to generate semantic-aware token embeddings. Second, we propose an efficient basic block embedding generation algorithm based on a network representation learning model; we learn both the semantic information of instructions and the control-flow structural information to generate the basic block embeddings, and then use all basic block embeddings in a function to obtain a variable-length function feature vector. Third, we build a tensor to generate function embeddings based on the tensor singular value decomposition, which compresses the variable-length vectors into short fixed-length vectors to facilitate efficient search afterward. We further propose a dynamic tensor compression algorithm to incrementally update the function embedding database. Finally, we use locality-sensitive hashing to find the top-K similar functions in the repository. Compared with state-of-the-art cross-platform and cross-optimization-level code search schemes, our scheme achieves higher average search accuracy, shorter feature vectors, and faster feature generation performance on four datasets: OpenSSL, Coreutils, libgmp, and libcurl.

Jia Yang · Cai Fu · Xiao-Yang Liu
-
[ Visit Poster at Spot D5 in Virtual World ]
Deep variational reinforcement learning by optimizing the Hamiltonian equation is a novel training method in reinforcement learning. Liu [2020] proposed maximizing the Hamiltonian equation to obtain the policy network. In this poster, we apply massively parallel simulation to sample trajectories (collecting the information of the reward tensor) and train the deep policy network by maximizing a partial Hamiltonian equation. On the FrozenLake $8\times8$ and GridWorld $10\times10$ examples, we verify the theory in Liu [2020] by showing that the deep Hamiltonian network (DHN) for variational reinforcement learning is more stable and efficient than DQN [Mnih et al., 2013]. Our code is available at https://github.com/AI4Finance-Foundation/Quantum-Tensor-Networks-for-Variational-Reinforcement-Learning-NeurIPS-2020.
Zeliang Zhang · Xiao-Yang Liu
-
[ Visit Poster at Spot D6 in Virtual World ]

In this work, we present our first study on Quantum Machine Learning (QML) applied to the Earth Observation (EO) domain, which has consistently leveraged state-of-the-art advances in machine learning for imagery, including recent research on QML-based techniques. Based on Ref. [1], we suggest an alternative approach for Quantum Convolutional Neural Networks (QCNN) and test its performance as an image classifier with different single-qubit and two-qubit gates on the EuroSAT dataset. As the original 64x64 images are too large for current quantum hardware, we reduce their dimension through classical feature extraction methods: PCA and a convolutional autoencoder. We start with a binary classification between real EuroSAT features and fake data sampled from a uniform distribution. Then, we perform a set of binary classifications between different pairs of EuroSAT classes to investigate the capability for more detailed classification. Finally, we use the QCNN architecture as a 4-class classifier for the MNIST and EuroSAT datasets and compare their results. Although improvements are still required in some areas, especially for the EuroSAT dataset, our preliminary work shows the potential of using QCNNs as quantum classifiers in the context of computer vision and, in particular, Earth observation. Our ultimate goal is to extend this QCNN to generative models, including Generative Adversarial Networks (GAN), by replacing the generative part, and possibly the discriminative part, with quantum circuits.

[1] T. Hur, L. Kim, and D. K. Park. Quantum convolutional neural network for classical data classification, 2021.

Su Yeon Chang · Bertrand Le Saux · SOFIA VALLECORSA
-
[ Visit Poster at Spot A0 in Virtual World ]

We propose a novel spectral tensor layer for model-parallel deep neural networks, which can be trained in the spectral domain. A spectral tensor neural network is trained by training a number of small, weak subnetworks that can be trained in asynchronous parallel fashion. The output of the spectral tensor layer neural network is the ensemble of the subnetworks' outputs. Compared with conventional neural networks, the spectral tensor neural network has intrinsic data parallelism as well as model parallelism, which makes it very suitable for distributed training, and it achieves competitive ensemble results.

Zhiyuan Wang · Xiao-Yang Liu

Author Information

Xiao-Yang Liu (Columbia University)
Qibin Zhao (RIKEN AIP)
Ivan Oseledets (Skolkovo Institute of Science and Technology)
Yufei Ding (UC Santa Barbara)
Guillaume Rabusseau (Mila - Université de Montréal)
Khadijeh Najafi (Harvard and Caltech)
Andrzej Cichocki (Skolkovo Institute of Science and Technology)
Masashi Sugiyama (RIKEN AIP/The University of Tokyo)
Anwar Walid (Columbia University)
