Timezone: »

Workshop
Optimal Transport and Machine Learning
Olivier Bousquet · Marco Cuturi · Gabriel Peyré · Fei Sha · Justin Solomon

Sat Dec 09 08:00 AM -- 06:30 PM (PST) @ Hyatt Hotel, Seaview Ballroom

Optimal transport (OT) is gradually establishing itself as a powerful and essential tool to compare probability measures, which in machine learning take the form of point clouds, histograms, bags-of-features, or more generally datasets to be compared with probability densities and generative models. OT can be traced back to early work by Monge, and later to Kantorovich and Dantzig during the birth of linear programming. The mathematical theory of OT has produced several important developments since the 90's, crowned by Cédric Villani's Fields Medal in 2010. OT is now transitioning into more applied spheres, including recent applications to machine learning, because it can tackle challenging learning scenarios including dimensionality reduction, structured prediction problems that involve histograms, and estimation of generative models in highly degenerate, high-dimensional problems. This workshop will follow that organized 3 years ago (NIPS 2014) and will seek to amplify that trend. We will provide the audience with an update on all of the very recent successes brought forward by efficient solvers and innovative applications through a long list of invited talks. We will add to that a few contributed presentations (oral, and, if needed posters) and, finally, a panel for all invited speakers to take questions from the audience and formulate more nuanced opinions on this nascent field.

 Sat 8:00 a.m. - 8:20 a.m. [iCal] Structured Optimal Transport (with T. Jaakkola, S. Jegelka) (Contributed 1) David Alvarez Melis Sat 8:20 a.m. - 9:00 a.m. [iCal] Approximate Bayesian computation with the Wasserstein distance (Invited 1) » A growing range of generative statistical models prohibits the numerical evaluation of their likelihood functions. Approximate Bayesian computation has become a popular approach to overcome this issue, simulating synthetic data given parameters and comparing summaries of these simulations with the corresponding observed values. We propose to avoid these summaries and the ensuing loss of information through the use of Wasserstein distances between empirical distributions of observed and synthetic data. We describe how the approach can be used in the setting of dependent data such as time series, and how approximations of the Wasserstein distance allow in practice the method to scale to large datasets. In particular, we propose a new approximation to the optimal assignment problem using the Hilbert space-filling curve. The approach is illustrated on various examples including i.i.d. data and time series. Pierre E Jacob Sat 9:00 a.m. - 9:40 a.m. [iCal] Gradient flow in the Wasserstein metric (Invited 2) » Optimal transport not only provides powerful techniques for comparing probability measures, but also for analyzing their evolution over time. For a range of partial differential equations arising in physics, biology, and engineering, solutions are gradient flows in the Wasserstein metric: each equation has a notion of energy for which solutions dissipate energy as quickly as possible, with respect to the Wasserstein structure. Steady states of the equation correspond to minimizers of the energy, and stability properties of the equation translate into convexity properties of the energy. In this talk, I will compare Wasserstein gradient flow with more classical gradient flows arising in optimization and machine learning. I’ll then introduce a class of particle blob methods for simulating Wasserstein gradient flows numerically. Katy Craig Sat 9:40 a.m. - 10:00 a.m. [iCal] Approximate inference with Wasserstein gradient flows (with T. Poggio) (Contributed 2) Charlie Frogner Sat 10:00 a.m. - 10:20 a.m. [iCal] 6 x 3 minutes spotlights (Poster Spotlights) » Nicolas Courty, Rémi Flamary and Mélanie Ducoffe. Learning Wasserstein Embeddings Yongxin Chen, Tryphon Georgiou and Allen Tannenbaum. Optimal transport for Gaussian mixture models Napat Rujeerapaiboon, Kilian Schindler, Daniel Kuhn and Wolfram Wiesemann. Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization Jonas Adler, Axel Ringh, Ozan Öktem and Johan Karlsson. Learning to solve inverse problems using Wasserstein loss John Lee, Adam Charles, Nicholas Bertrand and Christopher Rozell. An Optimal Transport Tracking Regularizer Lucas Roberts, Leo Razoumov, Lin Su and Yuyang Wang. Gini-regularized Optimal Transport with an Application in Spatio-Temporal Forecasting Rémi Flamary, Yongxin Chen, Napat Rujeerapaiboon, Jonas Adler, John Lee, Lucas R Roberts Sat 11:00 a.m. - 11:40 a.m. [iCal] Optimal planar transport in near-linear time (Invited 3) » We show how to compute the Earth Mover Distance between two planar sets of size N in N^{1+o(1)} time. The algorithm is based on a generic framework that decomposes the natural Linear Programming formulation for the transport problem into a tree of smaller LPs, and recomposes it in a divide-and-conquer fashion. The main enabling idea is use sketching -- a generalization of the dimension reduction method -- in order to reduce the size of the "partial computation" so that the conquer step is more efficient. We will conclude with some open questions in the area. This is joint work with Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Alexandr Andoni Sat 11:40 a.m. - 12:20 p.m. [iCal] Laplacian operator and Brownian motions on the Wasserstein space (Invited 4) » We endow the space of probability measures on $\mathbb{R}^d$ with $\Delta_w$, a Laplacian operator. A Brownian motion is shown to be consistent with the Laplacian operator. The smoothing effect of the heat equation is established for a class of functions. Special perturbations of the Laplacian operator, denoted $\Delta_{w,\epsilon}$, appearing in Mean Field Games theory, are considered (Joint work with Y. T. Chow). Wilfrid Gangbo Sat 1:40 p.m. - 2:20 p.m. [iCal] Geometrical Insights for Unsupervised Learning (Invited 6) » After arguing that choosing the right probability distance is critical for achieving the elusive goals of unsupervised learning, we compare the geometric properties of the two currently most promising distances: (1) the earth-mover distance, and (2) the energy distance, also known as maximum mean discrepancy. These insights allow us to give a fresh viewpoint on reported experimental results and to risk a couple predictions. Joint work with Leon Bottou, Martin Arjovsky, David Lopez-Paz, and Maxime Oquab. Leon Bottou Sat 2:20 p.m. - 2:40 p.m. [iCal] Improving GANs Using Optimal Transport (with H. Zhang, A. Radford, D. Metaxas) (Contributed 3) Tim Salimans Sat 2:40 p.m. - 3:00 p.m. [iCal] Overrelaxed Sinkhorn-Knopp Algorithm for Regularized Optimal Transport (with L. Chizat, C. Dossal, N. Papadakis) (Contributed 4) Alexis THIBAULT Sat 3:30 p.m. - 4:10 p.m. [iCal] Domain adaptation with optimal transport : from mapping to learning with joint distribution (Invited 6) » This presentation deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function f in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. After a short introduction of recent developent in domain adaptation and their relation to optimal transport we will present a method that estimates a barycentric mapping between the feature distributions in order to adapt the training dataset prior to learning. Next we propose a novel method that model with optimal transport the transformation between the joint feature/labels space distributions of the two domains. We aim at recovering an estimated target distribution ptf=(X,f(X)) by optimizing simultaneously the optimal coupling and f. We discuss the generalization of the proposed method, and provide an efficient algorithmic solution. The versatility of the approach, both in terms of class of hypothesis or loss functions is demonstrated with real world classification, regression problems and large datasets where stochastic approaches become necessary. Joint work with Nicolas COURTY, Devis TUIA, Amaury HABRARD, and Alain RAKOTOMAMONJY Rémi Flamary Sat 4:10 p.m. - 4:50 p.m. [iCal] Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance (Invited 7) » The Wasserstein distance between two probability measures on a metric space is a measure of closeness with applications in statistics, probability, and machine learning. In this work, we consider the fundamental question of how quickly the empirical measure obtained fromnindependent samples from μ approaches μ in the Wasserstein distance of any order. We prove sharp asymptotic and finite-sample results for this rate of convergence for general measures on general compact metric spaces. Our finite-sample results show the existence of multi-scale behavior, where measures can exhibit radically different rates of convergence as n grows. See more details in: J. Weed, F. Bach. Sharp asymptotic and finite-sample ratesof convergence of empirical measures in Wasserstein distance. Technical Report, Arxiv-1707.00087, 2017. Francis Bach Sat 4:50 p.m. - 5:10 p.m. [iCal] 7 x 3 minutes spotlights (Poster Spotlights) » Jérémie Bigot, Elsa Cazelles and Nicolas Papadakis. Central limit theorems for Sinkhorn divergence between probability distributions on finite spaces. Aude Genevay. Learning Generative Models with Sinkhorn Divergences Gonzalo Mena, David Belanger, Gonzalo Muñoz and Jasper Snoek. Sinkhorn Networks: Using Optimal Transport Techniques to Learn Permutations Christoph Brauer, Christian Clason, Dirk Lorenz and Benedikt Wirth. A Sinkhorn-Newton method for entropic optimal transport Henning Petzka, Asja Fischer and Denis Lukovnikov. On the regularization of Wasserstein GANs Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet and Mathieu Blondel. Large Scale Optimal Transport and Mapping Estimation Sho Sonoda and Noboru Murata. Transportation analysis of denoising autoencoders: a novel method for analyzing deep neural networks Elsa Cazelles, Aude Genevay, Gonzalo Mena, Christoph Brauer, Asja Fischer, Henning Petzka, Vivien Seguy, Antoine Rolet, Sho Sonoda Sat 5:10 p.m. - 5:30 p.m. [iCal] short Q&A session with plenary speakers (Roundtable) Sat 5:30 p.m. - 6:30 p.m. [iCal] Closing session (Poster Session)

#### Author Information

##### Marco Cuturi (Google Brain & CREST - ENSAE)

Marco Cuturi is a research scientist at Google AI, Brain team in Paris. He received his Ph.D. in 11/2005 from the Ecole des Mines de Paris in applied mathematics. Before that he graduated from National School of Statistics (ENSAE) with a master degree (MVA) from ENS Cachan. He worked as a post-doctoral researcher at the Institute of Statistical Mathematics, Tokyo, between 11/2005 and 3/2007 and then in the financial industry between 4/2007 and 9/2008. After working at the ORFE department of Princeton University as a lecturer between 2/2009 and 8/2010, he was at the Graduate School of Informatics of Kyoto University between 9/2010 and 9/2016 as a tenured associate professor. He joined ENSAE in 9/2016 as a professor, where he is now working part-time. His main employment is now with Google AI (Brain team in Paris) since 10/2018, as a research scientist working on fundamental aspects of machine learning.