As the ninth workshop in its series, OPT 2016 builds on the precedent established by the highly successful OPT 2008--OPT 2015 workshops, which have been instrumental in bringing the optimization and machine learning communities closer together.
The previous OPT workshops enjoyed packed, and at times overflowing, attendance. This strong interest is no surprise: optimization is the second-largest topic at NIPS and is foundational for the wider ML community.
Looking back over the past decade, a strong trend is apparent: The intersection of OPT and ML has grown monotonically to the point that now several cutting-edge advances in optimization arise from the ML community. The distinctive feature of optimization within ML is its departure from textbook approaches, in particular, by having a different set of goals driven by “big-data,” where both models and practical implementation are crucial.
This intimate relation between OPT and ML is the core theme of our workshop. We wish to use OPT 2016 as a platform to foster discussion, discovery, and dissemination of the state of the art in optimization as relevant to machine learning, and, beyond that, as a platform to identify new directions and challenges that will drive future research.
How OPT differs from other related workshops:
Compared to other optimization-focused workshops that we are aware of, the distinguishing features of OPT are: (a) it provides a unique bridge between the ML community and the wider optimization community; (b) it encourages theoretical work on an equal footing with practical efficiency; and (c) it caters to a wide body of NIPS attendees, experts and beginners alike (some OPT talks are always of a more "tutorial" nature).
Extended abstract
The OPT workshops have previously covered a variety of topics, such as frameworks for convex programs (D. Bertsekas), the intersection of ML and optimization, classification (S. Wright), stochastic gradient and its tradeoffs (L. Bottou, N. Srebro), structured sparsity (Vandenberghe), randomized methods for convex optimization (A. Nemirovski), complexity theory of convex optimization (Y. Nesterov), distributed optimization (S. Boyd), asynchronous stochastic gradient (B. Recht), algebraic techniques (P. Parrilo), nonconvex optimization (A. Lewis), sums-of-squares techniques (J. Lasserre), deep learning tricks (Y. Bengio), stochastic convex optimization (G. Lan), new views on interior point (E. Hazan), among others.
Several ideas propounded at OPT have by now become important research topics in ML and optimization --- especially randomized algorithms, stochastic gradient methods, and variance-reduced stochastic gradient methods. An edited book, "Optimization for Machine Learning" (S. Sra, S. Nowozin, and S. Wright; MIT Press, 2011), grew out of the first three OPT workshops and contains high-quality contributions from many of the speakers and attendees; there have been sustained requests for a follow-up volume.
Much of the recent focus has been on large-scale first-order convex optimization algorithms for machine learning, from both a theoretical and a methodological point of view. Covered topics included stochastic gradient algorithms, (accelerated) proximal algorithms, decomposition and coordinate descent algorithms, and parallel and distributed optimization. Theoretical and practical advances in these methods remain a topic of core interest to the workshop. Recent years have also seen interesting advances in non-convex optimization, such as a growing body of results on alternating minimization, tensor factorization, and related methods.
We also do not wish to ignore settings that are not especially large scale, where one does have time to wield substantial computational resources. In such settings, high-accuracy solutions and a deep understanding of the lessons contained in the data are needed. Examples valuable to the ML community include the exploration of genetic and environmental data to identify risk factors for disease, or problems where the amount of observed data is not huge but the mathematical model is complex. Consequently, we encourage work on, for instance, optimization methods on manifolds, ML problems with differential-geometric antecedents, problems using advanced algebraic techniques, and computational topology.
We would like to emphasize again that OPT 2016 is one of the few optimization+ML workshops that lie at the intersection of theory and practice: the practical efficiency of algorithms and their theoretical analysis are given equal value.
Fri 11:15 p.m. - 11:30 p.m.
Opening Remarks
Fri 11:30 p.m. - 12:15 a.m.
Invited Talk: Online Optimization, Smoothing, and Worst-case Competitive Ratio (Maryam Fazel, University of Washington)
In Online Optimization, the data in an optimization problem is revealed over time and at each step a decision variable needs to be set without knowing the future data. This setup covers online resource allocation, from classical inventory problems to the Adwords problem popular in online advertising. In this talk, we prove bounds on the competitive ratio of two primal-dual algorithms for a broad class of online convex optimization problems. We give a sufficient condition on the objective function that guarantees a constant worst-case competitive ratio for monotone functions. We show how smoothing the objective can improve the competitive ratio of these algorithms, and for separable functions, we show that the optimal smoothing can be derived by solving a convex optimization problem. This result allows us to directly optimize the competitive ratio bound over a class of smoothing functions, and hence design effective smoothing customized for a given cost function.
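To make the setting concrete, here is a small self-contained Python sketch (synthetic data) of an Adwords-style online allocation problem and a greedy allocator whose scores are optionally smoothed. The particular smoothing rule and the crude offline upper bound used for the empirical ratio are illustrative assumptions, not the primal-dual algorithms or bounds from the talk.

import numpy as np

rng = np.random.default_rng(0)
n_bidders, n_queries, budget = 5, 200, 40.0
bids = rng.uniform(0.0, 1.0, size=(n_queries, n_bidders))  # one row revealed per step

def run_online(smoothing=0.0):
    """Assign each query greedily; 'smoothing' discounts bidders near their budget."""
    spent = np.zeros(n_bidders)
    revenue = 0.0
    for t in range(n_queries):
        remaining = np.maximum(budget - spent, 0.0)
        if smoothing > 0.0:
            # Smoothed score: bid scaled by a concave function of the remaining budget.
            score = bids[t] * (1.0 - np.exp(-smoothing * remaining / budget))
        else:
            score = bids[t] * (remaining > 0)          # plain greedy on raw bids
        i = int(np.argmax(score))
        pay = min(bids[t, i], remaining[i])
        spent[i] += pay
        revenue += pay
    return revenue

# Crude offline benchmark: cap each bidder's total collectable value at its budget.
offline_ub = np.minimum(bids.sum(axis=0), budget).sum()

for s in [0.0, 1.0, 5.0]:
    rev = run_online(smoothing=s)
    print(f"smoothing={s:4.1f}: revenue={rev:7.2f}, empirical ratio >= {rev / offline_ub:.3f}")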
Sat 12:15 a.m. - 12:30 a.m.
Spotlight: Markov Chain Lifting and Distributed ADMM
The time to converge to the steady state of a finite Markov chain can be greatly reduced by a lifting operation, which creates a new Markov chain on an expanded state space. For a class of quadratic objectives, we show an analogous behavior whereby a distributed ADMM algorithm can be seen as a lifting of gradient descent. This provides a deep insight into its faster convergence rate under optimal parameter tuning. We conjecture that this gain is always present, contrary to the case of lifting a Markov chain, where sometimes only a marginal speedup is obtained.
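As a point of reference for the comparison in the abstract, the following toy Python sketch runs plain gradient descent and consensus ADMM on a one-dimensional sum of quadratics; the penalty parameter and step size are arbitrary illustrative choices, not the optimal tuning analyzed in the paper.

import numpy as np

rng = np.random.default_rng(1)
n = 10
a = rng.normal(size=n)
c = rng.uniform(0.5, 5.0, size=n)
x_star = np.sum(c * a) / np.sum(c)               # closed-form minimizer of sum_i 0.5*c_i*(x - a_i)^2

def gradient_descent(iters=200):
    x, L = 0.0, np.sum(c)                        # L = Lipschitz constant of the gradient
    errs = []
    for _ in range(iters):
        x -= (1.0 / L) * np.sum(c * (x - a))
        errs.append(abs(x - x_star))
    return errs

def consensus_admm(rho=2.0, iters=200):
    x, u, z = np.zeros(n), np.zeros(n), 0.0
    errs = []
    for _ in range(iters):
        x = (c * a + rho * (z - u)) / (c + rho)  # local proximal steps (in parallel)
        z = np.mean(x + u)                       # consensus (averaging) step
        u = u + x - z                            # dual updates
        errs.append(abs(z - x_star))
    return errs

gd, admm = gradient_descent(), consensus_admm()
print("error after 50 iterations:  GD %.2e   ADMM %.2e" % (gd[49], admm[49]))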
Sat 12:30 a.m. - 1:30 a.m.
Poster Session 1
Sat 1:30 a.m. - 2:00 a.m.
Coffee Break 1 (Poster Session)
Sat 2:00 a.m. - 2:45 a.m.
Invited Talk: Kernel-based Methods for Bandit Convex Optimization (Sébastien Bubeck, Microsoft Research)
A lot of progress has been made in recent years on extending classical multi-armed bandit strategies to very large sets of actions. A particularly challenging setting is the one where the action set is continuous and the underlying cost function is convex; this is the so-called bandit convex optimization (BCO) problem. I will tell the story of BCO and explain some of the new ideas that we recently developed to solve it. I will focus on three new ideas from our recent work http://arxiv.org/abs/1607.03084 with Yin Tat Lee and Ronen Eldan: (i) a new connection between kernel methods and the popular multiplicative weights strategy; (ii) a new connection between kernel methods and one of Erdős's favorite mathematical objects, the Bernoulli convolution; and (iii) a new adaptive (and increasing!) learning rate for multiplicative weights. These ideas could be of broader interest in learning and algorithm design.
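For readers unfamiliar with the multiplicative weights strategy mentioned in point (i), here is the standard exponential-weights update on a finite set of actions with synthetic losses. This is only the classical building block, not the kernelized continuous-action algorithm from the paper, and the learning rate below is a fixed illustrative value rather than the adaptive one described in the talk.

import numpy as np

rng = np.random.default_rng(2)
n_actions, T, eta = 20, 500, 0.1                  # eta: fixed learning rate (illustrative)
losses = rng.uniform(0.0, 1.0, size=(T, n_actions))
losses[:, 7] -= 0.3                               # make action 7 the best in hindsight
losses = np.clip(losses, 0.0, 1.0)

weights = np.ones(n_actions)
alg_loss = 0.0
for t in range(T):
    p = weights / weights.sum()                   # play the normalized weights as a distribution
    alg_loss += p @ losses[t]
    weights *= np.exp(-eta * losses[t])           # multiplicative update on every action

best_fixed = losses.sum(axis=0).min()             # best single action in hindsight
print(f"regret of multiplicative weights: {alg_loss - best_fixed:.1f} over T={T} rounds")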
Sat 2:45 a.m. - 3:00 a.m.
Spotlight: Frank-Wolfe Algorithms for Saddle Point Problems
We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30-year-old conjecture. We verify our convergence rates empirically and observe that by using a heuristic step-size, we can get empirical convergence under more general conditions, paving the way for future theoretical work.
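For context, the Python sketch below shows the classical Frank-Wolfe loop that the paper extends: minimizing a smooth convex quadratic over the probability simplex using only a linear minimization oracle. The problem instance and step-size rule are standard illustrative choices, not the saddle-point algorithm itself, which alternates such oracle calls between the min and max players.

import numpy as np

rng = np.random.default_rng(3)
m, n = 30, 50
A, b = rng.normal(size=(m, n)), rng.normal(size=m)

def lmo_simplex(g):
    """Linear minimization oracle over the simplex: the vertex with the smallest entry of g."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x = np.ones(n) / n
for k in range(200):
    grad = A.T @ (A @ x - b)
    s = lmo_simplex(grad)
    gamma = 2.0 / (k + 2)                  # standard Frank-Wolfe step size
    x = (1 - gamma) * x + gamma * s        # convex combination keeps x in the simplex

grad = A.T @ (A @ x - b)
s = lmo_simplex(grad)
print("objective %.4f,  Frank-Wolfe gap %.2e" % (0.5 * np.sum((A @ x - b) ** 2), grad @ (x - s)))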
Sat 3:00 a.m. - 5:00 a.m.
Lunch Break
Sat 5:00 a.m. - 5:45 a.m.
Invited Talk: Semidefinite Programs with a Dash of Smoothness: Why and When the Low-Rank Approach Works (Nicolas Boumal, Princeton University)
Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via low-rank, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably. In this presentation, we show that the Burer-Monteiro formulation of SDPs in a certain class almost never has any spurious local optima, that is: the non-convexity of the low-rank formulation is benign (even saddles are strict). This class of SDPs covers applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations. The crucial assumption we make is that the low-rank problem lives on a manifold. Then, theory and algorithms from optimization on manifolds can be used. Optimization on manifolds is about minimizing a cost function over a smooth manifold, such as spheres, low-rank matrices, orthonormal frames, rotations, etc. We will present the basic framework as well as parts of the more general convergence theory, including recent complexity results. (Toolbox: http://www.manopt.org) Select parts are joint work with P.-A. Absil, A. Bandeira, C. Cartis and V. Voroninski.
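The following Python sketch illustrates the Burer-Monteiro idea on the max-cut SDP, max <L, X>/4 subject to diag(X) = 1 and X PSD, for a toy random graph: the low-rank factorization X = Y Y^T with unit-norm rows of Y is optimized by simple Riemannian gradient ascent on a product of spheres. The graph, rank, step size, and iteration count are illustrative assumptions, and a real implementation would use a solver such as Manopt rather than this hand-rolled loop.

import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 5                                         # rank p ~ O(sqrt(n)) suffices in theory
A = (rng.random((n, n)) < 0.2).astype(float)
W = np.triu(A, 1)
W = W + W.T                                          # random undirected graph
L = np.diag(W.sum(axis=1)) - W                       # graph Laplacian

Y = rng.normal(size=(n, p))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)        # rows on the unit sphere
step = 0.01
for _ in range(2000):
    G = 2.0 * L @ Y                                  # Euclidean gradient of <L, Y Y^T>
    G -= np.sum(G * Y, axis=1, keepdims=True) * Y    # project onto the tangent space, row-wise
    Y += step * G                                    # ascent step
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)    # retraction: renormalize each row

print("max-cut SDP objective <L, Y Y^T>/4 =", 0.25 * np.sum((Y @ Y.T) * L))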
Sat 5:45 a.m. - 6:00 a.m.
Spotlight: QuickeNing: A Generic Quasi-Newton Algorithm for Faster Gradient-Based Optimization
We propose a technique to accelerate gradient-based optimization algorithms by giving them the ability to exploit L-BFGS heuristics. Our scheme (i) is generic and can be applied to a large class of first-order algorithms; (ii) is compatible with composite objectives, meaning that it may provide exactly sparse solutions when a sparsity-inducing regularization is involved; (iii) admits a linear convergence rate for strongly convex problems; and (iv) is easy to use and does not require any line search. Our work is inspired in part by the Catalyst meta-algorithm, which accelerates gradient-based techniques in the sense of Nesterov; here, we adopt a different strategy based on L-BFGS rules to learn and exploit the local curvature. In most practical cases, we observe significant improvements over Catalyst for solving large-scale high-dimensional machine learning problems.
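Since the scheme builds on L-BFGS rules, the Python sketch below shows the generic L-BFGS two-loop recursion applied directly to a small regularized logistic-regression problem. This is plain L-BFGS with a fixed step size (no line search), meant only to illustrate the curvature-learning building block, not the QuickeNing meta-algorithm itself.

import numpy as np

rng = np.random.default_rng(5)
n_samples, dim, lam = 200, 20, 0.1
X = rng.normal(size=(n_samples, dim))
y = np.sign(X @ rng.normal(size=dim) + 0.1 * rng.normal(size=n_samples))

def grad(w):
    """Gradient of the l2-regularized logistic loss."""
    z = -y * (X @ w)
    return X.T @ (-y / (1.0 + np.exp(-z))) / n_samples + lam * w

def two_loop(g, s_hist, y_hist):
    """L-BFGS two-loop recursion: approximate H^{-1} g from stored (s, y) pairs."""
    q, alphas = g.copy(), []
    for s, yv in zip(reversed(s_hist), reversed(y_hist)):
        a = (s @ q) / (yv @ s)
        alphas.append(a)
        q = q - a * yv
    if y_hist:                                     # initial Hessian scaling
        q *= (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    for (s, yv), a in zip(zip(s_hist, y_hist), reversed(alphas)):
        q = q + (a - (yv @ q) / (yv @ s)) * s
    return q

w, memory = np.zeros(dim), 10
s_hist, y_hist = [], []
for _ in range(50):
    g = grad(w)
    d = -two_loop(g, s_hist, y_hist)
    w_new = w + 0.5 * d                            # fixed step size; no line search
    s_hist.append(w_new - w)
    y_hist.append(grad(w_new) - g)
    s_hist, y_hist = s_hist[-memory:], y_hist[-memory:]
    w = w_new

print("final gradient norm:", np.linalg.norm(grad(w)))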
Sat 6:00 a.m. - 6:30 a.m.
Coffee Break 2 (Poster Session)
Sat 6:30 a.m. - 7:30 a.m.
Poster Session 2
Sat 7:30 a.m. - 8:15 a.m.
Invited Talk: Oracle Complexity of Second-Order Methods for Finite-Sum Problems (Ohad Shamir, Weizmann Institute of Science)
Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations. Recently, there has been growing interest in second-order methods, which rely on both gradients and Hessians. In principle, second-order methods can require far fewer iterations than first-order methods, and hold the promise of more efficient algorithms. Although computing and manipulating Hessians is prohibitive for high-dimensional problems in general, the Hessians of individual functions in finite-sum problems can often be efficiently computed, e.g. because they possess a low-rank structure. Can second-order information indeed be used to solve such problems more efficiently? In this talk, I'll provide evidence that the answer -- perhaps surprisingly -- is negative, at least in terms of worst-case guarantees. However, I'll also discuss what additional assumptions and algorithmic approaches might potentially circumvent this negative result. Joint work with Yossi Arjevani.
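To illustrate the finite-sum setting, the Python sketch below computes per-example gradients and Hessians for regularized logistic regression (each example contributes a rank-one Hessian term) and takes Newton-type steps with a subsampled Hessian and a full gradient. The subsample size and the unit step are illustrative assumptions; this is a generic second-order baseline, not a method from the talk.

import numpy as np

rng = np.random.default_rng(6)
n, d, lam = 500, 15, 0.05
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_hess(w, idx):
    """Gradient and Hessian of the regularized logistic loss over the examples in idx."""
    Xi, yi = X[idx], y[idx]
    p = sigmoid(-yi * (Xi @ w))
    g = Xi.T @ (-yi * p) / len(idx) + lam * w
    D = p * (1.0 - p)                               # per-example (rank-one) Hessian weights
    H = (Xi * D[:, None]).T @ Xi / len(idx) + lam * np.eye(d)
    return g, H

w = np.zeros(d)
for it in range(10):
    g_full, _ = grad_hess(w, np.arange(n))          # exact gradient over the full sum
    idx = rng.choice(n, size=100, replace=False)    # Hessian built from a subsample of the f_i
    _, H_sub = grad_hess(w, idx)
    w -= np.linalg.solve(H_sub, g_full)             # Newton-type step
    print(f"iteration {it}: gradient norm = {np.linalg.norm(g_full):.2e}")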
Sat 8:15 a.m. - 8:30 a.m.
Spotlight: Reliably Learning the ReLU in Polynomial Time
We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form max(0, w.x) with w a unit vector (2-norm equal to 1). Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade, and Mansour, where the learner is given access to a distribution D on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the l_p loss (for p=1 or p >= 2) of inputs given positive labels by D. It runs in polynomial time (in n) with respect to any distribution on S^{n-1} (the unit sphere in n dimensions) and for any error parameter \epsilon = \Omega(1/\log n). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where \epsilon must be \Omega(1) and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLUs. Our techniques combine kernel methods and polynomial approximations with a
Author Information
Suvrit Sra (MIT)
Suvrit Sra is a faculty member within the EECS department at MIT, where he is also a core faculty member of IDSS, LIDS, the MIT-ML Group, and the Statistics and Data Science Center. His research spans topics in optimization, matrix theory, differential geometry, and probability theory, which he connects with machine learning --- a key focus of his research is the theme "Optimization for Machine Learning" (http://opt-ml.org).
Francis Bach (INRIA - Ecole Normale Superieure)
Sashank J. Reddi (Carnegie Mellon University)
Niao He (UIUC)