Poster Session (same as above) in Workshop: Beyond first order methods in machine learning systems
Abstract
An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization. Eduard Gorbunov (Moscow Institute of Physics and Technology); Pavel Dvurechenskii (WIAS, Germany); Alexander Gasnikov (Moscow Institute of Physics and Technology)
Fast Bregman Gradient Methods for Low-Rank Minimization Problems. Radu-Alexandru Dragomir (Université Toulouse 1); Jérôme Bolte (Université Toulouse 1); Alexandre d'Aspremont (École Normale Supérieure)
Gluster: Variance Reduced Mini-Batch SGD with Gradient Clustering. Fartash Faghri (University of Toronto); David Duvenaud (University of Toronto); David Fleet (University of Toronto); Jimmy Ba (University of Toronto)
Neural Policy Gradient Methods: Global Optimality and Rates of Convergence. Lingxiao Wang (Northwestern University); Qi Cai (Northwestern University); Zhuoran Yang (Princeton University); Zhaoran Wang (Northwestern University)
A Gram-Gauss-Newton Method for Learning Overparameterized Deep Neural Networks for Regression Problems. Tianle Cai (Peking University); Ruiqi Gao (Peking University); Jikai Hou (Peking University); Siyu Chen (Peking University); Dong Wang (Peking University); Di He (Peking University); Zhihua Zhang (Peking University); Liwei Wang (Peking University)
Stochastic Gradient Methods with Layerwise Adaptive Moments for Training of Deep Networks. Boris Ginsburg (NVIDIA); Oleksii Hrinchuk (NVIDIA); Jason Li (NVIDIA); Vitaly Lavrukhin (NVIDIA); Ryan Leary (NVIDIA); Oleksii Kuchaiev (NVIDIA); Jonathan Cohen (NVIDIA); Huyen Nguyen (NVIDIA); Yang Zhang (NVIDIA)
Accelerating Neural ODEs with Spectral Elements. Alessio Quaglino (NNAISENSE SA); Marco Gallieri (NNAISENSE); Jonathan Masci (NNAISENSE); Jan Koutnik (NNAISENSE)
An Inertial Newton Algorithm for Deep Learning. Camille Castera (CNRS, IRIT); Jérôme Bolte (Université Toulouse 1); Cédric Févotte (CNRS, IRIT); Edouard Pauwels (Toulouse 3 University)
Nonlinear Conjugate Gradients for Scaling Synchronous Distributed DNN Training. Saurabh Adya (Apple); Vinay Palakkode (Apple Inc.); Oncel Tuzel (Apple Inc.)
How does mini-batching affect curvature information for second-order deep learning optimization? Diego Granziol (University of Oxford); Stephen Roberts (University of Oxford); Xingchen Wan (University of Oxford); Stefan Zohren (University of Oxford); Binxin Ru (University of Oxford); Michael A. Osborne (University of Oxford); Andrew Wilson (NYU); Sebastien Ehrhardt (University of Oxford); Dmitry P. Vetrov (Higher School of Economics); Timur Garipov (Samsung AI Center in Moscow)
On the Convergence of a Biased Version of Stochastic Gradient Descent. Rudrajit Das (University of Texas at Austin); Jiong Zhang (UT-Austin); Inderjit S. Dhillon (UT Austin & Amazon)
Adaptive Sampling Quasi-Newton Methods for Derivative-Free Stochastic Optimization. Raghu Bollapragada (Argonne National Laboratory); Stefan Wild (Argonne National Laboratory)
Acceleration through Spectral Modeling. Fabian Pedregosa (Google); Damien Scieur (Princeton University)
Accelerating Distributed Stochastic L-BFGS by Sampled 2nd-Order Information. Jie Liu (Lehigh University); Yu Rong (Tencent AI Lab); Martin Takac (Lehigh University); Junzhou Huang (Tencent AI Lab)
Grow Your Samples and Optimize Better via Distributed Newton CG and Accumulating Strategy. Majid Jahani (Lehigh University); Xi He (Lehigh University); Chenxin Ma (Lehigh University); Aryan Mokhtari (UT Austin); Dheevatsa Mudigere (Intel Labs); Alejandro Ribeiro (University of Pennsylvania); Martin Takac (Lehigh University)
Global linear convergence of trust-region Newton's method without strong convexity or smoothness. Sai Praneeth Karimireddy (EPFL); Sebastian Stich (EPFL); Martin Jaggi (EPFL)
FD-Net with Auxiliary Time Steps: Fast Prediction of PDEs using Hessian-Free Trust-Region Methods. Nur Sila Gulgec (Lehigh University); Zheng Shi (Lehigh University); Neil Deshmukh (MIT BeaverWorks - Medlytics); Shamim Pakzad (Lehigh University); Martin Takac (Lehigh University)
Using better models in stochastic optimization. Hilal Asi (Stanford University); John Duchi (Stanford University)
Tangent space separability in feedforward neural networks. Bálint Daróczy (Institute for Computer Science and Control, Hungarian Academy of Sciences); Rita Aleksziev (Institute for Computer Science and Control, Hungarian Academy of Sciences); Andras Benczur (Hungarian Academy of Sciences)
Ellipsoidal Trust Region Methods for Neural Nets. Leonard Adolphs (ETHZ); Jonas Kohler (ETHZ)
Closing the K-FAC Generalisation Gap Using Stochastic Weight Averaging. Xingchen Wan (University of Oxford); Diego Granziol (University of Oxford); Stefan Zohren (University of Oxford); Stephen Roberts (University of Oxford)
Sub-sampled Newton Methods Under Interpolation. Si Yi Meng (University of British Columbia); Sharan Vaswani (Mila, Université de Montréal); Issam Laradji (University of British Columbia); Mark Schmidt (University of British Columbia); Simon Lacoste-Julien (Mila, Université de Montréal)
Learned First-Order Preconditioning. Aditya Rawal (Uber AI Labs); Rui Wang (Uber AI); Theodore Moskovitz (Gatsby Computational Neuroscience Unit); Sanyam Kapoor (Uber); Janice Lan (Uber AI); Jason Yosinski (Uber AI Labs); Thomas Miconi (Uber AI Labs)
Iterative Hessian Sketch in Input Sparsity Time. Charlie Dickens (University of Warwick); Graham Cormode (University of Warwick)
Nonlinear matrix recovery. Florentin Goyens (University of Oxford); Coralia Cartis (Oxford University); Armin Eftekhari (EPFL)
Making Variance Reduction more Effective for Deep Networks. Nicolas Brandt (EPFL); Farnood Salehi (EPFL); Patrick Thiran (EPFL)
Novel and Efficient Approximations for Zero-One Loss of Linear Classifiers. Hiva Ghanbari (Lehigh University); Minhan Li (Lehigh University); Katya Scheinberg (Lehigh)
A Model-Based Derivative-Free Approach to Black-Box Adversarial Examples: BOBYQA. Giuseppe Ughi (University of Oxford)
Distributed Accelerated Inexact Proximal Gradient Method via System of Coupled Ordinary Differential Equations. Chhavi Sharma (IIT Bombay); Vishnu Narayanan (IIT Bombay); Balamurugan Palaniappan (IIT Bombay)
Finite-Time Convergence of Continuous-Time Optimization Algorithms via Differential Inclusions. Orlando Romero (Rensselaer Polytechnic Institute); Mouhacine Benosman (MERL)
Loss Landscape Sightseeing by Multi-Point Optimization. Ivan Skorokhodov (MIPT); Mikhail Burtsev (NI)
Symmetric Multisecant Quasi-Newton Methods. Damien Scieur (Samsung AI Research Montreal); Thomas Pumir (Princeton University); Nicolas Boumal (Princeton University)
Does Adam optimizer keep close to the optimal point? Kiwook Bae (KAIST); Heechang Ryu (KAIST); Hayong Shin (KAIST)
Stochastic Newton Method and its Cubic Regularization via Majorization-Minimization. Konstantin Mishchenko (King Abdullah University of Science & Technology (KAUST)); Peter Richtarik (KAUST); Dmitry Kovalev (KAUST)
Full Matrix Preconditioning Made Practical. Rohan Anil (Google); Vineet Gupta (Google); Tomer Koren (Google); Kevin Regan (Google); Yoram Singer (Princeton)
Memory-Sample Tradeoffs for Linear Regression with Small Error. Vatsal Sharan (Stanford University); Aaron Sidford (Stanford); Gregory Valiant (Stanford University)
On the Higher-order Moments in Adam. Zhanhong Jiang (Johnson Controls International); Aditya Balu (Iowa State University); Sin Yong Tan (Iowa State University); Young M Lee (Johnson Controls International); Chinmay Hegde (Iowa State University); Soumik Sarkar (Iowa State University)
H-Matrix Approximation for Gauss-Newton Hessian. Chao Chen (UT Austin)
Hessian-Aware Trace-Weighted Quantization. Zhen Dong (UC Berkeley); Zhewei Yao (UC Berkeley); Amir Gholami (UC Berkeley); Yaohui Cai (Peking University); Daiyaan Arfeen (UC Berkeley); Michael Mahoney (UC Berkeley); Kurt Keutzer (UC Berkeley)
Random Projections for Learning Non-convex Models. Tolga Ergen (Stanford University); Emmanuel Candes (Stanford University); Mert Pilanci (Stanford)
New Methods for Regularization Path Optimization via Differential Equations. Paul Grigas (UC Berkeley); Heyuan Liu (UC Berkeley)
Hessian-Aware Zeroth-Order Optimization. Haishan Ye (HKUST); Zhichao Huang (HKUST); Cong Fang (Peking University); Chris Junchi Li (Tencent); Tong Zhang (HKUST)
Higher-Order Accelerated Methods for Faster Non-Smooth Optimization. Brian Bullins (TTIC)