Test of Time Award

[ West Exhibition Hall C + B3 ] 
Outstanding Paper

[ East Exhibition Hall B + C ]
We study the problem of {\em distribution-independent} PAC learning of halfspaces in the presence of Massart noise.
Specifically, we are given a set of labeled examples $(\mathbf{x}, y)$ drawn
from a distribution $\mathcal{D}$ on $\mathbb{R}^{d+1}$ such that the marginal distribution
on the unlabeled points $\mathbf{x}$ is arbitrary and the labels $y$ are generated by an unknown halfspace
corrupted with Massart noise at noise rate $\eta<1/2$. The goal is to find
a hypothesis $h$ that minimizes the misclassification error $\Pr_{(\mathbf{x}, y) \sim \mathcal{D}} \left[ h(\mathbf{x}) \neq y \right]$.
We give a $\mathrm{poly}\left(d, 1/\epsilon\right)$ time algorithm for this problem with misclassification error $\eta+\epsilon$.
We also provide evidence that improving on the error guarantee of our algorithm
might be computationally hard. Prior to our work, no efficient weak (distribution-independent) learner
was known in this model, even for the class of disjunctions. The existence of such an algorithm
for halfspaces (or even disjunctions) has been posed as an open question in various works,
starting with Sloan (1988) and Cohen (1997), and was most recently highlighted in Avrim Blum's FOCS 2003 tutorial.

Outstanding Paper Honorable Mention

[ East Exhibition Hall B + C ] We study the problem of estimating a nonparametric probability distribution under a family of losses called Besov IPMs. This family is quite large, including, for example, $L^p$ distances, total variation distance, and generalizations of both Wasserstein (earth mover's) and Kolmogorov-Smirnov distances. For a wide variety of settings, we provide both lower and upper bounds, identifying precisely how the choice of loss function and assumptions on the data distribution interact to determine the minimax optimal convergence rate. We also show that, in many cases, linear distribution estimates, such as the empirical distribution or kernel density estimator, cannot converge at the optimal rate. These bounds generalize, unify, or improve on several recent and classical results. Moreover, IPMs can be used to formalize a statistical model of generative adversarial networks (GANs). Thus, we show how our results imply bounds on the statistical error of a GAN, showing, for example, that, in many cases, GANs can strictly outperform the best linear estimator.
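For reference, an integral probability metric (IPM) indexed by a class $\mathcal{F}$ of critic functions takes the standard form (the textbook definition, not this paper's specific Besov parameterization):
$$ d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \right| . $$
Taking $\mathcal{F}$ to be the 1-Lipschitz functions recovers the Wasserstein (earth mover's) distance, functions bounded by $1$ recover total variation up to a factor of $2$, and indicators of half-lines recover Kolmogorov-Smirnov; Besov balls interpolate between such classes.
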
Outstanding Paper Honorable Mention

[ East Exhibition Hall B + C ]
Least-mean-squares (LMS) solvers such as Linear, Ridge, and Lasso Regression, SVD, and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations.
We suggest an algorithm that gets a finite set of $n$ $d$-dimensional real vectors and returns a weighted subset of $d+1$ vectors whose sum is \emph{exactly} the same. The proof of Caratheodory's Theorem (1907) computes such a subset in $O(n^2d^2)$ time and is thus not used in practice. Our algorithm computes this subset in $O(nd)$ time, using $O(\log n)$ calls to Caratheodory's construction on small but "smart" subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets.
As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to 100x. Generalization to streaming and distributed (big) data is straightforward.
Extensive experimental results and complete open source code are also provided.
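For context, the classical construction the paper accelerates can be sketched as follows. This is the textbook procedure whose overall cost is the slow $O(n^2 d^2)$ the abstract mentions (repeatedly shifting weights along a null-space direction until one weight vanishes), not the paper's $O(nd)$ booster:

```python
import numpy as np

def caratheodory(points, weights, tol=1e-12):
    """Classical Caratheodory reduction: given n positively weighted points
    in R^d, return indices and weights of at most d + 1 of them with the
    same weighted sum. Each pass finds an affine dependence a
    (sum a_i = 0, sum a_i x_i = 0) and moves weights along it until at
    least one weight hits zero."""
    P = np.array(points, dtype=float)
    u = np.array(weights, dtype=float)
    idx = np.arange(len(P))
    while len(idx) > P.shape[1] + 1:
        # More than d difference vectors in R^d must be linearly dependent.
        A = (P[idx[1:]] - P[idx[0]]).T
        v = np.linalg.svd(A)[2][-1]              # null-space direction of A
        a = np.concatenate(([-v.sum()], v))      # sum(a) = 0, sum a_i x_i = 0
        pos = a > tol
        alpha = np.min(u[idx][pos] / a[pos])     # largest step keeping u >= 0
        u[idx] -= alpha * a                      # drives one weight to ~0
        idx = idx[u[idx] > tol]                  # drop the vanished point(s)
    return idx, u[idx]
```

The paper's contribution is to avoid running this on all $n$ points at once, instead calling it $O(\log n)$ times on small "smart" subsets.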

Outstanding New Directions Paper

[ East Exhibition Hall B + C ]
Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While
it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds:
in practice, these bounds can {\em increase} with the training dataset size. Guided by our observations,
we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot ``explain generalization'', even if we take into account the implicit bias of GD {\em to the fullest extent possible}. More precisely, even if we consider only the set of classifiers output by GD, which have test errors less than some small $\epsilon$ in our settings, we show that applying (two-sided) uniform convergence on this set of classifiers will yield only a vacuous generalization guarantee larger than $1-\epsilon$. Through these findings,
we cast doubt on the power of uniform-convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
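As a reference point, a (two-sided) uniform convergence bound controls the worst-case gap between empirical risk $\widehat{L}_S(h)$ on a training set $S$ of size $m$ and population risk $L_{\mathcal{D}}(h)$ (standard notation, not tied to this paper's constructions):
$$ \Pr_{S \sim \mathcal{D}^m} \left[ \, \sup_{h \in \mathcal{H}} \big| \widehat{L}_S(h) - L_{\mathcal{D}}(h) \big| \le \epsilon_{\mathrm{unif}}(m, \delta) \, \right] \ge 1 - \delta . $$
The abstract's claim is that even after shrinking $\mathcal{H}$ to only the low-test-error classifiers GD actually outputs, the smallest valid $\epsilon_{\mathrm{unif}}$ in their settings still exceeds $1-\epsilon$, i.e. the bound is vacuous.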

Outstanding New Directions Paper Honorable Mention

[ East Exhibition Hall B + C ] We propose a novel deep learning method for local self-supervised representation learning that requires neither labels nor end-to-end backpropagation, but instead exploits the natural order in data. Inspired by the observation that biological neural networks appear to learn without backpropagating a global error signal, we split a deep neural network into a stack of gradient-isolated modules. Each module is trained to maximally preserve the information of its inputs using the InfoNCE bound from Oord et al. [2018]. Despite this greedy training, we demonstrate that each module improves upon the output of its predecessor, and that the representations created by the top module yield highly competitive results on downstream classification tasks in the audio and visual domains. The proposal enables optimizing modules asynchronously, allowing large-scale distributed training of very deep neural networks on unlabelled datasets.
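The InfoNCE objective each module maximizes is, in its standard softmax form, a cross-entropy over one positive and several negative candidates. The NumPy sketch below (function name and score layout are illustrative, not this paper's code) computes it from a batch of critic scores:

```python
import numpy as np

def info_nce(scores):
    """InfoNCE loss (Oord et al., 2018) on a batch of similarity scores:
    scores[i, j] is the critic score between context i and candidate j,
    where j == i is the positive pair and the other columns are negatives.
    Returns the mean cross-entropy of identifying the positive per row."""
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))
```

Minimizing this loss maximizes a lower bound on the mutual information between a module's representation and its temporally nearby inputs, which is what lets each module train without a gradient from the layers above it.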
Outstanding New Directions Paper Honorable Mention

[ East Exhibition Hall B + C ] Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Outstanding Demonstration

[ East Exhibition Hall B + C ] Immersions is a system that sonifies the inner workings of a sound-processing neural network in real time. An optimization procedure generates sounds that activate certain regions in the network. This renders audible the way music sounds to this artificial ear. In addition, the mechanisms are also visualized, creating the experience of immersing oneself into the depths of an artificial intelligence.
Best Demonstration Honorable Mention

[ East Exhibition Hall B + C ] The deployment of learning algorithms on autonomous vehicles is expensive, slow, and potentially unsafe. We present the F1/10 platform, a low-cost, open-source 1/10th-scale racecar and validated simulator which enables safe and rapid experimentation suitable for laboratory research settings. The F1/10 environment and hardware offer an accessible way to validate reinforcement learning policies learned in simulation and on a real car via the same intuitive OpenAI Gym API. To this effect, we demonstrate two approaches based on reinforcement learning that enable the transfer of policies from simulation to a real-life track. The first demonstration details the use of our modular photorealistic simulator to learn an end-to-end racing policy in a self-supervised manner. Here, we demonstrate the F1/10’s capability for distributed online learning by sending batches of ‘experiences’ (video streams, odometry, etc.) back to a server, which asynchronously trains on this data and updates the racecar’s network weights on the fly. We also show a way to take quasi-deterministic ‘steps’ and perform ‘resets’ on the real car, thereby more closely mimicking the Gym API standards. The second demonstration uses our lightweight physics simulator to perform a joint optimization over a parameterized description of the racing trajectory, planning algorithms, and car …
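The "same intuitive OpenAI Gym API" amounts to driving both the simulator and the real car through the classic reset()/step() surface. The sketch below uses a dummy stand-in environment; the class name, observation fields, reward, and episode length are all illustrative, not the actual F1/10 package:

```python
class RacecarEnvAPI:
    """Minimal Gym-style interface sketch: both the simulator and the real
    car would expose the same reset()/step() surface, so the rollout code
    below is identical for either backend."""

    def reset(self):
        self.t = 0
        return {"image": None, "odometry": (0.0, 0.0)}  # initial observation

    def step(self, action):
        self.t += 1
        obs = {"image": None, "odometry": (float(self.t), 0.0)}
        reward = 1.0              # e.g. progress along the track
        done = self.t >= 10       # episode length is arbitrary here
        return obs, reward, done, {}

def rollout(env, policy):
    """Runs one episode; the same loop works against sim or hardware."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

Because the hardware also supports quasi-deterministic 'steps' and 'resets', policies and training loops written against this loop transfer between the two backends without code changes.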