Test of Time Award

[ West Exhibition Hall C + B3 ] 
Outstanding Paper

[ East Exhibition Hall B + C ]
We study the problem of {\em distribution-independent} PAC learning of halfspaces in the presence of Massart noise.
Specifically, we are given a set of labeled examples $(\mathbf{x}, y)$ drawn
from a distribution $\mathcal{D}$ on $\mathbb{R}^{d+1}$ such that the marginal distribution
on the unlabeled points $\mathbf{x}$ is arbitrary and the labels $y$ are generated by an unknown halfspace
corrupted with Massart noise at noise rate $\eta<1/2$. The goal is to find
a hypothesis $h$ that minimizes the misclassification error $\Pr_{(\mathbf{x}, y) \sim \mathcal{D}} \left[ h(\mathbf{x}) \neq y \right]$.
We give a $\mathrm{poly}\left(d, 1/\epsilon\right)$ time algorithm for this problem with misclassification error $\eta+\epsilon$.
We also provide evidence that improving on the error guarantee of our algorithm
might be computationally hard. Prior to our work, no efficient weak (distribution-independent) learner
was known in this model, even for the class of disjunctions. The existence of such an algorithm
for halfspaces (or even disjunctions) has been posed as an open question in various works,
starting with Sloan (1988) and Cohen (1997), and was most recently highlighted in Avrim Blum's FOCS 2003 tutorial.

Outstanding Paper Honorable Mention

[ East Exhibition Hall B + C ] We study the problem of estimating a nonparametric probability distribution under a family of losses called Besov IPMs. This family is quite large, including, for example, $L^p$ distances, total variation distance, and generalizations of both Wasserstein (earth mover's) and Kolmogorov-Smirnov distances. For a wide variety of settings, we provide both lower and upper bounds, identifying precisely how the choice of loss function and assumptions on the data distribution interact to determine the minimax optimal convergence rate. We also show that, in many cases, linear distribution estimates, such as the empirical distribution or kernel density estimator, cannot converge at the optimal rate. These bounds generalize, unify, or improve on several recent and classical results. Moreover, IPMs can be used to formalize a statistical model of generative adversarial networks (GANs). Thus, we show how our results imply bounds on the statistical error of a GAN, showing, for example, that, in many cases, GANs can strictly outperform the best linear estimator.
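For reference, an integral probability metric (IPM) indexed by a class $\mathcal{F}$ of critic functions takes the standard form (the textbook definition, not this paper's specific Besov parameterization):
$$ d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \right| . $$
Taking $\mathcal{F}$ to be the 1-Lipschitz functions recovers the Wasserstein (earth mover's) distance, functions bounded by $1$ recover total variation up to a factor of $2$, and indicators of half-lines recover Kolmogorov-Smirnov; Besov balls interpolate between such classes.
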
Outstanding Paper Honorable Mention

[ East Exhibition Hall B + C ]
Least-mean-squares (LMS) solvers such as Linear, Ridge, and Lasso Regression, SVD, and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations.
We suggest an algorithm that gets a finite set of $n$ $d$-dimensional real vectors and returns a weighted subset of $d+1$ vectors whose sum is \emph{exactly} the same. The proof of Caratheodory's Theorem (1907) computes such a subset in $O(n^2d^2)$ time and is thus not used in practice. Our algorithm computes this subset in $O(nd)$ time, using $O(\log n)$ calls to Caratheodory's construction on small but "smart" subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets.
As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to 100x. Generalization to streaming and distributed (big) data is straightforward.
Extensive experimental results and complete open source code are also provided.
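For context, the classical construction the paper accelerates can be sketched as follows. This is the textbook procedure whose overall cost is the slow $O(n^2 d^2)$ the abstract mentions (repeatedly shifting weights along a null-space direction until one weight vanishes), not the paper's $O(nd)$ booster:

```python
import numpy as np

def caratheodory(points, weights, tol=1e-12):
    """Classical Caratheodory reduction: given n positively weighted points
    in R^d, return indices and weights of at most d + 1 of them with the
    same weighted sum. Each pass finds an affine dependence a
    (sum a_i = 0, sum a_i x_i = 0) and moves weights along it until at
    least one weight hits zero."""
    P = np.array(points, dtype=float)
    u = np.array(weights, dtype=float)
    idx = np.arange(len(P))
    while len(idx) > P.shape[1] + 1:
        # More than d difference vectors in R^d must be linearly dependent.
        A = (P[idx[1:]] - P[idx[0]]).T
        v = np.linalg.svd(A)[2][-1]              # null-space direction of A
        a = np.concatenate(([-v.sum()], v))      # sum(a) = 0, sum a_i x_i = 0
        pos = a > tol
        alpha = np.min(u[idx][pos] / a[pos])     # largest step keeping u >= 0
        u[idx] -= alpha * a                      # drives one weight to ~0
        idx = idx[u[idx] > tol]                  # drop the vanished point(s)
    return idx, u[idx]
```

The paper's contribution is to avoid running this on all $n$ points at once, instead calling it $O(\log n)$ times on small "smart" subsets.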

Outstanding New Directions Paper

[ East Exhibition Hall B + C ]
Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While
it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds:
in practice, these bounds can {\em increase} with the training dataset size. Guided by our observations,
we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot ``explain generalization'', even if we take into account the implicit bias of GD {\em to the fullest extent possible}. More precisely, even if we consider only the set of classifiers output by GD, which have test errors less than some small $\epsilon$ in our settings, we show that applying (two-sided) uniform convergence on this set of classifiers will yield only a vacuous generalization guarantee larger than $1-\epsilon$. Through these findings,
we cast doubt on the power of uniform-convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
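As a reference point, a (two-sided) uniform convergence bound controls the worst-case gap between empirical risk $\widehat{L}_S(h)$ on a training set $S$ of size $m$ and population risk $L_{\mathcal{D}}(h)$ (standard notation, not tied to this paper's constructions):
$$ \Pr_{S \sim \mathcal{D}^m} \left[ \, \sup_{h \in \mathcal{H}} \big| \widehat{L}_S(h) - L_{\mathcal{D}}(h) \big| \le \epsilon_{\mathrm{unif}}(m, \delta) \, \right] \ge 1 - \delta . $$
The abstract's claim is that even after shrinking $\mathcal{H}$ to only the low-test-error classifiers GD actually outputs, the smallest valid $\epsilon_{\mathrm{unif}}$ in their settings still exceeds $1-\epsilon$, i.e. the bound is vacuous.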

Outstanding New Directions Paper Honorable Mention

[ East Exhibition Hall B + C ] We propose a novel deep learning method for local self-supervised representation learning that requires neither labels nor end-to-end backpropagation, but instead exploits the natural order in data. Inspired by the observation that biological neural networks appear to learn without backpropagating a global error signal, we split a deep neural network into a stack of gradient-isolated modules. Each module is trained to maximally preserve the information of its inputs using the InfoNCE bound from Oord et al. [2018]. Despite this greedy training, we demonstrate that each module improves upon the output of its predecessor, and that the representations created by the top module yield highly competitive results on downstream classification tasks in the audio and visual domains. The proposal enables optimizing modules asynchronously, allowing large-scale distributed training of very deep neural networks on unlabelled datasets.
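The InfoNCE objective each module maximizes is, in its standard softmax form, a cross-entropy over one positive and several negative candidates. The NumPy sketch below (function name and score layout are illustrative, not this paper's code) computes it from a batch of critic scores:

```python
import numpy as np

def info_nce(scores):
    """InfoNCE loss (Oord et al., 2018) on a batch of similarity scores:
    scores[i, j] is the critic score between context i and candidate j,
    where j == i is the positive pair and the other columns are negatives.
    Returns the mean cross-entropy of identifying the positive per row."""
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))
```

Minimizing this loss maximizes a lower bound on the mutual information between a module's representation and its temporally nearby inputs, which is what lets each module train without a gradient from the layers above it.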
Outstanding New Directions Paper Honorable Mention

[ East Exhibition Hall B + C ] Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Outstanding Demonstration

[ East Exhibition Hall B + C ] Immersions is a system that sonifies the inner workings of a sound-processing neural network in real time. An optimization procedure generates sounds that activate certain regions in the network. This renders audible the way music sounds to this artificial ear. In addition, the mechanisms are also visualized, creating the experience of immersing oneself into the depths of an artificial intelligence.
Best Demonstration Honorable Mention

[ East Exhibition Hall B + C ] The deployment of learning algorithms on autonomous vehicles is expensive, slow, and potentially unsafe. We present the F1/10 platform, a low-cost, open-source 1/10th-scale racecar and validated simulator which enables safe and rapid experimentation suitable for laboratory research settings. The F1/10 environment and hardware offer an accessible way to validate reinforcement learning policies learned in simulation and on a real car via the same intuitive OpenAI Gym API. To this effect, we demonstrate two approaches based on reinforcement learning that enable the transfer of policies from simulation to a real-life track. The first demonstration details the use of our modular photorealistic simulator to learn an end-to-end racing policy in a self-supervised manner. Here, we demonstrate the F1/10’s capability for distributed online learning by sending batches of ‘experiences’ (video streams, odometry, etc.) back to a server, which asynchronously trains on this data and updates the racecar’s network weights on the fly. We also show a way to take quasi-deterministic ‘steps’ and perform ‘resets’ on the real car, thereby more closely mimicking the Gym API standards. The second demonstration uses our lightweight physics simulator to perform a joint optimization over a parameterized description of the racing trajectory, planning algorithms, and car …
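The "same intuitive OpenAI Gym API" amounts to driving both the simulator and the real car through the classic reset()/step() surface. The sketch below uses a dummy stand-in environment; the class name, observation fields, reward, and episode length are all illustrative, not the actual F1/10 package:

```python
class RacecarEnvAPI:
    """Minimal Gym-style interface sketch: both the simulator and the real
    car would expose the same reset()/step() surface, so the rollout code
    below is identical for either backend."""

    def reset(self):
        self.t = 0
        return {"image": None, "odometry": (0.0, 0.0)}  # initial observation

    def step(self, action):
        self.t += 1
        obs = {"image": None, "odometry": (float(self.t), 0.0)}
        reward = 1.0              # e.g. progress along the track
        done = self.t >= 10       # episode length is arbitrary here
        return obs, reward, done, {}

def rollout(env, policy):
    """Runs one episode; the same loop works against sim or hardware."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

Because the hardware also supports quasi-deterministic 'steps' and 'resets', policies and training loops written against this loop transfer between the two backends without code changes.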