
Workshop
Adaptive and Scalable Nonparametric Methods in Machine Learning
Aaditya Ramdas · Arthur Gretton · Bharath Sriperumbudur · Han Liu · John Lafferty · Samory Kpotufe · Zoltán Szabó

Fri Dec 09 11:00 PM -- 09:30 AM (PST) @ Room 120 + 121

Large amounts of high-dimensional data are routinely acquired in scientific fields ranging from biology, genomics and health sciences to astronomy and economics, thanks to improvements in engineering and data acquisition techniques. Nonparametric methods allow for better modelling of the complex systems underlying data generating processes than the traditionally used linear and parametric models. From a statistical point of view, scientists have enough data to reliably fit nonparametric models. From a computational point of view, however, nonparametric methods often do not scale well to big data problems.

The aim of this workshop is to bring together practitioners, who are interested in developing and applying nonparametric methods in their domains, and theoreticians, who are interested in providing sound methodology. We hope to communicate advances in the development of computational tools for fitting nonparametric models, and to discuss the open challenges that currently prevent nonparametric methods from being applied to big data problems.

We encourage submissions on a variety of topics, including but not limited to:
- Randomized procedures for fitting nonparametric models. For example, sketching, random projections, core set selection, etc.
- Nonparametric probabilistic graphical models
- Scalable nonparametric methods
- Multiple kernel learning
- Random feature expansion
- Novel applications of nonparametric methods
- Bayesian nonparametric methods
- Nonparametric network models
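One of the listed topics, random feature expansion, can be illustrated compactly: random Fourier features approximate a shift-invariant kernel by an explicit finite-dimensional map, so kernel methods can scale linearly in the number of samples. A minimal sketch in Python (NumPy only; the function name and parameters are illustrative and not taken from any submission):

```python
import numpy as np

def rff_features(X, num_features=200, gamma=1.0, seed=0):
    """Map X (n, d) to random Fourier features whose inner products
    approximate the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Frequencies drawn from the Fourier transform of the Gaussian kernel.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, num_features))
    b = rng.uniform(0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Inner products of the features track the exact kernel matrix:
X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X, num_features=5000, gamma=0.5)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

The approximation error decays at rate 1/sqrt(num_features), so a modest number of features often suffices in practice.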

This workshop is the fourth in a series of NIPS workshops on modern nonparametric methods in machine learning. Previous workshops focused on time/accuracy tradeoffs, high dimensionality and dimension reduction strategies, and automating the learning pipeline.

Fri 11:30 p.m. - 12:00 a.m.
Richard Samworth. Adaptation in log-concave density estimation (Invited talk)
The log-concave maximum likelihood estimator of a density on the real line based on a sample of size $n$ is known to attain the minimax optimal rate of convergence of $O(n^{-4/5})$ with respect to, e.g., squared Hellinger distance. In this talk, we show that it also enjoys attractive adaptation properties, in the sense that it achieves a faster rate of convergence when the logarithm of the true density is $k$-affine (i.e. made up of $k$ affine pieces), provided $k$ is not too large. Our results use two different techniques: the first relies on a new Marshall's inequality for log-concave density estimation, and reveals that when the true density is close to log-linear on its support, the log-concave maximum likelihood estimator can achieve the parametric rate of convergence in total variation distance. Our second approach depends on local bracketing entropy methods, and allows us to prove a sharp oracle inequality, which implies in particular that the rate of convergence with respect to various global loss functions, including Kullback--Leibler divergence, is $O(kn^{-1} \log^{5/4} n)$ when the true density is log-concave and its logarithm is close to $k$-affine.

Sat 12:00 a.m. - 12:30 a.m.
Ming Yuan. Functional nuclear norm and low rank function estimation (Invited talk)
The problem of low rank estimation naturally arises in a number of functional or relational data analysis settings, for example when dealing with spatio-temporal data or link prediction with attributes. We consider a unified framework for these problems and devise a novel penalty function to exploit the low rank structure in such contexts. The resulting empirical risk minimization estimator can be shown to be optimal under fairly general conditions.

Sat 12:30 a.m. - 1:00 a.m.
Mladen Kolar. Post-Regularization Inference for Dynamic Nonparanormal Graphical Models (Invited talk)
We propose a novel class of dynamic nonparanormal graphical models, which allows us to model high-dimensional heavy-tailed systems and the evolution of their latent network structures. Under this model we develop statistical tests for the presence of edges, both locally at a fixed index value and globally over a range of values. The tests are developed for a high-dimensional regime, are robust to model selection mistakes, and do not require the commonly assumed minimum signal strength. The testing procedures are based on a high-dimensional, debiasing-free moment estimator, which uses a novel kernel-smoothed Kendall's tau correlation matrix as an input statistic. The estimator consistently estimates the latent inverse Pearson correlation matrix uniformly in both the index variable and the kernel bandwidth. Its rate of convergence is shown to be minimax optimal. Thorough numerical simulations and an application to a neural imaging dataset support the usefulness of our method. Joint work with Junwei Lu and Han Liu.

Sat 2:00 a.m. - 2:20 a.m.
Debarghya Ghoshdastidar, Ulrike von Luxburg. Do Nonparametric Two-sample Tests work for Small Sample Size? A Study on Random Graphs (Contributed talk)
We consider the problem of two-sample hypothesis testing for inhomogeneous unweighted random graphs, where one has access to only a very small number of samples from each model. Standard tests cannot be guaranteed to perform well in this setting due to the small sample size. We present a nonparametric test based on a comparison of the adjacency matrices of the graphs, and prove that the test is consistent for increasing sample size as well as when the graph size increases with the sample size held fixed. Numerical simulations exhibit the practical significance of the test.

Sat 2:20 a.m. - 2:40 a.m.
Diana Cai, Trevor Campbell, Tamara Broderick. Paintboxes and probability functions for edge-exchangeable graphs (Contributed talk)

Sat 2:40 a.m. - 3:00 a.m.
Alessandro Rudi, Raffaello Camoriano, Lorenzo Rosasco. Generalization Properties of Learning with Random Features (Contributed talk)
We study the generalization properties of regularized learning with random features in the statistical learning theory framework. We show that optimal learning errors can be achieved with a number of features smaller than the number of examples.

Sat 3:00 a.m. - 3:20 a.m.
Makoto Yamada, Yuta Umezu, Kenji Fukumizu, Ichiro Takeuchi. Post Selection Inference with Kernels (Contributed talk)
We propose a novel kernel-based post selection inference (PSI) algorithm, which can handle not only non-linearity in data but also structured outputs such as multi-dimensional and multi-label outputs. Specifically, we develop a PSI algorithm for independence measures, and propose a Hilbert-Schmidt Independence Criterion (HSIC) based PSI algorithm (hsicInf). We apply the hsicInf algorithm to real-world data, and show that hsicInf can successfully identify important features.

Sat 3:20 a.m. - 3:40 a.m.
Yunpeng Pan, Xinyan Yan, Evangelos Theodorou, Byron Boots. Solving the Linear Bellman Equation via Kernel Embeddings and Stochastic Gradient Descent (Contributed talk)
We introduce a data-efficient approach for solving the linear Bellman equation, which corresponds to a class of Markov decision processes (MDPs) and stochastic optimal control (SOC) problems. We show that this class of control problems can be reformulated as a stochastic composition optimization problem, which can in turn be reformulated as a saddle point problem and solved via dual kernel embeddings. Our method is model-free and uses only one sample per state transition from stochastic dynamical systems. Unlike related work such as Z-learning, which is based on temporal-difference learning, our method is an online algorithm that exploits stochastic optimization. Numerical results are provided, showing that our method outperforms the Z-learning algorithm.

Sat 3:40 a.m. - 5:30 a.m.
Lunch break

Sat 5:30 a.m. - 6:00 a.m.
Francis Bach. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression (Invited talk)
We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean finite-variance random error. We present the first algorithm that jointly achieves the optimal prediction error rates for least-squares regression, both in terms of forgetting of initial conditions in $O(1/n^2)$, and in terms of dependence on the noise and dimension $d$ of the problem, as $O(d/n)$. Our new algorithm is based on averaged accelerated regularized gradient descent, and may also be analyzed through finer assumptions on the initial conditions and the Hessian matrix, leading to dimension-free quantities that may still be small while the "optimal" terms above are large. In order to characterize the tightness of these new bounds, we consider an application to non-parametric regression and use the known lower bounds on the statistical performance (without computational limits), which happen to match our bounds obtained from a single pass over the data and thus show the optimality of our algorithm in a wide variety of particular trade-offs between bias and variance. Joint work with Aymeric Dieuleveut and Nicolas Flammarion.

Sat 6:00 a.m. - 6:30 a.m.
Richard (Fangjian) Guo. Boosting Variational Inference (Invited talk)
Modern Bayesian inference typically requires some form of posterior approximation, and mean-field variational inference (MFVI) is an increasingly popular choice due to its speed. But MFVI is inaccurate in several respects, including an inability to capture multimodality in the posterior and underestimation of the posterior covariance. These issues arise because MFVI considers approximations to the posterior only within a family of factorized parametric distributions. We instead consider a much more flexible approximating family consisting of all possible mixtures of a parametric base distribution (e.g., Gaussians), without constraining the number of mixture components. In order to efficiently find a high-quality posterior approximation within this family, we borrow ideas from gradient boosting and propose the boosting variational inference (BVI) method, which iteratively improves the current approximation by mixing it with a new component from the base distribution family. We develop practical algorithms for BVI and demonstrate their performance on both real and simulated data. Joint work with Xiangyu Wang, Kai Fan, Tamara Broderick and David Dunson.

Sat 6:30 a.m. - 6:45 a.m.
Break

Sat 6:45 a.m. - 7:15 a.m.
Olga Klopp. Network models and sparse graphon estimation (Invited talk)
Inhomogeneous random graph models encompass many network models, such as stochastic block models and latent position models. We consider the problem of statistical estimation of the matrix of connection probabilities based on observations of the adjacency matrix of the network, and derive optimal rates of convergence for this problem. Our results cover the important setting of sparse networks. We also establish upper bounds on the minimax risk for graphon estimation when the probability matrix is sampled according to a graphon model.

Sat 7:15 a.m. - 7:45 a.m.
Emily Fox. Sparse Graphs via Exchangeable Random Measures (Invited talk)
Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix. Assuming exchangeability of this array, the Aldous-Hoover theorem tells us that the graph is necessarily either dense or empty. We instead consider representing the graph as a point process on the positive quadrant. We then propose a graph construction leveraging completely random measures (CRMs) that leads to an exchangeable point process representation of graphs ranging from dense to sparse and exhibiting power-law degree distributions. We show how these properties are simply tuned by three hyperparameters. The resulting model lends itself to an efficient MCMC scheme from which we can infer these network attributes. We demonstrate our methods on a series of real-world networks with up to hundreds of thousands of nodes and millions of edges. We also discuss some recent advances in this area and open challenges. Joint work with Francois Caron.

Sat 7:45 a.m. - 9:00 a.m.
Coffee break + posters
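Several of the talks above (two-sample testing, post selection inference, random features) build on kernel-based distance and dependence measures. As a generic illustration, and not the specific procedure of any talk, here is a biased empirical estimate of the squared maximum mean discrepancy (MMD) between two samples with a Gaussian kernel; all names are illustrative:

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    # Pairwise Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2_biased(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between X (n, d), Y (m, d).
    Equals the squared distance between empirical kernel mean embeddings,
    so it is always nonnegative."""
    return (gaussian_kernel(X, X, gamma).mean()
            - 2 * gaussian_kernel(X, Y, gamma).mean()
            + gaussian_kernel(Y, Y, gamma).mean())

rng = np.random.default_rng(0)
# Two samples from the same distribution vs. a mean-shifted alternative.
same = mmd2_biased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_biased(rng.normal(size=(200, 2)),
                   rng.normal(loc=2.0, size=(200, 2)))
```

Under a mean shift the discrepancy `diff` is clearly larger than `same`; a permutation procedure over the pooled sample would turn this statistic into a test with a calibrated threshold.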

#### Author Information

##### Arthur Gretton (Gatsby Unit, UCL)

Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit at UCL. He received degrees in Physics and Systems Engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He previously worked at the MPI for Biological Cybernetics and at the Machine Learning Department of Carnegie Mellon University. Arthur's recent research interests in machine learning include the design and training of generative models, both implicit (e.g. GANs) and explicit (high/infinite-dimensional exponential family models), nonparametric hypothesis testing, and kernel methods. He was an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013, and has been an Action Editor for JMLR since April 2013; he served as an Area Chair for NeurIPS in 2008 and 2009, a Senior Area Chair for NeurIPS in 2018, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013. Arthur was program chair for AISTATS in 2016 (with Christian Robert), tutorials chair for ICML 2018 (with Ruslan Salakhutdinov), workshops chair for ICML 2019 (with Honglak Lee), program chair for the DALI workshop in 2019 (with Krikamol Muandet and Shakir Mohamed), and co-organiser of the Machine Learning Summer School 2019 in London (with Marc Deisenroth).

##### Zoltán Szabó (École Polytechnique)

[Homepage](http://www.cmap.polytechnique.fr/~zoltan.szabo/)