Modern Nonparametric Methods in Machine Learning
Sivaraman Balakrishnan · Arthur Gretton · Mladen Kolar · John Lafferty · Han Liu · Tong Zhang

Fri Dec 7th 07:30 AM -- 06:30 PM @ Sand Harbor 3, Harrah’s Special Events Center 2nd Floor

The objective of this workshop is to bring together practitioners and theoreticians interested in developing scalable and principled nonparametric learning algorithms for analyzing complex, large-scale datasets. The workshop will communicate the newest research results and attack several important bottlenecks of nonparametric learning by exploring (i) new models and methods that enable high-dimensional nonparametric learning, (ii) new computational techniques that enable scalable nonparametric learning in online and parallel fashion, and (iii) new statistical theory that characterizes the performance and information-theoretic limits of nonparametric learning algorithms. The goals of this workshop are (i) to report the state of the art in modern nonparametrics, (ii) to identify major challenges and map out the frontiers of nonparametric methods, and (iii) to connect currently disjoint communities in machine learning and statistics. Targeted application areas include genomics, cognitive neuroscience, climate science, astrophysics, and natural language processing.

Modern data acquisition routinely produces massive and complex datasets, including chip data from high-throughput genomic experiments, image data from functional Magnetic Resonance Imaging (fMRI), proteomic data from tandem mass spectrometry analysis, and climate data from geographically distributed data centers. Existing high-dimensional theories and learning algorithms rely heavily on parametric models, which assume the data come from an underlying distribution (e.g., a Gaussian or linear model) that can be characterized by a finite number of parameters. If these assumptions are correct, accurate and precise estimates can be expected. However, given the increasing complexity of modern scientific datasets, conclusions inferred under these restrictive assumptions can be misleading. To address this challenge, this workshop focuses on nonparametric methods, which conduct inference directly in infinite-dimensional spaces and are thus powerful enough to capture the subtleties of most modern applications.

We are targeting submissions in a variety of areas. Potential topics include, but are not limited to, the following areas where high dimensional nonparametric methods have found past success:
1. Nonparametric graphical models are a flexible way to model continuous distributions. For example, copulas can be used to separate the dependency structure between random variables from their marginal distributions (Liu et al., 2009). Fully nonparametric models of networks can be obtained by using kernel density estimation and restricting the graphs to trees and forests (Liu et al., 2011).
2. Causal inference using kernel-based conditional independence testing is a nonparametric method that substantially improves over previous approaches for estimating or testing conditional independence (Zhang et al., 2012).
3. Sparse additive models are used in many applications where linear regression models do not provide enough flexibility (Lin and Zhang, 2006; Koltchinskii and Yuan, 2010; Huang et al., 2010; Ravikumar et al., 2009; Meier et al., 2009).
4. Nonparametric methods can consistently estimate a large class of divergence measures, which have a wide range of applications (Póczos and Schneider, 2012).
5. Recently, sparse matrix decompositions (Witten et al., 2009) were proposed as exploratory data analysis tools for high-dimensional genomic data. Motivated by the need for additional modeling flexibility, sparse nonparametric generalizations of these matrix decompositions have been introduced (Balakrishnan et al., 2012).
6. Nonparametric learning promises flexibility: it minimizes assumptions such as linearity and Gaussianity, which are often made only for convenience or for lack of alternatives. However, nonparametric estimation often comes with increased computational demands. To develop algorithms applicable to large-scale data, we need to take advantage of parallel computation. Promising parallel computing techniques include GPU programming, multi-core computing, and cloud computing.
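To make topic 1 concrete, the copula idea can be sketched in a few lines. This is a simplified rank-based illustration, not the full nonparanormal estimator of Liu et al. (2009): each marginal is pushed through its empirical CDF and the standard normal quantile function, and the dependence structure is then read off from the correlation of the transformed variables.

```python
import numpy as np
from scipy.stats import norm, rankdata


def nonparanormal_corr(X):
    """Rank-based estimate of a latent Gaussian correlation matrix.

    Each column of X (one variable per column) is mapped through its
    empirical CDF and then the standard normal quantile function,
    which separates the dependence structure from the (arbitrary
    monotone) marginal distributions.
    """
    n = X.shape[0]
    # Gaussianized ranks; dividing by n + 1 keeps quantiles finite
    Z = norm.ppf(rankdata(X, axis=0) / (n + 1.0))
    return np.corrcoef(Z, rowvar=False)
```

On data generated as monotone transforms of a correlated Gaussian (e.g., lognormal marginals), this recovers the latent correlation even though the observed marginals are far from normal, which is exactly the separation of dependence from marginals that the copula viewpoint provides.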
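Topic 4 can likewise be illustrated with a minimal plug-in sketch. This is deliberately simpler than the estimators of Póczos and Schneider, which avoid explicit density estimation; here we fit kernel density estimates to both samples and average the log density ratio over the first sample.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kl_plugin(x_sample, y_sample):
    """Plug-in estimate of KL(P || Q) from one-dimensional samples.

    Fits Gaussian kernel density estimates to samples from P and Q,
    then averages log(p_hat / q_hat) over the sample from P.
    """
    p_hat = gaussian_kde(x_sample)
    q_hat = gaussian_kde(y_sample)
    return float(np.mean(np.log(p_hat(x_sample) / q_hat(x_sample))))
```

For samples from N(0, 1) and N(1, 1) the true KL divergence is 0.5, and the plug-in estimate lands near that value for moderate sample sizes. Plug-in estimators of this kind are biased and degrade quickly in higher dimensions, which is precisely what motivates the corrected nonparametric divergence estimators discussed above.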

[1] Francis R. Bach and Michael I. Jordan. Kernel independent component analysis. JMLR, 3:1–48, March 2003.
[2] Sivaraman Balakrishnan, Kriti Puniyani, and John Lafferty. Sparse additive functional and kernel CCA. ICML, 2012.
[3] Jian Huang, Joel L. Horowitz, and Fengrong Wei. Variable selection in nonparametric additive models. Ann. Statist., 2010.
[4] Vladimir Koltchinskii and Ming Yuan. Sparsity in multiple kernel learning. Ann. Statist., 2010.
[5] John Lafferty, Han Liu, and Larry Wasserman. Sparse nonparametric graphical models. arXiv:1201.0794v1, 2012.
[6] Yi Lin and Hao Helen Zhang. Component selection and smoothing in multivariate nonparametric regression. Ann. Statist., 2006.
[7] Han Liu, John D. Lafferty, and Larry A. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. JMLR, 2009.
[8] Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John D. Lafferty, and Larry A. Wasserman. Forest density estimation. JMLR, 2011.
[9] Lukas Meier, Sara van de Geer, and Peter Bühlmann. High-dimensional additive modeling. Ann. Statist., 2009.
[10] B. Poczos and J. Schneider. Nonparametric estimation of conditional information and divergences. AISTATS, 2012.
[11] Garvesh Raskutti, Martin Wainwright, and Bin Yu. Minimax-optimal rates for sparse additive models over kernel classes via convex programming. JMLR, 2010.
[12] Pradeep Ravikumar, John Lafferty, Han Liu, and Larry Wasserman. Sparse additive models. JRSSB (Statistical Methodology), 2009.
[13] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 1998.
[14] Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Kernel-based conditional independence test and application in causal discovery. CoRR, abs/1202.3775, 2012.
[15] Daniela M. Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 2009.


Author Information

Sivaraman Balakrishnan (CMU)
Arthur Gretton (Gatsby Unit, UCL)

Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit at UCL. He received degrees in Physics and Systems Engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He previously worked at the MPI for Biological Cybernetics, and at the Machine Learning Department, Carnegie Mellon University. Arthur's recent research interests in machine learning include the design and training of generative models, both implicit (e.g. GANs) and explicit (high/infinite dimensional exponential family models), nonparametric hypothesis testing, and kernel methods. He was an associate editor at IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013, has been an Action Editor for JMLR since April 2013, was an Area Chair for NeurIPS in 2008 and 2009, a Senior Area Chair for NeurIPS in 2018, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013. Arthur was program chair for AISTATS in 2016 (with Christian Robert), tutorials chair for ICML 2018 (with Ruslan Salakhutdinov), workshops chair for ICML 2019 (with Honglak Lee), program chair for the Dali workshop in 2019 (with Krikamol Muandet and Shakir Mohamed), and co-organiser of the Machine Learning Summer School 2019 in London (with Marc Deisenroth).

Mladen Kolar (University of Chicago)
John Lafferty (Yale University)
Han Liu (Tencent AI Lab)
Tong Zhang (Tencent)
