Poster
Robust Model Selection and Nearly-Proper Learning for GMMs
Allen Liu · Jerry Li · Ankur Moitra
In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance? The problem of estimating the number of components, also called model selection, is important in its own right, but there are essentially no known efficient algorithms for it with provable guarantees. In this work, we study the problem of model selection for univariate Gaussian mixture models (GMMs). Given $\textsf{poly}(k/\epsilon)$ samples from a distribution that is $\epsilon$-close in TV distance to a GMM with $k$ components, we can construct a GMM with $\widetilde{O}(k)$ components that approximates the distribution to within $\widetilde{O}(\epsilon)$ in $\textsf{poly}(k/\epsilon)$ time. Thus we can approximately determine, up to a logarithmic factor, the minimum number of components needed to fit the distribution. Moreover, by adapting these techniques we obtain similar results for reconstructing Fourier-sparse signals. Prior to our work, the only known algorithms for learning arbitrary univariate GMMs either output significantly more than $k$ components (e.g. $k/\epsilon^2$ components for kernel density estimates) or run in time exponential in $k$.
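To make the model-selection task concrete, here is a minimal sketch using the standard EM-plus-BIC heuristic from scikit-learn; this is not the paper's algorithm (which comes with provable guarantees), and the particular means, weights, and sample size are illustrative assumptions. The idea is simply to fit candidate univariate GMMs with increasing numbers of components and keep the one with the lowest BIC score.

```python
# Minimal sketch of univariate GMM model selection via EM + BIC (scikit-learn).
# NOT the algorithm from this paper; shown only to illustrate the task of
# estimating the number of components. Ground-truth parameters are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Ground-truth univariate mixture with 3 components (unknown to the learner).
true_means = np.array([-4.0, 0.0, 5.0])
true_stds = np.array([1.0, 0.5, 1.5])
true_weights = np.array([0.3, 0.4, 0.3])

# Draw 5000 samples from the mixture.
comps = rng.choice(len(true_weights), size=5000, p=true_weights)
samples = rng.normal(true_means[comps], true_stds[comps]).reshape(-1, 1)

# Fit GMMs with k = 1..8 components via EM and pick the k minimizing BIC.
candidates = range(1, 9)
fits = [GaussianMixture(n_components=k, random_state=0).fit(samples) for k in candidates]
bics = [m.bic(samples) for m in fits]
best_k = candidates[int(np.argmin(bics))]
print("Estimated number of components:", best_k)
```

Unlike this heuristic, the result above guarantees a fit with $\widetilde{O}(k)$ components to within $\widetilde{O}(\epsilon)$ in TV distance for any distribution that is $\epsilon$-close to a $k$-component GMM.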
Author Information
Allen Liu (MIT)
Jerry Li (Microsoft)
Ankur Moitra (MIT)
More from the Same Authors
- 2021 Spotlight: List-Decodable Mean Estimation in Nearly-PCA Time
  Ilias Diakonikolas · Daniel Kane · Daniel Kongsgaard · Jerry Li · Kevin Tian
- 2022: Semi-Random Sparse Recovery in Nearly-Linear Time
  Jonathan Kelner · Jerry Li · Allen Liu · Aaron Sidford · Kevin Tian
- 2022: Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
  Sitan Chen · Sinho Chewi · Jerry Li · Yuanzhi Li · Adil Salim · Anru Zhang
- 2022: REAP: A Large-Scale Realistic Adversarial Patch Benchmark
  Nabeel Hingun · Chawin Sitawarin · Jerry Li · David Wagner
- 2022: Poster Session 1
  Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li
- 2022 Poster: Polynomial time guarantees for the Burer-Monteiro method
  Diego Cifuentes · Ankur Moitra
- 2022 Poster: Learning in Observable POMDPs, without Computationally Intractable Oracles
  Noah Golowich · Ankur Moitra · Dhruv Rohatgi
- 2022 Poster: Learning (Very) Simple Generative Models Is Hard
  Sitan Chen · Jerry Li · Yuanzhi Li
- 2021 Poster: Margin-Independent Online Multiclass Learning via Convex Geometry
  Guru Guruganesh · Allen Liu · Jon Schneider · Joshua Wang
- 2021 Poster: Robust Regression Revisited: Acceleration and Improved Estimation Rates
  Arun Jambulapati · Jerry Li · Tselil Schramm · Kevin Tian
- 2021 Poster: List-Decodable Mean Estimation in Nearly-PCA Time
  Ilias Diakonikolas · Daniel Kane · Daniel Kongsgaard · Jerry Li · Kevin Tian
- 2021 Poster: A No-go Theorem for Robust Acceleration in the Hyperbolic Plane
  Linus Hamilton · Ankur Moitra
- 2020 Poster: Learning Some Popular Gaussian Graphical Models without Condition Number Bounds
  Jonathan Kelner · Frederic Koehler · Raghu Meka · Ankur Moitra
- 2020 Spotlight: Learning Some Popular Gaussian Graphical Models without Condition Number Bounds
  Jonathan Kelner · Frederic Koehler · Raghu Meka · Ankur Moitra
- 2020 Poster: Tensor Completion Made Practical
  Allen Liu · Ankur Moitra
- 2020 Poster: Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time
  Jerry Li · Guanghao Ye
- 2020 Poster: Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing
  Arun Jambulapati · Jerry Li · Kevin Tian
- 2020 Spotlight: Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing
  Arun Jambulapati · Jerry Li · Kevin Tian
- 2020 Poster: Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization
  Sam Hopkins · Jerry Li · Fred Zhang
- 2020 Poster: Myersonian Regression
  Allen Liu · Renato Leme · Jon Schneider
- 2020 Poster: Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability
  Sitan Chen · Frederic Koehler · Ankur Moitra · Morris Yau
- 2020 Poster: Learning Structured Distributions From Untrusted Batches: Faster and Simpler
  Sitan Chen · Jerry Li · Ankur Moitra
- 2020 Spotlight: Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability
  Sitan Chen · Frederic Koehler · Ankur Moitra · Morris Yau
- 2019 Poster: Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
  Hadi Salman · Jerry Li · Ilya Razenshteyn · Pengchuan Zhang · Huan Zhang · Sebastien Bubeck · Greg Yang
- 2019 Spotlight: Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
  Hadi Salman · Jerry Li · Ilya Razenshteyn · Pengchuan Zhang · Huan Zhang · Sebastien Bubeck · Greg Yang
- 2019 Poster: Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection
  Yihe Dong · Samuel Hopkins · Jerry Li
- 2019 Spotlight: Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection
  Yihe Dong · Samuel Hopkins · Jerry Li
- 2017 Poster: Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications
  Linus Hamilton · Frederic Koehler · Ankur Moitra