Timezone: »
Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just 2n samples to achieve the performance attained by the empirical estimator with n\sqrt{\log n} samples. This provides off-the-shelf, distribution-independent, ``amplification'' of the amount of data available relative to common-practice estimators.
We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with n samples is even as good as that of the empirical estimator with n\log n samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.
Author Information
Yi Hao (University of California, San Diego)
Fifth-year Ph.D. student supervised by Prof. Alon Orlitsky at UC San Diego. Broadly interested in Machine Learning, Learning Theory, Algorithm Design, Symbolic and Numerical Optimization. Seeking a summer 2020 internship in Data Science and Machine Learning.
Alon Orlitsky (University of California, San Diego)
Ananda Theertha Suresh (Google)
Yihong Wu (Yale University)
More from the Same Authors
-
2020 Poster: Linear-Sample Learning of Low-Rank Distributions »
Ayush Jain · Alon Orlitsky -
2020 Poster: A General Method for Robust Learning from Batches »
Ayush Jain · Alon Orlitsky -
2020 Poster: Optimal Prediction of the Number of Unseen Species with Multiplicity »
Yi Hao · Ping Li -
2020 Poster: SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm »
Yi Hao · Ayush Jain · Alon Orlitsky · Vaishakh Ravindrakumar -
2020 Spotlight: Optimal Prediction of the Number of Unseen Species with Multiplicity »
Yi Hao · Ping Li -
2020 Poster: Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Distributions »
Yi Hao · Alon Orlitsky -
2020 Poster: Learning discrete distributions: user vs item-level privacy »
Yuhan Liu · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Michael D Riley -
2019 Poster: Unified Sample-Optimal Property Estimation in Near-Linear Time »
Yi Hao · Alon Orlitsky -
2019 Poster: The Broad Optimality of Profile Maximum Likelihood »
Yi Hao · Alon Orlitsky -
2019 Spotlight: The Broad Optimality of Profile Maximum Likelihood »
Yi Hao · Alon Orlitsky -
2019 Poster: Sampled Softmax with Random Fourier Features »
Ankit Singh Rawat · Jiecao Chen · Felix Xinnan Yu · Ananda Theertha Suresh · Sanjiv Kumar -
2019 Poster: Differentially Private Anonymized Histograms »
Ananda Theertha Suresh -
2018 Poster: On Learning Markov Chains »
Yi Hao · Alon Orlitsky · Venkatadheeraj Pichapati -
2018 Poster: Entropy Rate Estimation for Markov Chains with Large State Space »
Yanjun Han · Jiantao Jiao · Chuan-Zheng Lee · Tsachy Weissman · Yihong Wu · Tiancheng Yu -
2018 Spotlight: Entropy Rate Estimation for Markov Chains with Large State Space »
Yanjun Han · Jiantao Jiao · Chuan-Zheng Lee · Tsachy Weissman · Yihong Wu · Tiancheng Yu -
2018 Poster: cpSGD: Communication-efficient and differentially-private distributed SGD »
Naman Agarwal · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2018 Spotlight: cpSGD: Communication-efficient and differentially-private distributed SGD »
Naman Agarwal · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2017 Poster: Multiscale Quantization for Fast Similarity Search »
Xiang Wu · Ruiqi Guo · Ananda Theertha Suresh · Sanjiv Kumar · Daniel Holtmann-Rice · David Simcha · Felix Yu -
2017 Poster: The power of absolute discounting: all-dimensional distribution estimation »
Moein Falahatgar · Mesrob Ohannessian · Alon Orlitsky · Venkatadheeraj Pichapati -
2017 Poster: Model-Powered Conditional Independence Test »
Rajat Sen · Ananda Theertha Suresh · Karthikeyan Shanmugam · Alexandros Dimakis · Sanjay Shakkottai -
2017 Poster: Maxing and Ranking with Few Assumptions »
Moein Falahatgar · Yi Hao · Alon Orlitsky · Venkatadheeraj Pichapati · Vaishakh Ravindrakumar -
2016 Poster: Near-Optimal Smoothing of Structured Conditional Probability Matrices »
Moein Falahatgar · Mesrob Ohannessian · Alon Orlitsky -
2015 Poster: Competitive Distribution Estimation: Why is Good-Turing Good »
Alon Orlitsky · Ananda Theertha Suresh -
2015 Oral: Competitive Distribution Estimation: Why is Good-Turing Good »
Alon Orlitsky · Ananda Theertha Suresh -
2014 Poster: Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures »
Ananda Theertha Suresh · Alon Orlitsky · Jayadev Acharya · Ashkan Jafarpour -
2012 Poster: Tight Bounds on Redundancy and Distinguishability of Label-Invariant Distributions »
Jayadev Acharya · Hirakendu Das · Alon Orlitsky