Poster
Hypothesis Selection with Memory Constraints
Maryam Aliakbarpour · Mark Bun · Adam Smith
Great Hall & Hall B1+B2 (level 1) #2027
Abstract:
Hypothesis selection is a fundamental problem in learning theory and statistics. Given a dataset and a finite set of candidate distributions, the goal is to select a distribution that matches the data as well as possible. More specifically, suppose we have sample access to an unknown distribution $P$ over a domain $\mathcal{X}$ that we know is well-approximated by one of a class of $n$ distributions (a.k.a. hypotheses), $\mathcal{H} = \{H_1, H_2, \ldots, H_n\}$. The goal is to design an algorithm that outputs a distribution $\hat{H} \in \mathcal{H}$ whose total variation distance from $P$ is nearly minimal.

In this work, we study the hypothesis selection problem under memory constraints. We consider a model where samples from $P$ are presented in a stream and we access each sample $x$ via "PDF-comparison" queries that allow us to compare the probability densities of any pair of hypotheses at the domain point $x$ (i.e., is $H_i(x) < H_j(x)$?). This model allows us to study how much memory is needed at any point in time to store information about the portion of the stream seen so far.

Our main result is an algorithm that achieves a nearly optimal tradeoff between memory usage and the number of samples required. In particular, given $b$ bits of memory (for $b$ roughly between $\log n$ and $n$), our algorithm solves the hypothesis selection problem with $s$ samples, where $b \cdot s = O(n \log n)$. This result is optimal up to an $O(\log n)$ factor, for all $b$.
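To make the PDF-comparison query model concrete, here is a minimal sketch of the classical Scheffé tournament, a standard baseline for hypothesis selection (not the memory-efficient streaming algorithm of this paper), written so that every sample is touched only through a `pdf_compare` query. The finite domain, the Dirichlet-generated hypotheses, and the sample count are illustrative assumptions.

```python
# Illustrative sketch: Scheffe tournament for hypothesis selection,
# with samples accessed only via PDF-comparison queries.
import numpy as np

rng = np.random.default_rng(0)

def pdf_compare(H, i, j, x):
    """The model's only access to a sample x: is H_i(x) < H_j(x)?"""
    return H[i][x] < H[j][x]

def scheffe_winner(H, i, j, samples):
    """Scheffe test between hypotheses i and j.
    Over the Scheffe set S = {x : H_i(x) > H_j(x)}, each candidate
    predicts a probability mass; the one whose prediction is closer
    to the empirical mass of S wins."""
    S = H[i] > H[j]  # Scheffe set; computable since the hypotheses are known
    # Empirical mass of S, estimated using only PDF-comparison queries:
    # x is in S exactly when H_j(x) < H_i(x).
    emp = np.mean([pdf_compare(H, j, i, x) for x in samples])
    mass_i, mass_j = H[i][S].sum(), H[j][S].sum()
    return i if abs(mass_i - emp) <= abs(mass_j - emp) else j

def select_hypothesis(H, samples):
    """Round-robin tournament: output the hypothesis with the most wins.
    Storing all n win counts (and reusing all samples) places this baseline
    at the memory-heavy extreme of the tradeoff studied in the paper."""
    n = len(H)
    wins = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            wins[scheffe_winner(H, i, j, samples)] += 1
    return int(np.argmax(wins))

# Toy usage: domain of size 10, three candidates, P close to H_1.
m = 10
H = [rng.dirichlet(np.ones(m)) for _ in range(3)]
P = 0.9 * H[1] + 0.1 * rng.dirichlet(np.ones(m))
samples = rng.choice(m, size=2000, p=P)
print("selected hypothesis:", select_hypothesis(H, samples))
```

This baseline uses the stream only through comparisons of hypothesis densities at sample points, which is exactly the access pattern the abstract's model restricts; the paper's contribution is achieving the same task with far less working memory at the cost of more samples.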