Poster
Optimal Hypothesis Selection in (Almost) Linear Time
Maryam Aliakbarpour · Mark Bun · Adam Smith
West Ballroom A-D #5710
Abstract:
Hypothesis selection, also known as density estimation, is a fundamental problem in statistics and learning theory. Suppose we are given a sample set from an unknown distribution P and a finite class of n candidate distributions (called hypotheses) H := {H_1, ..., H_n}. The aim is to design an algorithm that selects a distribution Ĥ in H that best fits the data. The algorithm's accuracy is measured based on the distance between Ĥ and P compared to the distance of the closest distribution in H to P (denoted by OPT). Concretely, we aim for the total variation distance between Ĥ and P to be at most α · OPT + ε for some small ε and α. While it is possible to decrease the value of ε as the number of samples increases, α is an inherent characteristic of the algorithm. In fact, one cannot hope to achieve α < 3 even when there are only two candidate hypotheses, unless the number of samples is proportional to the domain size of P [Bousquet, Kane, Moran '19]. Finding the best α has been one of the main focuses of studies of the problem since the early work of [Devroye, Lugosi '01]. Prior to our work, no algorithm was known that achieves α = 3 in near-linear time.

We provide the first algorithm that operates in almost linear time (Õ(n/ε³) time) and achieves α = 3. This result improves upon a long list of results in hypothesis selection. Previously known algorithms either had worse time complexity, a larger factor α, or extra assumptions about the problem setting. In addition to this algorithm, we provide another (almost) linear-time algorithm with better dependency on the additive accuracy parameter ε, albeit with a slightly worse accuracy parameter, α = 4.
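To make the α · OPT + ε guarantee concrete, the following toy sketch illustrates the setup on a small finite domain. It is not the paper's algorithm: it uses the naive strategy of picking the hypothesis closest in total variation distance to the empirical distribution, which only serves to show what P, H, Ĥ, and OPT mean. The distributions and sample size below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def tv(p, q):
    # Total variation distance between two distributions on a finite domain.
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

# Unknown distribution P and a tiny hypothesis class H (toy setup).
P = np.array([0.5, 0.3, 0.2])
H = [np.array([0.6, 0.2, 0.2]),   # close to P
     np.array([0.1, 0.1, 0.8])]   # far from P

# Draw samples from P and form the empirical distribution.
m = 5000
samples = rng.choice(len(P), size=m, p=P)
emp = np.bincount(samples, minlength=len(P)) / m

# Naive selection: pick the hypothesis closest to the empirical distribution.
best = min(H, key=lambda h: tv(h, emp))

OPT = min(tv(h, P) for h in H)
print("selected error:", tv(best, P), " OPT:", OPT)
```

With enough samples the empirical distribution concentrates around P, so the selected Ĥ satisfies a guarantee of the form tv(Ĥ, P) ≤ α · OPT + ε; the content of the paper is achieving the optimal α = 3 in almost linear time in the number of hypotheses n.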