

Poster

Optimal Hypothesis Selection in (Almost) Linear Time

Maryam Aliakbarpour · Mark Bun · Adam Smith

West Ballroom A-D #5710
Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Hypothesis selection, also known as density estimation, is a fundamental problem in statistics and learning theory. Suppose we are given a sample set from an unknown distribution $P$ and a finite class of candidate distributions (called hypotheses) $\mathcal{H} \coloneqq \{H_1, H_2, \ldots, H_n\}$. The aim is to design an algorithm that selects a distribution $\hat{H}$ in $\mathcal{H}$ that best fits the data. The algorithm's accuracy is measured by the distance between $\hat{H}$ and $P$ compared to the distance of the closest distribution in $\mathcal{H}$ to $P$ (denoted by $OPT$). Concretely, we aim for $\|\hat{H} - P\|_{TV}$ to be at most $\alpha \cdot OPT + \epsilon$ for some small $\epsilon$ and $\alpha$. While it is possible to decrease the value of $\epsilon$ as the number of samples increases, $\alpha$ is an inherent characteristic of the algorithm. In fact, one cannot hope to achieve $\alpha < 3$ even when there are only two candidate hypotheses, unless the number of samples is proportional to the domain size of $P$ [Bousquet, Kane, Moran '19]. Finding the best $\alpha$ has been one of the main focuses of studies of the problem since the early work of [Devroye, Lugosi '01]. Prior to our work, no algorithm was known that achieves $\alpha = 3$ in near-linear time. We provide the first algorithm that operates in almost linear time ($\tilde{O}(n/\epsilon^3)$ time) and achieves $\alpha = 3$. This result improves upon a long list of results in hypothesis selection: previously known algorithms either had worse time complexity, a larger factor $\alpha$, or extra assumptions about the problem setting. In addition to this algorithm, we provide another (almost) linear-time algorithm with better dependence on the additive accuracy parameter $\epsilon$, albeit with a slightly worse accuracy parameter, $\alpha = 4$.
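As a point of reference, and not the paper's algorithm, the following is a minimal sketch of the classical Scheffé-tournament (minimum-distance) estimate in the spirit of [Devroye, Lugosi '01], which attains the same $\alpha = 3$ guarantee but spends $\Theta(n^2)$ pairwise comparisons; that quadratic cost is what the paper's almost-linear-time algorithm removes. The finite domain, the NumPy probability vectors, and the function name are assumptions made purely for illustration.

```python
# Sketch of the classical Scheffe-tournament baseline (not the paper's
# near-linear algorithm): alpha = 3, but Theta(n^2) pairwise tests.
# Assumption for illustration: hypotheses and samples live on a finite
# domain {0, ..., d-1}, each hypothesis given as a NumPy probability vector.
import numpy as np

def scheffe_tournament(hypotheses, samples, domain_size):
    """Return the index of the hypothesis winning the most pairwise tests."""
    # Empirical distribution of the sample set.
    emp = np.bincount(samples, minlength=domain_size) / len(samples)
    n = len(hypotheses)
    wins = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            # Scheffe set A = {x : H_i(x) > H_j(x)} witnesses the TV
            # distance between H_i and H_j.
            A = hypotheses[i] > hypotheses[j]
            # Whichever hypothesis assigns A a probability mass closer
            # to the empirical mass wins this pairwise comparison.
            if (abs(hypotheses[i][A].sum() - emp[A].sum())
                    <= abs(hypotheses[j][A].sum() - emp[A].sum())):
                wins[i] += 1
            else:
                wins[j] += 1
    return int(np.argmax(wins))

# Usage: select among three candidate distributions on {0, 1, 2}.
rng = np.random.default_rng(0)
H = [np.array([0.5, 0.3, 0.2]),
     np.array([0.1, 0.8, 0.1]),
     np.array([0.3, 0.3, 0.4])]
P = np.array([0.45, 0.35, 0.2])  # unknown truth, closest to H[0] in TV
samples = rng.choice(3, size=5000, p=P)
print("selected:", scheffe_tournament(H, samples, 3))
```

The double loop over hypothesis pairs is the bottleneck this paper targets: every pair must be tested here, whereas the paper's algorithm reaches the same $3 \cdot OPT + \epsilon$ guarantee in $\tilde{O}(n/\epsilon^3)$ time.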
