Poster
Testing Closeness With Unequal Sized Samples
Bhaswar Bhattacharya · Gregory Valiant
210 C #66
[
Abstract
]
Abstract:
We consider the problem of testing whether two unequal-sized samples were drawn from identical distributions, versus distributions that differ significantly. Specifically, given a target error parameter , independent draws from an unknown distribution with discrete support, and draws from an unknown distribution of discrete support, we describe a test for distinguishing the case that from the case that . If and are supported on at most elements, then our test is successful with high probability provided and We show that this tradeoff is information theoretically optimal throughout this range, in the dependencies on all parameters, and , to constant factors. As a consequence, we obtain an algorithm for estimating the mixing time of a Markov chain on states up to a factor that uses queries to a next node'' oracle. The core of our testing algorithm is a relatively simple statistic that seems to perform well in practice, both on synthetic data and on natural language data. We believe that this statistic might prove to be a useful primitive within larger machine learning and natural language processing systems.
Live content is unavailable. Log in and register to view live content