

Poster

Optimal Subsampling with Influence Functions

Daniel Ting · Eric Brochu

Room 210 #76

Keywords: [ Efficient Training Methods ] [ Frequentist Statistics ] [ Stochastic Methods ]


Abstract:

Subsampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the concept of an asymptotically linear estimator and the associated influence function leads to asymptotically optimal sampling probabilities for a wide class of popular models. This is the only tight optimality result for subsampling we are aware of, as other methods provide only probabilistic error bounds or optimal rates. Furthermore, for linear regression models, which have well-studied procedures for non-uniform subsampling, we empirically show that our optimal influence-function-based method outperforms previous approaches even when using approximations to the optimal probabilities.
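As a rough illustration of the idea for linear regression, the sketch below samples points with probability proportional to the norm of an estimated influence function and then fits an inverse-probability-weighted least squares on the subsample. The abstract does not state the formula, so several pieces here are assumptions: the use of the Euclidean norm, the pilot-sample estimate of the coefficients and of the second-moment matrix, the uniform mixing term `mix`, and the function names `influence_sampling_probs` and `subsampled_ols`, which are hypothetical. It should not be read as the authors' implementation.

```python
import numpy as np


def influence_sampling_probs(X, y, pilot_size=200, mix=0.1, rng=None):
    """Sampling probabilities proportional to estimated influence norms.

    Assumption: for OLS, the influence of point i is approximated by
    Sigma^{-1} x_i (y_i - x_i^T beta), with beta and Sigma estimated
    from a small uniform pilot sample.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]

    # Pilot fit on a small uniform subsample.
    pilot = rng.choice(n, size=min(pilot_size, n), replace=False)
    beta0, *_ = np.linalg.lstsq(X[pilot], y[pilot], rcond=None)
    sigma = X[pilot].T @ X[pilot] / len(pilot)

    # Estimated influence vector for every point (one row per point).
    resid = y - X @ beta0
    psi = np.linalg.solve(sigma, X.T).T * resid[:, None]
    norms = np.linalg.norm(psi, axis=1)

    total = norms.sum()
    if total == 0.0:
        # Degenerate case (perfect pilot fit): fall back to uniform sampling.
        return np.full(n, 1.0 / n)

    # Hypothetical stabilizer: mix with the uniform distribution so that
    # no probability is exactly zero and the weights stay bounded.
    return (1.0 - mix) * norms / total + mix / n


def subsampled_ols(X, y, probs, m, rng=None):
    """Draw m points with replacement and fit weighted least squares."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(y), size=m, replace=True, p=probs)

    # Inverse-probability weights correct for the non-uniform sampling.
    sw = np.sqrt(1.0 / probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * sw[:, None], y[idx] * sw, rcond=None)
    return beta
```

Calling `influence_sampling_probs(X, y)` and passing the result to `subsampled_ols(X, y, probs, m)` gives a coefficient estimate from `m` sampled rows. The pilot stage is itself an approximation to the optimal probabilities, which is the setting the abstract's final sentence refers to.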
