Poster
Coresets for Classification – Simplified and Strengthened
Tung Mai · Cameron Musco · Anup Rao
We give relative error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss. Our construction achieves $(1\pm \epsilon)$ relative error with $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$, introduced by Munteanu et al. (2018). Our result is based on subsampling data points with probabilities proportional to their $\ell_1$ Lewis weights. It significantly improves on existing theoretical bounds and performs well in practice, outperforming uniform subsampling along with other importance sampling methods. Our sampling distribution does not depend on the labels, so it can be used for active learning. It also does not depend on the specific loss function, so a single coreset can be used in multiple training scenarios.
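As a rough illustration of the sampling step described above, the following Python sketch approximates $\ell_1$ Lewis weights with a standard fixed-point iteration and then draws a reweighted subsample. It is not the authors' implementation: the function names `l1_lewis_weights` and `lewis_coreset`, the iteration count, and the numerical floor are all assumptions made for this example, and the sample size `m` is left to the caller rather than derived from the $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ bound.

```python
import numpy as np

def l1_lewis_weights(X, n_iter=30):
    """Approximate l1 Lewis weights of the rows of X (hypothetical helper).

    Iterates w_i <- sqrt(x_i^T (X^T W^{-1} X)^{-1} x_i), with W = diag(w).
    """
    n, d = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = X.T @ (X / w[:, None])           # X^T W^{-1} X, shape (d, d)
        Minv = np.linalg.pinv(M)
        q = np.sum((X @ Minv) * X, axis=1)   # x_i^T M^{-1} x_i for each row
        w = np.sqrt(np.maximum(q, 1e-12))    # floor avoids division by zero
    return w

def lewis_coreset(X, y, m, rng):
    """Sample m rows with probability proportional to their Lewis weights.

    The probabilities use X only, never y or the loss function.
    Returns the sampled rows, their labels, and importance weights
    1 / (m * p_i), so a weighted sum of losses over the sample is an
    unbiased estimate of the full sum.
    """
    w = l1_lewis_weights(X)
    p = w / w.sum()
    idx = rng.choice(X.shape[0], size=m, replace=True, p=p)
    sample_weights = 1.0 / (m * p[idx])
    return X[idx], y[idx], sample_weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 5))
    y = np.sign(rng.standard_normal(1000))
    Xs, ys, sw = lewis_coreset(X, y, m=100, rng=rng)
    print(Xs.shape, ys.shape, sw.shape)
```

Because the labels are only read after the indices are drawn, the same index set could in principle be selected before any labels are available, and the same weighted sample could be passed to, say, a logistic or hinge loss solver that accepts per-example weights.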
Author Information
Tung Mai (Adobe Research)
Cameron Musco (University of Massachusetts Amherst)
Anup Rao (Adobe)
More from the Same Authors
- 2023 Poster: No-regret Algorithms for Fair Resource Allocation
  Abhishek Sinha · Ativ Joshi · Rajarshi Bhattacharjee · Cameron Musco · Mohammad Hajiesmaili
- 2023 Poster: Exact Representation of Sparse Networks with Symmetric Nonnegative Embeddings
  Sudhanshu Chanpuriya · Ryan Rossi · Anup Rao · Tung Mai · Nedim Lipka · Zhao Song · Cameron Musco
- 2023 Poster: Finite Population Regression Adjustment and Non-asymptotic Guarantees for Treatment Effect Estimation
  Mehrdad Ghadiri · David Arbour · Tung Mai · Cameron Musco · Anup Rao
- 2022 Spotlight: Kernel Interpolation with Sparse Grids
  Mohit Yadav · Daniel Sheldon · Cameron Musco
- 2022 Poster: Kernel Interpolation with Sparse Grids
  Mohit Yadav · Daniel Sheldon · Cameron Musco
- 2022 Poster: Modeling Transitivity and Cyclicity in Directed Graphs via Binary Code Box Embeddings
  Dongxu Zhang · Michael Boratko · Cameron Musco · Andrew McCallum
- 2022 Poster: Simplified Graph Convolution with Heterophily
  Sudhanshu Chanpuriya · Cameron Musco
- 2022 Poster: Sample Constrained Treatment Effect Estimation
  Raghavendra Addanki · David Arbour · Tung Mai · Cameron Musco · Anup Rao
- 2021 Poster: On the Power of Edge Independent Graph Models
  Sudhanshu Chanpuriya · Cameron Musco · Konstantinos Sotiropoulos · Charalampos Tsourakakis
- 2020 Poster: Fourier Sparse Leverage Scores and Approximate Kernel Learning
  Tamas Erdelyi · Cameron Musco · Christopher Musco
- 2020 Spotlight: Fourier Sparse Leverage Scores and Approximate Kernel Learning
  Tamas Erdelyi · Cameron Musco · Christopher Musco
- 2020 Poster: Node Embeddings and Exact Low-Rank Representations of Complex Networks
  Sudhanshu Chanpuriya · Cameron Musco · Konstantinos Sotiropoulos · Charalampos Tsourakakis
- 2019 Poster: Toward a Characterization of Loss Functions for Distribution Learning
  Nika Haghtalab · Cameron Musco · Bo Waggoner