NeurIPS Poster On Generalization Bounds for Projective Clustering

Poster

On Generalization Bounds for Projective Clustering

Maria Sofia Bucarelli · Matilde Larsen · Chris Schwiegelshohn · Mads Toftrup

Great Hall & Hall B1+B2 (level 1) #1821

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

Abstract: Given a set of points, clustering consists of finding a partition of a point set into

k

$k$ clusters such that the center to which a point is assigned is as close as possible. Most commonly, centers are points themselves, which leads to the famous

k

$k$ -median and

k

$k$ -means objectives. One may also choose centers to be

j

$j$ dimensional subspaces, which gives rise to subspace clustering. In this paper, we consider learning bounds for these problems. That is, given a set of

n

$n$ samples

P

$P$ drawn independently from some unknown, but fixed distribution

D

$\mathcal{D}$ , how quickly does a solution computed on

P

$P$ converge to the optimal clustering of

D

$\mathcal{D}$ ?We give several near optimal results. In particular, 1. For center-based objectives, we show a convergence rate of

\tilde{O} (\sqrt{k / n})

$\tilde{O}\left(\sqrt{{k}/{n}}\right)$ . This matches the known optimal bounds of [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] and [Bartlett, Linder, and Lugosi, IEEE Trans. Inf. Theory 1998] for

k

$k$ -means and extends it to other important objectives such as

k

$k$ -median. 2. For subspace clustering with

j

$j$ -dimensional subspaces, we show a convergence rate of

\tilde{O} (\sqrt{(k j^{2}) / n})

$\tilde{O}\left(\sqrt{{(kj^2)}/{n}}\right)$ . These are the first provable bounds for most of these problems. For the specific case of projective clustering, which generalizes

k

$k$ -means, we show a converge rate of

Ω (\sqrt{(k j) / n})

$\Omega\left(\sqrt{{(kj)}/{n}}\right)$ is necessary, thereby proving that the bounds from [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] are essentially optimal.

Chat is not available.