Poster
Coreset for Line-Sets Clustering
Sagi Lotan · Ernesto Evgeniy Sanches Shayda · Dan Feldman
Hall J (level 1) #818
Abstract:
The input to the {line-sets k-median} problem is an integer k≥1, and a set L={L1,…,Ln}that contains n sets of lines in Rd. The goal is to compute a set C of k centers (points in Rd) that minimizes the sum ∑L∈Lminℓ∈L,c∈Cdist(ℓ,c) of Euclidean distances from each set to its closest center, where dist(ℓ,c):=minx∈ℓ\normx−c2.An \emph{ε-coreset} for this problem is a weighted subset of sets in L that approximates this sum up to 1±ε multiplicative factor, for every set C of k centers. We prove that \emph{every} such input set \setL has a small ε-coreset, and provide the first coreset construction for this problem and its variants. The coreset consists of O(log2n) weighted line-sets from \setL, and is constructed in O(nlogn) time for every fixed d,k≥1 and ε∈(0,1). The main technique is based on a novel reduction to a fair clustering'' of colored points to colored centers. We then provide a coreset for this coloring problem, which may be of independent interest. Open source code and experiments are also provided.
Chat is not available.