Timezone: »
Life-logging video streams, financial time series, and Twitter tweets are a few examples of high-dimensional signals over practically unbounded time. We consider the problem of computing optimal segmentation of such signals by k-piecewise linear function, using only one pass over the data by maintaining a coreset for the signal. The coreset enables fast further analysis such as automatic summarization and analysis of such signals. A coreset (core-set) is a compact representation of the data seen so far, which approximates the data well for a specific task -- in our case, segmentation of the stream. We show that, perhaps surprisingly, the segmentation problem admits coresets of cardinality only linear in the number of segments k, independently of both the dimension d of the signal, and its number n of points. More precisely, we construct a representation of size O(klog n /eps^2) that provides a (1+eps)-approximation for the sum of squared distances to any given k-piecewise linear function. Moreover, such coresets can be constructed in a parallel streaming approach. Our results rely on a novel eduction of statistical estimations to problems in computational geometry. We empirically evaluate our algorithms on very large synthetic and real data sets from GPS, video and financial domains, using 255 machines in Amazon cloud.
Author Information
Guy Rosman (Massachusetts Institute of Technology)
Mikhail Volkov (Massachusetts Institute of Technology)
Dan Feldman (Massachusetts Institute of Technology)
John Fisher III (MIT)
Daniela Rus (Massachusetts Institute of Technology)
More from the Same Authors
-
2020 Poster: Sequential Bayesian Experimental Design with Variable Cost Structure »
Sue Zheng · David Hayden · Jason Pacheco · John Fisher III -
2020 Poster: Deep Evidential Regression »
Alexander Amini · Wilko Schwarting · Ava P Soleimany · Daniela Rus -
2020 Poster: Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information »
Genevieve Flaspohler · Nicholas Roy · John Fisher III -
2019 Poster: Learning-In-The-Loop Optimization: End-To-End Control And Co-Design Of Soft Robots Through Learned Deep Latent Representations »
Andrew Spielberg · Allan Zhao · Yuanming Hu · Tao Du · Wojciech Matusik · Daniela Rus -
2016 Poster: Dimensionality Reduction of Massive Sparse Datasets Using Coresets »
Dan Feldman · Mikhail Volkov · Daniela Rus -
2015 Poster: Streaming, Distributed Variational Inference for Bayesian Nonparametrics »
Trevor Campbell · Julian Straub · John Fisher III · Jonathan How -
2015 Poster: Probabilistic Variational Bounds for Graphical Models »
Qiang Liu · John Fisher III · Alexander Ihler -
2014 Poster: Parallel Sampling of HDPs using Sub-Cluster Splits »
Jason Chang · John Fisher III -
2013 Poster: Parallel Sampling of DP Mixture Models using Sub-Cluster Splits »
Jason Chang · John Fisher III -
2012 Workshop: Bayesian Nonparametric Models For Reliable Planning And Decision-Making Under Uncertainty »
Jonathan How · Lawrence Carin · John Fisher III · Michael Jordan · Alborz Geramifard -
2012 Poster: Coupling Nonparametric Mixtures via Latent Dirichlet Processes »
Dahua Lin · John Fisher III -
2011 Oral: Scalable Training of Mixture Models via Coresets »
Dan Feldman · Matthew Faulkner · Andreas Krause -
2011 Poster: Scalable Training of Mixture Models via Coresets »
Dan Feldman · Matthew Faulkner · Andreas Krause -
2010 Oral: Construction of Dependent Dirichlet Processes based on Poisson Processes »
Dahua Lin · Eric Grimson · John Fisher III -
2010 Poster: Construction of Dependent Dirichlet Processes based on Poisson Processes »
Dahua Lin · Eric Grimson · John Fisher III