The ability to measure similarity between documents enables intelligent summarization and analysis of large corpora. Existing document distances either fail to incorporate semantic similarities between words or scale poorly. As an alternative, we introduce hierarchical optimal transport as a meta-distance between documents, where documents are modeled as distributions over topics, which are in turn modeled as distributions over words. We then solve an optimal transport problem on the smaller topic space to compute a similarity score. We give conditions on the topics under which this construction defines a distance, and we relate it to the word mover's distance. We evaluate our technique on k-NN classification and show that it offers better interpretability and scalability than current methods, with comparable performance at a fraction of the cost.
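The two-level construction described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the toy 2-D word embeddings, the two hand-written topics, the names `sinkhorn_cost` and `hierarchical_ot`, and the use of Euclidean distance between embeddings as the word-level cost. The sketch also assumes that the cost between two topics is itself a word-level transport cost, which the abstract suggests but does not spell out. To stay dependency-free, it swaps in entropy-regularized Sinkhorn iterations as an approximation of exact optimal transport.

```python
import math

def sinkhorn_cost(a, b, cost, reg=0.05, n_iter=200):
    """Approximate the optimal transport cost between histograms a and b.

    Uses Sinkhorn iterations (entropy-regularized OT). This is an assumption
    made to avoid external dependencies; an exact LP solver could be used instead.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground cost matrix.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The transport plan is P[i][j] = u[i] * K[i][j] * v[j]; return <P, cost>.
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

# Hypothetical toy data: 2-D embeddings for four words.
embeddings = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
word_cost = [[math.dist(p, q) for q in embeddings] for p in embeddings]

# Hypothetical topics, each a distribution over the four words.
topics = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

# Lower level: cost between two topics is the transport cost between
# their word distributions, using distances between word embeddings.
topic_cost = [[sinkhorn_cost(s, t, word_cost) for t in topics] for s in topics]

def hierarchical_ot(doc_a, doc_b):
    """Upper level: transport cost between documents given as topic proportions."""
    return sinkhorn_cost(doc_a, doc_b, topic_cost)

doc_1 = [0.9, 0.1]  # mostly topic 0
doc_2 = [0.2, 0.8]  # mostly topic 1
same = hierarchical_ot(doc_1, doc_1)
different = hierarchical_ot(doc_1, doc_2)
print(same, different)
```

In this toy setup the exact transport cost between `doc_1` and `doc_2` would be 0.7 × 3 = 2.1, since 0.7 of the mass moves between two topics that are 3 apart. The regularized value should land close to that, and the self-distance close to zero. Note that the upper-level problem has only as many variables as there are topic pairs, which is where the scalability claim in the abstract comes from.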
Author Information
Mikhail Yurochkin (IBM Research, MIT-IBM Watson AI Lab)
Sebastian Claici (MIT)
Edward Chien (Massachusetts Institute of Technology)
Farzaneh Mirzazadeh (MIT-IBM Watson AI Lab)
Justin Solomon (MIT)