The notion of task similarity is at the core of various machine learning paradigms, such as domain adaptation and meta-learning. Current methods to quantify it are often heuristic, make strong assumptions about the label sets of the tasks, and many are architecture-dependent, relying on task-specific optimal parameters (e.g., they require training a model on each dataset). In this work we propose an alternative notion of distance between datasets that (i) is model-agnostic, (ii) does not involve training, (iii) can compare datasets even if their label sets are completely disjoint, and (iv) has solid theoretical footing. This distance relies on optimal transport, which provides it with rich geometry awareness, interpretable correspondences, and well-understood properties. Our results show that this novel distance provides a meaningful comparison of datasets, and correlates well with transfer learning hardness across various experimental settings and datasets.
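As a rough illustration of the kind of computation the abstract describes, the sketch below uses the POT library (`pip install pot`) to compute an optimal-transport distance between two labeled datasets, augmenting the feature-space ground cost with an OT distance between label-conditional feature clouds so that datasets with disjoint label sets remain comparable. The function names and the exact form of the ground cost are illustrative assumptions, not the paper's definition or the authors' implementation.

```python
# Minimal, hypothetical sketch of an OT-based distance between two labeled
# datasets, using the POT library (https://pythonot.github.io). This is an
# illustration of the general idea, not the authors' exact construction.
import numpy as np
import ot  # Python Optimal Transport ("pot" on PyPI)


def label_cloud_distance(Xa, Xb):
    """Exact OT cost between two label-conditional feature clouds."""
    M = ot.dist(Xa, Xb)  # pairwise squared Euclidean costs
    return ot.emd2(ot.unif(len(Xa)), ot.unif(len(Xb)), M)


def dataset_distance(X1, y1, X2, y2):
    """Illustrative dataset distance combining feature and label geometry."""
    # Cost between every pair of labels, measured between their feature clouds;
    # this is what allows comparison even when the label sets are disjoint.
    W = {(c1, c2): label_cloud_distance(X1[y1 == c1], X2[y2 == c2])
         for c1 in np.unique(y1) for c2 in np.unique(y2)}
    # Ground cost between individual samples: feature cost + label-to-label cost.
    M = ot.dist(X1, X2) + np.array([[W[(c1, c2)] for c2 in y2] for c1 in y1])
    # Exact OT between the two (uniformly weighted) datasets under this cost.
    return ot.emd2(ot.unif(len(X1)), ot.unif(len(X2)), M)
```

No model is trained anywhere in this sketch: the distance depends only on the raw features and labels, which is what makes such a comparison model-agnostic and training-free.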
Author Information
David Alvarez-Melis (Microsoft Research)
Nicolo Fusi (Microsoft Research)
More from the Same Authors
- 2021: Optimizing Functionals on the Space of Probabilities with Input Convex Neural Network
  David Alvarez-Melis · Yair Schiff · Youssef Mroueh
- 2022: Neural Unbalanced Optimal Transport via Cycle-Consistent Semi-Couplings
  Frederike Lübeck · Charlotte Bunne · Gabriele Gut · Jacobo Sarabia del Castillo · Lucas Pelkmans · David Alvarez-Melis
- 2022 Spotlight: Are GANs overkill for NLP?
  David Alvarez-Melis · Vikas Garg · Adam Kalai
- 2022: Generating Synthetic Datasets by Interpolating along Generalized Geodesics
  Jiaojiao Fan · David Alvarez-Melis
- 2022 Poster: Rapid Model Architecture Adaption for Meta-Learning
  Yiren Zhao · Xitong Gao · I Shumailov · Nicolo Fusi · Robert Mullins
- 2022 Poster: Are GANs overkill for NLP?
  David Alvarez-Melis · Vikas Garg · Adam Kalai
- 2019: Poster session
  Jindong Gu · Alice Xiang · Atoosa Kasirzadeh · Zhiwei Han · Omar U. Florez · Frederik Harder · An-phi Nguyen · Amir Hossein Akhavan Rahnama · Michele Donini · Dylan Slack · Junaid Ali · Paramita Koley · Michiel Bakker · Anna Hilgard · Hailey James · Gonzalo Ramos · Jialin Lu · Jingying Yang · Margarita Boyarskaya · Martin Pawelczyk · Kacper Sokol · Mimansa Jaiswal · Umang Bhatt · David Alvarez-Melis · Aditya Grover · Charles Marx · Mengjiao (Sherry) Yang · Jingyan Wang · Gökhan Çapan · Hanchen Wang · Steffen Grünewälder · Moein Khajehnejad · Gourab Patro · Russell Kunes · Samuel Deng · Yuanting Liu · Luca Oneto · Mengze Li · Thomas Weber · Stefan Matthes · Duy Patrick Tu
- 2018: Poster spotlight #2
  Nicolo Fusi · Chidubem Arachie · Joao Monteiro · Steffen Wolf
- 2018 Poster: Gaussian Process Prior Variational Autoencoders
  Francesco Paolo Casale · Adrian Dalca · Luca Saglietti · Jennifer Listgarten · Nicolo Fusi
- 2018 Poster: Probabilistic Matrix Factorization for Automated Machine Learning
  Nicolo Fusi · Rishit Sheth · Melih Elibol
- 2018 Poster: Towards Robust Interpretability with Self-Explaining Neural Networks
  David Alvarez-Melis · Tommi Jaakkola
- 2017 Workshop: Machine Learning in Computational Biology
  James Zou · Anshul Kundaje · Gerald Quon · Nicolo Fusi · Sara Mostafavi
- 2017: Structured Optimal Transport (with T. Jaakkola, S. Jegelka)
  David Alvarez-Melis
- 2016 Workshop: Machine Learning in Computational Biology
  Gerald Quon · Sara Mostafavi · James Y Zou · Barbara Engelhardt · Oliver Stegle · Nicolo Fusi
- 2015 Workshop: Machine Learning in Computational Biology
  Nicolo Fusi · Anna Goldenberg · Sara Mostafavi · Gerald Quon · Oliver Stegle