Skip to yearly menu bar Skip to main content

Workshop: Optimal Transport and Machine Learning

Applications of Optimal Transport Distances in Unsupervised AutoML

prabhant singh · Joaquin Vanschoren


In this work, we explore the utility of Optimal Transport-based dataset similarity to find similar \textit{unlabeled tabular} datasets, especially in the context of automated machine learning (AutoML) on unsupervised tasks. Since unsupervised tasks don't have a ground truth that optimization techniques can optimize towards, but often do have historical information on which pipelines work best, we propose to meta-learn over prior tasks to transfer useful pipelines to new tasks. Our intuition behind this work is that pipelines that worked well on datasets with a \textit{similar underlying data distribution} will work well on new datasets. We use Optimal Transport distances to find this similarity between unlabeled tabular datasets and recommend machine learning pipelines on two downstream unsupervised tasks: Outlier Detection and Clustering. We obtain very promising results against existing baselines and state-of-the-art methods.

Chat is not available.