Automatic Unsupervised Outlier Model Selection
Yue Zhao · Ryan Rossi · Leman Akoglu

Tue Dec 07 08:30 AM -- 10:00 AM (PST)

Given an unsupervised outlier detection task on a new dataset, how can we automatically select a good outlier detection algorithm and its hyperparameter(s) (collectively called a model)? In this work, we tackle the unsupervised outlier model selection (UOMS) problem, and propose MetaOD, a principled, data-driven approach to UOMS based on meta-learning. The UOMS problem is notoriously challenging compared to model selection for classification and clustering, since (i) model evaluation is infeasible due to the lack of hold-out data with labels, and (ii) model comparison is infeasible due to the lack of a universal objective function. MetaOD capitalizes on the performances of a large body of detection models on historical outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without any labels, model evaluations, or model comparisons. To capture task similarity within our meta-learning framework, we introduce specialized meta-features that quantify outlying characteristics of a dataset. Extensive experiments show that selecting a model by MetaOD significantly outperforms no model selection (e.g., always using the same popular model or the ensemble of many) as well as other meta-learning techniques that we tailored for UOMS. Moreover, upon (meta-)training, MetaOD is extremely efficient at test time; selecting from a large pool of 300+ models takes less than 1 second for a new task. We open-source MetaOD and our meta-learning database for practical use and to foster further research on the UOMS problem.
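
To make the general idea of meta-learning-based selection concrete, the following is a minimal sketch of a nearest-neighbor selector, not the MetaOD algorithm itself. It assumes a matrix of historical model performances and a matrix of dataset meta-features are already available; the function name `select_model`, its parameters, and the use of Euclidean distance on standardized meta-features are all illustrative assumptions rather than anything taken from the paper or its released code.

```python
import numpy as np


def select_model(perf, meta_train, meta_new, k=3):
    """Pick a model index for a new dataset from historical performance.

    Hypothetical helper for illustration only (not the MetaOD method).
    perf:       (n_datasets, n_models) historical performance scores
    meta_train: (n_datasets, n_features) meta-features of historical datasets
    meta_new:   (n_features,) meta-features of the new, unlabeled dataset
    """
    perf = np.asarray(perf, dtype=float)
    meta_train = np.asarray(meta_train, dtype=float)
    meta_new = np.asarray(meta_new, dtype=float)

    # Standardize meta-features so no single feature dominates the distance.
    mu = meta_train.mean(axis=0)
    sigma = meta_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    z_train = (meta_train - mu) / sigma
    z_new = (meta_new - mu) / sigma

    # Find the k historical datasets most similar to the new one.
    dists = np.linalg.norm(z_train - z_new, axis=1)
    nearest = np.argsort(dists)[:k]

    # Recommend the model with the best average score on those neighbors.
    mean_perf = perf[nearest].mean(axis=0)
    return int(np.argmax(mean_perf))
```

Note that no labels, model evaluations, or model comparisons on the new dataset are needed at selection time; only its meta-features are computed, which is why selection in this style can be fast even for a large model pool.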

Author Information

Yue Zhao (Carnegie Mellon University)

I am pursuing a Ph.D. in Information Systems at Carnegie Mellon University, advised by Prof. Leman Akoglu. Unlike most IS researchers, I focus on data mining algorithms, systems, and applications. Research Keywords: Outlier & Anomaly Detection; Ensemble Learning; Scalable Machine Learning; Machine Learning Systems.

Ryan Rossi (Purdue University)
Leman Akoglu (CMU)
