Given an unsupervised outlier detection task on a new dataset, how can we automatically select a good outlier detection algorithm and its hyperparameter(s) (collectively called a model)? In this work, we tackle the unsupervised outlier model selection (UOMS) problem and propose MetaOD, a principled, data-driven approach to UOMS based on meta-learning. The UOMS problem is notoriously challenging compared to model selection for classification and clustering, since (i) model evaluation is infeasible due to the lack of hold-out data with labels, and (ii) model comparison is infeasible due to the lack of a universal objective function. MetaOD capitalizes on the performances of a large body of detection models on historical outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without any labels, model evaluations, or model comparisons. To capture task similarity within our meta-learning framework, we introduce specialized meta-features that quantify outlying characteristics of a dataset. Extensive experiments show that selecting a model by MetaOD significantly outperforms no model selection (e.g., always using the same popular model or an ensemble of many models) as well as other meta-learning techniques that we tailored for UOMS. Moreover, once (meta-)trained, MetaOD is extremely efficient at test time; selecting from a large pool of 300+ models takes less than 1 second for a new task. We open-source MetaOD and our meta-learning database for practical use and to foster further research on the UOMS problem.
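The general idea of carrying over performance on historical datasets to a new, unlabeled dataset can be illustrated with a very small sketch. The code below is not MetaOD's algorithm; it is a simple nearest-neighbor meta-learning baseline written for illustration only. The function name `select_model`, the toy meta-feature values, and the toy performance matrix are all hypothetical and are not taken from the paper or its released code.

```python
import numpy as np


def select_model(meta_features, performance, new_meta_features, k=2):
    """Pick a model index for a new dataset without using any labels.

    meta_features:     (n_datasets, n_meta_features) array for historical datasets
    performance:       (n_datasets, n_models) array of historical model scores
    new_meta_features: (n_meta_features,) vector for the new dataset
    k:                 number of most similar historical datasets to consult

    NOTE: illustrative nearest-neighbor baseline, not the MetaOD method.
    """
    meta_features = np.asarray(meta_features, dtype=float)
    performance = np.asarray(performance, dtype=float)
    new_meta_features = np.asarray(new_meta_features, dtype=float)

    # Standardize meta-features so that no single feature dominates the distance.
    mean = meta_features.mean(axis=0)
    std = meta_features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    train = (meta_features - mean) / std
    query = (new_meta_features - mean) / std

    # Find the k historical datasets closest to the new one in meta-feature space.
    dists = np.linalg.norm(train - query, axis=1)
    neighbors = np.argsort(dists)[:k]

    # Average the historical performance of each model over those neighbors
    # and return the index of the model with the best average.
    avg_perf = performance[neighbors].mean(axis=0)
    return int(np.argmax(avg_perf))


# Hypothetical toy data: 4 historical datasets, 2 meta-features, 3 candidate models.
meta_features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
performance = np.array([
    [0.90, 0.50, 0.40],
    [0.80, 0.60, 0.30],
    [0.30, 0.50, 0.95],
    [0.20, 0.60, 0.90],
])

best = select_model(meta_features, performance, [0.05, 0.0], k=2)
print("selected model index:", best)
```

Selection here needs only the meta-features of the new dataset, so no detector is trained or evaluated on it at test time, which is the property the abstract emphasizes.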
Author Information
Yue Zhao (Carnegie Mellon University)
I am pursuing a Ph.D. in Information Systems at Carnegie Mellon University, advised by Prof. Leman Akoglu. Unlike most IS researchers, I focus on data mining algorithms, systems, and applications. Research Keywords: Outlier & Anomaly Detection; Ensemble Learning; Scalable Machine Learning; Machine Learning Systems.
Ryan Rossi (Purdue University)
Leman Akoglu (CMU)
More from the Same Authors
- 2021: Revisiting Time Series Outlier Detection: Definitions and Benchmarks
  Kwei-Herng Lai · Daochen Zha · Junjie Xu · Yue Zhao · Guanchu Wang · Xia Hu
- 2021: Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development
  Kexin Huang · Tianfan Fu · Wenhao Gao · Yue Zhao · Yusuf Roohani · Jure Leskovec · Connor Coley · Cao Xiao · Jimeng Sun · Marinka Zitnik
- 2022 Poster: ADBench: Anomaly Detection Benchmark
  Songqiao Han · Xiyang Hu · Hailiang Huang · Minqi Jiang · Yue Zhao
- 2022 Poster: BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs
  Kay Liu · Yingtong Dou · Yue Zhao · Xueying Ding · Xiyang Hu · Ruitong Zhang · Kaize Ding · Canyu Chen · Hao Peng · Kai Shu · Lichao Sun · Jundong Li · George H Chen · Zhihao Jia · Philip S Yu
- 2022 Poster: Hyperparameter Sensitivity in Deep Outlier Detection: Analysis and a Scalable Hyper-Ensemble Solution
  Xueying Ding · Lingxiao Zhao · Leman Akoglu
- 2022 Poster: A Practical, Progressively-Expressive GNN
  Lingxiao Zhao · Neil Shah · Leman Akoglu
- 2022 Poster: Dual-discriminative Graph Neural Network for Imbalanced Graph-level Anomaly Detection
  GE ZHANG · Zhenyu Yang · Jia Wu · Jian Yang · Shan Xue · Hao Peng · Jianlin Su · Chuan Zhou · Quan Z. Sheng · Leman Akoglu · Charu Aggarwal
- 2022 Poster: CyCLIP: Cyclic Contrastive Language-Image Pretraining
  Shashank Goel · Hritik Bansal · Sumit Bhatia · Ryan Rossi · Vishwa Vinay · Aditya Grover
- 2020 Poster: Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs
  Jiong Zhu · Yujun Yan · Lingxiao Zhao · Mark Heimann · Leman Akoglu · Danai Koutra
- 2019 Poster: Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection
  Xiaoyi Gu · Leman Akoglu · Alessandro Rinaldo