Distance-based approaches to outlier detection are popular in data mining because they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis of why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search.
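The abstract does not spell out the sampling-based scheme, so the following is only a minimal sketch of one plausible reading: draw a single small random sample, then score every point by its distance to the nearest sampled point. A k-nearest-neighbor score is included for contrast. The function names, the `sample_size` and `seed` parameters, the choice of Euclidean distance, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sampling_outlier_scores(points, sample_size, seed=0):
    """Score each point by its distance to the nearest point of one random sample.

    Hypothetical sketch: the sample is drawn once, without replacement.
    """
    if not 2 <= sample_size <= len(points):
        raise ValueError("sample_size must be between 2 and len(points)")
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(points)), sample_size)
    scores = []
    for i, p in enumerate(points):
        # Skip the point itself so that sampled points do not trivially score 0.
        dists = [math.dist(p, points[j]) for j in sample_idx if j != i]
        scores.append(min(dists))
    return scores


def knn_outlier_scores(points, k):
    """Baseline for contrast: distance to the k-th nearest neighbor (brute force)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data (assumed): a 5x5 grid of inliers plus one far-away point at index 25.
points = [(x / 10, y / 10) for x in range(5) for y in range(5)] + [(10.0, 10.0)]

sampled = sampling_outlier_scores(points, sample_size=5)
baseline = knn_outlier_scores(points, k=3)

top_sampled = max(range(len(points)), key=sampled.__getitem__)
top_baseline = max(range(len(points)), key=baseline.__getitem__)
print(top_sampled, top_baseline)
```

In this sketch the sampling score needs roughly `len(points) * sample_size` distance computations, whereas the brute-force k-nearest-neighbor baseline compares every pair of points, which gives an intuition for the efficiency gap the abstract mentions.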
Author Information
Mahito Sugiyama (National Institute of Informatics)
Karsten Borgwardt (ETH Zurich)
Karsten Borgwardt is Professor of Data Mining at ETH Zürich, at the Department of Biosystems located in Basel. His work has won several awards, including the NIPS 2009 Outstanding Paper Award, the Krupp Award for Young Professors 2013 and a Starting Grant 2014 from the ERC-backup scheme of the Swiss National Science Foundation. Since 2013, he has headed the Marie Curie Initial Training Network for "Machine Learning for Personalized Medicine" with 12 partner labs in 8 countries (http://www.mlpm.eu). The business magazine "Capital" listed him as one of the "Top 40 under 40" in Science in/from Germany in 2014, 2015 and 2016. For more information, visit: https://www.bsse.ethz.ch/mlcb
More from the Same Authors
-
2023 Poster: ProteinShake: Building datasets and benchmarks for deep learning on protein structures »
Tim Kucera · Carlos Oliver · Dexiong Chen · Karsten Borgwardt -
2020 Poster: Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence »
Bastian Rieck · Tristan Yates · Christian Bock · Karsten Borgwardt · Guy Wolf · Nicholas Turk-Browne · Smita Krishnaswamy -
2020 Spotlight: Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence »
Bastian Rieck · Tristan Yates · Christian Bock · Karsten Borgwardt · Guy Wolf · Nicholas Turk-Browne · Smita Krishnaswamy -
2019 Poster: Wasserstein Weisfeiler-Lehman Graph Kernels »
Matteo Togninalli · Elisabetta Ghisu · Felipe Llinares-López · Bastian Rieck · Karsten Borgwardt -
2019 Spotlight: Wasserstein Weisfeiler-Lehman Graph Kernels »
Matteo Togninalli · Elisabetta Ghisu · Felipe Llinares-López · Bastian Rieck · Karsten Borgwardt -
2016 Poster: Finding significant combinations of features in the presence of categorical covariates »
Laetitia Papaxanthos · Felipe Llinares-López · Dean Bodenham · Karsten Borgwardt -
2015 Poster: Halting in Random Walk Kernels »
Mahito Sugiyama · Karsten Borgwardt -
2013 Poster: Scalable kernels for graphs with continuous attributes »
Aasa Feragen · Niklas Kasenburg · Jens Petersen · Marleen de Bruijne · Karsten Borgwardt -
2013 Poster: It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals »
Barbara Rakitsch · Christoph Lippert · Karsten Borgwardt · Oliver Stegle -
2011 Workshop: From statistical genetics to predictive models in personalized medicine »
Karsten Borgwardt · Oliver Stegle · Shipeng Yu · Glenn Fung · Faisal Farooq · Balaji R Krishnapuram -
2011 Poster: Learning sparse inverse covariance matrices in the presence of confounders »
Oliver Stegle · Christoph Lippert · Joris M Mooij · Neil D Lawrence · Karsten Borgwardt -
2009 Workshop: Transfer Learning for Structured Data »
Sinno Jialin Pan · Ivor W Tsang · Le Song · Karsten Borgwardt · Qiang Yang -
2009 Poster: Fast subtree kernels on graphs »
Nino Shervashidze · Karsten Borgwardt -
2009 Oral: Fast Subtree Kernels on Graphs »
Nino Shervashidze · Karsten Borgwardt -
2008 Workshop: Structured Input - Structured Output »
Karsten Borgwardt · Koji Tsuda · Vishwanathan S V N · Xifeng Yan -
2007 Oral: Colored Maximum Variance Unfolding »
Le Song · Alexander Smola · Karsten Borgwardt · Arthur Gretton -
2007 Poster: Colored Maximum Variance Unfolding »
Le Song · Alexander Smola · Karsten Borgwardt · Arthur Gretton -
2006 Poster: Fast Computation of Graph Kernels »
Vishwanathan S V N · Karsten Borgwardt · Nic Schraudolph -
2006 Poster: A Kernel Method for the Two-Sample-Problem »
Arthur Gretton · Karsten Borgwardt · Malte J Rasch · Bernhard Schölkopf · Alexander Smola -
2006 Poster: Correcting Sample Selection Bias by Unlabeled Data »
Jiayuan Huang · Alexander Smola · Arthur Gretton · Karsten Borgwardt · Bernhard Schölkopf -
2006 Spotlight: Correcting Sample Selection Bias by Unlabeled Data »
Jiayuan Huang · Alexander Smola · Arthur Gretton · Karsten Borgwardt · Bernhard Schölkopf -
2006 Talk: A Kernel Method for the Two-Sample-Problem »
Arthur Gretton · Karsten Borgwardt · Malte J Rasch · Bernhard Schölkopf · Alexander Smola