We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification, which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum density of data points over all subcubes containing the point. We present PIDForest: a random-forest-based algorithm that finds anomalies based on this definition. We show that it performs favorably in comparison to several popular anomaly detection methods across a broad range of benchmarks. PIDForest also provides a succinct explanation for why a point is labelled anomalous, by providing a set of features, and ranges for them, that are relatively uncommon in the dataset.
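To make the "minimum density over all subcubes containing the point" idea concrete, here is a minimal brute-force sketch. It is not the PIDForest algorithm and not the authors' code: it assumes axis-aligned boxes whose edges come from a coarse uniform grid over the data's bounding box, and it takes density to be the point count divided by box volume. The function name `min_density_score`, the `grid` parameter, and the toy data are all hypothetical, chosen only for illustration. Under this reading, a lower score means the point sits in a large, nearly empty box and so looks more anomalous.

```python
import itertools
import numpy as np


def min_density_score(points, x, grid=5):
    """Minimum density over axis-aligned grid boxes that contain x.

    Hypothetical helper for illustration; not the PIDForest algorithm.
    Assumes every feature has a nonzero range and x lies inside the
    bounding box of `points`.
    """
    points = np.asarray(points, dtype=float)
    x = np.asarray(x, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)

    # Candidate intervals per feature: pairs of grid edges that bracket x.
    per_dim = []
    for j in range(points.shape[1]):
        edges = np.linspace(lo[j], hi[j], grid + 1)
        per_dim.append([(a, b) for a, b in itertools.combinations(edges, 2)
                        if a <= x[j] <= b])

    best = np.inf
    for box in itertools.product(*per_dim):
        lows = np.array([a for a, _ in box])
        highs = np.array([b for _, b in box])
        # Count the data points falling inside this box (edges inclusive).
        inside = np.all((points >= lows) & (points <= highs), axis=1)
        density = inside.sum() / np.prod(highs - lows)
        best = min(best, density)
    return best


# Toy data: a tight 10 x 10 cluster near the origin plus one far-away point.
side = np.linspace(0.0, 0.1, 10)
cluster = np.array([(u, v) for u in side for v in side])
outlier = np.array([1.0, 1.0])
points = np.vstack([cluster, outlier])

inlier_score = min_density_score(points, cluster[0])
outlier_score = min_density_score(points, outlier)
print(inlier_score, outlier_score)
```

The exhaustive search grows exponentially with the number of features, so this sketch is only usable on tiny, low-dimensional examples; the abstract's random-forest-based approach is what addresses that cost in practice.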
Author Information
Parikshit Gopalan (VMware Research)
Vatsal Sharan (Stanford University)
Udi Wieder (VMware Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Spotlight: PIDForest: Anomaly Detection via Partial Identification
  Wed Dec 11th 06:40 -- 06:45 PM Room West Ballroom A + B
More from the Same Authors
- 2018 Poster: Efficient Anomaly Detection via Matrix Sketching
  Vatsal Sharan · Parikshit Gopalan · Udi Wieder
- 2018 Poster: A Spectral View of Adversarially Robust Features
  Shivam Garg · Vatsal Sharan · Brian Zhang · Gregory Valiant
- 2018 Spotlight: A Spectral View of Adversarially Robust Features
  Shivam Garg · Vatsal Sharan · Brian Zhang · Gregory Valiant
- 2017 Poster: Learning Overcomplete HMMs
  Vatsal Sharan · Sham Kakade · Percy Liang · Gregory Valiant