Poster
On Binary Classification in Extreme Regions
Hamid Jalalzai · Stephan Clémençon · Anne Sabourin
In pattern recognition, a random label Y is to be predicted from a random vector X valued in $\mathbb{R}^d$, with $d>1$, by means of a classification rule with minimum probability of error. In a wide variety of applications, ranging from finance and insurance to environmental sciences and teletraffic data analysis, extreme (i.e., very large) observations X are of crucial importance. Yet because they are rare, they contribute only negligibly to the (empirical) error. As a consequence, empirical risk minimizers generally perform very poorly in extreme regions. The purpose of this paper is to develop a general framework for classification in the extremes. Precisely, under non-parametric heavy-tail assumptions on the class distributions, we prove that a natural asymptotic notion of risk, accounting for predictive performance in extreme regions of the input space, can be defined. We then show, by means of maximal deviation inequalities in low-probability regions, that minimizers of an empirical version of a non-asymptotic approximant of this dedicated risk, computed from a fraction of the largest observations, yield classification rules with good generalization capacity. Beyond these theoretical results, numerical experiments illustrate the relevance of the approach.
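The estimation step described above (minimizing an empirical risk computed only on a fraction of the largest observations) can be illustrated with a minimal sketch. All function names and the toy data are hypothetical. Ranking observations by the Euclidean norm and letting candidate rules act only on the angular component x/‖x‖ are assumptions of this illustration, not details stated in the abstract. Picking the best rule from a finite list of candidates stands in for the general minimization.

```python
import math


def norm(x):
    """Euclidean norm (assumed choice of norm for ranking observations)."""
    return math.sqrt(sum(v * v for v in x))


def angle(x):
    """Angular component x / ||x|| (assumption: rules only see the direction)."""
    r = norm(x)
    return [v / r for v in x] if r > 0 else list(x)


def extreme_subsample(X, y, k):
    """Keep the k observations with the largest norm, i.e. the 'extreme region'."""
    order = sorted(range(len(X)), key=lambda i: norm(X[i]), reverse=True)
    top = order[:k]
    return [X[i] for i in top], [y[i] for i in top]


def extreme_empirical_risk(rule, X, y, k):
    """Misclassification rate of `rule` on the k largest observations only."""
    Xk, yk = extreme_subsample(X, y, k)
    errors = sum(1 for x, label in zip(Xk, yk) if rule(angle(x)) != label)
    return errors / len(Xk)


def erm_extremes(rules, X, y, k):
    """Hypothetical ERM step: pick the candidate rule with smallest extreme risk."""
    return min(rules, key=lambda rule: extreme_empirical_risk(rule, X, y, k))


# Toy data (invented for illustration): the bulk and the tail disagree.
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.5], [10.0, 0.5], [0.5, 20.0]]
y = [0, 1, 0, 1, 0]


def rule_a(theta):
    """Predict 1 when the second angular coordinate dominates."""
    return 1 if theta[1] > theta[0] else 0


def rule_b(theta):
    """Predict 1 when the first angular coordinate dominates."""
    return 1 if theta[0] > theta[1] else 0


if __name__ == "__main__":
    k = 2  # 40% of n here; in practice a small fraction of n
    print(extreme_empirical_risk(rule_a, X, y, k))  # 1.0
    print(extreme_empirical_risk(rule_b, X, y, k))  # 0.0
    print(erm_extremes([rule_a, rule_b], X, y, len(X)).__name__)  # rule_a
    print(erm_extremes([rule_a, rule_b], X, y, k).__name__)  # rule_b
```

With all five points (k = n), the selected rule is rule_a, which misclassifies both extreme points. Restricting the risk to the two largest observations selects rule_b instead. In miniature, this mirrors the point above that the overall empirical error barely reflects performance in the extremes.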
Author Information
Hamid Jalalzai (Télécom ParisTech)
Stephan Clémençon (Telecom ParisTech)
Anne Sabourin (LTCI, Telecom ParisTech, Université Paris-Saclay)
More from the Same Authors
2023 Poster: Active Bipartite Ranking
James Cheshire · Vincent Laurent · Stephan Clémençon
2022 Poster: What are the best Systems? New Perspectives on NLP Benchmarking
Pierre Colombo · Nathan Noiry · Ekhine Irurozki · Stephan Clémençon
2020 Poster: Heavy-tailed Representations, Text Polarity Classification & Data Augmentation
Hamid Jalalzai · Pierre Colombo · Chloé Clavel · Eric Gaussier · Giovanna Varni · Emmanuel Vignon · Anne Sabourin
2016 Poster: On Graph Reconstruction via Empirical Risk Minimization: Fast Learning Rates and Scalability
Guillaume Papa · Aurélien Bellet · Stephan Clémençon
2008 Poster: Empirical performance maximization for linear rank statistics
Stephan Clémençon · Nicolas Vayatis
2008 Poster: On Bootstrapping the ROC Curve
Patrice Bertail · Stephan Clémençon · Nicolas Vayatis
2008 Poster: Overlaying classifiers: a practical approach for optimal ranking
Stephan Clémençon · Nicolas Vayatis