Learning with Noisy Labels
Nagarajan Natarajan · Inderjit S Dhillon · Pradeep Ravikumar · Ambuj Tewari

Fri Dec 06 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor

In this paper, we theoretically study the problem of binary classification in the presence of random classification noise: the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, the random label noise is class-conditional: the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of iid data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a remarkable consequence: methods used in practice such as biased SVM and weighted logistic regression are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with recently proposed methods for dealing with label noise on several benchmark datasets.
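
The abstract does not state the unbiased estimator explicitly. A minimal sketch of one standard construction is given below, assuming labels in {-1, +1} and noise rates that are known in advance. The names `rho_pos`, `rho_neg`, `unbiased_loss`, and `expected_noisy_loss` are hypothetical and chosen for illustration only, and the logistic loss is used merely as an example surrogate. The corrected loss is built so that its expectation over the random label flip equals the loss on the clean label.

```python
import math


def logistic_loss(t, y):
    """Logistic loss log(1 + exp(-y * t)) for a score t and a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * t))


def unbiased_loss(loss, t, y_noisy, rho_pos, rho_neg):
    """Noise-corrected loss evaluated on an observed (possibly flipped) label.

    rho_pos: assumed probability that a true +1 label is flipped to -1
    rho_neg: assumed probability that a true -1 label is flipped to +1
    The correction is only defined when rho_pos + rho_neg < 1.
    """
    if rho_pos + rho_neg >= 1:
        raise ValueError("noise rates must satisfy rho_pos + rho_neg < 1")
    # Flip rate of the observed class and of the opposite class.
    rho_same = rho_pos if y_noisy == 1 else rho_neg
    rho_other = rho_neg if y_noisy == 1 else rho_pos
    numerator = (1 - rho_other) * loss(t, y_noisy) - rho_same * loss(t, -y_noisy)
    return numerator / (1 - rho_pos - rho_neg)


def expected_noisy_loss(loss, t, y_true, rho_pos, rho_neg):
    """Expectation of the corrected loss over the random flip of y_true."""
    rho = rho_pos if y_true == 1 else rho_neg
    kept = unbiased_loss(loss, t, y_true, rho_pos, rho_neg)
    flipped = unbiased_loss(loss, t, -y_true, rho_pos, rho_neg)
    return (1 - rho) * kept + rho * flipped
```

Averaging `unbiased_loss` over a noisy sample then gives an estimate of the clean empirical risk, provided the assumed noise rates are accurate. In practice the rates are usually unknown and would have to be estimated or tuned, for example by cross-validation.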

Author Information

Nagarajan Natarajan (Microsoft Research, India)
Inderjit S Dhillon (UT Austin & Amazon)
Pradeep Ravikumar (Carnegie Mellon University)
Ambuj Tewari (University of Michigan)

More from the Same Authors