In Learning with Label Proportions (LLP), the goal is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known. This setting has broad practical relevance, in particular for privacy-preserving data processing. We first show that the mean operator, a statistic that aggregates all labels, is minimally sufficient for minimizing many proper scoring losses with linear (or kernelized) classifiers, without using labels. We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer, with guaranteed approximation bounds. We then present an iterative learning algorithm that uses this estimate as initialization and optimizes tractable bounds for the corresponding bag-empirical risk. We ground this algorithm in Rademacher-style generalization bounds tailored to the LLP setting, introducing a generalization of Rademacher complexity and a Label Proportion Complexity measure. Experiments on fourteen domains, ranging in size up to 300K observations, show that our algorithms are scalable and tend to consistently outperform the state of the art in LLP. Moreover, in many cases our algorithms compete with, or are only a few percent of AUC away from, the Oracle that learns knowing all labels. On the largest domains, half a dozen label proportions can suffice, i.e., roughly 40K times fewer than the total number of labels.
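To make the mean-operator idea concrete, the sketch below (Python with NumPy/SciPy; the function names and the toy bag generator are ours, not the paper's code) rewrites the logistic loss so that labels enter only through the mean operator mu = E[yx], and estimates mu from bag proportions under the simplifying assumption that class-conditional means are shared across bags (the plain Mean Map estimator; the paper's Laplacian Mean Map relaxes this assumption with a manifold regularizer, omitted here).

```python
import numpy as np
from scipy.optimize import minimize

def mean_map_estimate(bags, proportions):
    """Estimate the mean operator mu = E[y x] from bag means and label
    proportions. Simplifying assumption: class-conditional means are shared
    across bags, so each bag mean decomposes as
        b_j = pi_j * mu_plus + (1 - pi_j) * mu_minus.
    (The paper's Laplacian Mean Map relaxes this with a manifold regularizer.)
    """
    bag_means = np.array([b.mean(axis=0) for b in bags])      # (n_bags, d)
    sizes = np.array([len(b) for b in bags], dtype=float)
    pi = np.asarray(proportions, dtype=float)
    Pi = np.column_stack([pi, 1.0 - pi])                      # (n_bags, 2)
    # Least-squares solve of the n_bags decomposition equations.
    M, *_ = np.linalg.lstsq(Pi, bag_means, rcond=None)
    mu_plus, mu_minus = M[0], M[1]
    w = sizes / sizes.sum()                                   # bag weights m_j / m
    # mu = sum_j w_j * (pi_j * mu_plus - (1 - pi_j) * mu_minus)
    return np.dot(w, pi) * mu_plus - np.dot(w, 1.0 - pi) * mu_minus

def label_free_logistic_risk(theta, X, mu):
    """Logistic risk without labels. Since
        log(1 + e^{-y z}) = (log(1 + e^{-z}) + log(1 + e^{z})) / 2 - y z / 2,
    the empirical risk equals a label-free symmetric term plus
    -(1/2) theta . mu_hat, with mu_hat = (1/m) sum_i y_i x_i:
    labels enter only through the mean operator."""
    z = X @ theta
    sym = 0.5 * (np.logaddexp(0.0, -z) + np.logaddexp(0.0, z)).mean()
    return sym - 0.5 * np.dot(theta, mu)

# Toy LLP demo: two Gaussian classes split into bags with known proportions.
rng = np.random.default_rng(0)
d, n_bags, bag_size = 5, 8, 200
mu_pos, mu_neg = np.full(d, 1.0), np.full(d, -1.0)
bags, proportions = [], []
for _ in range(n_bags):
    pi_j = rng.uniform(0.1, 0.9)
    n_pos = int(round(pi_j * bag_size))
    X_j = np.vstack([rng.normal(mu_pos, 1.0, (n_pos, d)),
                     rng.normal(mu_neg, 1.0, (bag_size - n_pos, d))])
    bags.append(X_j)
    proportions.append(n_pos / bag_size)

mu_hat = mean_map_estimate(bags, proportions)
X_all = np.vstack(bags)                                       # no labels used
res = minimize(label_free_logistic_risk, np.zeros(d), args=(X_all, mu_hat))
print("learned linear classifier:", res.x)
```

The resulting classifier could then serve as the initialization for the paper's iterative procedure, which alternates between assigning labels consistent with the bag proportions and refitting; that step is not shown here.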
Author Information
Giorgio Patrini (Australian National University / NICTA)
Richard Nock (Data61, the Australian National University and the University of Sydney)
Tiberio Caetano (NICTA Canberra)
Paul Rivera
Related Events (a corresponding poster, oral, or spotlight)
- 2014 Poster: (Almost) No Label No Cry
  Thu. Dec 11th 12:00 -- 04:59 AM, Room Level 2, room 210D
More from the Same Authors
- 2017 Poster: f-GANs in an Information Geometric Nutshell
  Richard Nock · Zac Cranko · Aditya K Menon · Lizhen Qu · Robert Williamson
- 2017 Spotlight: f-GANs in an Information Geometric Nutshell
  Richard Nock · Zac Cranko · Aditya K Menon · Lizhen Qu · Robert Williamson
- 2016 Poster: A scaled Bregman theorem with applications
  Richard Nock · Aditya Menon · Cheng Soon Ong
- 2016 Poster: On Regularizing Rademacher Observation Losses
  Richard Nock
- 2015 Workshop: Learning and privacy with incomplete data and weak supervision
  Giorgio Patrini · Tony Jebara · Richard Nock · Dimitrios Kotzias · Felix Xinnan Yu
- 2012 Poster: Learning as MAP Inference in Discrete Graphical Models
  Tiberio Caetano · Xianghang Liu · James Petterson
- 2012 Poster: A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation
  Aaron Defazio · Tiberio Caetano
- 2012 Session: Oral Session 8
  Tiberio Caetano
- 2012 Spotlight: A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation
  Aaron Defazio · Tiberio Caetano
- 2011 Workshop: Philosophy and Machine Learning
  Marcello Pelillo · Joachim M Buhmann · Tiberio Caetano · Bernhard Schölkopf · Larry Wasserman
- 2011 Poster: Submodular Multi-Label Learning
  James Petterson · Tiberio Caetano
- 2010 Poster: Word Features for Latent Dirichlet Allocation
  James Petterson · Alexander Smola · Tiberio Caetano · Wray L Buntine · Shravan M Narayanamurthy
- 2010 Poster: Reverse Multi-Label Learning
  James Petterson · Tiberio Caetano
- 2010 Poster: Multitask Learning without Label Correspondences
  Novi Quadrianto · Alexander Smola · Tiberio Caetano · S.V.N. Vishwanathan · James Petterson
- 2009 Workshop: Learning with Orderings
  Tiberio Caetano · Carlos Guestrin · Jonathan Huang · Risi Kondor · Guy Lebanon · Marina Meila
- 2009 Poster: Convex Relaxation of Mixture Regression with Efficient Algorithms
  Novi Quadrianto · Tiberio Caetano · John Lim · Dale Schuurmans
- 2009 Poster: Exponential Family Graph Matching and Ranking
  James Petterson · Tiberio Caetano · Julian J McAuley · Jin Yu
- 2008 Poster: Robust Near-Isometric Matching via Structured Learning of Graphical Models
  Julian J McAuley · Tiberio Caetano · Alexander Smola