We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions that can approximately complete the missing views. This situation arises, for example, when learning text classifiers from multilingual collections in which documents are not available in all languages. In that case, Machine Translation (MT) systems may be used to translate each document into the missing languages. We derive a generalization error bound for classifiers learned on examples with multiple artificially created views. Our result uncovers a trade-off between the size of the training set, the number of views, and the quality of the view generating functions. As a consequence, we identify situations in which learning from multiple views is preferable to classical single-view learning. An extension of this framework provides a natural way to leverage unlabeled multi-view data in semi-supervised learning. Experimental results on a subset of the Reuters RCV1/RCV2 collections support our findings by showing that additional views obtained from MT may significantly improve classification performance in the cases identified by our trade-off.
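The setting described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the view names `en` and `fr`, the toy two-dimensional features, the linear maps standing in for MT systems, the nearest-centroid classifier trained per view, and the majority vote used to combine the per-view predictions.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Vector = List[float]
Example = Dict[str, Optional[Vector]]  # view name -> features, None if unobserved
Generator = Callable[[Vector], Vector]


def complete_views(example: Example,
                   generators: Dict[Tuple[str, str], Generator]) -> Example:
    """Fill each missing view by applying a view generating function
    to the first observed view for which such a function exists."""
    completed = dict(example)
    for target, value in example.items():
        if value is not None:
            continue
        for source, source_value in example.items():
            if source_value is not None and (source, target) in generators:
                completed[target] = generators[(source, target)](source_value)
                break
    return completed


def train_centroids(examples: Sequence[Example], labels: Sequence[Hashable],
                    view: str) -> Dict[Hashable, Vector]:
    """Nearest-centroid model for one view: the mean feature vector per class."""
    sums: Dict[Hashable, Vector] = {}
    counts: Dict[Hashable, int] = {}
    for example, label in zip(examples, labels):
        x = example[view]
        if x is None:
            continue
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(example: Example,
            models: Dict[str, Dict[Hashable, Vector]]) -> Hashable:
    """Majority vote over the per-view nearest-centroid predictions."""
    votes: Dict[Hashable, int] = {}
    for view, centroids in models.items():
        x = example[view]
        if x is None:
            continue
        nearest = min(
            centroids,
            key=lambda label: sum((a - b) ** 2
                                  for a, b in zip(x, centroids[label])),
        )
        votes[nearest] = votes.get(nearest, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical stand-ins for MT systems: simple linear maps between two views.
generators = {
    ("en", "fr"): lambda v: [2.0 * x for x in v],
    ("fr", "en"): lambda v: [x / 2.0 for x in v],
}

train = [
    {"en": [1.0, 0.0], "fr": None},
    {"en": [0.9, 0.1], "fr": [1.8, 0.2]},
    {"en": [0.0, 1.0], "fr": None},
    {"en": None, "fr": [0.2, 1.8]},
]
labels = [0, 0, 1, 1]

# Complete the training set, then learn one classifier per view.
completed_train = [complete_views(example, generators) for example in train]
models = {view: train_centroids(completed_train, labels, view)
          for view in ("en", "fr")}

# Test examples observed in a single view are completed before prediction.
test_a = complete_views({"en": [0.8, 0.2], "fr": None}, generators)
test_b = complete_views({"en": None, "fr": [0.4, 1.6]}, generators)
prediction_a = predict(test_a, models)
prediction_b = predict(test_b, models)
```

In this toy setup the generated views are exact transformations of the observed ones. With real MT output the generated views would be noisy, which is where the trade-off among training set size, number of views, and view quality described in the abstract becomes relevant.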
Author Information
Massih R Amini (University Joseph Fourier)
Nicolas Usunier (Université Pierre et Marie Curie)
Cyril Goutte (National Research Council Canada)
More from the Same Authors
- 2012 Poster: On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking
  Clément Calauzènes · Nicolas Usunier · Patrick Gallinari
- 2012 Oral: On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking
  Clément Calauzènes · Nicolas Usunier · Patrick Gallinari
- 2008 Poster: A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning
  Massih R Amini · Nicolas Usunier · Francois Laviolette
- 2008 Spotlight: A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning
  Massih R Amini · Nicolas Usunier · Francois Laviolette
- 2006 Workshop: Machine Learning for Multilingual Information Access
  Cyril Goutte
- 2006 Poster: PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier
  Alexandre Lacasse · Francois Laviolette · Mario Marchand · Pascal Germain · Nicolas Usunier
- 2006 Spotlight: PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier
  Alexandre Lacasse · Francois Laviolette · Mario Marchand · Pascal Germain · Nicolas Usunier