Semi-supervised learning, i.e., learning from both labeled and unlabeled data, has received significant attention in the machine learning literature in recent years. Still, our understanding of the theoretical foundations of the usefulness of unlabeled data remains somewhat limited. The simplest and best understood situation is when the data is described by an identifiable mixture model, and each class comes from a pure component. This natural setup and its implications were analyzed in \cite{ratsaby95,castelli96a}. One important result was that in certain regimes, labeled data becomes exponentially more valuable than unlabeled data. However, in most realistic situations one would not expect the data to come from a parametric mixture distribution with identifiable components. There have been recent efforts to analyze the nonparametric situation; for example, ``cluster'' and ``manifold'' assumptions have been suggested as a basis for analysis. Still, a satisfactory and fairly complete theoretical understanding of the nonparametric problem, similar to that in \cite{ratsaby95,castelli96a}, has not yet been developed. In this paper we investigate an intermediate situation, in which the data comes from a probability distribution that can be modeled, but not perfectly, by an identifiable mixture distribution. This seems applicable to many situations, for example, when a mixture of Gaussians is used to model the data. The contribution of this paper is an analysis of the role of labeled and unlabeled data depending on the amount of imperfection in the model.
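To make the pure-component setting concrete, here is a minimal sketch, not taken from the paper, of how unlabeled and labeled data play different roles in the identifiable-mixture regime: unlabeled data alone can estimate the mixture components, and a handful of labeled points then resolves which component corresponds to which class. All names, parameters, and the use of scikit-learn's GaussianMixture are illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): unlabeled data identifies a
# two-component Gaussian mixture; a few labeled points assign classes.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic data in which each class is a "pure component" of the mixture.
# Means, variances, and sample sizes are arbitrary choices for illustration.
means = np.array([[-2.0], [2.0]])
n_unlabeled, n_labeled_per_class = 1000, 5
X_unlabeled = np.vstack(
    [rng.normal(m, 1.0, size=(n_unlabeled // 2, 1)) for m in means]
)
X_labeled = np.vstack(
    [rng.normal(m, 1.0, size=(n_labeled_per_class, 1)) for m in means]
)
y_labeled = np.repeat([0, 1], n_labeled_per_class)

# Step 1: unlabeled data estimates the mixture (up to component permutation).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_unlabeled)

# Step 2: a few labeled examples map each component to a class by majority vote.
comp_of_labeled = gmm.predict(X_labeled)
class_of_component = np.zeros(2, dtype=int)
for c in range(2):
    mask = comp_of_labeled == c
    if mask.any():
        labels, counts = np.unique(y_labeled[mask], return_counts=True)
        class_of_component[c] = labels[np.argmax(counts)]

# Resulting classifier: predict a component, then map it to a class.
X_test = np.array([[-1.5], [1.5]])
print(class_of_component[gmm.predict(X_test)])
```

In this idealized regime the unlabeled data does most of the estimation work, and the labeled data is needed only to disambiguate the component-to-class assignment, which is one intuition behind results of the kind cited above; when the mixture model fits the true distribution only imperfectly, as the paper studies, this clean division of labor no longer holds exactly.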
Author Information
Kaushik Sinha (The Ohio State University)
Mikhail Belkin (The Ohio State University)
More from the Same Authors

2018 Poster: Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate »
Mikhail Belkin · Daniel Hsu · Partha Mitra 
2017 Poster: Diving into the shallows: a computational perspective on large-scale shallow learning »
Siyuan Ma · Mikhail Belkin 
2017 Spotlight: Diving into the shallows: a computational perspective on large-scale shallow learning »
Siyuan Ma · Mikhail Belkin 
2016 Poster: Graphons, mergeons, and so on! »
Justin Eldridge · Mikhail Belkin · Yusu Wang 
2016 Oral: Graphons, mergeons, and so on! »
Justin Eldridge · Mikhail Belkin · Yusu Wang 
2016 Poster: Clustering with Bregman Divergences: an Asymptotic Analysis »
Chaoyue Liu · Mikhail Belkin 
2015 Poster: A Pseudo-Euclidean Iteration for Optimal Recovery in Noisy ICA »
James R Voss · Mikhail Belkin · Luis Rademacher 
2014 Poster: Learning with Fredholm Kernels »
Qichao Que · Mikhail Belkin · Yusu Wang 
2013 Workshop: Modern Nonparametric Methods in Machine Learning »
Arthur Gretton · Mladen Kolar · Samory Kpotufe · John Lafferty · Han Liu · Bernhard Schölkopf · Alexander Smola · Rob Nowak · Mikhail Belkin · Lorenzo Rosasco · Peter Bickel · Yue Zhao 
2013 Poster: Inverse Density as an Inverse Problem: the Fredholm Equation Approach »
Qichao Que · Mikhail Belkin 
2013 Poster: Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis »
James R Voss · Luis Rademacher · Mikhail Belkin 
2013 Spotlight: Inverse Density as an Inverse Problem: the Fredholm Equation Approach »
Qichao Que · Mikhail Belkin 
2012 Poster: Near-optimal Differentially Private Principal Components »
Kamalika Chaudhuri · Anand D Sarwate · Kaushik Sinha 
2011 Poster: Data Skeletonization via Reeb Graphs »
Xiaoyin Ge · Issam I Safa · Mikhail Belkin · Yusu Wang 
2009 Poster: Semi-supervised Learning using Sparse Eigenfunction Bases »
Kaushik Sinha · Mikhail Belkin 
2007 Poster: The Value of Labeled and Unlabeled Examples when the Model is Imperfect »
Kaushik Sinha · Mikhail Belkin 
2006 Poster: On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts »
Hariharan Narayanan · Mikhail Belkin · Partha Niyogi 
2006 Poster: Convergence of Laplacian Eigenmaps »
Mikhail Belkin · Partha Niyogi