The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is only weakly related to the target classes, it becomes questionable whether driving the decision boundary into the low-density regions of the unlabeled data will help classification. In such a case, the cluster assumption may not hold, and how to leverage this type of unlabeled data to improve classification accuracy becomes a challenge. We introduce "Semi-supervised Learning with Weakly-Related Unlabeled Data" (SSLW), an inductive method that builds on the maximum-margin approach to make better use of weakly-related unlabeled information. Although SSLW could improve a wide range of classification tasks, in this paper we focus on text categorization with a small training pool. The key assumption behind this work is that, even when the topics differ, word usage patterns tend to be consistent across corpora. To this end, SSLW estimates the optimal word-correlation matrix that is consistent with both the co-occurrence information derived from the weakly-related unlabeled documents and the labeled documents. For empirical evaluation, we present a direct comparison with a number of state-of-the-art methods for inductive semi-supervised learning and text categorization, and we show that SSLW yields a significant improvement in categorization accuracy when given a small training set and an unlabeled resource that is only weakly related to the test beds.
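The key assumption above (word correlations learned from one corpus transfer to documents on other topics) can be illustrated with a minimal sketch. This is not SSLW: the abstract states that SSLW estimates the word-correlation matrix jointly with the labeled documents inside a maximum-margin framework, whereas the sketch below only derives a fixed correlation matrix `R` from unlabeled documents and uses it in a document similarity x^T R y. The function names, the document-level co-occurrence counts, and the cosine normalization are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_correlation(docs, vocab):
    """Hypothetical helper: cosine-normalized document-level co-occurrence.

    co[i][j] counts unlabeled documents that contain both word i and word j;
    R[i][j] = co[i][j] / sqrt(co[i][i] * co[j][j]). R is a Gram matrix of
    binary word-incidence vectors, so it is positive semidefinite.
    """
    index = {w: k for k, w in enumerate(vocab)}
    n = len(vocab)
    co = [[0] * n for _ in range(n)]
    for doc in docs:
        present = sorted({index[w] for w in doc if w in index})
        for i in present:
            co[i][i] += 1
        for i, j in combinations(present, 2):
            co[i][j] += 1
            co[j][i] += 1
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(co[i][i] * co[j][j])
            R[i][j] = co[i][j] / denom if denom else 0.0
    return R


def bag_of_words(doc, vocab):
    """Term-frequency vector of a tokenized document over a fixed vocabulary."""
    counts = Counter(doc)
    return [float(counts[w]) for w in vocab]


def correlated_similarity(x, y, R):
    """x^T R y: documents that share correlated (not identical) words score > 0."""
    n = len(x)
    return sum(x[i] * R[i][j] * y[j] for i in range(n) for j in range(n))


# Toy "weakly related" unlabeled corpus (hypothetical data).
unlabeled = [["goal", "match", "team"],
             ["team", "match", "score"],
             ["stock", "market", "price"]]
vocab = ["goal", "match", "team", "score", "stock", "market", "price"]
R = cooccurrence_correlation(unlabeled, vocab)

xa = bag_of_words(["goal"], vocab)
xb = bag_of_words(["team"], vocab)
xc = bag_of_words(["stock"], vocab)
print(round(correlated_similarity(xa, xb, R), 4))  # 0.7071 (plain dot product is 0)
print(correlated_similarity(xa, xc, R))            # 0.0 (never co-occur)
```

With a plain bag-of-words dot product, "goal" and "team" look unrelated; the correlation matrix from the unlabeled corpus links them because they co-occur there. Learning `R` together with the classifier, as SSLW does, is beyond this sketch.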
Author Information
Liu Yang (CMU)
Rong Jin (Michigan State University (MSU))
Rahul Sukthankar (Intel Labs and CMU)
Related Events (a corresponding poster, oral, or spotlight)
2008 Spotlight: Semi-supervised Learning with Weakly-Related Unlabeled Data: Towards Better Text Categorization
Tue. Dec 9th 04:36 -- 04:37 AM Room
More from the Same Authors
2014 Poster: Extracting Certainty from Uncertainty: Transductive Pairwise Classification from Pairwise Similarities
Tianbao Yang · Rong Jin
2014 Poster: Top Rank Optimization in Linear Time
Nan Li · Rong Jin · Zhi-Hua Zhou
2013 Poster: Mixed Optimization for Smooth Functions
Mehrdad Mahdavi · Lijun Zhang · Rong Jin
2013 Poster: Linear Convergence with Condition Number Independent Access of Full Gradients
Lijun Zhang · Mehrdad Mahdavi · Rong Jin
2013 Poster: Stochastic Convex Optimization with Multiple Objectives
Mehrdad Mahdavi · Tianbao Yang · Rong Jin
2013 Poster: Buy-in-Bulk Active Learning
Liu Yang · Jaime Carbonell
2013 Poster: Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
Miao Xu · Rong Jin · Zhi-Hua Zhou
2012 Poster: Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison
Tianbao Yang · Yu-Feng Li · Mehrdad Mahdavi · Rong Jin · Zhi-Hua Zhou
2012 Poster: Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning
Jinfeng Yi · Rong Jin · Anil K Jain · Shaili Jain
2012 Poster: Stochastic Gradient Descent with Only One Projection
Mehrdad Mahdavi · Tianbao Yang · Rong Jin · Shenghuo Zhu
2011 Poster: Active Learning with a Drifting Distribution
Liu Yang
2010 Poster: Active Learning by Querying Informative and Representative Examples
Sheng-Jun Huang · Rong Jin · Zhi-Hua Zhou
2010 Poster: Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
Serhat S Bucak · Rong Jin · Anil K Jain
2009 Poster: Adaptive Regularization for Transductive Support Vector Machine
Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu · Zhirong Yang
2009 Spotlight: Adaptive Regularization for Transductive Support Vector Machine
Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu · Zhirong Yang
2009 Poster: Regularized Distance Metric Learning: Theory and Algorithm
Rong Jin · Shijun Wang · Yang Zhou
2009 Poster: Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering
Lei Wu · Rong Jin · Steven Chu-Hong Hoi · Jianke Zhu · Nenghai Yu
2009 Poster: An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
Marius Leordeanu · Martial Hebert · Rahul Sukthankar
2009 Poster: DUOL: A Double Updating Approach for Online Learning
Peilin Zhao · Steven Chu-Hong Hoi · Rong Jin
2009 Poster: Learning to Rank by Optimizing NDCG Measure
Hamed Valizadegan · Rong Jin · Ruofei Zhang · Jianchang Mao
2009 Spotlight: Learning to Rank by Optimizing NDCG Measure
Hamed Valizadegan · Rong Jin · Ruofei Zhang · Jianchang Mao
2008 Poster: Multi-label Multiple Kernel Learning
Shuiwang Ji · Liang Sun · Rong Jin · Jieping Ye
2008 Spotlight: Multi-label Multiple Kernel Learning
Shuiwang Ji · Liang Sun · Rong Jin · Jieping Ye
2008 Poster: An Extended Level Method for Efficient Multiple Kernel Learning
Zenglin Xu · Rong Jin · Irwin King · Michael R Lyu
2007 Poster: Efficient Convex Relaxation for Transductive Support Vector Machine
Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu
2006 Poster: Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Hamed Valizadegan · Rong Jin
2006 Poster: Distributed Inference in Dynamical Systems
Stanislav Funiak · Carlos Guestrin · Mark A Paskin · Rahul Sukthankar