  Using EM to classify text from labeled and unlabeled documents

by Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom Mitchell
Machine Learning
http://reports-archive.adm.cs.cmu.edu/anon/1998/CMU-CS-98-120.ps

Abstract:

This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is significant because in many important text classification problems obtaining classification labels is expensive, while large quantities of unlabeled documents are readily available. We present a theoretical argument showing that, under common assumptions, unlabeled data contain information about the target function. We then introduce an algorithm for learning from labeled and unlabeled text, based on the combination of Expectation-Maximization with a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates. Experimental results, obtained using text from three different real-world tasks, show that the use of unlabeled data reduces classification error by up to 30%.
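
The train, label, retrain loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`train_nb`, `posterior`, `em_naive_bayes`), the multinomial word-count model, the Laplace smoothing constant `alpha`, and the fixed iteration count `n_iter` are all assumptions made for the example.

```python
import numpy as np

def train_nb(X, R, alpha=1.0):
    """M-step: fit a multinomial naive Bayes model.

    X: word counts, shape (docs, vocab).
    R: soft class memberships, shape (docs, classes); rows sum to 1.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    class_mass = R.sum(axis=0)
    log_prior = np.log((class_mass + alpha) / (class_mass.sum() + alpha * R.shape[1]))
    word_mass = R.T @ X + alpha  # shape (classes, vocab)
    log_word = np.log(word_mass / word_mass.sum(axis=1, keepdims=True))
    return log_prior, log_word

def posterior(X, log_prior, log_word):
    """E-step: class posteriors P(class | document) for each row of X."""
    log_joint = X @ log_word.T + log_prior
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def em_naive_bayes(X_lab, y_lab, X_unlab, n_classes, n_iter=10):
    """Train on labeled data, then alternate E- and M-steps over all documents."""
    R_lab = np.eye(n_classes)[y_lab]  # labeled documents keep their given labels
    log_prior, log_word = train_nb(X_lab, R_lab)  # initial classifier
    X_all = np.vstack([X_lab, X_unlab])
    for _ in range(n_iter):
        # Probabilistically label the unlabeled documents.
        R_unlab = posterior(X_unlab, log_prior, log_word)
        # Retrain on all documents using hard and soft labels together.
        log_prior, log_word = train_nb(X_all, np.vstack([R_lab, R_unlab]))
    return log_prior, log_word
```

To classify new documents with the result, take `posterior(X_new, log_prior, log_word).argmax(axis=1)`. A real run would likely stop on a convergence check of the likelihood rather than a fixed number of iterations.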

Citations

4364 Elements of Information Theory – Cover, Thomas - 1991
4345 Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
961 Text Categorization with Support Vector Machines – Joachims - 1997
573 A Probabilistic Theory of Pattern Recognition – Devroye, Gyorfi, et al. - 1996
559 Relevance feedback in information retrieval – Rocchio - 1971
477 A comparison of event models for Naive Bayes text classification – McCallum, Nigam - 1998
406 Relevance weighting of search terms – Robertson, Jones - 1988
347 Finite Mixture Models – McLachlan, Peel - 2000
339 Bayesian Classification (AutoClass): Theory and Results – Cheeseman, Stutz - 1996
289 Hierarchically classifying documents using very few words – Koller, Sahami - 1997
282 NewsWeeder: learning to filter netnews – Lang - 1995
277 A sequential algorithm for training text classifiers – Lewis, Gale - 1994
256 A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for text categorization – Joachims - 1997
255 Learning to extract symbolic knowledge from the World Wide Web – Craven, DiPasquo, et al. - 1998
250 Syskill & Webert: Identifying interesting web sites – Pazzani, Muramatsu, et al. - 1996
234 Beyond independence: Conditions for the optimality of the simple bayesian classifier – Domingos, Pazzani - 1996
211 A comparison of two learning algorithms for text categorization – Lewis, Ringuette - 1994
188 Context-sensitive learning methods for text categorization – Cohen, Singer - 1999
168 An evaluation of phrasal and clustered representations on a text categorization task – Lewis - 1992
151 On bias, variance, 0/1-loss, and the curse-of-dimensionality – Friedman - 1997
143 Employing EM and pool-based active learning for text classification – McCallum, Nigam - 1998
141 Developments in automatic text retrieval – Salton - 1991
124 Supervised learning from incomplete data via an EM approach – Ghahramani, Jordan - 1994
90 Combining classifiers in text categorization – Larkey, Croft - 1996
80 A mixture of experts classifier with learning based on both labelled and unlabelled data – Miller, Uyar - 1997
75 Active Learning with Committees for Text Categorization – Liere, Tadepalli - 1997
71 The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing param – Castelli, Cover - 1996
67 The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon – Shahshahani, Landgrebe - 1994
58 Feature selection in statistical learning of text categorization – Yang, Pedersen - 1997
52 On the exponential value of labeled samples – Castelli, Cover - 1995
49 Threading electronic mail: A preliminary study – Lewis, Knowles
36 Bayesian classification theory – Hanson, Stutz, et al. - 1991
32 A new probabilistic model of text classification and retrieval – Kalt - 1996
29 Improving text classification by shrinkage in a hierarchy of classes – McCallum, Rosenfeld, et al. - 1998
21 Document classification using a finite mixture model – Li, Yamanishi - 1997
16 Estimations of dependences based on statistical data – Vapnik - 1982
14 An Application of Least Squares Fit Mapping to Text Information Retrieval – Yang, Chute - 1993