@MISC{Nigam_pool-basedactive, author = {Kamal Nigam}, title = {Pool-Based Active Learning for Text Classification}, year = {} }
Abstract
This paper shows how a text classifier’s need for labeled training documents can be reduced by employing a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method of active learning to use the unlabeled pool by explicitly estimating document density when selecting examples for labeling. Then active learning is combined with Expectation-Maximization in order to “fill in” the class labels of those documents that remain unlabeled. Experimental results show that the improvements to active learning reduce the need for labelings by one-third over previous QBC approaches, and that the combination of EM and active learning requires only slightly more than half as many labeled training examples to achieve the same accuracy as either EM or active learning alone.
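
The selection step described in the abstract, weighting committee disagreement by an estimate of document density, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: disagreement is measured as the mean KL divergence of each committee member's class distribution from the committee average, density is approximated by the average cosine similarity to the other pool documents, and the two are combined by multiplication. The function names (`kl_to_mean`, `density`, `select_query`) are hypothetical.

```python
import math


def kl_to_mean(committee_probs):
    """Mean KL divergence of each member's class distribution from the
    committee average (an assumed disagreement measure)."""
    n_members = len(committee_probs)
    n_classes = len(committee_probs[0])
    mean = [sum(p[c] for p in committee_probs) / n_members
            for c in range(n_classes)]
    total = 0.0
    for p in committee_probs:
        for c in range(n_classes):
            if p[c] > 0.0:  # mean[c] > 0 whenever some p[c] > 0
                total += p[c] * math.log(p[c] / mean[c])
    return total / n_members


def cosine(u, v):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def density(i, pool):
    """Average similarity of document i to the rest of the pool
    (an assumed stand-in for a density estimate)."""
    sims = [cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def select_query(pool, committee_predictions):
    """Return the index of the pool document with the highest
    density-weighted disagreement score."""
    scores = [density(i, pool) * kl_to_mean(committee_predictions[i])
              for i in range(len(pool))]
    return max(range(len(pool)), key=scores.__getitem__)


# Toy pool: documents 0 and 1 are near-duplicates, document 2 is an outlier.
pool = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]
# Two committee members, two classes; members disagree on documents 0 and 2.
committee_predictions = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
]
chosen = select_query(pool, committee_predictions)
```

In this toy pool the committee disagrees equally on documents 0 and 2, but document 2 shares no terms with the rest of the pool, so the density weighting favors document 0. Disagreement alone would not separate the two.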