Combining labeled and unlabeled data with co-training (1998)

by Avrim Blum and Tom Mitchell
Citations: 1632 (28 self)

BibTeX

@INPROCEEDINGS{Blum98combininglabeled,
    author = {Avrim Blum and Tom Mitchell},
    title = {Combining labeled and unlabeled data with co-training},
    booktitle = {Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT)},
    year = {1998},
    pages = {92--100},
    publisher = {Morgan Kaufmann Publishers}
}


Abstract

We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the task of learning to classify web pages. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm's predictions on new unlabeled examples are used to enlarge the training set of the other. Our goal in this paper is to provide a PAC-style analysis for this setting, and, more broadly, a PAC-style framework for the general problem of learning from both labeled and unlabeled data. We also provide empirical results on real web-page data indicating that this use of unlabeled examples can lead to significant improvement of hypotheses in practice. As part of our analysis, we provide new results …
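
The training loop the abstract describes is concrete enough to sketch. Below is a minimal, self-contained illustration of that loop, not the paper's algorithm as published: the synthetic two-view data, the logistic-regression learners, and every size (20 initial labels, 2 positives and 2 negatives added per view per round, 30 rounds) are assumptions made for this sketch. The paper's own experiments reportedly use naive Bayes classifiers over page words versus hyperlink words, with their own pool and increment sizes.

# Minimal co-training sketch (illustrative assumptions throughout).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_views(m):
    # Each example gets a label and two "views" whose noise is
    # independent given the label; either view alone is sufficient
    # for learning when enough labels are available.
    labels = rng.integers(0, 2, m)
    v1 = labels[:, None] + rng.normal(0.0, 1.0, (m, 5))
    v2 = labels[:, None] + rng.normal(0.0, 1.0, (m, 5))
    return labels, v1, v2

y, view1, view2 = make_views(1000)
X = (view1, view2)

L_idx = list(range(20))            # small labeled set L
L_y = [int(c) for c in y[:20]]
unlabeled = list(range(20, 1000))  # large unlabeled pool U

for _ in range(30):                # co-training rounds
    # Retrain one classifier per view on the current labeled set.
    clfs = [LogisticRegression().fit(X[v][L_idx], L_y) for v in (0, 1)]
    for v in (0, 1):
        if len(unlabeled) < 4:
            break
        # The view-v classifier self-labels the unlabeled examples it
        # is most confident about; they join the shared labeled set,
        # so the other view's learner trains on them next round.
        p_pos = clfs[v].predict_proba(X[v][unlabeled])[:, 1]
        picks = {int(i): 1 for i in np.argsort(p_pos)[-2:]}      # confident positives
        picks.update({int(i): 0 for i in np.argsort(p_pos)[:2]}) # confident negatives
        for i in sorted(picks, reverse=True):   # pop high indices first
            L_idx.append(unlabeled.pop(i))
            L_y.append(picks[i])

# Compare against a learner that only ever sees the 20 true labels.
y_test, t1, _ = make_views(500)
base = LogisticRegression().fit(view1[:20], y[:20])
print("view-1 accuracy, 20 labels only  :", base.score(t1, y_test))
print("view-1 accuracy, after co-training:", clfs[0].score(t1, y_test))

The essential move is that confidence in one view produces training labels for the other, which is how the two redundant views bootstrap each other from a small labeled seed.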

Keyphrases

labeled example, learning algorithm, distinct view, web page, unlabeled example, significant improvement, sufficient, new unlabeled example, small set, training set, empirical result, inexpensive unlabeled data, PAC-style analysis, PAC-style framework, general problem, real web-page data, large unlabeled sample
