A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization (1997)

by Thorsten Joachims
Citations: 455 (1 self)

BibTeX

@INPROCEEDINGS{Joachims97aprobabilistic,
    author = {Thorsten Joachims},
    title = {A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization},
    booktitle = {},
    year = {1997},
    pages = {143--151}
}

Abstract

The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the Rocchio algorithm, particularly the word weighting scheme and the similarity metric. It also suggests improvements which lead to a probabilistic variant of the Rocchio classifier. The Rocchio classifier, its probabilistic variant, and a naive Bayes classifier are compared on six text categorization tasks. The results show that the probabilistic algorithms are preferable to the heuristic Rocchio classifier not only because they are more well-founded, but also because they achieve better performance.
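For readers unfamiliar with the heuristic baseline named in the abstract, the following is a minimal sketch of a Rocchio-style classifier with TF-IDF word weights and cosine similarity. It is not taken from the paper: the function names, the logarithmic IDF form, the `alpha` and `beta` defaults, and the clipping of negative prototype weights to zero are all assumptions chosen for illustration, and the toy data is made up.

```python
import math
from collections import Counter, defaultdict

def normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm else dict(vec)

def train_rocchio(docs, labels, alpha=1.0, beta=0.25):
    """Build one prototype vector per class from tokenized documents.

    alpha and beta are illustrative weights (assumed, not from the paper).
    """
    n = len(docs)
    # Document frequency: number of training documents containing each word.
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log(n / df[w]) for w in df}  # assumed IDF variant
    vectors = [normalize({w: tf * idf[w] for w, tf in Counter(doc).items()})
               for doc in docs]
    prototypes = {}
    for c in set(labels):
        pos = [v for v, y in zip(vectors, labels) if y == c]
        neg = [v for v, y in zip(vectors, labels) if y != c]
        proto = defaultdict(float)
        for v in pos:  # add the mean of in-class documents
            for w, x in v.items():
                proto[w] += alpha * x / len(pos)
        for v in neg:  # subtract the mean of out-of-class documents
            for w, x in v.items():
                proto[w] -= beta * x / len(neg)
        # Assumption: negative weights are clipped to zero.
        prototypes[c] = normalize({w: x for w, x in proto.items() if x > 0})
    return prototypes, idf

def classify(doc, prototypes, idf):
    """Return the class whose prototype has the highest cosine similarity."""
    vec = normalize({w: tf * idf.get(w, 0.0) for w, tf in Counter(doc).items()})
    def cosine(a, b):  # both vectors are already unit length
        return sum(x * b.get(w, 0.0) for w, x in a.items())
    return max(prototypes, key=lambda c: cosine(vec, prototypes[c]))

# Toy usage with hypothetical data.
train_docs = [["ball", "goal", "team"], ["team", "match", "goal"],
              ["stock", "market", "price"], ["market", "trade", "price"]]
train_labels = ["sports", "sports", "finance", "finance"]
prototypes, idf = train_rocchio(train_docs, train_labels)
prediction = classify(["goal", "team", "win"], prototypes, idf)
```

Words unseen during training (such as "win" above) receive zero weight in this sketch, so they do not affect the similarity score.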

Keyphrases

rocchio algorithm, probabilistic analysis, text categorization, rocchio classifier, probabilistic variant, rocchio relevance feedback algorithm, text categorization task, information retrieval, naive bayes classifier, text categorization framework, heuristic rocchio classifier, theoretical insight, probabilistic algorithm
