Machine Learning in Automated Text Categorization (2002)

by Fabrizio Sebastiani
Venue: ACM Computing Surveys
Citations: 1728 - 22 self

BibTeX

@ARTICLE{Sebastiani02machinelearning,
    author = {Fabrizio Sebastiani},
    title = {Machine Learning in Automated Text Categorization},
    journal = {ACM COMPUTING SURVEYS},
    year = {2002},
    volume = {34},
    pages = {1--47}
}


Abstract

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
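
As a rough illustration of the three problems named in the abstract, the sketch below builds a tiny text classifier from preclassified documents. It is not taken from the survey: the bag-of-words representation, the multinomial naive Bayes learner with add-one smoothing, the accuracy measure, the function names, and the toy documents are all assumptions chosen to keep the example short and dependency-free.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Document representation: a lowercase bag of words (a deliberately simple choice)."""
    return text.lower().split()


def train(labeled_docs):
    """Classifier construction: collect naive Bayes counts from (text, category) pairs."""
    doc_counts = Counter()              # number of training documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocab = set()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for tok in tokenize(text):
            word_counts[category][tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the category with the highest log-probability under the model."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(doc_counts):
        score = math.log(doc_counts[category] / total_docs)  # category prior
        total_words = sum(word_counts[category].values())
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[category][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best, best_score = category, score
    return best


def accuracy(model, labeled_docs):
    """Classifier evaluation: fraction of held-out documents labeled correctly."""
    correct = sum(classify(model, text) == category for text, category in labeled_docs)
    return correct / len(labeled_docs)


# Hypothetical toy data, invented for this example only.
TRAIN = [
    ("stocks fell as markets closed lower", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the match in extra time", "sport"),
    ("the striker scored two goals", "sport"),
]
TEST = [
    ("interest rates and markets", "finance"),
    ("the team scored goals", "sport"),
]

model = train(TRAIN)
test_accuracy = accuracy(model, TEST)
```

On this toy data both held-out documents are assigned their expected category. Porting the classifier to another domain only requires a different set of preclassified documents, which is the portability advantage the abstract attributes to the machine learning approach.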

Keyphrases

automated text categorization    machine learning    good effectiveness    knowledge engineering approach    general inductive process    expert labor power    predefined category    document representation    digital form    dominant approach    booming interest    main approach    automated categorization    manual definition    different domain    detail issue    considerable saving    different problem    domain expert    last ten year    research community    preclassified document   
