
A Sequential Algorithm for Training Text Classifiers (1994)  (135 citations)
David D. Lewis, William A. Gale
Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval




 

Abstract: The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be...
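
The sequential sampling idea named in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' classifier or experimental setup: it assumes a one-dimensional toy pool, a hand-written logistic regression as the probabilistic classifier, and hypothetical helper names (train_logistic, predict_proba, uncertainty_sampling). At each round it asks the labeling oracle about the unlabeled item whose predicted class probability is closest to 0.5, then retrains.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, epochs=200, lr=0.5):
    """Fit a 1-D logistic regression (weight, bias) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Estimated probability that x belongs to class 1."""
    w, b = model
    return sigmoid(w * x + b)


def uncertainty_sampling(pool, oracle, seed_indices, rounds):
    """Label the seed items, then repeatedly label the most uncertain item.

    pool: list of feature values (toy assumption: one float per item)
    oracle: function returning the true label (0 or 1) for a feature value
    Returns the final model and a dict mapping pool index to label.
    """
    labeled = {i: oracle(pool[i]) for i in seed_indices}
    model = train_logistic([pool[i] for i in labeled],
                           [labeled[i] for i in labeled])
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # The most uncertain item is the one with probability nearest 0.5.
        pick = min(unlabeled,
                   key=lambda i: abs(predict_proba(model, pool[i]) - 0.5))
        labeled[pick] = oracle(pool[pick])
        model = train_logistic([pool[i] for i in labeled],
                               [labeled[i] for i in labeled])
    return model, labeled
```

For example, with pool = [i / 20 for i in range(21)], an oracle that returns 1 when x >= 0.5, and seeds [0, 20], five rounds label five further items, each chosen because the current model was least sure of it.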

Cited by:
Active Learning with Multiple Views - Muslea (2002)
Unsupervised Activity Recognition Using Automatically.. - Wyatt, Philipose.. (2005)
Learning to Extract Entities from Labeled and Unlabeled Text - Jones (2005)

Active bibliography (related documents):
1.2:   Heterogeneous Uncertainty Sampling for Supervised Learning - Lewis, Catlett (1994)
0.4:   Probabilistic Information Retrieval as Combination of.. - Fuhr, Pfeifer (1994)
0.3:   An Integrated System for Filtering News and.. - Amati, D'Aloisi..

Similar documents based on text:
0.2:   The Projective Geometry of the Gale Transform - Eisenbud, Popescu (1991)
0.2:   Fax: An Alternative to SGML - Kenneth Church William
0.1:   Finding Terminology Translations From Non-Parallel Corpora - Fung, McKeown (1997)

Related documents from co-citation:
36:   Text categorization with Support Vector Machines: Learning with many relevant fe.. - Joachims - 1998
26:   Relevance Feedback in Information Retrieval - Rocchio - 1971
25:   A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categoriza.. - Joachims - 1997

BibTeX entry:

Lewis, D. D. (1995). A sequential algorithm for training text classifiers: Corrigendum and additional data. SIGIR Forum, 29 (2), 13--19. http://citeseer.ist.psu.edu/lewis94sequential.html

@inproceedings{ lewis94sequential,
    author = "David D. Lewis and William A. Gale",
    title = "A sequential algorithm for training text classifiers",
    booktitle = "Proceedings of {SIGIR}-94, 17th {ACM} International Conference on Research and Development in Information Retrieval",
    publisher = "Springer Verlag, Heidelberg, DE",
    address = "Dublin, IE",
    editor = "W. Bruce Croft and Cornelis J. van Rijsbergen",
    pages = "3--12",
    year = "1994",
    url = "http://citeseer.ist.psu.edu/lewis94sequential.html" }
Citations (may not include all citations):
520   Generalized Linear Models - McCullagh, Nelder - 1989
441   Queries and concept learning - Angluin - 1988
274   Generalization as search - Mitchell - 1982
271   Improving retrieval performance by relevance feedback - Salton, Buckley - 1990
153   Sampling Techniques - Cochran - 1977
94   A method for disambiguating word senses in a large corpus - Gale, Church et al. - 1993
83   Query by committee - Seung, Opper et al. - 1992
75   The evidence framework applied to classification networks - MacKay - 1992
73   An evaluation of phrasal and clustered representations on a .. - Lewis - 1992
46   Selecting concise training sets from clean data - Plutowski, White - 1993
39   IEEE Transactions on Information Theory - Hart, nearest - 1968
37   Models for retrieval with probabilistic indexing - Fuhr - 1989
32   Query-based learning applied to partially trained multilayer.. - Hwang, Choi et al. - 1991
23   Automatic indexing: An experimental inquiry - Maron - 1961
20   and query by committee - Freund, Seung et al. - 1992
17   Some inconsistencies and misnomers in probabilistic informat.. - Cooper - 1991
16   Probabilistic retrieval based on staged logistic regression - Cooper, Gey et al. - 1992
7   Attentional focus training by boundary region data selection - Davis, Hwang - 1992
7   Intelligent high-volume text processing using shallow - Hayes - 1992
6   Improved training via incremental learning - Utgoff - 1989
6   The automatic indexing system AIR/PHYS---from research to ap.. - Biebricher, Fuhr et al. - 1988
6   Combining model-oriented and description-oriented approaches.. - Fuhr, Pfeifer
3   A brief history of sequential analysis - Ghosh - 1991
2   Improving generalization with self-directed learning - Cohn, Atlas et al. - 1992





Documents on the same site (http://www.cora.jprc.com/Information_Retrieval/Retrieval/index.html):
Automated Classification Of Encounter Notes In A Computer Based.. - Aronow (1994)
Advantages of Query Biased Summaries in Information Retrieval - Tombros, Sanderson (1998)
The Hardware/Software Balancing Act for Information.. - Lu, McKinley, Cahoon
