
Building Domain-Specific Search Engines with Machine Learning Techniques (1999)  (23 citations)
Andrew McCallum, Kamal Nigam, Jason Rennie, Kristie Seymore
Proc. AAAI-99 Spring Symposium on Intelligent Agents in Cyberspace, 1999.



View or download:
cmu.edu/~mccallum/pa...coraaaaiss98.ps
cmu.edu/~knigam/pa...oraaaaiss99.ps.gz
cmu.edu/~mccallum/p...coraaaaiss98s.ps

From:  orst.edu/~dambrosi/uaiarc...0309 (more)
From:  cmu.edu/~mccallum/

Abstract: Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with the general, Web-wide search engines. For example, www.campsearch.com allows complex queries by age group, size, location and cost over summer camps. Unfortunately, these domain-specific search engines are difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of...

Context of citations to this paper:

...as a new approach to topic specific web resource discovery. This work was also done largely at IBM's Almaden research center. Cora [23, 24, 25] is a computer science research paper search engine that uses reinforcement learning to guide its focused crawler. Rennie and McCallum...

Cited by:
Quality and Relevance of Domain-Specific Search: A Case Study.. - Mental Health Thanh
Focused crawling in depression portal search: A feasibility study - Thanh Tin Tang
Using Generic Corpora to Learn Domain-Specific Terminology - David Vogel (2003)

Similar documents (at the sentence level):
47.0%:   Building Domain-Specific Search Engines with Machine .. - McCallum, Nigam.. (1999)
15.9%:   Automating the Construction of Internet Portals with.. - McCallum, Nigam..
14.0%:   Efficient Web Spidering with Reinforcement Learning - Rennie, McCallum (1999)

Active bibliography (related documents):
0.2:   Tagging English text with a probabilistic model - Merialdo (1993)
0.2:   Text Classification by Bootstrapping with Keywords, EM and.. - McCallum, Nigam (1999)
0.2:   Layered Learning - Stone, Veloso (2000)

Related documents from co-citation:
7:   Focused crawling using context graphs - Diligenti, Coetzee et al. - 2000
6:   Focused crawling: a new approach to topic-specific Web resource discovery - Chakrabarti, van den Berg et al. - 1999
5:   Information extraction using hidden Markov models - Leek - 1997

BibTeX entry:

Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. Building domain-specific search engines with machine learning techniques. In AAAI-99 Spring Symposium on Intelligent Agents in Cyberspace, 1999. http://citeseer.ist.psu.edu/mccallum99building.html

@inproceedings{mccallum99building,
    author = "Andrew McCallum and Kamal Nigam and Jason Rennie and Kristie Seymore",
    title = "Building domain-specific search engines with machine learning techniques",
    booktitle = "Proc. {AAAI}-99 Spring Symposium on Intelligent Agents in Cyberspace",
    year = "1999",
    url = "citeseer.ist.psu.edu/mccallum99building.html"
}

Citations (may not include all citations):
1362   A tutorial on hidden Markov models and selected applications.. - Rabiner - 1989
326   An inequality and associated maximization technique in stati.. - Baum - 1972
262   Statistical Language Learning - Charniak - 1993
234   Dynamic Programming - Bellman - 1957
189   Webwatcher: A tour guide for the World Wide Web - Joachims, Freitag et al. - 1997
149   Learning to extract symbolic knowledge from the World Wide W.. - Craven, DiPasquo et al. - 1998
140   A comparison of event models for naive Bayes text classifica.. - McCallum, Nigam - 1998
103   Naive (Bayes) at forty: The independence assumption in information retriev.. - Lewis - 1998
91   Nymble: a high-performance learning name-finder - Bikel, Miller et al. - 1997
88   Bayes and Empirical Bayes Methods for Data Analysis - Carlin, Louis - 1996
69   On structuring probabilistic dependences in stochastic langu.. - Ney, Essen et al. - 1994
58   Efficient crawling through URL ordering - Cho, Garcia-Molina et al. - 1998
57   Bayesian Learning of Probabilistic Language Models - Stolcke - 1994
42   A web-based information system that reasons with structured .. - Cohen - 1998
40   ARACHNID: Adaptive retrieval agents choosing heuristic neigh.. - Menczer - 1997
39   A hidden Markov model approach to text segmentation and even.. - Yamron, Carp et al. - 1998
37   Information extraction using hidden Markov models - Leek - 1997
18   Improving text classification by shrinkage in a hierarchy of .. - McCallum, Rosenfeld et al. - 1998
16   Error bounds for convolutional codes and an asymptotically op.. - Viterbi - 1967
8   Regression using classification algorithms - Torgo, Gama - 1997
8   Machine Learning - ICML, Mitchell - 1997
5   A public digital library based on full-text retrieval: Colle.. - Witten, Nevill-Manning et al. - 1998
4   al Statistical Society, Series B 39(1):1--38. Hofmann, T., a.. - from, via et al. - 1998
4   and Moore - Kaelbling, Littman - 1996
2   and Rubin - AAAI-, Dempster et al. - 1977
1   Wrapper Induction for Information Extraction - AAAI- - 1997
1   Text classification from labeled and unlabeled documents usi.. - Speech, -- et al. - 1999
1   Modeling web sources for information integration - learning, survey et al. - 1998


