Abstract: Consider the task of exploring the Web in order to find pages of a
particular kind or on a particular topic. This task arises in the construction
of search engines and Web knowledge bases. This paper argues
that the creation of efficient web spiders is best framed and solved
by reinforcement learning, a branch of machine learning that concerns
itself with optimal sequential decision making. One strength of reinforcement
learning is that it provides a formalism for measuring the
utility of...
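The abstract breaks off before it says what utility is being measured. As a hedged illustration only, the sketch below assumes the standard reinforcement-learning notion of utility as discounted future reward, so finding on-topic pages sooner is worth more. It shows a best-first spider that orders its frontier by an estimated value for each link. The names `GAMMA`, `discounted_return`, `crawl`, and `estimate_q` are hypothetical, the link graph is a toy dictionary, and how the value estimate would be learned is deliberately left out. This is not the paper's implementation.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

GAMMA = 0.5  # hypothetical discount factor: a reward t steps later is worth GAMMA**t


def discounted_return(rewards: List[float], gamma: float = GAMMA) -> float:
    """Utility of a sequence of page visits: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def crawl(
    start: str,
    links: Dict[str, List[str]],         # toy link graph: URL -> outgoing URLs
    is_target: Callable[[str], bool],    # reward 1.0 when a fetched page is on-topic
    estimate_q: Callable[[str], float],  # hypothetical learned value of following a link
    budget: int,                         # maximum number of pages to fetch
) -> Tuple[List[str], float]:
    """Best-first spider: always fetch the frontier URL with the highest estimated value."""
    # heapq is a min-heap, so values are negated to pop the highest estimate first;
    # ties fall back to comparing the URL strings.
    frontier: List[Tuple[float, str]] = [(-estimate_q(start), start)]
    seen: Set[str] = {start}
    visited: List[str] = []
    rewards: List[float] = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        rewards.append(1.0 if is_target(url) else 0.0)
        for nxt in links.get(url, []):
            if nxt not in seen:  # enqueue each URL at most once
                seen.add(nxt)
                heapq.heappush(frontier, (-estimate_q(nxt), nxt))
    # Score the crawl by the discounted reward collected in fetch order.
    return visited, discounted_return(rewards)
```

On a toy graph where the on-topic pages sit under one branch, a good `estimate_q` steers the spider into that branch within a few fetches, while a constant estimate can spend the whole budget elsewhere and collect no reward. The discount is what turns "find relevant pages early" into a single number to maximize.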
Cited by:
Design of a Crawler with Bounded Bandwidth - Diligenti, Maggini, Pucci.. (2004)
Using URLs and Table Layout for Web Classification Tasks - Lawrence Kai Shih (2004)
Web Page Classification Using Visual Layout Analysis - Kovacevic, Diligenti..
Similar documents (at the sentence level):
35.7%: Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)
22.9%: Efficient Web Spidering with Reinforcement Learning - Rennie, McCallum (1999)
17.8%: Building Domain-Specific Search Engines with Machine .. - McCallum, Nigam.. (1999)
Active bibliography (related documents):
0.2: A Machine Learning Approach to Building.. - McCallum, Nigam.. (1999)
0.1: Information Retrieval on the World Wide Web and.. - Barfourosh.. (2002)
0.1: A Comparison of Probabilistic, Neural, and Fuzzy.. - Gini, Giumelli.. (2002)
Related documents from co-citation:
20: Focused crawling: a new approach to topic-specific Web resource discovery
- Chakrabarti, van den Berg et al. - 1999
16: Focused crawling using context graphs
- Diligenti, Coetzee et al. - 2000
16: The anatomy of a large-scale hypertextual Web search engine
- Brin, Page
BibTeX entry:
Jason Rennie and Andrew McCallum. Using reinforcement learning to spider the Web efficiently. In ICML-99, 1999. http://citeseer.ist.psu.edu/article/rennie99using.html
@inproceedings{ rennie99using,
author = "Jason Rennie and Andrew Kachites McCallum",
title = "Using reinforcement learning to spider the {W}eb efficiently",
booktitle = "Proceedings of {ICML}-99, 16th International Conference on Machine Learning",
publisher = "Morgan Kaufmann Publishers, San Francisco, US",
address = "Bled, Slovenia",
editor = "Ivan Bratko and Saso Dzeroski",
pages = "335--343",
year = "1999",
url = "citeseer.ist.psu.edu/article/rennie99using.html" }
Citations (may not include all citations):
976: Machine Learning
- Mitchell - 1997
408: Dynamic Programming (Princeton University Press)
- Bellman - 1957
189: Webwatcher: A tour guide for the World Wide Web
- Joachims, Freitag et al. - 1997
149: Learning to extract symbolic knowledge from the world wide w..
- Craven, DiPasquo et al. - 1998
140: A comparison of event models for naive Bayes text classifica..
- McCallum, Nigam - 1998
103: Naive (Bayes) at forty: The independence assumption in information retriev..
- Lewis - 1998
81: Reinforcement learning: A survey
- Kaelbling, Littman et al. - 1996
58: Efficient crawling through URL ordering
- Cho, Garcia-Molina et al. - 1998
40: ARACHNID: Adaptive retrieval agents choosing heuristic neigh..
- Menczer - 1997
36: A machine learning architecture for optimizing web search en..
- Boyan, Freitag et al. - 1996
23: Building domain-specific search engines with machine learnin..
- McCallum, Nigam et al. - 1999
18: Improving text classification by shrinkage in a hierarchy of ..
- McCallum, Rosenfeld et al. - 1998
8: Regression using classification algorithms
- Torgo, Gama - 1997
Documents on the same site (http://www.cs.cmu.edu/~mccallum/):
Building Domain-Specific Search Engines with Machine .. - McCallum, Nigam.. (1999)
Learning to Classify Text from Labeled and Unlabeled Documents - Nigam (1998)
Distributional Clustering of Words for Text Classification - Baker, McCallum (1998)
CiteSeer.IST - Copyright Penn State and NEC