Abstract: Consider the task of exploring the Web in order to find pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowledge bases. This paper argues that the creation of efficient web spiders is best framed and solved by reinforcement learning, a branch of machine learning that concerns itself with optimal sequential decision making.
Cited by:
Design of a Crawler with Bounded Bandwidth - Diligenti, Maggini, Pucci.. (2004)
Using URLs and Table Layout for Web Classification Tasks - Lawrence Kai Shih (2004)
Web Page Classification Using Visual Layout Analysis - Kovacevic, Diligenti..
Similar documents (at the sentence level):
40.0%: Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)
8.0%: Efficient Web Spidering with Reinforcement Learning - Rennie, McCallum (1999)
5.9%: Building Domain-Specific Search Engines with Machine .. - McCallum, Nigam.. (1999)
Active bibliography (related documents):
0.2: A Machine Learning Approach to Building.. - McCallum, Nigam.. (1999)
0.1: Information Retrieval on the World Wide Web and.. - Barfourosh.. (2002)
0.1: A Comparison of Probabilistic, Neural, and Fuzzy.. - Gini, Giumelli.. (2002)
Related documents from co-citation:
20: Focused crawling: a new approach to topic-specific Web resource discovery - Chakrabarti, van den Berg et al. - 1999
16: Focused crawling using context graphs - Diligenti, Coetzee et al. - 2000
16: The anatomy of a large-scale hypertextual Web search engine - Brin, Page
BibTeX entry:
Jason Rennie and Andrew McCallum. Using reinforcement learning to spider the Web efficiently. In ICML-99, 1999. http://citeseer.ist.psu.edu/article/rennie99using.html
@inproceedings{ rennie99using,
author = "Jason Rennie and Andrew Kachites McCallum",
title = "Using reinforcement learning to spider the {W}eb efficiently",
booktitle = "Proceedings of {ICML}-99, 16th International Conference on Machine Learning",
publisher = "Morgan Kaufmann Publishers, San Francisco, US",
address = "Bled, Slovenia",
editor = "Ivan Bratko and Saso Dzeroski",
pages = "335--343",
year = "1999",
url = "http://citeseer.ist.psu.edu/article/rennie99using.html" }
Citations (may not include all citations):
976: Machine Learning - Mitchell - 1997
408: Dynamic Programming (Princeton University Press) - Bellman - 1957
189: Webwatcher: A tour guide for the World Wide Web - Joachims, Freitag et al. - 1997
149: Learning to extract symbolic knowledge from the world wide w.. - Craven, DiPasquo et al. - 1998
140: A comparison of event models for naive Bayes text classifica.. - McCallum, Nigam - 1998
103: Naive (Bayes) at forty: The independence assumption in information retriev.. - Lewis - 1998
81: Reinforcement learning: A survey - Kaelbling, Littman et al. - 1996
40: ARACHNID: Adaptive retrieval agents choosing heuristic neigh.. - Menczer - 1997
36: Efficient crawling through URL ordering - Cho, Garcia-Molina et al. - 1998
36: A machine learning architecture for optimizing web search en.. - Boyan, Freitag et al. - 1996
30: Statistical models for co-occurrence data - Hofmann, Puzicha - 1998
23: Building domain-specific search engines with machine learnin.. - McCallum, Nigam et al. - 1999
18: Improving text classification by shrinkage in a hierarchy of .. - McCallum, Rosenfeld et al. - 1998
8: Regression using classification algorithms - Torgo, Gama - 1997
Documents on the same site (http://www.ai.mit.edu/~jrennie/):
Building Domain-Specific Search Engines with Machine .. - McCallum, Nigam.. (1999)
Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)
CiteSeer.IST - Copyright Penn State and NEC