Abstract: Maintaining currency of search engine indices by exhaustive crawling is rapidly becoming impossible due to the increasing size and dynamic content of the web. Focused crawlers aim to search only the subset of the web related to a specific category, and offer a potential solution to the currency problem. The major problem in focused crawling is performing appropriate credit assignment to different documents along a crawl path, such that short-term gains are not pursued at the expense of...
Similar documents based on text:
7.2: Focused Crawling Using Context Graphs - Diligenti, Coetzee, Lawrence.. (2000)
1.7: Efficient Identification of Web Communities - Flake, Lawrence, Giles (2000)
1.3: Feature Selection in Web Applications By ROC.. - Coetzee, Glover.. (2001)
Related documents from co-citation:
41: Focused crawling: a new approach to topic-specific Web resource discovery - Chakrabarti, van den Berg et al. - 1999
25: The anatomy of a large-scale hypertextual Web search engine - Brin, Page
18: Efficient crawling through URL ordering - Cho, Garcia-Molina et al. - 1998
BibTeX entry:
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, "Focused crawling using context graphs," in Proc. Very Large Data Bases 2000 (VLDB 2000), September 2000. To appear. http://citeseer.ist.psu.edu/diligenti00focused.html
@inproceedings{ diligenti00focused,
author = "Michelangelo Diligenti and Frans Coetzee and Steve Lawrence and C. Lee Giles and Marco Gori",
title = "Focused Crawling using Context Graphs",
booktitle = "26th International Conference on Very Large Databases, {VLDB} 2000",
month = "10--14 September",
address = "Cairo, Egypt",
pages = "527--534",
year = "2000",
url = "citeseer.ist.psu.edu/diligenti00focused.html" }
Citations (may not include all citations):
2528: Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird et al. - 1977
1256: An Introduction to Modern Information Retrieval - Salton, McGill - 1983
976: Machine Learning - Mitchell - 1997
576: Authoritative sources in a hyperlinked environment - Kleinberg - 1997
372: An algorithm for suffix stripping - Porter - 1980
163: Improved algorithms for topic distillation in hyperlinked en.. - Bharat, Henzinger - 1998
154: Automatic resource compilation by analyzing hyperlink struct.. - Chakrabarti, Dom et al. - 1998
149: Focused crawling: a new approach to topic-specific web resour.. - Chakrabarti, van den Berg et al. - 1999
58: Efficient crawling through URL ordering - Cho, Garcia-Molina et al. - 1998
52: The connectivity server: Fast access to linkage information .. - Bharat, Broder et al. - 1998
41: Using reinforcement learning to spider the web efficiently - Rennie, McCallum - 1999
32: Automating the construction of internet portals with machine.. - McCallum, Nigam et al.
10: available httpwww - one, Inktomi et al. - 2000
7: Text classification from labeled and unlabelled documents us.. - Nigam, McCallum et al. - 1999
7: Building domain-specific search engines with machine learning .. - McCallum, Nigam et al. - 1999
3: Surfing backwards on the web - Chakrabarti, Gibson et al. - 1999
3: WWW robots and search engines - Heinonen, Hätönen et al. - 1996
2: http://nautilus.dii.unisi.it - Gori, Maggini et al.
Documents on the same site (http://lucy.ing.unisi.it/~diligmic/pubications.htm):
A Wireless, Position Aware and Adaptive Information.. - Benelli, Bianchi..
A Position Aware Information Appliance - Benelli, Bianchi, Diligenti (2000)