
Focused Crawling Using Context Graphs (2000)  (57 citations)
M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, M. Gori
26th International Conference on Very Large Data Bases, VLDB 2000




 
View or download:
nec.com/~lawrence/...focusvldb00.ps.gz

From:  lucy.ing.unisi.it/~...pubications

Abstract: Maintaining currency of search engine indices by exhaustive crawling is rapidly becoming impossible due to the increasing size and dynamic content of the web. Focused crawlers aim to search only the subset of the web related to a specific category, and offer a potential solution to the currency problem. The major problem in focused crawling is performing appropriate credit assignment to different documents along a crawl path, such that short-term gains are not pursued at the expense of...

Similar documents based on text:
7.2:   Focused Crawling Using Context Graphs - Diligenti, Coetzee, Lawrence.. (2000)
1.7:   Efficient Identification of Web Communities - Flake, Lawrence, Giles (2000)
1.3:   Feature Selection in Web Applications By ROC.. - Coetzee, Glover.. (2001)

Related documents from co-citation:
41:   Focused crawling: a new approach to topic-specific Web resource discovery - Chakrabarti, van den Berg et al. - 1999
25:   The anatomy of a large-scale hypertextual Web search engine - Brin, Page
18:   Efficient crawling through URL ordering - Cho, Garcia-Molina et al. - 1998

BibTeX entry:

M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, "Focused crawling using context graphs," in Proc. Very Large Data Bases 2000 (VLDB 2000), September 2000. To appear. http://citeseer.ist.psu.edu/diligenti00focused.html

@inproceedings{ diligenti00focused,
    author = "Michelangelo Diligenti and Frans Coetzee and Steve Lawrence and C. Lee Giles and Marco Gori",
    title = "Focused Crawling using Context Graphs",
    booktitle = "26th International Conference on Very Large Data Bases, {VLDB} 2000",
    month = "10--14 September",
    address = "Cairo, Egypt",
    pages = "527--534",
    year = "2000",
    url = "citeseer.ist.psu.edu/diligenti00focused.html" }
Citations (may not include all citations):
2528   Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird et al. - 1977
1256   An Introduction to Modern Information Retrieval - Salton, McGill - 1983
976   Machine Learning - Mitchell - 1997
576   Authoritative sources in a hyperlinked environment - Kleinberg - 1997
372   An algorithm for suffix stripping - Porter - 1980
163   Improved algorithms for topic distillation in hyperlinked en.. - Bharat, Henzinger - 1998
154   Automatic resource compilation by analyzing hyperlink struct.. - Chakrabarti, Dom et al. - 1998
149   Focused crawling: a new approach to topic-specific web resour.. - Chakrabarti, van den Berg et al. - 1999
58   Efficient crawling through URL ordering - Cho, Garcia-Molina et al. - 1998
52   The connectivity server: Fast access to linkage information .. - Bharat, Broder et al. - 1998
41   Using reinforcement learning to spider the web efficiently - Rennie, McCallum - 1999
32   Automating the construction of internet portals with machine.. - McCallum, Nigam et al.
10   available httpwww - one, Inktomi et al. - 2000
7   Text classification from labeled and unlabelled documents us.. - Nigam, McCallum et al. - 1999
7   Building domain-specific search engines with machine learning .. - McCallum, Nigam et al. - 1999
3   Surfing backwards on the web - Chakrabarti, Gibson et al. - 1999
3   WWW robots and search engines - Heinonen, Hätönen et al. - 1996
2   http://nautilus.dii.unisi.it - Gori, Maggini et al.





Documents on the same site (http://lucy.ing.unisi.it/~diligmic/pubications.htm):
A Wireless, Position Aware and Adaptive Information.. - Benelli, Bianchi..
A Position Aware Information Appliance - Benelli, Bianchi, Diligenti (2000)


CiteSeer.IST - Copyright Penn State and NEC