Focused crawling: a new approach to topic-specific Web resource discovery (1999)

by Soumen Chakrabarti, Martin van den Berg, Byron Dom
Citations: 626 (10 self)

BibTeX

@MISC{Chakrabarti99focusedcrawling:,
    author = {Soumen Chakrabarti and Martin van den Berg and Byron Dom},
    title = {Focused crawling: a new approach to topic-specific Web resource discovery},
    year = {1999}
}

Abstract

The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date. To achieve such goal-directed crawling, we designed two hypertext mining programs that guide our crawler: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, ...
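The idea of letting a relevance classifier steer the crawl boundary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a best-first frontier in which each unvisited link inherits the relevance score of the page that linked to it, and every name in it (`focused_crawl`, `TOY_WEB`, `toy_links`, `toy_relevance`) is hypothetical. The toy relevance function is a stand-in for a classifier trained on exemplary documents, and the in-memory link table replaces real network fetches.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Iterable, List


def focused_crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], List[str]],
    relevance: Callable[[str], float],
    max_pages: int,
) -> List[str]:
    """Best-first crawl: expand links found on the most relevant pages first.

    Assumption of this sketch: a link's priority is the relevance score
    of the page it was found on (its parent), since the target page has
    not been fetched yet and cannot be classified directly.
    """
    tie = count()  # FIFO tie-breaker so equal scores pop in discovery order
    frontier: list = []
    seen = set()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            # Seeds get the highest priority (heapq is a min-heap, so negate).
            heapq.heappush(frontier, (-1.0, next(tie), url))

    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited.append(url)
        score = relevance(url)  # classify the page we just "fetched"
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, next(tie), link))
    return visited


# Hypothetical toy link graph standing in for the Web.
TOY_WEB: Dict[str, List[str]] = {
    "seed": ["hub/cycling", "hub/news"],
    "hub/cycling": ["cycling/a", "cycling/b"],
    "hub/news": ["news/x", "news/y"],
}


def toy_links(url: str) -> List[str]:
    """Return the out-links of a page in the toy graph."""
    return TOY_WEB.get(url, [])


def toy_relevance(url: str) -> float:
    """Stand-in for a trained classifier: 1.0 for on-topic pages, else 0.0."""
    return 1.0 if url == "seed" or "cycling" in url else 0.0


if __name__ == "__main__":
    print(focused_crawl(["seed"], toy_links, toy_relevance, max_pages=5))
```

With a budget of five pages, the sketch visits `seed`, `hub/cycling`, `hub/news`, `cycling/a`, and `cycling/b`. The off-topic hub is still fetched once, because it must be classified before its links can be deprioritized, but the pages behind it (`news/x`, `news/y`) are never reached within the budget.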

Keyphrases

focused crawling, new approach, focused crawler, topic-specific web resource discovery, world-wide web, unprecedented scaling challenge, goal-directed crawling, avoids irrelevant region, network resource, hypertext mining program, general-purpose crawler, pre-defined set, hypertext document, possible ad-hoc query, accessible web document, significant saving, focus topic, rapid growth, new hypertext resource discovery system, exemplary document, crawl boundary, search engine
