30 citations found.
SHKAPENYUK, V. and SUEL, T. 2002. Design and implementation of a high-performance distributed Web crawler. In Proceedings of 18th International Conference on Data Engineering. San Jose, CA, 357-368.

This paper is cited in the following contexts:

UbiCrawler: a scalable fully distributed web crawler - Boldi, Codenotti, Santini.. (2003)   (5 citations)  (Correct)

....to gather data about the web. Even if no code is available, in several cases the basic design has been made public: this is the case, for instance, of Mercator [17] (the Altavista crawler), of the original Google crawler [6], and of some research crawlers developed by the academic community [22, 23, 21]. Nonetheless, little published work actually investigates the fundamental issues underlying the parallelization of the different tasks involved with the crawling process. In .... At the time, the name of the crawler was Trovatore, later changed into UbiCrawler when the authors learned about the ....

.... commercial crawlers are not public, there are some highly performant, scalable crawling systems whose structure has been described and discussed by the authors; among them, two distributed crawlers that might be compared to UbiCrawler are Mercator [17], used by AltaVista, the spider discussed in [21], and the spider discussed in [22]. Mercator is a high-performance web crawler whose components are loosely coupled; indeed, they can be distributed across several computing units. However, there is a central element, the frontier, which keeps track of all the URLs that have been crawled up to now ....

[Article contains additional citation context not shown here]

Vladislav Shkapenyuk and Torsten Suel. Design and implementation of a high-performance distributed web crawler. In IEEE International Conference on Data Engineering (ICDE), 2002.


Three-Level Caching for Efficient Query Processing in Large Web .. - Long, Suel (2005)   Self-citation (Suel)   (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proc. of the Int. Conf. on Data Engineering, 2002.


Efficient Query Evaluation on Large Textual Collections in a.. - Zhang, Suel (2005)   Self-citation (Suel)   (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proc. of the Int. Conf. on Data Engineering, February 2002.


Local Methods for Estimating PageRank Values - Chen, Gan, Suel (2004)   (1 citation)  Self-citation (Suel)   (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proc. of the Int. Conf. on Data Engineering, February 2002.


ODISSEA: A Peer-to-Peer Architecture for Scalable.. - Suel, Mathur, Wu, .. (2003)   (13 citations)  Self-citation (Suel)   (Correct)

....section where the best combination of index structures is used. Crawling punt: As mentioned, we assume that in the web search application crawling would be performed by crawling clients that fetch and insert documents. The main reason is that from our own experiences with large scale crawling [44] we are not sure a P2P solution is appropriate. Large crawls generate many management issues due to queries or complaints from web site operators and network administrators. It is important to be able to reconfigure a crawler quickly to avoid certain web sites or subnetworks or to modify its ....

....beyond BFS are hard to implement in a P2P environment unless there is a centralized scheduler. We refer to [6, 14, 15] for work on highly distributed crawling. Thus, we would expect that a handful of powerful crawling clients would provide most documents, and we plan to use our Polybot crawler [44] to initially populate the system with data. It might be more appropriate to incorporate recrawling into the system, though. Thus, an inserted page could be labeled with an expiration date, after which it is automatically refreshed by the node holding the page. Alternatively, web sites could also ....

[Article contains additional citation context not shown here]

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proc. of the Int. Conf. on Data Engineering, February 2002.


Optimized Query Execution in Large Search Engines with Global.. - Long, Suel (2003)   (3 citations)  Self-citation (Suel)   (Correct)

....used only one disk for the index structure. Software and Data Sets: Our experiments were run on a search engine prototype, named pingo, that is currently being developed in our research group. The document collection consisted of about million web pages crawled by the PolyBot web crawler [35] in October of 2002. Not all of the pages are distinct and the set contains a significant number of duplicates due to pages being repeatedly downloaded because of crawl interruptions. The crawl started at a hundred homepages of US Universities, and was performed in a breadth first manner. As ....

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proc. of the Int. Conf. on Data Engineering, February 2002.


Automated Gathering of Web Information: An In-depth.. - Jansen, Mullen..   (Correct)

No context found.

SHKAPENYUK, V. and SUEL, T. 2002. Design and implementation of a high-performance distributed Web crawler. In Proceedings of 18th International Conference on Data Engineering. San Jose, CA, 357-368.


Workload-Aware Web Crawling and Server Workload Detection - Shaozhi Ye   (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proceedings of International Conference on Data Engineering (ICDE), 2002.


SpamRank - Fully Automatic Link Spam Detection - Benczur, Csalogany, Sarlos, Uher (2005)   (4 citations)  (Correct)

No context found.

T. Suel and V. Shkapenyuk. Design and implementation of a high-performance distributed web crawler. In Proceedings of the IEEE International Conference on Data Engineering, February 2002.


Workload-Aware Web Crawling and Server Workload Detection - Ye, Lu, Li (2004)   (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proceedings of International Conference on Data Engineering (ICDE), 2002.


Efficient URL Caching for World Wide Web Crawling - Broder, Najork, Wiener (2003)   (1 citation)  (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In IEEE International Conference on Data Engineering (ICDE), Feb. 2002.


Design of a Crawler with Bounded Bandwidth - Diligenti, Maggini, Pucci.. (2004)   (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proceedings of the 18th International Conference on Data Engineering, 2002.


Distributed High-performance Web Crawlers: - The   (Correct)

No context found.

Vladislav Shkapenyuk and Torsten Suel. Design and implementation of a high-performance distributed web crawler. In ICDE, 2002.


Scheduling Algorithms for Web Crawling - Castillo, Marin, Rodriguez.. (2004)   (Correct)

No context found.

V. Shkapenyuk and T. Suel, "Design and implementation of a high-performance distributed web crawler," in ICDE, 2002. [Online]. Available: citeseer.nj.nec.com/shkapenyuk01design.html


Distributed Web Crawling over DHTs - Boon Thau Loo (2004)   (2 citations)  (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In ICDE, 2002.


Effective Web Crawling - Chapter 2 - Castillo (2004)   (Correct)

No context found.

Vladislav Shkapenyuk and Torsten Suel. Design and implementation of a high-performance distributed web crawler. In Proceedings of the 18th International Conference on Data Engineering (ICDE), pages 357-368, San Jose, California, February 2002. IEEE CS Press.


Scheduling Algorithms for Web Crawling - Universidad (2004)   (Correct)

No context found.

V. Shkapenyuk and T. Suel, "Design and implementation of a high-performance distributed web crawler," in ICDE, 2002. [Online]. Available: citeseer.nj.nec.com/shkapenyuk01design.html


UbiCrawler: A Scalable Fully Distributed Web Crawler - Boldi, Codenotti, Santini.. (2002)   (5 citations)  (Correct)

No context found.

Vladislav Shkapenyuk and Torsten Suel. Design and implementation of a high-performance distributed web crawler. In IEEE International Conference on Data Engineering (ICDE), 2002.


Apoidea: A Decentralized Peer-to-Peer Architecture for .. - Singh, Srivatsa, Liu, .. (2004)   (2 citations)  (Correct)

No context found.

Vladislav Shkapenyuk and Torsten Suel. Design and implementation of a high-performance distributed web crawler. In Proceedings of International Conference on Data Engineering, 2002.


SharpSpider: Spidering the Web through Web Services - Moody, Palomino (2003)   (1 citation)  (Correct)

No context found.

V. Shkapenyuk and T. Suel. Design and Implementation of a High-Performance Distributed Web Crawler. In Proceedings of the 18th International Conference on Data Engineering, San Jose, California, February 2002.
