64 citations found.
A. Heydon and M. Najork. Mercator: a scalable, extensible Web crawler. World Wide Web, 2(4):219--229, 1999.

This paper is cited in the following contexts:

Versus: a Web Data Repository with Time Support - Campos (2003)   (3 citations)

....among objects. Any application built on it, requiring knowledge about objects and associations, for example to represent links between pages, has to use a complementary storage (or parse HTML, identifying links at run time). 2.5.4 Mercator. Mercator is a generic Web crawler for Web applications [21]. Periodically the crawler checkpoints the data structures it builds to disk, enabling recovery from failures by resuming the crawling process from the last checkpoint before the failure. Versus supports a check-in/check-out mechanism that can be used to implement checkpoints. Unlike Versus, Mercator ....

Heydon, A., and Najork, M. Mercator: A scalable, extensible web crawler. World Wide Web 2, 4 (1999), 219--229.


Crawling the Hidden Web - Raghavan, Garcia-Molina (2001)   (40 citations)

....However, since we use the layout technique to extract labels, there is no overhead in using layout information for extracting domain values as well. 6.2 Crawler Performance Metric. Traditional crawlers, which deal with the publicly indexable Web, use metrics such as crawling speed, scalability [15], page importance [8] and freshness [7] to measure the effectiveness of their crawling activity. However, none of these metrics captures the fundamental challenge in dealing with the Hidden Web, namely processing and submitting forms. We considered a number of options for measuring the ....

....7 Related Work. In recent years, the growth of the Web has stimulated significant interest in the study of Web crawlers. These studies have addressed various issues, such as performance, scalability, freshness, extensibility, and parallelism, in the design and implementation of crawlers [5, 6, 8, 15, 23]. However, all of this work has focused solely on the publicly indexable portion of the Web (see Figure 1). To the best of our knowledge, there has not been any previous report (at least, none that is publicly available) on techniques and architectures for crawling the hidden Web. Our task-driven ....

Allan Heydon and Marc Najork. Mercator: A scalable, extensible Web crawler. World Wide Web, 2(4):219-- 229, December 1999.


Design and Implementation of a Distributed Crawler And .. - Zeinalipour-Yazti.. (2002)   (3 citations)

....Furthermore, it should provide built-in support for QoS policies involving multiple service levels and service-level guarantees. Consequently, the scheduling and performance requirements of WebRACE crawling and filtering face very different constraints than systems like Google [3], Mercator [9], SPHINX [16] or NetAttache Pro [11]. Finally, WebRACE is implemented entirely in Java. Its implementation consists of approximately 5500 lines of code, 2649 of which correspond to the Minicrawler implementation, 1184 to the Annotation Engine, 367 to the SafeQueue data structure, and 1300 to ....

....that each extracted link corresponds to a valid and absolute URL, invoking a URL normalizer to de-relativize it, if necessary. Then, the normalized URL is appended to the list of URLs scheduled for download, provided this URL has not been fetched earlier. In contrast to typical crawlers [16, 9], WebRACE frequently refreshes its URL seed list from requests posted by the eRACE Request Scheduler. These requests have the following format: Link, ParentLink, Depth, owners) Link is the URL address of the Web resource sought, ParentLink is the URL of the page that contained Link, Depth ....

[Article contains additional citation context not shown here]

A. Heydon and M. Najork. Mercator: A Scalable, Extensible Web Crawler. World Wide Web, 2(4):219-229, December 1999.


Design and Implementation of a High-Performance Distributed.. - Shkapenyuk, Suel (2002)   (21 citations)

....been less work on the second issue. Clearly, all the major search engines have highly optimized crawling systems, although details of these systems are usually proprietary. The only system described in detail in the literature appears to be the Mercator system of Heydon and Najork at DEC/Compaq [16], which is used by AltaVista. Some details are also known about the first version of the Google crawler [5] and the system used by the Internet Archive [6]. While it is fairly easy to build a slow crawler that downloads a few pages per second for a short period of time, building a ....

....We now discuss the requirements for a good crawler, and approaches for achieving them. Details on our solutions are given in the subsequent sections. [Footnote: Of course, in the case of the application, replication is up to the designer of the component, who has to decide how to partition data structures and workload.] [Footnote: E.g., Mercator [16] does not use this partition, but achieves flexibility through the use of pluggable Java components.] Flexibility: As mentioned, we would like to be able to use the system in a variety of scenarios, with as few modifications as possible. Low Cost and High ....

[Article contains additional citation context not shown here]

A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Efficient URL Caching for World Wide Web Crawling - Broder, Najork, Wiener (2003)   (1 citation)  Self-citation (Najork)

No context found.

A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Using Delay - To Defend Against

No context found.

A. Heydon and M. Najork. Mercator: a scalable, extensible Web crawler. World Wide Web, 2(4):219--229, 1999.


Monitoring the Dynamic Web - To Respond To

No context found.

A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


ACM Journal of Educational Resources in Computing, Vol. .. - Information Systems..

No context found.

HEYDON, A. AND NAJORK, M. 1999 Mercator: A scalable, extensible Web crawler. World Wide Web 2, 4 (1999), 219-229.


Workload-Aware Web Crawling and Server Workload Detection - Ye

No context found.

A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Location Aware Web-Crawling with the Use of Migrating Crawlers - Papapetrou (2003)

No context found.

Allan Heydon and Marc Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Minimizing the Network Distance in Distributed Web Crawling - Papapetrou, Samaras

No context found.

Allan Heydon and Marc Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Workload-Aware Web Crawling and Server Workload Detection - Ye, Lu, Li (2004)

No context found.

A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Distributed High-performance Web Crawlers: - The

No context found.

Allan Heydon and Marc Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Automatic Web News Extraction Using Tree Edit Distance - Reis, Golgher, Laender, Silva (2004)   (5 citations)

No context found.

A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Semantic Resource Management for the Web: An E-Learning.. - Tane, Schmitz, Stumme (2004)

No context found.

Allan Heydon and Marc Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, December 1999.


Phoenix: a Parallel Programming Model for.. - Taura, Kaneda, Endo.. (2003)

No context found.

A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, December 1999.


Effective Web Crawling - Chapter 2 - Castillo (2004)

No context found.

Allan Heydon and Marc Najork. Mercator: A scalable, extensible web crawler. World Wide Web Conference, 2(4):219--229, April 1999.


Monitoring the Dynamic Web to Respond to Continuous Queries - Pandey, Ramamritham, et al. (2003)

No context found.

A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


Mutable Strings in Java: Design, Implementation and.. - Boldi, Vigna

No context found.

A. Heydon, M. Najork, Mercator: A scalable, extensible web crawler, World Wide Web (1999) 219--229.


Apoidea: A Decentralized Peer-to-Peer Architecture for .. - Singh, Srivatsa, Liu, .. (2004)   (2 citations)

No context found.

Allan Heydon and Marc Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, 1999.


A Prototype System for Retrieving Dynamic Content - Shestakov

No context found.

Allan Heydon and Marc Najork. Mercator: A Scalable, Extensible Web Crawler. World Wide Web, 2(4):219-229, 1999


Collecting Statistics about the Portuguese Web - Gomes, Silva (2003)

No context found.

A. Heydon and M. Najork. Mercator: A Scalable, Extensible Web Crawler. World Wide Web, 2(4):219--229, 1999.


Crawling the Infinite Web: Five Levels Are Enough - Baeza-Yates, Castillo (2004)

No context found.

Heydon, A., Najork, M.: Mercator: A scalable, extensible web crawler. World Wide Web Conference 2 (1999) 219--229


Efficient Annotation Inference for an Extended Static Checker - Flanagan, Leino, Levin

No context found.

Allan Heydon and Marc A. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219--229, December 1999.

CiteSeer.IST - Copyright Penn State and NEC