E. T. O'Neill, P. D. McClain, and B. F. Lavoie, A methodology for sampling the world wide web, Technical report, OCLC Annual Review of Research, 1997. http://www.oclc.org/oclc/research/publications/review97/oneill/o'neilla%r980213.htm.

This paper is cited in the following contexts:
A URL-String-Based Algorithm for Finding WWW Mirror Hosts - Submitted To Committee

....of duplicates and near duplicates has been estimated at 30% to 45% [1, 8]. The principal reason for duplication on the Web is the systematic replication of content across distinct hosts, a phenomenon known as mirroring. It is estimated that at least 10% of the hosts on the WWW are mirrored [2, 4]. Duplication has both positive and negative aspects. On one hand, the redundancy makes retrieval easier: if a search engine has missed one copy, maybe it has the other; or if one page has become unavailable, maybe a replica can be retrieved. On the other hand, from the point of view of search ....

A Comparison of Techniques to Find Mirrored Hosts on the WWW - Bharat, Broder, Dean, et al. (1999)   (17 citations)

....to a query is a nuisance. The principal reason for duplication on the Web is the systematic replication of content across distinct hosts, a phenomenon known as mirroring. (These notions are defined more precisely below.) It is estimated that at least 10% of the hosts on the WWW are mirrored [2, 12]. The aim of this paper is to present and evaluate algorithms for detecting mirroring on the WWW in an information retrieval framework. We start with some definitions. Each document on the WWW has a unique name called the Universal Resource Locator (URL). The URL consists of three disjoint parts, ....

CiteSeer.IST - Copyright Penn State and NEC