K. Bharat, A. Broder, J. Dean, and M. R. Henzinger. A comparison of techniques to find mirrored hosts on the WWW. Journal of the American Society for Information Science, 51(12):1114-1122, 2000.
....clients and servers, so both sides would know what redundant data existed in the client. In addition, the communication channel approach required a separate cache of packets exchanged in the past, which may compete with the browser cache and other applications for resources. Bharat et al. [3, 4] give methods for detecting mirrors (systematic replication of content). Mirrored URLs are likely to have similar content and are good candidates for delta encoding [12]. The extent of duplication on the web is estimated at 30-40% [7, 21]. Here we discuss some possible new methods for improving ....
Krishna Bharat, Andrei Z. Broder, Jeffrey Dean, and Monika Rauch Henzinger. A comparison of techniques to find mirrored hosts on the WWW. Journal of the American Society of Information Science, 51(12):1114-1122, 2000.
....and only 0.15 as a portal. While at first these results were disappointing, and discouraged any efforts to tune the parameters of the local approach to improve accuracy, it becomes apparent that in many practical web applications, most notably searching [19] but also other web heuristic problems [3], precision does not get close to 1. Consequently, we need not dismiss the possibility of a preference this kind of site option on a search engine simply because some pages will be preferenced incorrectly. Once more rigorous analyses were conducted (Section 4.3.2) it was clear that parameter ....
....[18] have developed client programs for doing practical metasearches. There were a number of features which neither they, nor hyper topological search engines like Google, had incorporated. Some of these, such as mirror detection (and representation), had been separate topics of research [3]. Whilst work on this thesis was underway, Google simultaneously implemented some of them, such as cluster visualisation [30]. Some of these difficulties are inherent properties of the research field. Because of the enormous commercial pressures on the technology, highlighted by Brin and Page in ....
Krishna Bharat, Andrei Broder, Jeff Dean, and Monika Henzinger. A comparison of techniques to find mirrored hosts on the WWW. ACM Digital Library Workshop on Organizing Web Space (WOWS), Berkeley, CA, 1999.
....URLs. Shivakumar & Garcia-Molina investigate mirroring in a large crawler data set [47]. They report far more aliasing than appears in the WebTV client trace: 36% of reply bodies are accessible through more than one URL. Bharat et al. survey techniques for identifying mirrors on the Internet [6]. Bharat & Broder investigate mirroring in a large crawler data set and report that roughly 10% of popular hosts are mirrored to some extent [5]. Broder et al. consider approximate mirroring or syntactic similarity [10]. Although they introduce sophisticated measures of document similarity, ....
K. Bharat, A. Broder, J. Dean, and M. R. Henzinger. A comparison of techniques to find mirrored hosts on the WWW. In Proc. Workshop on Organizing Web Space at 4th ACM Conference on Digital Libraries, Aug. 1999.
.... Link Database include PageRank [BP98], which ranks the importance of web pages based on their inlinks; Kleinberg's HITS [K98], which refines web query results by examining local subgraphs of the web; mirror detection, which uses links and URLs to find web sites that mirror each other's contents [BBDH99]; and studies of web structure [BKM00]. While the Link Database is critical for each of these applications, we know of no other database that can store as many links in as little space. We have developed three versions of the Link Database over the last four years. The first version, Link1 ....
K. Bharat, A. Broder, J. Dean, and M. Henzinger. A comparison of techniques to find mirrored hosts on the WWW. Proceedings of the ACM Conference on Digital Libraries, 1999.
....Name aliasing can lead to HTTP servers returning the same document for different URLs. Also, servers may contain broken links which lead to apparently unique and valid pages returning the same error page, instead of an HTTP 404 error code. Although not a completely reliable heuristic (see (Bharat et al. 1999) for a more thorough treatment of the identification of duplicate documents in Web data) a CRC64 checksum provides a reasonable signature of identity. We eliminated all documents sharing the same checksum as another document on the same server. A final problem was the existence of large ....
Bharat, K., Broder, A., Dean, J., and Henzinger, M. (1999). A comparison of techniques to find mirrored hosts on the WWW. In Proceedings of the ACM Digital Library Workshop on Organizing Web Space (WOWS)WWW9, Berkeley, CA. gatekeeper.dec.com/pub/Digital/SRC/publications/Broder/DL99-WOWS.pdf.
....(e.g. advertising or paid links). Nepotistic links are typically considered undesirable, and so it is useful to eliminate them before calculating the popularity or status of a page. Finding link nepotism is similar to, but distinct from, the problem of identifying duplicate pages or mirrored sites (Bharat & Broder 1999; Bharat et al. 1999; Shivakumar & Garcia-Molina 1998). While mirrored pages are an instance of the problem we are trying to address (that one entity gets to vote more often than it should), we are only considering (in this paper) the cases in which pages (mirrored or not) are pointing to a ....
....links). Nepotistic links are typically considered undesirable, and so it is useful to eliminate them before calculating the popularity or status of a page. Finding link nepotism is similar to, but distinct from, the problem of identifying duplicate pages or mirrored sites (Bharat & Broder 1999; Bharat et al. 1999; Shivakumar & Garcia-Molina 1998). While mirrored pages are an instance of the problem we are trying to address (that one entity gets to vote more often than it should), we are only considering (in this paper) the cases in which pages (mirrored or not) are pointing to a target for reasons other ....
Bharat, K.; Broder, A.; Dean, J.; and Henzinger, M. 1999. A comparison of techniques to find mirrored hosts on the WWW. In Proceedings of the ACM Digital Library Workshop on Organizing Web Space (WOWS).
....influenced by shared language. 6. Mining Related Web Hosts A natural extension of previous work in connectivity based web data mining would be to look at ways to extract significant relationships between hosts on the web based on connectivity within the hostgraph. Previously Bharat et al. [5] explored the use of similarity in outdegree distributions between hosts to find mirrored web sites. This technique proved to be too weak to find mirrors. However, we found that it often yielded pairs of related or affiliated hosts, rather than true mirrors. This led us to using the hyperlink ....
K. Bharat, A. Broder, J. Dean, and M. Henzinger. A comparison of techniques to find mirrored hosts on the www. Journal of the American Society for Information Science, 2000.
....Similar Content. Level 4 Similar Content. Level 5 Same Structure. Table 1: Similarity levels 2.4 Current Research on Mirroring Several researches on mirroring have been conducted before. Bharat and Broder compared several algorithms for identifying mirrored hosts on the World Wide Web [5]. The algorithms operate on the basis of URL strings and linkage data: the type of information easily available from web proxies and crawlers. They evaluated 4 classes of top down algorithms for detecting mirrored host pairs (that is, algorithms that are based on page attributes such as URL, IP ....
Krishna Bharat, Andrei Broder, Jeffrey Dean, and Monika R. Henzinger, A comparison of techniques to find mirrored hosts on the WWW, To appear in Journal of the American Society for Information Science, August
Bharat, K., Broder, A.Z., Dean, J., Henzinger, M.R.: A comparison of techniques to find mirrored hosts on the WWW. Journal of the American Society for Information Science (JASIS) 51 (2000) 1114--1122
K. Bharat, A. Z. Broder, J. Dean, and M. R. Henzinger. A comparison of techniques to find mirrored hosts on the WWW. Journal of the American Society of Information Science, 51(12):1114--1122, 2000.
K. Bharat, A. Broder, J. Dean and M. Henzinger. A comparison of techniques to find mirrored hosts on the WWW. Journal of the American Society for Information Science, 51(12):1114-1122, Dec. 2000.
Krishna Bharat, Andrei Z. Broder, Jeffrey Dean, and Monika Rauch Henzinger. A comparison of techniques to find mirrored hosts on the WWW. Journal of the American Society of Information Science, 51(12):1114--1122, 2000.