3 citations found.
J. Cho, N. Shivakumar, and H. Garcia-Molina, Computing document clusters on the web, In private communication (submitted to VLDB '99), 1999.

This paper is cited in the following contexts:
A URL-String-Based Algorithm for Finding WWW Mirror Hosts - Submitted To Committee

....A high percentage of paths (that is, the portions of the URL after the hostname) are valid on both web sites, and these common paths link to documents that have similar content. "Highly similar" is a subjective measure. I made this notion precise by adopting the resemblance distance described in [3], which experimentally captures well the informal notion of "roughly the same". The technique efficiently computes the syntactic resemblance between two documents as a fractional score between 0 and 1. The higher the score, the greater the resemblance. Any distance measure that computes a similar ....

....on page attributes such as URL, IP address, and connectivity, and not on the page content) on a collection of 140 million URLs (on 230,000 hosts) and their associated connectivity information. There is an alternative bottom-up approach presented by Cho, Shivakumar, and Garcia-Molina in [3] whereby in the first stage of the algorithm copies of a given page are clustered together, and then these clusters are grown until they represent an entire site. Beyond finding mirror sites, some researchers have tried to get the best performance out of mirror sites. One promising approach is ....

[Article contains additional citation context not shown here]

J. Cho, N. Shivakumar, and H. Garcia-Molina, Computing document clusters on the web, In private communication (submitted to VLDB '99), 1999.


A Comparison of Techniques to Find Mirrored Hosts on the WWW - Bharat, Broder, Dean, et al. (1999)   (17 citations)

....our algorithms are based on page attributes such as URL, IP address, and connectivity, and not on the page content. We are essentially using the replicated structure of mirrors to identify them. There is an alternative bottom-up approach presented by Cho, Shivakumar, and Garcia-Molina in [10] whereby in the first stage of the algorithm copies of a given page are clustered together, and then these clusters are grown until they represent an entire site. There are advantages and disadvantages to each approach: the top-down structural approach has the advantage that it needs only the ....

J. Cho, N. Shivakumar, and H. Garcia-Molina. Computing document clusters on the web. In private communication (submitted to VLDB '99), 1999.


Beyond Document Similarity: Understanding.. - Paepcke.. (2000)   (3 citations)  Self-citation (Cho Garcia)

....Google considers a document with more links pointing to it as more valuable than one that is less linked to. The rationale is that if more authors of Web pages have felt it worthwhile to include a link, the document is in some way more valuable. Similarly, one aspect of the SCAM system [10] searches the Web and finds collections of documents that are complete or partial mirrors of another site. We can view such mirror sites as a kind of structural feature of the World Wide Web collection. Considering this collection level structure, one might conclude that mirrored documents are ....

....of crawlers, can be informed by an understanding of information value. For example, when limited in time and processing resources, crawlers can revisit high value documents more often, or can explore high value sites more deeply than other documents and sites that appear to be less important [10]. A wide variety of work remains to be accomplished in the area of value filtering. There is, of course, room for invention of new types of collection and judgment metadata. Similarly, novel techniques of designing the corresponding filter engines would help. There are also some broader, very ....

Junghoo Cho, Narayanan Shivakumar, and Hector Garcia-Molina. Computing Document Clusters on the Web. In Submitted to VLDB '99, 1998.

CiteSeer.IST - Copyright Penn State and NEC