S. Raghavan and H. Garcia-Molina. Representing web graphs. In Proceedings of the IEEE Intl. Conference on Data Engineering, March 2003.
....[Figure 2: Distribution of tree distance for hyperlinks; axis: tree distance; series: down links, up links, across links. As distance increases, the probability of a hyperlink decreases.] 3.3 Link Compressibility. It has been observed by several authors that the link graph is highly compressible [17, 16]. In [17] they report that it takes only 6 bits on average to store the outlinks from a set of 350 million pages (6 billion links). If the links were random then of course this would not be possible, as an easy probabilistic argument says that we would require at least 28 bits to store a single ....
S. Raghavan and H. Garcia-Molina. Representing web graphs. In IEEE International Conference on Data Engineering (ICDE03), 2003.
.... the direction of all arcs reversed). A compact representation of the transposed graph is essential in the study of several advanced ranking algorithms (e.g., HITS [6]); the literature often reports that the transposed graph is more entropic, and thus more difficult to compress, than the graph itself [8, 9], but we shall see that in the WebGraph framework transposed graphs actually compress better. 2 The Web Graph. The web graph relative to a certain set of URLs is a directed graph having those URLs as nodes, and with an arc from x to y whenever page x contains a hyperlink toward page y. When ....
....see that R = 1 gives excellent results, although even when R = 1 we improve significantly over existing techniques. For comparison, version 3 of the LINK database provides a compression of 5.61 bits/link for a web graph of 61 Mnodes and 1 Glinks, and 5.66 bits/link for its transpose, whereas [8] reports 5.07 bits per link and 5.63 bits per link, respectively, for the WebBase graph and its transpose. The figures reported in [10] on a 115 Mpages, 1.47 Glinks snapshot taken from the Internet Archive, on the other hand, are much worse (13.92 bits per link; no data is provided for the ....
[Article contains additional citation context not shown here]
Sriram Raghavan and Hector Garcia-Molina. Representing web graphs. Technical Report 2002.
....properties of the graph, such as degree distributions and connectivity statistics, and on modeling the creation of the web graph [12, 2]. However, this research has not directly addressed how this inherent structure can be effectively exploited to speed up link analysis. Raghavan and Garcia-Molina [15] have exploited the hostname (or more generally, URL) induced structure of the web to efficiently represent the web graph. In this paper, we directly exploit this kind of structure to achieve large speedups compared with previous algorithms for computing PageRank by substantially improving ....
S. Raghavan and H. Garcia-Molina. Representing web graphs. In Proceedings of the IEEE Intl. Conference on Data Engineering, March 2003.
....properties of the graph, such as degree distributions and connectivity statistics, and on modeling the creation of the web graph [12, 2]. However, this research has not directly addressed how this inherent structure can be exploited effectively to speed up link analysis. Raghavan and Garcia-Molina [15] have exploited the hostname (or more generally, URL) induced structure of the web to represent the web graph efficiently. In this paper, we directly exploit this kind of structure to achieve large speedups compared with previous algorithms for computing PageRank by substantially improving ....
S. Raghavan and H. Garcia-Molina. Representing web graphs. In Proceedings of the IEEE Intl. Conference on Data Engineering, March 2003.
No context found.
S. Raghavan and H. Garcia-Molina. Representing Web graphs. In Proc. of the IEEE Intl. Conf. on Data Engineering, March 2003.
....Adler and Mitzenmacher [1] suggest the use of an affinity graph G. They describe how a directed minimum-weight spanning tree over G can be used to generate an optimal (i.e., smallest-size) reference-encoded representation of G. We refer the reader to [1] and to our extended technical report [16] for details. 3.2. Iterative partition refinement. In this section, we describe an iterative process to compute a partition on the set of pages in the repository that satisfies the three properties listed earlier in the section. We begin with an initial coarse-grained partition 0 = 01 , N 02 ....
....that contains the page (Figure 7). In addition, a domain index is used to map a domain to the set of supernodes that contain the pages belonging to that domain. For instance, in Figure 7, all the pages belonging to the stanford.edu domain are present in supernodes 1 and 4. We refer the reader to [16] for further details on the disk organization of the S-Node representation. 4. Experimental Results. We conducted several experiments to measure the performance characteristics of S-Node representations of Web graphs. Our experiments were designed to answer the following key questions: ....
[Article contains additional citation context not shown here]
S. Raghavan and H. Garcia-Molina. Representing Web graphs. Technical Report 2002-30, Stanford Univ, June 2002. Available at http://dbpubs.stanford.edu/pub/2002-30.
....(following links in the reverse direction, i.e., finding out which pages point to a given set of pages) respectively. Our description will focus on # # but the details for # are similar. For example, a 110 million page Web data set translates to over a billion edges in the Web graph [12]. [Figure 8: Navigation in the presence of ordering. Panels: (a) Operands; (b) Only pages ordered; (c) Pages and links ordered.] Binary navigation operator. Operator # # accepts a page relation (say R) and a link relation (say S) as operands. One or both of R and S may be plain, ordered, or ranked. # ....
....similar outneighborhoods, i.e. point to almost the same set of pages. Intuitively, we attempt to group together related pages so that all the pages relevant to a complex query are distributed among a relatively small number of clusters. We refer the reader to our previous work, reported in [12], for a precise characterization of these page clusters and an iterative algorithm for partitioning a Web data set into clusters. Below, we briefly describe a representation scheme, based on page clusters, for physically organizing the Web graph for efficient navigation. In the next section, we ....
[Article contains additional citation context not shown here]
S. Raghavan and H. Garcia-Molina. Representing Web graphs. In Proc. of the IEEE Intl. Conf. on Data Engineering, March 2003.
No context found.
S. Raghavan and H. Garcia-Molina. Representing web graphs. In Proceedings of the IEEE Intl. Conference on Data Engineering, March 2003.