| T. Suel and J. Yuan. Compressing the graph structure of the web. In Data Compression Conference (DCC), pages 213--222, 2001. |
....about the same running time. 1 Introduction Many applications involve representing very large graphs. In such applications space can be as much of an issue as time. For example, there has been recent interest in compactly representing the link structure of the web for use in various algorithms [4, 1, 25]. Other applications of large graphs include representing the meshes used in finite element simulations, or 3d models in graphics. For all these applications it is often useful to support dynamic queries efficiently while maintaining a compact representation. For random graphs the space that can be ....
T. Suel and J. Yuan. Compressing the graph structure of the web. In Data Compression Conference (DCC), pages 213--222, 2001.
....3 of the LINK database provides a compression of 5.61 bits per link for a web graph of 61 Mnodes and 1 Glinks graph, and 5.66 bits per link for its transpose, whereas [8] reports 5.07 bits per link and 5.63 bits per link, respectively, for the WebBase graph and its transpose. The figures reported in [10] on a 115 Mpages 1.47 Glinks snapshot taken from the Internet Archive, on the other hand, are much worse (13.92 bits per link; no data is provided for the transpose). 5 The Offset Array As we have seen, the graph is described as a sequence of adjacency lists; each list is in turn represented by a ....
....about the (small) slowdown produced by partially loading the offset array. All data have been gathered using a Java implementation of the WebGraph framework; it is likely that a tuned C implementation would increase by a factor of about two the timings we report. Timings given in the literature [9, 10, 8] are in the same order of magnitude of the ones reported here (e.g. version 3 of the LINK database provides a sequential and random access time are 248 ns and 336 ns, respectively, albeit with a different language and architecture). A more precise comparison would require implementing all ....
[Article contains additional citation context not shown here]
Torsten Suel and Jun Yuan. Compressing the graph structure of the web. In Proc. of the IEEE Data Compression Conference, pages 213--222, 2001.
.... are pages about modern Italian painters) or otherwise (e.g. both are home pages of Stanford graduate students). These observations are derived from recent empirical studies of the structure of the Web graph [5, 11]. Link locality has been independently observed and reported by several authors [17, 18]. For instance, Suel and Yuan [18] observe that on average, around three quarters of the links from a page are to other pages on the same domain host. The last observation about page similarity is the underlying basis of several algorithms for Web search and for finding related Web pages [6, 10] ....
.... or otherwise (e.g. both are home pages of Stanford graduate students). These observations are derived from recent empirical studies of the structure of the Web graph [5, 11]. Link locality has been independently observed and reported by several authors [17, 18]. For instance, Suel and Yuan [18] observe that on average, around three quarters of the links from a page are to other pages on the same domain host. The last observation about page similarity is the underlying basis of several algorithms for Web search and for finding related Web pages [6, 10]. Given these observations, we ....
[Article contains additional citation context not shown here]
T. Suel and J. Yuan. Compressing the graph structure of the Web. In Proc. of IEEE Data Compression Conf., 2001.
.... from recent empirical studies of the structure of the Web graph [8, 16, 17]. In particular, link copying is used as a fundamental operation in the generative random Web graph models proposed by Ravi Kumar et al. [16]. Link locality has been independently observed and reported by several authors [12, 30]. For instance, Suel and Yuan [30] observe that on average, around three quarters of the links from a page are to other pages on the same domain host. The last observation about page similarity is the underlying basis of several algorithms for Web search and for finding related Web pages [7, 25] ....
.... structure of the Web graph [8, 16, 17]. In particular, link copying is used as a fundamental operation in the generative random Web graph models proposed by Ravi Kumar et al. [16]. Link locality has been independently observed and reported by several authors [12, 30]. For instance, Suel and Yuan [30] observe that on average, around three quarters of the links from a page are to other pages on the same domain host. The last observation about page similarity is the underlying basis of several algorithms for Web search and for finding related Web pages [7, 25]. Given these observations, we ....
[Article contains additional citation context not shown here]
T. Suel and J. Yuan. Compressing the graph structure of the Web. In Proc. of IEEE Data Compression Conference, March 2001.
....goal of our research is to enhance breadth first search crawling, i.e. studying how to reduce breadth first search time for crawling. One option is to construct highly compressed representations of Web graphs that can be stored and analyzed in current machines with moderate amounts of main memory [15]. The alternative is to compute massive graphs with I/O efficient techniques [17], i.e. applying efficient secondary memory algorithms to process the Web graph. While the first option is attractive because of computation speed in main memory, it may not be possible to fit massive data sets in ....
T. Suel and J. Yuan, "Compressing the graph structure of the Web". In Proceedings of the IEEE Data Compression Conference (DCC), 213-222, 2001.
....uses two billion pages). Therefore, the two first problems which arise when one wants to study the Web graph are its size and its rapid evolution. Even if it was possible to get the whole graph, its storage and its algorithmic manipulation would be a real challenge [BBH 98, GLV02, RSWW01, SY01, WSW00]. The first step to study the Web graph is to get large parts of it. Because of its size, and of many other factors we will discuss below, it is impossible to get the whole graph, but it is a key challenge to be able to obtain parts of it as large as possible. The method used to compute ....
T. Suel and J. Yuan. Compressing the graph structure of the web. In Proceedings of the IEEE Data Compression Conference (DCC), March 2001.
.... affinity graph of the Web. While their technique is not practical for use on the Web as a whole, it might be feasible if applied on a per host basis, since our measurements show that the best representative list is likely to be the adjacency list of another URL on the same host. Suel and Yuan [SY01] describe a Link Database with similar functionality to ours, although they only consider outlinks. They partition every adjacency list into two components, one containing global links (that cross host boundaries) and the other containing local links. For the global links, they use a Huffman code ....
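The partitioning step described in this excerpt can be illustrated with a minimal Python sketch. It only shows the split of one adjacency list into local and global links; the function name `partition_links` and the use of the URL host name as the "host boundary" test are assumptions made for illustration, and the Huffman coding of the global links is not shown.

```python
from urllib.parse import urlsplit


def partition_links(source_url, out_links):
    """Split a page's out-links into (local, global) lists.

    Hypothetical helper: a link counts as local when its host name
    equals the host name of the source page, and global otherwise.
    """
    source_host = urlsplit(source_url).hostname
    local_links, global_links = [], []
    for link in out_links:
        if urlsplit(link).hostname == source_host:
            local_links.append(link)   # stays on the same host
        else:
            global_links.append(link)  # crosses a host boundary
    return local_links, global_links
```

Keeping the two lists separate lets each be encoded with a scheme suited to it, which is the motivation the excerpt gives for the split.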
T. Suel and J. Yuan. Compressing the Graph Structure of the Web. Proceedings of the IEEE Data Compression Conference (DCC), March 2001.
....of the Internet Archive [6] uses a Bloom filter stored in memory; this results in a very compact representation, but also gives false positives, i.e. some pages are never downloaded since they collide with other pages in the Bloom filter. Lossless compression can reduce URL size to below 10 bytes [4, 24], though this is still too high for large crawls. In both cases main memory will eventually become a bottleneck, although partitioning the application will also partition the data structures over several machines. A more scalable solution uses a disk resident structure, as for example done in ....
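The false-positive behaviour mentioned in this excerpt can be seen in a small Bloom filter sketch. This is a generic textbook construction, not the Internet Archive crawler's implementation; the class name, the default sizes, and the use of SHA-256 to derive bit positions are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter for a crawler's 'URL seen' test (illustrative)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A URL that was added is always reported as present. A URL that was never added may also be reported as present when its bit positions collide with those of other URLs, and a crawler relying on the filter would then skip that page.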
T. Suel and J. Yuan. Compressing the graph structure of the web. In Proc. of the IEEE Data Compression Conference (DCC), March 2001. To appear.
....through term based techniques and then performs an analysis of only the graph neighborhood of these pages. The success of such link based ranking techniques has also motivated a large amount of research focusing on the basic structure of the web [12], efficient computation with massive web graphs [9, 37, 40], and other applications of link based techniques such as finding related pages [10], classifying pages [14], crawling important pages [20] or pages on a particular topic [16], or web data mining [29, 30], to name just a few. Both Pagerank and HITS are based on an iterative process defined on the ....
....a large graph will not fit into the main memory of most machines if we use standard graph data structures, and there are two approaches to overcoming this problem. The first approach is to try to fit the graph structure into main memory by using compression techniques for the web graph proposed in [9, 37, 40]; this results in very fast running times but still requires substantial amounts of memory. The second approach, taken by Haveliwala [24], implements the Pagerank computation in an I/O efficient manner through a sequence of scan operations on data files containing the graph structure and the ....
[Article contains additional citation context not shown here]
T. Suel and J. Yuan. Compressing the graph structure of the web. In Proc. of the IEEE Data Compression Conference (DCC), March 2001.
....the Internet Archive [6] uses a Bloom filter stored in memory; this results in a very compact representation, but also gives false positives, i.e. some pages are never downloaded since they collide with other pages in the Bloom filter. Lossless compression can reduce URL size to below 10 bytes [4, 24], though this is still too high for large crawls. In both cases main memory will eventually become a bottleneck, although partitioning the application will also partition the data structures over several machines. A more scalable solution uses a disk resident structure, as for example done in ....
T. Suel and J. Yuan. Compressing the graph structure of the web. In Proc. of the IEEE Data Compression Conference (DCC), March 2001. To appear.
No context found.
T. Suel and J. Yuan. Compressing the graph structure of the web. In Data Compression Conference (DCC), pages 213--222, 2001.
No context found.
T. Suel and J. Yuan. Compressing the graph structure of the web. In Data Compression Conference (DCC), pages 213--222, 2001.
No context found.
Torsten Suel and Jun Yuan. Compressing the graph structure of the Web. In Proceedings of the Data Compression Conference (DCC), pages 213--222. IEEE Computer Society, 2001.