| M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Data Compression Conference, 2001. |
....proportion of vertices of a given degree is Poisson distributed, and the degree sequence drops off exponentially fast in the upper tail. There are many models of graph processes designed to capture various aspects of the structure of the www found in the studies given above. See for example [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 17, 15, 16, 18, 23, 25, 26] and [27]. A recent survey by Bollobás and Riordan [10] gives an excellent overview of the topic and many detailed structural results. Supported in part by NSF grant CCR 9818411. We consider a simple ....
M. Adler and M. Mitzenmacher, Toward Compressing Web Graphs, Proceedings of the 2001.
....graphs associated with the world wide web. 1. Introduction. We describe the evolution of an undirected random (multi)graph G(t), which is an example of the type of model referred to as a web graph. Such graphs (or digraphs) evolve by the addition of new vertices and/or edges at each step t. See e.g. [1, 2, 3, 4, 5, 7, 10, 11] for further details. Initially, at step t = 0, there is a single vertex v_0. At step t = 1, 2, ..., T, ..., either a procedure new is followed with probability 1 − α, or a procedure old is followed with probability α. In procedure new, a new vertex v is added ....
M. Adler and M. Mitzenmacher, Toward Compressing Web Graphs, To appear in the 2001 Data Compression Conference.
....non-navigational links are often copied from one page to another within the same host. These features suggest using techniques borrowed from full-text indexing for storing increasing sequences of integers with small gaps, and they also inspired the reference compression techniques discussed in [9, 1]. Since several successor lists are similar, one can specify the successor list of a node by copying part of a previous list and adding whatever remains. This is achieved using a list of bits, one for each successor in the referenced list, which tells whether the successor should be copied or not, ....
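The copy-bit scheme described in this excerpt can be sketched as follows. This is a minimal illustration, not the LINK database or WebGraph format: it assumes successor lists are sorted lists of distinct integers, gap-encodes the leftover successors with the first value stored as-is, and uses hypothetical function names.

```python
from typing import List, Tuple


def encode_with_reference(target: List[int], reference: List[int]) -> Tuple[List[int], List[int]]:
    """Describe `target` relative to `reference`.

    Returns (copy_bits, extras): one bit per successor in `reference`
    (1 = copy it into `target`) and the sorted successors of `target`
    that the copied ones do not cover.
    """
    target_set = set(target)
    copy_bits = [1 if s in target_set else 0 for s in reference]
    copied = {s for s, bit in zip(reference, copy_bits) if bit}
    extras = sorted(target_set - copied)
    return copy_bits, extras


def to_gaps(sorted_values: List[int]) -> List[int]:
    """Gap-encode an increasing sequence (first value kept as-is)."""
    return [v - prev for prev, v in zip([0] + sorted_values, sorted_values)]


def decode_with_reference(copy_bits: List[int], extras: List[int], reference: List[int]) -> List[int]:
    """Rebuild the sorted successor list from the copy bits and the extras."""
    copied = [s for s, bit in zip(reference, copy_bits) if bit]
    return sorted(copied + extras)


reference = [3, 5, 7, 12, 20]
target = [3, 7, 8, 12, 13]
bits, extras = encode_with_reference(target, reference)
print(bits, extras, to_gaps(extras))  # [1, 0, 1, 1, 0] [8, 13] [8, 5]
assert decode_with_reference(bits, extras, reference) == target
```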
....of possible reference lists), the price for this improvement is slower and more memory-consuming compression and decompression. Several forms of reference compression are used in different versions of the LINK database; moreover, reference compression is analysed from a theoretical viewpoint in [1]. Differential compression. WebGraph introduces differential compression, in which the differences with S(y) are recorded by a sequence of copy blocks: we look at the copy list as an alternating sequence of 1 and 0 blocks, and specify the length of each block (decremented by one, for all blocks ....
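The copy-block idea in this excerpt amounts to run-length coding the copy list. A minimal sketch follows; because the quoted sentence is cut off, its conventions are assumptions here: the first block counts 1s and may be empty, every later block has its length decremented by one (it can never be empty), and the last block is dropped because its length follows from the length of the reference list. The function names are hypothetical.

```python
from typing import List


def copy_list_to_blocks(copy_list: List[int]) -> List[int]:
    """Turn a 0/1 copy list into copy-block lengths (assumed conventions)."""
    runs: List[int] = []
    current, length = 1, 0          # the first block is a (possibly empty) run of 1s
    for bit in copy_list:
        if bit == current:
            length += 1
        else:
            runs.append(length)
            current, length = bit, 1
    runs.append(length)
    runs = runs[:-1]                # last block is implied by the reference length
    if not runs:
        return []
    return [runs[0]] + [r - 1 for r in runs[1:]]  # later blocks are never empty


def blocks_to_copy_list(blocks: List[int], reference_length: int) -> List[int]:
    """Inverse of copy_list_to_blocks for a reference list of the given length."""
    bits: List[int] = []
    bit = 1
    for i, b in enumerate(blocks):
        bits.extend([bit] * (b if i == 0 else b + 1))
        bit ^= 1
    bits.extend([bit] * (reference_length - len(bits)))  # the omitted last block
    return bits


copy_list = [1, 1, 0, 0, 0, 1, 0, 0]
blocks = copy_list_to_blocks(copy_list)
print(blocks)  # [2, 2, 0]
assert blocks_to_copy_list(blocks, len(copy_list)) == copy_list
```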
Micah Adler and Michael Mitzenmacher. Towards compressing Web graphs. In Proc. of the IEEE Data Compression Conference, pages 203–212, 2001.
....sparse connected graph designed to capture some properties of the www. Studies of the graph structure of the www were made by [4] and [7], among others. There are many models of web graphs designed to capture the structure of the www found in the studies given above. For example, see references [1, 2, 3, 5, 6, 10, 11, 12, 14] and [15] for various models. In the simple model we consider, each new vertex directs m edges towards existing vertices according to the degree of existing vertices (copy model). With probability p the new vertex takes the colour red, and with probability q ....
M. Adler and M. Mitzenmacher, Toward Compressing Web Graphs, To appear in the 2001 Data Compression Conference.
....links are between pages whose URLs are lexicographically similar. Hence, the reasoning behind this property is analogous to the reasoning described for property 2 above. The rest of this section is organized as follows. In Section 3.1, we briefly describe the reference encoding technique [1] that we use for compressing intranode and superedge graphs. In Section 3.2, we describe an iterative process to compute a partition that has the above three properties. Finally, in Section 3.3, we describe how the S-Node representation is encoded and physically organized on secondary storage. ....
....we describe an iterative process to compute a partition that has the above three properties. Finally, in Section 3.3, we describe how the S-Node representation is encoded and physically organized on secondary storage. 3.1. Reference Encoding. Reference encoding is a graph compression technique [1, 17] that is based on the following idea: if pages x and y have similar adjacency lists, it is possible to compress the adjacency list of y by representing it in terms of the adjacency list of x (in this case, we say that x is a reference page for y). [Figure 5: Example of reference encoding.] For ....
M. Adler and M. Mitzenmacher. Towards compressing Web graphs. In Proc. of IEEE Data Compression Conf., 2001.
....then finding the optimum solution becomes NP-complete [43]. If we allow each file to be compressed using more than one reference file, then this problem can be reduced to a generalization of optimum branching to hypergraphs, and has been shown to be NP-complete even with no bound on the length of chains [2]. Some experiments on small web page collections using minimum branching and several faster heuristics are given in [38], which show significant differences in compression performance between different approaches. For very large collections, general document clustering techniques such as [10, 30, ....
M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Proc. of the IEEE Data Compression Conference (DCC), March 2001.
....links are between pages whose URLs are lexicographically similar. Hence, the reasoning behind this property is analogous to the reasoning described for property 2 above. The rest of this section is organized as follows. In Section 3.1, we briefly describe the reference encoding technique [2] that we use for compressing intranode and superedge graphs. In Section 3.2, we describe an iterative process to compute a partition that has the above three properties. Finally, in Section 3.3, we describe how the S-Node representation is encoded and physically organized on secondary storage. ....
....we describe an iterative process to compute a partition that has the above three properties. Finally, in Section 3.3, we describe how the S-Node representation is encoded and physically organized on secondary storage. 3.1 Reference Encoding. Reference encoding is a graph compression technique [2, 12] that is based on the following idea: if pages x and y have similar adjacency lists, it is possible to compress the adjacency list of y by representing it in terms of the adjacency list of x (in this case, we say that x is a reference page for y). For instance, a simple reference encoding scheme ....
M. Adler and M. Mitzenmacher. Towards compressing Web graphs. In Proc. of IEEE Data Compression Conf., March 2001.
....we are not aware of previous work that uses optimum branchings to compress collections of files, there are two previous applications that are quite similar. In particular, Tate [23] uses optimum branchings to find an optimal scheme for compressing multispectral images, while Adler and Mitzenmacher [1] use them to compress the graph structure of the World Wide Web. Adler and Mitzenmacher [1] also show that a natural extension of the branching problem to hypergraphs that can be used to model delta compression with two or more reference files is NP-complete, indicating that an efficient optimal ....
....files, there are two previous applications that are quite similar. In particular, Tate [23] uses optimum branchings to find an optimal scheme for compressing multispectral images, while Adler and Mitzenmacher [1] use them to compress the graph structure of the World Wide Web. Adler and Mitzenmacher [1] also show that a natural extension of the branching problem to hypergraphs that can be used to model delta compression with two or more reference files is NP-complete, indicating that an efficient optimal solution is unlikely. We use two types of hash-based clustering techniques in our work, a ....
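The excerpt says Adler and Mitzenmacher cast reference selection as an optimum branching problem. A minimal sketch of that formulation, assuming one reference per list: node 0 is a virtual root meaning "code this list without a reference", an edge (x, y, c) says list y costs c bits when coded relative to list x, and a minimum-cost arborescence rooted at 0 gives every list exactly one reference while ruling out cyclic references. The solver is the textbook Chu-Liu/Edmonds contraction algorithm, not code from the cited work; the function names and the costs in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (reference list x, encoded list y, cost of coding y given x)


def _solve(nodes: List[int], root: int, edges: List[Tuple[int, int, float, int]], fresh: int) -> Dict[int, int]:
    """One Chu-Liu/Edmonds round; returns {node: id of its chosen incoming edge}."""
    # 1. Every non-root node picks its cheapest incoming edge.
    best: Dict[int, Tuple[int, int, float, int]] = {}
    for u, v, w, eid in edges:
        if v != root and u != v and (v not in best or w < best[v][2]):
            best[v] = (u, v, w, eid)
    # 2. Look for a cycle among the chosen edges.
    cycle = None
    owner: Dict[int, int] = {}
    for start in nodes:
        path, x = [], start
        while x != root and x not in owner and x in best:
            owner[x] = start
            path.append(x)
            x = best[x][0]
        if x != root and owner.get(x) == start:
            cycle = path[path.index(x):]
            break
    if cycle is None:
        return {v: e[3] for v, e in best.items()}
    # 3. Contract the cycle into the fresh node; edges entering it are
    #    re-weighted by the cost of the cycle edge they would replace.
    in_cycle = set(cycle)
    entry: Dict[int, int] = {}
    contracted: List[Tuple[int, int, float, int]] = []
    for u, v, w, eid in edges:
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            contracted.append((u, fresh, w - best[v][2], eid))
            entry[eid] = v
        elif u in in_cycle:
            contracted.append((fresh, v, w, eid))
        else:
            contracted.append((u, v, w, eid))
    sub = _solve([x for x in nodes if x not in in_cycle] + [fresh], root, contracted, fresh + 1)
    # 4. Expand: keep the edge chosen into the cycle and break the cycle there.
    chosen = {v: eid for v, eid in sub.items() if v != fresh}
    entered = entry[sub[fresh]]
    chosen[entered] = sub[fresh]
    for v in cycle:
        if v != entered:
            chosen[v] = best[v][3]
    return chosen


def choose_references(nodes: List[int], root: int, edges: List[Edge]) -> Dict[int, int]:
    """Minimum-cost arborescence rooted at `root`: {list y: its reference x}."""
    labelled = [(u, v, w, i) for i, (u, v, w) in enumerate(edges)]
    chosen = _solve(list(nodes), root, labelled, max(nodes) + 1)
    return {v: edges[eid][0] for v, eid in chosen.items()}


# Node 0 = "encode from scratch"; costs are made-up bit counts.
costs = [
    (0, 1, 10), (0, 2, 11), (0, 3, 10),   # no reference
    (1, 2, 2), (2, 1, 2),                 # lists 1 and 2 are near-copies
    (2, 3, 8), (3, 2, 9), (1, 3, 9),
]
print(choose_references([0, 1, 2, 3], 0, costs))  # {3: 2, 1: 0, 2: 1}
```

The sketch assumes every list has an edge from the root, which holds because any list can always be coded from scratch. Letting a list use two or more references corresponds to the hypergraph generalization that the excerpt reports as NP-complete, so it is outside this sketch.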
M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Proc. of the IEEE Data Compression Conference (DCC), March 2001.
....efficient in many cases, we show in this paper that further improvements are possible for very large graphs and for the case of Topic-Sensitive Pagerank. We note that an alternative approach is to compute Pagerank completely in main memory using a highly compressed representation of the web graph [2, 9, 23, 37, 40]. In this case, enough main memory is required to store the rank vector for the entire graph plus the compressed representation of the hyperlink structure. The currently best compression scheme requires about 0.6 bytes per hyperlink [37], resulting in a total space requirement of about 2 4.2 = ....
M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Proc. of the IEEE Data Compression Conference (DCC), March 2001.
....i.e. local subsets that share different statistical properties with the whole structure. Predicting the evolution of new phenomena in the Web. Dealing more efficiently with large-scale computation (i.e. by recognizing the possibility of compressing a graph generated according to such a model [1]). ....
....cliques. Very recently Pandurangan, Raghavan and Upfal [22] proposed a model based on the rank values computed by the PageRank algorithm used in search engines such as Google. They propose to complement the Albert, Barabási and Jeong model in the following manner. There are two parameters a, b ∈ [0, 1] such that a + b ≤ 1. With probability a the endpoint of the l-th edge is chosen with probability proportional to its in-degree, with probability b it is chosen with probability proportional to its PageRank, and with probability 1 − a − b it is chosen at random. The authors show by computer simulation that with ....
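The endpoint rule quoted here is a three-way mixture, which can be sketched as below. Assumptions: in-degrees and PageRank scores are supplied by the caller (computing PageRank is outside the sketch), "at random" is read as uniform over all existing vertices, and the in-degree branch falls back to a uniform choice while every in-degree is still zero, a hypothetical detail since "proportional to in-degree" is undefined then. `choose_endpoint` is a hypothetical name.

```python
import random
from typing import Sequence


def choose_endpoint(in_degree: Sequence[int], pagerank: Sequence[float],
                    a: float, b: float, rng: random.Random) -> int:
    """Pick the endpoint of a new edge among existing vertices 0..n-1.

    With probability a: proportional to in-degree; with probability b:
    proportional to PageRank; otherwise (probability 1 - a - b): uniformly.
    """
    if a < 0 or b < 0 or a + b > 1:
        raise ValueError("need a, b in [0, 1] with a + b <= 1")
    n = len(in_degree)
    r = rng.random()
    if r < a:
        if sum(in_degree) > 0:
            return rng.choices(range(n), weights=in_degree, k=1)[0]
        return rng.randrange(n)  # assumed fallback: no in-links yet
    if r < a + b:
        return rng.choices(range(n), weights=pagerank, k=1)[0]
    return rng.randrange(n)


rng = random.Random(42)
picks = [choose_endpoint([0, 3, 1], [0.2, 0.5, 0.3], a=0.5, b=0.3, rng=rng) for _ in range(5)]
print(picks)
```

Vertex 0 in the example has no in-links, so it can only be reached through the PageRank or uniform branches; that is the mechanism by which the mixture lets new pages attract their first links.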
M. Adler and M. Mitzenmacher. Towards compressing Web graphs. U. of Mass. CMPSCI Technical Report 00-39.
....we are not aware of previous work that uses optimum branchings to compress collections of files, there are two previous applications that are quite similar. In particular, Tate [23] uses optimum branchings to find an optimal scheme for compressing multispectral images, while Adler and Mitzenmacher [1] use them to compress the graph structure of the World Wide Web. Adler and Mitzenmacher [1] also show that a natural extension of the branching problem to hypergraphs that can be used to model delta compression with two or more reference files is NP-complete, indicating that an efficient optimal ....
....files, there are two previous applications that are quite similar. In particular, Tate [23] uses optimum branchings to find an optimal scheme for compressing multispectral images, while Adler and Mitzenmacher [1] use them to compress the graph structure of the World Wide Web. Adler and Mitzenmacher [1] also show that a natural extension of the branching problem to hypergraphs that can be used to model delta compression with two or more reference files is NP-complete, indicating that an efficient optimal solution is unlikely. We use two types of hash-based clustering techniques in our work, a ....
M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Proc. of the IEEE Data Compression Conference (DCC), March 2001.
....sparse connected graph designed to capture some properties of the www. Studies of the graph structure of the www were made by [4] and [7], among others. There are many models of web graphs designed to capture the structure of the www found in the studies given above. For example, see references [1, 2, 3, 5, 6, 8, 9, 10, 12] and [13] for various models. In the simple models we consider, each new vertex directs m edges towards existing vertices, either randomly (random graph model) or according to the degree of existing vertices (copy model). Once a vertex has been added the ....
....the search). We consider the following scenario. We have a sequence (G(t), t = 1, 2, ...) of connected random graphs. The graph G(t) is constructed from G(t − 1) by adding the vertex t and m random edges from vertex t to G(t − 1). We refer to such graphs as web graphs. See references [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12] and [13] for various models of this and related types. There is also a spider walking randomly from vertex to vertex on the evolving graph G(t). The parameter Y_t we estimate is the expected number of vertices which have not been visited by the ....
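A Monte Carlo sketch of the process described here, under assumptions the excerpt does not fix: the graph starts as the single vertex 0, each of the m edges from vertex t goes to a uniformly random earlier vertex (a random graph model; repeated endpoints are allowed, giving a multigraph), and the spider starts at vertex 0 and takes one random-walk step after each new vertex is added. Averaging the number of unvisited vertices over independent runs gives an estimate of Y_t. The function names are hypothetical.

```python
import random
from typing import List


def unvisited_after(T: int, m: int, rng: random.Random) -> int:
    """Grow G(1..T) and walk a spider on it; return the number of vertices never visited."""
    if m < 1:
        raise ValueError("m must be at least 1")
    adj: List[List[int]] = [[]]     # G starts as the lone vertex 0
    spider, visited = 0, {0}
    for t in range(1, T + 1):
        adj.append([])
        for _ in range(m):          # m random edges from t to G(t - 1)
            u = rng.randrange(t)    # uniform over vertices 0..t-1
            adj[t].append(u)
            adj[u].append(t)
        spider = rng.choice(adj[spider])  # one step; parallel edges weigh more
        visited.add(spider)
    return len(adj) - len(visited)


def estimate_unvisited(T: int, m: int, runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of unvisited vertices after step T."""
    rng = random.Random(seed)
    return sum(unvisited_after(T, m, rng) for _ in range(runs)) / runs


print(estimate_unvisited(T=1000, m=3, runs=20))
```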
M. Adler and M. Mitzenmacher, Toward Compressing Web Graphs, To appear in the 2001 Data Compression Conference.
....web. I. Introduction. Essentially, there are three types of random graph or digraph used to describe small-world phenomena; see for example [11] for an introduction to this topic. These are: (i) Web graphs: graphs evolving by the addition of new vertices and/or edges at each step t. See e.g. [1, 3, 5, 6, 7, 10, 15, 16]. (ii) Alpha-Beta graphs: standard random graph models with atypical degree sequences. See e.g. [2, 8, 17, 18]. (iii) Lattice graphs: graphs generated by perturbing regular lattices. See e.g. [4, 13]. We describe the evolution of a random ....
M. Adler and M. Mitzenmacher, Toward Compressing Web Graphs, To appear in the 2001 Data Compression Conference.
....sparse connected graph designed to capture some properties of the www. Studies of the graph structure of the www were made by [4] and [7], among others. There are many models of web graphs designed to capture the structure of the www found in the studies given above. For example, see references [1, 2, 3, 5, 6, 8, 9, 10, 12] and [13] for various models. In the simple models we consider, each new vertex directs m edges towards existing vertices, either randomly (random graph model) or according to the degree of existing vertices (copy model). Once a vertex has been added the ....
....during the search). We consider the following scenario. We have a sequence (G(t), t = 1, 2, ...) of connected random graphs. The graph G(t) is constructed from G(t − 1) by adding the vertex t and m random edges from vertex t to G(t − 1). We refer to such graphs as web graphs. See references [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12] and [13] for various models of this and related types. There is also a spider S walking randomly from vertex to vertex on the evolving graph G(t). The parameter t we estimate is the expected number of vertices which have not been visited by the ....
M. Adler and M. Mitzenmacher, Toward Compressing Web Graphs, To appear in the 2001 Data Compression Conference.
....in the data set. Our work is strongly influenced by the Connectivity Server; in fact, one of our main objectives is to build a similar system that obtains better compression and that is fully accessible to the academic community. Very recently and independently of our work, Adler and Mitzenmacher [1] have proposed a new technique that exploits the special global structure of the web for compression, and that is based on recent attempts to model the web graph using a new type of random graph model [17]. The idea is to code an adjacency list by referring to the already coded adjacency list of ....
....list by referring to the already coded adjacency list of another node that points to many of the same pages. Adler and Mitzenmacher show that combining this with standard Huffman coding results in significant improvements for a graph with global links. The main difference from our work is that [1] focuses on a novel technique that exploits one possible source of compressibility in web graphs, and on the algorithmic problems associated with this technique. In contrast, our goal is to design and engineer a complete tool for web graph compression that is ....
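The excerpt credits Adler and Mitzenmacher with combining reference-based coding with standard Huffman coding. A minimal sketch of the Huffman step alone follows: it computes optimal prefix-code lengths for whatever symbols a coder emits. Treating every link destination as a symbol, so that heavily linked pages get shorter codewords, is an illustrative assumption here, not a description of their exact scheme; the function name is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, Iterable


def huffman_code_lengths(symbols: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Codeword length (in bits) of each symbol in an optimal prefix code."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:                  # a lone symbol still needs one bit
        return {next(iter(freq)): 1}
    tiebreak = count()                  # keeps heap entries comparable
    heap = [(f, next(tiebreak), {s: 0}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)  # merge the two rarest subtrees,
        f2, _, d2 = heapq.heappop(heap)  # pushing their symbols one level deeper
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


lists = [[2, 3], [2], [2, 3, 1]]
symbols = [dest for succ in lists for dest in succ]
lengths = huffman_code_lengths(symbols)
print(lengths)  # {2: 1, 1: 2, 3: 2}
print(sum(lengths[s] for s in symbols), "bits")  # 9 bits
```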
M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Proc. of the IEEE Data Compression Conference (DCC), March 2001. To appear.
....memory requirements. While search engines such as Google [Google] must use compression techniques to store the Web graph, those techniques are unpublished. Altavista [AV] has used our Link2 in production since 1999. Two other groups have published research on compressing the Web graph. Adler and Mitzenmacher [AM01] contains a theoretical study of Web graph compression. As in our Select method, they encode lists relative to representative lists chosen from among the adjacency lists of other nodes in the graph. However, rather than search locally to find representative lists, they formulate an optimization ....
M. Adler and M. Mitzenmacher. Towards Compressing Web Graphs. Proceedings of IEEE Data Compression Conference (DCC), March 2001.
....solved by Edmonds [5] in O(n^3) time, where n is the number of nodes of G. For the case of a complete graph G, a modification of Edmonds' algorithm with better practical performance was suggested in [6], and slightly more efficient algorithms have been conceived more recently for the general case [1, 7] as well. However, as we will see later, the tree computation time is not a critical step in the PaTre tool, because here the number of vertices of G is at most a few tens. Hence, in our implementation we adopted Edmonds' algorithm with the variation suggested in [6]. 2. Our proposal: the PaTre tool ....
M. Adler and M. Mitzenmacher. Towards compressing web graphs. In IEEE Data Compression Conference, 2001.
....structure of the web [13] to design an efficient algorithmic partitioning method for certain eigenvector computations; these are the key to the successful search algorithms of [26, 12] and to popular database indexing methods such as latent semantic indexing [20, 36]. Adler and Mitzenmacher [4] have shown how the random graph characterizations of the web given in [27] can be used to construct very effective strategies to compress the web graph. 1.1 Our results. In this paper, we present a much more refined characterization of the structure of the web. Specifically, we present ....
....to a theme, corporate intranets, etc., it is sufficient (and perhaps necessary) to understand the structure that emerges from one fairly simple stochastic process. We believe that this is a significant step in web algorithmics. For example, it shows that the sophisticated algorithms of [5, 4] are only the beginning, and the prospects are, in fact, much wider. We fully expect future data applications on the web to leverage this understanding. Our characterization is based on two findings we report in this paper. Our first finding is an experimental result. We show that self-similarity ....
M. Adler and M. Mitzenmacher. Towards compressing web graphs. Proc. IEEE Data Compression Conference, 2001, To appear.
No context found.
M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Data Compression Conference, 2001.
No context found.
Micah Adler and Michael Mitzenmacher. Towards compressing Web graphs. In Proc. of the IEEE Data Compression Conference, pages 203–212, 2001.
No context found.
M. Adler, M. Mitzenmacher, Towards Compressing Web Graphs, In: Proceedings of the Data Compression Conference, 2001.
No context found.
M. Adler, M. Mitzenmacher, Towards compressing web graphs, In: Proceedings of the Data Compression Conference, 2001.
No context found.
M. Adler, M. Mitzenmacher, Towards compressing web graphs, In Proceedings of the IEEE Data Compression Conference (DCC), 2001.
No context found.
M. Adler, M. Mitzenmacher, Towards compressing web graphs, In: Proceedings of the IEEE Data Compression Conference (DCC), 2001.
No context found.
M. Adler and M. Mitzenmacher. Toward compressing web graphs. In Proceedings of the 2001.
CiteSeer.IST - Copyright Penn State and NEC