The WebGraph Framework I: Compression Techniques
- In Proc. of the Thirteenth International World Wide Web Conference
, 2003
Cited by 268 (31 self)
Studying web graphs is often difficult due to their large size. Recently, several proposals have been published about various techniques that allow a web graph to be stored in memory in limited space, exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims at making it easy to manipulate large web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other).
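As a rough illustration of the intervalisation idea named in this abstract, the sketch below splits a sorted successor list into runs of consecutive node ids (stored as a start and a length) and leftover residuals. The function name `split_intervals` and the `min_len` threshold are assumptions made for this example; this is not WebGraph's actual on-disk format, and referentiation is not shown.

```python
def split_intervals(successors, min_len=3):
    """Split a sorted successor list into (start, length) runs and residuals.

    ASSUMPTION: `min_len` is an illustrative threshold below which a run of
    consecutive ids is not worth storing as an interval.
    """
    intervals, residuals = [], []
    i = 0
    while i < len(successors):
        # Extend j to the end of the run of consecutive ids starting at i.
        j = i
        while j + 1 < len(successors) and successors[j + 1] == successors[j] + 1:
            j += 1
        run_len = j - i + 1
        if run_len >= min_len:
            intervals.append((successors[i], run_len))
        else:
            residuals.extend(successors[i:j + 1])
        i = j + 1
    return intervals, residuals
```

For example, the successor list 13, 15, 16, 17, 18, 19, 23, 24, 203 splits into the single interval (15, 5) and the residuals 13, 23, 24, 203.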
GraphChi: Large-scale Graph Computation On just a PC
- In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI’12
, 2012
Cited by 115 (6 self)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
Efficient Aggregation for Graph Summarization
Cited by 83 (5 self)
Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mostly statistical (studying statistics such as degree distributions, hop-plots and clustering coefficients). These statistical methods are very useful, but the resolutions of the summaries are hard to control. In this paper, we introduce two database-style operations to summarize graphs. Like the OLAP-style aggregation methods that allow users to drill-down or roll-up to control the resolution of summarization, our methods provide an analogous functionality for large graph datasets. The first operation, called SNAP, produces a summary graph by grouping nodes based on user-selected node attributes and relationships. The second operation, called k-SNAP, further allows users to control the resolutions of summaries and provides the “drill-down” and “roll-up” abilities to navigate through summaries with different resolutions. We propose an efficient algorithm to evaluate the SNAP operation. In addition, we prove that the k-SNAP computation is NP-complete. We propose two heuristic methods to approximate the k-SNAP results. Through extensive experiments on a variety of real and synthetic datasets, we demonstrate the effectiveness and efficiency of the proposed methods.
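To make the idea of grouping nodes into a summary graph concrete, here is a minimal sketch that groups nodes by a single attribute value and counts edges between the resulting groups. It is a simplification, not the SNAP algorithm from the paper: SNAP also refines groups by relationships, which this sketch does not do. The function name `summarize_by_attribute` and the input layout are assumptions for illustration.

```python
from collections import Counter

def summarize_by_attribute(node_attr, edges):
    """Build a coarse summary graph from attribute-based groups.

    node_attr: dict mapping node id -> attribute value (ASSUMED input layout).
    edges: iterable of (source, target) node id pairs, treated as directed.
    Returns (group sizes, edge counts between groups).
    """
    # Each distinct attribute value becomes one group (one summary node).
    group_sizes = Counter(node_attr.values())
    # Each original edge contributes to the edge between its endpoint groups.
    group_edges = Counter()
    for u, v in edges:
        group_edges[(node_attr[u], node_attr[v])] += 1
    return dict(group_sizes), dict(group_edges)
```

With three nodes, two tagged "student" and one tagged "staff", the summary has two groups, and every original edge is counted on the pair of groups it connects.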
A Geometric Preferential Attachment Model of Networks II
, 2008
Cited by 60 (4 self)
We study a random graph G_n that combines certain aspects of geometric random graphs and preferential attachment graphs. This model yields a graph with power-law degree distribution where the expansion property depends on a tunable parameter of the model. The vertices of G_n are n sequentially generated points x_1, x_2, ..., x_n chosen uniformly at random from the unit sphere in R^3. After generating x_t, we randomly connect it to m points from those points in x_1, x_2, ..., x_{t−1}.
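The generation step described above can be sketched as follows. Sampling uniformly on the unit sphere is done by normalising a Gaussian vector. The names `random_unit_vector` and `generate_graph` are hypothetical, and the uniform choice of the m earlier points is an assumption made only so the example runs: the excerpt does not say how those points are weighted, so this is not the paper's attachment rule.

```python
import math
import random

def random_unit_vector(rng):
    """Return a point drawn uniformly at random from the unit sphere in R^3."""
    # A normalised standard Gaussian vector has a uniformly random direction.
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return (x / norm, y / norm, z / norm)

def generate_graph(n, m, seed=0):
    """Generate n points in sequence; link each new point to m earlier ones."""
    rng = random.Random(seed)
    points, edges = [], []
    for t in range(n):
        points.append(random_unit_vector(rng))
        if t > 0:
            # ASSUMPTION: earlier endpoints are picked uniformly, with
            # replacement. The model in the paper may weight them differently.
            for _ in range(m):
                edges.append((t, rng.randrange(t)))
    return points, edges
```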
A Fast and Compact Web Graph Representation
Cited by 28 (17 self)
Compressed graph representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In this paper we show that the same properties can be exploited with a different and elegant technique, built on Re-Pair compression, which achieves about the same space but much faster navigation of the graph. Moreover, the technique has the potential of adapting well to secondary memory. In addition, we introduce an approximate Re-Pair version that works efficiently with limited main memory.
Compact representations of simplicial meshes in two and three dimensions
- International Journal of Computational Geometry and Applications
, 2003
Cited by 25 (6 self)
We describe data structures for representing simplicial meshes compactly while supporting online queries and updates efficiently. Our data structure requires about a factor of five less memory than the most efficient standard data structures for triangular or tetrahedral meshes, while efficiently supporting traversal among simplices, storing data on simplices, and insertion and deletion of simplices. Our implementation of the data structures uses about 5 bytes/triangle in two dimensions (2D) and 7.5 bytes/tetrahedron in three dimensions (3D). We use the data structures to implement 2D and 3D incremental algorithms for generating a Delaunay mesh. The 3D algorithm can generate 100 million tetrahedrons with 1 Gbyte of memory, including the space for the coordinates and all data used by the algorithm. The runtime of the algorithm is as fast as Shewchuk’s Pyramid code, the most efficient we know of, and uses a factor of 3.5 less memory overall.
Discovery-Driven Graph Summarization
Cited by 21 (2 self)
Large graph datasets are ubiquitous in many domains, including social networking and biology. Graph summarization techniques are crucial in such domains as they can assist in uncovering useful insights about the patterns hidden in the underlying data. One important type of graph summarization is to produce small and informative summaries based on user-selected node attributes and relationships, and to allow users to interactively drill-down or roll-up to navigate through summaries with different resolutions. However, two key components are missing from the previous work in this area that limit the use of this method in practice. First, the previous work only deals with categorical node attributes. Consequently, users have to manually bucketize numerical attributes based on domain knowledge, which is not always possible. Moreover, users often have to manually iterate through many resolutions of summaries to identify the most interesting ones. This paper addresses both these key issues to make the interactive graph summarization approach more useful in practice. We first present a method to automatically categorize numerical attribute values by exploiting the domain knowledge hidden inside the node attribute values and graph link structures. Furthermore, we propose an interestingness measure for graph summaries to point users to the potentially most insightful summaries. Using two real datasets, we demonstrate the effectiveness and efficiency of our techniques.
An Experimental Analysis of a Compact Graph Representation
- In ALENEX04
, 2004
Cited by 18 (6 self)
In previous work we described a method that exploits small separators to compactly represent graphs that have them, and presented preliminary experimental results. In this paper we extend the experimental results in several ways, including extensions for dynamic insertion and deletion of edges, a comparison of a variety of coding schemes, and an implementation of two applications using the representation.
Expansion and lack thereof in randomly perturbed graphs
- Internet Math
Cited by 16 (3 self)
Developing models of complex networks has been a major industry in the fields of physics, mathematics, and computer science during the last decade. Empirical study of numerous large networks harvested from the real world has revealed that, unlike the classical models of random graphs developed
Linear-time compression of bounded-genus graphs into information-theoretically optimal number of bits
- In: 13th Symposium on Discrete Algorithms (SODA
, 2002
Cited by 16 (1 self)
1 Introduction. This extended abstract summarizes a new result for the graph compression problem, addressing how to compress a graph G into a binary string Z with the requirement that Z can be decoded to recover G. Graph compression finds important applications in 3D model compression of Computer Graphics [12, 17-20] and compact routing table of Computer Networks [7]. For brevity, let a π-graph stand for a graph with property π. The information-theoretically optimal number of bits required to represent an n-node π-graph is ⌈log_2 N_π(n)⌉, where N_π(n) is the number of distinct n-node π-graphs. Although determining or approximating the closed forms of N_π(n) for nontrivial classes of π is challenging, we provide a linear-time methodology for graph compression schemes that are information-theoretically optimal with respect to continuous super-additive functions (abbreviated as optimal for the rest of the extended abstract). Specifically, if π satisfies certain properties, then we can compress any n-node m-edge π-graph G into a binary string Z such that G and Z can be computed from each other in O(m + n) time, and that the bit count of Z is at most β(n) + o(β(n)) for any continuous super-additive function β(n) with log_2 N_π(n) < β(n) + o(β(n)). Our methodology is applicable to general classes of graphs; this extended abstract focuses on graphs with sublinear genus. For example, if the input n-node π-graph G is equipped with an embedding on its genus surface, which is a reasonable assumption for graphs arising from 3D model compression, then our methodology is applicable to any π satisfying the following statements: