Massive graph triangulation
 In ACM SIGMOD Conference on Management of Data, 2013
Cited by 12 (0 self)
This paper studies I/O-efficient algorithms for the classic triangle listing problem, whose solution is a basic operator in many other graph problems. Specifically, given an undirected graph G, the objective of triangle listing is to find all the cliques involving 3 vertices in G. The problem has been well studied in internal memory, but remains an urgent and difficult challenge when G does not fit in memory, which forces any algorithm to perform frequent I/O accesses. Although previous research has attempted to tackle the challenge, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption about the input G. The algorithm uses ideas drastically different from all previous approaches, and outperformed the existing competitors by more than an order of magnitude in our extensive experimentation.
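To make the problem statement concrete, here is a minimal in-memory sketch of triangle listing. It is not the I/O-efficient algorithm from the paper (the abstract gives no details of it); it is a common internal-memory baseline that orients each edge by degree so every triangle is reported once. The function name `list_triangles` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict

def list_triangles(edges):
    """Return every triangle (3-clique) of an undirected graph as a sorted tuple.

    Illustrative in-memory baseline only; assumes the whole graph fits in RAM
    and that vertex ids are mutually comparable (e.g. all ints).
    """
    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Orient each edge from the lower-ranked endpoint to the higher-ranked one
    # (rank = (degree, vertex id)), so each triangle is found exactly once.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    triangles = []
    for u in adj:
        for v in out[u]:
            # Any common out-neighbor w closes the triangle u-v-w.
            for w in out[u] & out[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)

print(list_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]))
# [(1, 2, 3), (2, 3, 4)]
```

The set intersections here touch adjacency data in an essentially random order, which is the access pattern that becomes expensive once G no longer fits in memory.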
Large-Scale Bisimulation of RDF Graphs
Cited by 2 (0 self)
RDF datasets with billions of triples are no longer unusual and continue to grow (e.g. the LOD cloud), driven by the inherent flexibility of RDF, which allows it to represent very diverse datasets, ranging from highly structured to unstructured data. Because of their size, understanding and processing RDF graphs is often difficult, and methods that reduce the size while keeping as much of the structural information as possible become attractive. In this paper we study bisimulation as a means to reduce the size of RDF graphs according to structural equivalence. We study two bisimulation algorithms, one for sequential execution using SQL and one for distributed execution using MapReduce. We demonstrate that the MapReduce-based implementation scales linearly with the number of RDF triples, making it possible to compute the bisimulation of very large RDF graphs in a time that is far out of reach for the sequential version. Experiments based on synthetic benchmark data and real data (DBPedia) exhibit a reduction of more than 90% in the size of the RDF graph, measured as the number of nodes relative to the number of blocks in the resulting bisimulation partition.
Regularities and dynamics in bisimulation reductions of big graphs
 In GRADES, 2013
Cited by 1 (0 self)
Bisimulation is a basic graph reduction operation, which plays a key role in a wide range of graph analytical applications. While there are many algorithms dedicated to computing bisimulation results, to our knowledge, little work has been done to analyze the results themselves. Since data properties such as skew can greatly influence the performance of data-intensive tasks, the lack of such insight leads to inefficient algorithm and system design. In this paper we take a close look at various aspects of bisimulation results on big graphs, from both real-world scenarios and synthetic graph generators, with graph size varying from 1 million to 1 billion edges. We make the following observations: (1) A certain degree of regularity exists in real-world graphs' bisimulation results. Specifically, power-law distributions appear in many of the results' properties. (2) Synthetic graphs fail to fulfill one or more of the regularities revealed in the real-world graphs. (3) By examining a growing social network graph (FlickrGrow), we see that the corresponding bisimulation partition relation graph grows as well, but the growth is stable with respect to the original graph.
Similar Structures inside RDF-Graphs
Cited by 1 (1 self)
RDF is the common data model for publishing structured data on the Web. RDF data sets are given as subject-predicate-object triples and are typically represented as directed edge-labeled graphs. To make the information represented by such graphs comprehensible, RDF-schema (RDFS) provides concepts to define a class structure as part of the given RDF graph and thus supports a more abstract view of the data set. In this paper we follow a different approach and propose to make an RDF graph more comprehensible by reducing its size: we partition it to discover subgraphs that are similar with respect to their structure. The methods applied to derive a partition are based on bisimulation and agglomerative clustering. We demonstrate the usefulness of the approach by applying it to several synthetic RDF datasets and one real-world RDF dataset.
Search on Graphs: Theory Meets Engineering
The last decade has witnessed an explosion in the availability of, and interest in, graph-structured data. The desire to search and reason over these increasingly massive data collections pushes the boundaries of search languages, from pure keyword search to structure-aware searches in the graph. These phenomena have inspired a rich body of research on query languages, data management and query evaluation techniques for graph data, from both the theoretical and engineering angles. In this tutorial, we present an overview of the progress on graph search queries, focusing specifically on how the theoretical and engineering perspectives meet and together advance the field.
Tutorial Overview
Exploratory keyword-style search has been heavily studied in the past decade, both in the context of structured [...] In this tutorial, we survey this growing body of work, with an eye towards both bringing participants up to speed in this field of rapid progress and delimiting the boundaries of the state of the art. A particular focus will be on recent results in the theory of graph languages on the design and structural characterization of simple yet powerful algebraic languages for graph search, which bridge structure-oblivious and structure-aware graph exploration.
External Memory k-Bisimulation Reduction of Big Graphs
 2013
In this paper, we present, to our knowledge, the first I/O-efficient solutions for computing the k-bisimulation partition of a massive directed graph, and for maintaining such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence which intuitively groups together nodes in a graph that share fundamental structural features. k-bisimulation is the standard variant of bisimulation where the topological features of nodes are only considered within a local neighborhood of radius k > 0. The I/O cost of our partition construction algorithm is bounded by O(k · sort(Et) + k · scan(Nt) + sort(Nt)), while our maintenance algorithms are bounded by O(k · sort(Et) + k · sort(Nt)). The space complexity bounds are O(Nt + Et) and O(k · Nt + k · Et), resp. Here, Et and Nt are the number of disk pages occupied by the input graph’s edge set and node set, resp., and sort(n) and scan(n) are the cost of sorting and scanning, resp., a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.
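As a rough illustration of what a k-bisimulation partition is, the sketch below computes one in memory by iterative signature refinement. It is not the external-memory algorithm the abstract describes, which the abstract does not spell out. The signature definition used here (a node's previous block plus the set of edge-label and target-block pairs on its outgoing edges) is a common formulation and is an assumption, as are the function name `k_bisimulation` and the input format.

```python
def k_bisimulation(nodes, edges, k):
    """Map each node to a block id; equal ids mean the nodes are k-bisimilar.

    nodes: dict mapping node -> node label (labels must be hashable)
    edges: iterable of (source, edge_label, target) triples
    In-memory sketch under the assumptions stated above.
    """
    # Outgoing adjacency list per node.
    out = {u: [] for u in nodes}
    for s, lbl, t in edges:
        out[s].append((lbl, t))

    def renumber(signature):
        # Give each distinct signature a compact integer block id.
        ids = {}
        return {u: ids.setdefault(signature[u], len(ids)) for u in signature}

    # Round 0: nodes are grouped by their own label only.
    block = renumber({u: nodes[u] for u in nodes})

    # Each round widens the considered neighborhood by one hop.
    for _ in range(k):
        signature = {
            u: (block[u], frozenset((lbl, block[t]) for lbl, t in out[u]))
            for u in nodes
        }
        block = renumber(signature)
    return block

# A four-node chain a -> b -> c -> d with identical labels.
chain_nodes = {n: "n" for n in "abcd"}
chain_edges = [("a", "x", "b"), ("b", "x", "c"), ("c", "x", "d")]
for k in range(4):
    print(k, len(set(k_bisimulation(chain_nodes, chain_edges, k).values())))
```

On the chain, each additional round separates one more node from the rest, so the number of blocks grows from 1 at k = 0 to 4 at k = 3. Only equality of block ids is meaningful; the particular integers depend on iteration order.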
A Scalable Query Optimization Index for
Querying large data graphs has attracted the attention of the research community. Many solutions have been proposed, such as Oracle Semantic Technologies, Virtuoso, RDF-3X, and C-Store, among others. Although such approaches have shown good performance on queries of medium complexity, they perform poorly as query complexity increases. In this paper, the authors propose the Graph Signature Index, a novel and scalable approach to indexing and querying large data graphs. The idea is to summarize a graph and, instead of executing the query on the original graph, execute it on the summaries. The authors' experiments with Yago (16M triples) have shown that, for example, a query with 4 levels takes 62 sec using Oracle but only about 0.6 sec with their index. Their index can be implemented on top of any graph database, but they chose to implement it as an extension to Oracle on top of the SEM_MATCH table function. The paper also introduces disk-based versions of the Trace Equivalence and Bisimilarity algorithms to summarize data graphs, and discusses their complexity and usability for RDF graphs.
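The idea of answering a query on a summary rather than on the original graph can be sketched as follows. This is not the Graph Signature Index itself, whose construction the abstract does not describe; it is a toy example that assumes a node partition is already given and that the partition is stable (all nodes in a block have the same outgoing edge labels leading to the same target blocks). The names `build_summary` and `match_path`, the sample data, and the hand-written partition are all hypothetical.

```python
def build_summary(edges, block):
    """Collapse a labeled graph into its block-level summary graph."""
    return {(block[s], lbl, block[t]) for s, lbl, t in edges}

def match_path(edge_set, labels):
    """Return the start vertices that have an outgoing path spelling `labels`."""
    if not labels:
        raise ValueError("labels must be non-empty")
    # Work backwards: `frontier` holds vertices from which the rest of the
    # path can still be matched (None means no constraint yet).
    frontier = None
    for lbl in reversed(labels):
        frontier = {s for s, l, t in edge_set
                    if l == lbl and (frontier is None or t in frontier)}
    return frontier

# Toy graph: a1 and a2 know someone with an employer, a3 does not.
edges = [
    ("a1", "knows", "b1"), ("b1", "worksAt", "c1"),
    ("a2", "knows", "b2"), ("b2", "worksAt", "c2"),
    ("a3", "knows", "b3"),
]
# Hand-written stable partition (assumed input, not computed here).
block = {"a1": 0, "a2": 0, "a3": 1, "b1": 2, "b2": 2, "b3": 3, "c1": 4, "c2": 4}

query = ["knows", "worksAt"]
summary = build_summary(edges, block)
hit_blocks = match_path(summary, query)            # evaluated on 3 summary edges
via_summary = {u for u, b in block.items() if b in hit_blocks}
direct = match_path(set(edges), query)             # evaluated on 5 original edges
print(sorted(via_summary), sorted(direct))
```

With a stable partition, both evaluations return the same nodes for this kind of path query, while the summary has fewer edges to scan. If the partition were not stable, the summary could only be trusted to produce candidates that still need checking against the original graph.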