Results 1  10
of
56
Graphsatatime: Query Language and Access Methods for Graph Databases
, 2008
"... With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this l ..."
Abstract

Cited by 70 (0 self)
 Add to MetaCart
(Show Context)
With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NPcompleteness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQLbased implementation by orders of magnitude.
Efficiently Answering Reachability Queries on Very Large Directed Graphs
"... Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of ..."
Abstract

Cited by 52 (5 self)
 Add to MetaCart
Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of research attention as reachability queries are not only common on graph databases, but they also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned with certain labels such that the reachability between any two vertices can be determined by their labels. Several approaches have been proposed for building these reachability labels; among them are interval labeling (tree cover) and 2hop labeling. However, due to the large number of vertices in many real world graphs (some graphs can easily contain millions of vertices), the computational cost and (index) size of the labels using existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as pathtree, to help labeling very large graphs. The pathtree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.
Fast computation of reachability labeling for large graphs
 In Proc. of EDBT’06
, 2006
"... There are numerous applications that need to deal with a large graph and need to query reachability between nodes in the graph. A 2hop cover can compactly represent the whole edge transitive closure of a graph in O(V  · E  1/2) space, and be used to answer reachability query efficiently. Howev ..."
Abstract

Cited by 46 (15 self)
 Add to MetaCart
(Show Context)
There are numerous applications that need to deal with a large graph and need to query reachability between nodes in the graph. A 2hop cover can compactly represent the whole edge transitive closure of a graph in O(V  · E  1/2) space, and be used to answer reachability query efficiently. However, it is challenging to compute a 2hop cover. The existing approaches suffer from either large resource consumption or low compression rate. In this paper, we propose a hierarchical partitioning approach to partition a large graph G into two subgraphs repeatedly in a topdown fashion. The unique feature of our approach is that we compute 2hop cover while partitioning. In brief, in every iteration of topdown partitioning, we provide techniques to compute the 2hop cover for connections between the two subgraphs first. A cover is computed to cut the graph into two subgraphs, which results in an overall cover with high compression for the entire graph G. Two approaches are proposed, namely a nodeoriented approach and an edgeoriented approach. Our approach can efficiently compute 2hop cover for a large graph with high compression rate. Our extensive experiment study shows that the 2hop cover for a graph with 1,700,000 nodes and 169 billion connections can be obtained in less than 30 minutes with a compression rate about 40,000 using a PC.
Adding Regular Expressions to Graph Reachability and Pattern Queries
 Frontiers of Computer Science
, 2012
"... Abstract—It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressin ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
(Show Context)
Abstract—It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity in a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubictime algorithms for evaluating graph pattern queries based on extended graph simulation, as opposed to the NPcompleteness of graph pattern matching via subgraph isomorphism. (4) The effectiveness, efficiency and scalability of these algorithms are experimentally verified using reallife data and synthetic data. I.
Online exact shortest distance query processing
 Proceedings of the International Conference on Extending Database Technology (EDBT), 2009
"... Shortestpath query processing not only serves as a long established routine for numerous applications in the past but also is of increasing popularity to support novel graph applications in very large databases nowadays. For a large graph, there is the new scenario to query intensively against ar ..."
Abstract

Cited by 19 (4 self)
 Add to MetaCart
Shortestpath query processing not only serves as a long established routine for numerous applications in the past but also is of increasing popularity to support novel graph applications in very large databases nowadays. For a large graph, there is the new scenario to query intensively against arbitrary nodes, asking to quickly return node distance or even shortest paths. And traditional main memory algorithms and shortest paths materialization become inadequate. We are interested in graph labelings to encode the underlying graphs and assign labels to nodes to support efficient query processing. Surprisingly, the existing work of this category mainly emphasizes on reachability query processing, while no sufficient effort has been given to distance labelings to support querying exact shortest distances between nodes. Distance labelings must be developed on the graph in whole to correctly retain node distance information. It makes many existing methods to be inapplicable. We focus on fast computing distanceaware 2hop covers, which can encode the allpairs shortest paths of a graph in O(V  · E1/2) space. Our approach exploits strongly connected components collapsing and graph partitioning to gain speed, while it can overcome the challenges in correctly retaining node distance information and appropriately encoding allpairs shortest paths with small overhead. Furthermore, our approach avoids precomputing allpairs shortest paths, which can be prohibitive over large graphs. We conducted extensive performance studies, and confirm the efficiency of our proposed new approaches. 1.
Efficiently Indexing Shortest Paths by Exploiting Symmetry in Graphs
 In EDBT 2009
"... Shortest path queries (SPQ) are essential in many graph analysis and mining tasks. However, answering shortest path queries onthefly on large graphs is costly. To online answer shortest path queries, we may materialize and index shortest paths. However, a straightforward index of all shortest path ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
(Show Context)
Shortest path queries (SPQ) are essential in many graph analysis and mining tasks. However, answering shortest path queries onthefly on large graphs is costly. To online answer shortest path queries, we may materialize and index shortest paths. However, a straightforward index of all shortest paths in a graph of N vertices takes O(N 2) space. In this paper, we tackle the problem of indexing shortest paths and online answering shortest path queries. As many large real graphs are shown richly symmetric, the central idea of our approach is to use graph symmetry to reduce the index size while retaining the correctness and the efficiency of shortest path query answering. Technically, we develop a framework to index a large graph at the orbit level instead of the vertex level so that the number of breadthfirst
An Optimal Labeling Scheme for Workflow Provenance Using Skeleton Labels
"... We develop a compact and efficient reachability labeling scheme for answering provenance queries on workflow runs that conform to a given specification. Even though a workflow run can be structurally more complex and can be arbitrarily larger than the specification due to fork (parallel) and loop ex ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
(Show Context)
We develop a compact and efficient reachability labeling scheme for answering provenance queries on workflow runs that conform to a given specification. Even though a workflow run can be structurally more complex and can be arbitrarily larger than the specification due to fork (parallel) and loop executions, we show that a compact reachability labeling for a run can be efficiently computed using the fact that it originates from a fixed specification. Our labeling scheme is optimal in the sense that it uses labels of logarithmic length, runs in linear time, and answers any reachability query in constant time. Our approach is based on using the reachability labeling for the specification as an effective skeleton for designing the reachability labeling for workflow runs. We also demonstrate empirically the effectiveness of our skeletonbased labeling approach.
T.K.: Evaluating reachability queries over path collections
, 2008
"... Abstract. Several applications in areas such as biochemistry, GIS, involve storing and querying large volumes of sequential data stored as path collections. There is a number of interesting queries that can be posed on such data. This work focuses on reachability queries: given a path collection an ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Several applications in areas such as biochemistry, GIS, involve storing and querying large volumes of sequential data stored as path collections. There is a number of interesting queries that can be posed on such data. This work focuses on reachability queries: given a path collection and two nodes vs, vt, determine whether a path from vs to vt exists and identify it. To answer these queries, the pathfirst search paradigm, which treats paths as firstclass citizens, is proposed. To improve the performance of our techniques, two indexing structures that capture the reachability information of paths are introduced. Further, methods for updating a path collection and its indices are discussed. Finally, an extensive experimental evaluation verifies the advantages of our approach.