Results 1  10
of
51
On Graph Query Optimization in Large Networks
"... The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such largescale graphstructured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph ..."
Abstract

Cited by 32 (3 self)
 Add to MetaCart
The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such largescale graphstructured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph structures efficiently within a large network? Unfortunately, the graph query is hard due to the NPcomplete nature of subgraph isomorphism. It becomes even challenging when the network examined is large and diverse. In this paper, we present a high performance graph indexing mechanism, SPath, to address the graph query problem on large networks. SPath leverages decomposed shortest paths around vertex neighborhood as basic indexing units, which prove to be both effective in graph search space pruning and highly scalable in index construction and deployment. Via SPath, a graph query is processed and optimized beyond the traditional vertexatatime fashion to a more efficient pathatatime way: the query is first decomposed to a set of shortest paths, among which a subset of candidates with good selectivity is picked by a query plan optimizer; Candidate paths are further joined together to help recover the query graph to finalize the graph query processing. We evaluate SPath with the stateoftheart GraphQL on both real and synthetic data sets. Our experimental studies demonstrate the effectiveness and scalability of SPath, which proves to be a more practical and efficient indexing method in addressing graph queries on large networks. 1.
Adding Regular Expressions to Graph Reachability and Pattern Queries
 Frontiers of Computer Science
, 2012
"... Abstract—It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressin ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
(Show Context)
Abstract—It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity in a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubictime algorithms for evaluating graph pattern queries based on extended graph simulation, as opposed to the NPcompleteness of graph pattern matching via subgraph isomorphism. (4) The effectiveness, efficiency and scalability of these algorithms are experimentally verified using reallife data and synthetic data. I.
Fast and accurate estimation of shortest paths in large graphs
 In Proceedings of Conference on Information and Knowledge Management (CIKM
, 2010
"... Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large diskresident instances of graph data. While a numberoftechniquesexistfor answeringreachabilityqueries and approximating node distances efficiently, determining actual short ..."
Abstract

Cited by 26 (1 self)
 Add to MetaCart
(Show Context)
Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large diskresident instances of graph data. While a numberoftechniquesexistfor answeringreachabilityqueries and approximating node distances efficiently, determining actual shortest paths (i.e. the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs it is often essential to find out many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketchbased index structure that not only supports estimation of node distances, but also computes corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to nearexact shortestpath approximations in real world graphs. We evaluate our techniques – implemented within a fully functional RDF graph database system – over large realworld social and biological networks of sizes ranging from tens of thousand to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0 % and 1 % on average.
Managing large dynamic graphs efficiently
, 2012
"... There is an increasing need to ingest, manage, and query large volumes of graphstructured data arising in applications like social networks, communication networks, biological networks, and so on. Graph databases that can explicitly reason about the graphical nature of the data, that can support fl ..."
Abstract

Cited by 22 (1 self)
 Add to MetaCart
(Show Context)
There is an increasing need to ingest, manage, and query large volumes of graphstructured data arising in applications like social networks, communication networks, biological networks, and so on. Graph databases that can explicitly reason about the graphical nature of the data, that can support flexible schemas and nodecentric or edgecentric analysis and querying, are ideal for storing such data. However, although there is much work on singlesite graph databases and on efficiently executing different types of queries over large graphs, to date there is little work on understanding the challenges in distributed graph databases, needed to handle the large scale of such data. In this paper, we propose the design of an inmemory, distributed graph data management system aimed at managing a largescale dynamically changing graph, and supporting lowlatency query processing over it. The key challenge in
Online exact shortest distance query processing
 Proceedings of the International Conference on Extending Database Technology (EDBT), 2009
"... Shortestpath query processing not only serves as a long established routine for numerous applications in the past but also is of increasing popularity to support novel graph applications in very large databases nowadays. For a large graph, there is the new scenario to query intensively against ar ..."
Abstract

Cited by 19 (4 self)
 Add to MetaCart
(Show Context)
Shortestpath query processing not only serves as a long established routine for numerous applications in the past but also is of increasing popularity to support novel graph applications in very large databases nowadays. For a large graph, there is the new scenario to query intensively against arbitrary nodes, asking to quickly return node distance or even shortest paths. And traditional main memory algorithms and shortest paths materialization become inadequate. We are interested in graph labelings to encode the underlying graphs and assign labels to nodes to support efficient query processing. Surprisingly, the existing work of this category mainly emphasizes on reachability query processing, while no sufficient effort has been given to distance labelings to support querying exact shortest distances between nodes. Distance labelings must be developed on the graph in whole to correctly retain node distance information. It makes many existing methods to be inapplicable. We focus on fast computing distanceaware 2hop covers, which can encode the allpairs shortest paths of a graph in O(V  · E1/2) space. Our approach exploits strongly connected components collapsing and graph partitioning to gain speed, while it can overcome the challenges in correctly retaining node distance information and appropriately encoding allpairs shortest paths with small overhead. Furthermore, our approach avoids precomputing allpairs shortest paths, which can be prohibitive over large graphs. We conducted extensive performance studies, and confirm the efficiency of our proposed new approaches. 1.
Query preserving graph compression
 In SIGMOD
, 2012
"... It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. These motivate us to propose query preserving graph compression, to compress graphs relative to a class Q of queries of users ’ choice. We c ..."
Abstract

Cited by 18 (7 self)
 Add to MetaCart
(Show Context)
It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. These motivate us to propose query preserving graph compression, to compress graphs relative to a class Q of queries of users ’ choice. We compute a small Gr from a graph G such that (a) for any query Q ∈ Q, Q(G) = Q′(Gr), where Q ′ ∈ Q can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be directly applied to evaluating Q ′ on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to all the queries in Q. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while preserving query answers. (2) We provide techniques for maintaining compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two classes of queries, i.e., their costs are not a function of the size of ΔG and changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using reallife data, we experimentally verify that our compression techniques could reduce graphs in average by 95% for reachability and 57 % for graph pattern matching, and that our incremental maintenance algorithms are efficient. Categories and Subject Descriptors F.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems—graph compression
An Optimal Labeling Scheme for Workflow Provenance Using Skeleton Labels
"... We develop a compact and efficient reachability labeling scheme for answering provenance queries on workflow runs that conform to a given specification. Even though a workflow run can be structurally more complex and can be arbitrarily larger than the specification due to fork (parallel) and loop ex ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
(Show Context)
We develop a compact and efficient reachability labeling scheme for answering provenance queries on workflow runs that conform to a given specification. Even though a workflow run can be structurally more complex and can be arbitrarily larger than the specification due to fork (parallel) and loop executions, we show that a compact reachability labeling for a run can be efficiently computed using the fact that it originates from a fixed specification. Our labeling scheme is optimal in the sense that it uses labels of logarithmic length, runs in linear time, and answers any reachability query in constant time. Our approach is based on using the reachability labeling for the specification as an effective skeleton for designing the reachability labeling for workflow runs. We also demonstrate empirically the effectiveness of our skeletonbased labeling approach.
T.K.: Evaluating reachability queries over path collections
, 2008
"... Abstract. Several applications in areas such as biochemistry, GIS, involve storing and querying large volumes of sequential data stored as path collections. There is a number of interesting queries that can be posed on such data. This work focuses on reachability queries: given a path collection an ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Several applications in areas such as biochemistry, GIS, involve storing and querying large volumes of sequential data stored as path collections. There is a number of interesting queries that can be posed on such data. This work focuses on reachability queries: given a path collection and two nodes vs, vt, determine whether a path from vs to vt exists and identify it. To answer these queries, the pathfirst search paradigm, which treats paths as firstclass citizens, is proposed. To improve the performance of our techniques, two indexing structures that capture the reachability information of paths are introduced. Further, methods for updating a path collection and its indices are discussed. Finally, an extensive experimental evaluation verifies the advantages of our approach.
Labeling Recursive Workflow Executions OntheFly
, 2011
"... This paper presents a compact labeling scheme for answering reachability queries over workflow executions. In contrast to previous work, our scheme allows nodes (processes and data) in the execution graph to be labeled onthefly, i.e., in a dynamic fashion. In this way, reachability queries can be ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
(Show Context)
This paper presents a compact labeling scheme for answering reachability queries over workflow executions. In contrast to previous work, our scheme allows nodes (processes and data) in the execution graph to be labeled onthefly, i.e., in a dynamic fashion. In this way, reachability queries can be answered as soon as the relevant data is produced. We first show that, in general, for workflows that contain recursion, dynamic labeling of executions requires long (linearsize) labels. Fortunately, most reallife scientific workflows are linear recursive, and for this natural class we show that dynamic, yet compact (logarithmicsize) labeling is possible. Moreover, our scheme labels the executions in linear time, and answers any reachability query in constant time. We also show that linear recursive workflows are, in some sense, the largest class of workflows that allow compact, dynamic labeling schemes. Interestingly, the empirical evaluation, performed over both real and synthetic workflows, shows that our proposed dynamic scheme outperforms the stateoftheart static scheme for large executions, and creates labels that are shorter by a factor of almost 3.