Results 1 - 7 of 7
Distributed Graph Simulation: Impossibility and Possibility
Cited by 5 (2 self)
This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel computational cost is determined by the largest fragment Fm of G and the size |Q| of query Q, and (b) data shipment if its total amount of data shipped is determined by |Q| and the number of fragments of G, independent of the size of graph G. (1) We prove an impossibility theorem: there exists no distributed graph simulation algorithm that is parallel scalable in either response time or data shipment. (2) However, we show that distributed graph simulation is partition bounded, i.e., its response time depends only on |Q|, |Fm| and the number |Vf| of nodes in G with edges across different fragments; and its data shipment depends on |Q| and the number |Ef| of crossing edges only. We provide the first algorithms with these performance guarantees. (3) We also identify special cases of patterns and graphs when parallel scalability is possible. (4) We experimentally verify the scalability and efficiency of our algorithms.
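For orientation, the matches Q(G) in the abstract above can be illustrated with a minimal centralized sketch. It assumes the standard definition of graph simulation (a data node v matches a pattern node u if their labels agree and every pattern edge out of u is mirrored by some edge out of v to a matching node). The function name and input encoding are hypothetical, and this is not the distributed algorithm of the paper.

```python
def graph_simulation(q_nodes, q_edges, g_nodes, g_edges):
    """Naive fixpoint computation of the maximum simulation relation.

    q_nodes / g_nodes: dict mapping node -> label (assumed encoding).
    q_edges / g_edges: set of (source, target) pairs.
    Returns a dict mapping each pattern node to its set of matches.
    """
    # Successor sets of the data graph.
    succ = {v: set() for v in g_nodes}
    for a, b in g_edges:
        succ[a].add(b)

    # Start from all label-compatible candidates.
    sim = {u: {v for v, lab in g_nodes.items() if lab == q_nodes[u]}
           for u in q_nodes}

    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # Keep v only if some successor of v still matches u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    # By the usual convention, the result is empty if any pattern node
    # ends up without a match.
    if any(not matches for matches in sim.values()):
        return {u: set() for u in q_nodes}
    return sim


pattern_nodes = {"qa": "A", "qb": "B"}
pattern_edges = {("qa", "qb")}
data_nodes = {"a1": "A", "a2": "A", "b1": "B"}
data_edges = {("a1", "b1")}
result = graph_simulation(pattern_nodes, pattern_edges, data_nodes, data_edges)
```

In this toy input, a2 carries the right label but has no outgoing edge to a B-labeled node, so it is pruned during the fixpoint.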
On Implementing Provenance-Aware Regular Path Queries with Relational Query Engines
Cited by 2 (0 self)
Use of graphs is growing rapidly in social networks, semantic web, biological databases, scientific workflow provenance, and other areas. Regular Path Queries (RPQs) can be seen as a core graph query language to answer pattern-based reachability queries. Unfortunately, the number of freely available systems for querying graphs using RPQs is rather limited, and available implementations do not provide direct support for a number of desirable variants of RPQs, e.g., to return those edges that are contained in some (or all) paths that match the given regular expression R. Thus, by returning not just a pair (x, y) of end points of paths that match R, but also “witness edges” (u, v) in between, our RPQ variants can be understood as returning additional provenance information about the answer (x, y), i.e., those edges (u, v) that are in some (or all) paths from x to y matching R. We propose a number of such RPQ variants and show how they can be implemented using either Datalog or a suitable RDBMS. Our initial experimental results indicate that RPQs and our provenance-aware variants (RPQProv), when implemented using conventional relational technologies, yield reasonable performance even for relatively large graphs. On the other hand, the overhead associated with some of these variants also makes efficient handling of provenance-aware graph queries an interesting challenge for future research.
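The "some path" witness-edge variant described above can be sketched as follows. This is an illustration only, not the Datalog or RDBMS implementation from the paper: it assumes the regular expression R has already been compiled into a DFA (passed in as a hypothetical transition dictionary), and it finds witness edges with a forward and a backward search over the product of the graph and the DFA.

```python
from collections import deque


def witness_edges(edges, delta, start, accepting, x, y):
    """Edges lying on some path from x to y whose labels the DFA accepts.

    edges: list of (u, label, v) triples (assumed encoding).
    delta: dict mapping (state, label) -> state, the DFA for R.
    """
    states = {start} | set(accepting) | set(delta.values())
    states |= {s for s, _ in delta}

    # Product graph: node (graph node, DFA state).
    fwd, bwd, product_edges = {}, {}, []
    for u, lab, v in edges:
        for s in states:
            t = delta.get((s, lab))
            if t is not None:
                product_edges.append(((u, s), (v, t), (u, lab, v)))
                fwd.setdefault((u, s), []).append((v, t))
                bwd.setdefault((v, t), []).append((u, s))

    def reach(sources, adj):
        # Plain breadth-first search over an adjacency dictionary.
        seen, queue = set(sources), deque(sources)
        while queue:
            n = queue.popleft()
            for m in adj.get(n, []):
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        return seen

    from_x = reach([(x, start)], fwd)
    to_y = reach([(y, f) for f in accepting], bwd)
    # An edge is a witness if its product source is reachable from
    # (x, start) and its product target can reach (y, accepting state).
    return {e for a, b, e in product_edges if a in from_x and b in to_y}


# DFA for the expression a b* (state 0 is the start, state 1 accepts).
dfa = {(0, "a"): 1, (1, "b"): 1}
graph = [(1, "a", 2), (2, "b", 3), (2, "c", 4), (5, "a", 2)]
found = witness_edges(graph, dfa, 0, {1}, 1, 3)
```

The "all paths" variant mentioned in the abstract needs a different test and is not covered by this sketch.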
Efficient Query Evaluation on Distributed Graphs with Hadoop Environment
Cited by 1 (1 self)
Graphs have emerged as a powerful data structure for describing various kinds of data. Query evaluation on distributed graphs is costly due to the complexity of links among sites. Dan Suciu has proposed algorithms for query evaluation on semi-structured data modeled as a rooted, edge-labeled graph, and these algorithms are proved to be efficient in terms of communication steps and data transferred during the evaluation. However, one disadvantage is that communication data are collected at a single site, which leads to a bottleneck in the evaluation for real-life data. In this paper, we propose two algorithms to improve Dan Suciu’s algorithms: the one-pass algorithm significantly reduces the amount of redundant data in the evaluation, and the iter acc algorithm resolves the bottleneck. Then, we design an efficient implementation of our algorithms with only one MapReduce job in the Hadoop environment by utilizing features of the Hadoop file system. Experiments on a cloud system show that the one-pass algorithm can detect and remove 50% of the data as redundant in the evaluation process on YouTube and DBLP datasets, and that the iter acc algorithm runs without the bottleneck even when we double the size of the input data.
Simple, Fast, and Scalable Reachability Oracle
A reachability oracle (or hop labeling) assigns each vertex v two sets of vertices: Lout(v) and Lin(v), such that u reaches v iff Lout(u) ∩ Lin(v) ≠ ∅. Despite their simplicity and elegance, reachability oracles have failed to achieve efficiency in more than ten years since their introduction: The main problem is high construction cost, which stems from a set-cover framework and the need to materialize transitive closure. In this paper, we present two simple and efficient labeling algorithms, HierarchicalLabeling and DistributionLabeling, which can work on massive real-world graphs: Their construction time is an order of magnitude faster than the set-cover based labeling approach, and transitive closure materialization is not needed. On large graphs, their index sizes and their query performance can now beat the state-of-the-art transitive closure compression and online search approaches.
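The query side of such an oracle is a single set-intersection test. The sketch below shows only that test on hand-built labels for the chain a -> b -> c, with b chosen as the hop. It does not reproduce HierarchicalLabeling or DistributionLabeling, and the function and variable names are hypothetical.

```python
def reaches(l_out, l_in, u, v):
    """Return True iff Lout(u) and Lin(v) share at least one hop vertex."""
    return not l_out[u].isdisjoint(l_in[v])


# Hand-built labels for the chain a -> b -> c (assumed example data).
# Each vertex is included in its own labels so that u reaches u.
l_out = {"a": {"a", "b"}, "b": {"b"}, "c": {"c"}}
l_in = {"a": {"a"}, "b": {"b"}, "c": {"b", "c"}}

a_reaches_c = reaches(l_out, l_in, "a", "c")
c_reaches_a = reaches(l_out, l_in, "c", "a")
```

Here a reaches c through the shared hop b, while c and a share no hop, so the reverse query fails.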
Minimizing Data Transfers for Regular Reachability Queries on Distributed Graphs
Nowadays, there is an explosion of Internet information, which is normally distributed across different sites. Hence, finding information efficiently becomes difficult. Efficient query evaluation on distributed graphs is an important research topic since it can be used in real applications such as social network analysis, web mining, and ontology matching. A widely used query on distributed graphs is the regular reachability query (RRQ). An RRQ verifies whether a node can reach another node by a path satisfying a regular expression. Traditionally, RRQs are evaluated by distributed depth-first search or distributed breadth-first search methods. However, these methods are restricted by the total network traffic and the response time on large graphs. Recently, Wenfei Fan et al. proposed an approach for improving reachability queries by visiting each site only once, but it has a communication bottleneck problem when assembling all distributed partial query results. In this paper, we propose two algorithms to improve Wenfei Fan’s algorithm for RRQs. The first algorithm filters and removes redundant nodes/edges on each local site, in parallel. The second algorithm limits the data transfers by local contraction of the partial result. We extensively evaluated our algorithms on MapReduce using YouTube and DBLP datasets. The experimental results show that our method reduces unnecessary data transfers by up to 60%, which resolves the communication bottleneck problem.
A1 Am
Find all matches of a pattern in a graph. Pattern matching in social graphs: identify suspects in a drug ring (“Understanding the structure of drug trafficking organizations”).
Processing SPARQL Queries Over Linked Data: A Distributed Graph-based Approach
We propose techniques for processing SPARQL queries over linked data. We follow a graph-based approach where answering a query Q is equivalent to finding its matches over a distributed RDF data graph G. We adopt a “partial evaluation and assembly” framework. Partial evaluation results of query Q over each repository, called local partial matches, are found. In the assembly stage, we propose a centralized and a distributed assembly strategy. We analyze our algorithms both theoretically and experimentally. Extensive experiments over both real and benchmark RDF repositories with billions of triples demonstrate the high performance and scalability of our methods compared with those of existing solutions.