Results 1-10 of 13
Distributed Graph Pattern Matching
Cited by 8 (1 self)
Graph simulation has been adopted for pattern matching to reduce complexity and to meet the needs of novel applications. With the rapid development of the Web and social networks, data is typically distributed over multiple machines. Hence a natural question is how to evaluate graph simulation on distributed data. To our knowledge, no such distributed algorithms are in place yet. This paper settles this question by providing evaluation algorithms and optimizations for graph simulation in a distributed setting. (1) We study the impacts of components and data locality on the evaluation of graph simulation. (2) We give an analysis of a large class of distributed algorithms, captured by a message-passing model, for graph simulation. We also identify three complexity measures for analyzing the distributed algorithms: visit times, makespan and data shipment, and show that these measures are essentially in conflict with each other. (3) We propose distributed algorithms and optimization techniques that exploit the properties of graph simulation and the analyses of distributed algorithms. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using both real-life and synthetic data.
Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications (graph data, data mining)
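For readers unfamiliar with graph simulation, the following is a minimal centralized sketch of the standard fixpoint definition, not the distributed algorithms of the paper above. The function name `graph_simulation` and the input encoding (label dictionaries plus edge sets) are assumptions made for illustration only.

```python
def graph_simulation(q_nodes, q_edges, g_nodes, g_edges):
    """Naive fixpoint computation of graph simulation (illustrative only).

    q_nodes: dict mapping pattern node -> label
    q_edges: set of (u, u2) pattern edges
    g_nodes: dict mapping data node -> label
    g_edges: set of (v, v2) data edges
    Returns a dict mapping each pattern node to its set of matching data nodes.
    """
    # Successor sets of the data graph.
    succ = {}
    for v, w in g_edges:
        succ.setdefault(v, set()).add(w)

    # Start from all label-compatible candidates.
    sim = {u: {v for v, lbl in g_nodes.items() if lbl == q_nodes[u]}
           for u in q_nodes}

    # Remove candidates until every pattern edge (u, u2) is preserved:
    # a match v of u must have some successor that matches u2.
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            keep = {v for v in sim[u] if succ.get(v, set()) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    # If any pattern node has no match, the pattern does not match at all.
    if any(not matches for matches in sim.values()):
        return {u: set() for u in q_nodes}
    return sim
```

This version rescans every pattern edge until nothing changes, which is simple but not efficient. In a distributed setting the successor check may need matches stored on another machine, which is where measures such as data shipment come in.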
Distributed Graph Simulation: Impossibility and Possibility
Cited by 5 (2 self)
This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel computational cost is determined by the largest fragment Fm of G and the size |Q| of query Q, and (b) data shipment if its total amount of data shipped is determined by Q and the number of fragments of G, independent of the size of graph G. (1) We prove an impossibility theorem: there exists no distributed graph simulation algorithm that is parallel scalable in either response time or data shipment. (2) However, we show that distributed graph simulation is partition bounded, i.e., its response time depends only on |Q|, |Fm| and the number |Vf| of nodes in G with edges across different fragments; and its data shipment depends on Q and the number |Ef| of crossing edges only. We provide the first algorithms with these performance guarantees. (3) We also identify special cases of patterns and graphs when parallel scalability is possible. (4) We experimentally verify the scalability and efficiency of our algorithms.
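The two partition-dependent quantities in this abstract, the crossing edges (Ef) and the nodes with edges across fragments (Vf), can be computed directly from a fragment assignment. The sketch below is illustrative only. The name `partition_stats` and the `fragment_of` mapping are assumptions, and it assumes that both endpoints of a crossing edge count toward Vf, which the abstract does not spell out.

```python
def partition_stats(edges, fragment_of):
    """Return (border_nodes, crossing_edges) for a fragmented graph.

    edges: iterable of (v, w) directed edges
    fragment_of: dict mapping each node to the id of its fragment
    """
    # An edge is crossing if its endpoints live in different fragments.
    crossing_edges = {(v, w) for v, w in edges
                      if fragment_of[v] != fragment_of[w]}
    # Assumption: both endpoints of a crossing edge are border nodes.
    border_nodes = {node for edge in crossing_edges for node in edge}
    return border_nodes, crossing_edges
```

For example, splitting the path 1 -> 2 -> 3 -> 4 into fragments {1, 2} and {3, 4} gives one crossing edge, (2, 3), and two border nodes, so the partition-dependent terms are small even though the graph could be made much longer on either side.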
Query Optimization of Distributed Pattern Matching
Cited by 4 (0 self)
Greedy algorithms for subgraph pattern matching operations are often sufficient when the graph data set can be held in memory on a single machine. However, as graph data sets increasingly expand and require external storage and partitioning across a cluster of machines, more sophisticated query optimization techniques become critical to avoid explosions in query latency. In this paper, we introduce several query optimization techniques for distributed graph pattern matching. These techniques include (1) a System-R style dynamic programming-based optimization algorithm that considers both linear and bushy plans, (2) a cycle detection-based algorithm that leverages cycles to reduce intermediate result set sizes, and (3) a computation reusing technique that eliminates redundant query execution and data transfer over the network. Experimental results show that these algorithms can lead to an order of magnitude improvement in query performance.
Querying big graphs within bounded resources
In SIGMOD, 2014
Cited by 2 (0 self)
This paper studies the problem of querying graphs within bounded resources. Given a query Q, a graph G and a small ratio α, it aims to answer Q in G by accessing only a fraction GQ of G of size |GQ| ≤ α|G|. The need for this is evident when G is big while our available resources are limited, as indicated by α. We propose resource-bounded query answering via a dynamic scheme that reduces big G to GQ. We investigate when we can find the exact answers Q(G) from GQ, and if GQ cannot accommodate enough information, how accurate the approximate answers Q(GQ) are. To verify the effectiveness of the approach, we study two types of queries. One consists of pattern queries that have data locality, such as subgraph isomorphism and strong simulation. The other is the class of reachability queries, without data locality. We show that it is hard to get resource-bounded algorithms with 100% accuracy: NP-hard for pattern queries, and non-existing for reachability when α ≠ 1. Despite these, we develop resource-bounded algorithms for answering these queries. Using real-life and synthetic data, we experimentally evaluate the performance of the algorithms. We find that they scale well for both types of queries, and our approximate answers are accurate, even 100% for small α.
Answering Graph Pattern Queries Using Views
Cited by 2 (1 self)
Answering queries using views has proven an effective technique for querying relational and semi-structured data. This paper investigates this issue for graph pattern queries based on (bounded) simulation, which have been increasingly used in, e.g., social network analysis. We propose a notion of pattern containment to characterize graph pattern matching using graph pattern views. We show that a graph pattern query can be answered using a set of views if and only if the query is contained in the views. Based on this characterization we develop efficient algorithms to answer graph pattern queries. In addition, we identify three problems associated with graph pattern containment. We show that these problems range from quadratic-time to NP-complete, and provide efficient algorithms for containment checking (approximation when the problem is intractable). Using real-life data and synthetic data, we experimentally verify that these methods are able to efficiently answer graph pattern queries on large social graphs, by using views.
Multi-constrained graph pattern matching in large-scale contextual social graphs
In ICDE’15, 2015
Cited by 1 (0 self)
Graph Pattern Matching (GPM) plays a significant role in social network analysis, and has been widely used in, for example, expert finding, social community mining and social position detection. Given a pattern graph GQ and a data graph GD, a GPM algorithm finds those subgraphs, GM, that match GQ in GD. However, the existing GPM methods do not consider the multiple constraints on edges in GQ, which commonly exist in various applications such as crowdsourcing travel, social-network-based e-commerce and study group selection. In this paper, we first conceptually extend Bounded Simulation to Multi-Constrained Simulation (MCS), and propose a novel NP-Complete Multi-Constrained Graph Pattern Matching (MC-GPM) problem. Then, to address the efficiency issue in large-scale MC-GPM, we propose a new concept called Strong Social Component (SSC), consisting of participants with strong social connections. We also propose an approach to identify SSCs, and propose a novel index method and a graph compression method for SSC. Moreover, we devise a heuristic algorithm to identify MC-GPM results effectively and efficiently without decompressing graphs. An extensive empirical study on five real-world large-scale social graphs has demonstrated the effectiveness, efficiency and scalability of our approach.
Techniques for Graph Analytics on Big Data
Cited by 1 (0 self)
Graphs enjoy profound importance because of their versatility and expressivity. They can be effectively used to represent social networks, web search engines and genome sequencing. The field of graph pattern matching has been of significant importance and has widespread applications. Conceptually, we want to find subgraphs that match a pattern in a given graph. Much work has been done in this field with solutions like Subgraph Isomorphism and Regular Expression matching. With Big Data, scientists are frequently running into massive graphs that have amplified the challenge that this area poses. We study the speedup and communication behavior of three distributed algorithms for inexact graph pattern matching. We also study the impact of different graph partitionings on runtime and network I/O. Our extensive results show that the algorithms exhibit excellent scalable behavior and min-cut partitioning can lead to improved performance under some circumstances, and can drastically reduce the network traffic as well.
Keywords: graph analytics; big data; graph simulation; parallel and distributed algorithms.
Towards Efficient Query Processing on Massive Time-Evolving Graphs
Cited by 1 (1 self)
The time-evolving graph (TEG) is increasingly being used as a paradigm for modeling and analyzing dynamic relationships in many emerging domains such as online social networks, the World Wide Web and evolutionary genomics. A time-evolving graph consists of a sequence of snapshots of the graph as it evolves over time. The ability to scalably process various types of queries on massive TEGs is central to building powerful analytic applications for these domains. Unfortunately, indexing techniques and cluster computing schemes that have been designed for static graphs are not very effective for processing massive TEGs. Towards designing scalable mechanisms for answering TEG queries, this paper studies three important problems. The first is the distribution of TEG data on the nodes of a cluster computing framework such as Pregel or Giraph so that the computing and communication resources of the cluster are effectively harnessed. The second is the answering of reachability queries on any snapshot of a TEG and the third is that of processing pattern matching queries in TEGs. For each problem, we provide a brief literature survey and explain why trivial extensions of static graph techniques are not adequate for TEGs. We also present our preliminary ideas towards addressing these problems and discuss their benefits.
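To make the snapshot model concrete, here is a minimal sketch of the baseline that applies a static-graph technique to one snapshot, which the abstract argues is not adequate at scale. It is not the paper's approach. The function name `reachable` and the encoding of a TEG as a list of edge sets indexed by time are assumptions made for illustration.

```python
from collections import deque


def reachable(snapshots, t, source, target):
    """Answer a reachability query on snapshot t of a time-evolving graph.

    snapshots: list of edge sets; snapshots[t] holds the (v, w) edges at time t
    Returns True if target can be reached from source within snapshot t.
    """
    # Build adjacency lists for the chosen snapshot only.
    adj = {}
    for v, w in snapshots[t]:
        adj.setdefault(v, []).append(w)

    # Plain breadth-first search, as on a static graph.
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False
```

Because each query rebuilds and traverses a single snapshot, the cost is paid again for every snapshot and every query, which hints at why static techniques do not carry over cheaply to long snapshot sequences.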
Event Pattern Matching over Graph Streams
Cited by 1 (0 self)
A graph is a fundamental and general data structure underlying all data applications. Many applications today call for management and query capabilities directly on graphs. Real-time graph streams, as seen in road networks, social and communication networks, and web requests, are such applications. Event pattern matching requires awareness of graph structures, which is different from traditional complex event processing. It also requires a focus on the dynamicity of the graph, time order constraints in patterns, and online query processing, which deviates significantly from previous work on subgraph matching as well. We study the semantics and efficient online algorithms for this important and intriguing problem, and evaluate our approaches with extensive experiments over real-world datasets in four different domains.
Incremental Graph Pattern Matching
Graph pattern matching is commonly used in a variety of emerging applications such as social network analysis. These applications highlight the need for studying the following two issues. First, graph pattern matching is traditionally defined in terms of subgraph isomorphism or graph simulation. These notions, however, often impose too strong a topological constraint on graphs to identify meaningful matches. Second, in practice a graph is typically large, and is frequently updated with small changes. It is often prohibitively expensive to recompute matches starting from scratch via batch algorithms when the graph is updated. This paper studies these two issues. (1) We propose to define graph pattern matching based on a notion of bounded simulation, which extends graph simulation by specifying the connectivity of nodes in a graph within a predefined number of hops. We show that bounded simulation is able to find sensible matches that the traditional matching notions fail to catch. We also show that matching via bounded simulation is in cubic-time, by giving such an algorithm. (2) We provide an account of results on incremental graph pattern matching, for matching defined with graph simulation, bounded simulation and subgraph isomorphism. We show that the incremental matching problem is unbounded, i.e., its cost is not determined alone by the size of the changes in the input and output, for all these matching notions. Nonetheless, when matching is defined in terms of simulation or bounded simulation, incremental matching is semi-bounded, i.e., its worst