Results 1-10 of 17
Incremental graph pattern matching
In SIGMOD, 2011
Cited by 22 (7 self)
Graph pattern matching has become a routine process in emerging applications such as social networks. In practice a data graph is typically large, and is frequently updated with small changes. It is often prohibitively expensive to recompute matches from scratch via batch algorithms when the graph is updated. With this comes the need for incremental algorithms that compute changes to the matches in response to updates, to minimize unnecessary recomputation. This paper investigates incremental algorithms for graph pattern matching defined in terms of graph simulation, bounded simulation and subgraph isomorphism. (1) For simulation, we provide incremental algorithms for unit updates and certain graph patterns. These algorithms are optimal: they run in time linear in the size of the changes in the input and output, which characterizes the cost that is inherent to the problem itself. For general patterns we show that the incremental matching problem is unbounded, i.e., its cost is not determined by the size of the changes alone. (2) For bounded simulation, we show that the problem is unbounded even for unit updates and path patterns. (3) For subgraph isomorphism, we show that the problem is intractable and unbounded for unit updates and path patterns. (4) For multiple updates, we develop an incremental algorithm for each of simulation, bounded simulation and subgraph isomorphism. We experimentally verify that these incremental algorithms significantly outperform their batch counterparts in response to small changes, using real-life data and synthetic data. Categories and Subject Descriptors: F.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems [pattern matching]
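For orientation, the batch computation that such incremental algorithms aim to avoid repeating can be sketched as a naive fixpoint. This is a minimal sketch of the standard textbook definition of graph simulation (a data node matches a pattern node if the labels agree and every pattern edge can be followed in the data graph). It is not the paper's algorithm, and the function name, argument layout and node-label encoding are assumptions made for illustration.

```python
from collections import defaultdict


def max_simulation(q_nodes, q_edges, g_nodes, g_edges):
    """Naive batch computation of the maximum simulation relation.

    q_nodes / g_nodes: dict mapping node id -> label (hypothetical encoding).
    q_edges / g_edges: iterable of (source, target) pairs.
    Returns a dict mapping each pattern node to its set of matching data nodes.
    """
    g_succ = defaultdict(set)
    for v, w in g_edges:
        g_succ[v].add(w)

    # Start from every label-compatible candidate.
    sim = {
        u: {v for v, label in g_nodes.items() if label == q_nodes[u]}
        for u in q_nodes
    }

    # Refine until no candidate can be removed.
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # v stays a match of u only if some successor of v matches u2.
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim
```

Under the usual definition the pattern matches the graph only if every pattern node ends up with a non-empty candidate set. Rerunning this loop after each single-edge update is the from-scratch cost the abstract describes as prohibitive on large graphs.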
Capturing Topology in Graph Pattern Matching
Cited by 13 (6 self)
Graph pattern matching is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubic time. However, they fall short of capturing the topology of data graphs, i.e., graphs may have a structure drastically different from pattern graphs they match, and the matches found are often too large to understand and analyze. To rectify these problems, this paper proposes a notion of strong simulation, a revision of graph simulation, for graph pattern matching. (1) We identify a set of criteria for preserving the topology of graphs matched. We show that strong simulation preserves the topology of data graphs and finds a bounded number of matches. (2) We show that strong simulation retains the same complexity as earlier extensions of simulation, by providing a cubic-time algorithm for computing strong simulation. (3) We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using real-life data and synthetic data.
Distributed Graph Pattern Matching
Cited by 8 (1 self)
Graph simulation has been adopted for pattern matching to reduce the complexity and capture the need of novel applications. With the rapid development of the Web and social networks, data is typically distributed over multiple machines. Hence a natural question raised is how to evaluate graph simulation on distributed data. To our knowledge, no such distributed algorithms are in place yet. This paper settles this question by providing evaluation algorithms and optimizations for graph simulation in a distributed setting. (1) We study the impacts of components and data locality on the evaluation of graph simulation. (2) We give an analysis of a large class of distributed algorithms, captured by a message-passing model, for graph simulation. We also identify three complexity measures: visit times, makespan and data shipment, for analyzing the distributed algorithms, and show that these measures essentially conflict with one another. (3) We propose distributed algorithms and optimization techniques that exploit the properties of graph simulation and the analyses of distributed algorithms. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using both real-life and synthetic data. Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications - graph data, data mining
Diversified top-k graph pattern matching
PVLDB
Cited by 6 (0 self)
Graph pattern matching has been widely used in, e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q;G) of matches of Q in G. However, these algorithms often return an excessive number of matches, and are expensive on large real-life social graphs. Moreover, in practice many social queries are to find matches of a specific pattern node, rather than the entire M(Q;G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation, by supporting a designated output node u_o. Given G and Q, it is to find those nodes in M(Q;G) that match u_o, instead of the large set M(Q;G). (2) We study two classes of functions for ranking the matches: relevance functions δr() based on, e.g., social impact, and distance functions δd() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of u_o based on δr(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q;G). (4) We also study diversified top-k matching, a bi-criteria optimization problem based on both δr() and δd(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective, and outperform traditional matching algorithms in efficiency.
Distributed Graph Simulation: Impossibility and Possibility
Cited by 5 (2 self)
This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel computational cost is determined by the largest fragment F_m of G and the size |Q| of query Q, and (b) data shipment if its total amount of data shipped is determined by Q and the number of fragments of G, independent of the size of graph G. (1) We prove an impossibility theorem: there exists no distributed graph simulation algorithm that is parallel scalable in either response time or data shipment. (2) However, we show that distributed graph simulation is partition bounded, i.e., its response time depends only on |Q|, |F_m| and the number |V_f| of nodes in G with edges across different fragments; and its data shipment depends on Q and the number |E_f| of crossing edges only. We provide the first algorithms with these performance guarantees. (3) We also identify special cases of patterns and graphs when parallel scalability is possible. (4) We experimentally verify the scalability and efficiency of our algorithms.
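The two partition-dependent quantities named in this abstract, the number of nodes with edges across fragments and the number of crossing edges, can be computed directly from a node-to-fragment assignment. The sketch below is illustrative only. The function name and input encoding are assumptions, and counting both endpoints of a crossing edge as boundary nodes is one reading of the abstract's wording rather than the paper's formal definition.

```python
def partition_measures(edges, fragment_of):
    """Count boundary nodes and crossing edges of a fragmented graph.

    edges: iterable of (source, target) pairs.
    fragment_of: dict mapping node id -> fragment id (hypothetical encoding).
    Returns (number of nodes incident to a crossing edge,
             number of edges whose endpoints lie in different fragments).
    """
    crossing = {
        (v, w) for v, w in edges if fragment_of[v] != fragment_of[w]
    }
    boundary = {node for edge in crossing for node in edge}
    return len(boundary), len(crossing)
```

For a path 1 -> 2 -> 3 -> 4 split into fragments {1, 2} and {3, 4}, only the edge (2, 3) crosses, so the function returns two boundary nodes and one crossing edge. A partition that keeps both numbers small is what makes the partition-bounded guarantees useful in practice.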
Answering Graph Pattern Queries Using Views
Cited by 2 (1 self)
Answering queries using views has proven an effective technique for querying relational and semistructured data. This paper investigates this issue for graph pattern queries based on (bounded) simulation, which have been increasingly used in, e.g., social network analysis. We propose a notion of pattern containment to characterize graph pattern matching using graph pattern views. We show that a graph pattern query can be answered using a set of views if and only if the query is contained in the views. Based on this characterization we develop efficient algorithms to answer graph pattern queries. In addition, we identify three problems associated with graph pattern containment. We show that these problems range from quadratic time to NP-complete, and provide efficient algorithms for containment checking (approximation when the problem is intractable). Using real-life data and synthetic data, we experimentally verify that these methods are able to efficiently answer graph pattern queries on large social graphs, by using views.
Multi-constrained graph pattern matching in large-scale contextual social graphs
In ICDE'15, 2015
Cited by 1 (0 self)
Graph Pattern Matching (GPM) plays a significant role in social network analysis, and has been widely used in, for example, expert finding, social community mining and social position detection. Given a pattern graph GQ and a data graph GD, a GPM algorithm finds those subgraphs, GM, that match GQ in GD. However, the existing GPM methods do not consider the multiple constraints on edges in GQ, which commonly exist in various applications such as crowdsourcing travel, social-network-based e-commerce and study group selection. In this paper, we first conceptually extend Bounded Simulation to Multi-Constrained Simulation (MCS), and propose a novel NP-Complete Multi-Constrained Graph Pattern Matching (MCGPM) problem. Then, to address the efficiency issue in large-scale MCGPM, we propose a new concept called Strong Social Component (SSC), consisting of participants with strong social connections. We also propose an approach to identify SSCs, and propose a novel index method and a graph compression method for SSC. Moreover, we devise a heuristic algorithm to identify MCGPM results effectively and efficiently without decompressing graphs. An extensive empirical study on five real-world large-scale social graphs has demonstrated the effectiveness, efficiency and scalability of our approach.
Techniques for Graph Analytics on Big Data
Cited by 1 (0 self)
Graphs enjoy profound importance because of their versatility and expressivity. They can be effectively used to represent social networks, web search engines and genome sequencing. The field of graph pattern matching has been of significant importance and has widespread applications. Conceptually, we want to find subgraphs that match a pattern in a given graph. Much work has been done in this field with solutions like Subgraph Isomorphism and Regular Expression matching. With Big Data, scientists are frequently running into massive graphs that have amplified the challenge that this area poses. We study the speedup and communication behavior of three distributed algorithms for inexact graph pattern matching. We also study the impact of different graph partitionings on runtime and network I/O. Our extensive results show that the algorithms exhibit excellent scalable behavior and min-cut partitioning can lead to improved performance under some circumstances, and can drastically reduce the network traffic as well. Keywords: graph analytics; big data; graph simulation; parallel and distributed algorithms.
ExpFinder: Finding Experts by Graph Pattern Matching
Cited by 1 (0 self)
We present ExpFinder, a system for finding experts in social networks based on graph pattern matching. We demonstrate (1) how ExpFinder identifies top-K experts in a social network by supporting bounded simulation of graph patterns, and by ranking the matches based on a metric for social impact; (2) how it copes with the sheer size of real-life social graphs by supporting incremental query evaluation and query-preserving graph compression, and (3) how the GUI of ExpFinder interacts with users to help them construct queries and inspect matches.
unknown title
Graph pattern matching has been widely used in, e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q,G) of matches of Q in G. However, these algorithms often return an excessive number of matches, and are expensive on large real-life social graphs. Moreover, in practice many social queries are to find matches of a specific pattern node, rather than the entire M(Q,G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation, by supporting a designated output node u_o. Given G and Q, it is to find those nodes in M(Q,G) that match u_o, instead of the large set M(Q,G). (2) We study two classes of functions for ranking the matches: relevance functions δr() based on, e.g., social impact, and distance functions δd() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of u_o based on δr(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q,G). (4) We also study diversified top-k matching, a bi-criteria optimization problem based on both δr() and δd(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective, and outperform traditional matching algorithms in efficiency.