## Capturing Topology in Graph Pattern Matching

Citations: 13 - 6 self

### Citations

10440 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1989
Citation Context ...all, ExtractMaxPG finds its perfect subgraph in O(|V|) time since finding pairwise disconnected components is linear-time equivalent to finding strongly connected components, which is in linear time [13]. By leveraging the algorithm developed in [24], DualSim can be done in O((|Vq| + |Eq|)(|V| + |E|)) time. Thus Match is in O(|V|(|V| + (|Vq| + |Eq|)(|V| + |E|))) time. 4.2 Optimization Techniques We nex... |

3894 | Communication and Concurrency
- Milner
- 1989
Citation Context ...roceedings of the VLDB Endowment, Vol. 5, No. 4 Copyright 2011 VLDB Endowment 2150-8097/11/12... $ 10.00. Figure 1: Social matching: query and data graphs. To reduce the complexity, graph simulation [29] has been adopted for pattern matching. A graph G matches a pattern Q via graph simulation if there exists a binary relation S ⊆ VQ × V, where VQ and V are the set of nodes in Q and G, respectively, suc... |

3206 | MapReduce: Simplified Data Processing on Large Clusters
- Dean, Ghemawat, et al.
- 2004
Citation Context ...in distributed graphs. To the best of our knowledge, this is among the first distributed algorithms for graph pattern matching, for which the need is evident when processing massive graphs (see e.g., [15, 21, 28]). (4) Using both real-life data (Amazon and YouTube) and synthetic data, we conduct an extensive experimental study (Section 5). We find that our algorithms for strong simulation scale well with larg... |

2802 | Computational Complexity
- Papadimitriou
- 1994
Citation Context ...undary in Matching One might want to find a notion of graph pattern matching that preserves maximum graph topology, and characterize PTIME along the same lines as how Fagin’s theorem characterizes NP [30]. This is, however, very challenging. Indeed, as observed in [23], in graph theory Fagin’s theorem implies that “if no logic captures PTIME, then PTIME ≠ NP”. Below we present two negative results: ex... |

1838 | Foundations of Databases
- Abiteboul, Hull, et al.
- 1995

1153 | Graph Theory
- Diestel
- 1997
Citation Context ...m 2. (3) The correctness of DualSim can be verified along the same lines as its counterpart for simulation [24]. It takes BuildBall O(|V| + |E|) time to build a ball Ĝ[v, dQ] by using the BFS method [16]. For each ball, ExtractMaxPG finds its perfect subgraph in O(|V|) time since finding pairwise disconnected components is linear-time equivalent to finding strongly connected components, which is in ... |

651 | Data on the Web: From Relations to Semistructured Data and XML
- Abiteboul, Buneman, et al.
- 1999
Citation Context ...ty) on graph simulation. Schema extraction is to discover the implicit structure of semi-structured data, which has no schema predefined. It has proved effective in query formulation and optimization [2, 22]. Schema of semi-structured data is often extracted via a mild generalization of simulation that deals with labeled edges [2]. Nevertheless, topology preservation is not an issue in schema extraction,... |

566 | Dataguides: Enabling query formulation and optimization in semistructured databases
- Goldman, Widom
- 1997
Citation Context ...ty) on graph simulation. Schema extraction is to discover the implicit structure of semi-structured data, which has no schema predefined. It has proved effective in query formulation and optimization [2, 22]. Schema of semi-structured data is often extracted via a mild generalization of simulation that deals with labeled edges [2]. Nevertheless, topology preservation is not an issue in schema extraction,... |

465 | Pregel: a system for large-scale graph processing
- Malewicz, Austern, et al.
- 2010
Citation Context ...in distributed graphs. To the best of our knowledge, this is among the first distributed algorithms for graph pattern matching, for which the need is evident when processing massive graphs (see e.g., [15, 21, 28]). (4) Using both real-life data (Amazon and YouTube) and synthetic data, we conduct an extensive experimental study (Section 5). We find that our algorithms for strong simulation scale well with larg... |

336 | An Algorithm for Subgraph Isomorphism
- Ullmann
- 1976
Citation Context ... pattern node u in Q, u and f(u) have the same label, and (2) there exists an edge (u, u′) in Q if and only if (f(u), f(u′)) is an edge in Gs. However, subgraph isomorphism is an NP-complete problem [34]. Moreover, there are possibly exponential many subgraphs in G that match Q. In addition, as observed in [6, 19], it is often too restrictive to catch sensible matches, as it requires matches to have ... |

335 | The igraph software package for complex network research
- Csardi, Nepusz
- 2006
Citation Context ...hing algorithm, denoted by MCS, that utilizes the approximation algorithm of [25] for computing maximum common subgraphs. We used the VF2 algorithm [12] for subgraph isomorphism in the igraph package [14]. Consider pattern graph Q(Vq, Eq) and data graph G(V, E). For approximate matching algorithms TALE and MCS, there are essentially 2^|V| number of subgraphs of G to compare with Q, beyond reach in pr... |

310 | The state of the art in distributed query processing
- Kossmann
Citation Context ... (e.g., [10]), graph simulation [8] and graph pattern queries [18]. This work explores it for pattern matching via strong simulation. Distributed query processing has been studied for relational data [26] and XML [9, 11]. There has also been recent work on distributed graph processing to manage large-scale graphs [15, 21, 28]. However, to the best of our knowledge, no previous work has studied distrib... |

184 | Computing simulations on finite and infinite graphs
- Henzinger, Henzinger, et al.
- 1995
Citation Context ... Q, there exists v in G such that (a) (u, v) ∈ S, and (b) for each edge (u, u′) in Q, there exists an edge (v, v′) in G such that (u′, v′) ∈ S. Graph simulation can be determined in quadratic time [24]. Recently this notion has been extended by mapping edges in Q to (bounded) paths in G [19, 18], with a cubic-time complexity, to identify matches in, e.g., social networks. Nevertheless, the low comp... |

179 | A (sub)graph isomorphism algorithm for matching large graphs
- Cordella, Foggia, et al.
- 2004
Citation Context ... algorithm TALE of [32], and (4) an approximate matching algorithm, denoted by MCS, that utilizes the approximation algorithm of [25] for computing maximum common subgraphs. We used the VF2 algorithm [12] for subgraph isomorphism in the igraph package [14]. Consider pattern graph Q(Vq, Eq) and data graph G(V, E). For approximate matching algorithms TALE and MCS, there are essentially 2^|V| number of ... |

97 | Social matching: A framework and research agenda
- Terveen, McDonald
- 2005
Citation Context ... and a pattern Q with drastically different structures. (2) The match relation S is often too large to understand and analyze, as illustrated below. Example 1: Consider a real-life example taken from [31]. A headhunter wants to find a biologist (Bio) to help a group of software engineers (SEs) analyze genetic data. To do this, she uses an expertise recommendation network G1, as depicted in Fig. 1. In ... |

70 | Managing and Mining Graph Data
- Aggarwal, Wang
- 2010
Citation Context ...e analysis [27, 32, 33, 35]. Given a pattern graph Q and a data graph G, it is to find all subgraphs of G that match Q. Here matching is typically defined in terms of subgraph isomorphism (see, e.g., [4, 20] for surveys): a subgraph Gs of G matches Q if there exists a bijective function f from the nodes of Q to the nodes in Gs such that (1) for each pattern node u in Q, u and f(u) have the same label, an... |

67 | GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis
- Liu, Chen, et al.
- 2006
Citation Context ... synthetic data. 1. Introduction Graph pattern matching is being increasingly used in a number of applications, e.g., software plagiarism detection, biology, social networks and intelligence analysis [27, 32, 33, 35]. Given a pattern graph Q and a data graph G, it is to find all subgraphs of G that match Q. Here matching is typically defined in terms of subgraph isomorphism (see, e.g., [4, 20] for surveys): a sub... |

62 | Tale: a tool for approximate large graph matching
- Tian, Patel
- 2008
Citation Context ... synthetic data. 1. Introduction Graph pattern matching is being increasingly used in a number of applications, e.g., software plagiarism detection, biology, social networks and intelligence analysis [27, 32, 33, 35]. Given a pattern graph Q and a data graph G, it is to find all subgraphs of G that match Q. Here matching is typically defined in terms of subgraph isomorphism (see, e.g., [4, 20] for surveys): a sub... |

51 | Graph pattern matching: from intractable to polynomial time
- Fan, Li, et al.
- 2010
Citation Context ...graph processing to manage large-scale graphs [15, 21, 28]. However, to the best of our knowledge, no previous work has studied distributed computation of graph simulation [29] and its extensions [19, 18], not to mention strong simulation proposed in this work. 2. Strong Simulation In this section, we first present basic notations of graphs. We then introduce the notion of strong simulation. 2.1 Preli... |

51 | Fast best-effort pattern matching in large attributed graphs
- Tong, Faloutsos, et al.
- 2007
Citation Context ... synthetic data. 1. Introduction Graph pattern matching is being increasingly used in a number of applications, e.g., software plagiarism detection, biology, social networks and intelligence analysis [27, 32, 33, 35]. Given a pattern graph Q and a data graph G, it is to find all subgraphs of G that match Q. Here matching is typically defined in terms of subgraph isomorphism (see, e.g., [4, 20] for surveys): a sub... |

31 | Distance-join: Pattern match query in a large graph database
- Zou, Chen, et al.
- 2009

29 | Adding regular expressions to graph reachability and pattern queries
- Fan, Li, et al.
- 2011
Citation Context ... exists an edge (v, v′) in G such that (u′, v′) ∈ S. Graph simulation can be determined in quadratic time [24]. Recently this notion has been extended by mapping edges in Q to (bounded) paths in G [19, 18], with a cubic-time complexity, to identify matches in, e.g., social networks. Nevertheless, the low complexity comes with a price: (1) simulation and its extensions [19, 18] do not preserve the topol... |

26 | Matching structure and semantics: A survey on graph-based pattern matching
- Gallagher
- 2006
Citation Context ... [Figure 8: Performance evaluation of centralized algorithms. Panels: (e) Vary |V| ×10^3 (Amazon), (f) Vary |V| ×10^3 (YouTube), (g) Vary |V| ×10^6 (synthetic), (h) Vary α (synthetic)]

Table 3: Sizes of matched subgraphs

| #nodes | [0, 9] | [10, 19] | [20, 29] | [30, 39] | [40, 49] | ≥ 50 |
|---|---|---|---|---|---|---|
| Amazon | 0 | 98 | 23 | 0 | 0 | 0 |
| YouTube | 0 | 21 | 18 | 1 | 1 | 0 |
| Synthetic | 0 | 187 | 113 | 65 | 6 | 0 |

(4) In the same setting as (2) for testing closeness with largest poss... |

23 | Simulation-based minimization
- Bustan, Grumberg
- 2003
Citation Context ...re has studied how simulation should be refined to capture topology. Query minimization, as a classical optimization technique, has been well studied for SQL [3], XPath (e.g., [10]), graph simulation [8] and graph pattern queries [18]. This work explores it for pattern matching via strong simulation. Distributed query processing has been studied for relational data [26] and XML [9, 11]. There has als... |

21 | Challenges in searching online communities
- Amer-Yahia, Benedikt, et al.
Citation Context ... V to a label l(u) in Σ. We denote G as (V, E) when it is clear from the context. Intuitively, the function l() specifies node attributes, e.g., keywords, blogs, comments, ratings, names, emails, companies [5]; and the label set Σ denotes all such attributes. We next review some basic notations of graphs. Subgraphs. Graph H(Vs, Es, lH) is a subgraph of graph G(V, E, lG), denoted as G[Vs, Es], if (1) for each no... |

20 | The boundaries of trust: own and others
- Buchan, Croson
Citation Context ...stance dist(v, v′) ≤ r, and (2) it has exactly the edges that appear in G over the same node set. We define the locality by requiring matches to be within a ball of a certain radius. Indeed, as observed in [7], when social distance increases, the closeness of relationships decreases and the relationships may become irrelevant. Hence it often suffices in practice to consider only those matches of a pattern ... |

18 | On the approximability of the maximum common subgraph problem
- Kann
- 1992
Citation Context ...ulation algorithm of [24], denoted by Sim, (3) the approximate matching algorithm TALE of [32], and (4) an approximate matching algorithm, denoted by MCS, that utilizes the approximation algorithm of [25] for computing maximum common subgraphs. We used the VF2 algorithm [12] for subgraph isomorphism in the igraph package [14]. Consider pattern graph Q(Vq, Eq) and data graph G(V, E). For approximate ma... |

17 | Distributed query evaluation with performance guarantees
- Cong, Fan, et al.
- 2007
Citation Context ..., graph simulation [8] and graph pattern queries [18]. This work explores it for pattern matching via strong simulation. Distributed query processing has been studied for relational data [26] and XML [9, 11]. There has also been recent work on distributed graph processing to manage large-scale graphs [15, 21, 28]. However, to the best of our knowledge, no previous work has studied distributed computation... |

16 | Detecting social positions using simulation
- Brynielsson, Högberg, et al.
- 2010
Citation Context ...f (f(u), f(u′)) is an edge in Gs. However, subgraph isomorphism is an NP-complete problem [34]. Moreover, there are possibly exponential many subgraphs in G that match Q. In addition, as observed in [6, 19], it is often too restrictive to catch sensible matches, as it requires matches to have exactly the same topology as a pattern graph. These hinder its applicability in emerging applications such as so... |

10 | The subgraph bisimulation problem
- Dovier, Piazza
Citation Context ... than simulation. Indeed, it is a notion stronger than simulation but weaker than isomorphism. However, pattern matching via bisimulation becomes intractable. Indeed, subgraph bisimulation is NP-hard [17], although graph bisimulation is solvable in PTIME [29]. In contrast, subgraph simulation is equivalent to graph simulation, i.e., checking whether there exists a subgraph Gs of G such that Q ≺ Gs is ... |

7 | Minimization of tree pattern queries with constraints
- Chen, Chan
- 2008
Citation Context ...and no previous work there has studied how simulation should be refined to capture topology. Query minimization, as a classical optimization technique, has been well studied for SQL [3], XPath (e.g., [10]), graph simulation [8] and graph pattern queries [18]. This work explores it for pattern matching via strong simulation. Distributed query processing has been studied for relational data [26] and XML... |

4 | From polynomial time queries to graph structure theory
- Grohe
- 2010
Citation Context ...rn matching that preserves maximum graph topology, and characterize PTIME along the same lines as how Fagin’s theorem characterizes NP [30]. This is, however, very challenging. Indeed, as observed in [23], in graph theory Fagin’s theorem implies that “if no logic captures PTIME, then PTIME ≠ NP”. Below we present two negative results: extending strong simulation makes its computation from PTIME to NP-... |

3 | Massive graph management for the web and web 2.0
- Giatsoglou, Papadopoulos, et al.
- 2011
Citation Context ...in distributed graphs. To the best of our knowledge, this is among the first distributed algorithms for graph pattern matching, for which the need is evident when processing massive graphs (see e.g., [15, 21, 28]). (4) Using both real-life data (Amazon and YouTube) and synthetic data, we conduct an extensive experimental study (Section 5). We find that our algorithms for strong simulation scale well with larg... |

2 | Distributed XML processing: Theory and applications
- Cavendish, Candan
Citation Context ..., graph simulation [8] and graph pattern queries [18]. This work explores it for pattern matching via strong simulation. Distributed query processing has been studied for relational data [26] and XML [9, 11]. There has also been recent work on distributed graph processing to manage large-scale graphs [15, 21, 28]. However, to the best of our knowledge, no previous work has studied distributed computation... |