## Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

Citations: 5 (1 self)

### Citations

13830 | Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979
Citation Context ...er. This probabilistic graph model is a special case of the probabilistic graph defined in Definition 2. We prove the theorem by reducing an arbitrary instance of the #P-complete DNF counting problem [13] to an instance of the problem of computing Pr(q ⊆sim g) in polynomial time. Figure 3 illustrates a reduction for the DNF formula F = (y1∧y2)∨(y1∧y2∧y3)∨(y2∧y3). In the figure, the graph distance bet... |

878 | The link-prediction problem for social networks
- Liben-Nowell, Kleinberg
- 2007
Citation Context ...27th - 31st 2012, Istanbul, Turkey. Proceedings of the VLDB Endowment, Vol. 5, No. 9 Copyright 2012 VLDB Endowment 2150-8097/12/05... $ 10.00. degree of influence or trust between two social entities [2, 25, 14]. In an RDF graph, uncertainties/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [18, 24]. To model the uncertain graph data, a probabilist... |

518 | Probability and computing: randomized algorithms and probabilistic analysis
- Mitzenmacher, Upfal
- 2005
Citation Context ... $+ \cdots + (-1)^{i} \sum_{0 \le j_1 < \cdots < j_i \le \delta-d} \Pr(BL_{j_1} \wedge \cdots \wedge BL_{j_i})$ (7) $\sum_{0 \le j \le \delta-d} \Pr(BL_j) - \sum_{0 \le j_1 < j_2 \le \delta-d} \Pr(BL_{j_1} \wedge BL_{j_2}) + \cdots + (-1)^{\delta-d} \Pr(BL_{j_1} \wedge \cdots \wedge BL_{j_{\delta-d}})$. (8) Based on the Inclusion-Exclusion Principle [26], the RHS of Equation 8 is $\Pr(BL_0 \vee \cdots \vee BL_{\delta-d})$. Clearly, $BL_0 \subseteq \cdots \subseteq BL_{\delta-d}$, so $\Pr(BL_0 \vee \cdots \vee BL_{\delta-d}) = \Pr(BL_{\delta-d}) = \Pr(Brq_1 \vee \cdots \vee Brq_a)$. Lemma 1 gives a method to compute SSP. Intuitively, the probability... |

447 | ExOR: Opportunistic MultiHop Routing for Wireless Networks
- Biswas, Morris
- 2005
Citation Context ...specially with high dependence of interactions at the same proteins. As another example, in communication networks or road networks, an edge probability is used to quantify the reliability of a link [8] or the degree of traffic jam [16]. Obviously, there are correlations among the routing paths in these networks [16], i.e., a busy traffic path often blocks traffic in nearby paths. Therefore, it is ... |

428 | Propagation of trust and distrust
- Guha, Kumar, et al.
- 2004
Citation Context ...degree of influence or trust between two social entities [2, 25, 14]. In an RDF graph, uncertainties/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [18, 24]. To model the uncertain graph data, a probabilist... |

251 | CloseGraphs: Mining Closed Frequent Graph Patterns
- Yan, Han
Citation Context ...$Bf_{|Ef|})$. (10) According to Theorem 2, it is not difficult to see that calculating the exact $\Pr(f \subseteq_{iso} g)$ is NP-complete. Thus we rewrite Equation 10 as follows (footnote 6: In this paper, we use the algorithm in [36] to compute embeddings of a feature in $g^c$): $\Pr(f \subseteq_{iso} g) = \Pr(Bf_1 \vee \cdots \vee Bf_{|Ef|}) = 1 - \Pr(\overline{Bf_1} \wedge \cdots \wedge \overline{Bf_{|Ef|}}) \ge 1 - \Pr(\overline{Bf_1} \wedge \cdots \wedge \overline{Bf_{|IN|}} \mid \overline{Bf_{|IN|+1}} \wedge \cdots \wedge \overline{Bf_{|Ef|}})$, where $IN = \{Bf_1, \ldots, Bf_{|IN|}\} \subseteq$ ... |

195 | Graph Indexing: A Frequent Structure-based approach
- Yan, Yu, et al.
Citation Context ...quency threshold β, a feature f is frequent iff frq(f) ≥ β. Thus we would like to index a frequent feature. To achieve rule 2, we control the feature size used in Algorithm 4. To control the number of features [37, 29], we also define the discriminative measure as $dis(f) = \frac{|\bigcap \{D_{f'} \mid f' \subseteq_{iso} f\}|}{|D_f|}$, where $D_f$ is the list of probabilistic graphs g such that $f \subseteq_{iso} g^c$. Given a discriminative threshold γ, a feature f is di... |

186 | Scalable semantic web data management using vertical partitioning
- Abadi, Marcus, et al.
Citation Context ...per, we study subgraph similarity search over probabilistic graphs due to the wide usage of subgraph similarity search in many application fields, such as answering SPARQL queries (graphs) in RDF graph data [18, 1], predicting complex biological interactions (graphs) [33, 9], and identifying vehicle routings (graphs) in road networks [8, 16]. In the following, we give the details about subgraph similarity searc... |

179 | A (sub)graph isomorphism algorithm for matching large graphs
- Cordella, Foggia, et al.
- 2004
Citation Context ... Obtain tightest $U_{sim}(q)$. Note that the pruning process needs to address the traditional subgraph isomorphism problem ($rq \subseteq_{iso} f$ or $rq \supseteq_{iso} f$). In our work, we implement the state-of-the-art method VF2 [10]. 3.2 Obtain Tightest Bounds of Subgraph Similarity Probability. In pruning conditions, for each $rq_i$ ($1 \le i \le a$), we find only one feature pair $\{f_i^1, f_i^2\}$, among $|F|$ features, such that $rq_i \supseteq_{iso} f$ ... |

174 | Inference in belief networks: A procedural guide
- Huang, Darwiche
- 1996
Citation Context .... Assume m Bfs have $x_1, \ldots, x_k$ Boolean variables for uncertain edges. Algorithm 5 gives detailed steps of the sampling algorithm. In this algorithm, we use the junction tree algorithm to calculate $\Pr(Bf_i)$ [17]. Algorithm 5 Calculate $\Pr(q \subseteq_{sim} g)$. 1: $Cnt = 0$, $V = \sum_{i=1}^{m} \Pr(Bf_i)$; 2: $N = (4 \ln 2/\xi)/\tau^2$; 3: for 1 to N do 4: randomly choose $i \in \{1, \ldots, m\}$ with probability $\Pr(Bf_i)/V$; 5: randomly choose $x_1, \ldots, x_k$ (ac... |

127 | Gaining confidence in high-throughput protein interaction networks." Nat Biotechnol 22(1
- Bader, Chaudhuri
- 2004
Citation Context ...cy-preserving mechanisms, uncertainties are often introduced in the graph data. For example, in a protein-protein interaction (PPI) network, the pairwise interaction is derived from statistical models [5, 6, 20], and the STRING database (http://string-db.org) is such a public data source that contains PPIs with uncertain edges provided by statistical predictions. In a social network, probabilities can be as... |

107 | Management of probabilistic data: foundations and challenges
- Dalvi, Suciu
- 2007
Citation Context ...S query retrieves all graphs g ∈ D such that the subgraph similarity probability (SSP) between q and g is at least ϵ. We will formally define SSP later (Def 9). We employ the possible world semantics [31, 11], which has been widely used for modeling probabilistic databases, to explain the meaning of returned results for subgraph similarity search. A possible world graph (PWG) of a probabilistic graph is a... |

88 | Closure-tree: an index structure for graph queries
- He, Singh
- 2006
Citation Context ...ter composition strategy to prune a large number of graphs directly without performing pairwise similarity computation, which makes [38] more efficient compared to other graph similarity search algorithms [15, 41]. Assume the result is $SC_q^c = \{g^c \mid q \subseteq_{sim} g^c, g^c \in D^c\}$. Then its corresponding probabilistic graph set, $SC_q = \{g \mid g^c \in SC_q^c\}$, is the input for uncertain subgraph similarity matching in the next s... |

87 | Substructure similarity search in graph databases
- Yan, Yu, et al.
Citation Context ...to g. Based on this observation, given D and q, we can prune the database $D^c = \{g_1^c, \ldots, g_n^c\}$ using conventional deterministic graph similarity matching methods. In this paper, we adopt the method in [38] to quickly compute results. [38] uses a multi-filter composition strategy to prune a large number of graphs directly without performing pairwise similarity computation, which makes [38] more efficient ... |

70 | Managing and Mining Graph Data
- Aggarwal, Wang
- 2010
Citation Context ...udy similarity search over uncertain graphs, which is related to uncertain and graph data management. Readers interested in general uncertain data management and graph data management may refer to [3] and [4], respectively. [Figure 13: Total query processing time (seconds) versus database size, for PMI and Exact.] [Plot legend (%): COR-Precision, IND-Precision, COR-Recall, IND-Recall ...] |

65 | Predicting protein complex membership using probabilistic network reliability." Genome Res 14(6
- Asthana, King
- 2004
Citation Context ...cy-preserving mechanisms, uncertainties are often introduced in the graph data. For example, in a protein-protein interaction (PPI) network, the pairwise interaction is derived from statistical models [5, 6, 20], and the STRING database (http://string-db.org) is such a public data source that contains PPIs with uncertain edges provided by statistical predictions. In a social network, probabilities can be as... |

57 | The polynomial solvability of convex quadratic programming
- Kozlov, Tarasov, et al.
- 1980
Citation Context ... take values within [0,1], i.e., $x_{s_i} \in [0,1]$. Then the equation becomes a standard quadratic program (QP). Clearly, this QP is convex, and there is an efficient algorithm to solve it [23]. Since all feasible solutions for Equation 9 are also feasible solutions for the relaxed quadratic program, the maximum value QP(I) computed by the relaxed QP provides an upper bound for the valu... |

56 | Graph database indexing using structured graph decomposition
- Williams, Huan, et al.
- 2007
Citation Context ...ing graphs hierarchically in a tree, to support k-NN search to the query graph. Jiang et al. [19] encoded graphs into strings and converted graph similarity search into string matching. Williams et al. [35] aimed to find graphs with the minimum number of mismatches of vertex and edge labels bounded by a given threshold. Zeng et al. [41] proposed tight bounds of graph edit distance to filter out false... |

47 | Taming verification hardness: an efficient algorithm for testing subgraph isomorphism, The
- Shang, Zhang, et al.
- 2008
Citation Context ...quency threshold β, a feature f is frequent iff frq(f) ≥ β. Thus we would like to index a frequent feature. To achieve rule 2, we control the feature size used in Algorithm 4. To control the number of features [37, 29], we also define the discriminative measure as $dis(f) = \frac{|\bigcap \{D_{f'} \mid f' \subseteq_{iso} f\}|}{|D_f|}$, where $D_f$ is the list of probabilistic graphs g such that $f \subseteq_{iso} g^c$. Given a discriminative threshold γ, a feature f is di... |

42 | Gstring: A novel approach for efficient search in graph databases
- Jiang, Wang, et al.
- 2007
Citation Context ...ing-verification paradigm to process queries. He et al. [15] employed an R-tree-like index structure, organizing graphs hierarchically in a tree, to support k-NN search to the query graph. Jiang et al. [19] encoded graphs into strings and converted graph similarity search into string matching. Williams et al. [35] aimed to find graphs with the minimum number of mismatches of vertex and edge labels bo... |

40 | Managing and Mining Uncertain Data
- Aggarwal
- 2009
Citation Context ...r, we study similarity search over uncertain graphs, which is related to uncertain and graph data management. Readers interested in general uncertain data management and graph data management may refer to [3] and [4], respectively. [Figure 13: Total query processing time (seconds) versus database size, for PMI and Exact.] [Plot legend (%): COR-Precision, IND-Precision, COR-Recall, IND...] |

33 | A direct comparison of protein interaction confidence assignment schemes
- Suthram, Shlomi, et al.
Citation Context ...raphs due to the wide usage of subgraph similarity search in many application fields, such as answering SPARQL queries (graphs) in RDF graph data [18, 1], predicting complex biological interactions (graphs) [33, 9], and identifying vehicle routings (graphs) in road networks [8, 16]. In the following, we give the details about subgraph similarity search, our solutions and contributions. Footnote 1: Neighbor edges are the ... |

32 | Weighted and unweighted maximum clique algorithms with upper bounds from fractional coloring
- Balas, Xue
- 1996
Citation Context ...ter (larger) the lower bound. To obtain a tight lower bound, we should find a clique whose weight is largest, which is exactly the maximum weight clique problem. Here we use the efficient solution in [7] to solve the maximum clique problem, and the algorithm returns the largest weight z. Therefore, we use $1 - e^{-z}$ as the tightest value for LowerB(f). Example 6. Following Example 5, as shown in Figure 7... |

31 | k-nearest neighbors in uncertain graphs
- Potamias, Bonchi, et al.
- 2010
Citation Context ...es/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [18, 24]. To model the uncertain graph data, a probabilistic graph model is introduced [27, 43, 21, 18, 24]. In this model, each edge is associated with an edge existence probability to quantify the likelihood that this edge exists in the graph, and edge probabilities are independent of each other. However... |

30 | Foundations of probabilistic answers to queries
- Suciu, Dalvi
Citation Context ...S query retrieves all graphs g ∈ D such that the subgraph similarity probability (SSP) between q and g is at least ϵ. We will formally define SSP later (Def 9). We employ the possible world semantics [31, 11], which has been widely used for modeling probabilistic databases, to explain the meaning of returned results for subgraph similarity search. A possible world graph (PWG) of a probabilistic graph is a... |

30 | Efficient subgraph matching on billion node graphs
- Sun, Wang, et al.
- 2012
Citation Context ... up query. Shang et al. [30] studied super-graph similarity search, and proposed top-down and bottom-up index construction strategies to optimize the performance of query processing. Recently, Sun et al. [32] proposed a subgraph matching algorithm on distributed in-memory graphs without using a structured index. Another related topic is querying uncertain graphs. Potamias et al. [27] studied k-nearest neighb... |

28 | Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics
- Zou, Gao, et al.
Citation Context ...udied k-nearest neighbor queries (k-NN) over uncertain graphs, i.e., computing the k closest nodes to a query node. They proposed sampling algorithms to answer the #P-complete k-NN queries. Zou et al. [42, 43] studied frequent subgraph mining on uncertain graph data under the probability and expectation semantics, respectively. Yuan et al. [40] proposed a graph feature-based framework to conduct uncertain subg... |

26 | Mining frequent subgraph patterns from uncertain graph data. Knowledge and Data Engineering
- Zou, Li, et al.
- 2010
Citation Context ...es/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [18, 24]. To model the uncertain graph data, a probabilistic graph model is introduced [27, 43, 21, 18, 24]. In this model, each edge is associated with an edge existence probability to quantify the likelihood that this edge exists in the graph, and edge probabilities are independent of each other. However... |

25 | Probabilistic Path Queries in Road Networks: Traffic Uncertainty Aware Path Selection
- Hua, Pei
Citation Context ... interactions at the same proteins. As another example, in communication networks or road networks, an edge probability is used to quantify the reliability of a link [8] or the degree of traffic jam [16]. Obviously, there are correlations among the routing paths in these networks [16], i.e., a busy traffic path often blocks traffic in nearby paths. Therefore, it is necessary for a probabilistic grap... |

25 | Distance-constraint reachability computation
- Jin, Liu, et al.
Citation Context ...es/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [18, 24]. To model the uncertain graph data, a probabilistic graph model is introduced [27, 43, 21, 18, 24]. In this model, each edge is associated with an edge existence probability to quantify the likelihood that this edge exists in the graph, and edge probabilities are independent of each other. However... |

24 | Comparing stars: On approximating graph edit distance. VLDB
- Zeng, Tung, et al.
- 2009
Citation Context ...ter composition strategy to prune a large number of graphs directly without performing pairwise similarity computation, which makes [38] more efficient compared to other graph similarity search algorithms [15, 41]. Assume the result is $SC_q^c = \{g^c \mid q \subseteq_{sim} g^c, g^c \in D^c\}$. Then its corresponding probabilistic graph set, $SC_q = \{g \mid g^c \in SC_q^c\}$, is the input for uncertain subgraph similarity matching in the next s... |

23 | Efficient algorithm for finding all minimal edge cuts of a non-oriented graph
- Karzanov, Timofeev
- 1986
Citation Context ...s transformation, we have Theorem 6. The embedding cut set of $g^c$ is also the cut set (without edges incident to s and t) from s to t in cG. In this work, we determine embedding cuts using the method in [22]. Example 7. Figure 8 shows the transformation for feature f2 in graph 002 in Figure 1. In cG, we can find cuts {e2, e4}, {e1, e3, e4} and {e2, e3}, which are clearly the embedding cuts of f2 in 002. 4.2 Featu... |

19 | Managing uncertainty in social networks
- Adar, Re
Citation Context ...degree of influence or trust between two social entities [2, 25, 14]. In an RDF graph, uncertainties/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [18, 24]. To model the uncertain graph data, a probabilist... |

13 | Efficient subgraph search over large uncertain graphs
- Yuan, Wang, et al.
Citation Context ...algorithms to answer the #P-complete k-NN queries. Zou et al. [42, 43] studied frequent subgraph mining on uncertain graph data under the probability and expectation semantics, respectively. Yuan et al. [40] proposed a graph feature-based framework to conduct uncertain subgraph queries. In another work, Yuan et al. [39] and Jin et al. [21] studied shortest path queries and distance-constraint reachability queries in a single uncertain graph. The above works define uncertain graph models with independent edge distributions... |

9 | Efficiently answering probability threshold-based shortest path queries over uncertain graphs
- Yuan, Chen, et al.
- 2010
Citation Context ... graph data under the probability and expectation semantics, respectively. Yuan et al. [40] proposed a graph feature-based framework to conduct uncertain subgraph queries. In another work, Yuan et al. [39] and Jin et al. [21] studied shortest path queries and distance-constraint reachability queries in a single uncertain graph. The above works define uncertain graph models with independent edge distributions... |

8 | Query evaluation on probabilistic RDF databases
- Huang, Liu
- 2009
Citation Context ...influence or trust between two social entities [2, 25, 14]. In an RDF graph, uncertainties/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [18, 24]. To model the uncertain graph data, a probabilistic graph model is introduced [27, 43, 21, 18, 24]. In this model, each edge is associated with an edge existence probability to quantify the likelihoo... |

6 | Network motif identification in stochastic networks
- Jiang, Tu, et al.
Citation Context ...cy-preserving mechanisms, uncertainties are often introduced in the graph data. For example, in a protein-protein interaction (PPI) network, the pairwise interaction is derived from statistical models [5, 6, 20], and the STRING database (http://string-db.org) is such a public data source that contains PPIs with uncertain edges provided by statistical predictions. In a social network, probabilities can be as... |

6 | Efficient query answering in probabilistic RDF graphs
- Lian, Chen
- 2011
Citation Context ...influence or trust between two social entities [2, 25, 14]. In an RDF graph, uncertainties/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [18, 24]. To model the uncertain graph data, a probabilistic graph model is introduced [27, 43, 21, 18, 24]. In this model, each edge is associated with an edge existence probability to quantify the likelihoo... |

5 | Similarity search on supergraph containment
- Shang, Zhu, et al.
- 2010
Citation Context ...ubgraph similar to g2. Note that, in this definition, subgraph distance only depends on the edge set difference, which is consistent with previous works on similarity search over deterministic graphs [38, 15, 30]. The operations on an edge consist of edge deletion, relabeling and insertion. Definition 9. (Subgraph Similarity Probability) For a given query graph q, a probabilistic graph g 3 and a subgraph dist... |

4 | Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions
- Chui, Sung, et al.
Citation Context ...ies are independent of each other. However, the proposed probabilistic graph model is invalid in many real scenarios. For example, for uncertain protein-protein interaction (PPI) networks, the authors in [9, 28] first establish elementary interactions with probabilities between proteins, then use machine learning tools to predict other possible interactions based on the elementary links. The predictive resul... |

4 | An efficient graph indexing method
- Wang, Ding, et al.
- 2012
Citation Context ...chings of vertex and edge labels bounded by a given threshold. Zeng et al. [41] proposed tight bounds of graph edit distance to filter out false graphs in similarity search, based on which Wang et al. [34] developed an indexing strategy to speed up queries. Shang et al. [30] studied super-graph similarity search, and proposed top-down and bottom-up index construction strategies to optimize the performance o... |

2 | Interaction generality: a measurement to assess the reliability of a protein-protein interaction
- Rintaro, Harukazu, et al.
Citation Context ...ies are independent of each other. However, the proposed probabilistic graph model is invalid in many real scenarios. For example, for uncertain protein-protein interaction (PPI) networks, the authors in [9, 28] first establish elementary interactions with probabilities between proteins, then use machine learning tools to predict other possible interactions based on the elementary links. The predictive resul... |

2 | Aggarwal. Managing and mining uncertain data
- unknown authors
- 2009
Citation Context ...r, we study similarity search over uncertain graphs, which is related to uncertain and graph data management. Readers interested in general uncertain data management and graph data management may refer to [3] and [4], respectively. [Figure 13: Total query processing time (seconds) versus database size, for PMI and Exact.] [Plot axis: Probability thr...] |

1 | Approximation algorithms for NP-Hard problems
- H
- 1997
Citation Context ...problem is NP-complete [13], we use a greedy approach to approximate the tightest $U_{sim}(q)$. Algorithm 1 gives detailed steps. Assume the optimal value is OPT; the approximate value is within $OPT \cdot \ln|U|$ [12]. Algorithm 1 ObtainTightest$U_{sim}(q)$(U, S). 1: $A \leftarrow \emptyset$, $U_{sim}(q) = 0$; 2: while A is not a cover of U do 3: for each $s \in S$, compute $\gamma(s) = w(s)/|s - A|$; 4: choose an s with minimal $\gamma(s)$; 5: $A \leftarrow A \cup s$; 6: $U_{sim}(q)$... |