FG-index: towards verification-free query processing on graph databases
 in SIGMOD, 2007
Cited by 77 (10 self)
Graphs are prevalently used to model the relationships between objects in various domains. With the increasing usage of graph databases, it has become more and more demanding to efficiently process graph queries. Querying graph databases is costly since it involves subgraph isomorphism testing, which is an NP-complete problem. In recent years, some effective graph indexes have been proposed to first obtain a candidate answer set by filtering part of the false results and then perform verification on each candidate by checking subgraph isomorphism. Query performance is improved since the number of subgraph isomorphism tests is reduced. However, candidate verification is still inevitable, which can be expensive when the size of the candidate answer set is large. In this paper, we propose a novel indexing technique that constructs a nested inverted-index, called FG-index, based on the set of Frequent subGraphs (FGs). Given a graph query that is an FG in the database, FG-index returns the exact set of query answers without performing candidate verification. When the query is an infrequent graph, FG-index produces a candidate answer set which is close to the exact answer set. Since an infrequent graph means the graph occurs in only a small number of graphs in the database, the number of subgraph isomorphism tests is small. To ensure that the index fits into the main memory, we propose a new notion of δ-Tolerance Closed Frequent Graphs (δ-TCFGs), which allows us to flexibly tune the size of the index in a parameterized way. Our extensive experiments verify that query processing using FG-index is orders of magnitude more efficient than using the state-of-the-art graph index.
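The two query paths described in this abstract (a direct answer for frequent queries, filter-then-verify for infrequent ones) can be illustrated with a toy sketch. This is not the paper's nested inverted-index: here a graph is assumed to be a set of labelled edges, edge-set containment stands in for subgraph isomorphism, and the names `build_index` and `answer` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]  # assumption: a graph is just a set of labelled edges


def build_index(db: Dict[int, Graph], frequent: Iterable[Graph]) -> Dict[Graph, Set[int]]:
    """Map each (given) frequent subgraph to the IDs of the database graphs containing it."""
    return {fg: {gid for gid, g in db.items() if fg <= g} for fg in frequent}


def answer(query: Graph, index: Dict[Graph, Set[int]], db: Dict[int, Graph]) -> Set[int]:
    """Return the IDs of database graphs that contain the query."""
    # Frequent query: the stored ID set is already the exact answer, no verification.
    if query in index:
        return set(index[query])
    # Infrequent query: intersect the ID sets of indexed subgraphs of the query
    # to obtain candidates, then verify each candidate against the database.
    candidates = set(db)
    for fg, ids in index.items():
        if fg <= query:
            candidates &= ids
    return {gid for gid in candidates if query <= db[gid]}
```

In this toy model the verification step is a cheap containment check; in a real graph database it would be a subgraph isomorphism test, which is why skipping it for frequent queries matters.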
Graph database indexing using structured graph decomposition
 In ICDE, 2007
Cited by 57 (5 self)
We introduce a novel method of indexing graph databases in order to facilitate subgraph isomorphism and similarity queries. The index is comprised of two major data structures. The primary structure is a directed acyclic graph which contains a node for each of the unique, induced subgraphs of the database graphs. The secondary structure is a hash table which cross-indexes each subgraph for fast isomorphic lookup. In order to create a hash key independent of isomorphism, we utilize a code-based canonical representation of adjacency matrices, which we have further refined to improve computation speed. We validate the concept by demonstrating its effectiveness in answering queries for two practical datasets. Our experiments show that for subgraph isomorphism queries, our method outperforms existing methods by more than an order of magnitude.
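To make the idea of an isomorphism-independent hash key concrete, here is a minimal brute-force sketch. It assumes small, unlabelled, simple undirected graphs and tries every vertex ordering; the paper's refined canonical code is not reproduced here, and `canonical_code` is a hypothetical name.

```python
from itertools import permutations
from typing import Iterable, Tuple


def canonical_code(n: int, edges: Iterable[Tuple[int, int]]) -> str:
    """Largest upper-triangle adjacency string over all vertex orderings.

    Isomorphic graphs yield the same string, so it can serve as a hash key.
    Brute force over n! orderings: only suitable for very small graphs.
    """
    edge_list = list(edges)
    best = ""
    for perm in permutations(range(n)):
        # perm[v] is the new position assigned to vertex v.
        relabelled = {
            (min(perm[u], perm[v]), max(perm[u], perm[v])) for u, v in edge_list
        }
        # Read the upper triangle of the relabelled adjacency matrix row by row.
        code = "".join(
            "1" if (i, j) in relabelled else "0"
            for i in range(n)
            for j in range(i + 1, n)
        )
        if code > best:
            best = code
    return best


# Two labelings of the same 3-vertex path map to one key.
table = {canonical_code(3, [(0, 1), (1, 2)]): "path"}
print(table[canonical_code(3, [(0, 2), (2, 1)])])
```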
On Querying Historical Evolving Graph Sequences
Cited by 20 (2 self)
In many applications, information is best represented as graphs. In a dynamic world, information changes and so the graphs representing the information evolve with time. We propose that historical graph-structured data be maintained for analytical processing. We call a historical evolving graph sequence an EGS. We observe that in many applications, graphs of an EGS are large and numerous, and they often exhibit much redundancy among them. We study the problem of efficient query processing on an EGS and put forward a solution framework called FVF. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in EGS query processing.
Correlation Search in Graph Databases
 KDD'07, 2007
Cited by 20 (7 self)
Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the fact that graph data, especially in various scientific domains, have proliferated in recent years. In this paper, we propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson’s correlation coefficient as a correlation measure to take into consideration the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions which set bounds on the occurrence probability of a candidate in the database. With this result, we design an efficient algorithm that operates on a much smaller projected database and thus we are able to obtain a significantly smaller set of candidates. To further improve the efficiency, we develop three heuristic rules and apply them on the candidate set to further reduce the search space. Our extensive experiments demonstrate the effectiveness of our method on candidate reduction. The results also justify the efficiency of our algorithm in mining correlations from large real and synthetic datasets.
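As an illustration of the correlation measure named in this abstract, the sketch below computes Pearson's correlation coefficient between the occurrences of two graphs, treating "graph g occurs in database graph i" as a binary variable. The input representation (sets of database graph IDs) and the function name are assumptions for illustration; the paper's pruning conditions and heuristics are not shown.

```python
from math import sqrt
from typing import Set


def occurrence_correlation(occ_a: Set[int], occ_b: Set[int], db_size: int) -> float:
    """Pearson correlation of two binary occurrence indicators.

    occ_a and occ_b hold the IDs of the database graphs that contain each
    of the two graphs; db_size is the total number of database graphs.
    """
    supp_a = len(occ_a) / db_size            # fraction of graphs containing A
    supp_b = len(occ_b) / db_size            # fraction of graphs containing B
    supp_ab = len(occ_a & occ_b) / db_size   # fraction containing both
    denom = sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    if denom == 0:
        # A graph that occurs in none or all of the database has zero variance.
        raise ValueError("correlation is undefined for support 0 or 1")
    return (supp_ab - supp_a * supp_b) / denom
```

Two graphs that always occur together score 1, and two graphs that never co-occur while jointly covering the whole database score -1.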
SIGMA: A SET-COVER-BASED INEXACT GRAPH MATCHING ALGORITHM
, 2010
Cited by 13 (3 self)
Network querying is a growing domain with vast applications ranging from screening compounds against a database of known molecules to matching subnetworks across species. Graph indexing is a powerful method for searching a large database of graphs. Most graph indexing methods to date tackle the exact matching (isomorphism) problem, limiting their applicability to specific instances in which such matches exist. Here we provide a novel graph indexing method to cope with the more general, inexact matching problem. Our method, SIGMA, builds on approximating a variant of the set-cover problem that concerns overlapping multisets. We extensively test our method and compare it to a baseline method and to the state-of-the-art Grafil. We show that SIGMA outperforms both, providing higher pruning power in all the tested scenarios.
GConnect: A Connectivity Index for Massive Disk-Resident Graphs
Cited by 12 (3 self)
The problem of connectivity is an extremely important one in the context of massive graphs. In many large communication networks, social networks and other graphs, it is desirable to determine the minimum-cut between any pair of nodes. The problem is well solved in the classical literature, since it is related to the maximum-flow problem, which is efficiently solvable. However, large graphs may often be disk resident, and such graphs cannot be efficiently processed for connectivity queries. This is because the minimum-cut problem is typically solved with the use of a variety of combinatorial and flow-based techniques which require random access to the underlying edges in the graph. In this paper, we propose to develop a connectivity index for massive disk-resident graphs. We will use an edge-sampling based approach to create compressed representations of the underlying graphs. Since these compressed representations can be held in main memory, they can be used to derive efficient approximations for the minimum-cut problem. These compressed representations are then organized into a disk-resident index structure. We present experimental results which show that the resulting approach provides between two and three orders of magnitude more efficient query processing than a disk-resident approach at the expense of a small amount of accuracy.
U.: Selecting Materialized Views for RDF Data
 In: Semantic Web Information Management Workshop (SWIM), 2010
Cited by 8 (1 self)
In the design of a relational database, the administrator has to decide, given a fixed or estimated workload, which indexes should be created. This so-called index selection problem is a non-trivial optimization problem in relational databases. In this paper we describe a novel approach for index selection on RDF data sets. We propose an algorithm to automatically suggest a set of indexes as materialized views based on a workload of SPARQL queries. The selected set of indexes aims to decrease the cost of the workload. We provide a cost model to estimate the potential impact of candidate indexes on query performance and an algorithm to select an optimal set of indexes. This algorithm may be integrated into an existing SPARQL query engine. We experimentally evaluate our approach on a standard query processor. We claim that our approach is the first comprehensive suggestion for the index selection problem in RDF.
Dual active feature and sample selection for graph classification
 in KDD, 2011
Cited by 6 (4 self)
Graph classification has become an important and active research topic in the last decade. Current research on graph classification focuses on mining discriminative subgraph features under supervised settings. The basic assumption is that a large number of labeled graphs are available. However, labeling graph data is quite expensive and time-consuming for many real-world applications. In order to reduce the labeling cost for graph data, we address the problem of how to select the most important graph to query for the label. This problem is challenging and different from conventional active learning problems because there is no predefined feature vector. Moreover, the subgraph enumeration problem is NP-hard. The active sample selection problem and the feature selection problem are correlated for graph data. Before we can solve the active sample selection problem, we need to find a set of optimal subgraph features. To address this challenge, we demonstrate how one can simultaneously estimate the usefulness of a query graph and a set of subgraph features. The idea is to maximize the dependency between subgraph features and graph labels using an active learning framework. We propose a branch-and-bound algorithm to search for the optimal query graph and optimal features simultaneously. Empirical studies on nine real-world tasks demonstrate that the proposed method can obtain better accuracy on graph data than alternative approaches.
LTS: Discriminative Subgraph Mining by Learning from Search History
Cited by 6 (0 self)
Discriminative subgraphs can be used to characterize complex graphs, construct graph classifiers and generate graph indices. The search space for discriminative subgraphs is usually prohibitively large. Most measurements of interestingness of discriminative subgraphs are neither monotonic nor anti-monotonic with respect to subgraph frequencies. Therefore, branch-and-bound algorithms are unable to mine discriminative subgraphs efficiently. We discover that search history of discriminative subgraph mining is very useful in computing empirical upper-bounds of discrimination scores of subgraphs. We propose a novel discriminative subgraph mining method, LTS (Learning To Search), which begins with a greedy algorithm that first samples the search space through subgraph probing and then explores the search space in a branch-and-bound fashion leveraging the search history of these samples. Extensive experiments have been performed to analyze the gain in performance by taking into account search history and to demonstrate that LTS can significantly improve performance compared with the state-of-the-art discriminative subgraph mining algorithms.
Efficient Query Processing on Graph Databases
 ACM Transactions on Database Systems, 2008
Cited by 6 (1 self)
We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index, to solve this problem. The cost of processing a subgraph query using most existing indexes mainly consists of two parts, the index probing cost and the candidate verification cost. Index probing is to find the query in the index, or to find the graphs from which we can generate a candidate answer set for the query. Candidate verification is to test whether each graph in the candidate set is indeed a supergraph of the query. We design FG*-index to minimize these two costs as follows. FG*-index consists of three components: the FG-index, the feature-index, and the FAQ-index. First, the FG-index employs the concept of Frequent subGraph (FG) to allow the set of queries that are FGs to be answered without candidate verification. We call this set of queries FG-queries. We can enlarge the set of FG-queries so that more queries can be answered without candidate verification; however, a larger set of FG-queries implies a larger FG-index and hence the index probing cost also increases. We propose the feature-index to reduce the index probing cost. The feature-index uses features to filter false results that are matched in the FG-index, so that we can