Results 1 - 10 of 53
Graphs-at-a-time: Query Language and Access Methods for Graph Databases
2008
Cited by 70 (0 self)
With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph-specific optimizations outperform an SQL-based implementation by orders of magnitude.
Taming Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism
Cited by 47 (8 self)
Graphs are widely used to model complicated data semantics in many applications. In this paper, we aim to develop efficient techniques to retrieve graphs, containing a given query graph, from a large set of graphs. Considering that the problem of testing subgraph isomorphism is generally NP-hard, most of the existing techniques are based on the framework of filtering-and-verification to reduce the precise computation costs; consequently various novel feature-based indexes have been developed. While the existing techniques work well for small query graphs, the verification phase becomes a bottleneck when the query graph size increases. Motivated by this, in this paper we first propose a novel and efficient algorithm for testing subgraph isomorphism, QuickSI. Secondly, we develop a new feature-based index technique to accommodate QuickSI in the filtering phase. Our extensive experiments on real and synthetic data demonstrate the efficiency and scalability of the proposed techniques, which significantly improve the existing techniques.
On Graph Query Optimization in Large Networks
Cited by 32 (3 self)
The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such large-scale graph-structured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph structures efficiently within a large network? Unfortunately, the graph query is hard due to the NP-complete nature of subgraph isomorphism. It becomes even more challenging when the network examined is large and diverse. In this paper, we present a high performance graph indexing mechanism, SPath, to address the graph query problem on large networks. SPath leverages decomposed shortest paths around vertex neighborhood as basic indexing units, which prove to be both effective in graph search space pruning and highly scalable in index construction and deployment. Via SPath, a graph query is processed and optimized beyond the traditional vertex-at-a-time fashion to a more efficient path-at-a-time way: the query is first decomposed to a set of shortest paths, among which a subset of candidates with good selectivity is picked by a query plan optimizer; candidate paths are further joined together to help recover the query graph to finalize the graph query processing. We evaluate SPath with the state-of-the-art GraphQL on both real and synthetic data sets. Our experimental studies demonstrate the effectiveness and scalability of SPath, which proves to be a more practical and efficient indexing method in addressing graph queries on large networks.
A novel spectral coding in a large graph database
In Proceedings of the International Conference on Extending Database Technology, 2008
Cited by 30 (2 self)
Retrieving related graphs containing a query graph from a large graph database is a key issue in many graph-based applications, such as drug discovery and structural pattern recognition. Because subgraph isomorphism is an NP-complete problem [4], we have to employ a filter-and-verification framework to speed up the search, that is, using an effective and efficient pruning strategy to first filter out as many false positives (graphs that cannot be in the results) as possible, then validating the remaining candidates by subgraph isomorphism checking. In this paper, we propose a novel filtering method, a spectral encoding method, i.e. GCoding. Specifically, we assign a signature to each vertex based on its local structures. Then, we generate a spectral graph code by combining all vertex signatures in a graph. Based on spectral graph codes, we derive a necessary condition for subgraph isomorphism. Then we propose two pruning rules for the subgraph search problem, and prove that they satisfy the no-false-negative requirement (no dismissal in answers). Since graph codes are in numerical space, we take this advantage and conduct efficient filtering over graph codes. Extensive experiments show that GCoding outperforms existing counterpart methods.
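The filter-and-verification framework described in this abstract can be illustrated with a minimal Python sketch. This is not GCoding: the spectral codes are replaced here by a much simpler necessary condition (per-label vertex counts and the edge count), and the verification step is a brute-force check. The graph representation and the names make_graph, passes_filter, is_subgraph, and search are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def make_graph(labels, edge_list):
    """A graph is (labels, edges): vertex -> label, and a set of undirected edges."""
    return labels, {frozenset(e) for e in edge_list}


def passes_filter(query, graph):
    """Cheap necessary condition (assumed stand-in for a real index filter):
    the graph needs at least as many edges, and at least as many vertices
    of every label, as the query. It never rejects a true answer."""
    q_labels, q_edges = query
    g_labels, g_edges = graph
    q_count = Counter(q_labels.values())
    g_count = Counter(g_labels.values())
    if len(q_edges) > len(g_edges):
        return False
    return all(g_count[label] >= n for label, n in q_count.items())


def is_subgraph(query, graph):
    """Brute-force (non-induced) subgraph isomorphism: try every injective
    mapping of query vertices to graph vertices. Exponential; demo only."""
    q_labels, q_edges = query
    g_labels, g_edges = graph
    q_vertices = list(q_labels)
    for image in permutations(g_labels, len(q_vertices)):
        mapping = dict(zip(q_vertices, image))
        if any(q_labels[v] != g_labels[mapping[v]] for v in q_vertices):
            continue
        if all(frozenset(mapping[v] for v in e) in g_edges for e in q_edges):
            return True
    return False


def search(query, database):
    """Filter first, then verify only the surviving candidates."""
    candidates = [name for name, g in database.items() if passes_filter(query, g)]
    answers = [name for name in candidates if is_subgraph(query, database[name])]
    return candidates, answers


# Query: the labeled path C - C - O.
query = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
database = {
    "g1": make_graph({"a": "C", "b": "C", "c": "O", "d": "N"},
                     [("a", "b"), ("b", "c"), ("c", "d")]),
    "g2": make_graph({"a": "C", "b": "O", "c": "C"}, [("a", "b"), ("b", "c")]),
    "g3": make_graph({"a": "C", "b": "O"}, [("a", "b")]),
}
candidates, answers = search(query, database)
```

In this toy database, g3 is pruned by the filter, g2 survives the filter but is a false positive that verification removes, and only g1 is returned. A stronger filter shrinks the candidate set and so reduces the number of expensive verification calls.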
SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs
Cited by 22 (0 self)
With the emergence of new applications, e.g., computational biology, new software engineering techniques, social networks, etc., more data is in the form of graphs. Locating occurrences of a query graph in a large database graph is an important research topic. Due to the existence of noise (e.g., missing edges) in the large database graph, we investigate the problem of approximate subgraph indexing, i.e., finding the occurrences of a query graph in a large database graph with (possible) missing edges. The SAPPER method is proposed to solve this problem. Utilizing the hybrid neighborhood unit structures in the index, SAPPER takes advantage of pregenerated random spanning trees and a carefully designed graph enumeration order. Real and synthetic data sets are employed to demonstrate the efficiency and scalability of our approximate subgraph indexing method.
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
In CIKM, 2012
Cited by 16 (2 self)
We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language expresses types of queries which are of large interest for applications which model their data as large graphs, such as pattern matching, reachability and shortest path queries. Each query can combine both structural predicates and value-based predicates (on the attributes of the graph nodes and edges). We describe an algebraic compilation mechanism for our proposed query language which is extended from the relational algebra and based on the basic construct of building SPARQL queries, the Triple Pattern. We describe a hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph is stored in a relational database. The execution engine of our proposed query language splits the query plan so that some parts are pushed inside the relational database while other parts are processed using memory-based algorithms, as necessary. Experimental results on real datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.
SIGMA: A SET-COVER-BASED INEXACT GRAPH MATCHING ALGORITHM
2010
Cited by 13 (3 self)
Network querying is a growing domain with vast applications ranging from screening compounds against a database of known molecules to matching sub-networks across species. Graph indexing is a powerful method for searching a large database of graphs. Most graph indexing methods to date tackle the exact matching (isomorphism) problem, limiting their applicability to specific instances in which such matches exist. Here we provide a novel graph indexing method to cope with the more general, inexact matching problem. Our method, SIGMA, builds on approximating a variant of the set-cover problem that concerns overlapping multi-sets. We extensively test our method and compare it to a baseline method and to the state-of-the-art Grafil. We show that SIGMA outperforms both, providing higher pruning power in all the tested scenarios.
A novel approach for efficient supergraph query processing on graph databases
In EDBT
Cited by 13 (0 self)
In recent years, a large amount of data modeled by graphs, namely graph data, has been collected in various domains. Efficiently processing queries on graph databases has attracted a lot of research attention. The supergraph query is a new and important kind of query in practice. A supergraph query, q, on a graph database D is to retrieve all graphs in D such that q is a supergraph of them. Because the number of graphs in databases is large and subgraph isomorphism testing is NP-complete, efficiently processing such queries is a big challenge. This paper first proposes an optimal compact method for organizing graph databases. Common subgraphs of the graphs in a database are stored only once in the compact organization of the database, in order to reduce the overall cost of subgraph isomorphism tests from stored graphs to queries during query processing. Then, an exact algorithm and an approximate algorithm for generating a significant feature set with optimal order are proposed to construct indices on graph databases. The optimal order on the feature set is to reduce the number of subgraph isomorphism tests during query processing. Based on the compact organization of graph databases, a novel algorithm for testing subgraph isomorphisms from multiple graphs to one graph is presented. Finally, based on all these techniques, a query processing method is proposed. Analytical and experimental results show that the proposed algorithms outperform the existing similar algorithms by one to two orders of magnitude.
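The definition of a supergraph query given in this abstract (return every graph in D that is contained in q) can be made concrete with a short Python sketch. It implements only the naive baseline of one subgraph isomorphism test per stored graph, not the compact organization or the feature index the paper proposes. The graph representation and the names graph, contains, and supergraph_query are assumptions made for illustration.

```python
def graph(labels, edges):
    """Build (labels, adjacency) from a vertex -> label dict and an edge list."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return labels, adj


def contains(big, small):
    """Return True if `small` is (non-induced) subgraph-isomorphic to `big`,
    using plain backtracking over label-compatible vertex assignments."""
    b_labels, b_adj = big
    s_labels, s_adj = small
    order = list(s_labels)

    def extend(mapping):
        if len(mapping) == len(order):
            return True
        v = order[len(mapping)]
        for cand in b_labels:
            if cand in mapping.values() or b_labels[cand] != s_labels[v]:
                continue
            # Every already-mapped neighbour of v must be adjacent to cand.
            if all(mapping[u] in b_adj[cand] for u in s_adj[v] if u in mapping):
                mapping[v] = cand
                if extend(mapping):
                    return True
                del mapping[v]
        return False

    return extend({})


def supergraph_query(q, database):
    """Naive baseline: test each stored graph against the query separately."""
    return [name for name, g in database.items() if contains(q, g)]


# Query graph q: the labeled path A - B - C.
q = graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])
D = {
    "edge_ab": graph({"x": "A", "y": "B"}, [("x", "y")]),
    "edge_ac": graph({"x": "A", "y": "C"}, [("x", "y")]),
    "path_abc": graph({"x": "A", "y": "B", "z": "C"}, [("x", "y"), ("y", "z")]),
    "single_b": graph({"x": "B"}, []),
}
result = supergraph_query(q, D)
```

Here edge_ac is excluded because q has no edge between its A and C vertices, while the other three stored graphs are all contained in q. The baseline repeats work whenever stored graphs share structure, which is the redundancy the abstract says its compact organization is meant to reduce.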
Connected Substructure Similarity Search
Cited by 10 (5 self)
Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. The problem is challenging as even testing subgraph containment between two graphs is NP-complete. Hence, existing techniques adopt the filtering-and-verification framework with the focus on developing effective and efficient techniques to remove non-promising graphs. Nevertheless, existing filtering techniques may still be unable to effectively remove many “low” quality candidates. To resolve this, in this paper we propose a novel indexing technique, GrafD-Index, to index graphs according to their “distances” to features. We characterize a tight condition under which the distance-based triangular inequality holds. We then develop lower and upper bounding techniques that exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Considering that the verification phase is not well studied and plays the dominant role in the whole process, we devise efficient algorithms to verify candidates. A comprehensive experiment using real datasets demonstrates that our proposed methods significantly outperform existing methods.
The Centrality of
1992
Cited by 10 (4 self)
This Article is brought to you for free and open access by the Biochemistry, Department of at DigitalCommons@University of Nebraska - Lincoln. It