Efficient algorithms for processing XPath queries
- In VLDB
, 2002
Cited by 219 (20 self)
Our experimental analysis of several popular XPath processors reveals a striking fact: Query evaluation in each of the systems requires time exponential in the size of queries in the worst case. We show that XPath can be processed much more efficiently, and propose main-memory algorithms for this problem with polynomial-time combined query evaluation complexity. Moreover, we present two fragments of XPath for which linear-time query processing algorithms exist.
Graph Indexing: A Frequent Structure-based Approach
, 2004
Cited by 80 (12 self)
Graphs have become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Different from the existing path-based methods, our approach, called gIndex, makes use of frequent substructures as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable to database updates. To reduce the size of the index structure, two techniques, size-increasing support constraint and discriminative fragments, are introduced. Our performance study shows that gIndex has a 10 times smaller index size, but achieves 3-10 times better performance in comparison with a typical path-based method, GraphGrep. The gIndex approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.
Querying rdf data from a graph database perspective
- In Proceedings of the Second European Semantic Web Conference
, 2005
Cited by 38 (6 self)
This paper studies the RDF model from a database perspective. From this point of view it is compared with other database models, particularly with graph database models, which are very close in motivations and use cases to RDF. We concentrate on query languages, analyze current RDF trends, and propose the incorporation into RDF query languages of primitives which are not present today, based on the experience and techniques of graph database research.
Closure-Tree: An Index Structure for Graph Queries
, 2006
Cited by 32 (1 self)
Graphs have become popular for modeling structured data. As a result, graph queries are becoming common and graph indexing has come to play an essential role in query processing. We introduce the concept of a graph closure, a generalized graph that represents a number of graphs. Our indexing technique, called Closure-tree, organizes graphs hierarchically where each node summarizes its descendants by a graph closure. Closure-tree can efficiently support both subgraph queries and similarity queries. Subgraph queries find graphs that contain a specific subgraph, whereas similarity queries find graphs that are similar to a query graph. For subgraph queries, we propose a technique called pseudo subgraph isomorphism which approximates subgraph isomorphism with high accuracy. For similarity queries, we measure graph similarity through edit distance using heuristic graph mapping methods. We implement two kinds of similarity queries: K-NN query and range query. Our experiments on chemical compounds and synthetic graphs show that for subgraph queries, Closure-tree outperforms existing techniques by up to two orders of magnitude in terms of candidate answer set size and index size. For similarity queries, our experiments validate the quality and efficiency of the presented algorithms.
Querying and creating visualizations by analogy
- IEEE Transactions on Visualization and Computer Graphics
Cited by 28 (17 self)
While there have been advances in visualization systems, particularly in multi-view visualizations and visual exploration, the process of building visualizations remains a major bottleneck in data exploration. We show that provenance metadata collected during the creation of pipelines can be reused to suggest similar content in related visualizations and guide semi-automated changes. We introduce the idea of query-by-example in the context of an ensemble of visualizations, and the use of analogies as first-class operations in a system to guide scalable interactions. We describe an implementation of these techniques in VisTrails, a publicly-available, open-source system. Index Terms: visualization systems, query-by-example, analogy
A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML
- In ICDE’04, pages 54–65
, 2004
Cited by 27 (12 self)
Path expressions are ubiquitous in XML processing languages. Existing approaches evaluate a path expression by selecting nodes that satisfy the tag-name and value constraints and then joining them according to the structural constraints. In this paper, we propose a novel approach, next-of-kin (NoK) pattern matching, to speed up the node-selection step, and to reduce the join size significantly in the second step. To efficiently perform NoK pattern matching, we also propose a succinct XML physical storage scheme that is adaptive to updates and streaming XML as well. Our performance results demonstrate that the proposed storage scheme and path evaluation algorithm are highly efficient and outperform the other tested systems in most cases.
Fg-index: towards verification-free query processing on graph databases
- in SIGMOD, 2007
Cited by 23 (4 self)
Graphs are prevalently used to model the relationships between objects in various domains. With the increasing usage of graph databases, it has become more and more demanding to efficiently process graph queries. Querying graph databases is costly since it involves subgraph isomorphism testing, which is an NP-complete problem. In recent years, some effective graph indexes have been proposed to first obtain a candidate answer set by filtering part of the false results and then perform verification on each candidate by checking subgraph isomorphism. Query performance is improved since the number of subgraph isomorphism tests is reduced. However, candidate verification is still inevitable, which can be expensive when the size of the candidate answer set is large. In this paper, we propose a novel indexing technique that constructs a nested inverted-index, called FG-index, based on the set of Frequent subGraphs (FGs). Given a graph query that is an FG in the database, FG-index returns the exact set of query answers without performing candidate verification. When the query is an infrequent graph, FG-index produces a candidate answer set which is close to the exact answer set. Since an infrequent graph means the graph occurs in only a small number of graphs in the database, the number of subgraph isomorphism tests is small. To ensure that the index fits into the main memory, we propose a new notion of δ-Tolerance Closed Frequent Graphs (δ-TCFGs), which allows us to flexibly tune the size of the index in a parameterized way. Our extensive experiments verify that query processing using FG-index is orders of magnitude more efficient than using the state-of-the-art graph index.
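The generic filter-then-verify framework that this abstract describes for earlier graph indexes can be illustrated with a minimal sketch. It is not FG-index itself: the feature choice (label pairs of edges), the graph representation, the brute-force verifier, and every function name below are assumptions made purely for illustration, and the verifier is only practical for tiny graphs.

```python
from collections import defaultdict
from itertools import permutations

# Assumed representation: a graph is (labels, edges), where labels maps
# node id -> label and edges is a set of undirected (u, v) pairs.

def edge_features(graph):
    """Hypothetical index feature: the sorted label pair of each edge."""
    labels, edges = graph
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def build_index(database):
    """Inverted index: feature -> ids of the graphs that contain it."""
    index = defaultdict(set)
    for gid, graph in database.items():
        for feat in edge_features(graph):
            index[feat].add(gid)
    return index

def candidates(index, database, query):
    """Filtering step: keep graphs that contain every feature of the query."""
    result = set(database)
    for feat in edge_features(query):
        result &= index.get(feat, set())
    return result

def contains(graph, query):
    """Verification step: brute-force (non-induced) subgraph isomorphism."""
    g_labels, g_edges = graph
    q_labels, q_edges = query
    g_adj = {frozenset(e) for e in g_edges}
    q_nodes = list(q_labels)
    for image in permutations(g_labels, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        if all(q_labels[n] == g_labels[mapping[n]] for n in q_nodes) and all(
            frozenset((mapping[u], mapping[v])) in g_adj for u, v in q_edges
        ):
            return True
    return False

def answer(index, database, query):
    """Return (exact answers, candidate set) for a containment query."""
    cand = candidates(index, database, query)
    return {gid for gid in cand if contains(database[gid], query)}, cand

database = {
    "g1": ({0: "C", 1: "C", 2: "O"}, {(0, 1), (1, 2)}),
    "g2": ({0: "C", 1: "O", 2: "N"}, {(0, 1), (1, 2)}),
    "g3": ({0: "O", 1: "C", 2: "O"}, {(0, 1), (1, 2)}),
    "g4": ({0: "N", 1: "N"}, {(0, 1)}),
}
index = build_index(database)
query = ({0: "C", 1: "O", 2: "O"}, {(0, 1), (0, 2)})  # one C bonded to two O

answers, cand = answer(index, database, query)
print(sorted(answers), sorted(cand))  # ['g3'] ['g1', 'g2', 'g3']
```

In this toy database the filter discards only g4, so g1 and g2 are false positives that still pay for an isomorphism test. That residual verification cost is what the abstract says FG-index avoids when the query is itself a frequent subgraph.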
Survey of graph database models
, 2001
Cited by 21 (6 self)
Graph database models can be characterized as those where data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models flourished in the eighties and early nineties in parallel to object-oriented models and their influence gradually faded with the emergence of other database models, particularly the geographical, spatial, semistructured and XML. Recently, the need to manage information with inherent graph-like nature has brought back the relevance of the area. In fact, a whole new wave of applications for graph databases emerged with the development of huge networks (e.g. Web, geographical systems, transportation, telephones), and families of networks generated due to the automation of the process of data gathering (e.g. social and biological networks). The main objective of this survey is to present in a single place the work that has been done in the area of graph database modeling, concentrating on data structures, query languages and integrity constraints.
Graph indexing: Tree + delta >= graph
- In VLDB
, 2007
Cited by 20 (3 self)
Recent scientific and technological advances have witnessed an abundance of structural patterns modeled as graphs. As a result, it is of special interest to process graph containment queries effectively on large graph databases. Given a graph database G, and a query graph q, the graph containment query is to retrieve all graphs in G which contain q as subgraph(s). Due to the vast number of graphs in G and the nature of complexity for subgraph isomorphism testing, it is desirable to make use of high-quality graph indexing mechanisms to reduce the overall query processing cost. In this paper, we propose a new cost-effective graph indexing method based on frequent tree-features of the graph database. We analyze the effectiveness and efficiency of tree as indexing feature from three critical aspects: feature size, feature selection cost, and pruning power. In order to achieve better pruning ability than existing graph-based indexing methods, we select, in addition to frequent tree-features (Tree), a small number of discriminative graphs (∆) on demand, without a costly graph mining process beforehand. Our study verifies that (Tree+∆) is a better choice than graph for indexing purposes, denoted (Tree+∆ ≥ Graph), to address the graph containment query problem. It has two implications: (1) the index construction by (Tree+∆) is efficient, and (2) the graph containment query processing by (Tree+∆) is efficient. Our experimental studies demonstrate that (Tree+∆) has a compact index structure, achieves an order of magnitude better performance in index construction, and most importantly, outperforms up-to-date graph-based indexing methods, gIndex and C-Tree, in graph containment query processing.
Gstring: A novel approach for efficient search in graph databases
- In ICDE
, 2007
Cited by 18 (1 self)
Graphs are widely used for modeling complicated data, including chemical compounds, protein interactions, XML documents, and multimedia. Information retrieval against such data can be formulated as a graph search problem, and finding an efficient solution to the problem is essential for many applications. A popular approach is to represent both graphs and queries on graphs by sequences, thus converting graph search to subsequence matching. State-of-the-art sequencing methods work at the finest granularity – each node (or edge) in the graph will appear as an element in the resulting sequence. Clearly, such methods are not semantic conscious, and the resulting sequences are not only bulky but also prone to complexities arising from graph isomorphism and other problems in searching. In this paper, we introduce a novel sequencing method to capture the semantics of the underlying graph data. We find meaningful components in graph structures and use them as the most basic units in sequencing. It not only reduces the size of resulting sequences, but also enables semantic-based searching. In this paper, we base our approach on chemical compound databases, although it can be applied to searching other complicated graphs, such as protein structures. Experiments demonstrate that our approach outperforms state-of-the-art graph search methods.

