Results 1 - 4 of 4
Engineering High-Performance Database Engines
Cited by 3 (1 self)
Developing a database engine is both challenging and rewarding. Database engines are very complex software artifacts that have to scale to large data sizes and large hardware configurations, and developing such systems usually means choosing between different trade-offs at various points of development. This paper surveys two database engines: the disk-based SPARQL-processing engine RDF-3X and the relational main-memory engine HyPer. It discusses the design choices that were made during development and highlights optimization techniques that are important for both systems.
PrefixSolve: Efficiently Solving Multi-Source Multi-Destination Path Queries on RDF Graphs by Sharing Suffix Computations
Uncovering the “nature” of the connections between a set of entities, e.g., passengers on a flight and organizations on a watchlist, can be viewed as a Multi-Source Multi-Destination (MSMD) Path Query problem on labeled graph data models such as RDF. Using existing graph-navigational path-finding techniques to solve MSMD problems requires queries to be decomposed into multiple single-source or single-destination path subqueries, each of which is solved independently. Navigational techniques typically generate very poor I/O access patterns on large, disk-resident graphs, and for MSMD path queries such poor access patterns may be repeated if common graph exploration steps exist across subqueries. In this paper, we propose an optimization technique for general MSMD path queries that generalizes an efficient algebraic approach for solving a variety of single-source path problems. The generalization enables holistic evaluation of MSMD path queries without the need for query decomposition. We present a conceptual framework for sharing computation in the algebraic framework that is based on “suffix equivalence”. Suffix equivalence among subqueries captures the fact that multiple subqueries with different prefixes can share a suffix, so the shared suffix path is computed once and reused by every prefix path computation that leads into it. This approach offers orders of magnitude better performance than existing techniques, as demonstrated by a comprehensive experimental evaluation over real and synthetic datasets.
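The idea of sharing suffix computations across subqueries can be illustrated with a small sketch. This is not the algebraic method of the paper; it is plain memoization on a toy acyclic graph, and the function name `count_paths_msmd`, the graph, and the path-counting task are all assumptions made for illustration. Each node's "suffix" result (the number of paths from that node to any destination) is computed once and reused by every source whose paths run through that node.

```python
from functools import lru_cache

def count_paths_msmd(graph, sources, destinations):
    """Count paths from each source to any destination in a DAG.

    Illustrative only: the suffix result for a node is memoized, so
    sources with different prefixes reuse the same suffix computation.
    Returns (per-source counts, number of suffix evaluations performed).
    """
    dest = frozenset(destinations)
    evaluated = []  # records each node whose suffix was actually computed

    @lru_cache(maxsize=None)
    def paths_from(node):
        evaluated.append(node)
        total = 1 if node in dest else 0
        for successor in graph.get(node, ()):
            total += paths_from(successor)
        return total

    counts = {s: paths_from(s) for s in sources}
    return counts, len(evaluated)

# Hypothetical toy graph: sources a and b both lead into the shared suffix c..f.
toy_graph = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": ["f"],
}

counts, evaluations = count_paths_msmd(toy_graph, ["a", "b"], ["f"])
print(counts)       # {'a': 2, 'b': 2}
print(evaluations)  # 6: every node is evaluated once, although c is reached twice
```

Solving the two sources as independent subqueries would explore the subgraph below `c` twice; with the memoized suffix, the second source only pays for its own prefix edge.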
Dissemination level:
This document surveys the existing approaches to storing, indexing, and querying RDF data. While providing an overview of diverse techniques for RDF data storage, we concentrate on triple stores and on query processing and query optimization issues. We describe the state-of-the-art solutions for two classical database management problems, join ordering and selectivity estimation, in the context of RDF systems, and then analyze their drawbacks. In particular, we provide a detailed description of the technical challenges (coined choke points) that any system must cope with in order to efficiently execute complex SPARQL queries on large real-world RDF datasets. These choke points will later be used in the SPARQL benchmark design. Although illustrated with RDF/SPARQL examples, the classification of choke points is general and applies to graph databases and declarative graph query languages.
Adaptive Landmark Selection Strategies for Fast Shortest Path Computation in Large Real-World Graphs
This paper considers the task of answering shortest path queries in large real-world graphs such as social networks, communication networks and web graphs. The traditional Breadth First Search (BFS) approach for solving this problem is too time-consuming when networks with millions of nodes and possibly billions of edges are considered. A common technique to address these complexity issues uses a small set of landmark nodes from which the distance to all other nodes is precomputed, so that arbitrary distance queries can then be answered by navigating via one of the selected landmarks. Although many strategies to select landmarks have been introduced in previous work, the problem of finding an optimal set that covers the entire graph remains NP-hard. Our contribution starts with a study of the characteristics that determine the success of a landmark selection strategy. We propose a new adaptive heuristic for selecting landmarks that not only picks central nodes, but also ensures that these landmarks properly cover different areas of the graph. Experiments on a diverse set of large graphs show that the proposed selection strategy and assisting node processing technique can efficiently estimate the node-to-node distance in graphs with millions of nodes with very high accuracy, while using the same amount of precomputation time as previously proposed strategies.
Keywords: graphs; shortest paths; distances; landmarks
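The general landmark technique described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the adaptive heuristic proposed in the paper: the graph is assumed to be undirected and unweighted, the landmark set is simply passed in by the caller, and the names `bfs_distances`, `build_landmark_index`, and `estimate_distance` are hypothetical. The estimate is the smallest value of d(s, l) + d(l, t) over all landmarks l, which is an upper bound on the true distance.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def build_landmark_index(graph, landmarks):
    """Precompute one BFS distance table per landmark."""
    return {l: bfs_distances(graph, l) for l in landmarks}

def estimate_distance(index, s, t):
    """Upper-bound estimate: min over landmarks of d(s, l) + d(l, t).

    Returns None if no landmark reaches both s and t.
    """
    best = None
    for dist in index.values():
        if s in dist and t in dist:
            candidate = dist[s] + dist[t]
            if best is None or candidate < best:
                best = candidate
    return best

# Hypothetical toy graph (undirected adjacency lists): path 0-1-2-3-4 plus edge 1-5.
toy_graph = {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1]}
index = build_landmark_index(toy_graph, landmarks=[2])

print(estimate_distance(index, 0, 4))  # 4, exact: the landmark lies on a shortest path
print(estimate_distance(index, 0, 5))  # 4, an overestimate: the true distance is 2
```

The second query shows why landmark placement matters: with the single landmark at node 2, the pair (0, 5) is routed through a detour, whereas a landmark at node 1 would give the exact answer.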