Towards effective partition management for large graphs
In SIGMOD, 2012
Cited by 29 (1 self)
Searching and mining large graphs today is critical to a variety of application domains, ranging from community detection in social networks to de novo genome sequence assembly. Scalable processing of large graphs requires careful partitioning and distribution of graphs across clusters. In this paper, we investigate the problem of managing large-scale graphs in clusters and study access characteristics of local graph queries such as breadth-first search, random walk, and SPARQL queries, which are popular in real applications. These queries exhibit strong access locality, and therefore require specific data partitioning strategies. In this work, we propose a Self Evolving Distributed Graph Management Environment (Sedge) to minimize inter-machine communication during graph query processing in multiple machines. In order to improve query response time and throughput, Sedge introduces a two-level partition management architecture with complementary primary partitions and dynamic secondary partitions. These two kinds of partitions are able to adapt in real time to changes in query workload. Sedge also includes a set of workload analyzing algorithms whose time complexity is linear or sublinear in graph size. Empirical results show that it significantly improves distributed graph processing on today’s commodity clusters.
Static Analysis and Optimization of Semantic Web Queries
In Proceedings of the ACM Symposium on Principles of Database Systems, 2012
Cited by 18 (6 self)
Static analysis is a fundamental task in query optimization. In this paper we study static analysis and optimization techniques for SPARQL, which is the standard language for querying Semantic Web data. Of particular interest for us is the optionality feature in SPARQL. It is crucial in Semantic Web data management, where data sources are inherently incomplete and the user is usually interested in partial answers to queries. This feature is one of the most complicated constructors in SPARQL and also the one that makes this language depart from classical query languages such as relational conjunctive queries. We focus on the class of well-designed SPARQL queries, which has been proposed in the literature as a fragment of the language with good properties regarding query evaluation. We first propose a tree representation for SPARQL queries, called pattern trees, which captures the class of well-designed SPARQL graph patterns and which can be considered as a query execution plan. Among other results, we propose several transformation rules for pattern trees, a simple normal form, and study equivalence and containment. We also study the enumeration and counting problems for this class of queries.
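The optionality feature mentioned in this abstract can be illustrated with a minimal sketch, assuming the usual set-based reading of SPARQL in which a solution is a partial mapping from variables to values and OPTIONAL behaves like a left outer join over such mappings. The function names and sample data below are hypothetical and are not taken from the paper.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def join(left, right):
    """Merge every compatible pair of mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def left_outer_join(left, right):
    """OPTIONAL-style join: keep left mappings that find no compatible partner."""
    unmatched = [a for a in left if not any(compatible(a, b) for b in right)]
    return join(left, right) + unmatched


# Hypothetical data: every person is known, but only one has an email.
people = [{"?p": "alice"}, {"?p": "bob"}]
emails = [{"?p": "alice", "?e": "alice@example.org"}]

answers = left_outer_join(people, emails)
print(answers)
```

The mapping for `bob` survives with `?e` left unbound, which is the kind of partial answer the abstract refers to when data sources are incomplete.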
TriAL for RDF: Adapting Graph Query Languages for RDF Data
Cited by 8 (3 self)
Querying RDF data is viewed as one of the main applications of graph query languages, and yet the standard model of graph databases – essentially labeled graphs – is different from the triples-based model of RDF. While encodings of RDF databases into graph data exist, we show that even the most natural ones are bound to lose some functionality when used in conjunction with graph query languages. The solution is to work directly with triples, but then many properties taken for granted in the graph database context (e.g., reachability) lose their natural meaning. Our goal is to introduce languages that work directly over triples and are closed, i.e., they produce sets of triples, rather than graphs. Our basic language is called TriAL, or Triple Algebra: it guarantees closure properties by replacing the product with a family of join operations. We extend TriAL with recursion, and explain why such an extension is more intricate for triples than for graphs. We present a declarative language, namely a fragment of datalog, capturing the recursive algebra. For both languages, the combined complexity of query evaluation is given by low-degree polynomials. We compare our languages with relational languages, such as finite-variable logics, and previously studied graph query languages such as adaptations of XPath, regular path queries, and nested regular expressions; many of these languages are subsumed by the recursive triple algebra. We also provide examples of the usefulness of TriAL in querying graph, RDF, and social networks data.
Algebraic Structures for Capturing the Provenance of SPARQL Queries
Cited by 7 (0 self)
We show that the evaluation of SPARQL algebra queries on various notions of annotated RDF graphs can be seen as particular cases of the evaluation of these queries on RDF graphs annotated with elements of so-called spm-semirings. Spm-semirings extend semirings, used for positive relational algebra queries on annotated relational data, with a new operator to capture the semantics of the non-monotone SPARQL operator OPTIONAL. Furthermore, spm-semiring-based annotations ensure that desired SPARQL query equivalences hold when querying annotated RDF. In addition to introducing spm-semirings, we study their properties and provide an alternative characterization of these structures in terms of semirings with an embedded boolean algebra (or seba-structure for short). This characterization allows one to construct spm-semirings and to identify a universal object in the class of spm-semirings. Finally, we show that this universal object provides a concise provenance representation and can be used to evaluate SPARQL queries on arbitrary spm-semiring-annotated RDF graphs.
On the Semantics of SPARQL Queries with Optional Matching under Entailment Regimes
Cited by 4 (2 self)
We study the semantics of SPARQL queries with optional matching features under entailment regimes. We argue that the normative semantics may lead to answers that are in conflict with the intuitive meaning of optional matching, where unbound variables naturally represent unknown information. We propose an extension of the SPARQL algebra that addresses these issues and is compatible with any entailment regime satisfying the minimal requirements given in the normative specification. We then study the complexity of query evaluation and show that our extension comes at no cost for regimes with an entailment relation of reasonable complexity. Finally, we show that our semantics preserves the known properties of optional matching that are commonly exploited for static analysis and optimisation.
Towards Reconciling SPARQL and Certain Answers
Cited by 2 (2 self)
SPARQL entailment regimes are strongly influenced by the large body of work on ontology-based query answering, notably in the area of Description Logics (DLs). However, the semantics of query answering under SPARQL entailment regimes is defined in a more naive and much less expressive way than the certain answer semantics usually adopted in DLs. The goal of this work is to introduce an intuitive certain answer semantics also for SPARQL and to show the feasibility of this approach. For OWL 2 QL entailment, we present algorithms for the evaluation of an interesting fragment of SPARQL (the so-called well-designed SPARQL). Moreover, we show that the complexity of the most fundamental query analysis tasks (such as query containment and equivalence testing) is not negatively affected by the presence of OWL 2 QL entailment under the proposed semantics.
Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements
Cited by 2 (2 self)
RDF data is often treated as incomplete, following the Open-World Assumption. On the other hand, SPARQL, the standard query language over RDF, usually follows the Closed-World Assumption, assuming RDF data to be complete. This gives rise to a semantic gap between RDF and SPARQL. In this paper, we address how to close the semantic gap between RDF and SPARQL in terms of certain answers and possible answers using completeness statements.
SPAM: A SPARQL Analysis and Manipulation Tool
Cited by 1 (0 self)
SQL developers are used to having elaborate tools which help them in writing queries. In contrast, the creation of tools to assist users in the development of SPARQL queries is still in its infancy. In this system demo, we present the SPARQL Analysis and Manipulation (SPAM) tool, which provides help for the development of SPARQL queries. The main features of the SPAM tool comprise an editor with both text and graphical interface, as well as various functions for the static and dynamic analysis of SPARQL queries.
Querying Linked Geospatial Data with Incomplete Information
Cited by 1 (1 self)
Linked geospatial data has recently received attention, as researchers and practitioners have started tapping the wealth of geospatial information available on the Web. Incomplete geospatial information, although appearing often in the applications captured by such datasets, is not represented and queried properly due to the lack of appropriate data models and query languages. We discuss our recent work on the model RDFi, an extension of RDF with the ability to represent property values that exist, but are unknown or partially known, using constraints, and an extension of the query language SPARQL with qualitative and quantitative geospatial querying capabilities. We demonstrate the usefulness of RDFi in geospatial Semantic Web applications by giving examples and comparing the modeling capabilities of RDFi with the ones of related Semantic Web systems.
An Algebra and Equivalences to Transform Graph Patterns in Neo4j
Cited by 1 (1 self)
Modern query optimizers of relational database systems embody more than three decades of research and practice in the area of data management and processing. Key advances include algebraic query transformation, intelligent search space pruning, and modular optimizer architectures. Surprisingly, many of these contributions seem to have been overlooked in the emerging field of graph databases so far. In particular, we believe that query optimization based on a general graph algebra and its equivalences can greatly improve on the current state of the art. Although some graph algebras have already been proposed, they have often been developed in a context in which a relational database system is used as a backend to process graph data. As a consequence, these algebras are typically tightly coupled to the relational algebra, making them unsuitable for native graph databases. While we support the approach of extending the relational algebra, we argue that graph-specific operations should be defined at a higher level, independent of the database backend. In this paper, we introduce such a general graph algebra and corresponding equivalences. We demonstrate how it can be used to optimize Cypher queries in the setting of the Neo4j native graph database.