The Internet of Things: A survey from the data-centric perspective
- In Managing and Mining Sensor Data
, 2013
Graph data management and mining: a survey of algorithms and applications
- Wang (Eds.), Managing and Mining Graph Data, of Advances in Database Systems
, 2010
An algebra for basic graph patterns
- In Proc. of the Workshop on Logic in Databases (LID)
, 2008
Cited by 6 (3 self)
Motivated by recent developments in the dataspaces, web, and personal information management communities, we outline research directions on query processing for SPARQL, the W3C recommendation language for querying RDF triple stores. The core of each SPARQL query is a basic graph pattern (BGP). BGP is a little logic for extracting subsets of related nodes in an RDF graph. In this paper we undertake a formal study of BGP with an eye towards efficient SPARQL query evaluation. Our main contributions are (1) an algebraization of BGP, and (2) first steps towards a framework for the design of structural indexes to accelerate processing of queries in this algebra.
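To make the notion of a basic graph pattern concrete, here is a minimal Python sketch that evaluates a BGP (a list of triple patterns whose `?`-prefixed terms are variables) against an in-memory list of triples by joining the solutions of one pattern at a time. It is a naive nested-loop evaluation under our own assumptions (no blank nodes, no duplicate elimination; `eval_bgp` and `match_pattern` are hypothetical names), not the algebra or the structural indexes proposed in the paper.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Assumed convention: variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different term
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def eval_bgp(bgp: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Join the solutions of the triple patterns one at a time (nested loops)."""
    solutions: List[Binding] = [{}]  # the empty BGP has one empty solution
    for pattern in bgp:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    graph = [("alice", "knows", "bob"), ("bob", "knows", "carol"),
             ("carol", "worksAt", "acme")]
    # Friends of friends: ?x knows ?y, and ?y knows ?z.
    print(eval_bgp([("?x", "knows", "?y"), ("?y", "knows", "?z")], graph))
```

The shared variable `?y` is what turns the two patterns into a join; an index-backed evaluator would replace the inner scan over `graph` with a lookup.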
Workload Matters: Why RDF Databases Need a New Design
Cited by 4 (2 self)
The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable to provide consistently good performance. We propose a vision for a workload-aware and adaptive system. To realize this vision, we re-evaluate relevant existing physical design criteria for RDF and address the resulting set of new challenges.
RAPID: Enabling Scalable Ad-Hoc Analytics on the Semantic Web
Cited by 3 (1 self)
As the amount of available RDF data continues to increase steadily, there is growing interest in developing efficient methods for analyzing such data. While recent efforts have focused on developing efficient methods for traditional data processing, analytical processing, which typically involves more complex queries, has received much less attention. The use of cost-effective parallelization techniques such as Google’s Map-Reduce offers significant promise for achieving Web-scale analytics. However, currently available implementations are designed for simple data processing on structured data. In this paper, we present a language, RAPID, for scalable ad-hoc analytical processing of RDF data on Map-Reduce frameworks. It builds on Yahoo’s Pig Latin by introducing primitives based on a specialized join operator, the MDjoin, for expressing analytical tasks in a manner that is more amenable to parallel processing, as well as primitives for coping with the semi-structured nature of RDF data. Experimental evaluation results demonstrate significant performance improvements for analytical processing of RDF data over existing Map-Reduce based techniques.
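The MDjoin is only named in the abstract. As a rough, single-machine illustration of the general shape of an MD-join-style operator (our assumption, not RAPID's definition): for every row of a base table, compute a list of aggregates over the detail rows that satisfy a condition θ. The name `md_join`, the dictionary row format, and the sample data are hypothetical, and the sketch does not model Pig Latin or Map-Reduce execution.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Aggregate = Callable[[List[Row]], Any]


def md_join(base: List[Row], detail: List[Row],
            aggregates: Dict[str, Aggregate],
            theta: Callable[[Row, Row], bool]) -> List[Row]:
    """For each base row b, append aggregates over {r in detail | theta(b, r)}.

    Every base row appears exactly once in the output, even when no detail
    row matches it (the aggregate functions then receive an empty list).
    """
    result = []
    for b in base:
        matching = [r for r in detail if theta(b, r)]
        out = dict(b)
        for name, agg in aggregates.items():
            out[name] = agg(matching)
        result.append(out)
    return result


def same_product(b: Row, r: Row) -> bool:
    return r["product"] == b["product"]


def total_amount(rows: List[Row]) -> int:
    return sum(r["amount"] for r in rows)


if __name__ == "__main__":
    # Hypothetical detail rows, e.g. sales facts already extracted from RDF triples.
    sales = [
        {"product": "p1", "region": "EU", "amount": 10},
        {"product": "p1", "region": "US", "amount": 5},
        {"product": "p2", "region": "EU", "amount": 7},
    ]
    products = [{"product": "p1"}, {"product": "p2"}, {"product": "p3"}]
    totals = md_join(products, sales, {"total": total_amount, "n": len}, same_product)
    # A second MD-join with a stricter condition adds an EU-only column.
    with_eu = md_join(totals, sales, {"eu_total": total_amount},
                      lambda b, r: same_product(b, r) and r["region"] == "EU")
    for row in with_eu:
        print(row)
```

Chaining calls with different conditions adds aggregate columns computed over different subsets of the detail rows without a separate group-by and join per column; each base row's result depends only on the detail rows matching it.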
On enhancing scalability for distributed RDF/S stores
- In Proc. of the Intl. Conf. on Extending Database Technology
, 2011
Cited by 3 (0 self)
This work presents MIDAS-RDF, a distributed P2P RDF/S repository built on top of a distributed multi-dimensional index structure. MIDAS-RDF features fast retrieval of RDF triples satisfying various pattern queries by translating them into multi-dimensional range queries, which the underlying index can process in a number of hops logarithmic in the number of peers. More importantly, MIDAS-RDF uses a labeling scheme to handle expensive transitive closure computations efficiently. This allows distributed RDFS reasoning to scale better than existing methods, as also demonstrated by our extensive experimental study. Furthermore, MIDAS-RDF supports a publish-subscribe model that enables remote peers to selectively subscribe to RDF content.
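One simple way to see how a triple-pattern query becomes a multi-dimensional range query, sketched under our own assumptions: dictionary-encode every RDF term as an integer, treat each triple as a point (s, p, o) in a 3-D space, and turn each bound position of a pattern into a point interval and each unbound position into the full interval of that dimension. `TripleSpace` is a hypothetical name, MIDAS-RDF's actual encoding may differ, and the linear scan below stands in for the distributed multi-dimensional index, which this sketch does not model.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Range = Tuple[int, int]


class TripleSpace:
    """Hypothetical 3-D encoding: each triple becomes an integer point (s, p, o)."""

    def __init__(self, triples: List[Triple]):
        # Dictionary-encode every distinct term as its rank in sorted order.
        self.terms = sorted({term for triple in triples for term in triple})
        self.ids: Dict[str, int] = {t: i for i, t in enumerate(self.terms)}
        self.max_id = len(self.terms) - 1
        self.points = [tuple(self.ids[t] for t in triple) for triple in triples]

    def to_ranges(self, s: Optional[str], p: Optional[str],
                  o: Optional[str]) -> Optional[List[Range]]:
        """Bound term -> point interval; unbound (None) -> whole dimension."""
        ranges: List[Range] = []
        for term in (s, p, o):
            if term is None:
                ranges.append((0, self.max_id))
            elif term in self.ids:
                ranges.append((self.ids[term], self.ids[term]))
            else:
                return None  # constant never seen: the pattern has no matches
        return ranges

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        ranges = self.to_ranges(s, p, o)
        if ranges is None:
            return []
        # Linear scan standing in for a multi-dimensional index lookup.
        hits = [pt for pt in self.points
                if all(lo <= x <= hi for x, (lo, hi) in zip(pt, ranges))]
        return [tuple(self.terms[i] for i in pt) for pt in hits]


if __name__ == "__main__":
    space = TripleSpace([("alice", "knows", "bob"), ("bob", "knows", "carol"),
                         ("alice", "age", "30")])
    print(space.to_ranges("alice", None, None))  # pattern (alice, ?p, ?o)
    print(space.query(s="alice"))
```

Every triple pattern, whatever its mix of bound and unbound positions, maps to the same kind of box query, so a single index structure can serve all pattern shapes.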
Automating Relational Database Schema Design for Very Large Semantic Datasets
Cited by 1 (0 self)
Many semantic (RDF) datasets are very large but have no predefined data structures. Triple stores are commonly used as RDF databases, yet they cannot achieve good query performance for large datasets owing to excessive self-joins. Recent research work proposed storing RDF data in column-based databases. Yet, a study has shown that this approach does not scale with the number of predicates. The third common approach is to organize an RDF dataset into different tables in a relational database. Multiple “correlated” predicates are maintained in the same table, called a property table, so that table joins are not needed for queries that involve only the predicates within the table. The main challenge for the property table approach is that it is infeasible to manually design good schemas for the property tables of a very large RDF dataset. We propose a novel data-mining technique called Attribute Clustering by Table Load (ACTL) that clusters a given set of attributes into correlated groups, so as to automatically generate the property table schemas. While ACTL is an NP-complete problem, we propose an agglomerative clustering algorithm with several effective pruning techniques to approximate the optimal solution. Experiments show that our algorithm can efficiently mine huge datasets (e.g., Wikipedia Infobox data) to generate good property table schemas, with which queries generally run faster than with triple stores and column-based databases.
Keywords: RDF database schema design, attribute clustering, Wikipedia Infobox, semantic web
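As a rough illustration of clustering attributes by table load, the following Python sketch assumes that the load of a candidate property table is the fraction of non-null cells among the subjects that have at least one of its attributes, and greedily merges the pair of clusters whose merged table has the highest load, as long as that load stays at or above a threshold. The function names, the `threshold` parameter, and the sample data are hypothetical; the paper's exact objective and its pruning techniques are not reproduced here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Cluster = FrozenSet[str]


def table_load(cluster: Cluster, data: Dict[str, Set[str]]) -> float:
    """Assumed definition: fraction of non-null cells in the cluster's property table.

    Rows are the subjects having at least one attribute in the cluster;
    columns are the cluster's attributes.
    """
    rows = [attrs & cluster for attrs in data.values() if attrs & cluster]
    if not rows:
        return 0.0
    filled = sum(len(row) for row in rows)
    return filled / (len(rows) * len(cluster))


def cluster_attributes(data: Dict[str, Set[str]], threshold: float) -> List[Cluster]:
    """Greedy agglomerative clustering: merge while the best merge keeps load >= threshold."""
    attributes = sorted({a for attrs in data.values() for a in attrs})
    clusters: List[Cluster] = [frozenset([a]) for a in attributes]
    while True:
        best_pair: Optional[Tuple[Cluster, Cluster]] = None
        best_load = -1.0
        for c1, c2 in combinations(clusters, 2):
            load = table_load(c1 | c2, data)
            if load >= threshold and load > best_load:
                best_pair, best_load = (c1, c2), load
        if best_pair is None:
            return clusters  # no merge keeps the table dense enough
        c1, c2 = best_pair
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 | c2]


if __name__ == "__main__":
    # Hypothetical subject -> attribute sets (think Infobox-style data).
    data = {
        "s1": {"name", "birthDate", "birthPlace"},
        "s2": {"name", "birthDate"},
        "s3": {"name", "population", "area"},
        "s4": {"population", "area"},
    }
    for cluster in cluster_attributes(data, threshold=0.75):
        print(sorted(cluster), round(table_load(cluster, data), 3))
```

Each resulting cluster would become one property table. Under this definition, a higher threshold yields narrower, denser tables, while a lower one yields fewer, wider tables with more nulls.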
Storing and Indexing Massive RDF Data Sets
Cited by 1 (1 self)
In this chapter we present a general survey of the current state of the art in RDF storage and indexing. In the flurry of research on RDF data management in the last decade, we can identify three different perspectives on RDF: (1) a relational perspective; (2) an entity perspective; and (3) a graph-based perspective. Each of these three perspectives has drawn from ideas and results in three distinct research communities to propose solutions for managing RDF data: relational databases (for the relational perspective); information retrieval (for the entity perspective); and graph theory and graph databases (for the graph-based perspective). Our goal in this chapter is to give an up-to-date overview of representative solutions within each perspective.
Beyond relational: a database architecture and federated query optimization in a multi-modal healthcare environment
, 2013
Query Execution in Column-Oriented Database Systems
, 2008
There are two obvious ways to map a two-dimensional relational database table onto a one-dimensional storage interface: store the table row-by-row, or store the table column-by-column. Historically, database system implementations and research have focused on the row-by-row data layout, since it performs best on the most common application for database systems: business transactional data processing. However, there is a set of emerging applications for database systems for which the row-by-row layout performs poorly. These applications are more analytical in nature: their goal is to read through the data to gain new insight and use it to drive decision making and planning. In this dissertation, we study the poor performance of the row-by-row data layout for these emerging applications, and evaluate the column-by-column data layout as a solution to this problem. There have been a variety of proposals in the literature for how to build a database system on top of a column-by-column layout.
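The two mappings described above can be illustrated with a Python list standing in for the one-dimensional storage interface. This is a toy sketch with hypothetical function names; real column stores add compression, block-based I/O, and much more. It shows the structural difference that matters for analytical scans: in the row-by-row layout the values of one column are interleaved with the other fields (strided access), while in the column-by-column layout they form a single contiguous slice.

```python
from typing import Any, List, Sequence


def row_major(table: List[Sequence[Any]]) -> List[Any]:
    """Row-by-row: every field of row 0, then every field of row 1, and so on."""
    return [value for row in table for value in row]


def column_major(table: List[Sequence[Any]]) -> List[Any]:
    """Column-by-column: every value of column 0, then column 1, and so on."""
    n_cols = len(table[0]) if table else 0
    return [row[c] for c in range(n_cols) for row in table]


def sum_column_row_major(storage: List[Any], n_rows: int, n_cols: int, col: int) -> Any:
    # The column's values are interleaved with the other fields: strided access.
    return sum(storage[r * n_cols + col] for r in range(n_rows))


def sum_column_column_major(storage: List[Any], n_rows: int, col: int) -> Any:
    # The column's values occupy one contiguous slice of the storage.
    return sum(storage[col * n_rows:(col + 1) * n_rows])


if __name__ == "__main__":
    table = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]
    print(row_major(table))      # [1, 'a', 10, 2, 'b', 20, 3, 'c', 30]
    print(column_major(table))   # [1, 2, 3, 'a', 'b', 'c', 10, 20, 30]
    print(sum_column_row_major(row_major(table), 3, 3, 2),
          sum_column_column_major(column_major(table), 3, 2))  # 60 60
```

Both layouts answer the same query; the difference is how much unrelated data sits between the values a column scan needs, which is what makes the column-by-column layout attractive for read-mostly analytical workloads.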