Results 1 - 3 of 3
Adaptive Integration of Distributed Semantic Web Data
"... Abstract. The use of RDF (Resource Description Framework) data is a cornerstone of the Semantic Web. RDF data embedded in Web pages may be indexed using semantic search engines, however, RDF data is oftenstoredindatabases,accessibleviaWebServicesusingtheSPARQL query language for RDF, which form part ..."
Cited by 1 (0 self)
Abstract. The use of RDF (Resource Description Framework) data is a cornerstone of the Semantic Web. RDF data embedded in Web pages may be indexed by semantic search engines; however, RDF data is often stored in databases, accessible via Web Services using the SPARQL query language for RDF, which form part of the Deep Web and are not reachable by search engines. This paper addresses the problem of effectively integrating RDF data stored in separate Web-accessible databases. An approach based on distributed query processing is described, in which data from multiple repositories are used to construct partitioned tables that are integrated using an adaptive query processing technique supporting join reordering. This limits any reliance on statistics and metadata about SPARQL endpoints; such information is often inaccurate or unavailable, yet is required by existing systems supporting federated SPARQL queries. The approach extends existing work in this area by allowing tables to be added to the query plan while it is executing, and shows how a technique currently used in relational query processing can be applied to distributed SPARQL query processing. The approach is evaluated using a prototype implementation, and potential applications are discussed.
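The adaptive join reordering described in this abstract can be sketched in plain Python. The binding-table representation and the greedy smallest-observed-first ordering below are illustrative assumptions, not the paper's actual operators:

```python
def nested_loop_join(left, right):
    """Join two lists of variable bindings on their shared variables."""
    shared = set(left[0]) & set(right[0]) if left and right else set()
    out = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in shared):
                merged = dict(l)
                merged.update(r)
                out.append(merged)
    return out

def adaptive_join(tables):
    """Order joins by the cardinality actually observed after fetching each
    partitioned table, rather than trusting endpoint statistics -- a greedy
    stand-in for the adaptive reordering the paper describes."""
    tables = sorted(tables, key=len)  # smallest observed table first
    result = tables[0]
    for t in tables[1:]:
        result = nested_loop_join(result, t)
    return result
```

Because the ordering decision is taken at run time from observed sizes, no prior statistics about the SPARQL endpoints are needed, which is the point the abstract makes.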
Using Precomputed Bloom Filters to Speed Up SPARQL Processing in the Cloud
"... Increasingly data on the Web is stored in the form of Semantic Web data. Because of today’s information overload, it becomes very important to store and query these big datasets in a scalable way and hence in a distributed fash-ion. Cloud Computing offers such a distributed environment with dynamic ..."
Increasingly, data on the Web is stored in the form of Semantic Web data. Given today's information overload, it is important to store and query these big datasets in a scalable way, and hence in a distributed fashion. Cloud Computing offers such a distributed environment, with dynamic reallocation of computing and storage resources based on need. In this work we introduce a scalable distributed Semantic Web database in the Cloud. To eliminate (unnecessary) intermediate results early, we apply Bloom filters. Instead of computing the Bloom filters during query processing, a time-consuming task as it has traditionally been done, we precompute them as far as possible and store them in the indices alongside the data. Experimental results with datasets of up to 1 billion triples show that our approach speeds up query processing significantly, sometimes reducing processing time to less than half.
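The precomputed-Bloom-filter idea can be sketched as follows. The filter parameters and the join-column representation are illustrative assumptions, not the paper's actual index layout:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter (illustrative; not the paper's exact encoding)."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = 0
    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size
    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p
    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

# At index time: precompute a filter over a join column and store it
# with the data, so query processing does not have to build it.
stored_objects = ["alice", "bob", "carol"]
prefilter = BloomFilter()
for o in stored_objects:
    prefilter.add(o)

# At query time: drop intermediate bindings that cannot possibly join.
candidates = ["alice", "mallory", "carol"]
survivors = [c for c in candidates if c in prefilter]
```

A Bloom filter never produces false negatives, so every real join partner survives the pruning; the occasional false positive only costs a wasted probe, which is why the filter can safely be built once at index time.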
A Scalable RDF Data Processing Framework based on Pig and Hadoop (IOS Press)
"... Abstract. In order to effectively handle the growing amount of available RDF data, scalable and flexible RDF data processing frameworks are needed. While emerging technologies for Big Data, such as Hadoop-based systems that take advantages of scalable and fault-tolerant distributed processing, based ..."
Abstract. In order to handle the growing amount of available RDF data effectively, scalable and flexible RDF data processing frameworks are needed. While emerging Big Data technologies have become available, such as Hadoop-based systems that take advantage of scalable and fault-tolerant distributed processing based on Google's distributed file system and the MapReduce parallel model, many issues remain when applying them to RDF data processing. In this paper, we propose an RDF data processing framework using Pig and Hadoop, with several extensions to address these issues. We integrate an efficient RDF storage schema into our framework and show the performance improvement over Pig's standard bulk load and store operations, including the schema conversion cost from conventional RDF file formats. We also compare the performance of our framework with existing single-node RDF databases. Furthermore, since reasoning is an important requirement for most RDF data processing systems, we introduce a user operation for inferring new triples using entailment rules, and evaluate the performance of the transitive closure operation, as an example of such inference, on our framework.
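The transitive closure operation this abstract evaluates can be sketched as a semi-naive fixpoint in plain Python. The pair representation is an illustrative assumption; the paper's implementation runs as iterative Pig/MapReduce jobs over RDF triples:

```python
def transitive_closure(edges):
    """Semi-naive fixpoint for a transitive property (e.g. rdfs:subClassOf):
    each round joins only the newly derived pairs against the base edges,
    mirroring the repeated join jobs an iterative Pig script would run."""
    closure = set(edges)
    delta = set(edges)  # pairs derived in the previous round
    while delta:
        new = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = new - closure
        closure |= delta
    return closure
```

Joining only the delta against the base relation, rather than re-joining the whole closure each round, is what keeps the per-iteration work bounded, which matters when each iteration is a distributed job.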