A.: HarvANA - Harvesting Community Tags to Enrich Collection Metadata
- 16 – 20, pp 147
Cited by 21 (1 self)
Collaborative, social tagging and annotation systems have exploded on the Internet as part of the Web 2.0 phenomenon. Systems such as Flickr, Del.icio.us, Technorati, Connotea and LibraryThing provide a community-driven approach to classifying information and resources on the Web, so that they can be browsed, discovered and re-used. Although social tagging sites provide simple, user-relevant tags, there are issues associated with the quality of the metadata and the scalability compared with conventional indexing systems. In this paper we propose a hybrid approach that enables authoritative metadata generated by traditional cataloguing methods to be merged with community annotations and tags. The HarvANA (Harvesting and Aggregating Networked Annotations) system uses a standardized but extensible RDF model for representing the annotations/tags and OAI-PMH to harvest the annotations/tags from distributed community servers. The harvested annotations are aggregated with the authoritative metadata in a centralized metadata store. This streamlined, interoperable, scalable approach enables libraries, archives and repositories to leverage community enthusiasm for tagging and annotation, augment their metadata and enhance their discovery services. This paper describes the HarvANA system and its evaluation through a collaborative testbed with the National Library of Australia using architectural images from PictureAustralia.
SPARQL Query Optimization on Top of DHTs
- In ISWC, 2010
Cited by 14 (2 self)
We study the problem of SPARQL query optimization on top of distributed hash tables. Existing works on SPARQL query processing in such environments have either never been implemented in a real system or do not use any optimization techniques and thus exhibit poor performance. Our goal in this paper is to propose efficient and scalable algorithms for optimizing SPARQL basic graph pattern queries. We augment a known distributed query processing algorithm with query optimization strategies that improve performance in terms of query response time and bandwidth usage. We implement our techniques in the system Atlas and study their performance experimentally in a local cluster.
3rdf: Storing and Querying RDF Data on top of the 3nuts Overlay Network
Cited by 5 (2 self)
Web systems mainly use distributed hash table (DHT) based networks. These networks provide good load balancing by applying uniform hash functions, with the drawback that they destroy possible semantic relations between data elements. Mapping the data semantics onto the network structure, however, could improve the routing time in the network and consequently the RDF query latency at the application layer. In this paper, we present 3rdf, a distributed RDF system for storing and querying RDF data. The 3rdf system has been built on top of the 3nuts p2p network. The 3nuts network reduces the query response time and bandwidth usage in our system by adapting the network structure to the semantics of the RDF data. In addition, we study how the evaluation of SPARQL basic graph patterns in existing distributed RDF repositories can be extended to other graph patterns such as OPTIONAL and UNION in our 3rdf system.
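The trade-off described in this abstract can be illustrated with a minimal sketch. It is not the 3nuts algorithm: the function names, the example URIs, and the "hash only the namespace prefix" placement rule are assumptions made purely for illustration. Hashing the full key uniformly gives each key an independent node assignment, whereas any placement that depends only on a shared prefix keeps related keys together.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Uniform placement: hash the whole key, as a plain DHT would."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def namespace_node_for(key: str, num_nodes: int) -> int:
    """Hypothetical locality-aware placement: hash only the part of the
    URI before the last '/', so keys in one namespace share a node."""
    prefix = key.rsplit("/", 1)[0]
    return node_for(prefix, num_nodes)


if __name__ == "__main__":
    # Example URIs are made up for this sketch.
    keys = [
        "http://example.org/people/alice",
        "http://example.org/people/bob",
        "http://example.org/places/athens",
    ]
    for k in keys:
        print(k, node_for(k, 16), namespace_node_for(k, 16))
```

Under the uniform scheme the two `people` URIs are placed independently of each other, so a query touching both may need two lookups; under the prefix scheme they always share a node, at the cost of less even load balancing.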
Opportunistic Linked Data Querying through Approximate Membership Metadata
Cited by 1 (0 self)
Between dereferencing and the protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can be achieved, among other ways, through additional metadata in responses. We noted that typical query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple, rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing requests, such functions make it possible to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
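To make the idea of an approximate membership function concrete, here is a minimal Bloom filter sketch. It is a generic textbook construction, not the implementation evaluated in the paper; the class, the `triple_key` serialization, and the `membership_request_needed` helper are hypothetical names introduced for illustration. A client holding such a filter can skip a membership request whenever the filter reports "definitely absent".

```python
import hashlib
import math


class BloomFilter:
    """Textbook Bloom filter using double hashing over one SHA-256 digest."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2.
        self.size = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def triple_key(triple) -> str:
    """Assumed serialization of a (subject, predicate, object) triple."""
    return " ".join(triple)


def membership_request_needed(bloom: BloomFilter, triple) -> bool:
    """False means the triple is definitely absent, so the request can be
    skipped; True means 'possibly present' and still needs confirmation."""
    return triple_key(triple) in bloom
```

A Bloom filter never produces false negatives, so skipping requests on a negative answer is safe; positive answers may be false positives, which is the precision/recall trade-off the abstract mentions.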
Top-k RDF Query Evaluation in Structured P2P Networks
- Lehner (Eds.), Euro-Par 2006 Parallel Processing, 2006
Evaluating SPARQL Subqueries over P2P Overlay Networks
- in Database and Expert Systems Applications (DEXA), 2012 23rd International Workshop on, 2012
Cited by 1 (0 self)
The introduction of subqueries is one of the most interesting features included in the latest SPARQL 1.1 specification. Existing distributed RDF storage and querying systems have not studied the evaluation of this newly included query feature. The evaluation of subqueries in a distributed environment may be very inefficient and expensive in terms of query response time and bandwidth usage, particularly for correlated subqueries, where the inner query block is evaluated once for each solution of the outer query. In this paper, we study the problem of evaluating SPARQL subqueries over RDF data stored in the 3nuts p2p network. We study a semijoin-based optimization technique, as well as transformation algorithms that rewrite correlated queries into equivalent uncorrelated ones, which would improve the efficiency of nested query evaluation in a distributed environment.
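The general idea behind a semijoin-based optimization can be sketched as follows. This is a generic illustration under stated assumptions, not the algorithm from the paper: solutions are modelled as Python dicts keyed by variable name, the "remote" side is simulated by a local function, and all names are hypothetical. Instead of evaluating the inner block once per outer solution, the outer side ships only its distinct join-key values, the remote side filters its rows against them, and the reduced result is joined locally.

```python
def semijoin_reduce(inner_rows, join_var, shipped_keys):
    """Runs where the inner data lives: keep only rows whose join value
    appears among the keys shipped from the outer side."""
    return [row for row in inner_rows if row[join_var] in shipped_keys]


def hash_join(outer_rows, inner_rows, join_var):
    """Plain hash join on a single shared variable."""
    index = {}
    for row in inner_rows:
        index.setdefault(row[join_var], []).append(row)
    return [
        {**outer, **inner}
        for outer in outer_rows
        for inner in index.get(outer[join_var], [])
    ]


def semijoin_join(outer_rows, inner_rows, join_var):
    """Returns the join result and the number of inner rows transferred."""
    shipped_keys = {row[join_var] for row in outer_rows}  # small message out
    reduced = semijoin_reduce(inner_rows, join_var, shipped_keys)  # remote step
    return hash_join(outer_rows, reduced, join_var), len(reduced)
```

In this sketch the bandwidth saving shows up as the second return value: only inner rows that can contribute to the final result cross the network, rather than the full inner relation or one round trip per outer solution.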
Accessing XML Documents using Semantic Meta Data in a P2P Environment
XGR (XML Data Grid) and BabelPeers are both data management systems based on distributed hash tables (DHT) that use the Pastry DHT to store data and meta data. XGR is based on the XML data model; BabelPeers uses the Resource Description Framework (RDF) for its data. XGR and BabelPeers have different but complementary functionality. On the one hand, XGR focuses on document-based storage of XML data and publish/subscribe mechanisms; on the other hand, BabelPeers focuses on query strategies that combine pieces of information originating from various sources and provides reasoning about the information. Thus it is valuable to research how the two concepts can be merged to get the best of both worlds.
Using Precomputed Bloom Filters to Speed Up SPARQL Processing in the Cloud
Increasingly, data on the Web is stored in the form of Semantic Web data. Because of today's information overload, it becomes very important to store and query these big datasets in a scalable way and hence in a distributed fashion. Cloud Computing offers such a distributed environment with dynamic reallocation of computing and storing resources based on needs. In this work we introduce a scalable distributed Semantic Web database in the Cloud. In order to reduce the number of (unnecessary) intermediate results early, we apply Bloom filters. Traditionally, Bloom filters are computed during query processing, which is a time-consuming task; instead, we precompute the Bloom filters as much as possible and store them in the indices alongside the data. The experimental results with data sets of up to 1 billion triples show that our approach speeds up query processing significantly and sometimes even reduces the processing time to less than half.
Scalable Distributed Indexing and Query Processing over Linked Data
Linked Data is becoming the core part of modern Web applications and thus efficient access to structured information expressed in RDF gains paramount importance. A number of efficient local RDF stores exist already, while distributed indexing and distributed query processing over Linked Data with similar efficiency and data management features as known from traditional database and data integration systems are only starting to develop. Distributed approaches will necessarily co-exist with centralized schemes, as data will be owned by different stakeholders who may not want to provide their complete data sets to a central place. Additionally, central / integrated storage may be prohibited for organizational or legal reasons in certain areas. To support decentralized schemes, only a few attempts in this direction exist so far, but they are limited in terms of capabilities and the degree of distribution vs. efficiency, query expressivity, and scalability. To remedy this situation, the approach and proof-of-concept prototype presented in this paper provides a solution for these open challenges. As we argue for widely distributed systems as a possible answer to scalability issues, we first identify and discuss the main challenges and based on this analysis, we propose an approach for efficient and scalable query processing over distributed Linked Data sources, taking into account the latest advances in database technology. Our system is based on a layered architecture that makes use of the advantages of decentralized indexing and query processing approaches, which have been researched and matured over the last decade. Our approach is based on a logical algebra for queries over RDF data and a related physical query algebra to enable optimization, both on the logical and physical layers in
Comparing Data Summaries for Processing Live Queries over Linked Data
A growing amount of Linked Data – graph-structured data accessible at sources distributed across the Web – enables advanced data integration and decision-making applications. Typical systems operating on Linked Data collect (crawl) and pre-process (index) large amounts of data, and evaluate queries against a centralised repository. Given that crawling and indexing are time-consuming operations, the data in the centralised index may be out of date at query execution time. An ideal query answering system for querying Linked Data live should return current answers in a reasonable amount of time, even on corpora as large as the Web. In such a live query system source selection – determining which sources contribute answers to a query – is a crucial step. In this article we propose to use lightweight data summaries for determining relevant sources during query evaluation. We compare several data structures and hash functions with respect to their suitability for building such summaries, stressing benefits for queries that contain joins and require ranking of results and sources. We elaborate on join variants, join ordering and ranking. We analyse the different approaches theoretically and provide results of an extensive experimental evaluation.