Results 1 - 10 of 24
Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation over Linked Data Corpora
2011
Cited by 21 (6 self)
With respect to large-scale, static, Linked Data corpora, in this paper we discuss scalable and distributed methods for entity consolidation (aka. smushing, entity resolution, object consolidation, etc.) to locate and process names that signify the same entity. We investigate (i) a baseline approach, which uses explicit owl:sameAs relations to perform consolidation; (ii) extended entity consolidation which additionally uses a subset of OWL 2 RL/RDF rules to derive novel owl:sameAs relations through the semantics of inverse-functional properties, functional properties and (max-)cardinality restrictions with value one; (iii) deriving weighted concurrence measures between entities in the corpus based on shared inlinks/outlinks and attribute values using statistical analyses; (iv) disambiguating (initially) consolidated entities based on inconsistency detection using OWL 2 RL/RDF rules. Our methods are based upon distributed sorts and scans of the corpus, where we deliberately avoid the requirement for indexing all data. Throughout, we offer evaluation over a diverse Linked Data corpus consisting of 1.118 billion quadruples derived from a domain-agnostic, open crawl of 3.985 million RDF/XML Web documents, demonstrating the feasibility of our methods at that scale, and giving insights into the quality of the results for real-world data.
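The baseline approach (i) can be sketched with a union-find structure: identifiers connected (symmetrically, transitively) by owl:sameAs are grouped into equivalence classes and rewritten to a canonical member. This is a minimal in-memory illustration only; the paper itself works with distributed sorts and scans, and the sample quads below are invented.

```python
# Minimal sketch of baseline owl:sameAs consolidation: group identifiers
# linked by owl:sameAs into equivalence classes (union-find) and rewrite
# all quads to a canonical member of each class.

SAME_AS = "owl:sameAs"

def consolidate(quads):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:  # keep the lexicographically least id as canonical
                ra, rb = rb, ra
            parent[rb] = ra

    for s, p, o, _ctx in quads:
        if p == SAME_AS:
            union(s, o)

    # rewrite subject/object positions; drop the owl:sameAs quads themselves
    return [(find(s), p, find(o), ctx)
            for s, p, o, ctx in quads if p != SAME_AS]
```

For example, if `ex:timbl owl:sameAs dbp:Tim_Berners-Lee` holds, statements about either identifier are rewritten onto the single canonical one.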
Exploiting RDFS and OWL for Integrating Heterogeneous, Large-Scale, Linked Data Corpora
2011
Cited by 17 (11 self)
The Web contains a vast amount of information on an abundance of topics, much of which is encoded as structured data indexed by local databases. However, these databases are rarely interconnected and information reuse across sites is limited. Semantic Web standards offer a possible solution in the form of an agreed-upon data model and set of syntaxes, as well as metalanguages for publishing schema-level information, offering a highly interoperable means of publishing and interlinking structured data on the Web. Thanks to the Linked Data community, an unprecedented lode of such data has now been published on the Web—by individuals, academia, communities, corporations and governmental organisations alike—on a medley of often overlapping topics. This new publishing paradigm has opened up a range of new and interesting research topics with respect to how this emergent “Web of Data” can be harnessed and exploited by consumers.
Assessing linked data mappings using network measures
In ESWC, 2012
Cited by 15 (4 self)
Abstract. Linked Data is at its core about the setting of links between resources. Links provide enriched semantics, pointers to extra information and enable the merging of data sets. However, as the amount of Linked Data has grown, there has been a need to automate the creation of links, and such automated approaches can create low-quality links or unsuitable network structures. In particular, it is difficult to know whether the links introduced improve or diminish the quality of Linked Data. In this paper, we present LINK-QA, an extensible framework that allows for the assessment of Linked Data mappings using network metrics. We test five metrics using this framework on a set of known good and bad links generated by a common mapping system, and show the behaviour of those metrics.
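One way such a network measure can flag a candidate link is by comparing a local graph statistic before and after the link is added. The sketch below uses the local clustering coefficient; the metric choice and the toy graph are illustrative, not LINK-QA's exact metrics.

```python
# Sketch in the spirit of network-measure link assessment: compare the
# clustering coefficient around a node before and after adding a
# candidate link (undirected adjacency-set representation).
from itertools import combinations

def clustering(adj, n):
    """Local clustering coefficient of node n."""
    nbrs = adj.get(n, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2)
                if b in adj.get(a, set()))
    return 2.0 * links / (k * (k - 1))

def with_link(adj, u, v):
    """Return a copy of the graph with an extra edge u-v."""
    new = {n: set(s) for n, s in adj.items()}
    new.setdefault(u, set()).add(v)
    new.setdefault(v, set()).add(u)
    return new
```

A quality heuristic might then accept or reject the candidate link depending on how the measured value shifts relative to values typical for good links.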
RDFS & OWL Reasoning for Linked Data
Cited by 9 (6 self)
Abstract. Linked Data promises that a large portion of Web Data will be usable as one big interlinked RDF database against which structured queries can be answered. In this lecture we will show how reasoning, using RDF Schema (RDFS) and the Web Ontology Language (OWL), can help to obtain more complete answers for such queries over Linked Data. We first look at the extent to which RDFS and OWL features are being adopted on the Web. We then introduce two high-level architectures for query answering over Linked Data and outline how these can be enriched by (lightweight) RDFS and OWL reasoning, enumerating the main challenges faced and discussing reasoning methods that make practical and theoretical trade-offs to address these challenges. In the end, we also ask whether or not RDFS and OWL are enough, and discuss numeric reasoning methods that are beyond the scope of these standards but that are often important when integrating Linked Data from several, heterogeneous sources.
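The flavour of lightweight RDFS reasoning discussed here can be sketched as a fixpoint over two standard entailment rules, rdfs2 (rdfs:domain) and rdfs9 (rdfs:subClassOf). The example triples are invented; real systems apply such rules at far larger scale.

```python
# Minimal fixpoint sketch of two RDFS entailment rules:
#   rdfs2: (s p o), (p rdfs:domain C)      => (s rdf:type C)
#   rdfs9: (s rdf:type C), (C subClassOf D) => (s rdf:type D)

SUBCLASS, DOMAIN, TYPE = "rdfs:subClassOf", "rdfs:domain", "rdf:type"

def rdfs_closure(triples):
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(triples):
            if p == TYPE:
                derived = {(s, TYPE, d) for c, q, d in triples
                           if q == SUBCLASS and c == o}
            else:
                derived = {(s, TYPE, c) for q, r, c in triples
                           if r == DOMAIN and q == p}
            new = derived - triples
            if new:
                triples |= new
                changed = True
    return triples
```

A single `foaf:knows` statement thus yields both `rdf:type foaf:Person` (via the domain) and `rdf:type foaf:Agent` (via the subclass axiom), enlarging the answers available to a structured query.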
Improving the recall of live Linked Data querying through reasoning
In RR, 2012
Cited by 7 (4 self)
Abstract. Linked Data principles allow for processing SPARQL queries on-the-fly by dereferencing URIs. Link-traversal query approaches for Linked Data have the benefit of up-to-date results and decentralised execution, but operate only on explicit data from dereferenced documents, affecting recall. In this paper, we show how inferable knowledge—specifically that found through owl:sameAs and RDFS reasoning—can improve recall in this setting. We first analyse a corpus featuring 7 million Linked Data sources and 2.1 billion quadruples: we (1) measure expected recall by only considering dereferenceable information, and (2) measure the improvement in recall given by considering rdfs:seeAlso links, as previous proposals did. We further propose and measure the impact of additionally considering (3) owl:sameAs links, and (4) applying lightweight RDFS reasoning to find more results, relying on static schema information. We evaluate different configurations for live queries covering different shapes and domains, generated from random walks over our corpus.
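The mechanism by which configurations (2) and (3) widen live traversal can be sketched as frontier expansion: besides dereferencing the query URI itself, URIs reachable via rdfs:seeAlso and owl:sameAs links in the retrieved triples are also fetched. Here `deref` is an invented in-memory stand-in for HTTP dereferencing.

```python
# Sketch of frontier expansion in link-traversal querying: starting from
# a query URI, also follow rdfs:seeAlso and owl:sameAs links found in
# the dereferenced documents.

SEE_ALSO, SAME_AS = "rdfs:seeAlso", "owl:sameAs"

def sources_to_fetch(uri, deref, follow=(SEE_ALSO, SAME_AS)):
    """Return the set of URIs dereferenced, starting from `uri`.
    `deref` maps a URI to the triples its document contains."""
    seen, frontier = set(), [uri]
    while frontier:
        u = frontier.pop()
        if u in seen:
            continue
        seen.add(u)
        for s, p, o in deref.get(u, []):
            if p in follow and s == u and o not in seen:
                frontier.append(o)
    return seen
```

With `follow=()` the function reduces to baseline configuration (1): only the query URI's own document is consulted.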
Generating and summarizing explanations for linked data
In Proc. of the 11th Extended Semantic Web Conference, 2014
Cited by 4 (3 self)
Abstract. Linked Data consumers may need explanations for debugging or for understanding the reasoning behind producing the data. They may need the possibility to transform long explanations into more understandable short explanations. In this paper, we discuss an approach to explain reasoning over Linked Data. We introduce a vocabulary to describe explanation-related metadata, and we discuss how publishing these metadata as Linked Data enables explaining reasoning over Linked Data. Finally, we present an approach to summarize these explanations taking into account user-specified explanation filtering criteria.
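One simple form such summarisation can take is depth truncation of a proof tree: below a user-chosen depth, sub-derivations are collapsed to their conclusions. The nested-dict proof representation below is an invented illustration, not the paper's vocabulary.

```python
# Sketch of explanation summarisation by truncation: keep the top of a
# proof tree, replacing deeper sub-proofs by their bare conclusions.

def shorten(proof, max_depth=1):
    """Truncate a nested proof tree below `max_depth`."""
    if max_depth == 0 or not proof.get("premises"):
        return {"conclusion": proof["conclusion"]}
    return {
        "conclusion": proof["conclusion"],
        "rule": proof["rule"],
        "premises": [shorten(p, max_depth - 1) for p in proof["premises"]],
    }
```

Other filtering criteria (e.g. hiding steps that use "uninteresting" rules) can be layered on the same representation.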
Practical RDF Schema reasoning with annotated Semantic Web data
In The Semantic Web–ISWC 2011
Cited by 2 (0 self)
Abstract. Semantic Web data with annotations is becoming available, the YAGO knowledge base being a prominent example. In this paper we present an approach to perform the closure of large RDF Schema annotated Semantic Web data using standard database technology. In particular, we exploit several alternatives to address the problem of computing transitive closure with real fuzzy semantic data extracted from YAGO in the PostgreSQL database management system. We benchmark the several alternatives and compare them to classical RDF Schema reasoning, providing the first implementation of annotated RDF Schema in persistent storage.
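The annotated closure computation can be illustrated with a small in-memory stand-in for the SQL formulation: under a common fuzzy semantics, a path's degree is the minimum of its edge degrees and alternative paths combine by maximum. The edge data and the choice of min/max (Gödel) semantics here are illustrative assumptions.

```python
# Sketch of fuzzy-annotated transitive closure (e.g. over annotated
# rdfs:subClassOf edges): path degree = min of edge degrees, alternative
# paths combine by max; iterate to fixpoint.

def fuzzy_closure(edges):
    """edges: dict (a, b) -> degree in [0, 1]; returns its closure."""
    closed = dict(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), d1 in list(closed.items()):
            for (c, e), d2 in list(closed.items()):
                if b == c:
                    d = min(d1, d2)
                    if d > closed.get((a, e), 0.0):
                        closed[(a, e)] = d
                        changed = True
    return closed
```

In SQL this corresponds to a recursive join that keeps, per pair, the greatest degree derived so far; the paper benchmarks several such alternatives in PostgreSQL.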
Evolution of Workflow Provenance Information in the Presence of Custom Inference Rules
Cited by 2 (1 self)
Abstract. Workflow systems can produce very large amounts of provenance information. In this paper we introduce provenance-based inference rules as a means to reduce the amount of provenance information that has to be stored, and to ease quality control (e.g., corrections). We motivate this kind of (provenance) inference and identify a number of basic inference rules over a conceptual model appropriate for representing provenance. The proposed inference rules concern the interplay between (i) actors and the activities they carried out, (ii) activities and the devices that were used for them, and (iii) the presence of information objects and physical things at events. However, since a knowledge base is not static but changes over time for various reasons, we also study how we can satisfy change requests while supporting and respecting the aforementioned inference rules. Towards this end, we elaborate on the specification of the required change operations.
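The storage-saving idea can be sketched with one rule of the kind described in (i)-(ii): if an actor carried out an activity and the activity used a device, the actor-device link need not be stored because it is derivable. The predicate names below are invented, not the paper's conceptual model.

```python
# Sketch of one provenance inference rule: derive actor-device links
# from stored carriedOut/usedDevice facts, so they need not be
# materialised in the provenance store.

CARRIED_OUT, USED_DEVICE, USED = "carriedOut", "usedDevice", "actorUsedDevice"

def infer_device_use(facts):
    facts = set(facts)
    derived = {(actor, USED, device)
               for actor, p1, activity in facts if p1 == CARRIED_OUT
               for act2, p2, device in facts
               if p2 == USED_DEVICE and act2 == activity}
    return facts | derived
```

A change request (e.g. correcting which actor carried out the activity) then automatically updates every derived fact, which is the quality-control benefit the abstract mentions.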
Link Traversal Querying for a Diverse Web of Data
2014
Cited by 2 (0 self)
Traditional approaches for querying the Web of Data often involve centralised warehouses that replicate remote data. Conversely, Linked Data principles allow for answering queries live over the Web by dereferencing URIs to traverse remote data sources at runtime. A number of authors have looked at answering SPARQL queries in such a manner; these link-traversal based query execution (LTBQE) approaches for Linked Data offer up-to-date results and decentralised (i.e., client-side) execution, but must operate over the incomplete dereferenceable knowledge available in remote documents, thus affecting response times and “recall” for query answers. In this paper, we study the recall and effectiveness of LTBQE, in practice, for the Web of Data. Furthermore, to integrate data from diverse sources, we propose lightweight reasoning extensions to help find additional answers. From the state of the art, which (1) considers only dereferenceable information and (2) follows rdfs:seeAlso links, we propose extensions to consider (3) owl:sameAs links and reasoning, and (4) lightweight RDFS reasoning. We then estimate the recall of link-traversal query techniques in practice: we analyse a large crawl of the Web of Data (the BTC’11 dataset), looking at the ratio of raw data contained in dereferenceable documents vs. the corpus as a whole, and determining how much more raw data our extensions make available for query answering. We then stress-test LTBQE (and our extensions) in real-world settings using the FedBench and DBpedia SPARQL Benchmark frameworks, and propose a novel benchmark called QWalk based on random walks.
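The recall estimate described here reduces to a ratio: the answers reachable from dereferenceable documents divided by the answers over the corpus as a whole. The toy data and single-triple-pattern matcher below are invented simplifications of that measurement.

```python
# Sketch of the recall estimate: answers obtainable from dereferenceable
# documents vs. answers over the whole corpus, for one triple pattern.

def matches(triples, pattern):
    """Triple-pattern matching; None in the pattern is a wildcard."""
    return {t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))}

def recall(dereferenceable, corpus, pattern):
    full = matches(corpus, pattern)
    if not full:
        return 1.0
    return len(matches(dereferenceable, pattern) & full) / len(full)
```

The extensions (seeAlso, sameAs, RDFS reasoning) then aim to enlarge the `dereferenceable` side of this ratio without touching the corpus-wide denominator.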
Towards An Approximative Ontology-Agnostic Approach for Logic Programs
In Proc. of the 8th Intl. Symposium on Foundations of Information and Knowledge Systems, 2014
Cited by 2 (2 self)
Abstract. Distributional semantics focuses on the automatic construction of a semantic model based on the statistical distribution of co-located words in large-scale texts. Deductive reasoning is a fundamental component for semantic understanding. Despite the generality and expressivity of logical models, from an applied perspective, deductive reasoners depend on highly consistent conceptual models, which limits their application to highly heterogeneous and open-domain knowledge sources. Additionally, logical reasoners may present scalability issues. This work focuses on advancing the conceptual and formal work on the interaction between distributional semantics and logic, focusing on the introduction of a distributional deductive inference model for large-scale and heterogeneous knowledge bases. The proposed reasoning model targets the following features: (i) an approximative, ontology-agnostic reasoning approach for logical knowledge bases, (ii) the inclusion of large volumes of distributional-semantics commonsense knowledge into the inference process, and (iii) the provision of a principled geometric representation of the inference process.
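The ontology-agnostic step in (i) can be illustrated as approximate predicate matching: instead of requiring exact ontology alignment, a query predicate is matched to the distributionally closest knowledge-base predicate by cosine similarity. The toy vectors and threshold below are invented; a real system would use embeddings trained on large corpora.

```python
# Sketch of approximative predicate matching in a distributional vector
# space: pick the knowledge-base predicate closest to the query predicate
# by cosine similarity, subject to a confidence threshold.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_match(query_pred, vectors, threshold=0.7):
    """Return the predicate distributionally closest to query_pred,
    or None if nothing clears the threshold."""
    qv = vectors[query_pred]
    scored = [(cosine(qv, vectors[p]), p)
              for p in vectors if p != query_pred]
    score, pred = max(scored)
    return pred if score >= threshold else None
```

Deductive rules can then fire over the matched predicate, giving the approximate, geometrically grounded inference the abstract describes.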