Discovering Linkage Points over Web Data
"... A basic step in integration is the identification of linkage points, i.e., finding attributes that are shared (or related) between data sources, and that can be used to match records or entities across sources. This is usually performed using a match operator, that associates attributes of one datab ..."
Cited by 10 (1 self)
Abstract
A basic step in integration is the identification of linkage points, i.e., finding attributes that are shared (or related) between data sources and that can be used to match records or entities across sources. This is usually performed using a match operator that associates attributes of one database with another. However, the massive growth in the amount and variety of unstructured and semi-structured data on the Web has created new challenges for this task. Such data sources often do not have a fixed pre-defined schema and contain large numbers of diverse attributes. Furthermore, the end goal is not schema alignment, as these schemas may be too heterogeneous (and dynamic) to meaningfully align. Rather, the goal is to align any overlapping data shared by these sources. We will show that even attributes with different meanings (that would not qualify as schema matches) can sometimes be useful in aligning data. The solution we propose in this paper replaces the basic schema-matching step with a more complex instance-based schema analysis and linkage discovery. We present a framework consisting of a library of efficient lexical analyzers and similarity functions, and a set of search algorithms for effective and efficient identification of linkage points over Web data. We experimentally evaluate the effectiveness of our proposed algorithms in real-world integration scenarios in several domains.
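As a rough illustration of the instance-based idea the abstract describes (not the paper's actual algorithm), the sketch below scores candidate linkage points by the Jaccard overlap of attribute value sets across two schema-less record collections. The function names, the record format, and the threshold are all assumptions made for illustration.

```python
# Illustrative sketch of instance-based linkage point discovery:
# score every attribute pair by the overlap of its distinct value sets.
from itertools import product

def value_set(records, attr):
    """Collect the distinct, lower-cased values of one attribute."""
    return {str(r[attr]).strip().lower() for r in records if r.get(attr) is not None}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def find_linkage_points(src, tgt, threshold=0.3):
    """Return (src_attr, tgt_attr, score) triples whose instance values overlap.

    src, tgt: lists of dicts, i.e. schema-less records as found in Web data.
    """
    src_attrs = {a for r in src for a in r}
    tgt_attrs = {a for r in tgt for a in r}
    candidates = []
    for sa, ta in product(src_attrs, tgt_attrs):
        score = jaccard(value_set(src, sa), value_set(tgt, ta))
        if score >= threshold:
            candidates.append((sa, ta, score))
    return sorted(candidates, key=lambda c: -c[2])
```

Note that this deliberately compares instance values rather than attribute names, so two attributes with different labels (or even different meanings) can still surface as a linkage point, which is the paper's central observation.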
Statistical Knowledge Patterns: Identifying Synonymous Relations in Large Linked Datasets
In The 12th International Semantic Web Conference and the 1st Australasian Semantic Web Conference, 2013
"... Abstract. The Web of Data is a rich common resource with billions of triples available in thousands of datasets and individual Web documents created by both expert and non-expert ontologists. A common problem is the imprecision in the use of vocabularies: annotators can misunderstand the semantics o ..."
Cited by 4 (3 self)
Abstract
The Web of Data is a rich common resource with billions of triples available in thousands of datasets and individual Web documents created by both expert and non-expert ontologists. A common problem is imprecision in the use of vocabularies: annotators can misunderstand the semantics of a class or property, or may not be able to find the right objects to annotate with. This decreases the quality of data and may eventually hamper its usability at large scale. This paper describes Statistical Knowledge Patterns (SKPs) as a means to address this issue. SKPs encapsulate key information about ontology classes, including synonymous properties in (and across) datasets, and are automatically generated based on statistical data analysis. SKPs can be effectively used to automatically normalise data and hence increase recall in querying. Both pattern extraction and pattern usage are completely automated. The main benefits of SKPs are that: (1) their structure allows for both accurate query expansion and restriction; (2) they are context dependent, hence they describe the usage and meaning of properties in the context of a particular class; and (3) they can be generated offline, hence the equivalence among relations can be used efficiently at run time.
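To make the query-expansion use of SKPs concrete, here is a minimal sketch: assuming a pattern has already been mined that lists synonymous properties for a class, a single triple pattern can be rewritten as a SPARQL 1.1 alternative property path so it matches any of the synonyms. The `SKP_BOOK` table and the `expand_pattern` helper are illustrative assumptions, not artifacts of the paper.

```python
# Hypothetical SKP-driven query expansion: rewrite one triple pattern so it
# matches any of the synonymous properties mined for a class (here, books).
SKP_BOOK = {  # assumed, illustrative pattern; not mined from real data
    "author": ["dbo:author", "dbp:author", "dbp:writer"],
}

def expand_pattern(subject, prop, obj, skp):
    """Replace a single property with the union of its SKP synonyms
    using a SPARQL 1.1 alternative property path (p1|p2|p3)."""
    alternatives = "|".join(skp.get(prop, [prop]))
    return f"{subject} ({alternatives}) {obj} ."

query = f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?book ?who WHERE {{
  {expand_pattern('?book', 'author', '?who', SKP_BOOK)}
}}"""
print(query)
```

Because the synonym set is scoped to one class, the same mechanism also supports restriction: dropping a property from the set tightens the query rather than expanding it.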
Locality-Sensitive Hashing for Massive String-Based Ontology Matching
"... Abstract-This paper reports initial research results related to the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into a LSH problem are proposed and experimental results are reported. The performed experiments sho ..."
Abstract
This paper reports initial research results on the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into an LSH problem are proposed, and experimental results are reported. The experiments show that using LSH for ontology matching can lead to a very fast matching process. The alignment quality achieved in these experiments is comparable to that of state-of-the-art matchers, while the matching itself is much faster. Further research is needed to find out whether the use of different metrics or specific hardware would improve the results.
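A minimal sketch of the general MinHash/LSH recipe the abstract refers to (the paper's two specific transformations may differ): entity labels are shingled into character 3-grams, compressed into MinHash signatures, and banded so that similar labels collide in at least one hash bucket; only intra-bucket pairs are then compared. All names and parameters below are illustrative.

```python
# Illustrative MinHash-LSH bucketing for string-based matching.
import hashlib

def shingles(label, k=3):
    s = label.lower()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}

def minhash(shingle_set, num_hashes=64):
    """One min-hash value per seeded hash function."""
    return [min(int(hashlib.md5(f"{seed}:{sh}".encode()).hexdigest(), 16)
                for sh in shingle_set)
            for seed in range(num_hashes)]

def lsh_buckets(labels, bands=16, rows=4):
    """Group labels whose signatures collide in at least one band;
    only these groups need pairwise comparison afterwards."""
    buckets = {}
    for label in labels:
        sig = minhash(shingles(label), num_hashes=bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, []).append(label)
    return [v for v in buckets.values() if len(v) > 1]

# Near-duplicate labels share most 3-grams, so they very likely collide:
print(lsh_buckets(["Barack Obama", "Barrack Obama", "Angela Merkel"]))
```

The speedup the paper observes comes from exactly this structure: candidate pairs are generated in time roughly linear in the number of labels, instead of comparing every label against every other.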
Understanding a Large Corpus of Web Tables Through Matching with Knowledge Bases - An Empirical Study
"... Abstract. Extracting and analyzing the vast amount of structured tabular data available on the Web is a challenging task and has received a significant attention in the past few years. In this paper, we present the results of our analysis of the contents of a large corpus of over 90 million Web ..."
Abstract
Extracting and analyzing the vast amount of structured tabular data available on the Web is a challenging task and has received significant attention in the past few years. In this paper, we present the results of our analysis of the contents of a large corpus of over 90 million Web tables.
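One common way such matching against a knowledge base is performed, offered here only as an assumed, simplified illustration since the abstract above is truncated, is to annotate a table column with the class whose known instance labels cover the largest fraction of the column's cells:

```python
# Hypothetical column-to-class annotation by instance-label coverage.
def annotate_column(cells, kb_labels):
    """kb_labels: class name -> set of known instance labels (lower-cased)."""
    cells = {c.strip().lower() for c in cells if c and c.strip()}
    best, best_score = None, 0.0
    for cls, labels in kb_labels.items():
        score = len(cells & labels) / len(cells) if cells else 0.0
        if score > best_score:
            best, best_score = cls, score
    return best, best_score

kb = {"Country": {"germany", "france", "japan"},   # toy knowledge base
      "City": {"paris", "berlin", "tokyo"}}
print(annotate_column(["Paris", "Berlin", "Tokyo "], kb))  # ('City', 1.0)
```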
A Two-step Blocking Scheme Learner for Scalable Link Discovery
"... Abstract. A two-step procedure for learning a link-discovery blocking scheme is presented. Link discovery is the problem of linking entities be-tween two or more datasets. Identifying owl:sameAs links is an impor-tant, special case. A blocking scheme is a one-to-many mapping from en-tities to blocks ..."
Abstract
A two-step procedure for learning a link-discovery blocking scheme is presented. Link discovery is the problem of linking entities between two or more datasets; identifying owl:sameAs links is an important special case. A blocking scheme is a one-to-many mapping from entities to blocks. Blocking methods avoid O(n²) comparisons by clustering entities into blocks and limiting the evaluation of link specifications to entity pairs within blocks. Current link-discovery blocking methods use blocking schemes that are tailored for owl:sameAs links or that rely on assumptions about the underlying link specifications. The presented framework learns blocking schemes for arbitrary link specifications. The first step of the algorithm is unsupervised and performs dataset mapping between a pair of dataset collections. The second, supervised step learns blocking schemes on structurally heterogeneous dataset pairs. Application to RDF is accomplished by representing the RDF dataset in property-table form. The method is empirically evaluated on four real-world test collections ranging over various domains and tasks.
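To ground the terminology, the sketch below shows a toy, hand-written (not learned) blocking scheme: a key function maps each entity to one or more blocks, and candidate pairs are generated only within blocks shared by both datasets, avoiding the quadratic all-pairs comparison. The record format and the `blocking_key` function are assumptions for illustration.

```python
# Toy blocking scheme for link discovery between two datasets.
from collections import defaultdict
from itertools import product

def blocking_key(entity):
    """Illustrative one-to-many scheme: block on the first name token.
    A learned scheme would choose and combine such predicates automatically."""
    tokens = entity.get("name", "").lower().split()
    return {tokens[0]} if tokens else set()

def candidate_pairs(src, tgt):
    """Generate (src_id, tgt_id) pairs only for entities sharing a block."""
    blocks_src, blocks_tgt = defaultdict(list), defaultdict(list)
    for e in src:
        for k in blocking_key(e):
            blocks_src[k].append(e)
    for e in tgt:
        for k in blocking_key(e):
            blocks_tgt[k].append(e)
    pairs = set()
    for k in blocks_src.keys() & blocks_tgt.keys():
        for a, b in product(blocks_src[k], blocks_tgt[k]):
            pairs.add((a["id"], b["id"]))
    return pairs
```

Only the pairs returned by `candidate_pairs` are ever passed to the (possibly expensive) link specification, which is what makes blocking the standard scalability device in link discovery.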