Integration of semantically annotated data by the KnoFuss architecture
- Lecture Notes in Computer Science, 2008
Cited by 8 (1 self)
Abstract. Most of the existing work on information integration in the Semantic Web concentrates on resolving schema-level problems. Specific issues of data-level integration (instance coreferencing, conflict resolution, handling uncertainty) are usually tackled by applying the same techniques as for ontology schema matching or by reusing the solutions produced in the database domain. However, data structured according to OWL ontologies has its specific features: e.g., the classes are organized into a hierarchy, the properties are inherited, data constraints differ from those defined by database schema. This paper describes how these features are exploited in our architecture KnoFuss, designed to support data-level integration of semantic annotations.
Handling instance coreferencing in the KnoFuss architecture
Cited by 7 (0 self)
Abstract. Finding RDF individuals that refer to the same real-world entities but have different URIs is necessary for the efficient use of data across sources. The requirements for such instance-level integration of RDF data are different from both database record linkage and ontology schema matching scenarios. Flexible configuration and reuse of different methods is needed to achieve good performance. Our data integration architecture, called KnoFuss, implements a component-based approach, which allows flexible selection and tuning of methods and takes the ontological schemata into account to improve the reusability of methods.
L.: Scaling record linkage to non-uniform distributed class sizes
- Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2008
Cited by 6 (5 self)
Abstract. Record linkage is a central task when information from different sources is integrated. Record linkage models use so-called blockers for reducing the search space by discarding obviously different record pairs. In practice, important problems have Zipf distributed class sizes with some large classes where blocking is not applicable any more. Therefore we propose two novel meta algorithms for scaling arbitrary record linkage models to such data sets. The first one parallelizes problems by creating overlapping subproblems and the second one reduces the search space for large classes effectively. Our evaluation shows that both scaling techniques are effective and are able to scale state-of-the-art models to challenging datasets.
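The blocking step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method or either of its meta algorithms; it only shows plain key-based blocking. The `blocking_key` function (first three letters of a lowercased name) and the sample records are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocker: the first three letters of the lowercased name.
    return record["name"].lower()[:3]


def candidate_pairs(records, key=blocking_key):
    """Return index pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[key(record)].append(idx)
    pairs = set()
    for members in blocks.values():
        # Only records inside the same block are ever compared.
        pairs.update(combinations(members, 2))
    return pairs


# Illustrative records, not taken from any dataset in the paper.
records = [
    {"name": "Canon EOS 400D"},
    {"name": "canon eos 400 d"},
    {"name": "Nikon D80"},
    {"name": "Canon PowerShot"},
]
pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
```

Here three of the six possible pairs survive blocking. A block with n members still produces n(n-1)/2 pairs, which is one way to see why a few very large classes can make simple blocking insufficient.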
Disambiguating Identity through Social Circles and Social Data
Cited by 3 (0 self)
Abstract: This paper presents an approach to disambiguate extracted identity information relating to different individuals through the use of social circles. Social circles are generated through the extraction and pruning of social networks using the analysis of existing social data. Social data encompasses information such as images, videos and blogs shared within a social network. Identity information is extracted by involving the user in both selecting their key identity features for disambiguation, and validating the retrieved information. Our approach provides a methodology to monitor existing identity information, applicable to addressing such issues as identity theft, online fraud and lateral surveillance.
XMedia: Web People Search by Clustering with Machinely Learned Similarity Measures
Cited by 3 (2 self)
In this paper we present an approach to person name disambiguation that clusters documents on the basis of textual features using cosine similarity and a machinely learned meta similarity measure. The approach achieves an F-measure of B-Cubed Precision and Recall of 0.74 on the Clustering Subtask for WePS-2. This task consists of clustering a set of documents that mention an ambiguous person name according to the actual entities referred to by that name.
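For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of B-Cubed precision, recall and their harmonic mean for a non-overlapping clustering. It assumes each document belongs to exactly one predicted cluster and one gold entity; the official WePS-2 scorer may differ in detail (for example in how overlapping clusters are handled). The function name `bcubed` and the toy labels are illustrative.

```python
def bcubed(predicted, gold):
    """B-Cubed precision, recall and F1 for a non-overlapping clustering.

    predicted and gold map each item to a cluster label and a gold
    entity label respectively; both must cover the same items.
    """
    items = list(predicted)
    precisions, recalls = [], []
    for i in items:
        same_cluster = [j for j in items if predicted[j] == predicted[i]]
        same_entity = [j for j in items if gold[j] == gold[i]]
        # Items that share both the cluster and the gold entity with i.
        correct = sum(1 for j in same_cluster if gold[j] == gold[i])
        precisions.append(correct / len(same_cluster))
        recalls.append(correct / len(same_entity))
    precision = sum(precisions) / len(items)
    recall = sum(recalls) / len(items)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: four documents mentioning one ambiguous name.
predicted = {"d1": 0, "d2": 0, "d3": 0, "d4": 1}
gold = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
precision, recall, f1 = bcubed(predicted, gold)
```

In this toy case the averaged precision is 2/3, the averaged recall is 3/4, and their harmonic mean is 12/17.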
Scalable Event-based Clustering of Social Media via Record Linkage Techniques
Cited by 2 (1 self)
We tackle the problem of grouping content available in social media applications such as Flickr, YouTube, Panoramio etc. into clusters of documents describing the same event. This task has been referred to as event identification before. We present a new formalization of the event identification task as a record linkage problem and show that this formulation leads to a principled and highly efficient solution to the problem. We present results on two datasets derived from Flickr – last.fm and upcoming – comparing the results in terms of Normalized Mutual Information and F-Measure with respect to several baselines, showing that a record linkage approach outperforms all baselines as well as a state-of-the-art system. We demonstrate that our approach can scale to large amounts of data, reducing the processing time considerably compared to a state-of-the-art approach. The scalability is achieved by applying an appropriate blocking strategy and relying on a Single Linkage clustering algorithm which avoids the exhaustive computation of pairwise similarities.
Relational Classification Using Automatically Extracted Relations by Record Linkage
Cited by 1 (1 self)
Abstract. Relational classifiers often outperform traditional classifiers which assume that objects are independent. For applying relational classifiers relations are required. Since data often is noisy and unstructured, these relations are not given explicitly, but need to be extracted. In this paper we show a framework for relational classification that first automatically extracts relations from such a noisy database and then applies relational classifiers. For extracting relations we use techniques from the field of record linkage that learn the characteristics for a relation from pairwise features. With the proposed framework relational classifiers can be applied without requiring manual annotation.
Active Learning of Equivalence Relations by Minimizing the Expected Loss Using Constraint Inference
Cited by 1 (1 self)
Selecting promising queries is the key to effective active learning. In this paper, we investigate selection techniques for the task of learning an equivalence relation where the queries are about pairs of objects. As the target relation satisfies the axioms of transitivity, from one queried pair additional constraints can be inferred. We derive both the upper and lower bound on the number of queries needed to converge to the optimal solution. Besides restricting the set of possible solutions, constraints can be used as training data for learning a similarity measure. For selecting queries that result in a large number of meaningful constraints, we present an approximative optimal selection technique that greedily minimizes the expected loss in each round of active learning. This technique makes use of inference of expected constraints. Besides the theoretical results, an extensive evaluation for the application of record linkage shows empirically that the proposed selection method leads to both interesting and a high number of constraints.
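The remark that additional constraints can be inferred from one queried pair can be sketched with a union-find structure. This is an illustration of transitive inference only, not the selection technique proposed in the paper; the class `ConstraintStore`, its method names and the sample items are hypothetical.

```python
class ConstraintStore:
    """Tracks answered pair queries and infers further pair labels."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.cannot = []  # answered "different" pairs, stored as given

    def find(self, x):
        # Union-find lookup with path halving.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def add_must_link(self, a, b):
        # The oracle answered "a and b are equivalent".
        self.parent[self.find(b)] = self.find(a)

    def add_cannot_link(self, a, b):
        # The oracle answered "a and b are not equivalent".
        self.cannot.append((a, b))

    def known(self, a, b):
        """True or False if the label follows from the answers, else None."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        for x, y in self.cannot:
            if {self.find(x), self.find(y)} == {ra, rb}:
                return False
        return None


store = ConstraintStore(["a", "b", "c", "d", "e"])
store.add_must_link("a", "b")
store.add_must_link("b", "c")
store.add_cannot_link("c", "d")
inferred_positive = store.known("a", "c")  # follows by transitivity
inferred_negative = store.known("a", "d")  # a ~ c and c differs from d
still_unknown = store.known("a", "e")      # no answer bears on this pair
```

Three answered queries settle the pairs (a, c), (a, d) and (b, d) without asking about them, which is the effect a query selection strategy can try to maximise.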
Integration of semantically annotated data by the KnoFuss architecture
- Accepted Manuscript, Open Research Online (oro.open.ac.uk)
Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies page.