Results 1 - 10 of 174
Entity Linking meets Word Sense Disambiguation: A Unified Approach
Transactions of the Association for Computational Linguistics, 2014
"... Entity Linking (EL) and Word Sense Disam-biguation (WSD) both address the lexical am-biguity of language. But while the two tasks are pretty similar, they differ in a fundamen-tal respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while ..."
Abstract
-
Cited by 46 (17 self)
- Add to MetaCart
Abstract: Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are quite similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest-subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performance on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at
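The abstract's key device, a densest-subgraph heuristic over candidate meanings, is easy to illustrate. Below is a minimal greedy-peeling sketch of that technique family (not Babelfy's actual algorithm); the toy sense graph and sense labels are hypothetical.

```python
# Greedy densest-subgraph heuristic (peel the minimum-degree node,
# track the best density seen). A sketch of the technique family the
# Babelfy abstract refers to, not the paper's actual algorithm.

def densest_subgraph(adj):
    """adj: dict mapping node -> set of neighbour nodes (undirected)."""
    nodes = set(adj)
    edges = sum(len(v) for v in adj.values()) // 2
    best_density, best_nodes = 0.0, set(nodes)
    live = {n: set(adj[n]) for n in nodes}
    while nodes:
        density = edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        v = min(nodes, key=lambda n: len(live[n]))  # peel min-degree node
        edges -= len(live[v])
        for u in live[v]:
            live[u].discard(v)
        del live[v]
        nodes.discard(v)
    return best_nodes, best_density

# Toy sense graph: mutually supporting candidate senses form a dense
# cluster that survives the peeling.
g = {"bank#finance": {"loan#1", "money#1"},
     "loan#1": {"bank#finance", "money#1"},
     "money#1": {"bank#finance", "loan#1"},
     "bank#river": {"water#1"},
     "water#1": {"bank#river"}}
print(densest_subgraph(g))  # the finance cluster wins (density 1.0)
```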
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking
"... We tackle the problem of entity linking for large collections of online pages; Our system, ZenCrowd, identifies entities from natural language text using state of the art techniques and automatically connects them to the Linked Open Data cloud. We show how one can take advantage of human intelligenc ..."
Abstract
-
Cited by 43 (4 self)
- Add to MetaCart
(Show Context)
Abstract: We tackle the problem of entity linking for large collections of online pages. Our system, ZenCrowd, identifies entities from natural language text using state-of-the-art techniques and automatically connects them to the Linked Open Data cloud. We show how one can take advantage of human intelligence to improve the quality of the links by dynamically generating micro-tasks on an online crowdsourcing platform. We develop a probabilistic framework to make sensible decisions about candidate links and to identify unreliable human workers. We evaluate ZenCrowd in a real deployment and show how a combination of both probabilistic reasoning and crowdsourcing techniques can significantly improve the quality of the links, while limiting the amount of work performed by the crowd.
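The abstract does not spell out the probabilistic model, but the general idea of reliability-weighted crowd voting can be sketched as a small EM-style loop; the data, worker names, and prior below are all hypothetical, and ZenCrowd's real framework is richer than this.

```python
# EM-style sketch of reliability-weighted crowd voting: weight each
# worker's vote by an estimated reliability, then re-estimate
# reliability from agreement with the weighted consensus.
# Illustrative only -- not ZenCrowd's actual model.

def consensus(votes, rounds=10):
    """votes: dict task -> dict worker -> chosen label."""
    workers = {w for v in votes.values() for w in v}
    rel = {w: 0.7 for w in workers}          # hypothetical prior reliability
    labels = {}
    for _ in range(rounds):
        # E-step: weighted vote per task
        for task, wv in votes.items():
            scores = {}
            for w, lab in wv.items():
                scores[lab] = scores.get(lab, 0.0) + rel[w]
            labels[task] = max(scores, key=scores.get)
        # M-step: reliability = fraction of votes matching consensus
        for w in workers:
            hits = [labels[t] == tv[w] for t, tv in votes.items() if w in tv]
            rel[w] = sum(hits) / len(hits) if hits else 0.5
    return labels, rel

votes = {"m1": {"a": "dbpedia:Paris", "b": "dbpedia:Paris", "c": "dbpedia:Paris,_Texas"},
         "m2": {"a": "dbpedia:Berlin", "c": "dbpedia:Berlin"},
         "m3": {"b": "dbpedia:London", "c": "dbpedia:Londonderry"}}
labels, rel = consensus(votes)
print(labels)   # consensus link per mention
print(rel)      # worker c ends up down-weighted
```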
A Framework for Benchmarking Entity-Annotation Systems
"... In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate systems performance, and a systematic c ..."
Abstract
-
Cited by 30 (1 self)
- Add to MetaCart
Abstract: In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate system performance, and a systematic comparative evaluation involving all publicly available datasets, containing texts of various types such as news, tweets and Web pages. Our framework is easily extensible with novel entity annotators, datasets and evaluation measures for comparing systems, and it has been released to the public as open source. We use this framework to perform the first extensive comparison among all available entity annotators over all available datasets, and draw many interesting conclusions about their efficiency and effectiveness. We also compare academic and commercial annotators.
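One concrete instance of the measures such a framework needs: precision/recall/F1 under a strict matching rule where both the mention span and the linked entity must agree exactly. This is one common choice, not necessarily the framework's exact definition.

```python
# Precision/recall/F1 for entity annotations under strict matching:
# a prediction counts only if its (start, end, entity) triple exactly
# matches a gold annotation. One common measure among the several
# variants such a benchmarking framework defines.

def prf1(gold, pred):
    """gold, pred: sets of (start, end, entity_uri) triples."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(0, 5, "dbpedia:Paris"), (10, 16, "dbpedia:France")}
pred = {(0, 5, "dbpedia:Paris"), (10, 16, "dbpedia:French_language")}
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```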
NERD meets NIF: Lifting NLP extraction results to the linked data cloud
In LDOW, 2012
"... We have often heard that data is the new oil. In particular, extracting information from semi-structured textual documents on the Web is key to realize the Linked Data vision. Several attempts have been proposed to extract knowledge from textual documents, extracting named entities, classifying them ..."
Abstract
-
Cited by 26 (6 self)
- Add to MetaCart
(Show Context)
Abstract: We have often heard that data is the new oil. In particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. Several attempts have been made to extract knowledge from textual documents: extracting named entities, classifying them according to pre-defined taxonomies, and disambiguating them through URIs identifying real-world entities. As a step towards interconnecting the Web of documents via those entities, different extractors have been proposed. Although they share the same main purpose (extracting named entities), they differ in numerous aspects such as their underlying dictionary or their ability to disambiguate entities. We have developed NERD, an API and a front-end user interface powered by an ontology to unify various named entity extractors. The unified output is serialized in RDF according to the NIF specification and published back on the Linked Data cloud. We evaluated NERD with a dataset composed of five TED talk transcripts, a dataset composed of 1000 New York Times articles and a dataset composed of the 217 abstracts of the papers published at WWW 2011.
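The unification step can be pictured as an alignment table from tool-specific entity types to a shared taxonomy; the mappings and type names below are hypothetical stand-ins, not the actual NERD ontology.

```python
# Sketch of the unification pattern the NERD abstract describes:
# different extractors tag the same entity with tool-specific types,
# and an alignment table maps them onto one shared taxonomy. The
# mappings here are hypothetical, not the real NERD ontology.

SHARED_TAXONOMY = {
    ("toolA", "PERS"): "nerd:Person",
    ("toolA", "LOC"): "nerd:Location",
    ("toolB", "human"): "nerd:Person",
    ("toolB", "place"): "nerd:Location",
}

def unify(extractions):
    """extractions: list of (tool, surface_form, tool_type, uri)."""
    unified = []
    for tool, surface, ttype, uri in extractions:
        shared = SHARED_TAXONOMY.get((tool, ttype), "nerd:Thing")
        unified.append({"surface": surface, "type": shared, "uri": uri})
    return unified

raw = [("toolA", "Tim Berners-Lee", "PERS", "dbpedia:Tim_Berners-Lee"),
       ("toolB", "Tim Berners-Lee", "human", "dbpedia:Tim_Berners-Lee")]
print(unify(raw))  # both extractors' votes land on nerd:Person
```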
Unsupervised Generation of Data Mining Features from Linked Open Data
2011
"... The quality of the results of a data mining process strongly depends on the quality of the data it processes. In this paper, we present a fully automatic approach for enriching data with features that are derived from Linked Open Data, a very large, openly available data collection. We identify six ..."
Abstract
-
Cited by 25 (9 self)
- Add to MetaCart
(Show Context)
Abstract: The quality of the results of a data mining process strongly depends on the quality of the data it processes. In this paper, we present a fully automatic approach for enriching data with features that are derived from Linked Open Data, a very large, openly available data collection. We identify six different types of feature generators, which are implemented in our open-source tool FeGeLOD. In four case studies, we show that our approach can be applied to different problems, ranging from classical data mining to ontology learning and ontology matching on the semantic web. The results show that features generated from publicly available information may allow data mining in problems where features are not
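A hedged example of what one of the simpler feature-generator types might look like: turning an entity's rdf:type statements into binary features via the public DBpedia SPARQL endpoint. This is in the spirit of FeGeLOD, not its actual code; endpoint availability and the returned types are assumptions.

```python
# One plausible Linked-Open-Data feature generator: one binary feature
# per rdf:type asserted for the entity in DBpedia. Assumes the public
# DBpedia SPARQL endpoint is reachable; results depend on live data.
import requests

ENDPOINT = "https://dbpedia.org/sparql"

def type_features(entity_uri):
    query = f"SELECT ?t WHERE {{ <{entity_uri}> a ?t }}"
    resp = requests.get(ENDPOINT, params={
        "query": query, "format": "application/sparql-results+json"})
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return {f"type={b['t']['value']}": 1 for b in bindings}

feats = type_features("http://dbpedia.org/resource/Berlin")
print(sorted(feats)[:5])  # e.g. dbo:City, dbo:Place, ... (server-dependent)
```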
Integrating NLP using Linked Data
2013
"... Abstract. We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an ex-tremely cu ..."
Abstract
-
Cited by 23 (4 self)
- Add to MetaCart
(Show Context)
Abstract: We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time-consuming task. Also, once a particular set of tools is integrated, this integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools performing similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In this paper, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. In contrast to more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We present several use cases of the second version of the NIF specification (NIF 2.0) and the result of a developer study.
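To make the URI scheme concrete: NIF addresses substrings by character offsets and anchors them with RDF properties. The sketch below builds such a fragment with rdflib; the property IRIs follow the NIF 2.0 core ontology as best recalled here, so verify them against the spec.

```python
# Sketch of NIF-style output: an offset-addressed URI identifies a
# substring of a context document, anchored and linked to a DBpedia
# entity. Property IRIs are best-effort recollections of the NIF 2.0
# core ontology -- verify against the spec.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

doc = "Berlin is the capital of Germany."
ctx = URIRef("http://example.org/doc1#char=0,33")     # whole context
mention = URIRef("http://example.org/doc1#char=0,6")  # "Berlin"

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)
g.add((ctx, NIF.isString, Literal(doc)))
g.add((mention, NIF.referenceContext, ctx))
g.add((mention, NIF.anchorOf, Literal("Berlin")))
g.add((mention, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.endIndex, Literal(6, datatype=XSD.nonNegativeInteger)))
g.add((mention, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Berlin")))
print(g.serialize(format="turtle"))
```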
Microblog-Genre Noise and Impact on Semantic Annotation Accuracy
"... Using semantic technologies for mining and intelligent information access to microblogs is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, contextdependent, and dynamic nature. Sem ..."
Abstract
-
Cited by 19 (9 self)
- Add to MetaCart
(Show Context)
Abstract: Using semantic technologies for mining and intelligent information access to microblogs is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Semantic annotation of tweets is typically performed in a pipeline, comprising successive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). Consequently, errors are cumulative, and earlier-stage problems can severely reduce the performance of final stages. This paper presents a characterisation of genre-specific problems at each semantic annotation stage and the impact on subsequent stages. Critically, we evaluate impact on two high-level semantic annotation tasks: named entity detection and disambiguation. Our results demonstrate the importance of making approaches specific to the genre, and indicate a diminishing-returns effect that reduces the effectiveness of complex text normalisation.
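The cumulative-error point is worth making concrete: under a rough independence assumption, end-to-end accuracy is about the product of per-stage accuracies. The stage figures below are invented for illustration, not numbers from the paper.

```python
# Back-of-the-envelope illustration of cumulative pipeline error:
# with roughly independent stages, end-to-end accuracy is about the
# product of per-stage accuracies. Numbers are invented, not from
# the paper.
import math

stages = {"language id": 0.97, "tokenisation": 0.95,
          "POS tagging": 0.90, "NER": 0.85, "disambiguation": 0.80}

acc = math.prod(stages.values())
print(f"end-to-end ~ {acc:.2%}")  # ~56% despite decent stages
```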
DBpedia: A multilingual cross-domain knowledge base
2012
"... The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multi-lingual knowledge base covering many specific domains and general world knowledge. The knowledge base contains textual descriptions (titles and abstr ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
Abstract: The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multilingual knowledge base covering many specific domains and general world knowledge. The knowledge base contains textual descriptions (titles and abstracts) of concepts in up to 97 languages. It also contains structured knowledge that has been extracted from the infobox systems of Wikipedias in 15 different languages and is mapped onto a single consistent ontology by a community effort. The knowledge base can be queried using a structured query language and all its data sets are freely available for download. In this paper, we describe the general DBpedia knowledge base and extended data sets that specifically aim at supporting computational linguistics tasks. These tasks include Entity Linking, Word Sense Disambiguation, Question Answering, Slot Filling and Relationship Extraction. These use cases are outlined, pointing at the added value that the structured data of DBpedia provides. Keywords: Knowledge Base, Semantic Web, Ontology
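The structured query language in question is SPARQL. A hedged example against the public endpoint, fetching a concept's abstracts in several languages (dbo:abstract is a standard DBpedia ontology property; live results vary):

```python
# DBpedia is queryable via SPARQL; this fetches a concept's abstracts
# in several languages from the public endpoint. dbo:abstract is a
# standard DBpedia ontology property; the live result set may vary.
import requests

query = """
SELECT ?lang ?abstract WHERE {
  <http://dbpedia.org/resource/Semantic_Web>
      <http://dbpedia.org/ontology/abstract> ?abstract .
  BIND(LANG(?abstract) AS ?lang)
  FILTER(?lang IN ("en", "de", "fr"))
}
"""
resp = requests.get("https://dbpedia.org/sparql", params={
    "query": query, "format": "application/sparql-results+json"})
for b in resp.json()["results"]["bindings"]:
    print(b["lang"]["value"], b["abstract"]["value"][:60], "...")
```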
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
2014
"... Abstract. The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse is hindered by the lack of descriptive information about the nature of the data, such as their topic coverage, dynamics or evolution. To address this issue, we propo ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
(Show Context)
Abstract: The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse is hindered by the lack of descriptive information about the nature of the data, such as their topic coverage, dynamics or evolution. To address this issue, we propose an approach for creating linked dataset profiles. A profile consists of structured dataset metadata describing topics and their relevance. Profiles are generated through the configuration of techniques for resource sampling from datasets, topic extraction from reference datasets and their ranking based on graphical models. To enable a good trade-off between scalability and accuracy of generated profiles, appropriate parameters are determined experimentally. Our evaluation considers topic profiles for all accessible datasets from the Linked Open Data cloud. The results show that our approach generates accurate profiles even with comparably small sample sizes (10%) and outperforms established topic modelling approaches.
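The pipeline in the abstract (sample resources, extract topics, rank) can be caricatured with plain frequency ranking; the paper's actual ranking uses graphical models, which this sketch deliberately omits. The dataset and topic labels are hypothetical.

```python
# Crude sketch of the profiling pipeline: sample a fraction of a
# dataset's resources, collect their topic annotations, and rank
# topics by normalised frequency. The paper ranks with graphical
# models; plain frequency is a deliberate simplification.
import random
from collections import Counter

def topic_profile(resource_topics, sample_ratio=0.1, top_k=5):
    """resource_topics: dict resource -> list of topic labels."""
    resources = list(resource_topics)
    k = max(1, int(len(resources) * sample_ratio))
    sample = random.sample(resources, k)
    counts = Counter(t for r in sample for t in resource_topics[r])
    total = sum(counts.values())
    return [(t, c / total) for t, c in counts.most_common(top_k)]

# hypothetical dataset: resources annotated with topic categories
data = {f"res{i}": ["Music"] if i % 3 else ["Music", "Jazz"]
        for i in range(1000)}
print(topic_profile(data))  # e.g. [('Music', ~0.75), ('Jazz', ~0.25)]
```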
GAF: A Grounded Annotation Framework for Events.
In Proceedings of the 1st EVENTS Workshop, 2013
"... Abstract This paper introduces GAF, a grounded annotation framework to represent events in a formal context that can represent information from both textual and extra-textual sources. GAF makes a clear distinction between mentions of events in text and their formal representation as instances in a ..."
Abstract
-
Cited by 9 (6 self)
- Add to MetaCart
(Show Context)
Abstract: This paper introduces GAF, a grounded annotation framework to represent events in a formal context that can represent information from both textual and extra-textual sources. GAF makes a clear distinction between mentions of events in text and their formal representation as instances in a semantic layer. Instances are represented by RDF-compliant URIs that are shared across different research disciplines. This allows us to complete textual information with external sources and facilitates reasoning. The semantic layer can integrate any linguistic information and is compatible with previous event representations in NLP. Through a use case on earthquakes in Southeast Asia, we demonstrate GAF's flexibility and its ability to reason over events with the aid of extra-linguistic resources.
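The instance/mention split can be shown in a few triples: one URI for the event instance in the semantic layer, linked to offset-addressed mentions in text. gaf:denotedBy and the namespace IRIs below are recalled from the GAF papers and should be treated as assumptions.

```python
# Sketch of GAF's instance/mention split: one URI for the event
# instance, linked to each textual mention via gaf:denotedBy
# (property and namespace IRIs recalled from the GAF papers --
# treat them as assumptions, not authoritative).
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

GAF = Namespace("http://groundedannotationframework.org/gaf#")
SEM = Namespace("http://semanticweb.cs.vu.nl/2009/11/sem/")
EX = Namespace("http://example.org/")

g = Graph()
quake = EX["earthquake/sumatra-2012"]      # the event instance
g.add((quake, RDF.type, SEM.Event))
# two independent mentions of the same instance, addressed by offsets
for doc, span in [("news1.txt", "char=14,38"), ("news2.txt", "char=0,27")]:
    g.add((quake, GAF.denotedBy, URIRef(f"http://example.org/{doc}#{span}")))
print(g.serialize(format="turtle"))
```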