Entity Linking meets Word Sense Disambiguation: A Unified Approach
- Transactions of the Association for Computational Linguistics
, 2014
"... Entity Linking (EL) and Word Sense Disam-biguation (WSD) both address the lexical am-biguity of language. But while the two tasks are pretty similar, they differ in a fundamen-tal respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while ..."
Abstract
-
Cited by 46 (17 self)
- Add to MetaCart
(Show Context)
Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at
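The densest-subgraph step can be made concrete with the classic greedy peeling heuristic (Charikar's 2-approximation): repeatedly drop the lowest-degree candidate meaning and keep the densest intermediate subgraph. This is a minimal sketch of that general heuristic, not Babelfy's exact variant, and the toy sense graph and identifiers below are invented for illustration:

```python
from collections import defaultdict

def densest_subgraph(edges):
    """Greedy peeling heuristic (Charikar, 2000): repeatedly remove the
    minimum-degree vertex and keep the intermediate subgraph whose
    average degree (edges / vertices) is highest."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    n_edges = len(edges)
    best_density, best_nodes = 0.0, set(nodes)
    while nodes:
        density = n_edges / len(nodes)
        if density >= best_density:  # >= prefers the smaller, later subgraph
            best_density, best_nodes = density, set(nodes)
        u = min(nodes, key=lambda x: len(adj[x]))  # peel a min-degree vertex
        nodes.discard(u)
        n_edges -= len(adj[u])
        for v in adj[u]:
            adj[v].discard(u)
        del adj[u]
    return best_nodes, best_density

# Toy candidate-sense graph: three mutually coherent senses plus a stray one.
edges = [("bank#finance", "loan#money"), ("loan#money", "deposit#money"),
         ("deposit#money", "bank#finance"), ("bank#river", "loan#money")]
print(densest_subgraph(edges))  # the finance triangle, density 1.0
```

On the toy graph, the three mutually related finance senses survive the peeling while the stray river sense is discarded, which is the "high-coherence interpretation" intuition behind the approach.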
Statistical Analyses of Named Entity Disambiguation Benchmarks
"... Abstract. In the last years, various tools for automatic semantic annotation of textual information have emerged. The main challenge of all approaches is to solve ambiguity of natural language and assign unique semantic entities according to the present context. To compare the different approaches a ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
In recent years, various tools for automatic semantic annotation of textual information have emerged. The main challenge for all approaches is to resolve the ambiguity of natural language and assign unique semantic entities according to the present context. To compare the different approaches, a ground truth, namely an annotated benchmark, is essential. However, besides the actual disambiguation approach, the achieved evaluation results also depend on the characteristics of the benchmark dataset and the expressiveness of the dictionary applied to determine entity candidates. This paper presents statistical analyses and mapping experiments on different benchmarks and dictionaries to identify the characteristics and structure of the respective datasets.
Learning relatedness measures for entity linking
- In Proceedings of CIKM
, 2013
"... Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowl-edge base. The most important of such features is entity relate ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes disambiguation errors in entity linking. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
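The learning-to-rank formulation can be sketched as a pairwise objective: given training pairs in which one entity pair is known to be more related than another, learn a weight vector that scores the former higher. The following perceptron-style sketch assumes two invented placeholder features per entity pair (say, shared in-links and category overlap); it is not the paper's actual learner, features, or data:

```python
def train_pairwise(pairs, epochs=10, lr=0.1):
    """pairs: list of (features_more_related, features_less_related).
    Learns w so that w . x_pos > w . x_neg via perceptron-style updates
    on mis-ranked pairs (a minimal pairwise learning-to-rank scheme)."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for pos, neg in pairs:
            margin = sum(wi * (p - n) for wi, p, n in zip(w, pos, neg))
            if margin <= 0:  # mis-ranked: push w toward the feature difference
                for i in range(dim):
                    w[i] += lr * (pos[i] - neg[i])
    return w

# Hypothetical features per entity pair, e.g. [shared in-links, category overlap].
training = [([0.9, 0.8], [0.2, 0.1]), ([0.7, 0.6], [0.3, 0.4])]
w = train_pairwise(training)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
print(score([0.9, 0.8]) > score([0.2, 0.1]))  # True: pair is ranked correctly
```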
Knowledge harvesting from text and web sources
- In Proceedings of the International Conference on Data Engineering (ICDE)
"... Abstract-The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources has enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
(Show Context)
The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, KnowItAll, Probase, ReadTheWeb, and YAGO, as well as industrial ones such as Freebase and Trueknowledge. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disambiguating natural-language text, deep question answering, and semantic search for entities and relations in Web and enterprise data. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications.
Methods for Exploring and Mining Tables on Wikipedia
"... Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, we show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover “interesting” relationships between table columns. We find that a “Semantic Relatedness” measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, we show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. Our work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.
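A common instantiation of such a link-based measure is the Milne and Witten (2008) relatedness, which scores two Wikipedia articles by the overlap of their in-link sets; treating this as the measure behind WikiTables' feature is an assumption, but it illustrates the idea:

```python
import math

def milne_witten(inlinks_a, inlinks_b, total_articles):
    """Link-based semantic relatedness (Milne & Witten, 2008): articles
    whose in-link sets overlap heavily are considered related."""
    a, b = set(inlinks_a), set(inlinks_b)
    shared = a & b
    if not shared:
        return 0.0
    num = math.log(max(len(a), len(b))) - math.log(len(shared))
    den = math.log(total_articles) - math.log(min(len(a), len(b)))
    return max(0.0, 1.0 - num / den)

# Toy in-link sets (ids of pages linking to each article); the corpus
# size is hypothetical, standing in for the number of Wikipedia articles.
print(milne_witten({1, 2, 3, 4}, {2, 3, 4, 5}, total_articles=1000))  # ~0.95
```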
Is Brad Pitt Related to Backstreet Boys? Exploring Related Entities
"... Abstract. More than 40 % web search queries pivot around a single en-tity, which can be interpreted with the available knowledge bases such as DBpedia and Freebase. With the advent of these knowledge repos-itories, major search engines are interested in suggesting the related entities to the users. ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
More than 40% of web search queries pivot around a single entity, which can be interpreted with available knowledge bases such as DBpedia and Freebase. With the advent of these knowledge repositories, major search engines are interested in suggesting related entities to their users. These related entities give users an opportunity to extend their knowledge. Related entities can be obtained from knowledge bases by retrieving the directly linked entities. However, many popular entities may have more than 1,000 directly connected entities without any defined associativity weight, and these knowledge bases mainly cover some specific types of relations. For instance, “Brad Pitt” can be considered related to “Tom Cruise”, but they are not directly connected in the DBpedia graph by any relation. Therefore, it is worth finding related entities beyond the relations explicitly defined in knowledge graphs, and a ranking method is also required to quantify the relations between entities. In this work, we present the Entity Relatedness Graph (EnRG), which provides a ranked list of entities related to a given entity and clusters them by their DBpedia types. Moreover, EnRG helps a user discover the provenance of implicit relations between two entities. With the help of EnRG, one can easily explore even the weak relations that are not explicitly included in the available knowledge bases, for instance: why is “Brad Pitt” related to “Backstreet Boys”?
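The abstract does not spell out EnRG's ranking method; one simple baseline for surfacing implicitly related entities is to rank non-neighbours of the query entity by the number of neighbours they share with it. The miniature graph below is invented for the sketch, and its edges are illustrative rather than actual DBpedia triples:

```python
from collections import Counter, defaultdict

def related_entities(graph, query, top_k=3):
    """Rank entities NOT directly linked to `query` by the number of
    neighbours they share with it, a simple proxy for implicit relations."""
    neighbours = graph[query]
    scores = Counter()
    for n in neighbours:
        for candidate in graph[n]:
            if candidate != query and candidate not in neighbours:
                scores[candidate] += 1
    return scores.most_common(top_k)

# Toy undirected entity graph with invented edges.
g = defaultdict(set)
for u, v in [("Brad_Pitt", "Jennifer_Aniston"), ("Jennifer_Aniston", "Friends"),
             ("Brad_Pitt", "Ocean's_Eleven"), ("Backstreet_Boys", "Friends"),
             ("Backstreet_Boys", "Ocean's_Eleven")]:
    g[u].add(v)
    g[v].add(u)
print(related_entities(g, "Brad_Pitt"))  # surfaces Friends, Backstreet_Boys
```

The shared neighbours themselves ("Ocean's_Eleven" in the toy graph) double as the provenance of the implicit relation, which is the kind of explanation the abstract describes.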
Searching to Translate and Translating to Search: When Information Retrieval Meets Machine Translation
, 2013
"... With the adoption of web services in daily life, people have access to tremen-dous amounts of information, beyond any human’s reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information i ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human’s reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information in multiple languages, introducing another barrier between people and information. Therefore, search technologies need to handle content written in multiple languages, which requires techniques to account for the linguistic differences. Information Retrieval (IR) is the study of search techniques, in which the task is to find material relevant to a given information need. Cross-Language Information Retrieval (CLIR) is a special case of IR when the search takes place in a multilingual collection. Of course, it is not helpful to retrieve content in languages the user cannot understand. Machine Translation (MT) studies the translation of text from one language into another efficiently (within a reasonable amount of time) and effectively (fluent and retaining the original meaning), which helps people understand what is being written, regardless of the source language.
Calculating semantic relatedness for biomedical use in a knowledge-poor environment
"... Background: Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications relevant to the biomedical domain, specifically due to the huge amount of new findings being published e ..."
Abstract
- Add to MetaCart
(Show Context)
Background: Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications relevant to the biomedical domain, specifically due to the huge amount of new findings being published each year. Most methods benefit from making use of highly specific resources, thus reducing their usability in many real world scenarios that differ from the original assumptions. In this paper we present a simple resource-efficient method for calculating semantic relatedness in a knowledge-poor environment. The method obtains results comparable to state-of-the-art methods, while being more generic and flexible. The solution being presented here was designed to use only a relatively generic and small document corpus and its statistics, without referring to a previously defined knowledge base, thus it does not assume a ‘closed’ problem. Results: We propose a method in which computation for two input texts is based on the idea of comparing the vocabulary associated with the best-fit documents related to those texts. As keyterm extraction is a costly process, it is done in a preprocessing step on a ‘per-document’ basis in order to limit the on-line processing. The actual computations are executed in a compact vector space, limited by the most informative extraction results. The method has been evaluated on five direct benchmarks by calculating correlation coefficients w.r.t. average human
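A minimal sketch of the core idea, under the assumption that per-document keyterms have already been extracted offline: represent each input label by the aggregated keyterm vector of its best-fit documents (here, naively, the documents whose titles contain the label) and compare the two vectors by cosine similarity. The documents, keyterms, and retrieval step are toy stand-ins, not the paper's actual pipeline:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def relatedness(label_a, label_b, doc_keyterms):
    """Compare the keyterm vocabulary of the best-fit documents for each
    label, mirroring the abstract's idea at toy scale."""
    def profile(label):
        vec = Counter()
        for title, terms in doc_keyterms.items():
            if label in title:  # naive best-fit retrieval for the sketch
                vec.update(terms)
        return vec
    return cosine(profile(label_a), profile(label_b))

# Hypothetical per-document keyterms, precomputed in a preprocessing step.
docs = {"report on myocardial infarction": ["heart", "ischemia", "attack"],
        "report on heart attack": ["heart", "attack", "cardiac"],
        "report on influenza": ["virus", "fever", "flu"]}
print(relatedness("myocardial infarction", "heart attack", docs))  # ~0.67
print(relatedness("myocardial infarction", "influenza", docs))     # 0.0
```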
Ranking Entities in a Large Semantic Network
"... Abstract. We present two knowledge-rich methods for ranking entities in a semantic network. Our approach relies on the DBpedia knowledge base for acquiring fine-grained information about entities and their semantic relations. Experiments on a benchmarking dataset show the viability of our approach. ..."
Abstract
- Add to MetaCart
(Show Context)
We present two knowledge-rich methods for ranking entities in a semantic network. Our approach relies on the DBpedia knowledge base for acquiring fine-grained information about entities and their semantic relations. Experiments on a benchmarking dataset show the viability of our approach.
Efficient Computation of Relationship-Centrality in Large Entity-Relationship Graphs
"... Abstract. Given two sets of entities – potentially the results of two queries on a knowledge-graph like YAGO or DBpedia – characterizing the relationship between these sets in the form of important people, events and organizations is an analytics task useful in many domains. In this paper, we presen ..."
Abstract
- Add to MetaCart
(Show Context)
Given two sets of entities – potentially the results of two queries on a knowledge graph like YAGO or DBpedia – characterizing the relationship between these sets in the form of important people, events and organizations is an analytics task useful in many domains. In this paper, we present an intuitive and efficiently computable vertex centrality measure that captures the importance of a node with respect to the explanation of the relationship between the pair of query sets. Using a weighted link graph of entities contained in the English Wikipedia, we demonstrate the usefulness of the proposed measure.
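The paper's measure is not defined in the abstract, but one simple centrality in this spirit scores a vertex v by how many query pairs (a, b) have a shortest path through it, using the standard test d(a, v) + d(v, b) = d(a, b) on an unweighted graph. A sketch over a small invented graph, not the weighted Wikipedia link graph the paper uses:

```python
from collections import deque

def bfs_dist(graph, src):
    """Unweighted shortest-path distances from src via BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def relationship_centrality(graph, set_a, set_b):
    """Score each non-query vertex by the number of (a, b) pairs whose
    shortest paths pass through it: d(a,v) + d(v,b) == d(a,b)."""
    d = {s: bfs_dist(graph, s) for s in set(set_a) | set(set_b)}
    scores = {}
    for v in graph:
        if v in set_a or v in set_b:
            continue
        scores[v] = sum(1 for a in set_a for b in set_b
                        if b in d[a] and v in d[a] and v in d[b]
                        and d[a][v] + d[b][v] == d[a][b])
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy undirected entity graph: "Event" bridges the two query sets.
g = {"A1": ["Event"], "A2": ["Event"], "B1": ["Event"], "B2": ["Event"],
     "Event": ["A1", "A2", "B1", "B2"], "Stray": []}
print(relationship_centrality(g, {"A1", "A2"}, {"B1", "B2"}))
# [('Event', 4), ('Stray', 0)]
```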