KORE: Keyphrase Overlap Relatedness for Entity Disambiguation. CIKM (2012)

by J Hoffart

Results 1 - 10 of 14

Entity Linking meets Word Sense Disambiguation: A Unified Approach

by Andrea Moro, Alessandro Raganato, Roberto Navigli - Transactions of the Association for Computational Linguistics, 2014
"... Entity Linking (EL) and Word Sense Disam-biguation (WSD) both address the lexical am-biguity of language. But while the two tasks are pretty similar, they differ in a fundamen-tal respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while ..."
Abstract - Cited by 46 (17 self) - Add to MetaCart
Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at
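
To illustrate the densest-subgraph step mentioned in the abstract: a standard greedy peeling heuristic repeatedly removes the minimum-degree vertex and remembers the densest intermediate graph. This is a minimal sketch of that generic heuristic, not Babelfy's exact procedure, which additionally constrains which candidate vertices may be removed:

    # A generic greedy densest-subgraph heuristic (peel the minimum-degree
    # vertex, remember the densest intermediate graph). Babelfy applies a
    # constrained variant of this idea to its graph of candidate meanings.
    def densest_subgraph(adj):
        adj = {v: set(ns) for v, ns in adj.items()}  # mutable copy
        best, best_density = set(adj), 0.0
        while adj:
            density = sum(len(ns) for ns in adj.values()) / 2 / len(adj)
            if density >= best_density:
                best, best_density = set(adj), density
            v = min(adj, key=lambda u: len(adj[u]))  # min-degree vertex
            for u in adj.pop(v):
                adj[u].discard(v)
        return best, best_density

    # Toy graph: a triangle {a, b, c} with a pendant vertex d.
    g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(densest_subgraph(g))  # ({'a', 'b', 'c'}, 1.0)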

Citation Context

... of on-line social networking services, such as Twitter and Facebook, have contributed to the development of new methods for the efficient disambiguation of short texts (Ferragina and Scaiella, 2010; Hoffart et al., 2012; Böhm et al., 2012). Thanks to a loose candidate identification technique coupled with a densest subgraph heuristic, we show that our approach is particularly suited for short and highly ambiguous t...

Statistical Analyses of Named Entity Disambiguation Benchmarks

by Nadine Steinmetz, Magnus Knuth, Harald Sack
"... Abstract. In the last years, various tools for automatic semantic annotation of textual information have emerged. The main challenge of all approaches is to solve ambiguity of natural language and assign unique semantic entities according to the present context. To compare the different approaches a ..."
Abstract - Cited by 4 (0 self) - Add to MetaCart
Abstract. In the last years, various tools for the automatic semantic annotation of textual information have emerged. The main challenge of all approaches is to resolve the ambiguity of natural language and assign unique semantic entities according to the present context. To compare the different approaches, a ground truth, namely an annotated benchmark, is essential. But besides the actual disambiguation approach, the achieved evaluation results are also dependent on the characteristics of the benchmark dataset and the expressiveness of the dictionary applied to determine entity candidates. This paper presents statistical analyses and mapping experiments on different benchmarks and dictionaries to identify characteristics and structure of the respective datasets.

Citation Context

...ere was also a more specific entity enlisted occupying their solely possible location, e.g. hypertext markup language has been annotated with dbp:HTML rather than dbp:Markup_language. KORE 50 (AIDA) [7] is a subset of the larger AIDA corpus [8], which is based on the dataset of the CoNLL 2003 NER task. The dataset aims to capture hard to disambiguate mentions of entities and it contains a large numb...

Learning relatedness measures for entity linking

by Diego Ceccarelli, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Salvatore Trani - In Proceedings of CIKM, 2013
"... Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowl-edge base. The most important of such features is entity relate ..."
Abstract - Cited by 4 (2 self) - Add to MetaCart
Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes errors in disambiguating entity-linking. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
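
One common way to realize such a learning-to-rank formulation, sketched below under assumed features and labels, is the pairwise transform: each pair of candidates with different gold relatedness grades becomes a binary classification example over feature differences. The features and the model here are placeholders, not the configuration used in the paper:

    # Pairwise learning-to-rank sketch for an entity relatedness function.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical per-pair features (e.g. link overlap, category overlap)
    # and graded gold relatedness labels for candidates of one mention.
    X = np.array([[0.9, 0.8], [0.4, 0.5], [0.1, 0.2]])
    y = np.array([2, 1, 0])

    # Pairwise transform: difference vectors labeled by preference order.
    diffs, prefs = [], []
    for i in range(len(X)):
        for j in range(len(X)):
            if y[i] != y[j]:
                diffs.append(X[i] - X[j])
                prefs.append(int(y[i] > y[j]))

    model = LogisticRegression().fit(np.array(diffs), np.array(prefs))
    # The learned weights now act as a relatedness scorer: higher = better.
    print(X @ model.coef_.ravel())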

Citation Context

... word is represented in a high dimensional space by considering for each Wikipedia article the relevances of the word in the article, and by summing such score vectors for longer text fragments. Also [13] investigates new text-based relatedness measures that try to go beyond link-based similarities. The study conducted in [18] shows however that ESA has a performance similar to that of ρMilne, with th...

Knowledge harvesting from text and web sources

by Fabian Suchanek, Gerhard Weikum - In: Proceedings of the International Conference on Data Engineering (ICDE)
"... Abstract-The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources has enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
Abstract. The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources has enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, KnowItAll, Probase, ReadTheWeb, and YAGO, as well as industrial ones such as Freebase and Trueknowledge. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disambiguating natural-language text, deep question answering, and semantic search for entities and relations in Web and enterprise data. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications.

Citation Context

...n answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications. I. MOTIVATION. Knowledge harvesting from Web and text sources has become a major research avenue in the last five years. It is the core methodology for the automatic construction of large knowledge bases [1], [2], going beyond manually compiled knowledge collections like Cyc [13], WordNet [9], and a variety of ontologies [17]. Salient projects with publicly available resources include KnowItAll [7], [4], [8], ConceptNet (MIT) [16], DBpedia [3], Freebase [5], NELL [6], WikiTaxonomy [15], and YAGO [18], [11] (our own project at the Max Planck Institute for Informatics). Commercial interest has been strongly growing, with evidence by projects like the Google Knowledge Graph, the EntityCube (Renlifang) project at Microsoft Research [14], and the use of public knowledge bases for type coercion in IBM’s Watson project [12]. These knowledge bases contain many millions of entities, organized in hundreds to hundred thousands of semantic classes, and hundred millions of relational facts between entities. A...

Methods for Exploring and Mining Tables on Wikipedia

by Chandra Sekhar Bhagavatula, Thanapon Noraset, Doug Downey
"... Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application ..."
Abstract - Cited by 2 (1 self) - Add to MetaCart
Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, we show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover “interesting” relationships between table columns. We find that a “Semantic Relatedness” measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, we show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. Our work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.
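
The link-structure "Semantic Relatedness" measure referred to here is usually the Milne-Witten adaptation of the Normalized Google Distance computed over Wikipedia inlinks. A minimal sketch, assuming the inlink sets and the total article count are already available:

    # Milne-Witten link-based semantic relatedness over Wikipedia inlinks:
    # 0.0 = unrelated, values approaching 1.0 = strongly related.
    import math

    def milne_witten(inlinks_a, inlinks_b, total_articles):
        a, b = set(inlinks_a), set(inlinks_b)
        common = a & b
        if not common:
            return 0.0
        ngd = (math.log(max(len(a), len(b))) - math.log(len(common))) / \
              (math.log(total_articles) - math.log(min(len(a), len(b))))
        return max(0.0, 1.0 - ngd)

    # Toy inlink sets over a hypothetical 1,000-article Wikipedia.
    print(milne_witten({1, 2, 3, 4}, {3, 4, 5}, total_articles=1000))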

Citation Context

...between Wikipedia topics using the link graph [11, 20] is especially valuable for the relevant join task. Experimenting with other SR measures that utilize other inputs, beyond Wikipedia links (e.g., [21, 22]), is an item of future work. 3 Task Definitions. In this section, we formally define our three tasks: relevant join, correlation mining and table search. All three tasks utilize a corpus of tables ext...

Is Brad Pitt Related to Backstreet Boys? Exploring Related Entities

by Nitish Aggarwal, Kartik Asooja, Paul Buitelaar, Gabriela Vulcu
"... Abstract. More than 40 % web search queries pivot around a single en-tity, which can be interpreted with the available knowledge bases such as DBpedia and Freebase. With the advent of these knowledge repos-itories, major search engines are interested in suggesting the related entities to the users. ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
Abstract. More than 40% of web search queries pivot around a single entity, which can be interpreted with the available knowledge bases such as DBpedia and Freebase. With the advent of these knowledge repositories, major search engines are interested in suggesting the related entities to the users. These related entities provide an opportunity to the users to extend their knowledge. Related entities can be obtained from knowledge bases by retrieving the directly linked entities. However, many popular entities may have more than 1,000 directly connected entities without defining any associativity weight, and these knowledge bases mainly cover some specific types of relations. For instance, “Brad Pitt” can be considered related to “Tom Cruise” but they are not directly connected in the DBpedia graph with any relation. Therefore, it is worth finding the related entities beyond the relations explicitly defined in knowledge graphs, and a ranking method is also required to quantify the relations between the entities. In this work, we present the Entity Relatedness Graph (EnRG), which provides a ranked list of related entities to a given entity and clusters them by their DBpedia types. Moreover, EnRG helps a user in discovering the provenance of implicit relations between two entities. With the help of EnRG, one can easily explore even the weak relations that are not explicitly included in the available knowledge bases, for instance: why is “Brad Pitt” related to “Backstreet Boys”?

Citation Context

...ilne [10] applied the Google distance metric [3] on incoming links to Wikipedia. These approaches perform only for the entities which appear on Wikipedia. KORE [5] eliminates this issue by computing the relatedness scores between the context of two entities. It observes the partial overlaps between the concepts (key-phrases) appearing in the contexts of the giv...
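
As a rough illustration of the keyphrase-overlap idea attributed to KORE in this snippet, the sketch below compares the token multisets of two entities' keyphrase contexts with a weighted Jaccard score. The real KORE measure is more refined (phrases and tokens carry mutual-information weights and partial phrase matches are scored), and the entities and keyphrases here are invented:

    # Simplified flavor of keyphrase-overlap relatedness: weighted Jaccard
    # over the tokens of each entity's keyphrase context.
    from collections import Counter

    def keyphrase_overlap(phrases_a, phrases_b):
        ta = Counter(t for p in phrases_a for t in p.lower().split())
        tb = Counter(t for p in phrases_b for t in p.lower().split())
        inter = sum(min(ta[t], tb[t]) for t in ta.keys() & tb.keys())
        union = sum((ta | tb).values())
        return inter / union if union else 0.0

    brad_pitt = ["academy award nominee", "fight club actor", "hollywood star"]
    backstreet_boys = ["pop boy band", "hollywood walk of fame star"]
    print(keyphrase_overlap(brad_pitt, backstreet_boys))  # small but nonzero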

SEARCHING TO TRANSLATE AND TRANSLATING TO SEARCH: WHEN INFORMATION RETRIEVAL MEETS MACHINE TRANSLATION

by Ferhan Ture, 2013
"... With the adoption of web services in daily life, people have access to tremen-dous amounts of information, beyond any human’s reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information i ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human’s reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information in multiple languages, introducing another barrier between people and information. Therefore, search technologies need to handle content written in multiple languages, which requires techniques to account for the linguistic differences. Information Retrieval (IR) is the study of search techniques, in which the task is to find material relevant to a given information need. Cross-Language Information Retrieval (CLIR) is a special case of IR when the search takes place in a multilingual collection. Of course, it is not helpful to retrieve content in languages the user cannot understand. Machine Translation (MT) studies the translation of text from one language into another efficiently (within a reasonable amount of time) and effectively (fluent and retaining the original meaning), which helps people understand what is being written, regardless of the source language.

Calculating semantic relatedness for biomedical use in a knowledge-poor environment

by Maciej Rybinski, José Francisco Aldana-Montes
"... Background: Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications relevant to the biomedical domain, specifically due to the huge amount of new findings being published e ..."
Abstract - Add to MetaCart
Background: Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications relevant to the biomedical domain, specifically due to the huge amount of new findings being published each year. Most methods benefit from making use of highly specific resources, thus reducing their usability in many real world scenarios that differ from the original assumptions. In this paper we present a simple resource-efficient method for calculating semantic relatedness in a knowledge-poor environment. The method obtains results comparable to state-of-the-art methods, while being more generic and flexible. The solution being presented here was designed to use only a relatively generic and small document corpus and its statistics, without referring to a previously defined knowledge base, thus it does not assume a ‘closed’ problem. Results: We propose a method in which computation for two input texts is based on the idea of comparing the vocabulary associated with the best-fit documents related to those texts. As keyterm extraction is a costly process, it is done in a preprocessing step on a ‘per-document’ basis in order to limit the on-line processing. The actual computations are executed in a compact vector space, limited by the most informative extraction results. The method has been evaluated on five direct benchmarks by calculating correlation coefficients w.r.t. average human...
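
The pipeline described in the Results paragraph (per-document keyterm vectors computed offline, aggregated at query time for each input text, then compared) can be sketched as follows. The retriever and the precomputed keyterm weights are stand-ins for components the paper builds from a document corpus:

    # Sketch of knowledge-poor relatedness: represent each text by the
    # aggregated keyterm vectors of its best-fit documents, then compare
    # the aggregates with cosine similarity.
    import math
    from collections import Counter

    def cosine(u, v):
        dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def text_vector(text, retrieve, doc_keyterms, k=5):
        vec = Counter()
        for doc_id in retrieve(text, k):      # ids of k best-fit documents
            vec.update(doc_keyterms[doc_id])  # precomputed keyterm weights
        return vec

    def relatedness(a, b, retrieve, doc_keyterms):
        return cosine(text_vector(a, retrieve, doc_keyterms),
                      text_vector(b, retrieve, doc_keyterms))

    # Toy usage with a two-document corpus and a trivial retriever.
    corpus = {"d1": {"insulin": 2.0, "glucose": 1.5},
              "d2": {"glucose": 1.0, "diabetes": 2.0}}
    top_k = lambda text, k: list(corpus)  # real system: BM25-style ranking
    print(relatedness("insulin", "blood sugar", top_k, corpus))  # 1.0 here,
    # trivially, because the stub retriever returns the same documents.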

Citation Context

...he best fit documents. The comparison of vectors results in a numerical approximation of semantic similarity between the respective textual representations. This idea is similar to those presented in [18] (general domain) and in [19], only here the vectors are obtained for each document in the preprocessing and then aggregated at runtime for each lexicalization in the function of the most suitable doc...

Ranking Entities in a Large Semantic Network

by Michael Schuhmacher, Simone Paolo Ponzetto
"... Abstract. We present two knowledge-rich methods for ranking entities in a semantic network. Our approach relies on the DBpedia knowledge base for acquiring fine-grained information about entities and their semantic relations. Experiments on a benchmarking dataset show the viability of our approach. ..."
Abstract - Add to MetaCart
Abstract. We present two knowledge-rich methods for ranking entities in a semantic network. Our approach relies on the DBpedia knowledge base for acquiring fine-grained information about entities and their semantic relations. Experiments on a benchmarking dataset show the viability of our approach.

Citation Context

...

                    |       Path-based                |       Graph-based
                    | baseline jointIC combIC IC+PMI  | baseline jointIC combIC IC+PMI
Hollywood Celebr.   |  0.639    0.541   0.690  0.661  |  0.439    0.506   0.417  0.401
IT Companies        |  0.559    0.636   0.644  0.583  |  0.355    0.446   0.298  0.278
Television Series   |  0.529    0.595   0.643  0.602  |  0.302    0.473   0.300  0.280
Video Games         |  0.451    0.562   0.532  0.484  |  0.552    0.519   0.434  0.424
Chuck Norris        |  0.458    0.409   0.558  0.506  |  0.448    0.544   0.425  0.291
All                 |  0.541    0.575   0.624  0.579  |  0.414    0.489   0.365  0.343

Table 1. Performance on the entity ranking KORE dataset.

3 Experiments

Experimental setting. We use the KORE entity ranking dataset [2], consisting of 21 different reference entities from four different domains. Relatedness assessments were obtained using a crowd-sourcing approach. We evaluate using Spearman’s rank correlation (ρ) and DBpedia 3.8 as knowledge base.

Results and discussion. The results in Table 1 indicate that weighing paths based on their information content (as introduced in [4]) consistently outperforms a baseline approach that simply computes entity relatedness as a function of distance in the network. In the case of the path-based approach, the best weighting schema is combIC, which achieves an average inc...
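
The Spearman evaluation mentioned in the snippet above is easy to reproduce for a single reference entity. A small sketch with SciPy; the gold ranks and system scores are invented:

    # Spearman's rank correlation between gold ranks (1 = most related)
    # and a system's relatedness scores (higher = more related).
    from scipy.stats import spearmanr

    gold_ranks = [1, 2, 3, 4, 5]
    system_scores = [0.90, 0.70, 0.75, 0.40, 0.20]
    rho, _ = spearmanr(gold_ranks, [-s for s in system_scores])
    print(round(rho, 3))  # 0.9 for these invented scores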

Efficient Computation of Relationship-Centrality in Large Entity-Relationship Graphs

by Stephan Seufert, Srikanta J. Bedathur, Johannes Hoffart, Andrey Gubichev, Klaus Berberich
"... Abstract. Given two sets of entities – potentially the results of two queries on a knowledge-graph like YAGO or DBpedia – characterizing the relationship between these sets in the form of important people, events and organizations is an analytics task useful in many domains. In this paper, we presen ..."
Abstract - Add to MetaCart
Abstract. Given two sets of entities – potentially the results of two queries on a knowledge-graph like YAGO or DBpedia – characterizing the relationship between these sets in the form of important people, events and organizations is an analytics task useful in many domains. In this paper, we present an intuitive and efficiently computable vertex centrality measure that captures the importance of a node with respect to the explanation of the relationship between the pair of query sets. Using a weighted link graph of entities contained in the English Wikipedia, we demonstrate the usefulness of the proposed measure.

Citation Context

...he corresponding edge-weighting schemes we envision can be based on the graph structure (Milne-Witten inlink overlap measure [6]) or textual representations of the entities (keyphrase overlap measure [4]), among others. Relationship centrality takes into account different paths than betweenness centrality (which regards the shortest paths between all pairs of vertices). For every vertex in the graph ...
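
For contrast with the paper's relationship-centrality measure, the betweenness baseline it mentions is straightforward to compute on a small weighted entity graph with networkx. The entities and weights below are invented, and relatedness scores would first be converted into distances:

    # Betweenness-centrality baseline on a toy weighted entity graph.
    # networkx interprets 'weight' as a distance, so relatedness scores
    # would be inverted (e.g. 1 - relatedness) before this call.
    import networkx as nx

    g = nx.Graph()
    g.add_weighted_edges_from([
        ("Angela_Merkel", "Germany", 0.1),
        ("Germany", "European_Union", 0.2),
        ("European_Union", "France", 0.2),
        ("France", "Nicolas_Sarkozy", 0.1),
    ])
    print(nx.betweenness_centrality(g, weight="weight"))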
