Results 1 - 3 of 3
A cross-lingual dictionary for English Wikipedia concepts - In LREC, 2012
Abstract - Cited by 37 (2 self)
We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal interoperability, we release our resource as a set of flat line-based text files, lexicographically sorted and encoded with UTF-8. These files capture joint probability distributions underlying concepts (we use the terms article, concept and Wikipedia URL interchangeably) and associated snippets of text, as well as other features that can come in handy when working with Wikipedia articles and related information. Keywords: cross-language information retrieval (CLIR), entity linking (EL), Wikipedia.
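The abstract describes flat, line-based, lexicographically sorted UTF-8 files that associate strings with distributions over Wikipedia articles. As a minimal sketch of how such a resource might be consumed, the snippet below assumes a hypothetical tab-separated layout (`string<TAB>probability<TAB>article_url`); the paper's actual file layout may differ.

```python
from collections import defaultdict

def load_dictionary(lines):
    """Build a map from surface strings to a distribution over concepts.

    Assumes a hypothetical tab-separated line layout:
        string \t probability \t article_url
    The real resource's columns may be ordered differently.
    """
    dictionary = defaultdict(dict)
    for line in lines:
        string, prob, url = line.rstrip("\n").split("\t")
        dictionary[string][url] = float(prob)
    return dictionary

# Illustrative sample lines, not taken from the released files.
sample = [
    "Paris\t0.92\ten.wikipedia.org/wiki/Paris",
    "Paris\t0.05\ten.wikipedia.org/wiki/Paris,_Texas",
]
dictionary = load_dictionary(sample)
```

Because the files are lexicographically sorted, a reader that cannot hold the whole resource in memory could instead binary-search the file on disk by surface string.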
Stanford-UBC Entity Linking at TAC-KBP, Again
Abstract - Cited by 4 (2 self)
This paper describes the joint Stanford-UBC knowledge base population system for the entity linking tasks. We participated in both the English and the cross-lingual tasks, using a dictionary from strings to possible Wikipedia titles, taken from our 2009 submission. This dictionary is based on frequencies of Wikipedia back-links, and it provides a strong context-independent baseline. For the English track, we improved on the results given by the dictionary by disambiguating entities using a distantly supervised classifier, trained on context extracted from Wikipedia. Since we did not use any text from the Wikipedia pages associated with the knowledge base nodes for the dictionary, we submitted that run to the no wiki text track, and the one using the distantly supervised classifier to the wiki text track. Our work focused on disambiguating among articles, allowing for very simple NIL strategies: the system returned NIL whenever selected Wikipedia articles were not present in the KB; moreover, NILs were then clustered only according to the target string. These simple approaches were sufficient for our runs to score above the median entry in each of their respective tracks for the English task; for the cross-lingual task, there was only one track, and our submissions (using the English-specific, context-independent dictionaries) fell below the median.
Cross-lingual named entity extraction and disambiguation
Abstract
Abstract. We propose a method for identifying and disambiguating named entities in a scenario where the language of the input text differs from the language of the knowledge base. We demonstrate this functionality on English and Slovene named entity disambiguation.