Strong baselines for cross-lingual entity linking (0)

by V I Spitkovsky, A X Chang

Results 1 - 3 of 3

A cross-lingual dictionary for English Wikipedia concepts

by Valentin I. Spitkovsky, Angel X. Chang - In LREC, 2012
Abstract - Cited by 37 (2 self)
We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal interoperability, we release our resource as a set of flat line-based text files, lexicographically sorted and encoded with UTF-8. These files capture joint probability distributions underlying concepts (we use the terms article, concept and Wikipedia URL interchangeably) and associated snippets of text, as well as other features that can come in handy when working with Wikipedia articles and related information. Keywords: cross-language information retrieval (CLIR), entity linking (EL), Wikipedia.

Citation Context

...irre et al., 2009, §2.2). Many other low-level implementation details are in the rest of its section about the dictionary (Agirre et al., 2009, §2) and in the latest, cross-lingual system description (Spitkovsky and Chang, 2011). 4. From Strings to Concepts Let us first discuss using the dictionary as a mapping from strings s to canonical URLs of English Wikipedia concepts. Table 1 shows the scores of all entries that match ...
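
The bi-directional mapping described in the abstract above can be illustrated with a minimal sketch. It assumes the dictionary is estimated from counts of (string, article URL) pairs; the function name `build_dictionary` and the toy pairs are hypothetical and are not taken from the released resource, whose actual file format and scoring are not shown in this listing.

```python
from collections import Counter, defaultdict


def build_dictionary(pairs):
    """Estimate both directions of the mapping from (string, url) observations.

    Returns two dicts:
      string -> {url: P(url | string)}
      url    -> {string: P(string | url)}
    This is an illustrative relative-frequency estimate, not the authors' code.
    """
    joint = Counter(pairs)          # joint counts over (string, url)
    string_totals = Counter()       # marginal counts per string
    url_totals = Counter()          # marginal counts per url
    for (s, url), n in joint.items():
        string_totals[s] += n
        url_totals[url] += n

    by_string = defaultdict(dict)
    by_url = defaultdict(dict)
    for (s, url), n in joint.items():
        by_string[s][url] = n / string_totals[s]
        by_url[url][s] = n / url_totals[url]
    return dict(by_string), dict(by_url)


# Invented toy observations: (anchor text, target article).
PAIRS = [
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris_Hilton"),
    ("París", "Paris"),
]

string_to_urls, url_to_strings = build_dictionary(PAIRS)
```

Both directions are derived from the same joint counts, which is one simple way to read "the same fundamental probabilistic methods" in the abstract; the non-English string "París" shows how a cross-lingual string can map to an English concept.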

Stanford-UBC Entity Linking at TAC-KBP, Again

by Angel X. Chang, Valentin I. Spitkovsky, Eneko Agirre, Christopher D. Manning
Abstract - Cited by 4 (2 self)
This paper describes the joint Stanford-UBC knowledge base population system for the entity linking tasks. We participated in both the English and the cross-lingual tasks, using a dictionary from strings to possible Wikipedia titles, taken from our 2009 submission. This dictionary is based on frequencies of Wikipedia back-links, and it provides a strong context-independent baseline. For the English track, we improved on the results given by the dictionary by disambiguating entities using a distantly supervised classifier, trained on context extracted from Wikipedia. Since we did not use any text from the Wikipedia pages associated with the knowledge base nodes for the dictionary, we submitted that run to the no wiki text track, and the one using the distantly supervised classifier to the wiki text track. Our work focused on disambiguating among articles, allowing for very simple NIL strategies: the system returned NIL whenever selected Wikipedia articles were not present in the KB; moreover, NILs were then clustered only according to the target string. These simple approaches were sufficient for our runs to score above the median entry in each of their respective tracks for the English task; for the cross-lingual task, there was only one track, and our submissions (using the English-specific, context-independent dictionaries) fell below the median.

Citation Context

...ular using more sophisticated NIL-clustering techniques. For cross-lingual entity linking, our monolingual components did less well, but a related approach presented by the Stanford team fared better [SC11]. Their cross-lingual dictionaries and associated components are in the process of being released [SC12]. However, several conceptually-simple extensions to how our monolingual dictionaries were con...
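
The context-independent baseline and the simple NIL strategies described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the Stanford-UBC system: the function `link_mentions`, the toy dictionary, and the toy knowledge base are all hypothetical, and treating a mention with no dictionary entry as NIL is an assumption added here.

```python
def link_mentions(mentions, string_to_urls, kb_urls):
    """Link each mention to its most probable article, ignoring context.

    A mention becomes NIL when the selected article is not in the KB.
    NILs are clustered only by the target string, so every NIL for the
    same mention string receives the same cluster label.
    """
    links = []
    for mention in mentions:
        candidates = string_to_urls.get(mention, {})
        # Pick the highest-scoring article; None if the string is unknown
        # (assumption: unknown strings are also treated as NIL).
        best = max(candidates, key=candidates.get) if candidates else None
        if best is not None and best in kb_urls:
            links.append((mention, best))
        else:
            links.append((mention, "NIL:" + mention))
    return links


# Invented toy dictionary scores and knowledge base.
DICTIONARY = {
    "Paris": {"Paris": 0.75, "Paris_Hilton": 0.25},
    "Stanford": {"Stanford_University": 0.9, "Stanford,_California": 0.1},
}
KB = {"Paris"}

results = link_mentions(["Paris", "Stanford", "Stanford", "Xyzzy"], DICTIONARY, KB)
```

Here both "Stanford" mentions fall into one NIL cluster because the selected article is missing from the toy KB, which mirrors the string-only clustering the abstract describes.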

Cross-lingual named entity extraction and disambiguation

by Tadej Štajner, Dunja Mladenić
Abstract
We propose a method for identifying and disambiguating named entities in a scenario where the language of the input text differs from the language of the knowledge base. We demonstrate this functionality on English and Slovene named entity disambiguation.

Citation Context

...verage, but it performed much worse than a monolingual scenario. Another simple baseline uses just the context-independent ‘mention popularity’ measure, backed by a dictionary [2]. The dictionary can be constructed by looking at anchor texts of links from non-English to English Wikipedia pages. An ideal system would be the one that would simply translate the document in the desired l...
