Results 1 - 10
of
153
Introduction to the special issue on word sense disambiguation
- Computational Linguistics J
, 1998
"... ..."
Measures of Distributional Similarity
- In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics
, 1999
"... We study distributional similarity measures for the purpose of improving probability estimation for unseen cooccurrences. Our contributions are three-fold: an empirical comparison of a broad range of measures; a classification of similarity functions based on the information that they incorporate; a ..."
Abstract
-
Cited by 173 (2 self)
- Add to MetaCart
We study distributional similarity measures for the purpose of improving probability estimation for unseen cooccurrences. Our contributions are three-fold: an empirical comparison of a broad range of measures; a classification of similarity functions based on the information that they incorporate; and the introduction of a novel function that is superior at evaluating potential proxy distributions.
Forgetting Exceptions is Harmful in Language Learning
- MACHINE LEARNING, SPECIAL ISSUE ON NATURAL LANGUAGE LEARNING
, 1999
"... We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, pa ..."
Abstract
-
Cited by 94 (38 self)
- Add to MetaCart
We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
Distinguishing Systems and Distinguishing Senses: New Evaluation Methods for Word Sense Disambiguation
, 1998
"... Resnik and Yarowsky (1997) made a set of observations about the state of the art in automatic word sense disambiguation and, motivated by those observations, offered several specific proposals regarding improved evaluation criteria, common training and testing resources, and the definition of sense ..."
Abstract
-
Cited by 88 (8 self)
- Add to MetaCart
Resnik and Yarowsky (1997) made a set of observations about the state of the art in automatic word sense disambiguation and, motivated by those observations, offered several specific proposals regarding improved evaluation criteria, common training and testing resources, and the definition of sense inventories. Subsequent discussion of those proposals resulted in senseval, the first evaluation exercise for word sense disambiguation (Kilgarriff and Palmer forthcoming). This article is a revised and extended version of our 1997 workshop paper, reviewing its observations and proposals and discussing them in light of the senseval exercise. It also includes a new in-depth empirical study of translingually-based sense inventories and distance measures, using statistics collected from native-speaker annotations of 222 polysemous contexts across 12 languages. These data show that monolingual sense distinctions at most levels of granularity can be effectively captured by translations into some ...
Relational Learning Techniques for Natural Language Information Extraction
, 1998
"... The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a t ..."
Abstract
-
Cited by 73 (4 self)
- Add to MetaCart
The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This paper presents a novel rule representation specific to natural language and a learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be ext...
Similarity-based models of word cooccurrence probabilities
- Machine Learning
, 1999
"... Abstract. In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach ” and “eat a beach ” is more likely. Statistical NLP met ..."
Abstract
-
Cited by 70 (0 self)
- Add to MetaCart
Abstract. In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach ” and “eat a beach ” is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on “most similar ” words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similaritybased method yields a 20 % perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similaritybased methods perform up to 40 % better on this particular task.
Distinguishing Word Senses in Untagged Text
- In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing
"... This paper describes an experimental com- parison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. ..."
Abstract
-
Cited by 59 (15 self)
- Add to MetaCart
This paper describes an experimental com- parison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text.
The Interaction of Knowledge Sources for Word Sense Disambiguation
- Computational Linguistics
, 2001
"... Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial in telligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most ..."
Abstract
-
Cited by 58 (2 self)
- Add to MetaCart
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial in telligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94 % on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems. 1.
A Method for Word Sense Disambiguation of Unrestricted Text
, 1999
"... Selecting the most appropriate sense for an ambiguous word in a sentence is a central problem in Natural Language Processing. In this paper, we present a method that attempts to disambiguate all the nouns, verbs, adverbs and adjectives in a text, using the senses pro- vided in WordNet. The senses ar ..."
Abstract
-
Cited by 57 (6 self)
- Add to MetaCart
Selecting the most appropriate sense for an ambiguous word in a sentence is a central problem in Natural Language Processing. In this paper, we present a method that attempts to disambiguate all the nouns, verbs, adverbs and adjectives in a text, using the senses pro- vided in WordNet. The senses are ranked us- ing two sources of information: (1) the Inter- net for gathering statistics for word-word co- occurrences and (2)'WordNet for measuring the semantic density for a pair of words. We report an average accuracy of 80% for the first ranked sense, and 91% for the first two ranked senses. Extensions of this method for larger windows of more than two words are considered.
Wikify!: linking documents to encyclopedic knowledge
- In CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
, 2007
"... This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system ..."
Abstract
-
Cited by 57 (3 self)
- Add to MetaCart
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations. providing the users a quick way of accessing additional information. Wikipedia contributors perform these annotations by hand following a Wikipedia“manual of style,”which gives guidelines concerning the selection of important concepts in a text, as well as the assignment of links to appropriate related articles. For instance, Figure 1 shows an example of a Wikipedia page, including the definition for one of the meanings of the word “plant.”

