Automatic Word Sense Discrimination
- Journal of Computational Linguistics
, 1998
Cited by 272 (0 self)
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
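The second-order co-occurrence idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the window size, the function names, and the use of a plain sum of word vectors as the context representation are all assumptions made for illustration, and the clustering step that would group contexts into senses is left out.

```python
from collections import Counter, defaultdict
from math import sqrt

def word_vectors(corpus, window=2):
    """First-order co-occurrence counts for every word in a tokenized corpus."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors

def context_vector(context, vectors, target):
    """Second-order representation: sum of the word vectors of the context words."""
    total = Counter()
    for word in context:
        if word != target:
            total.update(vectors.get(word, Counter()))
    return total

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical training corpus (assumption, not data from the paper).
training = [
    ["money", "deposit", "loan", "interest"],
    ["loan", "interest", "credit", "money"],
    ["cash", "credit", "deposit", "loan"],
    ["river", "water", "shore", "stream"],
    ["stream", "water", "fishing", "shore"],
    ["fishing", "river", "stream", "water"],
]
vectors = word_vectors(training)

# Three contexts of the ambiguous word "bank"; the first two share no context word.
v1 = context_vector(["bank", "money", "deposit"], vectors, "bank")
v2 = context_vector(["bank", "cash", "credit"], vectors, "bank")
v3 = context_vector(["bank", "river", "fishing"], vectors, "bank")

same = cosine(v1, v2)       # both financial contexts
different = cosine(v1, v3)  # financial vs. river context
print(same > different)
```

The first two contexts have no word in common besides the target, yet they come out as similar because their context words occur with similar words in the training corpus. A clustering algorithm applied to such vectors would then yield the sense groups.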
Introduction to the special issue on word sense disambiguation
- Computational Linguistics J
, 1998
Indexing with WordNet synsets can improve text retrieval
, 1998
Cited by 110 (2 self)
The classical, vector space model for text retrieval is shown to give better results (up to 29% better in our experiments) if WordNet synsets are chosen as the indexing space, instead of word forms. This result is obtained for a manually disambiguated test collection (of queries and documents) derived from the SEMCOR semantic concordance. The sensitivity of retrieval performance to (automatic) disambiguation errors when indexing documents is also measured. Finally, it is observed that if queries are not disambiguated, indexing by synsets performs (at best) only as good as standard word indexing.
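The contrast between indexing by word forms and indexing by synsets can be sketched as follows. This is a toy illustration, not the experimental setup of the paper: the synset identifiers such as "syn:automobile" are made-up labels rather than real WordNet identifiers, the tokens are assumed to be disambiguated by hand, and raw term counts stand in for whatever weighting scheme the authors used.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def index(tokens, by_synset):
    """Build a term vector from (word_form, synset_id) pairs.

    by_synset=False indexes on word forms, by_synset=True on synset ids.
    """
    return Counter(synset if by_synset else form for form, synset in tokens)

# Hypothetical, manually disambiguated query and documents (assumption).
query = [("car", "syn:automobile"), ("price", "syn:price")]
docs = {
    "doc_a": [("automobile", "syn:automobile"), ("cost", "syn:price"),
              ("dealer", "syn:dealer")],
    "doc_b": [("car", "syn:railcar"), ("train", "syn:train"),
              ("station", "syn:station")],
}

def rank(by_synset):
    """Score every document against the query in the chosen indexing space."""
    q = index(query, by_synset)
    return {name: cosine(q, index(tokens, by_synset)) for name, tokens in docs.items()}

word_scores = rank(by_synset=False)
synset_scores = rank(by_synset=True)
print(word_scores)
print(synset_scores)
```

With word forms, the query matches only the railway document because of the shared string "car". With synsets, it matches the document about automobiles even though that document uses different word forms.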
Word sense disambiguation: The state of the art
- Computational Linguistics
, 1998
Cited by 92 (3 self)
The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950's. Sense disambiguation is an “intermediate task” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or another to accomplish most natural language processing tasks. It is
Using wordnet in a knowledge-based approach to information retrieval
, 1995
Cited by 67 (0 self)
The application of natural language processing tools and techniques to information retrieval tasks has long since been identified as potentially useful for the quality of information retrieval. Traditionally, IR has been based on matching words or terms in a query with words or terms in a document. In this paper we introduce an approach to IR based on computing a semantic distance measurement between concepts or words and using this word distance to compute a similarity between a query and a document. Two such semantic distance measures are presented in this paper and both are benchmarked on queries and documents from the TREC collection. Although our results in terms of precision and recall are disappointing, we rationalise this in terms of our experimental setup and our results show promise for future work in this area.
Using NLP or NLP Resources for Information Retrieval Tasks
- Natural Language Information Retrieval
, 1997
Cited by 32 (1 self)
The impact of NLP on information retrieval tasks has largely been one of promise rather than substance. While there are exceptions to this, as some of the chapters in the present volume demonstrate, for the most part NLP and information retrieval have only recently started to dovetail together. In this chapter we will present a précis of our experiments in information retrieval using NLP, which have had mixed success over the last few years. We introduce the respective roles of NLP and IR and then we summarise our early experiments on using syntactic analysis to derive term dependencies and structured representations of term-term relationships. We then re-thought the role that NLP could have for IR tasks and decided to concentrate our efforts onto using NLP resources rather than NLP tools in information retrieval, and our more recent experiments in this area, in which we use WordNet, are summarised. Finally we present our conclusions and the status of our work.
Semantic Indexing using WordNet Senses
- IN PROCEEDINGS OF ACL WORKSHOP ON IR & NLP, HONG KONG
, 2000
Cited by 30 (3 self)
We describe in this paper a boolean Information Retrieval system that adds word semantics to the classic word-based indexing. Two of the main tasks of our system, namely the indexing and retrieval components, are using a combined word-based and sense-based approach. The key to
Word sense disambiguation: a survey
- ACM COMPUTING SURVEYS
, 2009
Cited by 28 (9 self)
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.
Natural Language Processing and Information Retrieval
- Information Extraction: Towards Scalable, Adaptable Systems
, 1999
Cited by 25 (0 self)
Information retrieval addresses the problem of finding those documents whose content matches a user's request from among a large collection of documents. Currently, the most successful general purpose retrieval methods are statistical methods that treat text as little more than a bag of words. However, attempts to improve retrieval performance through more sophisticated linguistic processing have been largely unsuccessful. Indeed, unless done carefully, such processing can degrade retrieval effectiveness. Several factors contribute to the difficulty of improving on a good statistical baseline including: the forgiving nature but broad coverage of the typical retrieval task; the lack of good weighting schemes for compound index terms; and the implicit linguistic processing inherent in the statistical methods. Natural language processing techniques may be more important for related tasks such as question answering or document summarization.
HyperLex: Lexical Cartography for Information Retrieval
- TO APPEAR IN COMPUTER SPEECH AND LANGUAGE SPECIAL ISSUE ON WORD SENSE DISAMBIGUATION
Cited by 24 (0 self)
This article describes an algorithm called HyperLex that is capable of automatically determining word uses in a textbase without recourse to a dictionary. The algorithm makes use of the specific properties of word cooccurrence graphs, which are shown as having "small world" properties. Unlike earlier dictionary-free methods based on word vectors, it can isolate highly infrequent uses (as rare as 1% of all occurrences) by detecting "hubs" and high-density components in the cooccurrence graphs. The algorithm is applied here to information retrieval on the Web, using a set of highly ambiguous test words. An evaluation of the algorithm showed that it only omitted a very small number of relevant uses. In addition, HyperLex offers automatic tagging of word uses in context with excellent precision (97%, compared to 73% for baseline tagging, with an 82% recall rate). Remarkably good precision (96%) was also achieved on a selection of the 25 most relevant pages for each use (including highly infrequent ones). Finally, HyperLex is combined with a graphic display technique that allows the user to navigate visually through the lexicon and explore the various domains detected for each word use.

