Improving the effectiveness of information retrieval with local context analysis.
- ACM Trans. Inf. Syst., 2000
- Cited by 201 (5 self)
Techniques for automatic query expansion have been extensively studied in information retrieval research as a means of addressing the word mismatch between queries and documents. These techniques can be categorized as either global or local. While global techniques rely on analysis of a whole collection to discover word relationships, local techniques emphasize analysis of the top-ranked documents retrieved for a query. While local techniques have been shown to be more effective than global techniques in general, existing local techniques are not robust and can seriously hurt retrieval when few of the retrieved documents are relevant. We propose a new technique, called local context analysis, which selects expansion terms based on co-occurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.
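The idea in this abstract, choosing expansion terms by how often they co-occur with the query terms in the top-ranked documents, can be illustrated with a minimal sketch. The function name `expansion_terms`, the token-list document representation, and the log-based scoring are assumptions made for illustration only; they are not the scoring formula of the cited paper.

```python
import math
from collections import Counter


def expansion_terms(query_terms, top_docs, k=3):
    """Rank candidate terms by co-occurrence with the query terms.

    top_docs is a list of token lists (the top-ranked documents).
    The scoring below is a simplified, hypothetical stand-in and
    not the formula used in the cited paper.
    """
    query = set(query_terms)
    doc_counts = [Counter(doc) for doc in top_docs]
    # Candidates are all terms in the top documents except the query terms.
    vocabulary = set().union(*top_docs) - query
    scores = {}
    for cand in vocabulary:
        score = 0.0
        for q in query:
            # Co-occurrence: product of term frequencies, summed over documents.
            co = sum(counts[cand] * counts[q] for counts in doc_counts)
            # log1p dampens very frequent pairs and maps zero counts to zero.
            score += math.log1p(co)
        scores[cand] = score
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


docs = [
    ["query", "expansion", "retrieval", "feedback"],
    ["query", "expansion", "terms"],
    ["stemming", "algorithms"],
]
print(expansion_terms(["query"], docs, k=2))
```

Because only documents that contain a query term contribute to a candidate's score, terms from an unrelated top-ranked document (here, the third one) rank last.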
Resolving Ambiguity for Cross-language Retrieval
- In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998
- Cited by 200 (4 self)
One of the main hurdles to improved CLIR effectiveness is resolving ambiguity associated with translation. Availability of resources is also a problem. First we present a technique based on co-occurrence statistics from unlinked corpora which can be used to reduce the ambiguity associated with phrasal and term translation. We then combine this method with other techniques for reducing ambiguity and achieve more than 90% monolingual effectiveness. Finally, we compare the co-occurrence method with parallel corpus and machine translation techniques and show that good retrieval effectiveness can be achieved without complex resources.
Phrasal Translation and Query Expansion Techniques for Cross-Language Information Retrieval
- In Proceedings of the 20th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1997
- Cited by 199 (3 self)
Dictionary methods for cross-language information retrieval give performance below that for mono-lingual retrieval. Failure to translate multi-term phrases has been shown to be one of the factors responsible for the errors associated with dictionary methods. First, we study the importance of phrasal translation for this approach. Second, we explore the role of phrases in query expansion via local context analysis and local feedback and show how they can be used to significantly reduce the error associated with automatic dictionary translation.
1 Introduction
The development of IR systems for languages other than English has focused on building mono-lingual systems. Increased availability of on-line text in languages other than English and increased multi-national collaboration have motivated research in cross-language information retrieval (CLIR), the development of systems to perform retrieval across languages. There have been three main approaches to CLIR: translation via machine t...
Stemming Algorithms - A Case Study for Detailed Evaluation
- Journal of the American Society for Information Science, 1996
- Cited by 182 (4 self)
The majority of information retrieval experiments are evaluated by measures such as average precision and average recall. Fundamental decisions about the superiority of one retrieval technique over another are made solely on the basis of these measures. We claim that average performance figures need to be validated with a careful statistical analysis and that there is a great deal of additional information that can be uncovered by looking closely at the results of individual queries. This paper is a case study of stemming algorithms which describes a number of novel approaches to evaluation and demonstrates their value.
Probabilistic query expansion using query logs
- In Proceedings of WWW’02, 2002
- Cited by 162 (4 self)
Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in traditional information retrieval. However, these previous methods do not take into account the specific characteristics of web searching, in particular the availability of large amounts of user interaction information recorded in web query logs. In this study, we propose a new method for query expansion based on query logs. The central idea is to extract probabilistic correlations between query terms and document terms by analyzing query logs. These correlations are then used to select high-quality expansion terms for new queries. The experimental results show that our log-based probabilistic query expansion method can greatly improve search performance and has several advantages over other existing methods.
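The central idea of this abstract, estimating correlations between query terms and document terms from click information in a query log, might be sketched as follows. The log format (pairs of query terms and a clicked document id), the function name `term_correlations`, and the particular estimate P(term | q) = Σ_D P(D | q) · P(term | D) are assumptions chosen for illustration; the abstract does not specify the estimator.

```python
from collections import Counter, defaultdict


def term_correlations(sessions, documents):
    """Estimate P(document term | query term) from a click log.

    sessions: iterable of (query_terms, clicked_doc_id) pairs (hypothetical format).
    documents: dict mapping doc id -> list of tokens.
    Assumed estimate: sum over clicked docs D of P(D | q) * P(term | D).
    """
    clicks = defaultdict(Counter)  # query term -> clicked doc id -> click count
    for query_terms, doc_id in sessions:
        for q in set(query_terms):
            clicks[q][doc_id] += 1

    correlations = {}
    for q, doc_clicks in clicks.items():
        total_clicks = sum(doc_clicks.values())
        acc = defaultdict(float)
        for doc_id, n in doc_clicks.items():
            tf = Counter(documents[doc_id])
            length = sum(tf.values())
            if length == 0:
                continue  # an empty document contributes nothing
            for term, count in tf.items():
                # P(D | q) is the click share; P(term | D) is relative frequency.
                acc[term] += (n / total_clicks) * (count / length)
        correlations[q] = dict(acc)
    return correlations


documents = {
    "d1": ["java", "programming", "language"],
    "d2": ["java", "island", "indonesia", "island"],
}
sessions = [(["java"], "d1"), (["java"], "d1"), (["java", "travel"], "d2")]
probs = term_correlations(sessions, documents)
# Document terms most strongly correlated with the query term "java".
print(sorted(probs["java"].items(), key=lambda kv: -kv[1])[:3])
```

Under these assumptions, expansion terms for a new query could be chosen by ranking document terms by their correlation with the query's terms.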
Corpus-Based Stemming using Co-occurrence of Word Variants
- ACM Transactions on Information Systems, 1998
- Cited by 114 (2 self)
Stemming is used in many information retrieval (IR) systems to reduce variant word forms to common roots. It is one of the simplest applications of natural language processing to IR, and one of the most effective in terms of user acceptance and consistent, though small, retrieval improvements. Current stemming techniques do not, however, reflect the language use in specific corpora and this can lead to occasional serious retrieval failures. We propose a technique for using corpus-based word variant co-occurrence statistics to modify or create a stemmer. The experimental results generated using English newspaper and legal text and Spanish text demonstrate the viability of this technique and its advantages relative to conventional approaches.
Categories and Subject Descriptors: H.3.1. [Information Storage and Retrieval]: Content Analysis and Indexing -- indexing methods; linguistic processing; H.3.3. [Information Storage and Retrieval]: Information Search and Retrieval -- query f...
Effective Retrieval with Distributed Collections
- 1998
- Cited by 112 (12 self)
This paper evaluates the retrieval effectiveness of distributed information retrieval systems in realistic environments. We find that when a large number of collections are available, the retrieval effectiveness is significantly worse than that of centralized systems, mainly because typical queries are not adequate for the purpose of choosing the right collections. We propose two techniques to address the problem. One is to use phrase information in the collection selection index and the other is query expansion. Both techniques enhance the discriminatory power of typical queries for choosing the right collections and hence significantly improve retrieval results. Query expansion, in particular, brings the effectiveness of searching a large set of distributed collections close to that of searching a centralized collection.
1 Introduction
In today's network environments, information is highly distributed. The Internet or World Wide Web, for example, contains thousands of collections. ...
Information Retrieval Based on Word Senses
- 1995
- Cited by 104 (0 self)
This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing for as fine-grained distinctions as is warranted by the information at hand, rather than supposing a fixed number of senses per word, and by allowing for more than one sense to be assigned to a given word occurrence. The algorithm is applied to the standard vector-space information retrieval model and an evaluation is performed over the Category B TREC-1 corpus (WSJ subcollection). Results show that this sense disambiguation algorithm improves performance by between 7% and 14% on average.
Latent Semantic Analysis for Text Segmentation
- In Proceedings of EMNLP, 2001
- Cited by 95 (1 self)
This paper describes a method for linear text segmentation that is more accurate or at least as accurate as state-of-the-art methods (Utiyama and Isahara, 2001
Dictionary Methods for Cross-Lingual Information Retrieval
- In Proceedings of the 7th International DEXA Conference on Database and Expert Systems Applications, 1996
- Cited by 88 (5 self)
Multi-lingual information retrieval (IR) has largely been limited to the development of systems for use with a specific foreign language. The explosion in the availability of electronic media in languages other than English makes the development of IR systems that can cross language boundaries increasingly important. In this paper, we present experiments that analyze the factors that affect dictionary-based methods for cross-lingual retrieval and present methods that dramatically reduce the errors such an approach usually makes.