Results 1 - 10 of 107
Improving automatic query expansion
, 1998
"... Abstract Most casual users of IR systems type short queries. Recent research has shown that adding new words to these queries via odhoc feedback improves the re-trieval effectiveness of such queries. We investigate ways to improve this query expansion process by refining the set of documents used in ..."
Abstract
-
Cited by 286 (4 self)
- Add to MetaCart
Most casual users of IR systems type short queries. Recent research has shown that adding new words to these queries via ad hoc feedback improves the retrieval effectiveness of such queries. We investigate ways to improve this query expansion process by refining the set of documents used in feedback. We start by using manually formulated Boolean filters along with proximity constraints. Our approach is similar to the one proposed by Hearst [12]. Next, we investigate a completely automatic method that makes use of term co-occurrence information to estimate word correlation. Experimental results show that refining the set of documents used in query expansion often prevents the query drift caused by blind expansion and yields substantial improvements in retrieval effectiveness, both in terms of average precision and precision in the top twenty documents. More importantly, the fully automatic approach developed in this study performs competitively with the best manual approach and requires little computational overhead.
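To make the idea concrete, here is a minimal sketch of co-occurrence-based expansion in Python; the function name, the scoring heuristic, and the normalization are illustrative assumptions, not the authors' exact formulation:

```python
from collections import Counter

def expand_query(query_terms, feedback_docs, k=10):
    """Choose k expansion terms that co-occur most strongly with the query.

    feedback_docs is a list of tokenized documents (lists of terms) assumed
    to be relevant. The correlation score below is a simple illustrative
    heuristic, not the paper's exact measure.
    """
    doc_freq = Counter()   # term -> number of feedback docs containing it
    cooc = Counter()       # (term, query_term) -> co-occurrence count
    for doc in feedback_docs:
        terms = set(doc)
        doc_freq.update(terms)
        for t in terms:
            for q in query_terms:
                if q in terms and t != q:
                    cooc[(t, q)] += 1

    def correlation(t):
        # Co-occurrence with the query terms, normalized by document frequency.
        return sum(cooc[(t, q)] for q in query_terms) / doc_freq[t]

    candidates = [t for t in doc_freq if t not in query_terms]
    expansion = sorted(candidates, key=correlation, reverse=True)[:k]
    return list(query_terms) + expansion
```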
How Reliable are the Results of Large-Scale Information Retrieval Experiments?
- Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1998
"... Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate e#ectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measuremen ..."
Abstract
-
Cited by 156 (4 self)
- Add to MetaCart
(Show Context)
Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.
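For context, the standard depth-k pooling that the TREC experiments rely on (and that the proposed strategy refines) can be sketched as follows; the names and the depth parameter are illustrative:

```python
def build_pool(runs, depth=100):
    """Depth-k pooling as used in TREC-style evaluation.

    runs: dict mapping system name -> ranked list of document ids.
    Returns the set of documents sent for relevance judgment; documents
    outside the pool are treated as non-relevant. The paper proposes a
    refinement of this basic strategy; this sketch shows only the
    conventional approach it starts from.
    """
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool
```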
Advantages of query biased summaries in information retrieval
- In Proceedings of ACM SIGIR
, 1998
"... www.dcs.gla.ac.uk/-tombrosa / www-ciir.cs.umass.edu/-sanderso/ Abstract This paper presents an investigation into the utility of document summarisation in the context of information retrieval, more specifically in the application of so called query biased (or user directed) summaries: summaries cust ..."
Abstract
-
Cited by 151 (7 self)
- Add to MetaCart
(Show Context)
This paper presents an investigation into the utility of document summarisation in the context of information retrieval, more specifically in the application of so-called query biased (or user directed) summaries: summaries customised to reflect the information need expressed in a query. Employed in the retrieved document list displayed after a retrieval took place, the summaries’ utility was evaluated in a task-based environment by measuring users’ speed and accuracy in identifying relevant documents. This was compared to the performance achieved when users were presented with the more typical output of an IR system: a static predefined summary composed of the title and first few sentences of retrieved documents. The results from the evaluation indicate that the use of query biased summaries significantly improves both the accuracy and speed of user relevance judgements.
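A rough sketch of what a query biased summarizer does, assuming a simple term-overlap score (the paper's actual scoring is richer; treat this purely as an illustration with hypothetical names):

```python
import re

def query_biased_summary(document, query, max_sentences=3):
    """Rank sentences by how many distinct query terms they contain and
    return the best ones in their original order. This is only a minimal
    sketch of a query biased summary, not the study's exact method.
    """
    query_terms = {t.lower() for t in re.findall(r"\w+", query)}
    sentences = re.split(r"(?<=[.!?])\s+", document)

    def overlap(sentence):
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        return len(words & query_terms)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: overlap(sentences[i]),
                    reverse=True)[:max_sentences]
    return " ".join(sentences[i] for i in sorted(ranked))
```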
Matrices, vector spaces, and information retrieval
- SIAM Review
, 1999
"... Abstract. The evolution of digital libraries and the Internet has dramatically transformed the processing, storage, and retrieval of information. Efforts to digitize text, images, video, and audio now consume a substantial portion of both academic and industrial activity. Even when there is no short ..."
Abstract
-
Cited by 143 (3 self)
- Add to MetaCart
(Show Context)
Abstract. The evolution of digital libraries and the Internet has dramatically transformed the processing, storage, and retrieval of information. Efforts to digitize text, images, video, and audio now consume a substantial portion of both academic and industrial activity. Even when there is no shortage of textual materials on a particular topic, procedures for indexing or extracting the knowledge or conceptual information contained in them can be lacking. Recently developed information retrieval technologies are based on the concept of a vector space. Data are modeled as a matrix, and a user’s query of the database is represented as a vector. Relevant documents in the database are then identified via simple vector operations. Orthogonal factorizations of the matrix provide mechanisms for handling uncertainty in the database itself. The purpose of this paper is to show how such fundamental mathematical concepts from linear algebra can be used to manage and index large text collections. Key words. information retrieval, linear algebra, QR factorization, singular value decomposition, vector spaces
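The vector-space mechanics the abstract refers to can be sketched briefly; this is a generic illustration of cosine ranking with an optional low-rank SVD approximation, not the paper's specific algorithms:

```python
import numpy as np

def rank_documents(term_doc_matrix, query_vector, k=None):
    """Vector-space retrieval: documents are columns of a term-by-document
    matrix A, the query is a vector q, and relevance is estimated by cosine
    similarity. If k is given, A is first replaced by its rank-k SVD
    approximation (the latent-semantic-indexing variant). Shapes and the
    cosine measure are standard; the details here are a sketch.
    """
    A = np.asarray(term_doc_matrix, dtype=float)
    q = np.asarray(query_vector, dtype=float)
    if k is not None:
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        A = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    scores = (A.T @ q) / (np.linalg.norm(A, axis=0) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-scores)  # document indices, best match first
```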
Overview of the Sixth Text REtrieval Conference (TREC-6)
- The Sixth Text REtrieval Conference (TREC-6). NIST Special Publication 500-240, National Institute of Standards and Technology
, 1998
"... This paper serves as an introduction to the research described in detail in the remainder of the volume. The next section defines the common retrieval tasks performed in TREC-6. Sections 3 and 4 provide details regarding the test collections and the evaluation methodology used in TREC. Section 5 pro ..."
Abstract
-
Cited by 114 (2 self)
- Add to MetaCart
This paper serves as an introduction to the research described in detail in the remainder of the volume. The next section defines the common retrieval tasks performed in TREC-6. Sections 3 and 4 provide details regarding the test collections and the evaluation methodology used in TREC. Section 5 provides an overview of the retrieval results. The final section summarizes the main themes learned from the experiments.
Experimental components for the evaluation of interactive information retrieval systems
- Journal of Documentation
, 2000
"... 1988, no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying or otherwise without the prior written permission of the publisher. ..."
Abstract
-
Cited by 108 (1 self)
- Add to MetaCart
(Show Context)
1988, no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying or otherwise without the prior written permission of the publisher.
The TIPSTER SUMMAC Text Summarization Evaluation
, 1999
"... The TIPSTER Text Summarization Evaluation (SUMMAC) has established definitively that automatic text summarization is very effective in relevance as- sessment tasks. Summaries as short as 17% of full text length sped up decision- making by almost a factor of 2 with no statistically significant degrad ..."
Abstract
-
Cited by 88 (1 self)
- Add to MetaCart
The TIPSTER Text Summarization Evaluation (SUMMAC) has established definitively that automatic text summarization is very effective in relevance assessment tasks. Summaries as short as 17% of full text length sped up decision-making by almost a factor of 2 with no statistically significant degradation in F-score accuracy. SUMMAC has also introduced a new intrinsic method for automated evaluation of informative summaries.
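For reference, the F-score the abstract mentions is the usual combination of precision and recall over binary relevance decisions; a minimal sketch using the standard definitions, not SUMMAC's specific protocol:

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """Standard F-measure over binary relevance decisions (the metric the
    abstract refers to); beta=1 weights precision and recall equally."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```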
Measuring Search Engine Quality
- Information Retrieval
"... Abstract. The effectiveness of twenty public search engines is evaluated using TREC-inspired methods and a set of 54 queries taken from real Web search logs. The World Wide Web is taken as the test collection and a combination of crawler and text retrieval system is evaluated. The engines are compa ..."
Abstract
-
Cited by 84 (9 self)
- Add to MetaCart
Abstract. The effectiveness of twenty public search engines is evaluated using TREC-inspired methods and a set of 54 queries taken from real Web search logs. The World Wide Web is taken as the test collection and a combination of crawler and text retrieval system is evaluated. The engines are compared on a range of measures derivable from binary relevance judgments of the first seven live results returned. Statistical testing reveals a significant difference between engines and high intercorrelations between measures. Surprisingly, given the dynamic nature of the Web and the time elapsed, there is also a high correlation between results of this study and a previous study by Gordon and Pathak. For nearly all engines, there is a gradual decline in precision at increasing cutoff after some initial fluctuation. Performance of the engines as a group is found to be inferior to the group of participants in the TREC-8 Large Web task, although the best engines approach the median of those systems. Shortcomings of current Web search evaluation methodology are identified and recommendations are made for future improvements. In particular, the present study and its predecessors deal with queries which are assumed to derive from a need to find a selection of documents relevant to a topic. By contrast, real Web search reflects a range of other information need types which require different judging and different measures.
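A minimal sketch of the precision-at-cutoff measure derivable from binary judgments of the first few results, as used in this kind of study (the function and parameter names are illustrative):

```python
def precision_at_cutoffs(judgments, max_cutoff=7):
    """Precision@1..max_cutoff from binary relevance judgments of the first
    results returned by an engine (the study judged the first seven live
    results). judgments: list of 0/1 values in rank order.
    """
    return {k: sum(judgments[:k]) / k for k in range(1, max_cutoff + 1)}

# Example: precision_at_cutoffs([1, 0, 1, 1, 0, 0, 1]) -> {1: 1.0, 2: 0.5, ...}
```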
"Is This Document Relevant? ...Probably": A Survey of Probabilistic Models in Information Retrieval
, 2001
"... This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the developmen ..."
Abstract
-
Cited by 71 (15 self)
- Add to MetaCart
This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.
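As one concrete example from this family of models, the Robertson/Sparck Jones relevance weight can be computed as follows; the 0.5 smoothing constants are the conventional choice, and this sketch is only an illustration of the kind of formula the survey covers:

```python
import math

def rsj_weight(r, R, n, N):
    """Robertson/Sparck Jones relevance weight for a single term, a building
    block of classical probabilistic retrieval models.
    r: relevant documents containing the term, R: relevant documents,
    n: documents containing the term, N: documents in the collection.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))
```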