Efficient Query Routing by Improved Peer Description
- in P2P Networks,” in Proc. INFOSCALE
, 2008
Cited by 3 (2 self)
Peer-to-peer file-sharing systems commonly use the set-of-terms model to describe a peer’s shared file set succinctly: the description is the union of the terms in the shared files. This information is used to guide query routing decisions. The problem with this model, however, is that it falsely suggests term co-occurrences that do not exist in any single file. Consequently, queries get routed erroneously to peers that have no matching files, wasting network and computation resources in the process. We reduce the number of co-occurrence errors by partitioning each peer’s file set and representing the peer as several file partitions instead of one. Experimental evidence demonstrates that it is possible to reduce the network traffic between neighbors by up to 60% at virtually no cost.
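The co-occurrence problem described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: files are modelled as plain term sets, a query matches a description when all of its terms are present, and the round-robin split in `partition_descriptions` is an arbitrary stand-in, not the partitioning scheme used in the paper.

```python
from typing import List, Set

File = Set[str]  # hypothetical model: a file is the set of terms it contains


def set_of_terms(files: List[File]) -> Set[str]:
    """Describe a peer by the union of the terms in its shared files."""
    return set().union(*files)


def partition_descriptions(files: List[File], k: int) -> List[Set[str]]:
    """Split the files into up to k groups and describe each group separately.

    Round-robin assignment is an arbitrary choice for this sketch.
    """
    groups = [files[i::k] for i in range(k)]
    return [set_of_terms(group) for group in groups if group]


def routes_to_peer(query: Set[str], descriptions: List[Set[str]]) -> bool:
    """Route the query to the peer if any description contains all query terms."""
    return any(query <= description for description in descriptions)


def has_match(query: Set[str], files: List[File]) -> bool:
    """True only if a single file contains all query terms."""
    return any(query <= f for f in files)


files = [{"blue", "moon"}, {"red", "sky"}]
query = {"blue", "sky"}  # both terms exist at the peer, but never in one file

single = [set_of_terms(files)]
partitioned = partition_descriptions(files, k=2)

print(has_match(query, files))             # no file actually matches
print(routes_to_peer(query, single))       # the union still attracts the query
print(routes_to_peer(query, partitioned))  # per-partition unions do not
```

With one description per peer, the query is routed even though no file matches. With two partitions, the false co-occurrence of "blue" and "sky" disappears, at the cost of a larger description.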
A Topic-based Measure of Resource Description Quality for Distributed Information Retrieval
Cited by 3 (0 self)
Abstract. The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse-grained to be informative. This paper outlines a finer-grained measure based on probabilistic topic models of text. The assumption we make is that a representative sample should capture the broad themes of the underlying text collection. If these themes are not captured, then resource selection will be affected in terms of performance, coverage and reliability. For example, resource selection algorithms that extrapolate from a small sample of indexed documents to determine which collections are most likely to hold relevant documents may be affected by samples that do not reflect the topical density of a collection. To address this issue, we propose to measure the relative entropy between the topics obtained in a sample and those of the complete collection. Topics are both modelled from the collection and inferred in the sample using latent Dirichlet allocation. The paper outlines an analysis and evaluation of this methodology across a number of collections and sampling algorithms.
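As a rough illustration of the relative-entropy comparison mentioned in this abstract, the sketch below computes the divergence between two topic distributions. The topic proportions are made-up numbers; in the paper they would come from latent Dirichlet allocation, which is not reproduced here. The direction of the divergence (collection relative to sample) and the `eps` floor are also assumptions.

```python
import math
from typing import Sequence


def relative_entropy(p: Sequence[float], q: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Relative entropy (KL divergence) of p with respect to q, in bits.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    (an assumption of this sketch) to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    return sum(pi * math.log2(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


# Hypothetical topic proportions over three topics.
collection = [0.5, 0.3, 0.2]
representative_sample = [0.5, 0.3, 0.2]
skewed_sample = [0.9, 0.05, 0.05]

print(relative_entropy(collection, representative_sample))
print(relative_entropy(collection, skewed_sample))
```

A sample whose topic mix equals that of the collection yields a divergence of zero, while a sample dominated by one topic yields a larger value.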
Distributed Information Retrieval using Keyword Auctions
Abstract This report motivates the need for large-scale distributed approaches to information retrieval, and proposes
Quality of language models for distributed information retrieval
, 2009
Collections used in distributed information retrieval (DIR) are often described by unigram language models, composed of simple term-probability statistics. In most cases, this information is not directly available from constituent collections and must be estimated by the DIR tool itself from a sample of documents. Factors affecting the quality of such estimates are not well understood, and nor is the impact of estimate quality. Several measures of quality for unigram language models have been described, and three are used here to investigate how the quality of a model changes given document samples of differing size or quality. I show that although all models improve given larger samples, those built with more biased samples are of significantly lower quality; and that one of the three measures, Kullback-Leibler divergence, best describes model quality. Finally, it is shown that model quality has an impact on the effectiveness of standard server selection algorithms.
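The idea of estimating a unigram language model from a document sample and scoring it with Kullback-Leibler divergence can be sketched as follows. This is a toy illustration under stated assumptions: the documents are invented token lists, the additive smoothing constant `alpha` is an arbitrary choice, and the divergence is taken from the full-collection model to the sampled model, none of which is confirmed as the report's own setup.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def unigram_model(docs: Iterable[List[str]], vocab: List[str],
                  alpha: float = 0.01) -> Dict[str, float]:
    """Estimate term probabilities over vocab with additive smoothing."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats; both models must share the same vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


# Invented toy collection; each document is a list of tokens.
collection = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "a"]]
vocab = sorted({t for doc in collection for t in doc})

full_model = unigram_model(collection, vocab)
small_sample_model = unigram_model(collection[:1], vocab)
large_sample_model = unigram_model(collection[:3], vocab)

print(kl_divergence(full_model, small_sample_model))
print(kl_divergence(full_model, large_sample_model))
```

In this toy data the model built from three documents is closer to the full-collection model than the one built from a single document, mirroring the abstract's observation that larger samples give better models.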
An Evaluation Measure for Distributed Information Retrieval Systems
Abstract. This paper is concerned with the evaluation of distributed and peer-to-peer information retrieval systems. A new measure is introduced that compares results of a distributed retrieval system to those of a centralised system, fully exploiting the ranking of the latter as an indicator of gradual relevance. Problems with existing evaluation approaches are verified experimentally.
A Task-Based Evaluation of an Aggregated Search Interface
Abstract. This paper presents a user study that evaluated the effectiveness of an aggregated search interface in the context of non-navigational search tasks. An experimental system was developed to present search results aggregated from multiple information sources, and compared to a conventional tabbed interface. Sixteen participants were recruited to evaluate the performance of the two interfaces. Our results suggest that the aggregated search interface is a promising way of supporting non-navigational search tasks. The quantity and diversity of the retrieved items that participants accessed to complete a task increased in the aggregated interface. Participants also found it easier to access retrieved items and to find relevant information with the aggregated presentation than with the conventional interface.

