  Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment

unknown authors
http://www.inf.uni-konstanz.de/p2p2005/papers/Session3_Efficient_Query_Evaluation.pdf

Abstract:

We study the problem of evaluating ranked (top-k) queries on textual collections ranging from multiple gigabytes to terabytes in size. We focus on the case of a global index organization in a highly distributed environment, and consider a class of ranking functions that includes common variants of the Cosine and Okapi measures. The main bottleneck in such a scenario is the amount of communication required during query evaluation. We propose several efficient query evaluation schemes and evaluate their performance. Our results on real search engine query traces and over 120 million web pages show that after careful optimization such queries can be evaluated at a reasonable cost, while challenges remain for even larger collections and more general classes of ranking functions.
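To make the problem statement concrete, here is a minimal single-machine sketch of a ranked top-k query in which a document's score is the sum of per-term scores, the general shape shared by common Cosine and Okapi variants. It is not one of the paper's evaluation schemes. The function name `top_k`, the `index` layout, and the per-term scores are hypothetical and chosen only for illustration. In a global index spread over many nodes, the posting lists for different terms would live on different machines, which is why communication becomes the bottleneck the abstract describes.

```python
import heapq
from collections import defaultdict


def top_k(index, query_terms, k):
    """Return the k highest-scoring (doc_id, score) pairs.

    `index` maps each term to a list of (doc_id, term_score) postings.
    A document's score is the sum of its term scores over the query terms.
    This layout is an assumption made for the sketch, not the paper's.
    """
    scores = defaultdict(float)
    for term in query_terms:
        # Terms missing from the index contribute nothing.
        for doc_id, term_score in index.get(term, []):
            scores[doc_id] += term_score
    # Highest score first; ties are broken in favor of the smaller doc_id.
    return heapq.nlargest(k, scores.items(), key=lambda item: (item[1], -item[0]))


# Hypothetical toy index with made-up per-term scores.
index = {
    "peer": [(1, 0.5), (2, 1.5), (3, 0.25)],
    "search": [(2, 0.5), (3, 2.0), (4, 1.0)],
}
result = top_k(index, ["peer", "search"], 2)
print(result)  # [(3, 2.25), (2, 2.0)]
```

This naive version reads every posting of every query term. Doing the same across nodes would mean shipping whole posting lists over the network, which is the cost that efficient distributed schemes aim to reduce.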

Citations

2113 Chord: A scalable peer-to-peer lookup service for internet applications – Stoica, Morris, et al.
1137 Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems – Rowstron, Druschel - 2001
768 Tapestry: An infrastructure for fault-tolerant wide-area location and routing – Zhao, Kubiatowicz, et al. - 2001
687 Space/time trade-offs in hash coding with allowable errors – Bloom - 1970
563 Managing Gigabytes: compressing and indexing documents and images – Witten, Moffat, et al. - 1994
253 Combining Fuzzy Information from Multiple Systems – Fagin - 1996
213 Optimal aggregation algorithms for middleware – Fagin, Lotem, et al. - 2001
118 Network Applications of Bloom Filters: A Survey – Broder, Mitzenmacher - 2002
113 Compressed bloom filters – Mitzenmacher
111 Peer-to-peer information retrieval using self-organizing semantic overlay networks – Tang, Xu, et al. - 2003
82 Building efficient and effective metasearch engines – Meng, Liu, et al. - 2002
67 Odissea: A peer-to-peer architecture for scalable web search and information retrieval – Suel, Mathur, et al. - 2003
61 Filtered document retrieval with frequency-sorted indexes – Persin, Zobel, et al. - 1996
53 pSearch: Information Retrieval in Structured Overlays – Tang, Xu, et al. - 2002
48 Compression of inverted indexes for fast query evaluation – Scholer, Williams, et al. - 2002
47 Design and implementation of a high-performance distributed web crawler – Shkapenyuk, Suel - 2002
45 Text-Based Content Search and Retrieval in ad hoc P2P Communities – Cuenca-Acuna, Nguyen - 2002
43 A keyword set search system for peer-to-peer networks – Gnawali - 2002
34 Combining fuzzy information: an overview – Fagin - 2002
29 Evaluating the performance of distributed architectures for information retrieval using a variety of workloads – Cahoon, McKinley, et al.
27 Make it fresh, make it quick — searching a network of personal webservers – Bawa, Jr, et al. - 2003
26 Hybrid global-local indexing for efficient peer-to-peer information retrieval – Tang, Dwarkadas - 2004
24 Performance of inverted indices in distributed text document retrieval systems – Tomasic, Garcia-Molina
22 Distributed query processing using partitioned inverted files – Badue, Baeza-Yates, et al. - 2001
20 Efficient peer-to-peer searches using result-caching – Bhattacharjee, Chawathe, et al. - 2003
12 On the feasibility of peer-to-peer web indexing – Li, Loo, et al. - 2003
10 FASD: A Fault-Tolerant, Adaptive, Scalable Distributed Search Engine – Kronfol - 2002
8 An MDP-based peer-to-peer search server network – Shen, Lee - 2002
5 Efficient peer-to-peer keyword searching. February 2002. Unpublished manuscript – Reynolds, Vahdat - 2000