We study the problem of evaluating ranked (top-k) queries on textual collections ranging from multiple gigabytes to terabytes in size. We focus on the case of a global index organization in a highly distributed environment, and consider a class of ranking functions that includes common variants of the Cosine and Okapi measures. The main bottleneck in such a scenario is the amount of communication required during query evaluation. We propose several efficient query evaluation schemes and evaluate their performance. Our results on real search engine query traces and over 120 million web pages show that, after careful optimization, such queries can be evaluated at a reasonable cost, while challenges remain for even larger collections and more general classes of ranking functions.