Hybrid Global-Local Indexing for Efficient Peer-to-Peer Information Retrieval (appears in NSDI'04)

by Chunqiang Tang, Sandhya Dwarkadas
http://www.cs.rochester.edu/u/sarrmor/publications/eSearch-NSDI.ps

Abstract:

Content-based full-text search remains a particularly challenging problem in peer-to-peer (P2P) systems. Traditionally, there have been two index partitioning structures: partitioning based on the document space and partitioning based on keywords. The former requires searching every node in the system to answer a query, whereas the latter transmits a large amount of data when processing multi-term queries. In this paper, we propose eSearch, a P2P keyword search system based on a novel hybrid indexing structure. In eSearch, each node is responsible for certain terms. Given a document, eSearch uses a modern information retrieval algorithm to select a small number of top (important) terms in the document and publishes the complete term list for the document to the nodes responsible for those top terms. This selective replication of term lists allows a multi-term query to proceed locally at the nodes responsible for the query terms. We also propose automatic query expansion to alleviate the degradation in search quality caused by selective replication, overlay source multicast to reduce the cost of disseminating term lists, and techniques to balance term list distribution across nodes. eSearch is scalable and efficient, and obtains search results as good as state-of-the-art centralized systems. Despite the use of replication, eSearch actually consumes less bandwidth than systems based on keyword partitioning when publishing metadata for a document. During a retrieval operation, it searches only a small number of nodes and typically transmits a small amount of data (3.3KB) that is independent of the size of the corpus and grows slowly (logarithmically) with the number of nodes in the system. eSearch's efficiency comes at a modest storage cost, 6.8 times that of systems based on keyword partitioning. This cost can be further reduced by adopting index compression or pruning techniques.
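
The hybrid indexing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: a hash taken modulo a fixed node count stands in for a DHT lookup, raw term frequency stands in for the "modern information retrieval algorithm" used to pick top terms, and the names `NUM_NODES`, `TOP_K`, `node_for`, `Node`, `publish`, and `search` are hypothetical. Query expansion, overlay multicast, and load balancing are left out.

```python
import hashlib
from collections import Counter, defaultdict

NUM_NODES = 8  # hypothetical overlay size
TOP_K = 3      # hypothetical number of top terms selected per document


def node_for(term: str) -> int:
    """Map a term to a node id (assumption: a stand-in for a DHT lookup)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES


class Node:
    def __init__(self):
        # term -> {doc_id: complete term-frequency list of that document}
        self.index = defaultdict(dict)


def publish(nodes, doc_id, text):
    """Store the document's complete term list at the nodes of its top terms."""
    term_list = Counter(text.lower().split())
    # Assumption: raw frequency as the importance measure.
    top_terms = [term for term, _ in term_list.most_common(TOP_K)]
    for term in top_terms:
        nodes[node_for(term)].index[term][doc_id] = term_list
    return top_terms


def search(nodes, query):
    """Visit only the nodes responsible for the query terms and score there."""
    terms = query.lower().split()
    scores = {}
    for term in terms:
        postings = nodes[node_for(term)].index.get(term, {})
        for doc_id, term_list in postings.items():
            # The complete term list is available locally, so all query
            # terms can be scored without contacting any other node.
            scores[doc_id] = sum(term_list[t] for t in terms)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


nodes = [Node() for _ in range(NUM_NODES)]
publish(nodes, "d1", "peer peer peer search search index index overlay")
publish(nodes, "d2", "search search search ranking ranking ranking peer")
print(search(nodes, "peer search"))  # both documents, scored locally
print(search(nodes, "overlay"))      # "overlay" is not a top term of d1
```

In this sketch the query "overlay" returns nothing, because "overlay" is not among the top terms of any document. This is the kind of quality loss from selective replication that the abstract says automatic query expansion is meant to alleviate.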
