  Improving Web Search Efficiency via a Locality Based Static Pruning Method (2005) [12 citations — 3 self]

by Edleno S. De Moura, Altigran S. Silva, Pavel Calado, Daniel R. Fernandes, Mario A. Nascimento
In Proceedings of the 14th International Conference on World Wide Web
http://www.www2005.org/cdrom/docs/p235.pdf

Abstract:

The undeniably fast and continuous growth in the volume of indexed (and indexable) documents on the Web poses a great challenge for search engines. This is true not only for search effectiveness but also for time and space efficiency. In this paper we present an index pruning technique targeted at search engines that addresses the latter issue without disregarding the former. To this end, we adopt a new pruning strategy capable of greatly reducing the size of search engine indices. Experiments using a real search engine show that our technique can reduce the indices' storage costs by up to 60% relative to traditional lossless compression methods, while keeping the loss in retrieval precision to a minimum. Compared to uncompressed indices, the compression rate is higher than 88%, i.e., the resulting index is less than one eighth of the original size. More importantly, our results indicate that, due to the reduction in storage overhead, query processing time can be reduced to nearly 65% of the original time, with no loss in average precision. The new method yields significant improvements when compared against the best known static pruning method for search engine indices. In addition, since our technique is orthogonal to the underlying search algorithms, it can be adopted by virtually any search engine.
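The abstract does not spell out the pruning algorithm itself. As a rough illustration of what "static" index pruning means in general, here is a minimal Python sketch of a simpler, term-centric baseline: offline, before any query arrives, keep only the k highest-scoring postings of each term. This is NOT the authors' locality-based method. The index layout, the per-posting scores, and the names `prune_top_k_per_term`, `posting_count` and `search` are hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical inverted index: term -> list of (doc_id, score) postings.
# Scores stand in for whatever per-posting weight an engine might use
# (e.g. a tf-idf contribution); they are illustrative assumptions.
Index = dict[str, list[tuple[int, float]]]


def prune_top_k_per_term(index: Index, k: int) -> Index:
    """Term-centric static pruning (a generic baseline, not the paper's
    locality-based method): keep only the k highest-scoring postings of
    each term, discarding the rest once, at indexing time."""
    pruned: Index = {}
    for term, postings in index.items():
        # Highest score first; ties broken by doc_id for determinism.
        best = sorted(postings, key=lambda p: (-p[1], p[0]))[:k]
        # Re-sort survivors by doc_id, the usual order for compressed lists.
        pruned[term] = sorted(best)
    return pruned


def posting_count(index: Index) -> int:
    """Total number of postings, a crude proxy for index size."""
    return sum(len(postings) for postings in index.values())


def search(index: Index, query: list[str], top_n: int = 10) -> list[int]:
    """Rank documents by summing the posting scores of the query terms."""
    acc: dict[int, float] = defaultdict(float)
    for term in query:
        for doc_id, score in index.get(term, []):
            acc[doc_id] += score
    ranked = sorted(acc.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_n]]


# Toy usage with made-up data.
toy_index: Index = {
    "web": [(1, 0.9), (2, 0.1), (3, 0.4), (4, 0.05)],
    "search": [(1, 0.7), (3, 0.6), (5, 0.2)],
    "pruning": [(2, 0.8), (5, 0.3)],
}
pruned_index = prune_top_k_per_term(toy_index, k=2)
print(posting_count(toy_index), posting_count(pruned_index))  # 9 6
print(search(pruned_index, ["web", "search"], top_n=2))  # [1, 3]
```

On this toy data the pruned index holds 6 of the original 9 postings, yet the top-2 answer for the query matches the unpruned index. Because the pruning happens once, offline, every later query scans shorter posting lists; this is the general mechanism behind the storage and query-time savings the abstract reports, although the actual selection criterion in the paper differs from this baseline.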
