A Hybrid Cache and Prefetch Mechanism for Scientific Literature Search Engines
Abstract. CiteSeer, a scientific literature search engine that focuses on documents in the computer science and information science domains, suffers from scalability issues in both the number of requests and the size of the indexed document collection, which have increased dramatically over the years. CiteSeerX is an effort to re-architect the search engine. In this paper, we present our initial design of a framework for caching query results, indices, and documents. This design is based on an analysis of the logged workload in CiteSeer. Our experiments, based on mock client requests that simulate actual user behavior, confirm that our approach works well in improving system performance.
Performance of Compressed Inverted List Caching in Search Engines (Beijing, China)
Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations, including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial role in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameters such as disk speed and main memory cache size.
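To make the idea of combining compression and caching concrete, here is a minimal sketch, not the method evaluated in the paper. It assumes variable-byte coding of docID gaps as the compression scheme and plain LRU as the caching policy; the abstract names neither. The names `varbyte_encode`, `varbyte_decode`, `CompressedListCache`, and the `fetch_from_disk` callback are hypothetical, introduced only for illustration.

```python
from collections import OrderedDict


def varbyte_encode(doc_ids):
    """Encode a sorted list of docIDs as variable-byte coded gaps."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        # Emit 7 bits per byte, low-order group first; a set high bit
        # means that more bytes of the same gap follow.
        while gap >= 128:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def varbyte_decode(data):
    """Invert varbyte_encode: rebuild the sorted docID list."""
    doc_ids = []
    gap, shift, prev = 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            doc_ids.append(prev)
            gap, shift = 0, 0
    return doc_ids


class CompressedListCache:
    """LRU cache of compressed inverted lists, bounded by total bytes.

    Assumption: fetch_from_disk(term) returns the term's sorted docIDs.
    """

    def __init__(self, capacity_bytes, fetch_from_disk):
        self.capacity_bytes = capacity_bytes
        self.fetch_from_disk = fetch_from_disk
        self.entries = OrderedDict()  # term -> compressed bytes
        self.used_bytes = 0
        self.hits = 0
        self.misses = 0

    def get(self, term):
        if term in self.entries:
            self.hits += 1
            self.entries.move_to_end(term)  # mark as most recently used
            return varbyte_decode(self.entries[term])
        self.misses += 1
        doc_ids = self.fetch_from_disk(term)
        blob = varbyte_encode(doc_ids)
        if len(blob) <= self.capacity_bytes:  # skip lists that cannot fit
            self.entries[term] = blob
            self.used_bytes += len(blob)
            while self.used_bytes > self.capacity_bytes:
                _, evicted = self.entries.popitem(last=False)  # drop LRU
                self.used_bytes -= len(evicted)
        return doc_ids
```

Because the cache stores compressed bytes, a fixed memory budget holds more lists, at the cost of decoding on every hit. That trade-off between hit ratio and decompression speed is the kind of interaction between compression and caching that the abstract describes.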

