Abstract:
Research into inverted file compression has focused on compression ratio, that is, how small the indexes can be. Compression ratio is important for fast interactive searching. It is taken as read that the smaller the index, the faster the search. The premise “smaller is better” may not be true. To truly build faster indexes it is often necessary to forfeit compression. For inverted lists consisting of only 128 occurrences, compression may only add overhead. Perhaps the inverted list could be stored in 128 bytes in place of 128 words, but it must still be stored on disk. If the minimum disk sector read size is 512 bytes and the word size is 4 bytes, then both the compressed and raw postings would require one disk seek and one disk sector read. A less efficient compression technique may increase the file size but decrease load/decompress time, thereby increasing throughput.
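The sector arithmetic in the abstract can be checked with a short sketch. It is illustrative only: the 512-byte sector, 4-byte word, and 128-posting list come from the abstract, while the function name `sectors_needed`, the assumption that the list starts on a sector boundary, and the figure of exactly one byte per compressed posting are simplifications introduced here.

```python
import math

SECTOR_BYTES = 512  # minimum disk read size (from the abstract)
WORD_BYTES = 4      # machine word size (from the abstract)


def sectors_needed(size_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Whole sectors that must be read to fetch size_bytes.

    Assumes the list starts on a sector boundary (a simplification).
    """
    return max(1, math.ceil(size_bytes / sector_bytes))


postings = 128
raw_bytes = postings * WORD_BYTES   # 128 words  = 512 bytes
compressed_bytes = postings * 1     # assumed 1 byte per posting = 128 bytes

# Both sizes fit in a single sector, so the I/O cost is the same.
print(sectors_needed(raw_bytes), sectors_needed(compressed_bytes))
```

Under these assumptions both the raw and the compressed list cost one seek and one sector read, so the only remaining difference is the time spent decompressing.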