© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. (2002)

by Andrew Trotman
http://www.cs.otago.ac.nz/postgrads/andrew/2003-1.pdf

Abstract:

Research into inverted file compression has focused on compression ratio: how small the indexes can be. Compression ratio is important for fast interactive searching. It is taken as read that the smaller the index, the faster the search. The premise “smaller is better” may not be true. To build truly faster indexes it is often necessary to forfeit compression. For inverted lists consisting of only 128 occurrences, compression may only add overhead. Perhaps the inverted list could be stored in 128 bytes in place of 128 words, but it must still be stored on disk. If the minimum disk sector read size is 512 bytes and the word size is 4 bytes, then both the compressed and raw postings would require one disk seek and one disk sector read. A less efficient compression technique may increase the file size but decrease load/decompress time, thereby increasing throughput.
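
The sector arithmetic in the abstract can be checked with a short sketch. The 512-byte sector and 4-byte word come from the abstract; the function name `sectors_needed` and the assumption of exactly one byte per compressed posting are illustrative only and are not taken from the paper.

```python
SECTOR_BYTES = 512  # minimum disk sector read size (from the abstract)
WORD_BYTES = 4      # machine word size (from the abstract)


def sectors_needed(postings: int, bytes_per_posting: int,
                   sector_bytes: int = SECTOR_BYTES) -> int:
    """Whole sectors that must be read to load a postings list (hypothetical helper)."""
    total_bytes = postings * bytes_per_posting
    # Ceiling division: a partially filled sector still costs a full read.
    return -(-total_bytes // sector_bytes)


# Raw list: 128 postings * 4 bytes = 512 bytes.
raw_sectors = sectors_needed(128, WORD_BYTES)

# Compressed list, assuming 1 byte per posting: 128 bytes.
compressed_sectors = sectors_needed(128, 1)

print(raw_sectors, compressed_sectors)
```

Under these assumptions both lists fit in a single sector, so compression saves no disk reads for a list of this size and only adds decoding work. The raw list needs a second sector once it exceeds 128 postings.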
