by Justin Zobel, Alistair Moffat
Download: http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/./vol25/issue8/spe970jz.pdf
Abstract:
We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text retrieval system. Using a semi-static word-based compression model, the space needed to store the text is under 30 per cent of the original requirement. The model is used in conjunction with canonical Huffman coding and together these two paradigms provide fast decompression. Experiments with 500 Mb of newspaper articles show that in full-text retrieval environments compression not only saves space, it can also yield faster query processing – a win–win situation.
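The combination the abstract names, a semi-static word-based model feeding a canonical Huffman coder, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tokenize`, `code_lengths`, `canonical_codes`, `encode`, `decode`) are hypothetical, a single model covers both words and separators for brevity, and codewords are assigned with the common "shorter codes first, counting upward" convention, which may differ from the one used in the paper.

```python
import heapq
import re
from collections import Counter


def tokenize(text):
    # Split into alternating runs of word and non-word characters;
    # joining the tokens reproduces the text exactly.
    return re.findall(r"\w+|\W+", text)


def code_lengths(freqs):
    # Huffman's algorithm, tracking only the codeword length of each symbol.
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)  # unique tie-breaker so symbol lists are never compared
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # every merged symbol moves one level deeper
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths


def canonical_codes(lengths):
    # Assign codewords in order of (length, symbol): each code is the
    # previous one plus one, shifted left when the length increases.
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def encode(tokens, codes):
    # Second pass of the semi-static scheme: replace each token by its code.
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    # The code is prefix-free, so the first codeword match is the right one.
    reverse = {c: sym for sym, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    text = "the cat and the dog and the bird"
    tokens = tokenize(text)
    # First pass: gather token frequencies, then fix the code.
    codes = canonical_codes(code_lengths(Counter(tokens)))
    bits = encode(tokens, codes)
    assert decode(bits, codes) == text
    print(len(bits), "bits for", len(text), "characters")
```

Because a canonical code is fully determined by the codeword lengths, a decoder needs only the lengths and the sorted symbol list rather than an explicit tree, which is one reason such codes are usually quick to decode. The bit-string representation here is for readability only; a practical coder would pack bits into bytes.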
Citations
| 578 | A method for the construction of minimum redundancy codes – Huffman - 1952 |
| 523 | Arithmetic coding for data compression – Witten, Neal, et al. - 1987 |
| 223 | Universal codeword sets and representations of the integers – Elias - 1975 |
| 106 | A locally adaptive data compression scheme – Bentley, Sleator, et al. - 1986 |
| 100 | Implementing the PPM data compression scheme – Moffat - 1990 |
| 96 | Universal modeling and coding – Rissanen, Langdon - 1981 |
| 88 | Self-Indexing Inverted Files for Fast Text Retrieval – Moffat, Zobel - 1996 |
| 87 | Variations on a theme by Huffman – Gallager - 1978 |
| 28 | Efficient decoding of prefix codes – Hirschberg, Lelewer - 1990 |
| 25 | Data Compression on a Database System – Cormack - 1985 |
| 21 | Constructing word-based text compression algorithms – Horspool, Cormack - 1992 |
| 20 | Storing text retrieval systems on CD-ROM: compression and encryption considerations – Klein, Bookstein, et al. - 1989 |
| 16 | Bounding the depth of search trees – Fraenkel, Klein - 1993 |
| 14 | A practitioner's guide to data base compression – Severance - 1983 |
| 10 | Word based text compression – Moffat - 1989 |
| 9 | Space-efficient construction of optimal prefix codes – Moffat, Turpin, et al. - 1995 |
| 8 | A Huffman-Shannon-Fano code – Connell - 1973 |
| 6 | Indexing and compressing full-text databases for CD-ROM – Witten, Bell, et al. - 1992 |
| 3 | Static compression for dynamic texts – Moffat, Sharman, et al. - 1994 |