by Justin Zobel, Alistair Moffat
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/./vol25/issue8/spe970jz.pdf

Abstract:

We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text retrieval system. Using a semi-static word-based compression model, the space needed to store the text is under 30 per cent of the original requirement. The model is used in conjunction with canonical Huffman coding and together these two paradigms provide fast decompression. Experiments with 500 Mb of newspaper articles show that in full-text retrieval environments compression not only saves space, it can also yield faster query processing – a win–win situation.
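To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a semi-static word-based model combined with canonical Huffman code assignment. It is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the single shared vocabulary for words and separators, the bit-string output, and all function names (`tokenize`, `code_lengths`, `canonical_codes`, `build_model`, `encode`, `decode`) are hypothetical choices made for this example.

```python
import heapq
import re
from collections import Counter


def tokenize(text):
    # Assumption of this sketch: alternate runs of word and non-word
    # characters, all placed in one shared vocabulary.
    return re.findall(r"\w+|\W+", text)


def code_lengths(freqs):
    """Return Huffman codeword lengths for a {symbol: count} mapping."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap items: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge pushes these symbols one level deeper
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def canonical_codes(lengths):
    """Assign codewords in (length, symbol) order, counting upward."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def build_model(text):
    # Semi-static: a first pass over the whole text gathers token counts,
    # and the resulting code stays fixed while encoding and decoding.
    return canonical_codes(code_lengths(Counter(tokenize(text))))


def encode(text, codes):
    return "".join(codes[t] for t in tokenize(text))


def decode(bits, codes):
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:  # prefix-free, so the first match is the token
            out.append(inverse[cur])
            cur = ""
    return "".join(out)
```

Because a canonical code is determined entirely by the codeword lengths, a decoder only needs the vocabulary and one length per token to rebuild the code. The dictionary lookup in `decode` is for clarity only; a practical decoder would work on packed bits and use per-length tables rather than string comparisons.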

Citations

578 A method for the construction of minimum redundancy codes – Huffman - 1952
523 Arithmetic coding for data compression – Witten, Neal, et al. - 1987
223 Universal codeword sets and representations of the integers – Elias - 1975
106 A locally adaptive data compression scheme – Bentley, Sleator, et al. - 1986
100 Implementing the PPM data compression scheme – Moffat - 1990
96 Universal modeling and coding – Rissanen, Langdon - 1981
88 Self-Indexing Inverted Files for Fast Text Retrieval – Moffat, Zobel - 1996
87 Variations on a theme by Huffman – Gallager - 1978
28 Efficient decoding of prefix codes – Hirschberg, Lelewer - 1990
25 Data Compression on a Database System – Cormack - 1985
21 Constructing word-based text compression algorithms – Horspool, Cormack - 1992
20 Storing text retrieval systems on CD-ROM: compression and encryption considerations – Klein, Bookstein, et al. - 1989
16 Bounding the depth of search trees – Fraenkel, Klein - 1993
14 A practitioner's guide to data base compression – Severance - 1983
10 Word based text compression – Moffat - 1989
9 Space-efficient construction of optimal prefix codes – Moffat, Turpin, et al. - 1995
8 A Huffman-Shannon-Fano code – Connell - 1973
6 Indexing and compressing full-text databases for CD-ROM – Witten, Bell, et al. - 1992
3 Static compression for dynamic texts – Moffat, Sharman, et al. - 1994