  A modified Burrows-Wheeler transformation for case-insensitive search with application to suffix array compression (1999) [4 citations — 2 self]

by Kunihiko Sadakane, M. Burrows
In Proceedings of the IEEE Data Compression Conference (DCC'99)
http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada99b.ps.gz

Abstract:

Block sorting compression [1] has become common because of its good balance of compression ratio and speed. It has another useful feature: its encoding and decoding processes are closely related to the suffix array. The suffix array [2] is a memory-efficient data structure for searching for any substring of a text; it is an array of pointers to the suffixes of the text, sorted lexicographically. It is also used to define the Burrows-Wheeler transformation (BWT), which is the core of Block sorting. When a compressed text is decoded, the inverse BWT, which is faster than the forward transformation, is performed, and the suffix array of the text is obtained in the process. This means that we can compress and transfer a text together with its suffix array simply by using Block sorting, a fact that can be used for creating large full-text databases. However, the suffix array obtained this way cannot be used for case-insensitive queries, which are important in practice. We propose a modified Burrows-Wheeler transformation. With our transformation, the suffix array obtained from a compressed text can be used for case-insensitive searches. An exact query can be done from the result of a case-insensitive
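
The relation between the BWT and the suffix array described in the abstract can be illustrated with a minimal Python sketch. It shows the standard, unmodified BWT only, not the modified transformation the paper proposes. The function names, the naive sort-based construction, and the use of "\0" as an end-of-text sentinel (assumed not to occur in the text) are assumptions made for illustration and are not taken from the paper.

```python
SENTINEL = "\0"  # assumed end marker: unique and smaller than every text character


def suffix_array(s):
    """Start positions of all suffixes of s, in lexicographic order.

    Naive construction by sorting the suffixes directly; fine for a demo,
    too slow for large texts.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt_from_suffix_array(s, sa):
    """BWT of s: the character preceding each suffix, taken in sorted order.

    For the suffix starting at 0, s[-1] wraps around to the sentinel.
    """
    return "".join(s[i - 1] for i in sa)


def inverse_bwt_with_suffix_array(last):
    """Invert the BWT and recover the suffix array in the same pass."""
    n = len(last)

    # rank[k]: number of earlier occurrences of last[k] in the BWT string.
    counts = {}
    rank = []
    for c in last:
        rank.append(counts.get(c, 0))
        counts[c] = counts.get(c, 0) + 1

    # first[c]: index of the first sorted row whose suffix starts with c.
    first = {}
    total = 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    text = [""] * n
    sa = [0] * n
    row = 0  # row 0 is the suffix made of the sentinel alone, at position n - 1
    for pos in range(n - 1, -1, -1):
        sa[row] = pos                # the suffix in this row starts at pos
        c = last[row]                # character just before that suffix
        text[pos - 1] = c            # at pos == 0 this writes the sentinel at text[-1]
        row = first[c] + rank[row]   # LF mapping: row of the suffix starting at pos - 1
    return "".join(text), sa


if __name__ == "__main__":
    s = "banana" + SENTINEL
    sa = suffix_array(s)
    transformed = bwt_from_suffix_array(s, sa)
    decoded, recovered_sa = inverse_bwt_with_suffix_array(transformed)
    print(repr(transformed), sa)
    print(decoded == s, recovered_sa == sa)
```

Because Python compares characters by code point, the suffix array in this sketch places every uppercase letter before every lowercase one, which is why such an array cannot directly answer case-insensitive queries.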

Citations

386 Suffix arrays: A new method for on-line string searches – Manber, Myers - 1993
293 A block-sorting lossless data compression algorithm – Burrows, Wheeler - 1994