Approximate Pattern Matching Over the Burrows-Wheeler Transformed Text
Abstract:
The compressed pattern matching problem is to locate the occurrence(s) of a pattern P in a text string T using a compressed representation of T, with minimal (or no) decompression. In this paper, we consider approximate pattern matching directly on Burrows-Wheeler transformed (BWT) text, which is a critical step toward a fully compressed pattern matching algorithm for BWT-based compression. The BWT provides a lexicographic ordering of the input text as part of its inverse transformation process. Based on this observation, pattern matching is performed by text pre-filtering, using a fast q-gram intersection of segments from the pattern P and the text T. Algorithms are proposed that solve the k-mismatch problem in O(min{m|Σ| k log u u, mu log |Σ| |Σ|}) time in the worst case, and the k-approximate matching problem in O(|Σ| log |Σ| + m 2 k u log + αk) time on average (α ≤ u), where u = |T| is the size of
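The q-gram pre-filtering idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`sorted_qgrams`, `k_mismatch`) are hypothetical, and the sorted q-gram list is built here by sorting substrings of the plain text, as a stand-in for the lexicographic ordering that the abstract says is obtained from the BWT. The filter relies on a standard pigeonhole argument: if a pattern of length m is cut into k + 1 disjoint pieces, any occurrence with at most k mismatches must contain at least one piece exactly.

```python
from bisect import bisect_left, bisect_right


def sorted_qgrams(text, q):
    """Return (q-gram, start position) pairs in lexicographic order.

    Assumption: sorting the substrings directly stands in for the sorted
    order that a BWT-based method would recover from the transform.
    """
    return sorted((text[i:i + q], i) for i in range(len(text) - q + 1))


def k_mismatch(text, pattern, k):
    """Return start positions where pattern matches text with <= k mismatches."""
    m = len(pattern)
    q = m // (k + 1)  # piece length; k + 1 disjoint pieces fit in the pattern
    if q == 0:
        raise ValueError("pattern too short for this filter (need m >= k + 1)")

    grams = sorted_qgrams(text, q)
    keys = [gram for gram, _ in grams]

    # Filtering step: intersect pattern pieces with the sorted text q-grams.
    candidates = set()
    for j in range(k + 1):
        offset = j * q
        piece = pattern[offset:offset + q]
        lo = bisect_left(keys, piece)
        hi = bisect_right(keys, piece)
        for _, pos in grams[lo:hi]:
            start = pos - offset  # where the whole pattern would begin
            if 0 <= start <= len(text) - m:
                candidates.add(start)

    # Verification step: count mismatches only at the surviving candidates.
    def mismatches(start):
        return sum(a != b for a, b in zip(text[start:start + m], pattern))

    return sorted(s for s in candidates if mismatches(s) <= k)


print(k_mismatch("mississippi", "issa", 1))  # [1, 4]
```

The binary search over the sorted q-grams is what makes the intersection cheap; verification cost then depends on how many candidate positions survive the filter.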
Citations
| 293 | A block-sorting lossless data compression algorithm – Burrows, Wheeler - 1994 |
| 187 | A Guided Tour to Approximate String Matching – Navarro - 2001 |
| 120 | Finding Approximate Patterns in Strings – Ukkonen - 1985 |
| 78 | Opportunistic data structures with applications. FOCS – Ferragina, Manzini - 2000 |
| 63 | Let sleeping files lie: Pattern matching in Z-compressed files – Amir, Benson, et al. - 1996 |
| 45 | Fast and practical approximate string matching – Baeza-Yates, Perleberg - 1992 |
| 35 | Compressed text databases with efficient query algorithms based on the compressed suffix array – Sadakane - 2000 |
| 23 | NR-grep: a fast and flexible pattern matching tool – Navarro - 2001 |
| 4 | Searching BWT compressed text with the Boyer-Moore algorithm and binary search – Mukherjee, Bell, et al. - 2002 |
| 3 | Pattern matching in compressed text and images – Bell, Adjeroh, et al. - 2001 |
| 2 | Pattern matching in bwt-transformed text – Adjeroh, Mukherjee, et al. - 2002 |

