  Approximate Pattern Matching Over the Burrows-Wheeler Transformed Text

by Nan Zhang, Amar Mukherjee, Don Adjeroh, Tim Bell
http://www.cs.ucf.edu/~nzhang/WADS/Wads-V02.pdf

Abstract:

The compressed pattern matching problem is to locate the occurrence(s) of a pattern P in a text string T using a compressed representation of T, with minimal (or no) decompression. In this paper, we consider approximate pattern matching directly on Burrows-Wheeler transformed (BWT) text, which is a critical step for a fully compressed pattern matching algorithm on a BWT-based compression algorithm. The BWT provides a lexicographic ordering of the input text as part of its inverse transformation process. Based on this observation, pattern matching is performed by text pre-filtering, using a fast q-gram intersection of segments from the pattern P and the text T. Algorithms are proposed that solve the k-mismatch problem in $O(\min\{m|\Sigma|^k \log\frac{u}{|\Sigma|},\ mu\log\frac{u}{|\Sigma|}\})$ time in the worst case, and the k-approximate matching problem in $O(|\Sigma|\log|\Sigma| + \frac{m^2}{k}\log\frac{u}{|\Sigma|} + \alpha k)$ time on average ($\alpha \le u$), where u = |T| is the size of
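
The pre-filtering idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes a naive rotation sort to obtain both the BWT and the sorted suffix order (the paper works from the transformed text itself), and it uses a standard pigeonhole partition of P into k+1 pieces of length q = floor(m/(k+1)) as the q-gram filter. The function names `bwt_with_order` and `k_mismatch` and the `$` sentinel are hypothetical choices made for this example.

```python
from bisect import bisect_left, bisect_right


def bwt_with_order(text, sentinel="$"):
    """Naive BWT via sorted rotations (illustration only, not efficient).

    Assumes the sentinel does not occur in text and sorts below every
    character of text. Returns (bwt_string, order), where order lists the
    rotation start positions in lexicographic order.
    """
    s = text + sentinel
    order = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    # The BWT is the last column: the character preceding each rotation start.
    return "".join(s[i - 1] for i in order), order


def k_mismatch(text, pattern, k):
    """Start positions where pattern matches text with at most k mismatches."""
    m = len(pattern)
    q = m // (k + 1)
    if q == 0:
        raise ValueError("pattern too short for k + 1 non-empty pieces")
    _, order = bwt_with_order(text)
    # q-grams of the text, already in sorted order because suffixes are sorted.
    positions = [i for i in order if i + q <= len(text)]
    grams = [text[i:i + q] for i in positions]

    candidates = set()
    for j in range(k + 1):
        piece = pattern[j * q:(j + 1) * q]
        # Binary search for the run of text q-grams equal to this piece.
        lo, hi = bisect_left(grams, piece), bisect_right(grams, piece)
        for i in positions[lo:hi]:
            start = i - j * q
            if 0 <= start <= len(text) - m:
                candidates.add(start)

    # Verify each candidate by counting mismatches directly.
    return sorted(
        s for s in candidates
        if sum(a != b for a, b in zip(text[s:s + m], pattern)) <= k
    )
```

With at most k mismatches, at least one of the k+1 disjoint pieces must match exactly, so the filter cannot miss an occurrence; for example, `k_mismatch("mississippi", "issa", 1)` returns `[1, 4]`.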

Citations

293 A block-sorting lossless data compression algorithm – Burrows, Wheeler - 1994
187 A Guided Tour to Approximate String Matching – Navarro - 2001
120 Finding Approximate Patterns in Strings – Ukkonen - 1985
78 Opportunistic data structures with applications. FOCS – Ferragina, Manzini - 2000
63 Let sleeping files lie: Pattern matching in Z-compressed files – Amir, Benson, et al. - 1996
45 Fast and practical approximate string matching – Baeza-Yates, Perleberg - 1992
35 Compressed text databases with efficient query algorithms based on the compressed suffix array – Sadakane - 2000
23 NR-grep: a fast and flexible pattern matching tool – Navarro - 2001
4 Searching BWT compressed text with the Boyer-Moore algorithm and binary search – Mukherjee, Bell, et al. - 2002
3 Pattern matching in compressed text and images – Bell, Adjeroh, et al. - 2001
2 Pattern matching in BWT-transformed text – Adjeroh, Mukherjee, et al. - 2002