| J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Annual Symp. on Combinatorial Pattern Matching (CPM 2000. |
.... pattern matching when both the pattern and text strings are compressed [21, 22]. For the LZ78/LZW paradigm, compressed matching has been extended and generalized to that of approximate pattern matching (finding all occurrences of a short sequence within a long one allowing up to k changes) in [28, 42]. The LZ compression methods are based on the idea of self reference: while the text file is scanned, substrings or phrases are identified and stored in a dictionary, and whenever, later in the process, a phrase or concatenation of phrases is encountered again, this is compactly encoded by ....
Karkkainen, J., G. Navarro and E. Ukkonen, Approximate String Matching over Ziv-Lempel Compressed Text, Proc. 11th Annual Symposium On Combinatorial Pattern Matching, LNCS 1848, 195-209 (2000).
....with an ordinary search in the original files (Goal 2). In this case, the aim of compression is not only to reduce the disk storage requirement but also to speed up the string searching task. In fact, we have achieved Goal 2 for the compression method called Byte Pair Encoding (BPE) [21, 20, 23]. In [6, 12], approximate string matching algorithms over LZW/LZ78 compressed texts were proposed. Very recently, Navarro et al. proposed a practical solution for the LZW/LZ78 compressions and showed experimentally that it is up to three times faster than the trivial approach of uncompressing and searching ....
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Ann. Symp. on Combinatorial Pattern Matching, volume 1848 of Lecture Notes in Computer Science, pages 195-209. Springer-Verlag, 2000.
.... pattern matching when both the pattern and text strings are compressed [16, 17]. For the LZ78/LZW paradigm, compressed matching has been extended and generalized to that of approximate pattern matching (finding all occurrences of a short sequence within a long one allowing up to k changes) in [21, 33]. The LZ compression methods are based on the idea of self reference: while the text file is scanned, substrings or phrases are identified and stored in a dictionary, and whenever, later in the process, a phrase or concatenation of phrases is encountered again, this is compactly encoded by ....
Karkkainen, J., G. Navarro and E. Ukkonen, Approximate String Matching over Ziv-Lempel Compressed Text, Proc. 11th Annual Symposium On Combinatorial Pattern Matching, LNCS 1848, 195-209 (2000).
....files. In this case, the aim of compression is not only to reduce the disk storage requirement but also to speed up the string searching task. For some compression methods, this goal has in fact been achieved [3, 11, 14, 22, 23]. Moreover, approximate string matching has also been addressed for the LZ78/LZW format [5, 12], although in practice even the first goal has not been achieved yet. In their recent work [15], however, Navarro et al. took a practical approach, which reduces the problem to multipattern searching of pattern pieces plus local decompression and direct verification of candidate text areas, and ....
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Ann. Symp. on Combinatorial Pattern Matching, pages 195--209. Springer-Verlag, 2000.
....However, the studies on compressed pattern matching have been undertaken only for exact pattern matching. Approximate string matching on compressed text has been an open problem since it was advocated in [1]. The first solution to this problem was very recently given by Karkkainen et al. [4]. They addressed the LZ78 and the LZW compression methods. The proposed algorithm runs in O(mkn + r) time, where n is the compressed text length, m is the pattern length, k is the number of allowed errors, and r is the number of pattern occurrences with k errors or less. For the existence ....
....errors, and r is the number of pattern occurrences with k errors or less. For the existence problem it needs O(mkn) time and space. The algorithm is based on the simulation of the classical dynamic programming algorithm for approximate string matching [9]. The experimental results reported in [4] show that, for small k, the algorithm runs faster than the method of decompressing the text on the fly and searching over it using the classical dynamic programming method that takes O(mN) time, where N is the original ....
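As an illustration of the classical dynamic programming method mentioned in the context above (the O(mN) baseline on uncompressed text, not the compressed-text algorithm of [4]), the following is a minimal sketch. The function name `approx_search` and the choice to report 1-based end positions are assumptions made for this example only.

```python
def approx_search(pattern, text, k):
    """Return 1-based end positions in `text` where `pattern` matches
    with at most k differences (insertions, deletions, substitutions).

    Column-wise dynamic programming: col[i] is the minimum edit distance
    between pattern[:i] and some substring of the text ending at the
    current text position. col[0] is always 0, so a match may start
    anywhere in the text.
    """
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value of the previous column at row i-1
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra text character
                      col[i - 1] + 1)    # missing pattern character
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_search("abc", "xxabcxx", 0))
    print(approx_search("abc", "xxabcxx", 1))
```

Each text character costs O(m) work, which gives the O(mN) total for a text of N characters.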
[Article contains additional citation context not shown here]
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Ann. Symp. on Combinatorial Pattern Matching. Springer-Verlag, 2000. to appear.
No context found.
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Annual Symp. on Combinatorial Pattern Matching (CPM 2000.
....the pattern and its occurrences are permitted. The problem, advocated in 1992 [2], was solved for Huffman coding of words [18], but the solution is limited to search for a whole word and retrieve whole words that are similar. The first true solutions appeared very recently, by Kärkkäinen et al. [11], Matsumoto et al. [16] and Navarro et al. [23]. 4 A Search Algorithm: We present now our approach for regular expression searching a text Z = b_1 ... b_n, which is expressed by the LZ78 algorithm as a sequence of n blocks. Our goal is to find the last positions in T of the regular expression ....
....string of length m in the text where a limited number, 0 ≤ k < m, of differences between the pattern and its occurrences are permitted. A difference is a character insertion, deletion, or substitution. The first true solutions to compressed approximate string matching appeared very recently [11, 16, 23]. We disregard the latter because it is an engineering solution without a complexity analysis. The first solution [11] gives worst case time O(mkn + R) and average case time O(k^2 n + min(mkn, m^2 (mσ)^k) + R), where σ is the alphabet size. The second solution [16] gives O(mk^3 n/w) worst case time ....
[Article contains additional citation context not shown here]
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Annual Symposium on Combinatorial Pattern Matching (CPM'2000.
....we are interested in k < |P| errors. Many studies have been made around the subject of compressed pattern matching over different compression formats, starting with the work of Amir and Benson [1] (e.g. [2, 10, 17, 16]). The only works addressing the approximate variant of the problem have been [14, 19, 22], on Ziv-Lempel [27]. Our focus is approximate matching over run-length encoded strings. In run-length encoding, a string that consists of repetitions of letters is compressed by encoding each repetition as a pair (letter, length of the repetition). For example, string aaabbbbccaab is encoded ....
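The run-length encoding described in the context above can be sketched as follows. This is a minimal illustration, assuming the (letter, length of the repetition) pairs are represented as Python tuples; the names `rle_encode` and `rle_decode` are hypothetical and do not come from the cited paper.

```python
from itertools import groupby


def rle_encode(s):
    """Encode each maximal run of a repeated letter as (letter, run length)."""
    return [(letter, len(list(run))) for letter, run in groupby(s)]


def rle_decode(pairs):
    """Expand (letter, run length) pairs back into the original string."""
    return "".join(letter * length for letter, length in pairs)


if __name__ == "__main__":
    pairs = rle_encode("aaabbbbccaab")
    print(pairs)
    print(rle_decode(pairs))
```

For the example string aaabbbbccaab this produces the runs a×3, b×4, c×2, a×2, b×1.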
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Annual Symposium on Combinatorial Pattern Matching (CPM'00), LNCS 1848, pages 195-209, 2000. Extended version to appear in the Journal of Discrete Algorithms.
....which in exchange of about 10% in the size of the compressed file permits faster searching. The experimental results show that this technique can take less than half of the time needed by the basic approach, for moderate m and small k values. This paper is an extended and updated version of [KNU00]. 2 Related Work: Two classes of techniques exist to compress text. The first ones, called static (or semi-static) methods, choose a fixed mapping from symbols or sequences in T to symbols or sequences in Z, and apply the same mapping across all the compression process. The second ones, called ....
....[NR98] where w is the length in bits of the machine word. The aim of this paper is to present the first general solution to this problem for the Ziv-Lempel family and the so-called regular collage systems. The specialization of this solution to LZ78/LZW and Levenshtein distance first appeared in [KNU00], of which this work is an extended version. Shortly after [KNU00], an alternative solution was presented in [MKT+00]. This alternative solution, based on bit parallelism, is restricted to solve the existence problem for the Levenshtein distance. 3 Approximate String Matching by Dynamic ....
[Article contains additional citation context not shown here]
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. CPM'2000, LNCS 1848, pages 195-209, 2000.
....to achieve together, as the only solution before the 90s was to process queries by uncompressing the texts and then searching into them. In particular, approximate searching on compressed text was advocated in 1992 as an open problem [AB92]. Only very recently a couple of solutions have appeared [KNU00, MKT+00]. Despite their theoretical achievements, which are nontrivial, the experimental results in both papers show that in practice they are slower than a decompression of the text followed by a state-of-the-art search on the uncompressed text. In this paper we take a different approach. ....
.... for LZW, was independently found [KTS+99]. Approximate string matching on compressed text is an open problem advocated in 1992 [AB92]. Very recently, it has been solved for the LZ78/LZW formats in O(mkn + R) worst case and O(k^2 n + R) average case time using dynamic programming techniques [KNU00] and in O(nmk^3/w) worst case time using bit parallelism [MKT+00]. However, both solutions are very slow in practice. The aim of this paper is to present the first practical solution to this problem for the LZ78/LZW formats. 3 The LZ78/LZW Compression Formats: We introduce some notation ....
[Article contains additional citation context not shown here]
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. CPM'2000, LNCS 1848, pages 195--209, 2000.
....matching over different compression formats, starting with the work of Amir and Benson [1] (e.g. [2, 8, 10, 9]). (Footnotes: Supported by the Academy of Finland under grant 22584. Supported in part by Fondecyt grant 1-990627.) The only works addressing the approximate variant of the problem have been [11, 13, 15], on Ziv-Lempel [20]. Our focus is approximate matching over run-length encoded strings. In run-length encoding a string that consists of repetitions of letters is compressed by encoding each repetition as a pair (letter, length of the repetition). For example, string aaabbbbccaab is encoded as ....
J. Kärkkäinen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. CPM'2000, LNCS 1848, pages 195-209, 2000.
....the pattern and its occurrences are permitted. The problem, advocated in 1992 [2], had been solved for Huffman coding of words [18], but the solution is limited to search a whole word and retrieve whole words that are similar. The first true solutions appeared very recently, by Karkkainen et al. [11], Matsumoto et al. [16] and Navarro et al. [22]. 3 The Ziv-Lempel Compression Formats LZ78 and LZW: The general idea of Ziv-Lempel compression is to replace substrings in the text by a pointer to a previous occurrence of them. If the pointer takes less space than the string it is replacing, ....
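To make the Ziv-Lempel idea in the context above concrete, here is a minimal sketch of an LZ78-style parse, assuming each block is represented as a pair (index of a previous block, next character) with index 0 standing for the empty string. The names `lz78_parse` and `lz78_decode` are hypothetical, and real LZ78/LZW encoders additionally pack these pairs into bits, which is omitted here.

```python
def lz78_parse(text):
    """Split `text` into LZ78 blocks (reference to an earlier block, new char).

    Each block extends the longest previously seen phrase by one character,
    so a repeated substring is replaced by a pointer to its earlier occurrence.
    """
    dictionary = {"": 0}  # phrase -> block index; 0 is the empty phrase
    blocks = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending the known phrase
        else:
            blocks.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover phrase is already known; emit it as prefix + last char.
        blocks.append((dictionary[current[:-1]], current[-1]))
    return blocks


def lz78_decode(blocks):
    """Rebuild the text by following each block's reference."""
    phrases = [""]
    for ref, ch in blocks:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


if __name__ == "__main__":
    blocks = lz78_parse("abababa")
    print(blocks)
    print(lz78_decode(blocks))
```

In this sketch the text abababa is parsed into the phrases a, b, ab, aba, so seven characters become four blocks.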
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. CPM'2000, LNCS 1848, pages 195--209, 2000.
....of [2] to multipattern searching was presented in [13], together with the first experimental results in this area. They achieve O(m^2 n) time and space, although this time m is the total length of all the patterns. Other algorithms for different specific search problems have been presented in [8, 11]. New practical results appeared in [17], who presented a general scheme to search on Ziv-Lempel compressed texts (simple and extended patterns) and specialized it for the particular cases of LZ77, LZ78 and a new variant proposed which was competitive and convenient for search purposes. A similar ....
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. CPM'2000, LNCS, 2000. In this same volume.
No context found.
J. Karkkainen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Ann. Symp. on Combinatorial Pattern Matching, volume 1848 of Lecture Notes in Computer Science, pages 195--209. Springer-Verlag, 2000.