M. Rodeh, V. R. Pratt, and S. Even, Linear algorithm for data compression via string matching, Journal of the ACM 28 (1981), no. 1, 16-24.


This paper is cited in the following contexts:


Approximation Algorithms For The Shortest Common Superstring Problem

....representation can be obtained by labeling edges with strings rather than single characters. This allows us to eliminate many nodes with single children and results in a representation that requires O(m) space and that can be constructed in O(m) time, as described by McCreight [10]. See also [2, 11]. Actually, McCreight defines a suffix tree to contain suffixes of a single string rather than a collection of strings. Our variant requires only minor modifications to McCreight's method. An example of this compact representation of suffix trees is shown in Figure 12. We perform deletion in ....

Rodeh, M., V. R. Pratt and S. Even. "Linear Algorithm for Data Compression via String Matching," Journal of the ACM, vol. 28, 1/81, 16--24.


Interval and Recency Rank Source Coding: Two On-Line Adaptive.. - Elias (1987)   (15 citations)

....length and coping with redundancies of higher order. Modified and augmented versions of the Ziv and Lempel schemes have been implemented by Miller and Wegman [15] and Welch [21] and seem to work very well for some sources, but analytic bounds are less simple. A variant by Rodeh, Pratt, and Even [18] which uses universal codewords as variable length backpointers is close in spirit to interval encoding but preserves the power (and complexity) of dealing with higher order source properties. Other adaptive back pointer schemes are referenced and analyzed in Gonzalez Smith and Storer [10]. The ....

M. Rodeh, V. Pratt and S. Even, "Linear algorithm for data compression via string matching," J. Assoc. for Comp. Mach., vol. 28, no. 1, pp. 16-24, Jan. 1981.


Authentication of LZ-77 compressed data - Atallah, Lonardi (2003)

....a few experiments on some files of the Calgary corpus, which is the standard benchmark in data compression. As a comparison, we also included a non-structured document, called mito, which contains the mitochondrial DNA of the yeast. We instrumented an implementation of LZ-77 based on suffix trees [17], and we kept track of the multiplicity q for each phrase of the LZ-77 parsing, when the length of the phrase is greater than 2. The average value of q is shown in Figure 4, for increasing lengths of the prefixes. Note that for both graphs, the average for q appears to converge asymptotically to ....

M. Rodeh, V. R. Pratt, and S. Even. Linear algorithm for data compression via string matching. J. Assoc. Comput. Mach., 28(1):16--24, Jan. 1981.


Optimal Encoding of Non-Stationary Sources - Reif, Storer (2001)

....a linear time and space algorithm for the construction of a suffix tree, which effectively represents all substrings of a given string in linear space and allows an LZ77 algorithm to find a longest match in constant time per character read, which yields a linear time implementation. Rodeh et al. [24] presented a linear time implementation for LZ77 string matching for a bounded size sliding window, by employing three overlapping suffix tries. Fiala and Greene [1989] [14] presented a modification to McCreight's approach for a sliding window, where vertices are continuously deleted in ....

....where the number is preceded by its length, the length by its length, and so on until a constant number of bits is reached; the total length of the code is $\lfloor \log i \rfloor + \lfloor \log\log i \rfloor + \cdots + 1$ bits; see Levenshtein [19], Elias [12], and Even and Rodeh [13]. This technique was also used by Rodeh et al. [24] in their sliding window implementation, and described (and given the name cascading lengths) in the book of Storer [29]. Extensions of this idea to near optimal bounds are given in Bentley and Yao [7]. Given that we are using cascading lengths codes, it is convenient to represent a match in a ....

M. Rodeh, V. Pratt, S. Even, Linear algorithm for data compression via string matching, Journal of the Association for Computing Machinery 28 (1) (1981) 16-24.
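
The cascading-lengths idea quoted above (a number preceded by its length, the length by its own length, and so on) can be illustrated with the Elias omega code, one well-known code of this kind. The following is a minimal Python sketch under that assumption; it is not claimed to be the exact code used by Rodeh et al. or by Reif and Storer, and the function names omega_encode and omega_decode are hypothetical.

```python
def omega_encode(n: int) -> str:
    """Encode a positive integer as an Elias omega bit string.

    Each binary group is preceded by a group giving its length minus one,
    recursively, until the length itself is 1. A final '0' ends the code.
    """
    if n < 1:
        raise ValueError("Elias omega is defined for integers >= 1")
    code = "0"                      # terminator bit
    while n > 1:
        group = bin(n)[2:]          # binary digits of n, leading bit is '1'
        code = group + code         # prepend the group
        n = len(group) - 1          # next, encode (length of the group - 1)
    return code


def omega_decode(bits: str) -> tuple[int, int]:
    """Decode one Elias omega codeword from the start of `bits`.

    Returns (value, number_of_bits_consumed).
    """
    n = 1
    i = 0
    while bits[i] == "1":           # a leading '1' means another group follows
        length = n + 1              # the previous value gives the group length - 1
        n = int(bits[i:i + length], 2)
        i += length
    return n, i + 1                 # +1 for the terminating '0'


if __name__ == "__main__":
    for value in (1, 2, 4, 16, 1000):
        word = omega_encode(value)
        print(value, word, omega_decode(word))
```

Because every group starts with a 1 bit and the codeword ends with a 0 bit, codewords are self-delimiting and can be concatenated without separators, which is what makes such codes usable as variable-length back pointers.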


Indexing Compressed Text - He (2003)

....of replacing the strings in the text with a pointer to where they have occurred earlier in the text. This family of algorithms is generally derived from one of the two different approaches published in [63] and [64], namely LZ77 and LZ78, respectively. Some variants in the LZ77 family are: LZR [50], LZSS [12], LZB [13], and LZH [16]. Some variants in the LZ78 family are: LZW [59], LZC [54], LZT [55], LZMW [43], and LZJ [36]. LZFG [29] is based on both LZ77 and LZ78. One particularly effective variant is that of Welch, labeled LZW [59], which is derived from LZ78. In this algorithm, we first ....

M. Rodeh, V. R. Pratt, and S. Even. Linear algorithm for data compression via string matching. Journal of the ACM, 28(1):16--24, January 1981.


Space Efficient Suffix Trees - Munro, Raman, Rao (1998)   (3 citations)

....have been applied to fundamental string problems such as finding the longest repeated substring [26], finding all squares or repetitions in a string [1], approximate string matching [15], and string comparison. They have also been used to address other types of problems such as text compression [24], compressing assembly code [10], inverted indices [5], and analysing genetic sequences [8]. A suffix tree for a text string x of length n is an ordered trie whose entries are all the suffixes of x. In order to ensure that each suffix has a minimal prefix that distinguishes it from the other ....

M. Rodeh, V. R. Pratt, and S. Even, Linear algorithm for data compression via string matching, Journal of the ACM 28(1) (1981) 16-24.


Space Efficient Suffix Trees - Munro, Raman, Rao (1998)   (3 citations)

....also been applied to fundamental string problems such as finding the longest repeated substring [20], finding all squares or repetitions in a string [1], approximate string matching [10], and string comparison. They have also been used to address other types of problems such as text compression [18], compressing assembly code [6], inverted indices [2], and analyzing genetic sequences [4]. A suffix tree for a text string x of length n is an ordered trie whose entries are all the suffixes of x. In order to ensure that each suffix corresponds to a unique position in the text, a special ....

M. Rodeh, V. R. Pratt, and S. Even, "Linear algorithm for data compression via string matching", Journal of the ACM 28,1 (1981) 16-24.


Finding approximate repetitions under Hamming distance - Kolpakov, Kucherov (2001)   (2 citations)

....does not occur earlier (see [Gus97] for a related discussion). In this paper, we use the Lempel-Ziv factorization which suits better to our purposes. a|ab|ba|bab|abbb|c. Both variants of Lempel-Ziv factorization can be computed in linear time [RPE81, Gus97]. If w = f_1 f_2 ... f_m is the Lempel-Ziv factorization, we call the f_i's Lempel-Ziv factors or simply factors of w. The last character of f_i will be called the head of f_i. We are now ready to describe the algorithm for finding all K-match globally-defined repetitions. Consider the ....

M. Rodeh, V.R. Pratt, and S. Even. Linear algorithm for data compression via string matching. Journal of the ACM, 28(1):16--24, Jan 1981.


Data Compression and Database Performance - Graefe, Shapiro (1991)   (13 citations)

....in relational database systems. We offer our conclusions in Section 5. 2. Related Work. A number of researchers have considered text compression schemes based on letter frequency, as pioneered by Huffman [10, 11, 16, 17]. Other recent research has considered schemes based on string matching [22, 30, 32, 33]. Comprehensive surveys of compression methods and schemes are given in [2, 18, 27]. Others have focussed on fast implementation of algorithms, parallel algorithms and VLSI implementations [12, 29]. Few researchers have investigated the effect of compression on database systems and their ....

M. Rodeh, V. R. Pratt and S. Even, Linear Algorithm for Data Compression via String Matching, J. of the ACM 28, 1 (January 1981), 16-24.


Optimal Lossless Compression of a Class of Dynamic Sources - Reif, Storer (1997)   (2 citations)

....time and space algorithm for the construction of a suffix tree, which effectively represents all substrings of a given string in linear space and allows an LZ77 algorithm to find a longest match in constant time per character read, which yields a linear time implementation. Rodeh, Pratt, and Even [RPE 81] presented a linear time implementation for LZ77 string matching for a sliding window, by employing three overlapping suffix tries. Fiala and Greene [FG89] presented a modification to McCreight's approach for a sliding window, where vertices are continuously deleted in (amortized) constant time ....

.... by its length, the length by its length, and so on until a constant number of bits is reached; the total length of the code is $\lfloor \log i \rfloor + \lfloor \log\log i \rfloor + \cdots + 1$ bits; see also Levenshtein [L 68], Elias [E 75], and Even and Rodeh [ER 78]. This technique was also used by Rodeh, Pratt, and Even [RPE 81] in their sliding window implementation, and described (and given the name cascading lengths) in the book of Storer [S 88]. Extensions of this idea to near optimal bounds are given in Bentley and Yao [BY 76]. Given that we are using cascading lengths codes, it is convenient to represent a match ....

M. Rodeh, V. Pratt, and S. Even, Linear Algorithm for Data Compression via String Matching, Journal of the Association for Computing Machinery, 28:1, 16--24, (1981).


Suffix Trees and their Applications in String Algorithms - Grossi, Italiano (1993)   (2 citations)

.... of strings [55], finding the longest substring that appears in h out of k strings, for any h 2 [58], computing characteristic strings [59], matching a string as an arbitrary path of an unrooted labeled tree [4], performing efficient dictionary matching [6, 5, 7, 21, 43], data compression schemes [39, 40, 73, 83, 84, 102, 103]; searching for the longest run of a given motif in molecular sequences [53, 54, 100], metric distance on strings [34], complexity measure on random strings for cryptology [81], inverted indices [22], analyzing genetic sequences [25, 23], finding duplication in programming code [13], generating ....

....the pathstring y having locus in the parent of leaf j, and append to y the (j + |y|)-th character of x, say a. Thus y is a prefix of x[j, n] that occurs at least twice in x, but ya is a prefix of x[j, n] that occurs only once. Problems of this kind are found also in data compression schemes [39, 73, 83, 84, 102, 103], in compressing assembly code [40], and in searching for the longest run of a given motif in molecular sequences [53, 54, 100]. Suffix trees help also to design elegant algorithms for finding squares [10, 68] and repetitions in a string [10], computing statistics for the non overlapping ....

Rodeh, M., Pratt, V., and Even, S., Linear algorithm for data compression via string matching, J. ACM, 28, 16--24, (1981).


A Guaranteed Compression Scheme for Repetitive DNA Sequences - Rivals (1996)   (3 citations)

.... a compression scheme dedicated to genetic sequences is due to [GT93]; their algorithm is able to encode direct repeats and genetic palindromes (a special kind of symmetry between words) but is derived from a LZR scheme, i.e. a LZ77 scheme which avoids the limitation on the window size (cf. [ZL77, RPE81]). The main principle of dictionary methods is to substitute an occurrence of a repeated factor by a pointer that references an earlier occurrence of the same factor. We define the encoding gain of a factor occurrence as the difference in bits between: 1. the original size of the factor, 2. the ....

M. Rodeh, V.R. Pratt, and S. Even. Linear algorithm for data compression via string matching. Journal Association for Computing Machinery, 28(1):16--24, January 1981.


Maximal repetitions and Application to DNA sequences - Giraud, Kucherov (2000)

....in O(n log n) time. Improving this bound is an interesting problem which is a subject of our current investigation. 2. Implementation of MKK algorithm. 2.1 Off-line construction of s-factorization using DAWG. Computing the s-factorization can be done in O(n) time. Methods proposed in the literature [5, 14] consist of constructing the s-factorization along with constructing the suffix tree. We adopt another strategy, and construct the s-factorization using the DAWG (Directed Acyclic Word Graph) instead of the suffix tree. DAWG is a powerful data structure obtained by identifying isomorphic branches ....

RODEH, M., PRATT, V. R., AND EVEN, S. Linear algorithm for data compression via string matching. Journal of the ACM 28, 1 (1981), 16--24.


Finding Repeats With Fixed Gap - Kolpakov, Kucherov

.... [ZL77] and to the underlying definition of complexity of a string [LZ76]. A salient property of Lempel-Ziv factorization is that it can be computed in time O(n). This can be done using the suffix tree data structure [McC76, Ukk95] developed in the context of string matching applications (see [RPE81]). Conversely, the Lempel-Ziv factorization (and its close relative the s-factorization [Cro83]) turned out itself to be useful in string matching applications related to the search for repetitions in the word [Mai89, KK99a, SG98b]. This paper gives another example of such an application. ....

M. Rodeh, V.R. Pratt, and S. Even. Linear algorithm for data compression via string matching. Journal of the ACM, 28(1):16-24, Jan 1981.


Linear Time Algorithms for Finding and Representing all.. - Gusfield, Stoye (1998)   (5 citations)

....inductively by $i_1 = 1$ and $i_{B+1} = i_B + \max(1, l_{i_B})$ for $i_B \le n$. The substring $S[i_B .. i_{B+1} - 1]$, $1 \le B \le k$, obtained in this way is called the $B$th block of the LZ decomposition of $S$. It is well known that this decomposition can be computed in $O(n)$ time, e.g. using the suffix tree of $S$ [RPE81]. Table 1 shows the values $s_i$ and $l_i$ for the string abaabaabbaaabaaba, and Figure 2 shows the Lempel-Ziv decomposition. The following two basic facts are stated explicitly or implicitly in [Cro83, Cro86, CR94] and connect tandem repeats to the blocks of the LZ decomposition. Lemma 1. The ....

M. Rodeh, V. R. Pratt, and S. Even. Linear algorithm for data compression via string matching. J. ACM, 28(1):16--24, 1981.
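
The block recurrence quoted above from Gusfield and Stoye can be sketched directly in Python. The excerpt does not show the definition of l_i, so this sketch assumes the usual one: l_i is the length of the longest prefix of S[i..n] that also occurs starting at some earlier position (overlap allowed). The code is a naive quadratic-or-worse illustration with 0-based indices, not the linear-time suffix-tree method the excerpt refers to; the name lz_blocks is hypothetical.

```python
def lz_blocks(s: str) -> list[str]:
    """Split s into blocks: each block starts at i and has length max(1, l_i).

    l_i is computed by brute force as the longest match between the suffix
    starting at i and any suffix starting at an earlier position j < i.
    """
    blocks = []
    n = len(s)
    i = 0
    while i < n:
        longest = 0
        for j in range(i):                      # candidate earlier start positions
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1                          # the match may overlap position i
            longest = max(longest, k)
        step = max(1, longest)                  # a new character forms a block alone
        blocks.append(s[i:i + step])
        i += step
    return blocks


if __name__ == "__main__":
    # The example string from the excerpt above.
    print("|".join(lz_blocks("abaabaabbaaabaaba")))
    # prints a|b|a|abaab|baa|abaaba
```

Replacing the inner brute-force search with a suffix tree lookup is what brings the running time down to the linear bound cited in the excerpt.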


An N-Best representation for bidirectional parsing strategies - Corazza, Lavelli

....the concatenation of the N sequences interleaved by N different delimiters (one of them must be put at the end of the string). The use of these delimiters avoids two repetitions of the same substring crossing the boundary between two of the N-best sentences. Initially the algorithm described in [Rodeh et al. 1981] was chosen to make the compression and therefore to construct the representation of the N-best. This algorithm is linear, but it tends to optimality only in case the input length tends to infinity, which is not at all our case. On the contrary, the fact that in our case the input length is not ....

Rodeh, M.; Pratt, V. R.; and Even, S. 1981. Linear algorithm for data compression via string matching.


Fast Pattern Matching for Entropy Bounded Text - Chen, Reif (1995)   (1 citation)

....1 Introduction 1.1 Pattern matching problem. Given a text of length n and a pattern of length m, the pattern matching problem is to find all occurrences of the pattern in the text. There have been various efficient algorithms designed for both one and higher dimensional pattern matching problems [19,14,17,15]. [15] gave an algorithm which works in O(n/m) time assuming the text is generated from a source of uniform distribution. Many of these algorithms run in O(n) time in the worst case. Two approaches have been adopted in designing these algorithms: to base the analysis of the algorithm either on ....

M. Rodeh, V.R. Pratt, and S. Even. Linear algorithm for data compression via string matching. J. of ACM, 28:1:16--24, 1981.


Self-Alignment in Words and their Applications - Apostolico, Szpankowski (1992)   (11 citations)

....the intuition that a very random string is not compressible. For such a sequence, the expected length of each phrase is $2 \cdot \log n$, i.e. exactly the length in bits of the pair of pointers. (In practice, efficient representations of integers in an unbounded range, like those in, e.g. [AF, ER, RPE], may be used to reduce the length of such pointer encodings.) 3.2 Advanced applications. We analyze here (i) the problems of testing the square freedom of a string X or finding all squares in it, and (ii) the related problem of building indexes for the statistics without overlap of all ....

M. Rodeh, V. Pratt and S. Even, Linear Algorithm for Data Compression via String Matching, Journal of the ACM, 28, 16-24 (1981).


Parallel Text Compression - Stauffer, Hirschberg (1993)

....state. 4. PARALLEL DICTIONARY COMPRESSION Dictionary compression (also referred to as Ziv Lempel compression or textual substitution) removes data redundancy by replacing repeated input substrings by references (also called indices or pointers) to earlier copies of the identical substring [RPE81, SS82, ZL77, ZL78]. A dictionary of characters, words or phrases that are expected to occur frequently is maintained and a recurring substring is encoded by the index of its corresponding dictionary entry. Compression is achieved by choosing indices so that on average they require less space than the phrase they ....

Rodeh, M., Pratt, V. R. and Even, S. Linear algorithm for data compression via string matching. J. ACM 28, 1 (Jan., 1981), 16--24.
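
The textual-substitution principle described in the excerpt above (replace a repeated substring by a reference to an earlier copy) can be sketched as a small LZ77-style coder in Python. This is an illustrative sketch only: the (offset, length, next character) triple format, the window parameter, and the names lz77_encode and lz77_decode are assumptions made here, and the brute-force match search stands in for the suffix-tree search used in the linear-time algorithms cited on this page.

```python
def lz77_encode(data: str, window: int = 4096) -> list[tuple[int, int, str]]:
    """Encode data as (offset, length, next_char) triples.

    offset counts back from the current position to the start of the earlier
    copy; length 0 means no earlier copy was used.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):  # candidate earlier copies
            k = 0
            # Stop one character early so a literal always follows the match.
            while i + k < n - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples: list[tuple[int, int, str]]) -> str:
    """Rebuild the text by copying from the already-decoded output."""
    buf = []
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])               # char-by-char copy handles overlap
        buf.append(ch)
    return "".join(buf)


if __name__ == "__main__":
    triples = lz77_encode("abababab")
    print(triples)
    print(lz77_decode(triples))
```

Copying one character at a time in the decoder lets a reference overlap the text it is producing, so a run such as "ababab" can be described by a copy whose length exceeds its offset.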


A Program for Identifying Duplicated Code - Brenda Baker (1992)   (19 citations)

....for genome sequencing [SK,LMW] have been based on finding a best match (longest common subsequence or smallest edit distance) between two sets of data, from start to end, whereas dup looks for pairs of shorter related sections that could appear in any order. Data compression algorithms such as [RPE,ZL] have been based on finding copies of substrings, but for data compression, one copy is sufficient, as opposed to this work, in which the goal is to find all sufficiently long copies. Exact Matching This section describes how dup finds the set of pairs that are longest exact matches. First, a ....

M. Rodeh, V. R. Pratt, and S. Even, Linear algorithm for data compression via string matching, J. ACM 28,1 (1981), pp. 16-24.


On Finding Duplication in Strings and Software - Baker (1993)   (2 citations)

....of distinct occurrences of identical symbols (where the alphabet size was potentially linear in n) [Bak1]. The problem of finding duplication in a string is closely related to string pattern matching and to data compression. Suffix trees have been used as the basis for algorithms for both problems [McC, RPE]. For string pattern matching, however, the basic problem is to determine whether one string occurs within another, rather than to find all maximal matches. The problems closest to ours studied previously have been (1) to find the longest substring common to two strings [Cr87], (2) to construct a ....

....for each substring of a string [Cr86], and (3) to find squares in a string, or to determine whether a string is square-free, where a square is an occurrence of xx for a substring x [AP,Cr86,Gu,ML,Rab,SeiG]. Problem (3) has also been called the problem of finding repetition in a string. In the [RPE] algorithm for Lempel-Ziv data compression [ZL], the algorithm finds some maximal matches in the course of building a suffix tree, but does not find all maximal matches. The goal of reporting all pairs of matches is quite different from the goal of finding the minimum edit distance between two ....

M. Rodeh, V. R. Pratt, and S. Even, Linear algorithm for data compression via string matching, J. ACM 28,1 (1981), pp. 16-24.


Compacting Object Code via Parameterized Procedural Abstraction - Zastre (1995)   (12 citations)

.... in linear time and reasonable (polynomial) space [19]. As it contains indexes to all substrings, we can also get answers to such questions as "What is the longest substring that occurs in k places?" [25]. Other researchers have used the suffix tree as the basis for regular text compression [23]. By treating the assembly language text as one string, substrings with several occurrences correspond to potential subroutines. Assembler directives are flushed directly to the compressed file as these must be left unmodified. For the rest of the assembly code, the authors restrict the possible ....

Michael Rodeh, Vaughan R. Pratt, and Shimon Even, "Linear Algorithm for Data Compression via String Matching," Journal of the ACM, Vol. 28, No.1, January 1981, 16-24.


Extended Application of Suffix Trees to Data Compression - Larsson (1996)   (10 citations)

....in linear time. Our primary contribution is to present a scheme that enables practical use of suffix trees for PPM-style statistical modeling methods, together with its necessary theoretical justification. Also, application to Ziv-Lempel compression is natural. Some compression schemes [3, 9] require that each character, once read from the input, resides in primary storage until all of the input has been processed. This is not feasible in practice. We need a scheme that allows maintaining only a limited part of the input preceding the current position: a sliding window. Fiala and ....

....longest matching string. This can be done in asymptotically optimal time by maintaining a suffix tree to index the buffer. For each iteration, the longest match can be located by traversing the tree from the root, following edges corresponding to characters from the input stream. Rodeh and Pratt [9] show that it is possible to implement a linear time Ziv-Lempel algorithm utilizing a suffix tree. However, since they do not allow deletions from the suffix tree, their algorithm requires $\Omega(n)$ space to process an input of length n. This implies that large inputs must be split into ....

M. Rodeh and V. R. Pratt. Linear algorithm for data compression via string matching. J. ACM, 28(1):16--24, Jan. 1981.


Maximal tandem repetitions and Applications to DNA words - Giraud (1999)

No context found.

M. Rodeh, V. R. Pratt, and S. Even, Linear algorithm for data compression via string matching, Journal of the ACM 28 (1981), no. 1, 16-24.


Longest-match String Searching for Ziv-Lempel Compression - Bell, Kulp (1993)   (12 citations)

No context found.

M. Rodeh, V. R. Pratt and S. Even, `Linear algorithm for data compression via string matching', J ACM, 28, (1), 16--24 (1981).

