Bell, T. C., and Witten, I. H. The Relationship between Greedy Parsing and Symbolwise Text Compression. Journal of ACM 41, 4 (July 1994), pp. 708-724.
....substring is encoded separately. There is much freedom possible, perhaps too much freedom, in how the input should be decomposed into strings. As a major simplification, and because it is an easily implemented approach that achieves excellent results, a greedy parsing approach is commonly used [2]. For the LZ77 class of methods, this means that the compression program always attempts to match the longest possible sequence of symbols, starting at the first unencoded input symbol, against the contents of the buffer. The program ignores the possibility of matching some shorter sequence so ....
Bell, T. C., and Witten, I. H. The Relationship between Greedy Parsing and Symbolwise Text Compression. Journal of ACM 41, 4 (July 1994), pp. 708-724.
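The greedy LZ77 parsing described in the context above can be illustrated with a minimal sketch. It is not taken from the cited paper: the function names, the token format, and the `window` and `min_match` parameters are assumptions chosen for illustration, and the brute-force search is only meant to show the longest-match rule, not an efficient implementation.

```python
def greedy_lz77_parse(data: bytes, window: int = 4096, min_match: int = 3):
    """Greedy LZ77-style parse: at each position take the longest match in the buffer.

    Returns tokens of the form ("lit", byte) or ("match", distance, length).
    `window` and `min_match` are illustrative assumptions, not values from the source.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Try every start position in the sliding window (brute force for clarity).
        for j in range(max(0, i - window), i):
            length = 0
            # The match may run past position i (overlapping copy), as LZ77 allows.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            # Greedy choice: commit to the longest match, never a shorter one.
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def decode(tokens) -> bytes:
    """Rebuild the input from the tokens produced by greedy_lz77_parse."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte copy handles overlap
    return bytes(out)


sample = b"abcabcabcabc"
sample_tokens = greedy_lz77_parse(sample)
```

On this sample the parser emits three literals followed by a single overlapping match, and `decode(sample_tokens)` reproduces the input. A non-greedy parser would sometimes accept a shorter match here in exchange for a cheaper pointer or a longer match later; that option is exactly what the greedy rule ignores.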
....Will the compression efficiency lost by having shorter matches be more than offset by the more concise representation of dictionary pointers? Insights gained from results concerning the equivalence between dictionary and statistical coding suggest that this should be the case. It has been shown [11, 2], that some of the Ziv-Lempel algorithms are, in a sense, equivalent to corresponding statistical coders. Loosely speaking, a statistical method of this sort uses a varying number of previous characters as a context in order to calculate its estimated probability distribution for the next ....
T. C. Bell and I. H. Witten. Relationship between greedy parsing and symbolwise text compression. Journal of the ACM, 41(4):708--724, July 1994.
....have been proposed. However, not all the algorithms are realized by computers. We consider realization of compression algorithms. Text compression algorithms can be classified into two methods: dictionary method, and statistical method. Though equivalence of these methods is shown [BW94a], their practical performance is different. The statistical methods generally achieve better compression ratio than dictionary methods though statistical methods require many computer resources. Ziv and Lempel proposed two universal text compression algorithms called LZ77 [ZL77] and LZ78 [ZL78] ....
T. C. Bell and I. H. Witten. The Relationship between Greedy Parsing and Symbolwise Text Compression. Journal of the ACM, 41(4):708--724, 1994.
....estimate given by the fraction of time c has occurred given that the context was the previous three characters. If c has not occurred previously in the given three-character context, the encoder sends a special escape character, and recurses to using a two-character context. It has been shown [2,9], that some of the Ziv-Lempel algorithms are, in a sense, equivalent to corresponding statistical coders. Loosely speaking, a statistical method of this sort uses a varying number of previous characters in order to calculate its estimated probability distribution for the next character in a manner ....
T. C. Bell and I. H. Witten, "The relationship between greedy parsing and symbolwise text compression," Journal of the ACM (In press).
....we can use lossy compression scheme and achieve high compression performance. However, for binary files and text files, we must use lossless scheme. The lossless data compression scheme can be divided into two schemes: dictionary method and statistical method. These are essentially equivalent [4], but practical performance are different. Recently a new compression scheme, called block sorting [7] was developed. It seems not to belong to both schemes, but in reality it is related to the statistical scheme [8, 11]. The dictionary method encodes a text string to be compressed to indices ....
T. C. Bell and I. H. Witten. The relationship between greedy parsing and symbolwise text compression. Journal of the ACM, 41(4):708--724, 1994.
....the application and available resources. Preliminary experiments indicate that this yields highly competitive results. Furthermore, with our sliding window technique we obtain a natural implementation of Ziv-Lempel compression which runs in linear time. It has been noted, e.g. by Bell and Witten [2], that there is a strong connection between string substituting compression methods and symbolwise (statistical) methods. Our assertion that the exact same data structure is useful in both these families of algorithms serves as a further illustration of this. ....
T. C. Bell and I. H. Witten. The relationship between greedy parsing and symbolwise text compression. J. ACM, 41(4):708--724, July 1994.
....called the symmetric property. This is in contrast to the asymmetric property of LZ77 and LZ78, where encoding is more complex than decoding. 1.3 Equivalence of LZ and Statistical Coders Although dictionary and statistical coding appear at first glance to be quite different, it has been shown [93, 56, 3], that some of the Ziv-Lempel algorithms that use greedy parsing have equivalent statistic coders that give the same compression for all inputs. 5 The connection between LZ and statistical coders goes beyond equivalence of code lengths. A more fundamental result is that an LZ dictionary ....
T. C. Bell and I. H. Witten. Relationship between greedy parsing and symbolwise text compression. Journal of the ACM, 41(4):708--724, July 1994.
....algorithms in the literature (e.g. [ZL77, ZL78, Wel84, MW85, Yok92]) use greedy parsing, which takes the uncompressed suffix of the input and parses its longest prefix, which is a phrase in the dictionary. The next substring to be parsed starts where the currently parsed substring ends. See [BW94] for a study of greedy parsing with static dictionaries. Greedy parsing is fast and can usually be applied on-line, and is hence very suitable for communications applications. However, it was shown in [GSS85] that for static dictionaries greedy parsing can be quite far from optimal: there are ....
T. Bell and I. Witten. The relationship between greedy parsing and symbolwise text compression. Journal of the ACM, 41(4):708--724, July 1994.
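The gap between greedy and optimal parsing with a static dictionary, mentioned in the context above, can be seen on a toy example. The sketch below is an illustration only: the dictionary, the function names, and the cost model (every phrase costs the same, so "optimal" means fewest phrases) are assumptions, not details from the cited papers.

```python
def greedy_parse(text: str, dictionary: set) -> list:
    """Repeatedly take the longest dictionary phrase that prefixes the remaining text."""
    phrases = []
    max_len = max(map(len, dictionary))
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                phrases.append(text[i:i + length])
                i += length
                break
        else:
            raise ValueError(f"no dictionary phrase matches at position {i}")
    return phrases


def optimal_parse(text: str, dictionary: set) -> list:
    """Fewest-phrase parse by dynamic programming (assumes equal cost per phrase)."""
    n = len(text)
    best = [None] * (n + 1)  # best[i] holds a fewest-phrase parse of text[i:]
    best[n] = []
    for i in range(n - 1, -1, -1):
        for phrase in dictionary:
            if text.startswith(phrase, i):
                tail = best[i + len(phrase)]
                if tail is not None and (best[i] is None or len(tail) + 1 < len(best[i])):
                    best[i] = [phrase] + tail
    if best[0] is None:
        raise ValueError("text cannot be parsed with this dictionary")
    return best[0]


toy_dictionary = {"a", "b", "ab", "bbb"}  # hypothetical static dictionary
greedy_result = greedy_parse("abbb", toy_dictionary)
optimal_result = optimal_parse("abbb", toy_dictionary)
```

Here the greedy parser commits to `ab` and is then forced to spend two more phrases on the remaining `bb`, while the optimal parse uses `a` followed by `bbb`. Longer inputs built on the same pattern widen the gap.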
....for each phrase in the hierarchy. 3. Compression Effectiveness 3.1. Symbolwise Equivalent. To understand the structure of dictionary-based models it is helpful to consider the structure of their symbolwise equivalent models: models that process one character at a time with an entropy coder [1, 3]. Consider the final sequence of phrases. Suppose that there are k # distinct phrases in the sequence, and n # phrases in total. Then each occurrence of a phrase that appears # times in the final sequence generates approximately log 2 (# n # ) bits in the compressed message, since the final ....
Timothy C. Bell and Ian H. Witten, The relationship between greedy parsing and symbolwise text compression, Journal of the ACM 41 (1994), no. 4, 708--724.
....described here. However, for almost all known compression methods a symbolwise equivalent method can be determined in which probabilities for each symbol are determined, and a coder used to represent that symbol. Langdon (1983) gives an example of this kind of equivalence; others may be found in Bell and Witten (1994). 2.2 Overview of Coding Methods By Shannon's source coding theorem (Shannon, 1948) optimal coding is achieved when the length of the code assigned to the ith symbol of the alphabet is $-\log_2 p_i$ where $p_i$ is the probability of that symbol. Such an assignment of codewords is the goal if ....
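As a small worked example of the code-length rule quoted above, the following sketch computes the ideal length of each codeword and the resulting average cost per symbol. The probability distribution and the names are made up for illustration; they do not come from the citing article.

```python
import math


def ideal_code_lengths(probs):
    """Ideal code length, in bits, for each symbol probability: -log2(p_i)."""
    return [-math.log2(p) for p in probs]


# Hypothetical four-symbol alphabet whose probabilities are powers of two.
example_probs = [0.5, 0.25, 0.125, 0.125]
example_lengths = ideal_code_lengths(example_probs)

# Expected bits per symbol under these ideal lengths (the entropy of the source).
expected_bits = sum(p * l for p, l in zip(example_probs, example_lengths))
```

With these probabilities the ideal lengths are whole numbers (1, 2, 3 and 3 bits), so a prefix code can match them exactly. For other distributions the ideal lengths are fractional, and a coder that can spend fractions of a bit per symbol, such as an arithmetic coder, is needed to approach them.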
....so decoding can proceed despite the apparent circular reference. 3.3 Model equivalence Although we have distinguished between context-based methods and dictionary methods, the line between the two is blurred. For example, it is possible to use contexts to predict dictionary entries (Gutmann & Bell, 1994; Hoang et al. 1995) and it is possible for context-based systems to use strings as their symbols (such as the word-based coder mentioned [Figure: previously coded characters, characters to be encoded, and dictionary phrases 109 to 111] ....
Bell, T.C., & Witten, I.H. (1994). The relationship between greedy parsing and symbolwise text compression. Journal of the ACM, 41(4):708--724.
No context found.
Timothy C. Bell and Ian H. Witten, The relationship between greedy parsing and symbolwise text compression, Journal of the ACM 41 (1994), no. 4, 708--724.