35 citations found.
T. Bell, I. H. Witten, and J. G. Cleary. Modeling for text compression. ACM Computing Surveys, 21(4):557--592, December 1989.


This paper is cited in the following contexts:

First 50 documents

Streaming BDD Manipulation for Large-Scale Combinatorial.. - Minato, Ishihara (2001)   (1 citation)

....The special ID 0 represents the 0 terminal nodes. As we use complement edges [10], the .... [Figure 4: Syntax of BDD data format. Figure 5: Simple examples of BDD data streams.]

[Article contains additional citation context not shown here]

T. Bell, I. H. Witten, and J. G. Cleary. Modeling for text compression. ACM Computing Surveys, 21(4):557--591, Dec. 1989.


Optimal Encoding of Non-Stationary Sources - Reif, Storer (2001)

....and Reif [31] describe a massively parallel implementation of a systolic LZ compression scheme also using pairwise recursive parsing. Storer and Reif [30] describe the adaptation of LZ techniques to ensure error resiliency. 1.5. Stationary source models for lossless compression. Bell et al. [4] discuss many source models for lossless text compression. A number of models for string sources are now listed in increasing generality: i) Symmetric Bernoulli sources, where each symbol is independently randomly generated with equal likelihood; ii) Asymmetric Bernoulli sources, where each ....

T. Bell, I.H. Witten, G.J. Cleary, Modeling for text compression, ACM Computing Surveys 24 (1) (1989) 557-591.
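The Reif and Storer context above orders source models by generality, starting with symmetric and asymmetric Bernoulli sources. As a rough illustration (not code from either paper), the standard Shannon entropy gives the best achievable rate for such memoryless sources; the helper names `bernoulli_entropy` and `empirical_entropy` are hypothetical.

```python
import math
from collections import Counter


def bernoulli_entropy(probs):
    """Entropy in bits/symbol of a memoryless source that emits symbol i
    independently with probability probs[i]."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def empirical_entropy(text):
    """Order-0 entropy estimate from the observed symbol frequencies of text."""
    counts = Counter(text)
    n = len(text)
    return bernoulli_entropy([c / n for c in counts.values()])


# Symmetric source over 4 equally likely symbols: log2(4) = 2 bits/symbol.
print(bernoulli_entropy([0.25] * 4))  # 2.0
# Asymmetric binary source with p = 0.9: roughly 0.469 bits/symbol.
print(round(bernoulli_entropy([0.9, 0.1]), 3))  # 0.469
```

The symmetric case always reaches the maximum, log2 of the alphabet size; any skew in the probabilities lowers the entropy, which is what a compressor for an asymmetric source can exploit.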


Learning Nonstationary Models of Normal Network Traffic for.. - Mahoney, Chan (2002)   (13 citations)

....are most interested in those events that have the lowest probability. As a simplification, we assign anomaly scores only to those events that have never occurred in training, because these are certainly the least likely. We use the PPMC model of novel events, which is also used in data compression [2]. This model states that if an experiment is performed n times and r different outcomes are observed, then the probability that the next outcome will not be one of these r values is approximately r/n. Stated another way, the fraction of events that were novel in training is r/n, and we expect that ....

Bell, Timothy, Ian H. Witten, John G. Cleary, "Modeling for Text Compression", ACM Computing Surveys (21)4, pp. 557-591, Dec. 1989.
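The Mahoney and Chan context states the novel-event estimate in words: after n trials with r distinct outcomes, the next outcome is novel with probability about r/n. A minimal Python sketch of that estimate follows; the `anomaly_score` rule (score n/r for unseen values, 0 otherwise) and the port-number data are illustrative assumptions, not the paper's full scoring scheme.

```python
def novelty_probability(training_events):
    """Estimate P(next event is novel) as r/n: n training observations,
    r distinct values among them."""
    n = len(training_events)
    r = len(set(training_events))
    return r / n


def anomaly_score(value, training_events):
    """Hypothetical rule: values seen in training score 0; a novel value
    scores n/r, the inverse of the estimated novelty probability."""
    if value in set(training_events):
        return 0.0
    return 1.0 / novelty_probability(training_events)


ports = [80, 80, 443, 80, 22, 443, 80, 80]  # n = 8, r = 3
print(novelty_probability(ports))  # 0.375
print(anomaly_score(80, ports))    # 0.0
print(anomaly_score(8080, ports))  # about 2.67
```

A field that took few distinct values in training (small r/n) makes a novel value more surprising, so it earns a higher score.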


Energy Aware Lossless Data Compression - Barr (2000)   (9 citations)

....guidelines are derived for those seeking to minimize energy consumption of compressed data transmission. 4.1 Benchmark selection. I have collected and compiled several benchmarks for the Skiff which have been described in Section 3.3. For input datasets, I have chosen the popular Calgary Corpus [7]. Though more modern and/or methodically chosen corpora exist, compression ratios for a given compressor have remained nearly identical over a range of well-chosen input datasets [3]. The Calgary Corpus remains the most popular reference for comparison despite its age. It consists of several ....

T. Bell, I. H. Witten, and J. G. Cleary. Modeling for text compression. ACM Computing Surveys, 21(4):557--591, 1989.


Compression Techniques for Chinese Text - Vines, Zobel

....by the total number of observed occurrences of any symbol in the context, which must include an estimate of the probability of observing a novel symbol, that is, one previously unseen in that context. There are several methods for estimating the number of occurrences allocated to the escape [23]. A successful method for English is to estimate the likelihood of observing a new character as equal to the number of characters observed once or more; this is method C of Bell et al. [1]. With this method the escape probability can never exceed 50% once a context has been observed. Moffat [24] ....

T.C. Bell, I.H. Witten, and J.G. Cleary, `Modeling for text compression', Computing Surveys, 21(4), 557--592 (1989).
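Method C, as described in the Vines and Zobel context, counts the escape as if it had occurred once per distinct symbol already seen in the context. Assuming n total occurrences and r distinct symbols, that gives P(escape) = r/(n+r), which is at most 1/2 because r ≤ n. The sketch below computes these per-context probabilities exactly with `fractions.Fraction`; the function name and sample counts are hypothetical, and a full PPM coder would also need the exclusion and back-off logic not shown here.

```python
from collections import Counter
from fractions import Fraction


def method_c_probabilities(context_counts):
    """Escape method C for a single, already observed context.

    With n total occurrences and r distinct symbols seen in the context,
    P(s) = c(s) / (n + r) and P(escape) = r / (n + r).
    """
    n = sum(context_counts.values())
    r = len(context_counts)
    probs = {s: Fraction(c, n + r) for s, c in context_counts.items()}
    return probs, Fraction(r, n + r)


# Hypothetical counts of the symbols that followed one context.
counts = Counter("eeeaei")  # e: 4, a: 1, i: 1  ->  n = 6, r = 3
probs, escape = method_c_probabilities(counts)
print(probs["e"], escape)  # 4/9 1/3
```

The symbol probabilities and the escape probability together sum to exactly 1, and the escape share shrinks as a context accumulates repeated symbols.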


Semantic Lossy Compression of XML Data - Cannataro, Carelli, Pugliese, Sacca   (4 citations)

....by a source and described by messages composed of symbols. Compression is often referred to as coding, since its objective is to represent source messages with corresponding codes. Source coding is related to the semantics of data, whereas entropy coding refers only to its redundancy [2, 5, 8]. The most important distinction among compression techniques concerns their reversibility. If decoded data are identical to the original, compression is called lossless; otherwise, it is called lossy. Lossless compression schemes include, e.g., the work by Huffman [4, 5], the algorithm ....

Bell, T., Witten I.H., Cleary J.G., "Modeling for text Compression", ACM Computing Surveys 21,4 (Dec.) : 557-591, 1989.


Fast Text Compression with Neural Networks - Mahoney

....compression, Data compression, On-line training/learning, Maximum entropy, Prediction, Efficiency. 1. Introduction. One of the motivations for using neural networks for data compression is that they excel in complex pattern recognition. Standard compression algorithms, such as Lempel-Ziv or PPM (Bell, Witten, and Cleary, 1989) or Burrows-Wheeler (Burrows and Wheeler, 1994), are based on simple n-gram models: they exploit the nonuniform distribution of text sequences found in most data. For example, the character trigram "the" is more common than "qzv" in English text, so the former would be assigned a shorter code. However, ....

....few characters to previous occurrences in the input) with a 3-layer neural network trained by backpropagation to assign character probabilities when given the context as input. Unfortunately the algorithm was too slow to make it practical to test on standard benchmarks, such as the Calgary corpus (Bell, Witten, and Cleary, 1989). Training on a 10K to 20K text file required days of computation on an HP 700 workstation, and the prediction phase (which compressed English and German newspaper articles to 2.94 bpc) ran 1000 times slower than standard methods. In contrast, the 2-layer network we describe, which learns and ....

Bell, Timothy, Witten, Ian H., and Cleary, John G. 1989. Modeling for Text Compression, ACM Computing Surveys 21(4): 557-591.
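The Mahoney context's point about nonuniform n-gram statistics can be made concrete: a frequent trigram such as "the" has a higher estimated probability, so its ideal code length, -log2 p, is shorter. This Python sketch only illustrates that relationship on a made-up sample; LZ, PPM and Burrows-Wheeler compressors do not literally assign one code per trigram.

```python
import math
from collections import Counter


def trigram_code_lengths(text):
    """Ideal code length -log2(p) in bits for each character trigram,
    with p estimated from trigram frequencies in text."""
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    counts = Counter(grams)
    total = len(grams)
    return {g: -math.log2(c / total) for g, c in counts.items()}


sample = "the cat and the dog and the bird"  # made-up sample text
lengths = trigram_code_lengths(sample)
# "the" occurs 3 times among 30 trigrams, "bir" only once.
print(round(lengths["the"], 2), round(lengths["bir"], 2))  # 3.32 4.91
```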


A VLSI Systolic Array Architecture for Lempel-Ziv-Based.. - Bongjin Jung Wayne (1994)   (1 citation)

....in until the present encoding is complete. The architecture fully exploits this to eliminate the need for delay elements required for skewed inputs. The architecture consists of M processing elements where M is the maximum allowable length of source symbols to be encoded, typically 10 to 20 [3]. It achieves significant improvement in encoding latency while consuming substantially less area compared to [6]. 2. LEMPEL-ZIV CODING ALGORITHM. The Lempel-Ziv algorithm is based on the concept of encoding a string of source symbols whose length is less .... (Footnote: This research was supported by the ....)

T. Bell, I. Witten, and J. Cleary, "Modeling for Text Compression," ACM Computing Surveys, pp. 558-591, Vol. 21, No. 4, December 1989.


Optimal Lossless Compression of a Class of Dynamic Sources - Reif, Storer (1997)   (2 citations)

....freezing it when it is full or by incorporating a deletion method). Another practical variation is to allow non-greedy parsing; that is, use a shorter-than-possible match at a given step to allow for a longer match at a later step. See the books of Storer [S 88] and Bell, Cleary, and Witten [BWC 89] for a presentation of other variations and practical implementations. For simplicity, we limit our attention to pure LZ77 and LZ78. However, all of the techniques we present can be generalized to incorporate most practical implementations of LZ77 and LZ78, but if the variation in question ....

....91] describe a massively parallel implementation of a systolic LZ compression scheme also using pairwise recursive parsing. Storer and Reif [SR 97] describe the adaptation of LZ techniques to ensure error resiliency. Stationary Source Models for Lossless Compression. Bell, Witten, and Cleary [BWC 89] discuss many source models for lossless text compression. A number of models for string sources are now listed in increasing generality: i) Symmetric Bernoulli sources, where each symbol is independently randomly generated with equal likelihood; ii) Asymmetric Bernoulli sources, where each ....

T. Bell, I. H. Witten, and J. G. Cleary, Modeling for Text Compression, ACM Computing Surveys, 21:4, 557--591 (1989).
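The Reif and Storer context distinguishes pure greedy LZ77 parsing from non-greedy variants. A minimal greedy LZ77 parser and decoder in Python might look like the sketch below; the window size of 4096 and the `max_len` limit of 16 are hypothetical choices (the latter is in the spirit of the maximum match length M, "typically 10 to 20", mentioned in the systolic-array context earlier in this listing). The search is brute force and meant for clarity rather than speed.

```python
def lz77_parse(data, window=4096, max_len=16):
    """Greedy LZ77: at each position emit (offset, length, next_char) for the
    longest match (at most max_len) starting in the last `window` characters.
    (0, 0, c) means no match; the literal c is always emitted."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Leave at least one character for next_char; matches may overlap i.
            while (k < max_len and i + k < len(data) - 1
                   and data[j + k] == data[i + k]):
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples):
    buf = []
    for off, length, ch in triples:
        start = len(buf) - off
        for k in range(length):
            buf.append(buf[start + k])  # char by char, so overlapping copies work
        buf.append(ch)
    return "".join(buf)


triples = lz77_parse("abababab")
print(triples)  # [(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]
print(lz77_decode(triples))  # abababab
```

A non-greedy parser would instead consider taking a shorter match at position i when that lets the next step find a longer one; the greedy loop above never looks ahead.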


Data Compression in Haskell with Imperative Extensions - A Case.. - Thiemann

....lists of pairs of key and data. lzw it1: Version using an imperative version (with mutable variables) of the data structure of lzwh. lzw it2: The same, but implemented with mutable arrays. For experimental evaluation we have used some files from the Standard Calgary Text Compression Corpus [1, 2] (the same as in [13]). Here are their characteristics:

  name     size (bytes)   contents
  paper1    53161         text: scientific
  bib      111261         text: bibliography (REFER format)
  geo      102400         binary: geophysical data
  book1    768771         text: novel fiction

From the vivid description you can imagine that we found out the ....

Timothy C. Bell, Ian H. Witten, and J.G. Cleary. Modeling for text compression. ACM Computing Surveys, 21(4):557--591, December 1989.
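Thiemann's context compares LZW implementations built on association lists and on mutable arrays in Haskell. For readers unfamiliar with LZW itself, here is a minimal Python sketch using a hash map (`dict`) as the dictionary; it is not the paper's code, and the 256-entry initial dictionary of single bytes is the usual convention rather than something stated in the excerpt.

```python
def lzw_encode(data):
    """Emit the code of the longest dictionary string that prefixes the
    remaining input, then add that string plus the next byte as a new entry."""
    dictionary = {bytes([b]): b for b in range(256)}
    w, out = b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = len(dictionary)
            w = bytes([b])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decode(codes):
    dictionary = {i: bytes([i]) for i in range(256)}
    if not codes:
        return b""
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Code not yet known: it must be the entry added in this step,
            # i.e. w followed by its own first byte.
            entry = w + w[:1]
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)


text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(text)
print(len(text), len(codes))  # 24 16
```

The dictionary lookup `wc in dictionary` is the operation whose cost the association-list and array variants in the paper trade off.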


On-line Data Compression in a Log-structured File System - Burrows, Jerian, Lampson.. (1992)   (27 citations)

....the freed space or see it go to waste. In either case, disk space tends to become fragmented, which reduces the effective compression.

  Input block size (bytes)   Compression ratio (output size / input size, %)
  1K                         68
  2K                         63
  4K                         59
  8K                         55
  16K                        53
  32K                        51

The file progc from the Calgary Compression Corpus [3] was compressed using various block sizes. The file contains 39611 bytes of C source. The entire file was compressed, one block at a time. The compression algorithm is described below in Section 4 as Algorithm 2. Table 1: An example of improved compression with increased block size. Second, the ....

....algorithm is described below in Section 4 as Algorithm 2. Table 1: An example of improved compression with increased block size. Second, the best compression algorithms are adaptive: they use patterns discovered in one part of a block to do a better job of compressing information in other parts [3]. These algorithms work better on large blocks of data than on small blocks. Table 1 shows the variation in compression ratio with block size for a simple adaptive compression algorithm. The details vary for different compression algorithms and different data, but the overall trend is the ....

[Article contains additional citation context not shown here]

T. Bell, I. H. Witten, and J.G. Cleary. Modeling for Text Compression. ACM Computing Surveys, Vol. 21, No. 4, December 1989, pp. 557--589.
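The Burrows et al. contexts note that adaptive compressors do better on larger blocks because patterns learned early in a block help later in it. The Python sketch below lets you observe the trend with the standard-library `zlib` (DEFLATE) module, compressing each block as an independent stream; zlib stands in for the paper's "Algorithm 2", which is not described in the excerpt, and the synthetic C-like sample replaces the progc file. Exact ratios will differ from Table 1.

```python
import zlib


def blockwise_ratio(data, block_size, level=9):
    """Compress data one block at a time, each block as an independent zlib
    stream (so nothing learned in one block helps the next), and return
    total output size / input size."""
    out = 0
    for start in range(0, len(data), block_size):
        out += len(zlib.compress(data[start:start + block_size], level))
    return out / len(data)


# Synthetic, C-like sample input (a stand-in for progc, which is not bundled).
sample = b"".join(b"int f%d(int x) { return x * %d; }\n" % (i, i)
                  for i in range(2000))
for kb in (1, 2, 4, 8, 16, 32):
    print(f"{kb}K: {blockwise_ratio(sample, kb * 1024):.2f}")
```

Small blocks lose twice: each pays its own stream header, and each starts with an empty history window, so repeats that span a block boundary cannot be exploited.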


Compression of Correlated Bit-Vectors - Bookstein, Klein (1990)   (6 citations)

....must be created that themselves require a substantial amount of space. Thus, mechanisms for compressing a wide range of data structures must be sought for the efficient operation of such systems [10]. To date, most attention has been given to, and progress made in, the area of text compression ([2], [13], [16]). In this paper, we shall describe and examine the possibilities of compressing bitmaps, a data structure often proposed for improving the performance of retrieval systems ([6], [18]). Bitmaps occur often in information retrieval. They can represent the occurrences of a word in the ....

Bell T., Witten I.H., Cleary J.G., Modeling for Text Compression, ACM Computing Surveys 21 (1989) 557--591.


A Systematic Approach to Compressing a Full Text Retrieval .. - Bookstein, Klein, Ziff

....(see, for example, the reviews [13] and [1]). Of course, part of this interest is associated with requirements unrelated to storage, for example the need to transmit large amounts of information over still expensive communication lines. But even within the area of data storage, compression is becoming increasingly important, ironically, .... [Footnotes: This paper is a revision of [5]. Center for Information and Language Studies (CILS), University of Chicago, 1100 E. 57th St., Chicago, IL 60637. Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan 52900, Israel.]

T. Bell, I.H. Witten, J.G. Cleary, "Modeling for Text Compression", ACM Computing Surveys, 21 (1989) 557--592.


Models of Bitmap Generation: A Systematic Approach to Bitmap .. - Bookstein, Klein (1992)   (1 citation)

....the structure between as well as within bitmaps to compress the whole bit table, using simple models of bit occurrence. The strategy of separating model construction and compression method continues the now well-established practice of basing compression on an explicit model of message generation [2]. 2. Description of the compression technique. The basic idea of all the methods described below is to partition the rows of the table into blocks of fixed size N bits and to estimate the probabilities of the different bit blocks by means of the assumed model. Once we have formulated the problem ....

Bell T., Witten I.H., Cleary J.G., Modeling for text compression, ACM Computing Surveys 21 (1989) 557--591.
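The Bookstein and Klein context partitions bitmap rows into N-bit blocks and assigns block probabilities from an assumed model. The simplest such model, assumed here purely for illustration, treats every bit as independently 1 with probability p; the sketch computes the resulting block probabilities and the ideal total code length, the sum of -log2 P(block). The function names are hypothetical, the models in the paper capture more structure than this, and the cost of transmitting p is ignored.

```python
import math


def block_probabilities(n_bits, p):
    """Assumed independent-bit model: probability of each of the 2**n_bits
    possible blocks when every bit is 1 independently with probability p."""
    probs = {}
    for block in range(2 ** n_bits):
        k = bin(block).count("1")  # number of 1-bits in this block
        probs[block] = p ** k * (1 - p) ** (n_bits - k)
    return probs


def ideal_bitmap_cost(bits, n_bits):
    """Split bits (0/1 list, length a multiple of n_bits) into n_bits-bit
    blocks and sum the ideal code lengths -log2 P(block); the cost of
    sending p itself is ignored."""
    p = sum(bits) / len(bits)
    probs = block_probabilities(n_bits, p)
    cost = 0.0
    for i in range(0, len(bits), n_bits):
        block = int("".join(map(str, bits[i:i + n_bits])), 2)
        cost -= math.log2(probs[block])
    return cost


bitmap = [0] * 60 + [1, 0, 0, 1]  # sparse row: 2 ones in 64 bits
print(round(ideal_bitmap_cost(bitmap, 8), 1))  # well under the raw 64 bits
```

With blocks as the coding unit, a Huffman or arithmetic coder can then be driven directly by these block probabilities.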


Modeling Word Occurrences for the Compression of Concordances - Bookstein, Klein, Raita

....the concordance efficiently. But concordance compression has theoretical interest as well. Current approaches to data compression tend to take a two stage approach: first one models the source, defining the message set and the probability of each message; then one creates the code for each message [2]. The concordance of a full text information retrieval system is the ideal entity on which to test this approach. As a well structured component of the IR system, it would seem particularly susceptible to modeling. When choosing an appropriate compression scheme, we note that in a static IRS, ....

Bell T.C., Witten I.H., Cleary J.G., Modeling for text compression, ACM Computing Surveys 21 (1989) 557--591.
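The two-stage view in the concordance context (first model the source, then build a code from the model's probabilities) can be sketched with Huffman's algorithm as the second stage. The probabilities below are made up; in practice they would come from whatever model of the source is used, and `huffman_code` is a hypothetical helper name.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a prefix code from message probabilities (Huffman's algorithm)."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(p, next(tiebreak), {msg: ""}) for msg, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {msg: "0" for msg in probs}
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing 0 and 1.
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {m: "0" + c for m, c in c1.items()}
        merged.update({m: "1" + c for m, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


# Stage 1 (the model): hypothetical probabilities for four messages.
model = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(model)
print({m: len(c) for m, c in code.items()})  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```

Because these probabilities are powers of 1/2, the code lengths equal -log2 p exactly; for other models Huffman coding can fall short of the entropy by up to one bit per message.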


On-Line Stochastic Processes in Data Compression - Bunton (1996)   (3 citations)

.... have systematically proved that the other known practical methods can be exactly and efficiently emulated with adaptive stochastic models [Ris83, Lan83, Bel86]. Furthermore, the most effective practical data compression method for the past thirteen years, namely PPM, is a stochastic technique [Bel86, BWC89, BCW90, CTW95, CW84b, Mof90, WMB94]. Other techniques, namely any variant of the elegant Ziv-Lempel techniques [ZL77, ZL78, MW85, Wel84, FG89, BB92], trade compression performance for speed and memory conservation more appropriately for today's technology. These techniques are examples of the more general textual substitution ....

T. C. Bell, I. H. Witten, and J. G. Cleary. Modeling for text compression. ACM Computer Surveys, 24(4):555--591, 1989.


Practical Implementations of Arithmetic Coding - Howard, Vitter (1992)   (18 citations)

....PPMA and PPMB. PPMP and PPMX appear in [57]; they are based on the assumption that the appearance of symbols for the first time in a file is approximately a Poisson process. See Table 1 for formulas for the probabilities used by the different methods, and see [5] or [6] for a detailed description of the PPM method. In Section 3.5 we indicate two methods that provide improved estimation of the escape probability. 2.4 Other applications of arithmetic coding. Because of its nearly optimal compression performance, arithmetic coding has been proposed as an ....

T. C. Bell, I. H. Witten & J. G. Cleary, "Modeling for Text Compression," Comput. Surveys 21 (Dec. 1989), 557--591.
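Arithmetic coding, the subject of the Howard and Vitter context, narrows an interval by each symbol's probability, so the final interval width equals the message probability. The sketch below uses exact rational arithmetic from `fractions` and a hypothetical two-symbol static model; it shows the principle only and is not one of the practical, fixed-precision implementations the paper is concerned with.

```python
import math
from fractions import Fraction


def arithmetic_interval(message, probs):
    """Narrow [low, low + width) once per symbol under a fixed model; the
    final width equals the probability of the whole message."""
    cumulative, total = {}, Fraction(0)
    for s in sorted(probs):  # cumulative probability below each symbol
        cumulative[s] = total
        total += probs[s]
    low, width = Fraction(0), Fraction(1)
    for s in message:
        low += width * cumulative[s]
        width *= probs[s]
    return low, width


model = {"a": Fraction(3, 4), "b": Fraction(1, 4)}  # hypothetical static model
low, width = arithmetic_interval("aaab", model)
print(low, width)  # 81/256 27/256
# A codeword of about -log2(width) bits (plus at most 2 more) identifies a
# number inside [low, low + width).
print(math.ceil(-math.log2(float(width))))  # 4
```

Exact fractions grow without bound as the message gets longer, which is precisely why practical coders work with fixed-precision integers and renormalize.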


Text Compression as a Test for Artificial Intelligence - Mahoney

....tables, and mathematical formulas, leaving mostly readable English text. The file was then reduced to 27 characters as in alice, reducing it from 610,856 to 315,749 characters. The compression programs tested are of two types, Ziv-Lempel (LZ) and prediction by partial match (PPM), both described in (Bell, Witten, Cleary 1989). LZ compressors are the most popular, due to their high rate of decompression, but PPM achieves a better compression ratio. UNIX compress, pkzip (PKZIP 1993) and gzip (Gailly 1993) are all LZ compressors. All compression algorithms exploit the lexical redundancy found in most files, including ....

....entire file, x. An order-n PPM encoder uses only the context of the last n characters, $x_{i-n} \ldots x_{i-1}$, to determine $P(x_i)$ based on statistics from previous occurrences of the same context. Going beyond order 4 or 5 rarely helps. It has been shown that LZ is a special case of predictive encoding (Bell, Witten, Cleary 1989). The following compression programs were tested. Options shown were selected for maximum compression when possible. For archivers, which compress multiple files into a single file, the overhead of storing the filename, date, checksum, etc. is not included in table 1 for individual files. ....

[Article contains additional citation context not shown here]

Bell, T., Witten I. H., Cleary J. G., 1989. Modeling for Text Compression. ACM Computing Surveys 21:557-591.
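The order-n description in the Mahoney context (predict $x_i$ from statistics gathered in earlier occurrences of the last n characters) is sketched below as an adaptive order-n model that reports average bits per character. To keep it short, unseen symbols are handled with a Laplace-style add-one estimate over an assumed 256-symbol alphabet instead of PPM's escape mechanism, so the numbers will be worse than a real PPM coder's; `order_n_cost` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


def order_n_cost(text, n, alphabet_size=256):
    """Average bits/char of an adaptive order-n model: x_i is predicted from
    counts of what followed the same last-n-character context earlier on.
    Simplification: an add-one (Laplace) estimate over alphabet_size symbols
    replaces PPM's escape-and-back-off to shorter contexts."""
    if not text:
        return 0.0
    stats = defaultdict(Counter)
    bits = 0.0
    for i, ch in enumerate(text):
        seen = stats[text[max(0, i - n):i]]  # context: previous n characters
        p = (seen[ch] + 1) / (sum(seen.values()) + alphabet_size)
        bits -= math.log2(p)
        seen[ch] += 1  # update only after coding, as a decoder could
    return bits / len(text)


text = "abracadabra " * 500
for order in (0, 1, 3):
    print(order, round(order_n_cost(text, order), 2))
```

On this repetitive sample most order-3 contexts are followed by a single character, so the higher-order model needs noticeably fewer bits per character than the order-0 one.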


An Asymmetric, Semi-adaptive Text Compression Algorithm - Plantinga

....compress files, but decompression is competitive with the fastest algorithms and requires little memory. Small parts of a file compressed with this algorithm may be accessed and decompressed without decompressing the rest of the file. Experimental results show that for a standard corpus of texts [Bell et al. 1989], the compression ratios achieved are competitive with and in some cases better than those for the best of the adaptive dictionary based techniques, especially for large files. Some statistical modeling arithmetic coding techniques achieve better compression ratios, but they require much more ....

....3.27 1.62 1.64 2.20 Total: 3141622 1259141 1061830 848885 965620 Avg: 3.64 2.71 2.47 2.82 Largefileavg: 3.59 2.92 2.52 2.75 Comp. time 0:00:27 0:01:01 0:08:09 15:53:10 Decomp. time 0:36 0:35 8:18 0:30 Table 3. Compression results for several algorithms. 7 The corpus of texts is that used by [Bell et al. 1989]. The version of the UNIX compress program used is 4.0 and the version of gzip is 1.2.3. comp 2 is an experimental implementation [Nelson, 1991] of Prediction by Partial Match and Arithmetic Coding [Cleary and Witten, 1984; Moffat 1988] using 4th-order statistics. All runtimes presented are the ....

Bell, T., I. H. Witten, and J. G. Cleary, "Modeling for text compression," ACM Computing Surveys 21(4), December 1989, pp. 557-592.


Objective Evaluation of Inferred Context-Free Grammars - Tony Smith (1994)   Self-citation (Witten Cleary)

....for the sample grammars. As previously noted, possible instantiations are rarely equiprobable, and this fact can be used to improve the predictive power of each grammar and thus reduce the amount of disambiguation information required. Techniques such as Huffman coding [9] or arithmetic coding [2, 10, 14] give optimum encodings for such decisions. Conclusions The minimum goal of grammatical inference is to produce a compact grammar general enough to accept all well formed expressions of a language without admitting those that are malformed. There appears to be no shortage of methods by which ....

T.C. Bell, I.H. Witten, and J.G. Cleary. Modeling for text compression. Computing Surveys, 21(4):557--591, December 1989.


DNA-Based Cryptography - Gehani, LaBean, Reif (1999)   (1 citation)

No context found.

T. Bell, I. H. Witten, and J. G. Cleary. Modeling for text compression. ACM Computing Surveys, 21(4):557--592, December 1989.


DNA-Based Cryptography - Ashish Gehani Thomas (1999)   (1 citation)

No context found.

T. Bell, I. H. Witten, and J. G. Cleary. Modeling for text compression. ACM Computing Surveys, 21(4):557--592, December 1989.


Text Augmentation: Inserting XML tags into natural language text.. - Yeates (2003)

No context found.

Timothy C. Bell, Ian H. Witten, and John G. Cleary. Modeling for text compression. Computing Surveys, 21(4):557--591, December 1989.


Unknown - Figure There Are

No context found.

T. Bell, I. H. Witten, and J. G. Cleary. Modeling for text compression. ACM Computing Surveys, 21(4):557--591, 1989.


MIRAGE+: A Kernel Implementation of Distributed Shared Memory.. - Fleisch, Hyde (1994)   (2 citations)

No context found.

T. Bell, I. H. Witten and J. G. Cleary, `Modeling for text compression', ACM Computing Surveys, 21, (4), 557--589 (1989).



CiteSeer.IST - Copyright Penn State and NEC