25 citations found.
Paul Glor Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Department of Computer Science, Brown University, June 1993.

This paper is cited in the following contexts:
Adaptive Text Mining: Inferring Structure from Sequences - Witten

....text compression for the purposes of text mining. In this section and the next, we will review applications of character-based compression methods. Throughout this work, the well-known PPM text compression scheme is used [1, 6] with order 5 (except where otherwise mentioned) and escape method D [13]. However, the methods and results are not particularly sensitive to the compression scheme used, although character-based prediction is assumed. Named entities are defined as proper names and quantities of interest in a piece of text, including personal, organization, and location names, as ....

Howard, P.G. (1993) The design and analysis of efficient lossless data compression systems. PhD thesis, Brown University.


A Compression-based Algorithm for Chinese Word Segmentation - Teahan, Wen, McNab, Witten (2000)   (6 citations)

....discussion of this question, and several different methods have been proposed. Our experiments calculate the escape probability in a particular context as $d/(2n)$, where n is the number of times that context has appeared and d is the number of different symbols that have directly followed it (Howard, 1993). The probability of a character that has occurred c times in that context is $(2c-1)/(2n)$. Since there are d such characters, and their counts sum to n, it is easy to confirm that the probabilities in the distribution (including the escape probability) sum to 1. To illustrate the PPM ....

Howard, P.G. (1993) The design and analysis of efficient lossless data compression systems. PhD thesis, Brown University, Providence, RI.
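
The escape-method-D estimate quoted in this excerpt can be checked numerically. The following is a minimal sketch, assuming the symbol counts of a single context are held in a dictionary; the function name ppmd_distribution and the example counts are illustrative and do not come from the cited paper.

```python
from fractions import Fraction


def ppmd_distribution(counts):
    """Return (symbol_probs, escape_prob) for one context under escape method D.

    counts maps each symbol seen in the context to its occurrence count c >= 1.
    n is the total number of occurrences, d the number of distinct symbols.
    """
    n = sum(counts.values())
    d = len(counts)
    if n == 0:
        raise ValueError("context has no observations")
    # Each seen symbol gets (2c - 1) / (2n); the escape symbol gets d / (2n).
    probs = {sym: Fraction(2 * c - 1, 2 * n) for sym, c in counts.items()}
    escape = Fraction(d, 2 * n)
    return probs, escape


# Hypothetical counts for one context: n = 6, d = 3.
example_counts = {"a": 3, "b": 1, "c": 2}
example_probs, example_escape = ppmd_distribution(example_counts)

# The distribution, including the escape probability, sums to exactly 1.
assert sum(example_probs.values()) + example_escape == 1
```

Exact fractions are used here only so that the sum-to-one property can be asserted without floating-point tolerance.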


Relationship Between Hidden Markov Models And Prediction By.. - Yeates (2001)

....to handle low probabilities. All probabilities are calculated from the counts of previously seen characters, with a small probability of seeing a novel character. The size of this small probability and the way it is spread among the possible novel characters is determined by the escape method [4,9]. The escape method specifies a mapping from a collection of counts to a probability distribution. It is not clear whether the Baum-Welch algorithm has the same or similar effects as commonly used escape methods, and determining this is beyond the scope of this paper. It should be stressed that ....

Paul Glor Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. Ph.D. Thesis, Department of Computer Science, Brown University, May


An Image Compression Method for Spatial Search - Pajarola, Widmayer (2000)   (6 citations)

....of the location and value of some particular neighboring pixels. This spatial relation imposes restrictions on the traversal order of the pixels in the image, and it therefore makes the construction of well shaped pixel regions impossible. IV. A SPATIAL COMPRESSION ALGORITHM As discussed in [19], lossless image compression algorithms usually view a pixel value as a random variable. A compression algorithm then typically consists of the following four processing steps: 1) select the location of the next pixel to be encoded (pixel selection) 2) compute a prediction of the value of the ....

....coder [31] generates minimum redundancy codes which have a one to one relationship with the input symbols, and it also approximates the entropy bound for a given probability model. To reduce coding complexity, we use fixed precomputed Huffman tables for a specific set of variances presented in [19]. This set of variances introduced in [19] guarantees an unnoticeable loss in coding efficiency. Fig. 5 shows all processing steps in ....

[Article contains additional citation context not shown here]

P. G. Howard, "The Design and Analysis of Efficient Lossless Data Compression Systems," Ph.D. dissertation, Dept. Comput. Sci., Brown Univ., Providence, RI, 1993.


Unifying Text Search And Compression - Suffix Sorting, Block.. - Sadakane (2000)

....We call the length of a context order. For each context we take statistics of symbol frequency and use them to predict the following symbol. Then it is encoded by arithmetic code [WNC87]. The most famous statistical method is PPM [CW84]. It is improved by some authors: PPMC [Mof90], PPMD [How93], PPMD [TC96], PPM [CTW95], PPMZ [Blo95], PPME [ASS97]. These have better compression ratio than gzip. In the PPMC, the order is limited to three due to memory limitation and compression speed. As CPU power increases, larger values of order are applicable. The PPMD, the PPMD and the PPME use ....

....using multi precision operations. Aberg and Shtarkov [AS97] calculate block probability of $x_1^t$ by using multi precision operations. However, they calculate only width of a range encoded by arithmetic code and they did not implement the decoder. They showed that the CTW combined with the PPMD [How93] has superior performance. Willems and Tjalkens [WT97] reduced space complexity of the CTW. They use not weighted block probabilities but conditional weighted probabilities. They store not $P_e^s(x_1^t)$ and $P_w^s(x_1^t)$ but their ratio in the node s of the context tree. They also use ....

[Article contains additional citation context not shown here]

P. G. Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. Technical Report CS-93-28, Brown University, 1993.


Lossless Document Image Compression - Inglis (1999)   (4 citations)

....no context of any size matches, PPM encodes the symbol using an order −1 model which allocates a fixed probability to each symbol. The variations of PPM have different methods of calculating the escape probability. The specific version we use is PPM with method D escape estimation, or PPMD [How93]. Method C calculates the escape probability as $e = d/(n+d)$, where d is the number of distinct symbols that have followed the context, and n is the total number of symbols that have followed the context. Method D is a minor modification of method C: instead of incrementing the symbol and escape ....

....on a variety of test images under three different page transformations. Section 4.4 discusses methods of compressing component indices (which are identifiers that correspond to a character class in the codebook) and compares the gamma encoding method with PPMD [How93]. The compression ratios for each method are compared with the best that can be achieved on the ground truth text data for the images. This section also discusses the issue of whether the automatic determination of spaces aids compression. The results for this chapter are summarised in Section ....

Paul G. Howard. The design and analysis of efficient lossless data compression systems. PhD thesis, Brown University, USA, 1993.
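
To make the difference between the two escape estimates concrete, here is a small sketch that evaluates both for the same context statistics. The method C formula is the one given in this excerpt; the method D formula, d/(2n), is the standard closed form for Howard's modification. The function names and the example values of n and d are illustrative assumptions.

```python
from fractions import Fraction


def escape_method_c(n, d):
    """Escape probability under method C: d / (n + d)."""
    if n <= 0 or d <= 0:
        raise ValueError("n and d must be positive")
    return Fraction(d, n + d)


def escape_method_d(n, d):
    """Escape probability under method D: d / (2n)."""
    if n <= 0 or d <= 0:
        raise ValueError("n and d must be positive")
    return Fraction(d, 2 * n)


# Hypothetical context: 6 symbols seen so far, 3 of them distinct.
esc_c = escape_method_c(6, 3)
esc_d = escape_method_d(6, 3)
```

Since d can never exceed n, d/(2n) is at most d/(n+d), so in this formulation method D never reserves more probability for the escape symbol than method C does.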


An Image Compression Method for Spatial Search - Pajarola, Widmayer (2000)   (6 citations)

....of the location and value of some particular neighboring pixels. This spatial relation imposes restrictions on the traversal order of the pixels in the image, and it therefore makes the construction of well shaped pixel regions impossible. IV. A spatial compression algorithm As discussed in [19], lossless image compression algorithms usually view a pixel value as a random variable. A compression algorithm then typically consists of four processing steps: first, select the location of the next pixel to be encoded (pixel selection) second, compute a prediction of the value of the ....

....coder [31] generates minimum redundancy codes which have a one to one relationship with the input symbols, and it also approximates the entropy bound for a given probability model. To reduce coding complexity, we use fixed precomputed Huffman tables for a specific set of variances presented in [19]. This set of variances introduced in [19] guarantees an unnoticeable loss in coding efficiency. Figure 5 shows all processing steps in a flowchart. Prediction and variance estimation are performed on a sliding window over the input sequence, the context Cv . Therefore, only local operations on a ....

[Article contains additional citation context not shown here]

Paul G. Howard, The Design and Analysis of Efficient Lossless Data Compression Systems, Ph.D. thesis, Department of Computer Science at Brown University, 1993.


On Tag Insertion and its Complexity - Yeates, Witten (2000)

....algorithm pairs to give a bound on the compressibility of messages or files and noise reducing encoder decoder pairs to give bounds on error reduction [6]. Entropy theory originally left the nature of models unexplored, but several models have since been developed, including DMC and PPMD [21, 7]. These are of interest for text mining because they can be implemented in a character-based way, allowing entropy to be calculated incrementally. Our implementation uses PPMD [7] exclusively. The entropy of a marked up sequence of characters calculated with respect to a model trained on ....

....left the nature of models unexplored, but several models have since been developed, including DMC and PPMD [21, 7]. These are of interest for text mining because they can be implemented in a character-based way, allowing entropy to be calculated incrementally. Our implementation uses PPMD [7] exclusively. The entropy of a marked up sequence of characters calculated with respect to a model trained on pre-marked-up data can be used as a measure of goodness of that markup in relationship to other possible encodings. This entropy calculation is, however, an expensive operation and, ....

[Article contains additional citation context not shown here]

Paul Glor Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Department of Computer Science, Brown University, June 1993. Also appeared as Technical Report CS-93-28.


Lossless Compression for Text and Images - Moffat, Bell, Witten (1995)   (1 citation)

....FELICS scheme for compression of grayscale images. FELICS, an acronym for fast, efficient, lossless image compression system, is a simple and remarkably effective technique for lossless compression of grayscale images that compares well with the lossless variant of JPEG (Howard and Vitter, 1993; Howard, 1993). The idea is to code each pixel based on its two neighbors and use a specially tailored non-adaptive scheme to represent its value. Tested on an extensive set of Landsat test images, and some general images, the scheme has been reported to provide slightly more compression than the lossless modes ....

Howard, P.G. (1993). The Design and Analysis of Efficient Lossless Data Compression Systems. Ph.D. thesis, Brown University, Rhode Island. Available as Technical Report CS-93-28.


Text mining: A new frontier for lossless compression - Witten, Bray, Mahoui, Teahan (1999)   (4 citations)

....carried out to determine the power of language models to discriminate these tokens, both out of context and within their context in the newsletter. Throughout this work, we use a PPM text compression scheme (Cleary and Witten, 1984) with order 5 (except where otherwise mentioned), escape method D (Howard, 1993), and a further improvement for deterministic scaling (Teahan, 1997). 3.1 Discriminating isolated tokens. Lists of names, dates, locations, etc. in 19 issues of the newsletter were input to PPM separately to form ten compression models labeled n, d, l, s, o, u, e, p, f, m. In addition, a plain ....

Howard, P.G. (1993) The design and analysis of efficient lossless data compression systems. PhD thesis, Brown University, Providence, RI.


SQUEEZE: Fast and Progressive Decompression of Triangle Meshes - Pajarola, Rossignac (2000)   (4 citations)

....(x) a Huffman code can be constructed for every batch based on its variance. However, this process can be very time consuming. We avoid this problem by precomputing a set of Huffman codes for a set of 37 pre specified variances that guarantee an unnoticeable loss in coding efficiency as shown in [15, 26]. At the beginning of every batch, the Huffman code for a fixed variance closest to the given variance is chosen and used to decode the compressed geometry data of the entire batch. 7. Experimental results We conducted a variety of experiments comparing our fast progressive mesh compression ....

Paul G. Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. Ph.D. thesis, Department of Computer Science at Brown University, 1993.
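
The table-selection step described in this excerpt, picking the precomputed Huffman code whose variance is closest to the batch variance, might look like the sketch below. The list of variances is hypothetical (the excerpt mentions 37 pre-specified values but does not list them), and nearest_variance is an illustrative name, not a function from the cited work.

```python
def nearest_variance(variance, table_variances):
    """Return the precomputed variance closest to the measured one."""
    if not table_variances:
        raise ValueError("no precomputed variances available")
    return min(table_variances, key=lambda v: abs(v - variance))


# Hypothetical set of pre-specified variances, one per precomputed Huffman table.
hypothetical_variances = [1.0, 2.0, 4.0, 8.0, 16.0]

# A batch with measured variance 5.5 is coded with the table built for 4.0.
chosen = nearest_variance(5.5, hypothetical_variances)
```

The chosen value would then be used as the key into whatever structure holds the precomputed code tables.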


A Generalization and Improvement to PPM's "Blending" - Bunton (1997)

....passed through the buffer pass out of the buffer, the event frequencies originally incremented by these symbols can be decremented [Wil91]. Alternatively, at regular intervals, all the frequencies in the model can be scaled by a small constant, which would implement an exponential decay function [How93]. Or, the same process could be carried on locally, on a per state basis, when the state's total frequency exceeded a threshold [Mof90]. However, regardless of whatever merit direct techniques for recency weighting stored frequencies may have (none has been shown to consistently improve ....

P. G. Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Brown University, 1993.
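
The periodic scaling mentioned in this excerpt can be illustrated with a short sketch: multiplying every stored frequency by a constant factor at regular intervals means that an event counted k intervals ago contributes factor**k of its original weight. The function name, the factor of 0.5, and the example counts are assumptions chosen for illustration, not values from the cited thesis.

```python
def scale_frequencies(counts, factor):
    """Return a copy of the frequency table with every count multiplied by factor."""
    if not 0 < factor < 1:
        raise ValueError("factor must lie strictly between 0 and 1")
    return {sym: c * factor for sym, c in counts.items()}


# Hypothetical frequency table for one model state.
freqs = {"a": 8.0, "b": 4.0}

# Three scaling intervals with factor 0.5 leave 0.5 ** 3 = 1/8 of each count.
for _ in range(3):
    freqs = scale_frequencies(freqs, 0.5)
```

A per-state variant, as in the last alternative the excerpt lists, would apply the same function to a single state only once its total frequency passed a threshold.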


On-Line Stochastic Processes in Data Compression - Bunton (1996)   (3 citations)

....passed through the buffer pass out of the buffer, the event frequencies originally incremented by these symbols can be decremented [Wil91]. Alternatively, at regular intervals, all the frequencies in the model can be scaled by a small constant, which would implement an exponential decay function [How93]. Or, the same process could be carried on locally, on a per state basis, when the state's total frequency exceeded a threshold [Mof90]. However, regardless of whatever merit direct techniques for recency weighting stored frequencies may have (none has been shown to consistently improve ....

....claimed that PPM outperformed PPMC in their paper. However, PPMC was known to achieve superior compression performance as the order bound increased up to 5 [Mof90], after which its performance starts to decline. In 1993, Howard published a simple change to PPMC's escape mechanism, called PPMD [How93]: add 0.5 instead of 1.0 to the escape count and scanned event count whenever a novel event is seen. PPMD gets even better performance than PPMC. Thus, the original PPM algorithm cannot be called the state of the art until it is shown to perform favorably compared to these higher order PPMC and ....

P. G. Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Brown University, 1993.


Spatial Queries on Compressed Raster Images: How to get the .. - Pajarola, Widmayer (1995)

....from JPEG and FELICS, depend on the knowledge of some particular neighboring pixels; this imposes restrictions on the traversal order of the pixels in the image, and it therefore makes the construction of well shaped pixel regions impossible. 4 A Spatial Compression Algorithm As discussed in [5], lossless image compression algorithms usually view a pixel value as a random variable. An algorithm then typically consists of four processing steps: first, select the location of the next pixel to be encoded (pixel selection) second, compute a prediction of the value of the selected pixel from ....

....entropy coder [7] generates minimum redundancy codes which have a one to one relationship with the input symbols, and it also approximates the entropy bound for a given model. To reduce coding complexity, the Huffman tables are only computed for a specific set of variances, as suggested e.g. in [5]. Figure 3 shows these processing steps in a flowchart. Prediction and variance estimation are performed on a sliding window over the input sequence, where only local operations on a limited support area around the current picture element are used. The entropy coder which encodes the prediction ....

P. G. Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Department of Computer Science at Brown University, 1993.


An Executable Taxonomy of On-Line Modeling Algorithms - Bunton (1997)

....claimed that PPM outperformed PPMC in their paper. However, PPMC was known to achieve superior compression performance as the order bound increased up to 5 [Mof90], after which its performance starts to decline. In 1993, Howard published a simple change to PPMC's escape mechanism, called PPMD [How93]: add 0.5 instead of 1.0 to the escape count and scanned event count whenever a novel event is seen. PPMD gets even better performance than PPMC. Thus, the original PPM algorithm cannot be called the state of the art until it is shown to perform favorably compared to these 5 comparable ....

P. G. Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Brown University, 1993.


Probability estimation for PPM - Teahan (1995)   (5 citations)

....escape count is incremented, and the new character's count is set to one. The escape probability is computed as $u/(n+u)$, where u is the number of unique characters, and n is the total number of characters seen so far. Method C has been found to be superior to methods A and B in practice. Howard [5] proposed a small modification to method C. Instead of adding 1 to both the escape count and the new character's count, each count is incremented by $1/2$, hence the total weight is incremented by 1 as for the other non-novel characters. The escape probability for method D is computed as u 2 ....

P.G. Howard. The design and analysis of efficient lossless data compression systems. Technical Report CS-93-28, Brown University, Providence, Rhode Island, 1993.
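
The count update described in this excerpt can be sketched as a small class: a novel character adds 1/2 to its own count and 1/2 to the escape count, while a previously seen character adds 1 to its own count, so the total weight grows by exactly 1 per character either way. The class name MethodDContext and the example string are illustrative assumptions, not taken from the cited report.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


class MethodDContext:
    """Symbol and escape weights for one context, updated as in method D."""

    def __init__(self):
        self.counts = {}
        self.escape = Fraction(0)

    def update(self, sym):
        if sym in self.counts:
            # Previously seen character: its count goes up by 1.
            self.counts[sym] += 1
        else:
            # Novel character: 1/2 to the character, 1/2 to the escape count.
            self.counts[sym] = HALF
            self.escape += HALF

    def total(self):
        return sum(self.counts.values(), Fraction(0)) + self.escape

    def escape_probability(self):
        total = self.total()
        if total == 0:
            raise ValueError("context has no observations")
        return self.escape / total


ctx = MethodDContext()
for ch in "abaac":  # hypothetical input: n = 5 characters, u = 3 unique
    ctx.update(ch)
```

With this input the total weight equals the number of characters seen, 5, and the escape weight is 3/2, half the number of unique characters.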


Pattern-Based Compression of Text Images - Broder, Mitzenmacher

....in two flavors: pixel based or character based. Pixel based schemes consider and encode each pixel individually. [Footnote: Supported by the Office of Naval Research. This work was done during an internship at Digital Systems Research Center.] Examples include the international standard jbig [3], felics [2], and mgbilevel, part of the mg system [6, 7]. Character based schemes attempt to derive the original text from the document by inferring or keeping information regarding character boundaries and using derived or provided font information. The mg system also includes such a scheme, called mgtic ....

P. G. Howard. The design and analysis of efficient lossless data compression systems. Technical Report CS-93-28, Brown University, 1993. (Ph. D. thesis).


Spatial Indexing into Compressed Raster Images: How to.. - Pajarola, Widmayer (1996)   (11 citations)

....those from JPEG and FELICS, depend on the knowledge of some particular neighboring pixels; this imposes restrictions on the traversal order of the pixels in the image, and it therefore makes the construction of wellshaped pixel regions impossible. 4. A spatial compression algorithm As discussed in [6], lossless image compression algorithms usually view a pixel value as a random variable. An algorithm then typically consists of four processing steps: first, select the location of the next pixel to be encoded (pixel selection) second, compute a prediction of the value of the selected pixel from ....

....entropy coder [8] generates minimum redundancy codes which have a one to one relationship with the input symbols, and it also approximates the entropy bound for a given model. To reduce coding complexity, the Huffman tables are only computed for a specific set of variances, as suggested e.g. in [6]. Figure 3 shows these processing steps in a flowchart. Prediction and variance estimation are performed on a sliding window over the input sequence, where only local operations on a limited support area around the current picture element are used. The entropy coder which encodes the prediction ....

P. G. Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Department of Computer Science at Brown University, 1993.


Experiments on the Zero Frequency Problem - Cleary, Teahan (1995)

....A and B are described in [3, 8]. Method A is based upon Laplace's Law, whereas method B still classifies a character as being novel even if it has occurred once before. Method C, proposed in [6], uses the number of times a novel character has occurred before as the basis of its probability. Method D [4] is a minor modification to method C. Experiments with compressing files show that method D is slightly better than method C, but both methods perform better than methods A and B. Methods P, X and XC described in [8] are based upon a Poisson process model and perform better than the other methods ....

Howard, P.G. (1993) "The design and analysis of efficient lossless data compression systems," Technical Report No. CS-93-28, Department of Computer Science, Brown University, Providence, Rhode Island.


State of the art concerning Lossless Medical Image Coding - Denecker, van Assche.. (1997)   (2 citations)

....already known pixels; the smallest one being denoted as L and the largest one as H. The first bit is used to indicate whether $P \in [L, H]$; if not, a second bit is used to indicate whether $P < L$ or $P > H$. Then the actual value of P is brought out: if $P \in [L, H]$, adapted binary coding is used [8]; otherwise exponential Rice coding is used. The latter statistical coder is a faster but less efficient variant of Huffman coding with one parameter. This method works because the probability distribution of natural contone images appears to have the right properties, see figure 3. One half of ....

P. G. Howard, The Design and Analysis of Efficient Lossless Data Compression Systems, Ph.D. thesis, Department of Computer Science, Brown University, Providence, Rhode Island, June 1993.
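
The two-flag decision and the Rice code described in this excerpt can be sketched as follows. Several details are assumptions made for illustration and are not stated in the excerpt: the concrete flag bit values, the residual offsets (L - P - 1 below the range, P - H - 1 above it), and the unary convention of q ones followed by a zero. The in-range binary code is omitted, and the function names are hypothetical.

```python
def felics_classify(p, low, high):
    """Return (flag_bits, region, residual) for pixel value p given neighbors' range."""
    if low > high:
        raise ValueError("low must not exceed high")
    if low <= p <= high:
        # One flag bit: the value lies inside [low, high].
        return "0", "in_range", p - low
    if p < low:
        # Two flag bits: outside the range, below it.
        return "10", "below", low - p - 1
    # Two flag bits: outside the range, above it.
    return "11", "above", p - high - 1


def rice_encode(value, k):
    """Rice code of a non-negative integer: unary quotient, then k remainder bits."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    bits = "1" * (value >> k) + "0"
    if k > 0:
        remainder = value & ((1 << k) - 1)
        bits += format(remainder, "b").zfill(k)
    return bits


# Hypothetical pixel above the range [10, 20]: residual 25 - 20 - 1 = 4.
flags, region, residual = felics_classify(25, 10, 20)
codeword = flags + rice_encode(residual, 1)
```

The single parameter k mentioned in the excerpt controls how many low-order bits are sent verbatim; larger k shortens the unary part for large residuals at the cost of longer codes for small ones.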


Unbounded Length Contexts for PPM - Cleary, Teahan, Witten (1995)   (49 citations)

....character is found. If the bottom of the list is reached, a uniform fixed distribution (k = −1) is used. We are presently using escape method C, which needs an extra escape count, equal to the number of branches, to be stored with every trie node. Other methods such as the ones proposed in [6, 9] may improve compression, although this remains to be investigated.

file     size (bytes)   PPMC (bpc)   PPM (bpc)   BW94 (bpc)
bib      111261         2.11         1.91        2.07
book1    768771         2.48         2.40        2.49
book2    610856         2.26         2.02        2.13
geo      102400         4.78         4.83        4.45
news     377109         2.65         2.42        2.59
obj1     21504          3.76         4.00        3.98
obj2     246814         ....

Howard, P.G. (1993) "The design and analysis of efficient lossless data compression systems," Computer Science Report CS-93-28, Brown University, Providence, Rhode Island.


A Simple Block-Based Lossless Image Compression Scheme - Grace Chang

....value of the flat region is then sent as original, and the non flat portion is mix encoded. 3 Results. We now present some experimental results. Table 1 provides a comparison of our schemes with other existing schemes against the USC image database. Column 1 shows the compression attained by FELICS [4] (with the maximum k parameter set to 6), which is a low complexity context based pixel wise adaptive algorithm also based on the Rice code. The JPEG data shown corresponds to the independent lossless JPEG function employing the 2 point predictor no. 7 and arithmetic coding [5]. The third column ....

P. Howard, "The design and analysis of efficient lossless data compression systems," Ph.D. Thesis, Brown University, Department of Computer Science, June 1993.


Semantically Motivated Improvements for PPM Variants - Bunton (1997)   (12 citations)  Self-citation (Howard)

.... combination of techniques presented here increases the benchmark performance of PPM # by 7 over the original implementation [2] We reduce the memory requirements of PPM and improve its performance by 12 over the standard reference [4] and by 5 over the best of all previously known variants [5]. This paper is organized as follows. Section 2 introduces terminology, suffix tree context models, and describes PPM and PPM # models. Section 3 transforms PPM # and PPM into a single suffix tree data structure that uses the fewest possible nodes and edges to precisely duplicate the information ....

....introduction to applying the combination of update excluded mixtures and state selection to PPM variants, without adding distracting details. Blending has been defined for PPM in terms of several escape mechanisms [9], of which the simplest and best-performing are commonly called C [4] and D [5]. Both are easily described in terms of the weighting function W(s) if we define it as $W(s) = \frac{\mathrm{count}(s)}{\mathrm{count}(s) + \#(s)/d}$. Then the escape mechanism C is implemented if d = 1, and the initial event frequency k = 0, while mechanism D is implemented by letting d = 2 and k = ....

[Article contains additional citation context not shown here]

Howard, P. G. (1993) The Design and Analysis of Efficient Lossless Data Compression Systems. Ph.D. Thesis, Brown University.


Text Augmentation: Inserting XML tags into natural language text.. - Yeates (2003)

No context found.

Paul Glor Howard. The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Department of Computer Science, Brown University, June 1993.


Lossless Compression of Pre-Press Images Using a Novel .. - Van Assche, Philips..

No context found.

P. G. Howard, The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Department of Computer Science, Brown University, Providence, Rhode Island, June 1993.
