Pointwise redundancy in lossy data compression and universal lossy data compression (2000)

by I Kontoyiannis
Venue: IEEE Trans. Inform. Theory

Precise Minimax Redundancy and Regret

by Michael Drmota, Wojciech Szpankowski - IEEE TRANS. INFORMATION THEORY, 2004
Cited by 19 (8 self)
Recent years have seen a resurgence of interest in redundancy of lossless coding. The redundancy (regret) of universal fixed-to-variable length coding for a class of sources determines by how much the actual code length exceeds the optimal (ideal over the class) code length. In a minimax scenario one finds the best code for the worst source either in the worst case (called also maximal minimax) or on average. We first study the worst case minimax redundancy over a class of stationary ergodic sources and replace Shtarkov's bound by an exact formula. Among others, we prove that a generalized Shannon code minimizes the worst case redundancy, derive asymptotically its redundancy, and establish some general properties. This allows us to obtain precise redundancy rates for memoryless, Markov and renewal sources. For example, we derive the exact constant of the redundancy rate for memoryless and Markov sources by showing that an integer nature of coding contributes $\log(\log m/(m-1))/\log m + o(1)$, where m is the size of the alphabet. Then we deal with the average minimax redundancy and regret. Our approach

Source coding, large deviations, and approximate pattern matching

by Amir Dembo, Ioannis Kontoyiannis - IEEE Trans. Inform. Theory, 2002
Cited by 17 (8 self)
Dedicated to the memory of Aaron Wyner, a valued friend and colleague. In this review paper, we present a development of parts of rate-distortion theory and pattern-matching algorithms for lossy data compression, centered around a lossy version of the asymptotic equipartition property (AEP). This treatment closely parallels the corresponding development in lossless compression, a point of view that was advanced in an important paper of Wyner and Ziv in 1989. In the lossless case, we review how the AEP underlies the analysis of the Lempel–Ziv algorithm by viewing it as a random code and reducing it to the idealized Shannon code. This also provides information about the redundancy of the Lempel–Ziv algorithm and about the asymptotic behavior of several relevant quantities. In the lossy case, we give various versions of the statement of the generalized AEP and we outline the general methodology of its proof via large deviations. Its relationship with Barron and Orey’s generalized AEP is also discussed. The lossy AEP is applied to i) prove strengthened versions of Shannon’s direct source-coding theorem and universal coding theorems; ii) characterize the performance of “mismatched” codebooks in lossy data compression; iii) analyze the performance of pattern-matching algorithms for lossy compression (including Lempel–Ziv schemes); and iv) determine the first-order asymptotic of waiting times between stationary processes. A refinement to the lossy AEP is then presented, and it is used to i) prove second-order (direct and converse) lossy source-coding theorems, including universal coding theorems; ii) characterize which sources are quantitatively easier to compress; iii) determine the second-order asymptotic of waiting times between stationary processes; and iv) determine the precise asymptotic behavior of longest match-lengths between stationary processes. Finally, we discuss extensions of the above framework and results to random fields.
Index Terms—Data compression, large deviations, pattern matching, rate-distortion theory.

Arbitrary Source Models and Bayesian Codebooks in Rate-Distortion Theory

by Ioannis Kontoyiannis, Junshan Zhang - IEEE Trans. Inform. Theory, 2002
Cited by 13 (6 self)
We characterize the best achievable performance of lossy compression algorithms operating on arbitrary random sources, and with respect to general distortion measures. Direct and converse coding theorems are given for variable-rate codes operating at a fixed distortion level, emphasizing: (a) non-asymptotic results, (b) optimal or near-optimal redundancy bounds, and (c) results with probability one. This development is based in part on the observation that there is a precise correspondence between compression algorithms and probability measures on the reproduction alphabet. This is analogous to the Kraft inequality in lossless data compression. In the case of stationary ergodic sources our results reduce to the classical coding theorems. As an application of these general results, we examine the performance of codes based on mixture codebooks for discrete memoryless sources. A mixture codebook (or Bayesian codebook) is a random codebook generated from a mixture over some class of reproduction distributions. We demonstrate the existence of universal mixture codebooks, and show that it is possible to universally encode memoryless sources with redundancy of approximately (d/2) log n bits, where d is the dimension of the simplex of probability distributions on the reproduction alphabet.

Source Coding Exponents for Zero-Delay Coding with Finite Memory

by Neri Merhav, Ioannis Kontoyiannis - IEEE TRANSACTIONS ON INFORMATION THEORY, 2003
Cited by 13 (1 self)
Fundamental limits on the source coding exponents (or large deviations) performance of zero-delay finite-memory (ZDFM) lossy source codes are studied. Our main results are the following: For any memoryless source, a suitably designed encoder that time-shares (at most two) memoryless scalar quantizers is as good as any time-varying fixed-rate ZDFM code, in that it can achieve the fastest exponential rate of decay for the probability of excess distortion. A dual result is shown to apply to the probability of excess code-length, among all fixed-distortion ZDFM codes with variable rate. Finally, it is shown that if the scope is broadened to ZDFM codes with variable rate and variable distortion, then a time-invariant entropy-coded memoryless quantizer (without time-sharing) is asymptotically optimal under a "fixed-slope" large deviations criterion (introduced and motivated here in detail) corresponding to a linear combination of the code-length and the distortion. These results also lead to single-letter characterizations for the source coding error-exponents of ZDFM codes.

Model Selection via Rate-Distortion Theory

by I. Kontoyiannis, 2000
Cited by 4 (3 self)
Rissanen's Minimum Description Length (MDL) principle for model selection proposes that, among a predetermined collection of models, we choose the one which assigns the shortest description to the data at hand. In this context, a "description" is a lossless representation of the data that also takes into account the cost of describing the chosen model itself. We examine how the MDL principle might extend to the case when the requirement for lossless coding is relaxed (lossy compression), and we outline some of the mathematical and conceptual ingredients that facilitate this extension.
I. Introduction. Rissanen's Minimum Description Length (MDL) principle, as well as several other prominent model selection criteria, are based on the idea that among a predetermined collection of models (or model classes), the one which best captures the characteristics of the data is the one which can be used to encode the data using the smallest number of bits. In applications, it is often the cas...

Second-order properties of lossy likelihoods and the MLE/MDL dichotomy in lossy compression

by Mokshay Madiman, Ioannis Kontoyiannis
Cited by 2 (0 self)
lossy compression

Average Redundancy for Known Sources: Ubiquitous Trees in Source Coding

by Wojciech Szpankowski, 2008
Cited by 2 (0 self)
Analytic information theory aims at studying problems of information theory using analytic techniques of computer science and combinatorics. Following Hadamard’s precept, these problems are tackled by complex analysis methods such as generating functions, Mellin transform, Fourier series, saddle point method, analytic poissonization and depoissonization, and singularity analysis. This approach lies at the crossroad of computer science and information theory. In this survey we concentrate on one facet of information theory (i.e., source coding better known as data compression), namely the redundancy rate problem. The redundancy rate problem determines by how much the actual code length exceeds the optimal code length. We further restrict our interest to the average redundancy for known sources, that is, when statistics of information sources are known. We present precise analyses of three types of lossless data compression schemes, namely fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate average redundancy of Huffman, Tunstall, and Khodak codes. These codes have succinct representations as trees, either as coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf).

Pattern Matching and Lossy Data Compression on Random Fields

by I. Kontoyiannis, 2001
Cited by 2 (0 self)
We consider the problem of lossy data compression for data arranged on two-dimensional arrays (such as images), or more generally on higher-dimensional arrays (such as video sequences). Several of the most commonly used algorithms are based on pattern matching: Given a distortion level D and a block of data to be compressed, the encoder first finds a D-close match of this block into some database, and then describes the position of the match. We consider two idealized versions of this scenario. In the first one, the database is taken to be a collection of independent realizations of the same size and from the same distribution as the original data. In the second, the database is assumed to be a single long realization from the same source as the data. We show that the compression rate achieved (in either version) is no worse than R(D/2) bits per symbol, where R(D) is the rate-distortion function. This is proved under the assumption that the data is generated by a Gibbs distribution, and it generalizes the corresponding one-dimensional bound of Steinberg and Gutman. Using recent large deviations results by Dembo and Kontoyiannis and by Chi, we are able to give short proofs for the present results.

Critical Redundancy in Lossy Source Coding

by A. Dembo, I. Kontoyiannis, 1999
Cited by 1 (1 self)
The following critical phenomenon was recently discovered. When a memoryless source is compressed using a variable-length fixed-distortion code, the fastest convergence rate of the (pointwise) compression ratio to R(D) is either $O(\sqrt{n})$ or $O(\log n)$. We show it is always $O(\sqrt{n})$, except for discrete, uniformly distributed sources.
Keywords: Redundancy, rate-distortion theory, lossy data compression.
Department of Mathematics and Department of Statistics, Stanford University, Stanford, CA 94305. Email: amir@stat.stanford.edu Web: www-stat.stanford.edu/amir
Department of Statistics, Purdue University, 1399 Mathematical Sciences Building, W. Lafayette, IN 47907-1399. Email: yiannis@stat.purdue.edu Web: www.stat.purdue.edu/yiannis
A.D.'s research was supported in part by NSF grant #NSF-DMS 9704552. I.K.'s research was supported in part by a grant from the Purdue Research Foundation.
1 Introduction. Suppose that data is produced by a stationary memoryless source $\{X_n ; n$ ...

Critical Behavior in Data Compression

by A. Dembo, I. Kontoyiannis, 1999
Let $x_1^n = (x_1, x_2, \ldots, x_n)$ be a realization of the independent and identically distributed random variables $(X_1, X_2, \ldots, X_n)$. A compression algorithm operating at distortion level D consists of an encoder that takes strings $x_1^n$ to binary strings of variable length, and a decoder that maps these binary strings to new strings $y_1^n = (y_1, y_2, \ldots, y_n)$, so that the decoded $y_1^n$ is always within distortion D of the encoded $x_1^n$. Distortion is measured by some single-letter distortion measure such as mean-squared error. The description length $\ell_n(x_1^n)$ is the length of the binary description of $x_1^n$. For long realizations, the best compression ratio $\ell_n(X_1^n)/n$ that can be achieved by any sequence of algorithms operating at distortion level D is given by Shannon's rate-distortion function R(D). The following critical phenomenon was recently discovered in [10]. Depending on the distribution of the random variables $X_i$ and the distort...