In this paper, the role of pattern matching in information theory is motivated and discussed. We describe the relationship between a pattern's recurrence time and its probability under the data-generating stochastic source. We explain how this relationship has led to great advances in universal data compression. We then describe non-asymptotic uniform bounds on the performance of data compression algorithms in cases where the training data available to the encoder is too small to attain the asymptotic compression rate, the Shannon entropy. Finally, we discuss applications of pattern matching and universal compression to universal prediction, classification, and entropy estimation.
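The relationship between recurrence time and probability mentioned above can be illustrated numerically. For a stationary ergodic source, Kac's lemma says that the expected time until a pattern recurs, given that it has just occurred, equals the reciprocal of the pattern's probability. The sketch below checks this on a simulated source; the i.i.d. binary source, the pattern `"101"`, the sequence length, and the function names `mean_recurrence_time` and `pattern_probability` are all illustrative assumptions, not taken from the paper.

```python
import random


def mean_recurrence_time(seq, pattern):
    """Average gap between successive (possibly overlapping) occurrences
    of `pattern` in `seq`. Hypothetical helper for illustration."""
    n = len(pattern)
    positions = [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]
    if len(positions) < 2:
        raise ValueError("pattern must occur at least twice")
    return (positions[-1] - positions[0]) / (len(positions) - 1)


def pattern_probability(pattern, p_one):
    """Probability of `pattern` under an i.i.d. binary source that emits
    '1' with probability `p_one` (an assumed toy source)."""
    prob = 1.0
    for symbol in pattern:
        prob *= p_one if symbol == "1" else 1.0 - p_one
    return prob


# Simulate the assumed source with a fixed seed so the run is repeatable.
rng = random.Random(0)
p_one = 0.3
seq = "".join("1" if rng.random() < p_one else "0" for _ in range(200_000))

pattern = "101"
empirical = mean_recurrence_time(seq, pattern)
predicted = 1.0 / pattern_probability(pattern, p_one)  # Kac: E[recurrence] = 1/P
print(f"empirical mean recurrence: {empirical:.2f}, 1/P: {predicted:.2f}")
```

On a long enough sample the two printed values should be close. The same idea, read in reverse, is what makes recurrence times useful for compression and entropy estimation: observing how long a pattern takes to recur gives information about its probability without knowing the source.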