
## Universal divergence estimation for finite-alphabet sources (2006)


### Download Links

- [www.ee.princeton.edu]
- [www.princeton.edu]
- DBLP

### Other Repositories/Bibliography

Venue: IEEE Trans. Inf. Theory

Citations: 9 (3 self)

### Citations

942 | Stochastic Processes - Doob - 1953

Citation Context: ...hain is irreducible and aperiodic, there exists an integer , where is the order of the Markov source and that is a constant, such (146) where is the -step transition probability. It follows that (see [6], [14]) for any and any where and . For arbitrary initial states and , we have where . It follows that Therefore, for any and any , we have where and . (147)–(150) Lemma 5: Suppose APPENDIX F ...

809 | A Block-Sorting Lossless Data Compression Algorithm, Digital Systems Research Center, Research Report 124 - Burrows, Wheeler - 1994

442 | Data compression using adaptive coding and partial string matching - Cleary, Witten - 1984

332 | Entropy and Information Theory - Gray - 1990

Citation Context: ...rgence of our divergence estimators assuming that both sources are possibly dependent stationary ergodic Markov sources, a case for which the following almost-sure convergence result is known to hold [10]: where is the alphabet and is the set of states. (1) a.s. (2) ¹A commercial embodiment of LZ data compression. ²The use of lossless data compression techniques in other related problems such as model...
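The display equations labeled (1) and (2) did not survive extraction, and the surrounding inline symbols are lost. For stationary ergodic finite-alphabet sources, the almost-sure convergence result being invoked is standardly stated as below; this is a hedged reconstruction of the general shape, not the paper's exact display:

```latex
% Divergence rate between jointly stationary ergodic sources P and Q
\[
  D(P \,\|\, Q) \;=\; \lim_{n\to\infty} \frac{1}{n}\,
    \mathbb{E}_P\!\left[\log \frac{P(X_1^n)}{Q(X_1^n)}\right]
\]
% and almost-sure convergence of the normalized log-likelihood ratio:
\[
  \frac{1}{n} \log \frac{P(X_1^n)}{Q(X_1^n)}
  \;\xrightarrow[\;n\to\infty\;]{\text{a.s.}}\; D(P \,\|\, Q)
\]
```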

281 | The similarity metric - Li, Chen, et al.

Citation Context: ...ximated by where is the number of characters of the short sequence . A new class of “normalized information distance” loosely based on the noncomputable notion of Kolmogorov complexity is proposed in [12], and then applied to the genome phylogeny problem and the problem of building language trees considered in [2]. The method in [12] to approximate the measure therein is heuristic (see also the discus...
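The normalized information distance of [12] is noncomputable, so in practice it is approximated by substituting a real compressor for Kolmogorov complexity, giving the normalized compression distance (NCD). A minimal sketch using Python's `zlib`; the choice of compressor and the test strings are illustrative assumptions:

```python
import zlib

def ncd(x: bytes, y: bytes, level: int = 9) -> float:
    """Normalized compression distance: approximates the normalized
    information distance of [12] with a real compressor (here zlib)."""
    cx = len(zlib.compress(x, level))
    cy = len(zlib.compress(y, level))
    cxy = len(zlib.compress(x + y, level))
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar sequences should score lower than dissimilar ones.
a = b"the quick brown fox jumps over the lazy dog " * 20
b_ = b"the quick brown fox jumps over the lazy cat " * 20
c = bytes(range(256)) * 4
print(ncd(a, b_), ncd(a, c))
```

Because the compressor is only a proxy for Kolmogorov complexity, NCD values are meaningful comparatively rather than absolutely, which matches the "heuristic" characterization quoted above.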

236 | PHYLIP (Phylogeny Inference Package) - Felsenstein - 1993

Citation Context: ...Fig. 15. Divergence between languages (Bible) estimated by the BWT-based method. Fig. 16. Language tree derived from the divergence estimates of Fig. 15. Fitch–Margoliash method in the package PHYLIP [9] for inferring evolutionary trees. Our algorithm successfully recognizes major language groups, such as Romance and Germanic (see Fig. 16). Note that according to our algorithm, English is a Germanic ...

105 | On Prediction Using Variable Order Markov Models - Begleiter, El-Yaniv, et al. - 2004

Citation Context: ...ercial embodiment of LZ data compression. ²The use of lossless data compression techniques in other related problems such as modeling and prediction has also been considered; see for example [13] and [1]. A variety of compression algorithms have been proposed using the BWT as a front end followed by modules such as move-to-f...
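The BWT front end mentioned in this context can be sketched compactly: a toy transform via sorted rotations with an end-of-string sentinel, followed by a move-to-front stage. The sentinel choice and the naive O(n² log n) rotation sort are simplifying assumptions; practical BWTs use suffix arrays:

```python
def bwt(s: str, sentinel: str = "\x00") -> str:
    """Burrows-Wheeler transform via sorted rotations (toy version)."""
    s += sentinel  # unique terminator makes the transform invertible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def move_to_front(s: str) -> list[int]:
    """Move-to-front coding: recently seen symbols get small indices,
    so the clustered BWT output becomes runs of small numbers."""
    table = sorted(set(s))
    out = []
    for ch in s:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

t = bwt("banana")
print(repr(t), move_to_front(t))
```

The MTF output is what a run-length or adaptive Huffman back end (as listed in the quoted passage) would then code.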

104 | The context tree weighting method: basic properties - Willems, Shtarkov, et al. - 1995

Citation Context: ...ata compression.² The first estimator, originally proposed in [4], uses the Burrows–Wheeler block sorting transform (BWT) [3], while the second estimator uses the Context Tree Weighting method (CTW) [15]. We prove the convergence of our divergence estimators assuming that both sources are possibly dependent stationary ergodic Markov sources, a case for which the following almost-sure convergence resu...

55 | A measure of relative entropy between individual sequences with application to universal classification - Ziv, Merhav

Citation Context: ...memory Markov sources, denoted by and . The input to the estimator consists of a realization of length from source , denoted by , and a realization of length from source , denoted by . Ziv and Merhav [17] applied the idea of Lempel–Ziv (LZ) parsing to divergence estimation. They developed a scheme to estimate the divergence between two finite-alphabet, finite-order, stationary Markov processes. The LZ...
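The Ziv-Merhav scheme parses one realization with respect to the other: repeatedly take the longest prefix of the remaining sequence that occurs somewhere in the reference, and count the phrases. A sketch of the idea follows; the normalization below is a common textbook form of the estimator and may differ in details from the exact construction in [17]:

```python
import math

def cross_parse_count(z: str, x: str) -> int:
    """Phrases when z is sequentially parsed into longest substrings
    of x (cross parsing); a symbol absent from x is its own phrase."""
    i, c = 0, 0
    while i < len(z):
        j = i + 1
        while j <= len(z) and z[i:j] in x:
            j += 1
        i = max(j - 1, i + 1)  # advance past the longest match
        c += 1
    return c

def lz78_phrase_count(z: str) -> int:
    """Phrases in the LZ78 incremental (self) parsing of z."""
    seen, phrase, c = set(), "", 0
    for ch in z:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            c += 1
            phrase = ""
    return c

def zm_divergence_estimate(z: str, x: str) -> float:
    """Illustrative Ziv-Merhav-style statistic: a cross-entropy term
    from cross parsing minus an LZ entropy term from self parsing."""
    n = len(z)
    c_cross = cross_parse_count(z, x)
    c_self = lz78_phrase_count(z)
    return (c_cross * math.log2(n) - c_self * math.log2(c_self)) / n
```

A realization resembling the reference produces very few cross-parsed phrases (hence a small estimate), while a dissimilar one fragments into many short phrases.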

21 | Universal entropy estimation via block sorting - Cai, Kulkarni, et al. - 2004

Citation Context: ...orithms have been proposed using the BWT as a front end followed by modules such as move-to-front, run-length coding, and adaptive Huffman coding. An entropy estimator based on the BWT was proposed in [5] using a uniform segmentation scheme. Based on that, we can show that segments of the BWT output sequence are close to an independent and identically distributed (i.i.d.) sequence. This property is ex...
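The uniform-segmentation idea can be illustrated with a plug-in estimate: split the BWT output into equal segments, treat each as approximately i.i.d., and average the segments' empirical entropies. The segment count and the maximum-likelihood plug-in estimator here are illustrative choices, not the exact construction of [5]:

```python
import math
from collections import Counter

def empirical_entropy(seg: str) -> float:
    """Plug-in (maximum-likelihood) entropy of a sequence, in bits."""
    counts = Counter(seg)
    n = len(seg)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def segmented_entropy(bwt_output: str, num_segments: int) -> float:
    """Average plug-in entropy over uniform segments of the BWT output,
    exploiting that the segments are close to i.i.d."""
    n = len(bwt_output)
    k = max(1, n // num_segments)
    segs = [bwt_output[i:i + k] for i in range(0, n, k)]
    return sum(empirical_entropy(s) for s in segs) / len(segs)
```

Averaging over segments, rather than pooling all symbols, is what lets the estimator track the conditional (per-context) statistics that the BWT groups together.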

16 | More exact statements of limit theorems for homogeneous Markov chains, Theory Prob. - Nagaev - 1961

Citation Context: ...is irreducible and aperiodic, there exists an integer , where is the order of the Markov source and that is a constant, such (146) where is the -step transition probability. It follows that (see [6], [14]) for any and any where and . For arbitrary initial states and , we have where . It follows that Therefore, for any and any , we have where and . (147)–(150) Lemma 5: Suppose APPENDIX F is a s...

15 | Linear time universal coding and time reversal of tree sources via FSM closure - Martin, Seroussi, et al. - 2004

Citation Context: ...) ¹A commercial embodiment of LZ data compression. ²The use of lossless data compression techniques in other related problems such as modeling and prediction has also been considered; see for example [13] and [1]. A variety of compression algorithms have been proposed using the BWT as a front end followed by modules such as m...

10 | Algorithms for estimating information distance with application to bioinformatics and linguistics - Kaltchenko

Citation Context: ...en applied to the genome phylogeny problem and the problem of building language trees considered in [2]. The method in [12] to approximate the measure therein is heuristic (see also the discussion in [11]). In this paper, we present two divergence estimation algorithms, both of which are motivated by techniques in data compression.² The first estimator, originally proposed in [4], uses the Burrows–Wh...

4 | Language trees and zipping, Phys - Benedetto, Caglioti, et al.

Citation Context: ... is universal in the sense of not depending on the order or any other information about the transition probability matrices of the sources. Another algorithm based on LZ compression was introduced in [2] and applied to problems motivated by linguistics. Unlike [17], the approach in [2] is heuristic and there is no claim that the algorithm converges to the divergence of the sources. The idea is to app...
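The zipping idea of [2] appends a short probe text to a long reference text and reads an approximate per-character cross-entropy off the change in compressed size. A sketch with `zlib` standing in for the zip compressor; the compressor choice and text lengths are illustrative assumptions:

```python
import zlib

def zip_cross_entropy(reference: bytes, probe: bytes, level: int = 9) -> float:
    """Approximate bits per character needed to code `probe` using the
    statistics the compressor learned from `reference` (zipping idea)."""
    c_ref = len(zlib.compress(reference, level))
    c_both = len(zlib.compress(reference + probe, level))
    return 8.0 * (c_both - c_ref) / len(probe)
```

As the quoted passage stresses, this is a heuristic: nothing guarantees the quantity converges to the divergence, but probes statistically similar to the reference do compress better "on top of" it than dissimilar ones.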

4 | Universal estimation of entropy and divergence via block sorting - Cai, Kulkarni, et al. - 2002

Citation Context: ...o the discussion in [11]). In this paper, we present two divergence estimation algorithms, both of which are motivated by techniques in data compression.² The first estimator, originally proposed in [4], uses the Burrows–Wheeler block sorting transform (BWT) [3], while the second estimator uses the Context Tree Weighting method (CTW) [15]. We prove the convergence of our divergence estimators assumi...

3 | The context tree weighting method: Extensions - Willems - 1998

Citation Context: ...he context of the longest depth (a leaf node) to the root. The limitation is that with fixed maximum memory length we can only learn statistical models of order no more than . The extended CTW method [16] assumes unbounded memory length, and therefore the depth of the context tree is unbounded and grows with the length . The sources are assumed to be binary. We pre-append before as unknown symbols. Th...

2 | Implementing the Context Tree Weighting Method for Context Recognition - Dawy, Hagenauer, et al. - 2004

Citation Context: ...y distributed (i.i.d.) sequence. This property is exploited in our algorithm to estimate divergence without knowing the memory length of the sources. Recently, experimental results have been reported [8] using the CTW method [15] for classification of binary sequences. The similarity metric used in [8] for classification can be seen to be an estimate of . Another natural method, which we refer to as ...
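The CTW method of [15] mixes context-tree models whose per-node probabilities come from the Krichevsky-Trofimov (KT) estimator. A minimal sketch of the KT sequential probability for a binary sequence; this is only the node-level building block, not the full tree weighting:

```python
def kt_probability(bits: list[int]) -> float:
    """Sequential KT probability of a binary sequence: at each step,
    P(next = 1) = (ones + 1/2) / (zeros + ones + 1), multiplied out."""
    p, zeros, ones = 1.0, 0, 0
    for b in bits:
        if b == 1:
            p *= (ones + 0.5) / (zeros + ones + 1)
            ones += 1
        else:
            p *= (zeros + 0.5) / (zeros + ones + 1)
            zeros += 1
    return p
```

Heavily biased sequences receive much higher KT probability than balanced ones, which is what lets CTW-based similarity scores (as in [8]) separate sequences with different statistics.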