Agnostic classification of Markovian sequences (1997) [24 citations — 9 self]
Abstract:
Classification of finite sequences without explicit knowledge of their statistical nature is a fundamental problem with many important applications. We propose a new information-theoretic approach to this problem, based on the following ingredients: (i) sequences are similar when they are likely to be generated by the same source; (ii) cross entropies can be estimated via "universal compression"; (iii) Markovian sequences can be optimally merged. With these ingredients we can classify discrete sequences whenever they can be compressed. We introduce the method and illustrate its application to hierarchical clustering of languages and to estimating similarities of protein sequences.
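To make ingredient (ii) concrete, the following is a minimal sketch of a compression-based dissimilarity between two byte sequences. It is not the method of the paper: it substitutes the normalized compression distance (a different, later measure) for the paper's cross-entropy estimate, and it uses Python's general-purpose `zlib` compressor as a stand-in for a universal compressor. The function names `compressed_size` and `ncd` are hypothetical and chosen only for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between x and y.

    Assumption: zlib stands in for a universal compressor. If x and y
    share statistical structure, compressing their concatenation costs
    little more than compressing either one alone, so the value is small.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Two sequences from the same repetitive "source" and one from another.
    a = b"abracadabra " * 50
    b = b"abracadabra " * 40
    c = b"xyzzy plugh " * 50
    print("same source:     ", ncd(a, b))
    print("different source:", ncd(a, c))
```

Under these assumptions, sequences that share a source should score lower than sequences that do not, which is the intuition behind ingredient (i). A pairwise matrix of such scores could then feed any standard hierarchical clustering routine.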
Citations
| 4364 | Elements of Information Theory – Cover, Thomas - 1991 |
| 374 | Information Theory: Coding Theorems for Discrete Memoryless Systems – Csiszár, Körner - 1982 |
| 320 | Amino acid substitution matrices from protein blocks – Henikoff, Henikoff - 1992 |
| 178 | Divergence Measures Based on the Shannon Entropy – Lin - 1991 |
| 177 | Testing statistical hypotheses – Lehmann - 1986 |
| 21 | A measure of relative entropy between individual sequences with application to universal classification – Ziv, Merhav - 1993 |
| 3 | An Improved Measure of Relative Entropy Between Individual Sequences (unpublished manuscript) – Bachrach, El-Yaniv - 1997 |

