by Mireille Regnier, Wojciech Szpankowski
In IEEE Computer Society, editor, Compression and Complexity of SEQUENCES 1997
http://www.deetc.isel.ipl.pt/ftp/FicheirosSeccoes/AnaliseSinais/ccd/temp/Szpankowski_CCS97_81320253.pdf
Abstract:
Consider a given pattern H and a random text T generated according to the Bernoulli model. We study the frequency of approximate occurrences of the pattern H in a random text when overlapping copies of the approximate pattern are counted separately. We provide exact and asymptotic formulas for mean, variance and probability of occurrence as well as asymptotic results including the central limit theorem and large deviations. Our approach is combinatorial: we first construct some language expressions that characterize pattern occurrences which are translated into generating functions, and finally we use analytical methods to extract asymptotic behaviors of the pattern frequency. Applications of these results include molecular biology, source coding, synchronization, wireless communications, approximate pattern matching, games, and stock market analysis. These findings are of particular interest to information theory (e.g., second-order properties of the relative frequency), and molecular biology problems (e.g., finding patterns with unexpectedly high or low frequencies, and gene recognition).
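To make the counted quantity concrete, here is a minimal Python sketch. It assumes that an "approximate occurrence" means a text window within a fixed Hamming distance of H; the abstract does not state the distance measure, so this is an illustration rather than the authors' definition. The function names and the `max_mismatches` parameter are hypothetical. The mean is obtained from linearity of expectation only, not from the generating-function machinery the abstract describes.

```python
def count_approx_occurrences(text, pattern, max_mismatches):
    """Count windows of `text` within Hamming distance `max_mismatches`
    of `pattern`. Overlapping windows are counted separately."""
    m = len(pattern)
    count = 0
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            count += 1
    return count


def approx_match_probability(pattern, probs, max_mismatches):
    """Probability that a random window matches `pattern` with at most
    `max_mismatches` mismatches, when symbols are drawn independently
    with probabilities `probs` (Bernoulli model)."""
    # dist[k] = probability of exactly k mismatches in the positions seen so far
    dist = [1.0]
    for symbol in pattern:
        p_match = probs[symbol]
        new_dist = [0.0] * (len(dist) + 1)
        for k, weight in enumerate(dist):
            new_dist[k] += weight * p_match            # this position matches
            new_dist[k + 1] += weight * (1 - p_match)  # this position mismatches
        dist = new_dist
    return sum(dist[:max_mismatches + 1])


def expected_count(n, pattern, probs, max_mismatches):
    """Exact mean number of approximate occurrences in a text of length n.
    Each of the n - m + 1 windows matches with the same probability, so
    linearity of expectation applies even though windows overlap."""
    windows = max(n - len(pattern) + 1, 0)
    return windows * approx_match_probability(pattern, probs, max_mismatches)
```

For example, `count_approx_occurrences("aaaa", "aa", 0)` counts three overlapping exact occurrences. The variance is harder because overlapping windows are dependent, which is where the pattern's overlap structure enters the analysis.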