  On the Approximate Pattern Occurrences in a Text (1997) [26 citations — 9 self]

by Mireille Regnier, Wojciech Szpankowski
In IEEE Computer Society, editor, Compression and Complexity of SEQUENCES 1997
http://www.deetc.isel.ipl.pt/ftp/FicheirosSeccoes/AnaliseSinais/ccd/temp/Szpankowski_CCS97_81320253.pdf

Abstract:

Consider a given pattern H and a random text T generated according to the Bernoulli model. We study the frequency of approximate occurrences of the pattern H in a random text when overlapping copies of the approximate pattern are counted separately. We provide exact and asymptotic formulas for the mean, variance, and probability of occurrence, as well as asymptotic results including the central limit theorem and large deviations. Our approach is combinatorial: we first construct language expressions that characterize pattern occurrences, translate them into generating functions, and finally use analytical methods to extract the asymptotic behavior of the pattern frequency. Applications of these results include molecular biology, source coding, synchronization, wireless communications, approximate pattern matching, games, and stock market analysis. These findings are of particular interest to information theory (e.g., second-order properties of the relative frequency) and to molecular biology problems (e.g., finding patterns with unexpectedly high or low frequencies, and gene recognition).
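
As a rough illustration of the quantity studied, the sketch below counts overlapping approximate occurrences of a pattern in a text and computes the mean count under a Bernoulli (independent symbols) model. The abstract does not say which distance defines an approximate occurrence, so the sketch assumes Hamming distance with a threshold `d`. All function names are hypothetical. It relies only on linearity of expectation and brute-force enumeration, not on the language and generating-function approach the abstract describes.

```python
import itertools


def hamming(a, b):
    """Number of positions at which equal-length sequences a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def count_approx(text, pattern, d):
    """Count windows of text within Hamming distance d of pattern.

    Overlapping windows are counted separately.
    """
    m = len(pattern)
    return sum(hamming(text[i:i + m], pattern) <= d
               for i in range(len(text) - m + 1))


def window_match_prob(pattern, d, probs):
    """Probability that one random window lies within distance d of pattern.

    probs maps each alphabet symbol to its probability (Bernoulli model).
    Brute force over all words of length len(pattern): assumption-level
    sketch, only practical for short patterns and small alphabets.
    """
    total = 0.0
    for word in itertools.product(probs, repeat=len(pattern)):
        if hamming(word, pattern) <= d:
            p = 1.0
            for symbol in word:
                p *= probs[symbol]
            total += p
    return total


def expected_count(n, pattern, d, probs):
    """Mean number of approximate occurrences in a random text of length n.

    By linearity of expectation, each of the n - m + 1 windows contributes
    the same single-window match probability.
    """
    return (n - len(pattern) + 1) * window_match_prob(pattern, d, probs)
```

For example, `count_approx("aaaa", "aa", 0)` counts the three overlapping windows separately, and `expected_count(4, "ab", 0, {"a": 0.5, "b": 0.5})` multiplies the three windows by a match probability of 0.25. The variance is harder because overlapping windows are correlated, which this sketch does not attempt to capture.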
