This paper surveys some of the main combinatorial methods for inferring patterns from a string, or a set of strings. The types of problems that will be addressed are repeat identification and common pattern inference. The strings that will concern us represent biological entities: nucleic acid and protein sequences or, in some cases, structures. As is well known, exact ("identical") patterns hardly make sense in biology; we consider here two types of similar ("nonidentical") patterns. One comes from looking at what "hides" behind each letter of the DNA/RNA or protein alphabet, while the other corresponds to the more familiar notion of "errors". The errors concern mutational events that may affect a molecule during DNA replication. Those that will be of interest to us are point mutations, that is, mutations operating each time on single letters of a biological sequence: substitution, insertion or deletion. Various notions of "similarity" are discussed, as well as other constraints the patterns may have to satisfy depending on the biological situation one is facing. Algorithms for the various notions, together with complexity analyses, are presented in some detail.
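To make the notion of "errors" concrete, the following is a minimal sketch of the classical edit distance, which counts the fewest point mutations (substitutions, insertions, deletions) needed to turn one sequence into another. It is not taken from the paper: the function name `edit_distance` and the choice of unit cost for every operation are assumptions made for illustration only.

```python
def edit_distance(u: str, v: str) -> int:
    """Minimum number of single-letter substitutions, insertions and
    deletions turning u into v (unit cost per operation is an assumption)."""
    # prev[j] holds the distance between the processed prefix of u and v[:j].
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]  # turning u[:i] into the empty string takes i deletions
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from u
                curr[j - 1] + 1,         # insert b into u
                prev[j - 1] + (a != b),  # substitute a by b, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # One substitution (C replaced by G) separates these two DNA words.
    print(edit_distance("ACGT", "AGGT"))
```

Under this sketch, a word could be counted as an occurrence of a pattern when its distance to the pattern stays below a chosen threshold; restricting the operations to substitutions alone would give the Hamming distance instead.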