Pattern Inference under many Guises

by Marie-France Sagot, Yoshiko Wakabayashi
http://www-igm.univ-mlv.fr/~sagot/publications/ceara/ceara.ps

Abstract:

This paper surveys some of the main combinatorial methods for inferring patterns from a string, or a set of strings. The types of problems that will be addressed are repeat identification and common pattern inference. The strings that will concern us represent biological entities: nucleic acid and protein sequences or, in some cases, structures. As is well known, exact ("identical") patterns hardly make sense in biology; we consider here two types of similar ("nonidentical") patterns. One comes from looking at what "hides" behind each letter of the DNA/RNA or protein alphabet, while the other corresponds to the more familiar notion of "errors". The errors concern mutational events that may affect a molecule during DNA replication. Those that will be of interest to us are point mutations, that is, mutations operating each time on single letters of a biological sequence: substitution, insertion or deletion. Various notions of "similarity" are discussed, as well as other constraints the patterns may have to satisfy depending on the biological situation one is facing. Algorithms for the various notions, together with complexity analyses, are presented in some detail.
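
As an illustration of the "errors" notion of similarity mentioned in the abstract, the sketch below computes the standard edit (Levenshtein) distance, which counts the minimum number of single-letter substitutions, insertions and deletions needed to turn one string into another. This is a textbook dynamic program offered only as an example; the function name `edit_distance` is hypothetical, and nothing here is claimed to be one of the algorithms the paper itself presents.

```python
def edit_distance(u: str, v: str) -> int:
    """Minimum number of point mutations (substitution, insertion,
    deletion of a single letter) needed to turn u into v.

    Illustrative sketch only; the name and interface are assumptions.
    """
    # prev[j] holds the distance between the prefix of u read so far
    # and the first j letters of v.
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]  # turning i letters of u into the empty string: i deletions
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete letter a from u
                curr[j - 1] + 1,         # insert letter b into u
                prev[j - 1] + (a != b),  # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Compare a short DNA word with a copy that has one letter changed.
    print(edit_distance("ACGT", "AGGT"))
```

Under this measure, two occurrences of a pattern could be called "similar" when their distance is at most some chosen threshold; the choice of threshold is an assumption of the example, not something fixed by the abstract.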
