Staden, R. (1990) Searching for patterns in protein and nucleic acid sequences. Methods Enzymol., 183, 193-211.
....are: QUEST [1] and ANREP [39] can only search for sequential patterns. PROSITE [3] accepts patterns described in a declarative notation, but only sequential patterns can be given. The expressive power of its specification language lies within the class of regular languages. Staden's program [55] is the first system that we are aware of in which one could search for structural patterns, although in a restricted way. A pattern is defined to comprise motifs and is built up and searched for by interactively specifying new motifs, giving the class to which a motif belongs. Nine classes are ....
....the start points of a pair of helices. cBLISS [45] is an implementation in the constraint logic programming language Eclipse of the language of Brazma and Gilbert [8] for describing constrained patterns in biosequences. This language is a formalisation and development of Staden's pattern language [55]. Brazma and Gilbert follow the notations used by Staden, and consider a pattern to comprise motifs as the basic elements. A motif may be a simple string, $\alpha \in \Sigma$ for some alphabet $\Sigma$, or a more complex expression in some grammar. Motifs can be combined in a logical manner using AND, OR ....
R. Staden. Searching for Patterns in Protein and Nucleic Acid Sequences. In R. F. Doolittle, editor, Methods in Enzymology, Vol. 183, pages 193--211. Academic Press, 1990.
....[AEM 84] and ANREP [MM93] can only search for sequential patterns. PROSITE [BBH95] accepts patterns described in a declarative notation, but only sequential patterns can be given. The expressive power of its specification language lies within the class of regular languages. Staden's program [Sta90] is the first system that we are aware of in which one could search for structural patterns, although in a restricted way. A pattern is defined to comprise motifs and is built up and searched for by interactively specifying new motifs, giving the class to which a motif belongs. Nine classes ....
....points of a pair of helices. cBLISS [Rat96] is an implementation in the constraint logic programming language Eclipse of the language of Brazma and Gilbert [BG95] for describing constrained patterns in biosequences. This language is a formalisation and development of Staden's pattern language [Sta90]. Brazma and Gilbert follow the notations used by Staden, and consider a pattern to comprise motifs as the basic elements. A motif may be a simple string, $\alpha \in \Sigma$ for some alphabet $\Sigma$, or a more complex expression in some grammar. Motifs can be combined in a logical manner using AND, OR ....
R. Staden. Searching for Patterns in Protein and Nucleic Acid Sequences. In R. F. Doolittle, editor, Methods in Enzymology, Vol. 183, pages 193--211. Academic Press, 1990.
....than the common pattern they share. For the purposes of this research note, it is assumed that the pattern being searched for is known or assumed to be W residues long. The patterns discovered by aligning a set of sequences can be described as consensus patterns [Chappey et al. 1991], motifs [Staden, 1990], profiles [Gribskov et al. 1990], specificity matrices [Hertz et al. 1990] or regular expressions. Hertz, Hartzell and Stormo [Hertz et al. 1990] describe a successful program which automatically aligns sets of sequences, produces a specificity matrix describing the discovered pattern and ....
Rodger Staden. Searching for patterns in protein and nucleic acid sequences. Methods in Enzymology, 183:193--210, 1990.
....so the distribution functions are discrete. This makes it possible to use iterative formulas to calculate the distribution functions. Step 1) Estimate $C_i(x)$, the probability of observing a match score $f_i(s) \ge x$ with a random sequence of the same length as sequence $s$, using the method of Staden [1990]. Step 2) Estimate $D_i(x)$, the probability distribution of $f_i(s)$, using $$D_i(x) = \Pr(f_i(s) = x) = \begin{cases} C_i(x) - C_i(x+1) & \text{if } x < R_i \\ C_i(R_i) & \text{otherwise.} \end{cases} \qquad (5)$$ Step 3) Estimate $P(x)$, the cumulative distribution function for $g(s; Q)$, using an induction formula similar to that used in ....
Rodger Staden. Searching for patterns in protein and nucleic acid sequences. Methods in Enzymology, 183:193--210, 1990.
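Step 2 of the excerpt above can be illustrated with a minimal sketch. It assumes a reading that the garbled excerpt does not spell out: that C_i(x) is the tail probability Pr(f_i(s) >= x), that scores are the integers 0..R_i, and that R_i is the largest attainable score. The function name and the sample values are hypothetical and are not taken from the cited papers.

```python
def point_probabilities(tail):
    """Convert tail probabilities into point probabilities.

    Assumption (not confirmed by the excerpt): tail[x] = Pr(score >= x)
    for integer scores x = 0..R, where R = len(tail) - 1 is the maximum score.
    Returns pmf with pmf[x] = Pr(score == x).
    """
    R = len(tail) - 1
    # For x < R: Pr(score == x) = Pr(score >= x) - Pr(score >= x + 1).
    # At x == R the tail probability is already the point probability.
    return [tail[x] - tail[x + 1] if x < R else tail[R] for x in range(R + 1)]


# Hypothetical tail probabilities for scores 0, 1, 2.
example_tail = [1.0, 0.5, 0.125]
example_pmf = point_probabilities(example_tail)
print(example_pmf)  # [0.5, 0.375, 0.125]
```

Because the scores are discrete, the point probabilities recovered this way sum to tail[0], which is 1 when every sequence attains at least the minimum score.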
....at a conserved position in a protein family. Accurate estimates of amino acid frequencies are useful not only for characterizing a protein family, but also for classifying a given protein for possible membership in the family. Amino acid frequencies are the basis for computing weight matrices [15] and profiles [6] which are essentially log odds representations of frequencies. Accurate frequency estimates are also central to hidden Markov model [10] and Gibbs sampling [11] methods in computational biology. Each aligned position in a protein family yields a sample of observed amino acid ....
R. Staden. Searching for patterns in protein and nucleic acid sequences. Methods in Enzymology, 183:193--211, 1990.
....is very important if we want to use it for predicting biological properties of the sequences: too specific a language may not be expressive enough, too general a language may lead to a hypothesis space too large for efficient search. Biologists have introduced quite a large number of languages [3, 8, 9, 11, 12, 14, 15, 16], each of which differs from the others in more or less important ways, as well as differing from what is commonly understood by pattern languages in computer science [1, 2, 5, 10]. Up to now, computer scientists have paid relatively little attention to these biopattern languages. One of the ....
....models less interesting for mathematicians. As an exception to the rule, recently Mehldau and Myers [7] and Myers [6] have studied constrained pattern matching for biosequences using network expressions with spacers. In this paper, we present a formal language based on Staden's pattern language [14], one of several languages employed by biologists for pattern search in genetic databases. After presenting an informal description of Staden's pattern language, we will try to formalise it and study some of its mathematical properties. Then we describe a simple, but efficient and elegant algorithm ....
R. Staden. Searching for patterns in protein and nucleic acid sequences. Methods in Enzymology, 183:193--211, 1990.
....use structure predictions and biochemical properties. Some allow for mismatches and insertion of gaps, and have different ways for penalising mismatches and gaps. Some (of the best known) programs or languages are: QUEST [AEM 84] can only search for sequential patterns. Staden's program [Sta90] is the first system that we are aware of in which one could search for structural patterns, though in a restricted way. He defines a pattern as comprising motifs. A pattern is built up and searched for by interactively specifying new motifs, by giving the class to which a motif belongs. Nine ....
....variables can be specified. cBLISS [Rat96] is an implementation in the constraint logic programming language Eclipse of the language of Brazma and Gilbert [BG95] for describing constrained patterns in biosequences. This language is a formalisation and development of Staden's pattern language [Sta90]. Brazma and Gilbert follow the notations used by Staden, and consider a pattern to comprise motifs as the basic elements. A motif may be a simple string, $\alpha \in \Sigma$ for some alphabet $\Sigma$, or a more complex expression in some grammar. Motifs can be combined in a logical manner using AND, ....
R. Staden. Searching for Patterns in Protein and Nucleic Acid Sequences. In R. F. Doolittle, editor, Methods in Enzymology, Vol. 183, pages 193--211. Academic Press, 1990.
....Brown, Richard Hughey, I. Saira Mian, Kimmen Sjolander and my advisor David Haussler. I am indebted in particular to biologist Saira Mian for providing the snRNA data sets and alignments she developed with Christine Guthrie et al. I thank Tracy Larrabee for generously allowing me to use her Decstation 5000/240 computer, without which the results of this thesis would not have been obtainable. I am grateful to Leslie Grate, author of the graphical grammar-describing program scfgedit, which greatly simplified the task of designing my initial grammars. For supplying his graphical RNA-displaying program ....
....overly sensitive to point mutations. Some researchers are attempting to combine both phylogenetic and energetic approaches [LZ91]. Using methods different from those described above, several groups have enumerated schemes or programs to search for patterns in proteins or nucleic acid sequences [Sta90, LWS87, SA90, AWM 84, SM87, GMC90, CAKF86, PC93]. String pattern matching programs based on the UNIX grep function, developed in unpublished work by S. R. Eddy [STG92] and others [MMCN93], search for secondary structure elements in a sequence database. If there is prior knowledge about ....
R. Staden. Searching for patterns in protein and nucleic acid sequences. In Doolittle [Doo90], pages 193--211.
....components of the pattern to be searched for. Examples of languages which can only search for sequential patterns are QUEST [3], SCRUTINEER [26], ANREP [21], as well as PROSITE [5], which uses a declarative notation. Languages which can search for structural patterns include Staden's language [28], OVERSEER [27], PALM [18] and GENLANG [25]. Palingol [7] is a constraint programming language whose data types and search engines have been adapted for secondary structures, and which is implemented directly in C; constraints are boolean expressions. We think that one of the advantages of our ....
R. Staden. Searching for Patterns in Protein and Nucleic Acid Sequences. In R. F. Doolittle, editor, Methods in Enzymology, Vol. 183, pages 193--211. Academic Press, 1990.
No context found.
Staden, R. (1990) Searching for patterns in protein and nucleic acid sequences. Methods Enzymol., 183, 193-211.
No context found.
Staden, R. (1990). Searching for patterns in protein and nucleic acid sequences. In: (Doolittle,