by Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian, Kimmen Sjolander, Rebecca C. Underwood, David Haussler
ftp://ftp.cse.ucsc.edu/pub/rna/cpm94.ps.Z
Abstract:
Stochastic context-free grammars (SCFGs) can be applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. This paper discusses our new algorithm, Tree-Grammar EM, for deducing SCFG parameters automatically from unaligned, unfolded training sequences. Tree-Grammar EM, a generalization of the HMM forward-backward algorithm, is based on tree grammars and is faster than the previously proposed inside-outside SCFG training algorithm. Independently, Sean Eddy and Richard Durbin have introduced a trainable "covariance model" (CM) to perform similar tasks. We compare and contrast our methods with theirs. Tools for analyzing RNA will become increasingly important as in vitro evolution and selection techniques produce greater numbers of synthesized RNA families to supplement those related by phylogeny. Recent efforts have applied stochastic context-free grammars (SCFGs) to the problems of statistical modeling, multiple alignment, discrimination and prediction of the secondary structure of RNA families. Our approach in applying SCFGs to modeling RNA is closely related to our work on modeling protein families and domains with HMMs.
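To make concrete how an SCFG can score both primary and secondary structure, here is a minimal sketch of the inside computation (the quantity that inside-outside training is built on) for a toy single-nonterminal grammar. The grammar, its rule probabilities, and the names `PAIRS`, `P_PAIR`, `P_LEFT`, `P_END` and `inside_probability` are illustrative assumptions, not the grammar or the Tree-Grammar EM algorithm from the paper.

```python
from functools import lru_cache

# Hypothetical toy grammar (an assumption for illustration only):
#   S -> x S y   for each Watson-Crick pair (x, y), probability P_PAIR each
#   S -> x S     for each base x, probability P_LEFT each
#   S -> x       for each base x, probability P_END each
# 4 * 0.15 + 4 * 0.05 + 4 * 0.05 = 1.0, so the rules form a distribution.
BASES = "ACGU"
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
P_PAIR = 0.15
P_LEFT = 0.05
P_END = 0.05


def inside_probability(seq: str) -> float:
    """Return P(S =>* seq), summed over all parses of the toy grammar."""
    if any(base not in BASES for base in seq):
        raise ValueError("sequence must contain only A, C, G, U")
    if not seq:
        return 0.0  # the toy grammar cannot derive the empty string

    @lru_cache(maxsize=None)
    def inside(i: int, j: int) -> float:
        # Probability that S derives the half-open span seq[i:j].
        length = j - i
        if length == 1:
            return P_END
        # Rule S -> x S: emit seq[i] unpaired, then derive the rest.
        total = P_LEFT * inside(i + 1, j)
        # Rule S -> x S y: pair the two ends around a non-empty interior.
        if length >= 3 and (seq[i], seq[j - 1]) in PAIRS:
            total += P_PAIR * inside(i + 1, j - 1)
        return total

    return inside(0, len(seq))


if __name__ == "__main__":
    for s in ("GAC", "GAA"):
        print(s, inside_probability(s))
```

Under these assumed probabilities, a sequence whose ends can base-pair (such as `GAC`) scores higher than one of the same length whose ends cannot (such as `GAA`), which is the sense in which the grammar rewards secondary structure.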
Citations
2103 | A tutorial on hidden Markov models and selected applications in speech recognition – Rabiner - 1989
248 | The estimation of stochastic context-free grammars using the inside-outside algorithm – Lari, Young - 1990
169 | Syntactic Pattern Recognition and Applications – Fu - 1982
155 | The Theory of Parsing – Aho, Ullman - 1972
119 | RNA sequence analysis using covariance models – Eddy, Durbin - 1994
107 | Generalized finite automata theory with an application to a decision problem of second-order logic – Thatcher, Wright - 1968
103 | On finding all suboptimal foldings of an RNA molecule – Zuker - 1989
100 | Simultaneous solution of the RNA folding, alignment, and proto-sequence problems – Sankoff - 1985
100 | Comparing multiple RNA secondary structures using tree comparison – Shapiro, Zhang - 1990
99 | Algorithms for loop matching – Nussinov, Pieczenik, et al. - 1978
71 | Efficient learning of context-free grammars from positive structural examples – Sakakibara - 1992
60 | The Statistical Analysis of Discrete Data – Santner, Duffy - 1989
52 | Protein modeling using hidden Markov models: Analysis of globins – Haussler, Krogh, et al. - 1993
49 | RNA structure prediction – Turner, Sugimoto, et al. - 1988
44 | The Linguistics of DNA – Searls - 1992
38 | Improved estimation of secondary structure in ribonucleic acids – Tinoco, Borer, et al. - 1973
36 | The computational linguistics of biological sequences – Searls - 1993
33 | String Variable Grammar: A logic grammar formalism for the biological language of DNA – Searls - 1993
24 | Trainable grammars for speech recognition, Speech Communication Papers for the 97th Meeting of the Acoustical Society of America – Baker - 1979
19 | Graph grammars based on node rewriting: an introduction to NLC graph grammars – Engelfriet, Rozenberg - 1991
15 | Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis – Michel, Westhof - 1990
14 | A branch-and-bound algorithm for optimal protein threading with pairwise (contact potential) amino acid interactions – Lathrop, Smith - 1994
13 | A syntactic pattern recognition system for DNA sequences – Searls, Dong - 1993
12 | Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids – Woese, Gutell, et al. - 1983
11 | Consensus methods for folding single-stranded nucleic acids – Waterman - 1989
10 | Detection of correlations in tRNA sequences with structural implications – Klinger, Brutlag - 1993
9 | Comparative and functional anatomy of group II catalytic introns: a review – Michel, Umesono, et al. - 1989
6 | Computer analysis of nucleic acid sequences – Waterman - 1988
5 | 5S RNA secondary structure – Fox - 1975
4 | Phylogenetic and genetic evidence for base-triples in the catalytic domain of group I introns – Michel, Ellington, et al. - 1990
4 | Principles of nucleic acid structure (Springer Advanced Texts in Chemistry series) – Saenger - 1984
4 | Structure and function of signal recognition particle RNA – Zwieb - 1989
3 | Secondary structure prediction of RNA – Gouy - 1987
3 | Comparative structural analysis of nuclear RNase P RNAs from yeast – Tranguch, Engelke - 1993
3 | RNA folding: pseudoknots, loops and bulges – Wyatt, Puglisi, et al. - 1989
1 | Spliceosomal snRNAs. Annual Review of Genetics, 22:387-419 – Guthrie, Patterson - 1988
1 | The ribosomal database project – communication - 1992