by Michael Brown, Richard Hughey, Anders Krogh, I. Saira Mian, Kimmen Sjolander, David Haussler
Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology
http://alpha-bits.ai.mit.edu/people/mpf/homedir/files/limbo/levels/level1/research/Mol.Bio./HMM-protein/ismb93.ps
Abstract:
A Bayesian method for estimating the amino acid distributions in the states of a hidden Markov model (HMM) for a protein family or the columns of a multiple alignment of that family is introduced. This method uses Dirichlet mixture densities as priors over amino acid distributions. These mixture densities are determined from examination of previously constructed HMMs or multiple alignments. It is shown that this Bayesian method can improve the quality of HMMs produced from small training sets. Specific experiments on the EF-hand motif are reported, for which these priors are shown to produce HMMs with higher likelihood on unseen data, and fewer false positives and false negatives in a database search task.
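The core estimate described in the abstract can be sketched as a posterior mean. Given the observed residue counts for one HMM state or alignment column, each Dirichlet component is weighted by how well it explains those counts, and the estimate blends the component-specific pseudocount estimates. This is a minimal sketch that assumes the standard Dirichlet-mixture posterior-mean formulation. The function names, the toy 3-letter alphabet (standing in for the 20 amino acids) and the mixture parameters are hypothetical illustrations. They are not the paper's fitted priors or code.

```python
import math
from typing import List, Sequence


def log_dirichlet_multinomial(counts: Sequence[float], alpha: Sequence[float]) -> float:
    """Log P(counts | alpha) under a Dirichlet-multinomial model.

    All alpha values must be strictly positive (lgamma(0) is undefined).
    """
    n_total = sum(counts)
    a_total = sum(alpha)
    log_p = math.lgamma(n_total + 1) + math.lgamma(a_total) - math.lgamma(n_total + a_total)
    for n_i, a_i in zip(counts, alpha):
        log_p += math.lgamma(n_i + a_i) - math.lgamma(n_i + 1) - math.lgamma(a_i)
    return log_p


def posterior_component_probs(counts: Sequence[float],
                              weights: Sequence[float],
                              alphas: Sequence[Sequence[float]]) -> List[float]:
    """P(component j | counts), proportional to q_j * P(counts | alpha_j)."""
    logs = [math.log(q) + log_dirichlet_multinomial(counts, a)
            for q, a in zip(weights, alphas)]
    m = max(logs)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]


def mixture_estimate(counts: Sequence[float],
                     weights: Sequence[float],
                     alphas: Sequence[Sequence[float]]) -> List[float]:
    """Posterior-mean residue probabilities under a Dirichlet mixture prior."""
    post = posterior_component_probs(counts, weights, alphas)
    n_total = sum(counts)
    est = [0.0] * len(counts)
    for w, alpha in zip(post, alphas):
        a_total = sum(alpha)
        for i, n_i in enumerate(counts):
            # Each component contributes its own pseudocount estimate,
            # weighted by its posterior probability.
            est[i] += w * (n_i + alpha[i]) / (n_total + a_total)
    return est


if __name__ == "__main__":
    # Hypothetical two-component prior over a toy 3-letter alphabet.
    weights = [0.5, 0.5]
    alphas = [[5.0, 0.5, 0.5], [0.5, 0.5, 5.0]]
    print(mixture_estimate([10, 0, 0], weights, alphas))
```

With a single component, the estimate reduces to ordinary pseudocounts, (n_i + alpha_i) / (|n| + |alpha|). With several components, a column that is dominated by a few residues automatically borrows its pseudocounts from the component that best matches it. This is one way such a prior can help when the training set is small.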