| Brown, M. P.; Hughey, R.; Krogh, A.; Mian, I. S.; Sjölander, K.; and Haussler, D. 1993. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Hunter, L.; Searls, D.; and Shavlik, J., editors 1993, ISMB-93, Menlo Park, CA. AAAI/MIT Press. 47--55. |
....tend to appear in groups that share similar biochemical properties such as the hydrophobic group. The presence of one amino acid in the group increases the likelihood of seeing other amino acids from the same group. A method that addresses these problems is mixtures of Dirichlet distributions [4, 28]. The expectation and the maximum a posteriori value of a (single-component) random variable from the Dirichlet distribution can be viewed as a smoothing process with a pseudocount. A mixture of Dirichlet distributions can encode the grouping information by having a component in the mixture for ....
M. Brown, R. Hughey, A. Krogh, I. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In C. Rawlings, editor, Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, pages 47--55. AAAI Press, 1995.
....In our test domain of coiled coils, we found that this method of updating probabilities missed more sequences that contain coiled coils than did our method for updating probabilities. Using Dirichlet mixture densities as priors to estimate amino acid probabilities has been studied by Brown et al. [29]. Their approach uses as a prior the maximum likelihood estimate of a mixture Dirichlet density, based on data previously obtained from multiple alignments of various sets of sequences. Their approach is a pure Bayesian approach, and their prior distribution has a smaller effect on the final ....
M. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In International Conference on Intelligent Systems and Molecular Biology, pages 47--55, 1993.
.... what are the effects of taking a wrong guess at the underlying architecture, e.g. how close can we come to the probability distribution of the original model when varying the length of a profile hidden Markov model relative to the length of the original model; how much do Dirichlet mixtures [21] help; are there differences between different packages for constructing hidden Markov models, e.g. HMMER [1], SAM [3] and HMMpro [2]. Two major sources of uncertainty when working with hidden Markov models are how successful we are at reconstructing a model based on a set of data and to what ....
M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology (ISMB), pages 47--55, Menlo Park, California, U.S.A., July 1993. AAAI/MIT Press.
....tend to appear in groups that share similar biochemical properties such as the hydrophobic group. The presence of one amino acid in the group increases the likelihood of seeing other amino acids from the same group. A method that addresses these problems is mixtures of Dirichlet distributions [4, 27]. The expectation and the maximum a posteriori value of a (single-component) random variable from the Dirichlet distribution can be viewed as a smoothing process with a pseudocount. A mixture of Dirichlet distributions can encode the grouping information by having a component in the mixture for ....
M. Brown, R. Hughey, A. Krogh, I. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In C. Rawlings, editor, Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, pages 47--55. AAAI Press, 1995.
....Quantitative measure for regularizers The traditional method used in computational biology to demonstrate the superiority of one technique to another is to compare them on a biologically interesting search or alignment problem. Many of the regularizers in Section 5 have been validated in this way [9, 4, 18]. This sort of anecdotal evidence is valuable for establishing the usefulness of techniques in real biological problems, but is very difficult to quantify. In this paper, regularizers are compared quantitatively on the problem of encoding the columns of multiple alignments. This generic problem has ....
....mentioned so far. Note that adding pseudocounts is not equivalent to any change in the substitution matrix and makes a noticeable improvement in the excess entropy, probably justifying the 5 increase in the number of parameters. 5.6 Dirichlet mixtures: The Dirichlet mixture method introduced in [4] has similarities to the pseudocount methods, but is somewhat more complex. Dirichlet mixtures have been used quite successfully by several researchers [4, 18, 11]. The results [table residue: excess entropy by regularizer (subst, pseudo, scaled variants), optimized for |s|] ....
M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, ISMB-93, pages 47-55, Menlo Park, CA, July 1993. AAAI/MIT Press.
....substitution is empirical, it enjoys the same advantages and suffers the same limitations as all empirical studies. One feature of our approach, which could be viewed as either an advantage or limitation, is that our analysis is general. It is not as general as work on Dirichlet mixture priors [Brown et al. 1993], because our work conditions on specific groups that represent specific biochemical contexts. Nevertheless, the substitution groups in our study are conserved empirically across an entire database. In contrast, many models for describing conservation, such as motifs [Bairoch 1991] profiles ....
Brown, M., Hughey, R., Krogh, A., Mian, I. S., Sjölander, K., and Haussler, D. 1993. Using Dirichlet mixture priors to derive hidden Markov models for protein families. ISMB--93, pages 47--55.
.... kinds of patterns that are expected (i.e. definitions of contexts and properties). This knowledge-based approach contrasts with other methods, such as using Bayesian techniques to calculate prior weights for mixtures of similarity scales that best explain a set of amino acids observed at a site (Brown et al. 1993). In our experiments, we used biochemical knowledge to choose a particular definition of context that we believed would be useful in separating sites with distinct substitution patterns, and to choose a particular set of properties that we expected might be relevant. These choices resulted in the ....
Brown, M.; Hughey, R.; Krogh, A.; Mian, I.; Sjolander, K.; and Haussler, D. 1993. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Proceedings of the First International Conference on Intelligent Systems for Molecular Biology, 47--55.
....to modeling proteins using HMMs is that they contain many free parameters and therefore require a large amount of training data. A typical, 200 state HMM may contain on the order of 5000 trainable parameters. Adequate training of such a model can require on the order of 200 homologous sequences [13]. The use of empirically derived Dirichlet mixture priors [13, 38] can partially offset the need for larger training sets. The size of the model may be greatly reduced by focusing only upon regions that are highly conserved across family members. Usually these regions, called motifs, have been ....
....parameters and therefore require a large amount of training data. A typical, 200 state HMM may contain on the order of 5000 trainable parameters. Adequate training of such a model can require on the order of 200 homologous sequences [13]. The use of empirically derived Dirichlet mixture priors [13, 38] can partially offset the need for larger training sets. The size of the model may be greatly reduced by focusing only upon regions that are highly conserved across family members. Usually these regions, called motifs, have been conserved by evolution for important structural or functional ....
M. Brown, R. Hughey, A. Krogh, I. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In C. Rawlings et al., editor, Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, pages 47--55. AAAI Press, 1995.
....expression for this task might be coding region finding, but gene finding has become standard usage. [Footnote b: The VEIL system is available at ftp://ftp.cs.jhu.edu/pub/veil. This directory contains the executables and the database used in this study.] ....for protein secondary structure prediction [7, 4]. Krogh et al. [14] have used HMMs to find genes in E. coli, where the problem of introns does not arise. The VEIL system described herein demonstrates how to use HMMs to find complex gene structures in eukaryotic DNA sequences. Kulp et al. [15, 20] are also developing an HMM system for this task, ....
M. Brown, R. Hughey, A. Krogh, I. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, Proc. First Internatl. Conf. on Intelligent Systems for Molecular Biology (ISMB-93), pages 47--55, Menlo Park, CA, 1993. AAAI Press.
....to Bayesian networks representing knowledge about protein foldings. More precisely, we are interested in statistical modeling of protein folding motifs (invariant parts in known 3D protein structures important for predicting unknown foldings). This is related to research described in [9], [10], [11]. To summarize, the main thrusts of this project are: • A new knowledge representation formalism (Bayesian networks) combining the explicit character of symbolic formalisms with computational power of numerical ones; This is a new problem for machine ....
....promising line of research in machine learning needs further investigation. • Challenging real world application demanding Bayesian network based representation; The stuck problem of protein structure prediction seems to get easier under a recent push of statistical modeling methods [9], [10], [11]. The rest of this proposal is organized as follows. In Section 2 we discuss Bayesian networks as knowledge representation formalism and compare them with neural networks and classification trees. In Section 3 we discuss in detail the learning of Bayesian networks. We present Bayesian learning ....
Brown M. et al. (1993) Using Dirichlet mixture priors to derive hidden markov models for protein families, in Proc. of 1st Int. conf. on Intelligent Systems for Molecular Biology, pp. 47-55, July 6-9, National Library of Medicine, Bethesda, MD.
....in such a way that all the numbers become positive. The algorithm estimates $q_{m,n}$ as $q_{m,n} = \frac{C_{m,n} + D \times \frac{p_m}{N} \sum_{i=1}^{20} C_{i,n} e^{\lambda M_{i,m}}}{N + D}$ where $\lambda$ is the natural scale for the substitution matrix [44]. The third algorithm estimates $q_{m,n}$ using a mixture of multiple Dirichlet distributions [13]. Here we assume that the amino acids in the $n$th column of the block $B$ are generated independently at random according to an underlying probability distribution $q = (q_1, \ldots, q_{20})$ over the 20 amino acids, where $q$ is chosen independently from a Dirichlet mixture density $\rho$ of the form ....
....$1 \le i \le 20$, be unknown parameters for the Dirichlet density $\rho_j$. The value of $\rho_j$ at a particular point $q$ is given by: $\rho_j(q) = \frac{\prod_{i=1}^{20} q_i^{\alpha_i^{(j)} - 1}}{Z}$ where $Z$ is the normalizing constant such that $\rho_j$ integrates to unity. Using the standard expectation maximization algorithm [13, 23], one can estimate the $k$ mixture coefficients $\beta_j$, $1 \le j \le k$, and the Dirichlet parameter vectors $(\alpha_1^{(j)}, \ldots, \alpha_{20}^{(j)})$. In the study presented here, we adopted the same $\beta$, $\alpha$ and $k$, which was set to 8, as used in Tatusov et al.'s work [13, 81]. Let $\mathrm{Prob}(j \mid C_n)$ denote the ....
M. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjolander, and D. Haussler, "Using dirichlet mixture priors to derive hidden markov models for protein families," in Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology, Menlo Park, CA, pp. 47--55, AAAI, 1993.
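The excerpt above describes estimating a column's amino acid probabilities from a mixture of Dirichlet densities. As a rough illustration only, the following Python sketch computes the posterior mean estimate under such a mixture for one column of counts. The function names, the toy three-letter alphabet, and the parameter values are assumptions made for this example; they are not taken from any of the cited papers, and the sketch does not cover fitting the mixture itself by expectation maximization.

```python
import math


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def mixture_posterior_mean(counts, weights, alphas):
    """Posterior mean letter probabilities under a Dirichlet mixture prior.

    counts  -- observed count of each letter in one alignment column
    weights -- mixture coefficients, one per component (should sum to 1)
    alphas  -- one Dirichlet parameter vector per component
    (All names are hypothetical; this is an illustrative sketch.)
    """
    # Log posterior weight of each component, up to a shared constant:
    # weight_j * B(alpha_j + counts) / B(alpha_j).
    log_post = [
        math.log(w)
        + log_beta([a + c for a, c in zip(alpha, counts)])
        - log_beta(alpha)
        for w, alpha in zip(weights, alphas)
    ]
    # Normalize in log space to avoid underflow.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    z = sum(unnorm)
    resp = [u / z for u in unnorm]

    # Each component contributes its own pseudocount-smoothed estimate,
    # weighted by its posterior probability.
    n = sum(counts)
    estimate = [0.0] * len(counts)
    for r, alpha in zip(resp, alphas):
        denom = n + sum(alpha)
        for i, (a, c) in enumerate(zip(alpha, counts)):
            estimate[i] += r * (c + a) / denom
    return estimate


if __name__ == "__main__":
    # Toy example: 3-letter alphabet, two components favoring letter 0 or 1.
    toy_weights = [0.5, 0.5]
    toy_alphas = [[10.0, 1.0, 1.0], [1.0, 10.0, 1.0]]
    print(mixture_posterior_mean([5, 0, 0], toy_weights, toy_alphas))
```

With a single component, the estimate reduces to ordinary pseudocount smoothing, which is the sense in which the excerpts describe a single Dirichlet density as adding pseudocounts.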
....dynamic programming. One drawback to modeling proteins using HMMs is that they contain many free parameters and therefore require a large amount of training data. A typical, 200 state HMM may contain on the order of 5000 trainable parameters. The use of empirically derived Dirichlet mixture priors [13] can partially offset the need for larger training sets. The size of the model may be greatly reduced by focusing only upon regions that are highly conserved across family members. Usually these regions, called motifs, have been conserved by evolution for important structural or functional ....
M. Brown, R. Hughey, A. Krogh, I. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In C. Rawlings et al., editor, Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, pages 47--55. AAAI Press, 1995.
....size can be accomplished to create megapriors. 4.1 Dirichlet mixture priors MEME uses a Dirichlet mixture prior on the parameters of the model components. Using a mixture of Dirichlet densities as a prior in the estimation of the parameters of a model of biopolymer sequences has been proposed by [Brown et al. 1993]. This approach makes sense especially for proteins where many of the 20 letters in the sequence alphabet have similar chemical properties. Motif columns which give high probability to two (or more) letters representing similar amino acids are a priori more likely. A Dirichlet mixture density ....
....of the motif is then $p_k^{(t+1)} = \frac{c_k + d(c_k)}{|c_k + d(c_k)|}$ for $k = 1$ to $W$. This gives the Bayes estimate of the letter probabilities for column $k$ of the motif and is used to reestimate in the M step. The prior used by MEME for this paper is a 30-component Dirichlet mixture prior due to [Brown et al. 1993]. The (approximate) parameters of the 30-component Dirichlet mixture prior (multiplied by 1000 and rounded to integers) are given in Table 1. Each column of the table contains the parameters of one component of the mixture. The component number $i$, mixing parameter $q_i$ and magnitude $b_i$ are shown ....
Michael Brown, Richard Hughey, Anders Krogh, I. Saira Mian, Kimmen Sjolander, and David Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Intelligent Systems for Molecular Biology, pages 47--55. AAAI Press, 1993.
....amino acids, and the coefficient 5 was chosen empirically. Among the three methods, the unique multiple method produced the best results, as measured by searches using profiles. Furthermore, the unique multiple method performed slightly better than the Dirichlet mixture method for making profiles [5]. Figure 2 exemplifies the different values for K computed by existing methods and by the minimum risk method. As the figure shows, the minimum risk method produces a wider range of values for K. When amino acids are relatively well conserved, K is fractional. When they are not well conserved, ....
M. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, ISMB93, pages 47--55, Menlo Park, CA, 1993. AAAI Press.
....contains the executables and the database used in this study. ....finding periodicities in DNA [2], for exploring structural similarities of families of genes [6], for producing multiple sequence alignments [13, 3], for finding palindromic repeats [12], and for protein secondary structure prediction [7, 4]. Krogh et al. [14] have used HMMs to find genes in E. coli, where the problem of introns does not arise. The VEIL system described herein demonstrates the use of HMMs to find complex gene structures in eukaryotic DNA sequences. Kulp et al. [15, 19] have also developed an HMM system for this task ....
M. Brown, R. Hughey, A. Krogh, I. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, Proc. First Internatl. Conf. on Intelligent Systems for Molecular Biology (ISMB-93), pages 47--55, Menlo Park, CA, 1993. AAAI Press.
....DNA and protein sequences. For example, they have been used for finding periodicities in DNA [2], for exploring structural similarities of families of genes [6], for producing multiple sequence alignments [13, 3], for finding palindromic repeats [11], and for protein secondary structure prediction [8, 4]. Most of the models produced have been relatively small (in comparison to speech recognition systems) in part because of the limited amount of data available but also to reduce the number of free parameters of the system. In an HMM, larger models tend to have many more free parameters and ....
M. Brown, R. Hughey, A. Krogh, I. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, Proc. First Internatl. Conf. on Intelligent Systems for Molecular Biology (ISMB-93), pages 47--55, Menlo Park, CA, 1993. AAAI Press.
....analysis of DNA and protein sequences. HMMs have been used for finding periodicities in DNA [2], for exploring structural similarities of families of genes [6], for producing multiple sequence alignments [14, 3], for finding palindromic repeats [13], and for protein secondary structure prediction [7, 4, 8]. Krogh et al. [15] have used HMMs to find genes in E. coli, where the problem of introns does not arise. The VEIL system described herein demonstrates how to use HMMs to find complex gene structures in eukaryotic DNA sequences. Kulp et al. [16] are also developing an HMM system for this task, ....
M. Brown, R. Hughey, A. Krogh, I. Mian, K. Sjolander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, Proc. First Internatl. Conf. on Intelligent Systems for Molecular Biology (ISMB-93), pages 47--55, Menlo Park, CA, 1993. AAAI Press.
....$\sum_{k=1}^{L} c_{jk}$; $j = 0, \ldots, W$ and $k = 1, \ldots, L$. (19) Estimating the parameters of a multinomial random variable by maximum likelihood is subject to boundary problems. If any letter frequency $f_{ij}$ ever becomes 0, as is prone to happen in small datasets, its value can never change. Following [Brown et al. 1993] and [Lawrence et al. 1993], the equations above for $f_{ij}$ are replaced by $f_{ij} = \frac{c_{ij} + \beta_j}{\sum_{k=1}^{L} c_{ik} + \beta}$, $i = 0, \ldots, W$, $j = 1, \ldots, L$, with $\beta = \sum_{k=1}^{L} \beta_k$. (20) This turns out to be equivalent to using the Bayes estimate for the value of under squared error loss (SEL) ....
Michael Brown, Richard Hughey, Anders Krogh, I. Saira Mian, Kimmen Sjolander, and David Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Intelligent Systems for Molecular Biology, pages 47--55. AAAI Press, 1993.
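The excerpt above replaces the maximum likelihood letter frequencies with a pseudocount-smoothed estimate so that no frequency can become stuck at zero. A minimal Python sketch of that kind of estimate, for a single row of counts, might look as follows. The function name and the example values are assumptions introduced here for illustration, not code or data from the cited work.

```python
def pseudocount_frequencies(counts, beta):
    """Smoothed letter frequencies (c_j + beta_j) / (sum_k c_k + sum_k beta_k).

    counts -- observed count of each letter (hypothetical example input)
    beta   -- pseudocount added for each letter, same length as counts
    """
    if len(counts) != len(beta):
        raise ValueError("counts and beta must have the same length")
    total = sum(counts) + sum(beta)
    return [(c + b) / total for c, b in zip(counts, beta)]


if __name__ == "__main__":
    # A letter with zero observations still receives nonzero probability.
    print(pseudocount_frequencies([3, 0, 1], [0.5, 0.5, 0.5]))
```

Because every letter receives a strictly positive pseudocount, a zero observed count no longer forces a zero frequency, which is the boundary problem the excerpt mentions.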
Brown, M. P.; Hughey, R.; Krogh, A.; Mian, I. S.; Sjölander, K.; and Haussler, D. 1993. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Hunter, L.; Searls, D.; and Shavlik, J., editors 1993, ISMB-93, Menlo Park, CA. AAAI/MIT Press. 47--55.
....known distributions, and away from distributions that are unusual biologically. The models produced are more effective at generalizing to previously unseen data, and are often superior at database search and discrimination experiments (Karplus, 1995a; Tatusov et al. 1994; Bailey and Elkan, 1995; Brown et al. 1993) . 1.3.1 Comparison with other methods for computing these probabilities We are certainly not the first group to notice the need for incorporating prior information about such amino acid distributions into the parameter estimation process. Indeed, our present work has several conceptual ....
....with these priors to produce posterior estimates of the probabilities of the amino acids. Section 3.3 contains the mathematical derivation of the learning rule for estimating Dirichlet mixtures. In Section 4, we present an overview of work done both at Santa Cruz (Karplus, 1995a; Karplus, 1995b; Brown et al. 1993) and elsewhere (Tatusov et al. 1994; Bailey and Elkan, 1995; Henikoff and Henikoff, 1995) that demonstrates the effectiveness of these densities in a variety of statistical models, and the superiority of this technique in general over others tried. Some pointers to help users avoid underflow and ....
Brown, M. P.; Hughey, R.; Krogh, A.; Mian, I. S.; Sjölander, K.; and Haussler, D. 1993. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Hunter, L.; Searls, D.; and Shavlik, J., editors 1993, ISMB-93, Menlo Park, CA. AAAI/MIT Press. 47--55.
.... statistical models produced are more effective at generalizing to previously unseen data, and are often superior at database search and discrimination experiments (Wang et al. 1996; Hughey and Krogh, 1996; Karplus, 1995a; Bailey and Elkan, 1995; Tatusov et al. 1994; Henikoff and Henikoff, 1996; Brown et al. 1993) . 1.1 Database search using statistical models Statistical models for proteins capture the statistics defining a protein family or domain. These models have two essential aspects: 1) parameters for every position in the molecule or domain that express the probabilities of the amino acids, gap ....
....substitute on average for the conserved residue. Dirichlet mixtures were shown to give superior results in encoding multiple alignments and in database discrimination experiments in comparison with various pseudocount and substitution matrix based methods in (Karplus, 1995a; Tatusov et al. 1994; Brown et al. 1993; Henikoff and Henikoff, 1996) 2 Algorithm 2.1 Computing Amino Acid Probabilities The raw frequencies in small samples are often poor approximations to the distribution of amino acids among all proteins which the model is supposed to represent. This section will show how to use Dirichlet ....
Brown, M. P.; Hughey, R.; Krogh, A.; Mian, I. S.; Sjolander, K.; and Haussler, D. 1993. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Hunter, L.; Searls, D.; and Shavlik, J., editors 1993, ISMB-93, Menlo Park, CA. AAAI/MIT Press. 47--55.
M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjölander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, ISMB-93, pages 47--55, Menlo Park, CA, July 1993. AAAI/MIT Press.
Michael Brown, Richard Hughey, Anders Krogh, I. Saira Mian, Kimmen Sjolander, and David Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In Intelligent Systems for Molecular Biology, pages 47--55. AAAI Press, 1993.
M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjolander, and D. Haussler. Using dirichlet mixtures priors to derive hidden Markov models for protein families. In International Conf. on Intelligent Systems for Molecular Biology, pages 47--55, 1993.
M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjolander, and D. Haussler. Using dirichlet mixtures priors to derive hidden Markov models for protein families. In International Conf. on Intelligent Systems for Molecular Biology, pages 47--55, 1993.