What is a hidden Markov model? (2004)

by Sean R. Eddy

Results 1 - 10 of 1,344

Pfam protein families database

by Robert D. Finn, John Tate, Jaina Mistry, Penny C. Coggill, Stephen John Sammut, Hans-Rudolf Hotz, Goran Ceric, Kristoffer Forslund, Sean R. Eddy, Erik L. L. Sonnhammer, Alex Bateman - Nucleic Acids Research, 2008, 36(Database issue): D281–D288
Cited by 771 (13 self)
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK

Citation Context

...e of curated protein families, each of which is defined by two alignments and a profile hidden Markov model (HMM). Profile HMMs are probabilistic models used for the statistical inference of homology (1,2) built from an aligned set of curator-defined family-representative sequences. A high-quality seed alignment is essential, as it provides the basis for the position-specific amino-acid frequencies, gap ...

Protein homology detection by HMM-HMM comparison

by Johannes Söding - Bioinformatics, 2005
Cited by 401 (8 self)
Motivation: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction, and evolution. Results: We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER, and the profile-profile comparison tools PROF_SIM and COMPASS, in an all-against-all comparison of a database of 3691 protein domains from SCOP 1.63 with pairwise sequence identities below 20%. Sensitivity: When predicted secondary structure is included in the HMMs, HHsearch is able to detect between 2.7 and 4.2 times more homologs than PSI-BLAST or HMMER and between 1.44 and 1.9 times more than COMPASS or PROF_SIM for a rate of false positives of 10%. Approximately half of the improvement over the profile–profile comparison methods is attributable to the use of profile HMMs in place of simple profiles. Alignment quality: Higher sensitivity is mirrored by an increased alignment quality. HHsearch produced 1.2, 1.7, and 3.3 times more good alignments (“balanced” score > 0.3) than the next best method (COMPASS), and 1.6, 2.9, and 9.4 times more than PSI-BLAST, at the family, superfamily, and fold level. Speed: HHsearch scans a query of 200 residues against 3691 domains in 33s on an AMD64 3GHz PC. This is 10 times faster than PROF_SIM and 17 times faster than

Citation Context

...ivalent to position-specific gap penalties (Durbin et al., 1998). Profile HMMs perform better than sequence profiles in the detection of homologs and in the quality of alignments (Krogh et al., 1994; Eddy, 1998; Karplus et al., 2001), albeit at the price of a decrease in computational speed. The higher sensitivity is due to the fact that the position-specific gap penalties penalize chance hits much more tha...

A hidden Markov model for predicting transmembrane helices in protein sequences

by Erik L. L. Sonnhammer - In Proceedings of the 6th International Conference on Intelligent Systems for Molecular Biology (ISMB), 1998
Cited by 373 (9 self)
A novel method to model and predict the location and orientation of alpha helices in membrane-spanning proteins is presented. It is based on a hidden Markov model (HMM) with an architecture that corresponds closely to the biological system. The model is cyclic with 7 types of states for helix core, helix caps on either side, loop on the cytoplasmic side, two loops for the non-cytoplasmic side, and a globular domain state in the middle of each loop. The two loop paths on the non-cytoplasmic side are used to model short and long loops separately, which corresponds biologically to the two known different membrane insertion mechanisms. The close mapping between the biological and computational states allows us to infer which parts of the model architecture are important to capture the information that encodes the membrane topology, and to gain a better understanding of the mechanisms and constraints involved. Models were estimated both by maximum likelihood and a discriminative method, and a method for reassignment of the membrane helix boundaries was developed. In a cross-validated test on single sequences, our transmembrane HMM, TMHMM, correctly predicts the entire topology for 77% of the sequences in a standard dataset of 83 proteins with known topology. The same accuracy was achieved on a larger dataset of 160 proteins. These results compare favourably with existing methods.

Citation Context

...e helix prediction. Hidden Markov models have been used successfully in computational biology to model e.g. the statistical structure of genomes (Churchill 1992), protein families (Krogh et al. 1994; Eddy 1996) and gene structure (Kulp et al. 1996; Krogh 1997). The basic principle is to define a set of states, each corresponding to a region or specific site in the proteins being modelled. In the simplest c...

Sequence Comparisons Using Multiple Sequences Detect Three Times as Many Remote . . .

by Jong Park, Kevin Karplus, Christian Barrett, Richard Hughey, David Haussler, Tim Hubbard, Cyrus Chothia, 1998
Cited by 244 (16 self)
The sequences of related proteins can diverge beyond the point where their relationship can be recognised by pairwise sequence comparisons. In attempts to overcome this limitation, methods have been developed that use as a query, not a single sequence, but sets of related sequences or a representation of the characteristics shared by related sequences. Here we describe an assessment of three of these methods: the SAM-T98 implementation of a hidden Markov model procedure; PSI-BLAST; and the intermediate sequence search (ISS) procedure. We determined the extent to which these procedures can detect evolutionary relationships between the members of the sequence database PDBD40-J. This database, derived from the structural classification of proteins (SCOP), contains the sequences of proteins of known structure whose sequence identities with each other are 40% or less. The evolutionary relationships that exist between those that have low sequence identities were found by the examination of their structural details and, in many cases, their functional

The PredictProtein server

by Burkhard Rost, Guy Yachdav, Jinfeng Liu, 2004
Cited by 228 (21 self)
Abstract not found

Assignment of homology to genome sequences using a library of hidden markov models that represent all proteins of known structure

by Julian Gough, Kevin Karplus, Richard Hughey, Cyrus Chothia - J. Mol. Biol., 2001
Cited by 203 (25 self)
Protein structure prediction, to discover the fold and hence information about the probable function of the sequence of a gene about which nothing is known, is possible via homology to a sequence of

Citation Context

...as obtained from NRDB and NRDB90. 13 Subsequent releases of ASTRAL however now include these joined domains. (2) A small number of documented ASTRAL errors which are significant are corrected by hand. (3) Some errors in domain definitions in the SCOP classification were detected and corrected in the SUPERFAMILY sequence files. These were mostly due to typographical mistakes and have been corrected in a s...

Identification of regulatory regions which confer muscle-specific gene expression

by Wyeth W. Wasserman, James W. Fickett - J. Mol. Biol., 1998
Cited by 181 (13 self)
For many newly sequenced genes, sequence analysis of the putative protein yields no clue on function. It would be beneficial to be able to identify in the genome the regulatory regions that confer temporal and spatial expression patterns for the uncharacterized genes. Additionally, it would be advantageous to identify regulatory regions within genes of known expression pattern without performing the costly and time consuming laboratory studies now required. To achieve these goals, the wealth of case studies performed over the past 15 years will have to be collected into predictive models of expression. Extensive studies of genes expressed in skeletal muscle have identified specific transcription factors which bind to regulatory elements to control gene expression. However, potential binding sites for these factors occur with sufficient frequency that it is rare for a gene to be found without one. Analysis of experimentally determined muscle regulatory sequences indicates that muscle expression requires multiple elements in close proximity. A model is generated with predictive capability for identifying these muscle-specific regulatory modules. Phylogenetic footprinting, the identification of sequences conserved between distantly related species, complements the statistical predictions. Through the use of logistic regression analysis, the model promises to be easily modified to take advantage of the elucidation of additional factors, cooperation rules, and spacing constraints.

Citation Context

...Several areas need to be explored in order to achieve such a system. While other methods for making predictions have been successfully applied to sequence analysis, for instance hidden Markov models (Eddy, 1996) and neural networks (Hirst & Sternberg, 1992), we elected to use logistic regression for specific reasons. Logistic regression is not necessarily the only or the best means of identifying regulatory ...

Review: Protein Secondary Structure Prediction Continues to Rise

by Burkhard Rost - J. Struct. Biol., 2001
Cited by 180 (22 self)
...of prediction accuracy? We shall see. 2001 Academic Press. INTRODUCTION. History. Linus Pauling correctly guessed the formation of helices and strands (14, 15) (and falsely hypothesized other structures). Three years before Pauling's guess was verified by the publications of the first X-ray structures (16, 17), one group had already ventured to predict secondary structure from sequence (18). The first-generation prediction methods following in the 1960s and 1970s were all based on single amino acid propensities (19). The second-generation methods dominating the scene until the early 1990s used propensities for segments of 3–51 adjacent residues (19). Basically any imaginable theoretical algorithm had been applied to the problem of predicting secondary structure from sequence. However, it seemed that prediction accuracy stalled at levels slightly above 60% (percentage of residues predicted correctly in one of the three states: helix, strand, and other). The reason for this limit was the

Citation Context

...successfully used position-specific profiles for searching (25). However, the breakthrough for large-scale routine searches was achieved with the development of PSI-BLAST (10) and hidden Markov models (12, 26). In particular, the gapped, profile-based, and iterated search tool PSI-BLAST continues to revolutionize the field of protein sequence analysis through its unique combination of speed and accuracy. M...

Dirichlet Mixtures: A Method for Improving Detection of Weak but Significant Protein Sequence Homology

by Kimmen Sjölander, Kevin Karplus, Michael Brown, Richard Hughey, Anders Krogh, I. Saira Mian, David Haussler, 1996
Cited by 175 (24 self)
This paper presents the mathematical foundations of Dirichlet mixtures, which have been used to improve database search results for homologous sequences, when a variable number of sequences from a protein family or domain are known. We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed amino acid frequencies, to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model. These estimates give a statistical model greater generalization capacity, such that remotely related family members can be more reliably recognized by the model. Dirichlet mixtures have been shown to outperform substitution matrices and other methods for computing these expected amino acid distributions in database search, resulting in fewer false positives and false negatives for the families tested. This paper corrects a previously p...

Citation Context

...) (Churchill, 1989; White et al., 1994; Stultz et al., 1993; Krogh et al., 1994; Hughey and Krogh, 1996; Baldi et al., 1992; Baldi and Chauvin, 1994; Asai et al., 1993; Eddy, 1995; Eddy et al., 1995; Eddy, 1996). We address this problem by incorporating prior information about amino acid distributions that typically occur in columns of multiple alignments into the process of building a statistical model. W...

RSEARCH: Finding homologs of single structured RNA sequences

by Robert J. Klein, Sean R. Eddy - BMC Bioinformatics, 2003
Cited by 170 (3 self)
Background: Many trans-acting noncoding RNA genes and cis-acting RNA regulatory elements conserve secondary structure rather than primary sequence. Most homology search tools only look at the primary sequence level, however.