Results 1 - 10 of 124
Protein homology detection by HMM-HMM comparison
- Bioinformatics, 2005
Cited by 401 (8 self)
Motivation: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction, and evolution. Results: We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER, and the profile-profile comparison tools PROF_SIM and COMPASS, in an all-against-all comparison of a database of 3691 protein domains from SCOP 1.63 with pairwise sequence identities below 20%. Sensitivity: When predicted secondary structure is included in the HMMs, HHsearch is able to detect between 2.7 and 4.2 times more homologs than PSI-BLAST or HMMER and between 1.44 and 1.9 times more than COMPASS or PROF_SIM for a rate of false positives of 10%. Approximately half of the improvement over the profile-profile comparison methods is attributable to the use of profile HMMs in place of simple profiles. Alignment quality: Higher sensitivity is mirrored by an increased alignment quality. HHsearch produced 1.2, 1.7, and 3.3 times more good alignments ("balanced" score > 0.3) than the next best method (COMPASS), and 1.6, 2.9, and 9.4 times more than PSI-BLAST, at the family, superfamily, and fold level. Speed: HHsearch scans a query of 200 residues against 3691 domains in 33 s on an AMD64 3 GHz PC. This is 10 times faster than PROF_SIM and 17 times faster than
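The sensitivity figures above compare how many homologs each method detects at a fixed false-positive rate. As a rough illustration only, the sketch below counts true homologs in a ranked hit list while the fraction of false positives among accepted hits stays at or below 10%. The function name `hits_at_fp_rate`, the toy label lists, and this particular definition of the false-positive rate are assumptions made for the example; the paper's exact benchmark protocol may differ.

```python
from typing import Sequence


def hits_at_fp_rate(ranked_labels: Sequence[bool], max_fp_rate: float = 0.10) -> int:
    """Count true homologs found while false positives / accepted hits <= max_fp_rate.

    ranked_labels holds one entry per hit, best score first:
    True for a real homolog, False for a false positive.
    """
    tp = 0
    fp = 0
    best_tp = 0
    for is_homolog in ranked_labels:
        if is_homolog:
            tp += 1
        else:
            fp += 1
        # Remember the largest true-positive count seen at an acceptable rate.
        if fp / (tp + fp) <= max_fp_rate:
            best_tp = tp
    return best_tp


# Toy ranked lists (hypothetical data, not taken from the benchmark).
method_a = [True] * 9 + [False, True, False, True]
method_b = [True] * 4 + [False, False, True]

found_a = hits_at_fp_rate(method_a)
found_b = hits_at_fp_rate(method_b)
ratio = found_a / found_b  # "times more homologs" for method A versus method B
```

With such counts in hand, a statement like "2.7 times more homologs" corresponds to the ratio of the two counts at the same false-positive threshold.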
MODBASE, a database of annotated comparative protein structure models
- Nucleic Acids Res, 2000
"... and associated resources ..."
The HHpred interactive server for protein homology detection and structure prediction
- Nucleic Acids Res, 2005
"... doi:10.1093/nar/gki408 ..."
A Machine Learning Information Retrieval Approach to Protein Fold Recognition
Cited by 78 (12 self)
Motivation: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than addressing the retrieval problem of fold recognition: finding a proper template for a given query protein. Results: Here we present a two-stage machine learning, information retrieval approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map, and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable, and effective. Compared to 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is about 85%, 56%, and 27% at the family, superfamily, and fold levels respectively. Using the 5 top-ranked templates, the sensitivity increases to 90%, 70%, and 48%. Availability: The FOLDpro server is available with the SCRATCH
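The second stage described above, scoring each query-template pair and ranking templates by a continuous relevance score, can be sketched as follows. This is not the FOLDpro implementation: a hand-set linear scoring function stands in for a trained support vector machine, and the feature names, weights, bias, and template identifiers are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical weights and bias standing in for a trained SVM decision function.
WEIGHTS: Dict[str, float] = {
    "profile_score": 0.6,
    "ss_match": 0.3,
    "contact_match": 0.1,
}
BIAS = -0.5


def relevance(features: Dict[str, float]) -> float:
    """Continuous relevance score for one query-template pair (higher is better)."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items()) + BIAS


def rank_templates(
    pairs: Dict[str, Dict[str, float]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Rank candidate templates for one query by descending relevance score."""
    scored = [(template_id, relevance(f)) for template_id, f in pairs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy pairwise features for one query against three templates (made-up values).
candidates = {
    "template_A": {"profile_score": 0.9, "ss_match": 0.8, "contact_match": 0.7},
    "template_B": {"profile_score": 0.2, "ss_match": 0.4, "contact_match": 0.1},
    "template_C": {"profile_score": 0.6, "ss_match": 0.9, "contact_match": 0.5},
}
ranking = rank_templates(candidates, top_k=2)
```

Evaluating with the top-ranked template or with the 5 top-ranked templates, as in the reported sensitivities, corresponds to calling `rank_templates` with `top_k=1` or `top_k=5`.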
Alignment of protein sequences by their profiles
- 2004
Cited by 69 (14 self)
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
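The accuracy measure quoted above, the fraction of correctly aligned residues relative to a structure-based reference alignment, can be illustrated with a minimal sketch. It assumes each alignment is given as two gapped strings of equal length with `-` as the gap character; the function names and the toy sequences are hypothetical, and the paper's own scoring code may treat details differently.

```python
from typing import Set, Tuple


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Set[Tuple[int, int]]:
    """Return the set of (index in A, index in B) residue pairs placed in the same column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def alignment_accuracy(test: Tuple[str, str], reference: Tuple[str, str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


# Toy example: the reference places a gap that the test alignment misses.
reference = ("ACDEF-", "AC-EFG")
test = ("ACDEF", "ACEFG")
accuracy = alignment_accuracy(test, reference)
```

Averaging this fraction over a test set of reference alignments gives a single accuracy figure per method, comparable in spirit to the percentages listed in the abstract.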
PISCES: recent improvements to a PDB sequence culling server
- Bioinformatics, 2003
Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection
- 2006
COACH: profile-profile alignment of protein families using hidden Markov models
- Bioinformatics, 2004
Berkeley Phylogenomics Group web servers: resources for structural phylogenomic analysis
- Nucleic Acids Res., 35 (Web Server issue), 2007