A Machine Learning Information Retrieval Approach to Protein Fold Recognition
Cited by 78 (12 self)
Motivation: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than addressing the retrieval problem of fold recognition, that is, finding a proper template for a given query protein. Results: Here we present a two-stage machine learning, information retrieval approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map, and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e., in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable, and effective. Compared to 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is about 85%, 56%, and 27% at the family, superfamily, and fold levels, respectively. Using the 5 top-ranked templates, the sensitivity increases to 90%, 70%, and 48%. Availability: The FOLDpro server is available with the SCRATCH
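The ranking and top-k evaluation described in this abstract can be illustrated with a small sketch. It is not the FOLDpro implementation: the function names, the toy scores, and the relevance labels below are all hypothetical, and the scores simply stand in for the continuous relevance values a trained classifier would produce.

```python
from typing import Dict, List, Set, Tuple

def rank_templates(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort the templates of one query by descending relevance score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def top_k_sensitivity(
    ranked: Dict[str, List[Tuple[str, float]]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one relevant template in the top k."""
    hits = 0
    for query, ranking in ranked.items():
        top = {name for name, _ in ranking[:k]}
        if top & relevant[query]:
            hits += 1
    return hits / len(ranked)

# Hypothetical relevance scores for two queries against three templates.
toy_scores = {
    "q1": {"t1": 0.9, "t2": 0.2, "t3": -0.5},
    "q2": {"t1": 0.1, "t2": 0.4, "t3": 0.3},
}
# Hypothetical ground truth: templates sharing the query's fold.
toy_relevant = {"q1": {"t1"}, "q2": {"t3"}}

toy_ranked = {q: rank_templates(s) for q, s in toy_scores.items()}
top1 = top_k_sensitivity(toy_ranked, toy_relevant, k=1)
top2 = top_k_sensitivity(toy_ranked, toy_relevant, k=2)
```

In this toy data the top-ranked template is correct for one of the two queries, while considering two templates per query recovers both, which mirrors why sensitivity rises when the 5 top-ranked templates are used instead of only the first.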
Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models
- Proteins, 2005
Cited by 34 (17 self)
To predict the tertiary structure of full-length sequences of all targets in CASP6, regardless of their potential category (from easy comparative modeling to fold recognition to apparent new folds) we used a novel combination of two very different approaches developed independently in our laboratories, which ranked quite well in different categories in CASP5. First, the GeneSilico meta-server was used to identify domains, predict secondary structure, and generate fold recognition (FR) alignments, which were converted to full-atom models using the “FRankenstein’s Monster” approach for comparative modeling (CM) by recombination of protein fragments. Additional models generated “de novo” by fully automated servers were obtained
Y: SP5: improving protein fold recognition by using torsion angle profiles and profile-based gap penalty model. PLoS One 2008
Cited by 18 (1 self)
How to recognize the structural fold of a protein is one of the challenges in protein structure prediction. We have developed a series of single (non-consensus) methods (SPARKS, SP2, SP3, SP4) that are based on weighted matching of two to four sequence and structure-based profiles. There is a robust improvement in the accuracy and sensitivity of fold recognition as the number of matching profiles increases. Here, we introduce a new profile-profile comparison term based on real-value dihedral torsion angles. Together with an updated real-value solvent accessibility profile and a new variable gap-penalty model based on a fractional power of insertion/deletion profiles, the new method (SP5) leads to a robust improvement over the previous SP methods. There is a 2% absolute increase (5% relative improvement) in alignment accuracy over SP4 based on two independent benchmarks. Moreover, SP5 achieves a 7% absolute increase (22% relative improvement) in the success rate of recognizing correct structural folds, and a 32% relative improvement in the accuracy of models within the same fold in the Lindahl benchmark. In addition, the modeling accuracy of top-1 ranked models is improved by 12% over SP4 for the difficult targets in the CASP7 test set. These results highlight the importance of harnessing predicted structural properties in
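The abstract above quotes each gain both as an absolute increase and as a relative improvement. As a rough worked example of how the two relate, the sketch below assumes the usual definition (relative improvement = absolute increase / baseline value) and derives the baseline each pair implies. The helper name is hypothetical, and because the quoted percentages are rounded, the derived baselines are only approximate and are not figures reported in the abstract.

```python
def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Baseline value implied by relative_gain = absolute_gain / baseline."""
    return absolute_gain / relative_gain

# Alignment accuracy: 2% absolute increase, 5% relative improvement.
alignment_baseline = implied_baseline(0.02, 0.05)

# Fold-recognition success rate: 7% absolute increase, 22% relative improvement.
recognition_baseline = implied_baseline(0.07, 0.22)

print(f"implied alignment-accuracy baseline: {alignment_baseline:.2f}")
print(f"implied success-rate baseline: {recognition_baseline:.2f}")
```

Under this assumption, a small absolute gain on a low baseline corresponds to a much larger relative gain, which is why the two numbers in each pair differ so much.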
Analysis of TASSER-based CASP7 protein structure prediction results
- Proteins 69 (Suppl 8), 2007
Low-homology protein threading
- Bioinformatics, 2010
Cited by 9 (1 self)
Motivation: The challenge of template-based modeling lies in the recognition of correct templates and the generation of accurate sequence-template alignments. Homologous information has proved to be very powerful in detecting remote homologs, as demonstrated by the state-of-the-art profile-based method HHpred. However, HHpred does not fare well when the proteins under consideration are low-homology. A protein is low-homology if we cannot obtain a sufficient amount of homologous information for it from existing protein sequence databases. Results: We present a profile-entropy dependent scoring function for low-homology protein threading. This method models the correlation among various protein features and determines their relative importance according to the amount of homologous information available. When the proteins under consideration are low-homology, our method relies more on structure information; otherwise, it relies more on homologous information. Experimental results indicate that our threading method greatly outperforms the best profile-based method, HHpred, and all the top CASP8 servers on low-homology proteins. Tested on the CASP8 hard targets, our threading method is also better than all the top CASP8 servers but slightly worse than Zhang-Server. This is significant considering that Zhang-Server and other top CASP8 servers use a combination of multiple structure-prediction techniques, including consensus methods, multiple-template modeling, template-free modeling and model refinement, while our method is a classical single-template-based threading method without any post-threading refinement.
QBES: predicting real values of solvent accessibility from sequences by efficient, constrained energy optimization. Proteins 2006;63:961–966
Cited by 5 (2 self)
Solvent accessibility, one of the key properties of amino acid residues in proteins, can be used to assist protein structure prediction. Various approaches such as neural network, support vector machines, probability profiles, information theory, Bayesian theory, logistic function, and multiple linear regression have been developed for solvent accessibility prediction. In this article, a much simpler quadratic programming method based on the buriability parameter set of amino acid residues is developed. The new method, called QBES (Quadratic programming and Buriability Energy function for Solvent accessibility prediction), is reasonably accurate for predicting the real value of solvent accessibility. By using a dataset of 30 proteins to optimize three parameters, the average correlation coefficients between the predicted and actual solvent accessibility are about 0.5 for all four independent test sets ranging from 126 to 513 proteins. The method is efficient. It takes only 20 min for a regular PC to obtain results of 30 proteins with an average length of 263 amino acids. Although the proposed method is less accurate than a few more sophisticated methods based on neural network or support vector machines, this is the first attempt to predict solvent accessibility by energy optimization with constraints. Possible improvements and other applications of the method are discussed.
Poleksic A: STRUCTFAST: Protein sequence remote homology detection and alignment using novel dynamic programming and profile-profile scoring. Proteins 2006
Cited by 4 (0 self)
STRUCTFAST is a novel profile-profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process. Second, the algorithm employs a rigorous analytical formula for profile-profile scoring to overcome the limitations of ad hoc scoring functions that require adjustable parameter training. Third, the algorithm employs Convergent Island Statistics (CIS) to compute the statistical significance of alignment scores independently for each pair of sequences. STRUCTFAST routinely produces alignments that meet or exceed the quality obtained by an expert human homology modeler, as evidenced by its performance in the latest CAFASP4 and CASP6 blind prediction benchmark experiments. Proteins 2006;64:960–967. © 2006 Wiley-Liss, Inc. Key words: protein structure; homology modeling; comparative modeling; alignment algorithms; alignment statistics
Generalized Pattern Search Algorithm for Peptide Structure Prediction
- 2008
Cited by 3 (2 self)
Finding the near-native structure of a protein is one of the most important open problems in structural biology and biological physics. The problem becomes dramatically more difficult when a given protein has no regular secondary structure or does not show a fold similar to structures already known. This situation occurs frequently when we need to predict the tertiary structure of small molecules, called peptides. In this research work, we propose a new ab initio algorithm, the generalized pattern search algorithm, based on the well-known class of Search-and-Poll algorithms. Inspired by the approach proposed by other researchers, we performed an extensive set of simulations over a well-known set of 44 peptides to investigate the robustness and reliability of the proposed algorithm, and we compared the peptide conformation with a state-of-the-art algorithm for peptide structure prediction known as PEPstr. In particular, we tested the algorithm on the instances proposed by the originators of PEPstr, to validate the proposed algorithm; the experimental results confirm that the generalized pattern search algorithm outperforms
When analyzing the complex structure of a biological system, proteins are the most attractive molecular devices. They are likely involved in all processes of a living organism; they are responsible for behavioral changes in the cells. Due to the
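For readers unfamiliar with the poll step that pattern search methods rely on, the following is a minimal sketch of a generic compass (coordinate) search, one of the simplest members of that family. It is not the algorithm from the paper above: the objective is a hypothetical two-variable toy function rather than a peptide energy function, there is no separate search step, and the function name and parameters are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

def compass_search(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    step: float = 1.0,
    tol: float = 1e-6,
    max_iter: int = 10_000,
) -> Tuple[List[float], float]:
    """Minimize f by polling +/- step along each coordinate axis."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        # Poll step: try each coordinate direction, accept the first improvement.
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = x.copy()
                trial[i] += direction * step
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
                    break
            if improved:
                break
        # Unsuccessful poll: shrink the mesh and try again.
        if not improved:
            step *= 0.5
    return x, fx

def toy_objective(p: Sequence[float]) -> float:
    """Hypothetical smooth objective with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

best_x, best_f = compass_search(toy_objective, [0.0, 0.0])
```

The method needs only function values, no gradients, which is the property that makes this class of algorithms usable on rugged energy landscapes.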
Incorporation of Local Structural Preference Potential Improves Fold Recognition
- 2010
Cited by 2 (0 self)
Fold recognition, or threading, is a popular protein structure modeling approach that uses known structure templates to build structures for proteins of unknown structure. The key to the success of fold recognition methods lies in the proper integration of sequence, physicochemical and structural information. Here we introduce another type of information, local structural preference potentials of 3-residue and 9-residue fragments, for fold recognition. By combining the two local structural preference potentials with the widely used sequence profile, secondary structure information and hydrophobic score, we have developed a new threading method called FR-t5 (fold recognition by use of 5 terms). In benchmark tests, we found that considering local structural preference potentials in FR-t5 not only greatly enhances the alignment accuracy and recognition sensitivity, but also significantly improves the quality of the predicted models.
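The abstract names five terms but does not state how they are combined. A common choice in threading methods is a weighted sum per aligned position, and the sketch below assumes exactly that. The field names, the weights, and the per-term values are all hypothetical and are not taken from FR-t5.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class TermScores:
    """Per-position scores for the five kinds of information named above."""
    sequence_profile: float
    secondary_structure: float
    hydrophobic: float
    local_pref_3mer: float  # 3-residue local structural preference
    local_pref_9mer: float  # 9-residue local structural preference

def combined_score(terms: TermScores, weights: Dict[str, float]) -> float:
    """Weighted sum of the five terms (the weighted-sum form is an assumption)."""
    return sum(weights[name] * getattr(terms, name) for name in weights)

# Hypothetical scores for aligning one query position to one template position.
example_terms = TermScores(1.0, 0.5, 0.25, 0.5, 0.75)
# Hypothetical equal weights; real weights would be tuned on a benchmark.
equal_weights = {
    "sequence_profile": 1.0,
    "secondary_structure": 1.0,
    "hydrophobic": 1.0,
    "local_pref_3mer": 1.0,
    "local_pref_9mer": 1.0,
}
example_total = combined_score(example_terms, equal_weights)
```

A per-position score of this kind would typically feed a dynamic programming alignment between the query and each template.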
Rychlewski L.: SURVEY AND SUMMARY, Practical lessons from protein structure prediction
- (2005) 1874–1891. Zbigniew Starosolski, Andrzej Polański
Cited by 2 (0 self)
Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in recent years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignments. Recent advances in the assessment of prediction quality are also discussed.