Results 1-10 of 102
A Machine Learning Information Retrieval Approach to Protein Fold Recognition
Cited by 78 (12 self)
Motivation: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features. So far, however, these methods have been used primarily for protein and fold classification. They have not been applied to the retrieval problem of fold recognition: finding a proper template for a given query protein. Results: Here we present a two-stage machine learning information retrieval approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map, and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance of the query-template pairs (i.e., whether the two proteins are in the same fold). For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable, and effective. Compared to 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is about 85%, 56%, and 27% at the family, superfamily, and fold levels, respectively. Using the 5 top-ranked templates, the sensitivity increases to 90%, 70%, and 48%. Availability: The FOLDpro server is available with the SCRATCH
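The ranking step and the top-k sensitivity figures above can be illustrated with a small Python sketch. The abstract does not spell out how sensitivity is computed. This sketch therefore assumes it is the fraction of queries for which at least one template of the correct family, superfamily, or fold appears among the k highest-scoring templates. The function name `top_k_sensitivity` and the input layout are hypothetical. The inputs are one list of SVM relevance scores per query and one set of relevant templates per query. In FOLDpro the scores would come from the trained SVMs; here they are simply given.

```python
from typing import Dict, List, Set, Tuple


def top_k_sensitivity(
    scores: Dict[str, List[Tuple[str, float]]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one relevant template in the top k.

    Hypothetical helper: `scores` maps each query to (template, relevance score)
    pairs, and `relevant` maps each query to templates sharing its family,
    superfamily, or fold (depending on the evaluation level).
    """
    if not scores:
        return 0.0
    hits = 0
    for query, scored_templates in scores.items():
        # Rank templates by descending relevance score
        ranked = sorted(scored_templates, key=lambda pair: pair[1], reverse=True)
        top = [template for template, _ in ranked[:k]]
        # Count the query as a hit if any top-k template is relevant
        if any(t in relevant.get(query, set()) for t in top):
            hits += 1
    return hits / len(scores)
```

Running the function once per level, with k = 1 and k = 5, would give the kind of sensitivity table the abstract reports.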
SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res., 2014
Cited by 54 (1 self)
Protein structure homology modelling has become a routine technique to generate 3D models for proteins when experimental structures are not available. Fully automated servers such as SWISS-MODEL with user-friendly web interfaces generate reliable models without the need for complex software packages or downloading large databases. Here, we describe the latest version of the SWISS-MODEL expert system for protein structure modelling. The SWISS-MODEL template library provides annotation of quaternary structure and essential ligands and co-factors to allow for building of complete structural models, including their oligomeric structure. The improved SWISS-MODEL pipeline makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models. The accuracy of the models generated by SWISS-MODEL is continuously evaluated by the CAMEO system. The new web site allows users to interactively search for templates, cluster them by sequence similarity, structurally compare alternative templates and select the ones to be used for model building. In cases where multiple alternative template structures are available for a protein of interest, a user-guided template selection step allows building models in different functional states. SWISS-MODEL is available at
Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching. Proteins, 2006
Cited by 40 (5 self)
The formation of disulphide bridges between cysteines plays an important role in protein folding, structure, function, and evolution. Here, we develop new methods for predicting disulphide bridges in proteins. We first build a large curated data set of proteins containing disulphide bridges to extract relevant statistics. We then use kernel methods to predict whether a given protein chain contains intrachain disulphide bridges or not, and recursive neural networks to predict the bonding probabilities of each pair of cysteines in the chain. These probabilities in turn lead to an accurate estimation of the total number of disulphide bridges. They also define a weighted graph matching problem that can be addressed efficiently to infer the global disulphide bridge connectivity pattern. This approach can be applied both when the bonded state of each cysteine is known and in ab initio mode, where the state is unknown. Furthermore, it can easily cope with chains containing an arbitrary number of disulphide bridges, overcoming one of the major limitations of previous approaches. It can classify individual cysteine residues as bonded or nonbonded with 87% specificity and 89% sensitivity. The estimate of the total number of bridges in each chain is correct 71% of the time, and within one of the true value over 94% of the time. The prediction of the overall disulphide connectivity pattern is exact in about 51% of the chains. Profiles in the input supply evolutionary information. Adding true (but not predicted) secondary structure and solvent accessibility information on top of the profiles yields small but noticeable improvements. Finally, once the system is trained, predictions can be computed rapidly on a proteomic or protein-engineering scale. The disulphide bridge prediction server (DIpro), software, and datasets are available through www.igb.uci.edu/servers/pass.html. Proteins 2006;62:617-629.
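The final connectivity step can be sketched as a maximum-weight perfect matching over the bonded cysteines, with the predicted pairwise bonding probabilities as edge weights. This is a minimal sketch under stated assumptions, not DIpro's implementation. It assumes the bonded cysteines are already known, which is the first of the two modes the abstract mentions. It also uses the raw probabilities as weights. The matching is solved by exhaustive memoized search, which is only practical for a handful of bridges. The names `best_connectivity` and `prob` are hypothetical. An efficient solver would instead use a general weighted matching algorithm such as Edmonds' blossom algorithm.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]


def best_connectivity(
    cysteines: List[int], prob: Dict[Pair, float]
) -> Tuple[float, List[Pair]]:
    """Exhaustive maximum-weight perfect matching over bonded cysteines.

    Hypothetical helper: `cysteines` are sequence positions assumed to be bonded,
    and `prob` maps (i, j) with i < j to a predicted bonding probability.
    """
    if len(cysteines) % 2:
        raise ValueError("need an even number of bonded cysteines")

    def weight(a: int, b: int) -> float:
        # Probabilities are keyed with the smaller position first; missing pairs weigh 0
        return prob.get((min(a, b), max(a, b)), 0.0)

    @lru_cache(maxsize=None)
    def solve(remaining: FrozenSet[int]) -> Tuple[float, Tuple[Pair, ...]]:
        if not remaining:
            return 0.0, ()
        # Pair the lowest remaining cysteine with every possible partner
        first = min(remaining)
        best: Tuple[float, Tuple[Pair, ...]] = (float("-inf"), ())
        for partner in remaining - {first}:
            rest_score, rest_pairs = solve(remaining - {first, partner})
            score = weight(first, partner) + rest_score
            if score > best[0]:
                best = (score, ((first, partner),) + rest_pairs)
        return best

    total, pairs = solve(frozenset(cysteines))
    return total, sorted(pairs)
```

Consider four bonded cysteines at positions 3, 10, 25, and 40. There are three possible pairings, and the function scores each one and returns the pairing with the highest total probability.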
DOMpro: Protein domain prediction using profiles, secondary structure, relative solvent accessibility, and recursive neural networks. Data Mining and Knowledge Discovery, 2006
Cited by 29 (7 self)
Protein domains are the structural and functional units of proteins. The ability to parse protein chains into different domains is important for protein classification and for understanding protein structure, function, and evolution. Here we use machine learning algorithms, in the form of recursive neural networks, to develop a protein domain predictor called DOMpro. DOMpro predicts protein domains using a combination of evolutionary information in the form of profiles, predicted secondary structure, and predicted relative solvent accessibility. DOMpro is trained and tested on a curated dataset derived from the CATH database. DOMpro correctly predicts the number of domains for 69% of the combined dataset of single- and multi-domain chains. On single-domain proteins, DOMpro achieves a sensitivity of 76% and a specificity of 85%. On two-domain proteins, it achieves a sensitivity of 59% and a specificity of 38%. DOMpro also achieved a sensitivity and specificity of 71% and 71%, respectively, in the Critical Assessment of Fully
NNcon: Improved Protein Contact Map Prediction Using 2D-Recursive Neural Networks. Nucleic Acids Research, 2009
doi:10.1093/nar/gkp305
DOMAC: an accurate, hybrid protein domain prediction server, 2007
Cited by 19 (3 self)
Protein domain prediction is important for protein structure prediction, structure determination, function annotation, mutagenesis analysis and protein engineering. Here we describe an accurate protein domain prediction server (DOMAC) combining both template-based and ab initio methods. The preliminary version of the server was ranked among the top domain prediction servers in the seventh edition of Critical
DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res., 2006
Cited by 17 (2 self)
DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at
PPT-DB: the protein property prediction and testing database. Nucleic Acids Res., 36(Database issue), 2008
Deep architectures for protein contact map prediction. Bioinformatics, 2012
Cited by 12 (0 self)
Motivation: Residue-residue contact prediction is important for protein structure prediction and other applications. However, the accuracy of current contact predictors often barely exceeds 20% on long-range contacts, falling short of the level required for ab initio structure prediction. Results: Here we develop a novel machine learning approach for contact map prediction using three steps of increasing resolution. First, we use two-dimensional recursive neural networks to predict coarse contacts and orientations between secondary structure elements. Second, we use an energy-based method to align secondary structure elements and predict contact probabilities between residues in contacting alpha-helices or strands. Third, we use a deep neural network architecture to organize and progressively refine the prediction of contacts, integrating information over both space and time. We train the architecture on a large set of non-redundant proteins. We test it on a large set of non-homologous domains and on the protein domains used for contact prediction in the two most recent CASP experiments, CASP8 and CASP9. For long-range contacts, the accuracy of the new CMAPpro predictor is close to 30%, a significant increase over existing approaches.
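The long-range accuracy figures above are precision-style measures over a ranked list of predicted residue pairs: about 20% for typical predictors and close to 30% for CMAPpro. The abstract does not state the exact protocol, so this sketch assumes a common CASP-style convention. It keeps pairs separated by at least 24 residues and takes the top L/5 by predicted probability, where L is the chain length. It then reports the fraction of those pairs that are true contacts. The function name `long_range_precision` and its parameters are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def long_range_precision(
    predicted: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    min_separation: int = 24,
    top_divisor: int = 5,
) -> float:
    """Fraction of the top (length // top_divisor) long-range predictions that are true contacts.

    Hypothetical helper: `predicted` maps residue pairs to contact probabilities,
    and `true_contacts` holds observed contacts keyed with the smaller index first.
    """
    # Keep only pairs that are far apart in sequence (long-range), normalizing order
    candidates = [
        (score, (min(i, j), max(i, j)))
        for (i, j), score in predicted.items()
        if abs(i - j) >= min_separation
    ]
    # Rank by predicted contact probability, highest first
    candidates.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, length // top_divisor)
    top = [pair for _, pair in candidates[:n_top]]
    if not top:
        return 0.0
    correct = sum(1 for pair in top if pair in true_contacts)
    return correct / len(top)
```

When fewer than L/5 long-range pairs are predicted, this sketch divides by the number actually returned. Some evaluations divide by L/5 instead, which penalizes sparse predictions.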
High-accuracy prediction of bacterial type III secreted effectors based on position-specific amino acid composition profiles. Bioinformatics, 27:777–784, 2011
Cited by 9 (0 self)
Motivation: Bacterial type III secreted (T3S) effectors are delivered into host cells specifically via type III secretion systems (T3SSs), which play important roles in the interaction between bacteria and their hosts. Previous computational methods for T3S protein prediction have achieved only limited accuracy, and distinct features for effective T3S protein prediction remain to be identified. Results: In this work, a distinctive N-terminal position-specific amino acid composition (Aac) feature was identified for T3S proteins. A large portion (∼50%) of T3S proteins exhibit distinct position-specific Aac features that can tolerate position shift. A classifier, BPBAac, was developed by training a support vector machine (SVM) on the Aac features extracted with a Bi-profile Bayes model. We demonstrated that the BPBAac model outperformed other implementations in classification of T3S and non-T3S proteins, giving