• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

A: Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments (1996)

by Riis SK, Krogh
Venue:J Comp Biol
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 35
Next 10 →

A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach

by Sujun Hua, Zhirong Sun - J MOL BIOL , 2001
"... We have introduced a new method of protein secondary structure prediction which is based on the theory of support vector machine (SVM). SVM represents a new approach to supervised pattern classification which has been successfully applied to a wide range of pattern recognition problems, including ob ..."
Abstract - Cited by 97 (1 self) - Add to MetaCart
We have introduced a new method of protein secondary structure prediction which is based on the theory of support vector machine (SVM). SVM represents a new approach to supervised pattern classification which has been successfully applied to a wide range of pattern recognition problems, including object recognition, speaker identification, gene function prediction with microarray expression profile, etc. In these cases, the performance of SVM either matches or is significantly better than that of traditional machine learning approaches, including neural networks. The first use of the SVM approach to predict protein secondary structure is described here. Unlike the previous studies, we first constructed several binary classifiers, then assembled a tertiary classifier for three secondary structure states (helix, sheet and coil) based on these binary classifiers. The SVM method achieved a good performance of segment overlap accuracy SOV = 76.2 % through sevenfold cross validation on a database of 513 non-homologous protein chains with multiple sequence alignments, which out-performs existing methods. Meanwhile three-state overall per-residue accuracy Q 3 achieved 73.5 %, which is at least comparable to existing single prediction methods. Furthermore a useful "reliability index" for the predictions was developed. In addition, SVM has many attractive features, including effective avoidance of overfitting, the ability to handle large feature spaces, information condensing of the given data set, etc. The SVM method is conveniently applied to many other pattern classification tasks in biology.

Exploiting the Past and the Future in Protein Secondary Structure Prediction

by Pierre Baldi, Søren Brunak, Paolo Frasconi, Giovanni Soda, Gianluca Pollastri , 1999
"... Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network archite ..."
Abstract - Cited by 91 (19 self) - Add to MetaCart
Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of amino acids, centered at the prediction site. Although a fixed small window avoids overfitting problems, it does not permit to capture variable long-ranged information. Results: We introduce a family of novel architectures which can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, both at the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction---at least comparable to the best existing systems---the main emphasis here is on the development of new algorithmic ideas. Availability: The executable program for predicting protein secondary structure is available from the authors free of charge. Contact: pfbaldi@ics.uci.edu, gpollast@ics.uci.edu, brunak@cbs.dtu.dk, paolo@dsi.unifi.it. 1

Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles

by Gianluca Pollastri, Darisz Przybylski, Burkhard Rost, Pierre Baldi , 2001
"... Secondarystructurepredictions areincreasinglybecomingtheworkhorseforseveralmethodsaimingatpredictingproteinstructure andfunction.Hereweuseensemblesofbidirectionalrecurrentneuralnetworkarchitectures, PSIBLAST -derivedprofiles,andalargenonredundant trainingsettoderivetwonewpredictors:(a)the secondvers ..."
Abstract - Cited by 87 (21 self) - Add to MetaCart
Secondarystructurepredictions areincreasinglybecomingtheworkhorseforseveralmethodsaimingatpredictingproteinstructure andfunction.Hereweuseensemblesofbidirectionalrecurrentneuralnetworkarchitectures, PSIBLAST -derivedprofiles,andalargenonredundant trainingsettoderivetwonewpredictors:(a)the secondversionoftheSSproprogramforsecondary structureclassificationintothreecategoriesand(b) thefirstversionoftheSSpro8programforsecondarystructureclassificationintotheeightclasses producedbytheDSSPprogram.Wedescribethe resultsofthreedifferenttestsetsonwhichSSpro achievedasustainedperformanceofabout78% correctprediction.Wereportconfusionmatrices, comparePSI-BLASTtoBLAST-derivedprofiles,and assessthecorrespondingperformanceimprovements. SSproandSSpro8areimplementedasweb servers,availabletogetherwithotherstructural featurepredictorsat:http://promoter.ics.uci.edu/ BRNN-PRED/.Proteins2002;47:228--235.

Cascaded Multiple Classifiers for Secondary Structure Prediction

by Mohammed Ouali, Ross D. King - Protein Science , 2000
"... We describe a new classifier for protein secondary structure prediction which is formed by cascading together different types of classifiers using neural networks and linear discrimination. The new classifier achieves an accuracy of 76.7% (assessed by a rigorous full Jack-knife procedure) on a new n ..."
Abstract - Cited by 39 (4 self) - Add to MetaCart
We describe a new classifier for protein secondary structure prediction which is formed by cascading together different types of classifiers using neural networks and linear discrimination. The new classifier achieves an accuracy of 76.7% (assessed by a rigorous full Jack-knife procedure) on a new non-redundant dataset of 496 non-homologous sequences (obtained from G.J. Barton and J.A. Cuff). This database was especially designed to train and test protein secondary structure prediction methods, and it uses a more stringent definition of homologous sequence than in previous studies. We show that it is possible to design classifiers which can highly discriminate the 3 classes (H, E, C) with an accuracy of up to 78% for b-strands, using only a local window and resampling techniques. This indicates that the importance of long range interactions for the prediction of b-strands has been previously overestimated. 2 Introduction Although the protein folding process may require catalysts suc...

Bayesian Segmentation of Protein Secondary Structure

by Scott C. Schmidler, Jun S. Liu, Douglas L. Brutlag - JOURNAL OF COMPUTATIONAL BIOLOGY , 2000
"... We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of ..."
Abstract - Cited by 32 (6 self) - Add to MetaCart
We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for -helices, -strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting ef# cient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide signi# cant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.

Combining Discriminant Models with new Multi-Class SVMs

by Yann Guermeur , 2000
"... The idea of combining models instead of simply selecting the best one, in order to improve performance, is well known in statistics and has a long theoretical background. However, making full use of theoretical results is ordinarily subject to the satisfaction of strong hypotheses (weak correlati ..."
Abstract - Cited by 27 (6 self) - Add to MetaCart
The idea of combining models instead of simply selecting the best one, in order to improve performance, is well known in statistics and has a long theoretical background. However, making full use of theoretical results is ordinarily subject to the satisfaction of strong hypotheses (weak correlation among the errors, availability of large training sets, possibility to rerun the training procedure an arbitrary number of times, etc.). In contrast, the practitioner who has to make a decision is frequently faced with the dicult problem of combining a given set of pretrained classiers, with highly correlated errors, using only a small training sample. Overtting is then the main risk, which cannot be overcome but with a strict complexity control of the combiner selected. This suggests that SVMs, which implement the SRM inductive principle, should be well suited for these dicult situations. Investigating this idea, we introduce a new family of multi-class SVMs and assess them as ensemble methods on a real-world problem. This task, protein secondary structure prediction, is an open problem in biocomputing for which model combination appears to be an issue of central importance. Experimental evidence highlights the gain in quality resulting from combining some of the most widely used prediction methods with our SVMs rather than with the ensemble methods traditionally used in the eld. The gain is increased when the outputs of the combiners are post-processed with a simple DP algorithm.

Combining protein secondary structure prediction models with ensemble methods of optimal complexity

by Yann Guermeur , Gianluca Pollastri , Andre Elisseeff, Dominique Zelus ,Helene Paugam-Moisy , Pierre Baldi , 2004
"... ..."
Abstract - Cited by 22 (6 self) - Add to MetaCart
Abstract not found

Bidirectional dynamics for protein secondary structure prediction

by Pierre Baldi, Gianluca Pollastri - Sequence Learning: Paradigms, Algorithms, and Applications , 2000
"... For certain categories of sequences, information from both the past and the future can be used for analysis and predictions at time t. This is the case for biological sequences where the nature and function of a region in a sequence may strongly depend on events located both upstream and downstream. ..."
Abstract - Cited by 18 (5 self) - Add to MetaCart
For certain categories of sequences, information from both the past and the future can be used for analysis and predictions at time t. This is the case for biological sequences where the nature and function of a region in a sequence may strongly depend on events located both upstream and downstream. We develop a new family of adaptive graphical model architectures for learning non-causal sequence translations. These architectures employ two chains of hidden variables that propagate information from the past and from the future, respectively. This general idea can be instantiated either as a stochastic model (generalizing input output hidden Markov models), or as a neural network (generalizing recurrent neural networks). We illustrate the methodology by applying bidirectional models to the problem of protein secondary structure prediction. 1

Improved Protein Secondary Structure Prediction Using Support Vector Machine With a New Encoding Scheme and an Advanced Tertiary Classifier

by Hae-jin Hu, Yi Pan, Senior Member, Robert Harrison, Phang C. Tai - IEEE Transactions on Nanobioscience , 2004
"... Abstract—Prediction of protein secondary structures is an important problem in bioinformatics and has many applications. The recent trend of secondary structure prediction studies is mostly based on the neural network or the support vector machine (SVM). The SVM method is a comparatively new learnin ..."
Abstract - Cited by 8 (3 self) - Add to MetaCart
Abstract—Prediction of protein secondary structures is an important problem in bioinformatics and has many applications. The recent trend of secondary structure prediction studies is mostly based on the neural network or the support vector machine (SVM). The SVM method is a comparatively new learning system which has mostly been used in pattern recognition problems. In this study, SVM is used as a machine learning tool for the prediction of secondary structure and several encoding schemes, including orthogonal matrix, hydrophobicity matrix, BLOSUM62 substitution matrix, and combined matrix of these, are applied and optimized to improve the prediction accuracy. Also, the optimal window length for six SVM binary classifiers is established by testing different window sizes and our new encoding scheme is tested based on this optimal window size via sevenfold cross validation tests. The results show 2 % increase in the accuracy of the binary classifiers when compared with the instances in which the classical orthogonal matrix is used. Finally, to combine the results of the six SVM binary classifiers, a new tertiary classifier which combines the results of one-versus-one binary classifiers is introduced and the performance is compared with those of existing tertiary classifiers. According to the results, the Q prediction accuracy of new tertiary classifier reaches 78.8 % and this is better than the best result reported in the literature. Index Terms—Binary classifier, BLOSUM62, encoding scheme, orthorgonal matrix, Position Specific Scoring Matrix (PSSM), support

Predicting Protein Secondary Structure with Probabilistic Schemata of Evolutionarily-derived Information

by Michael J. Thompson, Richard A. Goldstein - Protein Science , 1997
"... We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single sequence data, this method achieves a 3-state accuracy of 67% over a database of 473 non-homol ..."
Abstract - Cited by 5 (0 self) - Add to MetaCart
We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single sequence data, this method achieves a 3-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn specifics of a dataset than "black box" methods such as neural networks. It is also conceptually simpler and less computationally costly. We also introduce a novel method for representing and incorporating multiple sequence alignment information within the prediction algorithm, achieving 72% accuracy over a dataset of 304 non-homologous proteins. This is accomplished by creating a statistical model of the evolutionarily-derived correlations between patterns of amino acid substitution and local protein structure. This model consists of parameter vectors, termed "substitution schemata", which probabilisti...
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University