Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins 47:228–235 (2002)

by G Pollastri, D Przybylski, B Rost, P Baldi

Results 1 - 10 of 216

Protein structure prediction on the Web: a case study using the Phyre server

by Lawrence A. Kelley, Michael J. E. Sternberg - Nat. Protoc., 2009
Cited by 247 (10 self)
Determining the structure and function of a novel protein sequence is a cornerstone of many aspects of modern biology. Over the last three decades a number of state-of-the-art computational tools for structure prediction have been developed. It is critical that the broader biological community are aware of such tools and, more importantly, are capable of using them and interpreting their results in an informed way. This protocol provides a guide to interpreting the output of structure prediction servers in general and details one such tool in particular, the Phyre server. Phyre is widely used by the biological community with over 150 submissions per day and provides a simple interface to what can often seem an overwhelming wealth of data.

ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins

by Rune Linding, Christine Gemünd, Sophie Chabanis-Davidson, Morten Mattingsdal, Scott Cameron, David M. A. Martin, Gabriele Ausiello, Barbara Brannetti, Anna Costantini, Fabrizio Ferrè, Vincenza Maselli, Allegra Via, Gianni Cesareni, Francesca Diella, Giulio Superti-Furga, Lucjan Wyrwicz, Chenna Ramu, Caroline McGuigan, Rambabu Gudavalli, Ivica Letunic, Peer Bork, Leszek Rychlewski, Bernhard Küster, Manuela Helmer-Citterich, William N. Hunter, Rein Aasl, Toby J. Gibson - Nucleic Acids Res, 2003
Cited by 133 (6 self)
Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server at

Citation Context

...creen. In many cases users will be able to investigate surface accessibility by examination of an available three-dimensional structure or by using a good quality two-dimensional structure prediction (23,24) or perhaps by using a homology modelling server such as SWISSMODEL or the Meta server (25,26). We are working to provide better domain filtering in the future, for example, by using surface accessibi...

Scratch: a protein structure and structural feature prediction server

by J. Cheng, M. J. Sweredoski, P. Baldi - Nucleic Acids Res, 2005
Cited by 102 (17 self)

Citation Context

...ions to predict protein structural features and tertiary structures. See Table 1 for a summary of the specific methods used by each predictor. The suite includes the following main modules: (i) SSpro (2): three class secondary structure. (ii) SSpro8 (2): eight class secondary structure. (iii) ACCpro (3): relative solvent accessibility. (iv) CONpro (3): contacts with other residues compared to average...

A Machine Learning Information Retrieval Approach to Protein Fold Recognition

by Jianlin Cheng, Pierre Baldi
Cited by 78 (12 self)
Motivation: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than addressing the retrieval problem of fold recognition: finding a proper template for a given query protein. Results: Here we present a two-stage machine learning, information retrieval, approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map, and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable, and effective. Compared to 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is about 85%, 56%, and 27% at the family, superfamily, and fold levels respectively. Using the 5 top-ranked templates, the sensitivity increases to 90%, 70%, and 48%. Availability: The FOLDpro server is available with the SCRATCH
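The ranking step described in this abstract can be illustrated with a small sketch. It is not the FOLDpro implementation: the scores, template ids, and ground-truth sets below are hypothetical, and sensitivity is assumed here to mean the fraction of queries that have at least one correct template among the top k ranked templates.

```python
def rank_templates(scores):
    """Return template ids sorted from highest to lowest relevance score."""
    return sorted(scores, key=scores.get, reverse=True)


def top_k_sensitivity(scores_by_query, correct_by_query, k):
    """Fraction of queries with at least one correct template in the top k."""
    hits = 0
    for query, scores in scores_by_query.items():
        top_k = rank_templates(scores)[:k]
        if any(template in correct_by_query[query] for template in top_k):
            hits += 1
    return hits / len(scores_by_query)


# Hypothetical continuous relevance scores (e.g. classifier decision values).
scores_by_query = {
    "q1": {"t1": 0.9, "t2": 0.1, "t3": -0.4},
    "q2": {"t1": -0.2, "t2": 0.3, "t3": 0.8},
}
# Hypothetical ground truth: templates that share the query's fold.
correct_by_query = {"q1": {"t1"}, "q2": {"t2"}}

top1 = top_k_sensitivity(scores_by_query, correct_by_query, k=1)
top2 = top_k_sensitivity(scores_by_query, correct_by_query, k=2)
```

Widening the cutoff from the top template to the top few can only keep or raise this measure, which is consistent with the increase the abstract reports between the top-ranked and 5 top-ranked templates.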

Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners

by G. Pollastri, P. Baldi, 2002
Cited by 75 (16 self)
Motivation: Accurate prediction of protein contact maps is an important step in computational structural proteomics. Because contact maps provide a translation and rotation invariant topological representation of a protein, they can be used as a fundamental intermediary step in protein structure prediction.
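The invariance property mentioned in this abstract can be checked on a toy example. The sketch below assumes a contact is any pair of points within a distance threshold; the 8.0 threshold, the four-point chain, and the helper names are illustrative assumptions, not values taken from the paper.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary matrix: 1 where two points lie within `threshold` of each other."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def translate(coords, shift):
    """Shift every point by the same vector."""
    return [tuple(c + s for c, s in zip(point, shift)) for point in coords]


def rotate_z(coords, angle):
    """Rotate every point about the z axis by `angle` radians."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a, z) for x, y, z in coords]


# Hypothetical straight chain of four points spaced 3.8 units apart.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
original = contact_map(chain)
moved = contact_map(rotate_z(translate(chain, (5.0, -2.0, 1.0)), math.pi / 3))
```

Because the map depends only on pairwise distances, `original` and `moved` are identical even though every coordinate has changed.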

Prediction of protein stability changes for single-site mutations

by Jianlin Cheng, Arlo Randall, Pierre Baldi - Proteins, 2006
Cited by 74 (2 self)
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at www.igb.uci.edu/servers/servers.html.

Citation Context

...ability changes can be used to infer the directions of mutations by taking the sign of ΔΔG. There are a variety of ways in which sequence information can be used for protein stability prediction. Previous methods use residue composition [35] or local interactions derived from a sequence. Our method directly leverages sequence information by using it as an input to the SVM. We use a local window centered around the mutated residue as input. This approach has been applied successfully to the prediction of other protein structural features, such as secondary structure and solvent accessibility [40, 41, 42, 43]. The direct use of sequence information as inputs can help machine learning methods extract the sequence motifs which are shown to be important for protein stability [29]. We take advantage of the large amount of experimental mutation data deposited in the ProTherm [44] database to train and test our method. On the same dataset compiled in [8], our method yields a significant improvement over previous energy-based and neural network-based methods using 20-fold cross-validation. An important methodological caveat results from the dataset containing a significant number of identical mutations ...
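One way to turn "a local window centered around the mutated residue" into a fixed-length input vector is sketched below. The encoding is an assumption made for illustration (20 values per window position, flanking residues one-hot encoded, the central position marked -1 for the original residue and +1 for the substituted one, positions past the sequence ends left as zeros); the excerpt above does not specify the authors' exact scheme, and the function name and window size are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_mutation_window(sequence, position, mutant, half_width=3):
    """Encode the residues around `position` as a flat list of floats.

    Flanking residues are one-hot encoded. The central slot holds -1.0 for
    the wild-type residue and +1.0 for the mutant residue. Window positions
    that fall outside the sequence stay all-zero.
    """
    wild_type = sequence[position]
    if wild_type == mutant:
        raise ValueError("mutant residue must differ from the wild type")
    features = []
    for offset in range(-half_width, half_width + 1):
        slot = [0.0] * len(AMINO_ACIDS)
        i = position + offset
        if offset == 0:
            slot[INDEX[wild_type]] = -1.0
            slot[INDEX[mutant]] = 1.0
        elif 0 <= i < len(sequence):
            slot[INDEX[sequence[i]]] = 1.0
        features.extend(slot)
    return features


# Hypothetical example: mutate F (index 4) to W in a short sequence.
features = encode_mutation_window("ACDEFGHIK", position=4, mutant="W")
```

The resulting vector has (2 * half_width + 1) * 20 entries and could be passed, together with a stability label, to any standard SVM implementation.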

Generating Text with Recurrent Neural Networks

by Ilya Sutskever, James Martens, Geoffrey Hinton
Cited by 73 (3 self)
Recurrent Neural Networks (RNNs) are very powerful sequence models that do not enjoy widespread use because it is extremely difficult to train them properly. Fortunately, recent advances in Hessian-free optimization have been able to overcome the difficulties associated with training RNNs, making it possible to apply them successfully to challenging sequence problems. In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. The standard RNN architecture, while effective, is not ideally suited for such tasks, so we introduce a new RNN variant that uses multiplicative (or “gated”) connections which allow the current input character to determine the transition matrix from one hidden state vector to the next. After training the multiplicative RNN with the HF optimizer for five days on 8 high-end Graphics Processing Units, we were able to surpass the performance of the best previous single method for character-level language modeling – a hierarchical nonparametric sequence model. To our knowledge this represents the largest recurrent neural network application to date.

The principled design of large-scale recursive neural network architectures: DAG-RNNs and the protein structure prediction problem

by Pierre Baldi, Gianluca Pollastri, I. Jordan, 2003
Cited by 62 (20 self)
We describe a general methodology for the design of large-scale recursive neural network architectures (DAG-RNNs) which comprises three fundamental steps: (1) representation of a given domain using suitable directed acyclic graphs (DAGs) to connect visible and hidden node variables; (2) parameterization of the relationship between each variable and its parent variables by feedforward neural networks; and (3) application of weight-sharing within appropriate subsets of DAG connections to capture stationarity and control model complexity. Here we use these principles to derive several specific classes of DAG-RNN architectures based on lattices, trees, and other structured graphs. These architectures can process a wide range of data structures with variable sizes and dimensions. While the overall resulting models remain probabilistic, the internal deterministic dynamics allows efficient propagation of information, as well as training by gradient descent, in order to tackle large-scale problems. These methods are used here to derive state-of-the-art predictors for protein structural features such as secondary structure (1D) and both fine- and coarse-grained contact maps (2D). Extensions, relationships to graphical models, and implications for the design of neural architectures are briefly discussed. The protein prediction servers are available over the

Graph Kernels for Chemical Informatics

by Liva Ralaivola, Sanjay J. Swamidass, Hiroto Saigo, Pierre Baldi, 2005
Cited by 59 (7 self)
Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.
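The path-counting idea in this abstract can be sketched on toy graphs. This is a simplified illustration, not the authors' code: paths are labeled by atom symbols only (bond labels are ignored), only simple paths are enumerated, the Tanimoto kernel is assumed to compare the sets of paths present, and the MinMax kernel is assumed to compare path counts. The Hybrid kernel is not covered, and the two molecules are hypothetical heavy-atom skeletons.

```python
from collections import Counter


def path_counts(labels, adjacency, max_depth):
    """Count labeled simple paths with at most `max_depth` bonds,
    found by depth-first search started from every vertex."""
    counts = Counter()

    def dfs(vertex, visited, path_label, depth):
        counts[path_label] += 1
        if depth == max_depth:
            return
        for neighbor in adjacency[vertex]:
            if neighbor not in visited:
                dfs(neighbor, visited | {neighbor},
                    path_label + "-" + labels[neighbor], depth + 1)

    for start in adjacency:
        dfs(start, {start}, labels[start], 0)
    return counts


def tanimoto_kernel(a, b):
    """Shared distinct paths divided by all distinct paths."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def minmax_kernel(a, b):
    """Sum of per-path minimum counts divided by sum of maximum counts."""
    keys = set(a) | set(b)
    denominator = sum(max(a[k], b[k]) for k in keys)
    return sum(min(a[k], b[k]) for k in keys) / denominator if denominator else 0.0


# Hypothetical heavy-atom graphs: C-C-O and C-O.
ethanol = path_counts({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]}, 2)
methanol = path_counts({0: "C", 1: "O"}, {0: [1], 1: [0]}, 2)

tanimoto = tanimoto_kernel(ethanol, methanol)
minmax = minmax_kernel(ethanol, methanol)
```

Both functions return 1.0 when a molecule is compared with itself. MinMax is the more sensitive of the two to how often a path occurs, since Tanimoto as sketched here only records presence or absence.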

Prediction of Coordination Number and Relative Solvent Accessibility in Proteins

by Gianluca Pollastri, Pierre Baldi, Pietro Fariselli, Rita Casadio, 2001
Cited by 53 (19 self)
Knowing the coordination number and relative solvent accessibility of all the residues in a protein is crucial for deriving constraints useful in modeling protein folding and protein structure and in scoring remote homology searches. We develop ensembles of bidirectional recurrent neural network architectures to improve the state of the art in both contact and accessibility prediction, leveraging a large corpus of curated data together with evolutionary information. The ensembles are used to discriminate between two different states of residue contacts or relative solvent accessibility, higher or lower than a threshold determined by the average value of the residue distribution or the accessibility cutoff. For coordination numbers, the ensemble achieves performances ranging within 70.6–73.9% depending on the radius adopted to discriminate contacts (6–12). These performances represent gains of 16–20% over the baseline statistical predictor, always assigning an amino acid to the largest class, and are 4–7% better than any previous method. A combination of different radius predictors further improves performance. For accessibility thresholds in the relevant 15–30% range, the ensemble consistently achieves a performance above 77%, which is 10–16% above the baseline prediction and better than other existing predictors, by up to several percentage points. For both problems, we quantify the improvement due to evolutionary information in the form of PSI-BLAST-generated profiles over BLAST profiles. The prediction programs are implemented in the form of two web servers, CONpro and ACCpro, available at http://promoter.ics.uci.edu/BRNN-PRED/. Proteins 2002;47:142–153.