Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protoc., 2009
Cited by 247 (10 self)
Determining the structure and function of a novel protein sequence is a cornerstone of many aspects of modern biology. Over the last three decades a number of state-of-the-art computational tools for structure prediction have been developed. It is critical that the broader biological community are aware of such tools and, more importantly, are capable of using them and interpreting their results in an informed way. This protocol provides a guide to interpreting the output of structure prediction servers in general and details one such tool in particular, the Phyre server. Phyre is widely used by the biological community with over 150 submissions per day and provides a simple interface to what can often seem an overwhelming wealth of data.
ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res., 2003
Cited by 133 (6 self)
Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server at
SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res., 2005
A Machine Learning Information Retrieval Approach to Protein Fold Recognition
Cited by 78 (12 self)
Motivation: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than addressing the retrieval problem of fold recognition: finding a proper template for a given query protein. Results: Here we present a two-stage machine learning, information retrieval approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map, and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable, and effective. Compared to 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is about 85%, 56%, and 27% at the family, superfamily, and fold levels respectively. Using the 5 top-ranked templates, the sensitivity increases to 90%, 70%, and 48%. Availability: The FOLDpro server is available with the SCRATCH
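The ranking and top-k evaluation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the FOLDpro implementation: the scores are made-up numbers standing in for classifier outputs, the function names rank_templates and top_k_sensitivity are hypothetical, and sensitivity is assumed to mean the fraction of queries with at least one correct template among the top k.

```python
def rank_templates(scores):
    """Return template ids sorted by descending relevance score."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]


def top_k_sensitivity(all_scores, relevant, k):
    """Fraction of queries with at least one relevant template among the top k.

    all_scores: {query: {template: score}}
    relevant:   {query: set of templates considered structurally relevant}
    """
    hits = 0
    for query, scores in all_scores.items():
        top = rank_templates(scores)[:k]
        if any(t in relevant[query] for t in top):
            hits += 1
    return hits / len(all_scores)


# Toy data: hypothetical scores standing in for classifier outputs.
all_scores = {
    "q1": {"tA": 0.9, "tB": 0.2, "tC": -0.4},
    "q2": {"tA": -0.1, "tB": 0.3, "tC": 0.7},
}
relevant = {"q1": {"tA"}, "q2": {"tB"}}

top1 = top_k_sensitivity(all_scores, relevant, k=1)
top2 = top_k_sensitivity(all_scores, relevant, k=2)
```

Widening k can only keep or raise the sensitivity, which matches the direction of the top-1 versus top-5 figures reported above.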
Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners, 2002
Cited by 75 (16 self)
Motivation: Accurate prediction of protein contact maps is an important step in computational structural proteomics. Because contact maps provide a translation and rotation invariant topological representation of a protein, they can be used as a fundamental intermediary step in protein structure prediction.
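The invariance property mentioned above can be checked with a small sketch. The coordinates, the 8.0 distance cutoff, and the helper names contact_map and rotate_z_and_shift are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary contact map: entry [i][j] is 1 when residues i and j lie within cutoff."""
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


def rotate_z_and_shift(coords, angle, shift):
    """Rotate points about the z axis by angle (radians), then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Hypothetical positions for four residues (for example, C-alpha atoms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
moved = rotate_z_and_shift(coords, angle=0.7, shift=(5.0, -2.0, 11.0))

original_map = contact_map(coords)
moved_map = contact_map(moved)
```

Because the map depends only on pairwise distances, the rigidly moved copy yields the same map as the original.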
Prediction of protein stability changes for single-site mutations. Proteins, 2006
Cited by 74 (2 self)
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations, leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at www.igb.uci.edu/servers/servers.html.
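The sign-only evaluation mentioned above can be sketched in a few lines. The numbers below are invented for illustration, the function name sign_accuracy is hypothetical, and the convention that a positive value means a stabilizing mutation is an assumption; a value of exactly zero is counted as non-positive.

```python
def sign_accuracy(predicted, measured):
    """Fraction of mutations whose predicted stability change has the measured sign."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    agree = sum((p > 0) == (m > 0) for p, m in zip(predicted, measured))
    return agree / len(measured)


# Hypothetical stability changes in arbitrary units; positive is taken as stabilizing.
predicted = [0.8, -1.2, -0.3, 0.5, -2.0]
measured = [1.1, -0.9, 0.4, 0.2, -1.5]

accuracy = sign_accuracy(predicted, measured)
```

Only the direction of each change is compared, so a prediction with the right sign but the wrong magnitude still counts as correct.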
Generating Text with Recurrent Neural Networks
Cited by 73 (3 self)
Recurrent Neural Networks (RNNs) are very powerful sequence models that do not enjoy widespread use because it is extremely difficult to train them properly. Fortunately, recent advances in Hessian-free optimization have been able to overcome the difficulties associated with training RNNs, making it possible to apply them successfully to challenging sequence problems. In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. The standard RNN architecture, while effective, is not ideally suited for such tasks, so we introduce a new RNN variant that uses multiplicative (or “gated”) connections which allow the current input character to determine the transition matrix from one hidden state vector to the next. After training the multiplicative RNN with the HF optimizer for five days on 8 high-end Graphics Processing Units, we were able to surpass the performance of the best previous single method for character-level language modeling, a hierarchical nonparametric sequence model. To our knowledge this represents the largest recurrent neural network application to date.
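The idea that the input character selects the hidden-to-hidden transition matrix can be illustrated with a toy forward pass. This sketch makes several assumptions: it stores one full matrix per character rather than any more compact parameterization, the weights are fixed made-up numbers rather than trained values, and the names step, transitions, and input_bias are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def step(hidden, char, transitions, input_bias):
    """One recurrent step where the input character selects the transition matrix."""
    pre = matvec(transitions[char], hidden)
    return [math.tanh(p + b) for p, b in zip(pre, input_bias[char])]


# Toy parameters for a 2-unit hidden state and a 2-character alphabet (made up).
transitions = {
    "a": [[0.5, 0.0], [0.0, 0.5]],
    "b": [[0.0, 1.0], [-1.0, 0.0]],
}
input_bias = {"a": [0.1, 0.0], "b": [0.0, 0.1]}

hidden = [0.0, 0.0]
states = []
for char in "ab":
    hidden = step(hidden, char, transitions, input_bias)
    states.append(hidden)
```

Starting from the same hidden state, the characters "a" and "b" apply different linear maps, which is the behaviour a single shared transition matrix cannot express.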
The principled design of large-scale recursive neural network architectures: DAG-RNNs and the protein structure prediction problem, 2003
Cited by 62 (20 self)
We describe a general methodology for the design of large-scale recursive neural network architectures (DAG-RNNs) which comprises three fundamental steps: (1) representation of a given domain using suitable directed acyclic graphs (DAGs) to connect visible and hidden node variables; (2) parameterization of the relationship between each variable and its parent variables by feedforward neural networks; and (3) application of weight-sharing within appropriate subsets of DAG connections to capture stationarity and control model complexity. Here we use these principles to derive several specific classes of DAG-RNN architectures based on lattices, trees, and other structured graphs. These architectures can process a wide range of data structures with variable sizes and dimensions. While the overall resulting models remain probabilistic, the internal deterministic dynamics allows efficient propagation of information, as well as training by gradient descent, in order to tackle large-scale problems. These methods are used here to derive state-of-the-art predictors for protein structural features such as secondary structure (1D) and both fine- and coarse-grained contact maps (2D). Extensions, relationships to graphical models, and implications for the design of neural architectures are briefly discussed. The protein prediction servers are available over the
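The three steps listed above can be illustrated on the simplest case, a chain-shaped DAG over a 1D sequence. This is a rough sketch under heavy assumptions: a single tanh neuron stands in for each feedforward network, the weights are arbitrary fixed numbers, and the names unit and bidirectional_chain are hypothetical rather than taken from the paper.

```python
import math


def unit(weights, inputs):
    """A one-neuron stand-in for the feedforward network attached to a node."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)))


def bidirectional_chain(sequence, w_fwd, w_bwd, w_out):
    """Propagate along a chain-shaped DAG with weights shared across positions."""
    n = len(sequence)
    fwd = [0.0] * n
    bwd = [0.0] * n
    prev = 0.0
    for i in range(n):  # left-to-right hidden chain
        prev = fwd[i] = unit(w_fwd, (sequence[i], prev))
    nxt = 0.0
    for i in reversed(range(n)):  # right-to-left hidden chain
        nxt = bwd[i] = unit(w_bwd, (sequence[i], nxt))
    # Each output node has three parents: its input and both hidden nodes.
    return [unit(w_out, (sequence[i], fwd[i], bwd[i])) for i in range(n)]


# Arbitrary toy weights, reused at every position (weight sharing).
w_fwd, w_bwd, w_out = (1.0, 0.5), (1.0, 0.5), (0.3, 0.6, 0.6)

short = bidirectional_chain([1.0, 0.0], w_fwd, w_bwd, w_out)
longer = bidirectional_chain([0.0, 0.0, 1.0, 0.0, 1.0], w_fwd, w_bwd, w_out)
```

Because the same three weight tuples are applied at every position, the parameter count does not grow with sequence length, and the same model handles inputs of different sizes.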
Graph Kernels for Chemical Informatics, 2005
Cited by 59 (7 self)
Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature, reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.
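The path-counting fingerprints and two of the kernels named above can be sketched as follows. The sketch rests on assumptions the abstract does not settle: hydrogens and bond types are ignored, a path never revisits an atom, Tanimoto is computed on the sets of paths present, and MinMax on the path counts. The Hybrid kernel is not shown, and the toy molecules and function names are illustrative only.

```python
from collections import Counter


def path_counts(labels, bonds, depth):
    """Count labeled paths with up to `depth` bonds, by depth-first search from every atom."""
    neighbours = {i: [] for i in range(len(labels))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    counts = Counter()

    def dfs(atom, visited, path):
        counts[path] += 1
        if len(visited) > depth:  # the path already uses `depth` bonds
            return
        for nxt in neighbours[atom]:
            if nxt not in visited:  # assumption: paths never revisit an atom
                dfs(nxt, visited | {nxt}, path + "-" + labels[nxt])

    for start in range(len(labels)):
        dfs(start, {start}, labels[start])
    return counts


def tanimoto(u, v):
    """Tanimoto kernel on the sets of paths present in each molecule."""
    a, b = set(u), set(v)
    return len(a & b) / len(a | b)


def minmax(u, v):
    """MinMax kernel on path counts (a Counter returns 0 for absent paths)."""
    keys = set(u) | set(v)
    return sum(min(u[k], v[k]) for k in keys) / sum(max(u[k], v[k]) for k in keys)


# Heavy-atom skeletons of two small molecules (toy inputs).
ethanol = path_counts(["C", "C", "O"], [(0, 1), (1, 2)], depth=2)
methanol = path_counts(["C", "O"], [(0, 1)], depth=2)

k_tanimoto = tanimoto(ethanol, methanol)
k_minmax = minmax(ethanol, methanol)
```

Both kernels equal 1 when a molecule is compared with itself and shrink as the two path collections diverge; MinMax additionally penalizes differences in how often a shared path occurs.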
Prediction of Coordination Number and Relative Solvent Accessibility in Proteins, 2001
Cited by 53 (19 self)
Knowing the coordination number and relative solvent accessibility of all the residues in a protein is crucial for deriving constraints useful in modeling protein folding and protein structure and in scoring remote homology searches. We develop ensembles of bidirectional recurrent neural network architectures to improve the state of the art in both contact and accessibility prediction, leveraging a large corpus of curated data together with evolutionary information. The ensembles are used to discriminate between two different states of residue contacts or relative solvent accessibility, higher or lower than a threshold determined by the average value of the residue distribution or the accessibility cutoff. For coordination numbers, the ensemble achieves performances ranging within 70.6-73.9% depending on the radius adopted to discriminate contacts (6-12). These performances represent gains of 16-20% over the baseline statistical predictor, always assigning an amino acid to the largest class, and are 4-7% better than any previous method. A combination of different radius predictors further improves performance. For accessibility thresholds in the relevant 15-30% range, the ensemble consistently achieves a performance above 77%, which is 10-16% above the baseline prediction and better than other existing predictors, by up to several percentage points. For both problems, we quantify the improvement due to evolutionary information in the form of PSI-BLAST-generated profiles over BLAST profiles. The prediction programs are implemented in the form of two web servers, CONpro and ACCpro, available at http://promoter.ics.uci.edu/BRNN-PRED/. Proteins 2002;47:142-153.
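The two-class targets described above can be derived from coordinates roughly as follows. This is a sketch under assumptions: the residue positions are invented, the radius of 8.0 is just one value inside the 6-12 range mentioned, the mean is taken over this single toy chain rather than over a dataset, and the function names are hypothetical.

```python
import math


def coordination_numbers(coords, radius):
    """For each residue, count the other residues lying within `radius`."""
    return [
        sum(1 for j, q in enumerate(coords) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(coords)
    ]


def binarize(values, threshold=None):
    """Label each value 1 if above the threshold (default: the mean), else 0."""
    if threshold is None:
        threshold = sum(values) / len(values)
    return [1 if v > threshold else 0 for v in values]


# Hypothetical residue positions; the units and the radius of 8.0 are assumptions.
coords = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0), (30, 0, 0)]
numbers = coordination_numbers(coords, radius=8.0)
classes = binarize(numbers)
```

The always-largest-class baseline mentioned in the abstract corresponds to predicting the more frequent of the two labels for every residue.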