Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles
, 2001
Cited by 86 (21 self)
Secondary structure predictions are increasingly becoming the workhorse for several methods aiming at predicting protein structure and function. Here we use ensembles of bidirectional recurrent neural network architectures, PSI-BLAST-derived profiles, and a large nonredundant training set to derive two new predictors: (a) the second version of the SSpro program for secondary structure classification into three categories and (b) the first version of the SSpro8 program for secondary structure classification into the eight classes produced by the DSSP program. We describe the results of three different test sets on which SSpro achieved a sustained performance of about 78% correct prediction. We report confusion matrices, compare PSI-BLAST to BLAST-derived profiles, and assess the corresponding performance improvements. SSpro and SSpro8 are implemented as web servers, available together with other structural feature predictors at: http://promoter.ics.uci.edu/BRNN-PRED/. Proteins 2002;47:228-235.
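As an illustration of the confusion matrices and percent-correct figures mentioned in this abstract, here is a minimal Python sketch. The three-state alphabet H/E/C, the function names, and the toy strings are assumptions made for the example; this is not SSpro code and the numbers are not results from the paper.

```python
from collections import Counter


def confusion_matrix(observed, predicted, classes="HEC"):
    """Count (observed, predicted) state pairs, one state per residue.

    Assumes both strings use the same one-letter class alphabet.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    counts = Counter(zip(observed, predicted))
    return {o: {p: counts[(o, p)] for p in classes} for o in classes}


def fraction_correct(matrix):
    """Fraction of residues on the diagonal of the confusion matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    return correct / total


# Toy example (hypothetical data): 8 residues, 6 predicted correctly.
matrix = confusion_matrix("HHHEECCC", "HHEEECCH")
score = fraction_correct(matrix)
```

The off-diagonal cells show which classes are confused with each other, which a single accuracy number hides.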
ProtoMap: automatic classification of protein sequences and hierarchy of protein families
- Nucleic Acids Res
, 1999
Cited by 71 (13 self)
ABSTRACT We investigate the space of all protein sequences in search of clusters of related proteins. Our aim is to automatically detect these sets, and thus obtain a classification of all protein sequences. Our analysis, which uses standard measures of sequence similarity as applied to an all-vs.-all comparison of SWISSPROT, gives a very conservative initial classification based on the highest scoring pairs. The many classes in this classification correspond to protein subfamilies. Subsequently we merge the subclasses using the weaker pairs in a two-phase clustering algorithm. The algorithm makes use of transitivity to identify homologous proteins; however, transitivity is applied restrictively in an attempt to prevent unrelated proteins from clustering together. This process is repeated at varying levels of statistical significance. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Different indices of validity were applied to assess the quality of our classification and compare it with the protein families in the PROSITE and Pfam databases. Our classification agrees with these domain-based classifications for between 64.8% and 88.5% of the proteins. It also finds many new clusters of protein sequences which were not classified by these databases. The hierarchical organization suggested by our analysis reveals finer subfamilies in families of known proteins as well as many novel relations between protein families. Proteins 1999;37:360–378. © 1999 Wiley-Liss, Inc. Key words: clustering; protein families; protein classification; sequence alignment; homologous proteins
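The idea of clustering by transitivity at several significance levels can be sketched in Python as follows. This is a deliberately simplified illustration, not the ProtoMap algorithm: it applies plain transitive closure (connected components) at each threshold, whereas the abstract states that transitivity is applied restrictively. The function names, the use of E-value-like scores where lower means more significant, and the toy data are all assumptions.

```python
def cluster_at_threshold(items, pair_scores, threshold):
    """Connected components using only pairs with score <= threshold.

    Scores are assumed to behave like E-values (lower = more significant).
    """
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for x in items:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


def hierarchy(items, pair_scores, thresholds):
    """Cluster at each threshold, from most to least stringent."""
    return {t: cluster_at_threshold(items, pair_scores, t)
            for t in sorted(thresholds)}


# Toy example (hypothetical scores): clusters merge as the threshold loosens.
items = ["A", "B", "C", "D"]
pair_scores = {("A", "B"): 1e-50, ("B", "C"): 1e-10, ("C", "D"): 1e-2}
levels = hierarchy(items, pair_scores, [1e-30, 1e-5, 1.0])
```

In this toy run, unrestricted transitivity eventually links A to D through weak intermediate pairs, which is the kind of chaining a restrictive merging rule is meant to limit.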
BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark
- Proteins
, 2005
Cited by 48 (1 self)
ABSTRACT Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site
The Sorcerer II Global Ocean Sampling expedition: Expanding the universe of protein families. PLoS Biol 5: e16
, 2007
Cited by 33 (3 self)
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in
Predicting intrinsic disorder from amino acid sequence
- Proteins
, 2003
Cited by 26 (11 self)
ABSTRACT Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. Proteins 2003;53:566–572. © 2003 Wiley-Liss, Inc. Key words: natively unfolded; intrinsically disordered; neural networks; ordinary least squares regression; machine learning
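To see why averaging outputs over a wide window suppresses short predicted segments, consider this minimal Python sketch of a centered moving average. The 61-residue default comes from the abstract; the function name, the handling of chain termini (the window simply shrinks), and the toy scores are assumptions, and the predictors' actual smoothing may differ.

```python
def smooth_scores(scores, window=61):
    """Centered moving average over per-residue scores.

    Near the ends of the chain the window is truncated to the residues
    that exist (an assumption; other edge treatments are possible).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Toy example (hypothetical scores): one high-scoring residue is diluted
# by its neighbours, so it no longer crosses a 0.5 cutoff.
raw = [0, 0, 1, 0, 0]
smoothed = smooth_scores(raw, window=3)
```

With a 61-residue window, a disordered stretch much shorter than the window is averaged mostly with ordered neighbours, which is consistent with the weaker performance on short segments noted in the abstract.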
Predicting interresidue contacts using templates and pathways
- Proteins
, 2003
Cited by 23 (5 self)
ABSTRACT We present a novel method, HMMSTR-CM, for protein contact map predictions. Contact potentials were calculated by using HMMSTR, a hidden Markov model for local sequence structure correlations. Targets were aligned against protein templates using a Bayesian method, and contact maps were generated by using these alignments. Contact potentials then were used to evaluate these templates. An ab initio method based on the target contact potentials using a rule-based strategy to model the protein-folding pathway was developed. Fold recognition and ab initio methods were combined to produce accurate, protein-like contact maps. Pathways sometimes led to an unambiguous prediction of topology, even without using templates. The results on CASP5 targets are discussed. Also included is a brief update on the quality of fully automated ab initio predictions using the I-sites server. Proteins 2003;53:497–502.
Modeling structurally variable regions in homologous proteins with rosetta
- Proteins
, 2004
Cited by 22 (9 self)
ABSTRACT A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem. Here we describe a method based on the de novo structure prediction algorithm, Rosetta, for predicting conformations of structurally divergent regions in comparative models.
Improving the Performance of Rosetta Using Multiple Sequence Alignment Information and Global Measures of Hydrophobic Core Formation
, 2001
Cited by 22 (14 self)
This study explores the use of multiple sequence alignment (MSA) information and global measures of hydrophobic core formation for improving the Rosetta ab initio protein structure prediction method. The most effective use of the MSA information is achieved by carrying out independent folding simulations for a subset of the homologous sequences in the MSA and then identifying the free energy minima common to all folded sequences via simultaneous clustering of the independent folding runs. Global measures of hydrophobic core formation, using ellipsoidal rather than spherical representations of the hydrophobic core, are found to be useful in removing non-native conformations before cluster analysis. Through this combination of MSA information and global measures of protein core formation, we significantly increase the performance of Rosetta on a challenging test set. Proteins 2001;43:1-11. © 2001 Wiley-Liss, Inc.
Expectations From Structural Genomics
, 2000
Cited by 18 (0 self)
Structural genomics projects aim to provide an experimental structure or a good model for every protein in all completed genomes. Most of the experimental work for these projects will be directed toward proteins whose fold cannot be readily recognized by simple sequence comparison with proteins of known structure. Based on the history of proteins classified in the SCOP structure database, we expect that only about a quarter of the early structural genomics targets will have a new fold. Among the remaining ones, about half are likely to be evolutionarily related to proteins of known structure, even though the homology could not be readily detected by sequence analysis.
Exploiting sequence and structure homologs to identify protein-protein binding sites
- Proteins
, 2006
Cited by 16 (0 self)
ABSTRACT A rapid increase in the number of experimentally derived three-dimensional structures provides an opportunity to better understand and subsequently predict protein–protein interactions. In this study, structurally conserved residues were derived from multiple structure alignments of the individual components of known complexes and the assigned conservation score was weighted based on the crystallographic B factor to account for the structural flexibility that will result in a poor alignment. Sequence profile and accessible surface area information was then combined with the conservation score to predict protein–protein binding sites using a Support Vector Machine (SVM). The incorporation of the conservation score significantly improved the performance of the SVM. About 52% of the binding sites were precisely predicted (greater than 70% of the residues in the site were identified); 77% of the binding sites were correctly predicted (greater than 50% of the residues in the site were identified), and 21% of the binding sites were partially covered by the predicted residues (some residues were identified). The results support the hypothesis that in many cases protein interfaces require some residues to provide rigidity to minimize the entropic cost upon complex formation. Proteins 2006;62:630–640. © 2005 Wiley-Liss, Inc. Key words: structurally conserved surface residues; protein interfaces; support vector machines
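The coverage thresholds quoted in this abstract (more than 70% of site residues for a precise prediction, more than 50% for a correct one, any overlap for partial coverage) can be written as a small Python sketch. The function name, the "missed" label, the choice to return only the strictest matching label, and the toy residue numbers are assumptions for illustration; the paper's own scoring procedure may differ in detail.

```python
def coverage_label(site_residues, predicted_residues):
    """Label a binding-site prediction by the fraction of site residues found.

    Returns the strictest label that applies; note that under the
    abstract's definitions a "precise" site also satisfies "correct".
    """
    site = set(site_residues)
    if not site:
        raise ValueError("binding site must contain at least one residue")
    fraction = len(site & set(predicted_residues)) / len(site)
    if fraction > 0.7:
        return "precise"
    if fraction > 0.5:
        return "correct"
    if fraction > 0:
        return "partial"
    return "missed"


# Toy example (hypothetical residue numbers): a ten-residue site.
site = range(1, 11)
labels = [
    coverage_label(site, range(1, 9)),    # 8 of 10 residues found
    coverage_label(site, range(1, 7)),    # 6 of 10 residues found
    coverage_label(site, [1, 2, 50]),     # 2 of 10 residues found
    coverage_label(site, [50, 51]),       # none found
]
```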

