Valencia A: Effective use of sequence correlation and conservation in fold recognition
- J Mol Biol
, 1999
Cited by 73 (7 self)
Protein families are a rich source of information; sequence conservation and sequence correlation are two of the main properties that can be derived from the analysis of multiple sequence alignments. Sequence conservation is related to the direct evolutionary pressure to retain the chemical characteristics of some positions in order to maintain a given function. Sequence correlation is attributed to the small sequence adjustments needed to maintain protein stability against constant mutational drift. Here, we show that sequence conservation and correlation are each frequently informative enough to detect incorrectly folded proteins. Furthermore, combining conservation, correlation, and polarity, we achieve an almost perfect discrimination between native and incorrectly folded proteins. Thus, we make use of this information for threading by evaluating the models suggested by a threading method according to the degree of proximity of the corresponding correlated, conserved, and apolar residues. The results show that the fold recognition capacity of a given threading approach can be improved almost fourfold by selecting the alignments that score best under the three different sequence-based approaches.
Prospects for ab initio protein structural genomics
- J Mol Biol
Cited by 72 (11 self)
We present the results of a large-scale testing of the ROSETTA method for ab initio protein structure prediction. Models were generated for two independently generated lists of small proteins (up to 150 amino acid residues), and the results were evaluated using traditional rmsd-based measures and a novel measure based on the structure-based comparison of the models to the structures in the PDB using DALI. For 111 of 136 all α and α/β proteins 50 to 150 residues in length, the method produced at least one model within 7 Å rmsd of the native structure in 1000 attempts. For 60 of these proteins, the closest structure match in the PDB to at least one of the ten most frequently generated conformations was found to be structurally related (four standard deviations above background) to the native protein. These results suggest that ab initio structure prediction approaches may soon be useful for generating low-resolution models and identifying distantly related proteins with similar structures and perhaps functions for these classes of proteins on the genome scale.
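As a rough illustration of the rmsd-based evaluation mentioned in this abstract, the sketch below computes the root-mean-square deviation between two equally long lists of 3D coordinates and checks it against a 7 Å cutoff. It assumes the model has already been superimposed on the native structure; the function names `rmsd` and `within_cutoff` are hypothetical and are not taken from ROSETTA or DALI.

```python
import math

def rmsd(model, native):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumption: the two structures are already optimally superimposed;
    no fitting or rotation is performed here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared distances over all atom pairs and all three axes.
    total = sum((p - q) ** 2
                for a, b in zip(model, native)
                for p, q in zip(a, b))
    return math.sqrt(total / len(model))

def within_cutoff(model, native, cutoff=7.0):
    """True if the model lies within `cutoff` (in Å) of the native structure."""
    return rmsd(model, native) <= cutoff
```

In practice the superposition step (for example, a least-squares fit) matters as much as the formula itself; it is left out here to keep the sketch short.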
Sequence Conserved for Subcellular Localization
, 2002
Cited by 68 (9 self)
The more proteins diverge in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we present the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization.
De novo prediction of three-dimensional structures for major protein families
- J. Mol. Biol
, 2002
Cited by 65 (12 self)
As the number of gene sequences in databases, public and private, increases dramatically, so does the number of genes of unknown function. Of the protein sequences currently available approximately
Molecular Modeling Of Proteins And Mathematical Prediction Of Protein Structure
- SIAM Review
, 1997
Cited by 61 (5 self)
This paper discusses the mathematical formulation of and solution attempts for the so-called protein folding problem. The static aspect is concerned with how to predict the folded (native, tertiary) structure of a protein, given its sequence of amino acids. The dynamic aspect asks about the possible pathways to folding and unfolding, including the stability of the folded protein. From a mathematical point of view, there are several main sides to the static problem: the selection of an appropriate potential energy function; the parameter identification by fitting to experimental data; and the global optimization of the potential. The dynamic problem entails, in addition, the solution of ordinary or stochastic differential equations (molecular dynamics simulation), which are very stiff because of multiple time scales, or (in case of constrained molecular dynamics) of differential-algebraic equations. A theme connecting the static and dynamic aspects is the determination and formation of...
Amino acid empirical contact energy definitions for fold recognition in the space of contact maps
- BMC Bioinformatics
, 2003
Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix
- PLoS Comput Biol 3
, 2007
Cited by 43 (6 self)
It has become clear that noncoding RNAs (ncRNA) play important roles in cells, and emerging studies indicate that there might be a large number of unknown ncRNAs in mammalian genomes. There exist computational methods that can be used to search for ncRNAs by comparing sequences from different genomes. One main problem with these methods is their computational complexity, and heuristics are therefore employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, these heuristics are not ideal, as pre-aligning is dependent on sequence similarity that may not be present and pre-folding ignores the comparative information. Here, pruning of the dynamical programming matrix is presented as an alternative novel heuristic constraint. All subalignments that do not exceed a length-dependent minimum score are discarded as the matrix is filled out, thus giving the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained. Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned and backtracked in a normal fashion. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements in the algorithm, the FOLDALIGN software package provides the molecular biologist with an efficient and user-friendly tool for searching for new ncRNAs. The software package is available for download at
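The pruning heuristic described in this abstract can be illustrated with a much simpler toy: a local alignment of two plain sequences in which any subalignment whose score does not exceed a length-dependent minimum is dropped while the matrix is filled, so that later cells cannot extend it. This is only a sequence-level analogue under assumed parameters; it does not model RNA structure and is not the FOLDALIGN algorithm. The scoring values and the linear threshold `slope * length` are hypothetical.

```python
def pruned_local_alignment(a, b, match=2, mismatch=-1, gap=-2, slope=0.5):
    """Toy local alignment with dynamic pruning of the DP matrix.

    Returns (best_score, kept), where `kept` maps (i, j) to the
    (score, length) of the surviving subalignment ending at that cell.
    All parameters are illustrative assumptions.
    """
    kept = {}   # sparse matrix: only cells that pass the threshold are stored
    best = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            s = match if x == y else mismatch
            candidates = [(s, 1)]  # start a new subalignment at (i, j)
            if (i - 1, j - 1) in kept:  # extend a surviving diagonal neighbour
                sc, ln = kept[(i - 1, j - 1)]
                candidates.append((sc + s, ln + 1))
            for prev in ((i - 1, j), (i, j - 1)):  # extend with a gap
                if prev in kept:
                    sc, ln = kept[prev]
                    candidates.append((sc + gap, ln + 1))
            score, length = max(candidates)
            # Pruning step: discard anything at or below the length-dependent minimum.
            if score > slope * length:
                kept[(i, j)] = (score, length)
                best = max(best, score)
    return best, kept
```

Because discarded cells are never stored, both memory use and the number of extensions shrink as the threshold becomes stricter, at the risk of losing alignments that start poorly and improve later.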
Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching
- Proteins
, 2006
Cited by 40 (5 self)
The formation of disulphide bridges between cysteines plays an important role in protein folding, structure, function, and evolution. Here, we develop new methods for predicting disulphide bridges in proteins. We first build a large curated data set of proteins containing disulphide bridges to extract relevant statistics. We then use kernel methods to predict whether a given protein chain contains intrachain disulphide bridges or not, and recursive neural networks to predict the bonding probabilities of each pair of cysteines in the chain. These probabilities in turn lead to an accurate estimation of the total number of disulphide bridges and to a weighted graph matching problem that can be addressed efficiently to infer the global disulphide bridge connectivity pattern. This approach can be applied both in situations where the bonded state of each cysteine is known, or in ab initio mode where the state is unknown. Furthermore, it can easily cope with chains containing an arbitrary number of disulphide bridges, overcoming one of the major limitations of previous approaches. It can classify individual cysteine residues as bonded or nonbonded with 87% specificity and 89% sensitivity. The estimate for the total number of bridges in each chain is correct 71% of the time, and within one of the true value over 94% of the time. The prediction of the overall disulphide connectivity pattern is exact in about 51% of the chains. In addition to using profiles in the input to leverage evolutionary information, including true (but not predicted) secondary structure and solvent accessibility information yields small but noticeable improvements. Finally, once the system is trained, predictions can be computed rapidly on a proteomic or protein-engineering scale. The disulphide bridge prediction server (DIpro), software, and datasets are available through www.igb.uci.edu/servers/pass.html. Proteins 2006;62:617-629.
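The matching step in this abstract, going from pairwise bonding probabilities to a connectivity pattern, can be sketched as follows. The code does not reproduce the authors' graph-matching method: it swaps in an exhaustive search with memoisation, which is only practical for a handful of cysteines. It also assumes an even number of cysteines, all of them bonded, and the function name and input layout are hypothetical.

```python
from functools import lru_cache

def best_connectivity(prob):
    """Return (score, pairs) maximising the summed pair probabilities.

    prob: dict mapping (i, j) with i < j to a bonding probability for
    cysteines numbered 0..n-1. Assumes n is even and every cysteine is
    bonded (an illustrative simplification).
    """
    n = 1 + max(j for _, j in prob)

    @lru_cache(maxsize=None)
    def solve(free):
        # `free` is a sorted tuple of cysteines that are still unpaired.
        if not free:
            return 0.0, ()
        first, rest = free[0], free[1:]
        best = (float("-inf"), ())
        for k, partner in enumerate(rest):
            remaining = rest[:k] + rest[k + 1:]
            score, pairs = solve(remaining)
            total = prob[(first, partner)] + score
            if total > best[0]:
                best = (total, ((first, partner),) + pairs)
        return best

    return solve(tuple(range(n)))
```

For example, with four cysteines and high probabilities on the pairs (0, 2) and (1, 3), the function returns those two bridges as the connectivity pattern. For longer chains a polynomial-time maximum-weight matching algorithm would be the natural replacement for the exhaustive search.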
Adaptation of Protein Surfaces to Subcellular Location
- J. Mol. Biol
, 1998
Cited by 36 (2 self)
In this paper, we have approached this problem by grouping proteins by their subcellular location and looking at structural properties that are characteristic of each location. We hypothesize that, throughout evolution, each subcellular location has maintained a characteristic physio-chemical environment, and that proteins in each location have adapted to these environments. If so, we would expect that protein structures from different locations will show characteristic differences, particularly at the surface, which is directly exposed to the environment. To test this hypothesis, we have examined all eukaryotic proteins with known three-dimensional structure and for which the subcellular location is known to be either nuclear, cytoplasmic, or extracellular. In agreement with previous studies, we find that the total amino acid composition carries a signal that identifies the subcellular location. This signal is due almost entirely to the surface residues. The surface residue signal is often strong enough to accurately predict subcellular location, given only a knowledge of which residues are at the protein surface. The results suggest how the accuracy of prediction of location from sequence can be improved. We conclude that protein surfaces show adaptation to their subcellular location. The nature of these adaptations suggests several principles that proteins may have used in adapting to particular physio-chemical environments; these principles may be useful for protein design.