How well is enzyme function conserved as a function of pairwise sequence identity
- J. Mol. Biol
, 2003
Cited by 125 (15 self)
Enzyme function conservation has been used to derive the threshold of sequence identity necessary to transfer function from a protein of known function to an unknown protein. Using pairwise sequence comparison, several studies suggested that when the sequence identity is above 40%, enzyme function is well conserved. In contrast, Rost argued that because of database bias, the results from such simple pairwise comparisons might be misleading. Thus, by grouping enzyme sequences into families based on sequence similarity and selecting representative sequences for comparison, he showed that enzyme function starts to diverge quickly when the sequence identity is below 70%. Here, we employ a strategy similar to Rost’s to reduce the database bias; however, we classify enzyme families based not only on sequence similarity, but also on functional similarity, i.e. sequences in each family must have the same four digits or the same first three digits of the enzyme commission (EC) number. Furthermore, instead of selecting representative sequences for comparison,
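The two quantities this abstract relates, pairwise sequence identity and the level of EC-number agreement, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the sequences are assumed to be already aligned with `-` as the gap character, and identity is taken over columns where neither sequence has a gap (other denominators are also in common use).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: inputs are rows of an existing pairwise alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def shared_ec_digits(ec_a: str, ec_b: str) -> int:
    """Number of leading EC digits two enzymes share (0 to 4).

    An unassigned digit written as '-' is never counted as shared.
    """
    shared = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y or x == "-":
            break
        shared += 1
    return shared
```

With these helpers, a pair of enzymes could be binned by `percent_identity` and scored as conserved when `shared_ec_digits` is at least 3 or exactly 4, matching the two levels of functional similarity the abstract mentions.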
ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information
- Bioinformatics
, 2003
Cited by 124 (13 self)
Summary: We recently developed algorithmic tools for the identification of functionally important regions in proteins of known three dimensional structure by estimating the degree of conservation of the amino-acid sites among their close sequence homologues. Projecting the conservation grades onto the molecular surface of these proteins reveals patches of highly conserved (or occasionally highly variable) residues that are often of important biological function. We present a new web server, ConSurf, which automates these algorithmic tools. ConSurf may be used for high-throughput characterization of functional regions in proteins.
Automated protein function prediction–the genomic challenge. Brief Bioinform 2006;7:225–42
Cited by 107 (5 self)
Overwhelmed with genomic data, biologists are facing the first big post-genomic question: what do all genes do? First, not only is the volume of pure sequence and structure data growing, but its diversity is growing as well, leading to a disproportionate growth in the number of uncharacterized gene products. Consequently, established methods of gene and protein annotation, such as homology-based transfer, are annotating less data and in many cases are amplifying existing erroneous annotation. Second, there is a need for a functional annotation which is standardized and machine readable so that function prediction programs could be incorporated into larger workflows. This is problematic due to the subjective and contextual definition of protein function. Third, there is a need to assess the quality of function predictors. Again, the subjectivity of the term ‘function’ and the various aspects of biological function make this a challenging effort. This article briefly outlines the history of automated protein function prediction and surveys the latest innovations in all three topics.
Predicting the functional impact of protein mutations: application to cancer genomics
- Nucleic Acids Res. 39, e118
, 2011
Cited by 99 (3 self)
As large-scale re-sequencing of genomes reveals many protein mutations, especially in human cancer tissues, prediction of their likely functional impact becomes an important practical goal. Here, we introduce a new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns. The information in these patterns is derived from aligned families and sub-families of sequence homologs within and between species using a combinatorial entropy formalism. The score performs well on a large set of human protein mutations in separating disease-associated variants (19200), assumed to be strongly functional, from common polymorphisms (35600), assumed to be weakly functional (area under the receiver operating characteristic curve of 0.86). In cancer, using recurrence, multiplicity and annotation for 10000 mutations in the COSMIC database, the method does well in assigning higher scores to more likely functional mutations (‘drivers’). To guide experimental prioritization, we report a list of about 1000 top human cancer genes frequently mutated in one or more cancer types, ranked by likely functional impact, and an additional 1000 candidate cancer genes with rare but likely functional mutations. In addition, we estimate that at least 5% of cancer-relevant mutations involve a switch of function, rather than simply loss or gain of function.
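The evaluation figure quoted above, an area under the ROC curve, can be computed for any scoring function as the probability that a randomly chosen positive outscores a randomly chosen negative. The sketch below shows that calculation only; the function name and the toy scores are hypothetical, and it does not reproduce the FIS itself.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive scores higher, with ties counted as half a win.
    Quadratic in input size, which is fine for an illustration.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical scores: disease-associated variants vs common polymorphisms.
disease = [0.9, 0.8, 0.4]
polymorphism = [0.5, 0.3]
auc = roc_auc(disease, polymorphism)  # 5 of 6 pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.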
Automatic methods for predicting functionally important residues
, 2003
Cited by 80 (2 self)
Sequence analysis is often the first guide for the prediction of residues in a protein family that may have functional significance. A few methods have been proposed which use the division of protein families into subfamilies in the search for those positions that could have some functional significance for the whole family, but at the same time which exhibit the specificity of each subfamily (“Tree-determinant residues”). However, there are still many unsolved questions like the best division of a protein family into subfamilies, or the accurate detection of sequence variation patterns characteristic of different subfamilies. Here we present a systematic study in a significant number of protein families, testing the statistical meaning of the Tree-determinant residues predicted by three different methods that represent the range of available approaches. The first method takes as a starting point a phylogenetic representation of a protein family and, following the principle of Relative Entropy from Information Theory, automatically searches for the optimal division of the family into
ProFunc: a server for predicting protein function from 3D structure
- Nucleic Acids Res
, 2005
Prediction of protein–protein interaction sites in heterocomplexes with neural networks
- Eur. J. Biochem
, 2002
Cited by 76 (0 self)
In this paper we address the problem of extracting features relevant for predicting protein–protein interaction sites from the three-dimensional structures of protein complexes. Our approach is based on information about evolutionary conservation and surface disposition. We implement a neural-network-based system, which uses a cross-validation procedure and allows the correct detection of 73% of the residues involved in protein interactions in a selected database comprising 226 heterodimers. Our analysis confirms that the chemico-physical properties of interacting surfaces are difficult to distinguish from those of the whole protein surface. However, neural networks trained with a reduced representation of the interacting patch and sequence profile are sufficient to generalize over the different features of the contact patches and to predict whether a residue in the protein surface is or is not in contact. By using a blind test, we report the prediction of the surface interacting sites of three structural components of the DnaK molecular chaperone system, and find close agreement with previously published experimental results. We propose that the predictor can significantly complement results from structural and functional proteomics.
A family of evolution-entropy hybrid methods for ranking protein residues by importance
- J Mol Biol
Cited by 69 (6 self)
In order to identify the amino acids that determine protein structure and function it is useful to rank them by their relative importance. Previous approaches belong to two groups: those that rely on statistical inference, and those that focus on phylogenetic analysis. Here, we introduce a class of hybrid methods that combine evolutionary and entropic information from multiple sequence alignments. A detailed analysis of the insulin receptor kinase domain and tests on proteins that are well characterized experimentally show the hybrids’ greater robustness with respect to the input choice of sequences, as well as improved sensitivity and specificity of prediction. This is a further step toward proteome-scale analysis of protein structure and function.
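The entropic ingredient mentioned in this abstract can be illustrated with per-column Shannon entropy over a multiple sequence alignment. This is a sketch of that one ingredient only: the abstract does not give the hybrid formula, so the phylogenetic weighting is not attempted here. The function names are hypothetical, and treating the gap character as an ordinary symbol is a simplifying assumption.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_columns_by_entropy(alignment):
    """Return column indices ordered from most to least conserved.

    `alignment` is a list of equal-length aligned sequences; lower
    entropy means less variation and hence a higher rank.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    entropies = [
        column_entropy("".join(seq[i] for seq in alignment))
        for i in range(n_cols)
    ]
    return sorted(range(n_cols), key=lambda i: entropies[i])
```

For the toy alignment `["ACD", "ACE", "AGF", "AGH"]`, the first column is invariant, the second splits evenly between two residues, and the third is fully variable, so the columns are ranked in that order.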
Y: Protein binding site prediction using an empirical scoring function
- Nucleic Acids Res
MS: Using orthologous and paralogous proteins to identify specificity-determining residues in bacterial transcription factors
- J. Mol. Biol
Cited by 64 (0 self)
Concepts of orthology and paralogy are becoming increasingly important as whole-genome comparison allows their identification in complete genomes. Functional specificity of proteins is assumed to be conserved among orthologs and to differ among paralogs. We used this assumption to identify residues which determine specificity of protein–DNA and protein–ligand recognition. Finding such residues is crucial for understanding mechanisms of molecular recognition and for rational protein and drug design. Assuming conservation of specificity among orthologs and different specificity of paralogs, we identify residues that correlate with this grouping by specificity. The method takes advantage of complete genomes to find multiple orthologs and paralogs. The central part of this method is a procedure to compute the statistical significance of the predictions. The procedure is based on a simple statistical model of protein evolution. When applied to a large family of bacterial transcription factors, our method identified 12 residues that are presumed to determine the protein–DNA and protein–ligand recognition specificity. Structural analysis of the proteins and available experimental results strongly support our predictions. Our results suggest new experiments aimed at rational re-design of specificity in bacterial transcription factors by a minimal number of mutations.
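One way to quantify how strongly an alignment column "correlates with this grouping by specificity" is the mutual information between the residue at that column and the specificity group of each sequence. The sketch below assumes that measure for illustration; the abstract does not name the statistic, and the significance procedure it describes is not reproduced. The function name and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(residues, groups) -> float:
    """Mutual information (bits) between residues and group labels.

    `residues[i]` is the residue of sequence i at one alignment column
    and `groups[i]` is that sequence's specificity group (for example,
    an orthologous-group identifier).
    """
    if len(residues) != len(groups) or len(residues) == 0:
        raise ValueError("residues and groups must be non-empty and equal length")
    n = len(residues)
    joint = Counter(zip(residues, groups))
    residue_counts = Counter(residues)
    group_counts = Counter(groups)
    mi = 0.0
    for (r, g), c in joint.items():
        # p(r,g) * log2( p(r,g) / (p(r) * p(g)) ), written with counts
        mi += (c / n) * math.log2(c * n / (residue_counts[r] * group_counts[g]))
    return mi
```

A column whose residues are identical within each group but differ between groups (for example `"AAGG"` with groups `[1, 1, 2, 2]`) scores 1 bit here, while a fully conserved column scores 0, so high-scoring columns are candidate specificity determinants under this assumption.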