PROFcon: novel prediction of long-range contacts. Bioinformatics 2005;21:2960–2968
Cited by 68 (8 self)
Motivation: Despite the continuing advance in the experimental determination of protein structures, the gap between the number of known protein sequences and structures continues to increase. Prediction methods can bridge this sequence-structure gap only partially. Better predictions of non-local contacts between residues could improve comparative modeling, fold recognition, and could assist the experimental structure determination. Results: Here, we introduced PROFcon, a novel contact prediction method that combines information from alignments, from predictions of secondary structure and solvent accessibility, from the region between two residues, and from the average properties of the entire protein. In contrast to some other methods, PROFcon predicted short and long proteins at similar levels of accuracy. As expected, PROFcon was clearly less accurate when tested on sparse evolutionary profiles, i.e. on families with few homologues. Prediction accuracy was highest for proteins belonging to the SCOP alpha/beta class. PROFcon compared favorably with state-of-the-art prediction methods at the CASP6 meeting. While the performance may still be perceived as low, our method clearly pushed the mark higher. Furthermore,
NORSp: predictions of long regions without regular secondary structure
- Nucleic Acids Res
, 2003
Cited by 35 (10 self)
Many structurally flexible regions play important roles in biological processes. It has been shown that extended loopy regions are very abundant in the protein universe and that they have been conserved through evolution. Here, we present NORSp, a publicly available predictor for disordered regions in protein. Specifically, NORSp predicts long regions with NO Regular Secondary structure. Upon user submission of a protein sequence, NORSp will analyse the protein for its secondary structure, presence of transmembrane helices and coiled-coil. It will then return email to the user about the presence and position of disordered regions. NORSp can be accessed from
Prediction of DNA-binding residues from sequence
- Bioinformatics
Cited by 31 (0 self)
Motivation: Thousands of proteins are known to bind to DNA; for most of them the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are yet unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding assays. Hence, such studies are not applicable on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available. Results: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein–DNA complexes available today, we found that 89 % of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information.
Sequence-based prediction of protein domains. Nucleic Acids Research 32(12):3522–3530
, 2004
Cited by 31 (1 self)
Guessing the boundaries of structural domains has been an important and challenging problem in experimental and computational structural biology. Predictions were based on intuition, biochemical properties, statistics, sequence homology and other aspects of predicted protein structure. Here, we introduced CHOPnet, a de novo method that predicts structural domains in the absence of homology to known domains. Our method was based on neural networks and relied exclusively on information available for all proteins. Evaluating sustained performance through rigorous cross-validation on proteins of known structure, we correctly predicted the number of domains in 69 % of all proteins. For 50 % of the two-domain proteins the centre of the predicted boundary was closer than 20 residues to the boundary assigned from three-dimensional (3D) structures; this was about eight percentage points better than predictions by ‘equal split’. Our results appeared to compare favourably with those from previously published methods. CHOPnet may be useful to restrict the experimental testing of different fragments for structure determination in the context of structural genomics.
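The boundary criterion in this abstract (a prediction counts when it is closer than 20 residues to the structure-derived boundary) and the 'equal split' baseline can be illustrated with a minimal sketch. The function names and the toy data are hypothetical, and 'equal split' is assumed here to mean placing the boundary at the sequence midpoint; this is not the CHOPnet evaluation code.

```python
from typing import Sequence


def equal_split_boundary(length: int) -> int:
    """Assumed baseline: put the single domain boundary at the sequence midpoint."""
    return length // 2


def boundary_hit(predicted: int, observed: int, tolerance: int = 20) -> bool:
    """True if the predicted boundary is closer than `tolerance` residues to the observed one."""
    return abs(predicted - observed) < tolerance


def hit_rate(predicted: Sequence[int], observed: Sequence[int], tolerance: int = 20) -> float:
    """Fraction of two-domain proteins whose predicted boundary counts as a hit."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(boundary_hit(p, o, tolerance) for p, o in zip(predicted, observed))
    return hits / len(observed)


# Toy data (invented for illustration): sequence lengths and observed boundaries.
lengths = [200, 300, 150]
observed_boundaries = [95, 210, 80]
baseline = [equal_split_boundary(n) for n in lengths]
baseline_rate = hit_rate(baseline, observed_boundaries)
```

Running the same `hit_rate` on a method's predictions and on the baseline gives the kind of percentage-point comparison the abstract reports.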
Better Prediction of Sub-Cellular Localization by Combining Evolutionary and Structural Information
- PROTEINS: STRUCTURE, FUNCTION, AND GENETICS 53:917–930 (2003)
, 2003
Cited by 26 (2 self)
The native sub-cellular compartment of a protein is one aspect of its function. Thus, predicting localization is an important step toward predicting function. Short zip code-like sequence fragments regulate some of the shuttling between compartments. Cataloguing and predicting such motifs is the most accurate means of determining localization in silico. However, only few motifs are currently known, and not all the trafficking appears regulated in this way. The amino acid composition of a protein correlates with its localization. All general prediction methods employed this observation. Here, we explored the evolutionary information contained in multiple alignments and aspects of protein structure to predict localization in absence of homology and targeting motifs. Our final system combined statistical rules and a variety of neural networks to achieve an overall four-state accuracy above 65%, a significant improvement over systems using only composition. The system was at its best for extra-cellular and nuclear proteins; it was significantly less accurate than TargetP for mitochondrial proteins. Interestingly, all methods that were developed on SWISS-PROT sequences failed grossly when fed with sequences from proteins of known structures taken from PDB. We therefore developed two separate systems: one for proteins of known structure and one for proteins of unknown structure. Finally, we applied the PDB-based system along with homology-based inferences and automatic text analysis to annotate all eukaryotic proteins in the PDB.
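The amino acid composition mentioned in this abstract as the common input of general localization predictors can be computed as a 20-element frequency vector. The following is a minimal sketch; the function name and the handling of non-standard residue letters (they are ignored) are assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid, in AMINO_ACIDS order.

    Letters outside the 20 standard residues (e.g. X, B, Z) are ignored,
    which is an assumption made for this illustration.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence, invented for illustration.
vector = composition("AACD")
```

A vector like this could serve as the composition-only feature set against which the abstract compares its combined system.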
Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches
- Proteins
, 2005
Cited by 14 (0 self)
Structural genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the “Pfam5000” strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These strategies include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random
Exploring Bias in the Protein Data Bank Using Contrast Classifiers
- J. Phys.: Condens. Matter
, 2004
Cited by 10 (1 self)
In this paper we provide a complementary view of the bias in PDB that explores differences in sequence properties of PDB and SWISS-PROT [2] proteins. This was accomplished by training an ensemble of neural network classifiers to distinguish between distributions of the non-redundant subsets of PDB and SWISS-PROT. Following the recently proposed contrast classifier framework [14], output of such an ensemble of classifiers measures the level to which a given sequence property is overrepresented/underrepresented in PDB as compared to SWISS-PROT. We applied the contrast classifier to analyze the bias in PDB towards numerous protein properties and to examine whether our approach can be useful in selecting the most interesting target proteins for structural characterization.
SPINE 2: a system for collaborative structural proteomics within a federated database framework
- Nucleic Acids Res
, 2003
Cited by 8 (3 self)
We present version 2 of the SPINE system for structural proteomics. SPINE is available over the web at
Protein family clustering for structural genomics
- J. Mol. Biol
, 2005
Cited by 7 (1 self)
A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Benchmarking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20 % of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70–80 % of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
Kolaskar AS: Prediction of 3D structure of envelope glycoprotein of Sri Lanka strain of Japanese encephalitis virus. In Proceedings of the First APBC Conference, 4–7 February 2003. Conferences in Research and Practice in Information Technology 19
- In Yi-Ping Phoebe Chen (ed.), Conferences in Research and Practice in Information Technology, Australian Computer Society Inc
, 2003
Cited by 1 (1 self)
This paper describes knowledge-based homology modeling studies of envelope glycoprotein (Egp) of Sri Lanka strain of Japanese encephalitis virus (JEVS). JEVS is a mosquito-borne Flavivirus, which is an important human pathogen. The Egp is a major structural antigen and is responsible for viral hemagglutination and neutralisation. The 3D structure of 399 amino acids from the extracellular domain of Egp of JEVS has been predicted using the X-ray crystal structure of Egp of Tick-borne encephalitis virus as a template and the knowledge-based homology modeling approach. Even though homology modeling is the best method for prediction of 3D structure, prediction of structures of loop regions is still a challenge. A novel approach of molecular dynamics simulations and geometry optimisation has been used to sample the conformations of loop regions. The Egp of JEVS has an extended structure with nine β-sheets, two α-helices and three domains. The predicted structure was compared with the model of Egp of Nakayama strain of Japanese encephalitis virus (JEVN), which was developed earlier (Kolaskar & Kulkarni-Kale, 1999). Similarities and differences between the structures of Egps of two strains of JEV are discussed. These models illustrate the effect of mutations on the local and global conformation of Egp and help to explain strain-specific properties. The sequential and conformational epitopes of Egp of JEV were predicted using an algorithm developed in house