Results 1 - 10 of 228
The ProDom database of protein domain families: more emphasis on 3D
- Nucleic Acids Res
, 2005
PROFcon: novel prediction of long-range contacts. Bioinformatics 2005;21:2960–2968
Cited by 68 (8 self)
Motivation: Despite continuing advances in the experimental determination of protein structures, the gap between the number of known protein sequences and structures continues to increase. Prediction methods can bridge this sequence-structure gap only partially. Better predictions of non-local contacts between residues could improve comparative modeling and fold recognition, and could assist experimental structure determination. Results: Here, we introduced PROFcon, a novel contact prediction method that combines information from alignments, from predictions of secondary structure and solvent accessibility, from the region between two residues, and from the average properties of the entire protein. In contrast to some other methods, PROFcon predicted short and long proteins at similar levels of accuracy. As expected, PROFcon was clearly less accurate when tested on sparse evolutionary profiles, i.e. on families with few homologues. Prediction accuracy was highest for proteins belonging to the SCOP alpha/beta class. PROFcon compared favorably with state-of-the-art prediction methods at the CASP6 meeting. While the performance may still be perceived as low, our method clearly pushed the mark higher. Furthermore,
SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny
, 2008
Components of coated vesicles and nuclear pore complexes share a common molecular architecture
- PLoS Biol
, 2004
Cited by 46 (11 self)
Numerous features distinguish prokaryotes from eukaryotes, chief among which are the distinctive internal membrane systems of eukaryotic cells. These membrane systems form elaborate compartments and vesicular trafficking pathways, and sequester the chromatin within the nuclear envelope. The nuclear pore complex is the portal that specifically mediates macromolecular trafficking across the nuclear envelope. Although it is generally understood that these internal membrane systems evolved from specialized invaginations of the prokaryotic plasma membrane, it is not clear how the nuclear pore complex could have evolved from organisms with no analogous transport system. Here we use computational and biochemical methods to perform a structural analysis of the seven proteins comprising the yNup84/vNup107–160 subcomplex, a core building block of the nuclear pore complex. Our analysis indicates that all seven proteins contain either a β-propeller fold, an α-solenoid fold, or a distinctive arrangement of both, revealing close similarities between the structures comprising the yNup84/vNup107–160 subcomplex and those comprising the major types of vesicle coating complexes that maintain vesicular trafficking pathways. These similarities suggest a common evolutionary origin for nuclear pore complexes and coated vesicles in an early membrane-curving module that led to the formation of the internal membrane systems in modern eukaryotes.
Distinguishing protein-coding from noncoding RNAs through support vector machines
- PLoS Genet
, 2006
Cited by 43 (2 self)
RIKEN’s FANTOM project has revealed many previously unknown coding sequences, as well as an unexpected degree of variation in transcripts resulting from alternative promoter usage and splicing. Ever more transcripts that do not code for proteins have been identified by transcriptome studies, in general. Increasing evidence points to the important cellular roles of such non-coding RNAs (ncRNAs). The distinction of protein-coding RNA transcripts from ncRNA transcripts is therefore an important problem in understanding the transcriptome and carrying out its annotation. Very few in silico methods have specifically addressed this problem. Here, we introduce CONC (for “coding or non-coding”), a novel method based on support vector machines that classifies transcripts according to features they would have if they were coding for proteins. These features include peptide length, amino acid composition, predicted secondary structure content, predicted percentage of exposed residues, compositional entropy, number of homologs from database searches, and alignment entropy. Nucleotide frequencies are also incorporated into the method. Confirmed coding cDNAs for eukaryotic proteins from the Swiss-Prot database constituted the set of true positives, ncRNAs from RNAdb and NONCODE the true negatives. Ten-fold cross-validation suggested that CONC distinguished coding RNAs from ncRNAs at about 97% specificity and 98% sensitivity. Applied to 102,801 mouse cDNAs from the FANTOM3 dataset, our method reliably identified over 14,000 ncRNAs and estimated the total number
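As a rough illustration of three of the sequence-derived features named in this abstract (peptide length, amino acid composition, and compositional entropy), the following Python sketch computes them for a translated peptide. The function names, the feature ordering, and the use of base-2 Shannon entropy are assumptions made for this example; the abstract does not specify how CONC computes or encodes its features.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the relative frequency of each standard amino acid."""
    # Non-standard symbols (e.g. X or *) are ignored; this is an assumption.
    residues = [aa for aa in peptide.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("peptide contains no standard amino acids")
    counts = Counter(residues)
    return {aa: counts[aa] / len(residues) for aa in AMINO_ACIDS}


def compositional_entropy(peptide):
    """Shannon entropy (in bits) of the amino acid composition."""
    freqs = amino_acid_composition(peptide).values()
    return sum(-p * math.log2(p) for p in freqs if p > 0)


def feature_vector(peptide):
    """Hypothetical layout: [length, entropy, 20 residue frequencies]."""
    comp = amino_acid_composition(peptide)
    head = [float(len(peptide)), compositional_entropy(peptide)]
    return head + [comp[aa] for aa in AMINO_ACIDS]
```

A vector of this kind could be passed to any support vector machine implementation; the remaining features listed in the abstract (predicted structure, homolog counts, alignment entropy) depend on external predictors and database searches and are not sketched here.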
DisProt: the database of disordered proteins
- Nucleic Acids Res
, 2007
Cited by 36 (7 self)
The Database of Protein Disorder (DisProt) links structure and function information for intrinsically disordered proteins (IDPs). Intrinsically disordered proteins do not form a fixed three-dimensional structure under physiological conditions, either in their entireties or in segments or regions. We define an IDP as a protein that contains at least one experimentally determined disordered region. Although lacking fixed structure, IDPs and regions carry out important biological functions, being typically involved in regulation, signaling and control. Such functions can involve high-specificity low-affinity interactions, the multiple binding of one protein to many partners and the multiple binding of many proteins to one partner. These three features are all enabled and enhanced by protein intrinsic disorder. One of the major hindrances in the study of IDPs has been the lack of organized information. DisProt was developed to enable IDP research by collecting and organizing knowledge regarding the experimental characterization and the functional associations of IDPs. In addition to being a unique source of biological information, DisProt opens doors for a plethora of bioinformatics studies. DisProt is openly available at
NORSp: predictions of long regions without regular secondary structure
- Nucleic Acids Res
, 2003
Cited by 35 (10 self)
Many structurally flexible regions play important roles in biological processes. It has been shown that extended loopy regions are very abundant in the protein universe and that they have been conserved through evolution. Here, we present NORSp, a publicly available predictor for disordered regions in proteins. Specifically, NORSp predicts long regions with NO Regular Secondary structure. Upon user submission of a protein sequence, NORSp will analyse the protein for its secondary structure and for the presence of transmembrane helices and coiled-coil regions. It will then email the user a report on the presence and position of disordered regions. NORSp can be accessed from
Prediction of DNA-binding residues from sequence
- Bioinformatics
Cited by 31 (0 self)
Motivation: Thousands of proteins are known to bind to DNA; for most of them the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are as yet unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding assays. Hence, such studies are not applicable on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available. Results: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein–DNA complexes available today, we found that 89% of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information.
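The figure "89% of our predictions are confirmed" reads like a precision-style measure: the fraction of residues predicted as DNA-binding that are also observed to bind. The Python sketch below shows one way such a number could be computed and pooled over cross-validation folds. This is an assumption made for illustration; the abstract does not state the exact definition or pooling scheme, and the function names are hypothetical.

```python
def prediction_precision(predicted, observed):
    """Fraction of predicted binding residues that are observed to bind.

    Both arguments are collections of residue positions for one protein.
    """
    predicted, observed = set(predicted), set(observed)
    if not predicted:
        raise ValueError("no residues were predicted as binding")
    return len(predicted & observed) / len(predicted)


def pooled_precision(folds):
    """Pool predictions over cross-validation folds before dividing.

    `folds` is an iterable of (predicted, observed) pairs, one per fold.
    Pooling counts is an assumed choice; averaging per-fold values is
    another common option and can give a different number.
    """
    confirmed = 0
    total = 0
    for predicted, observed in folds:
        predicted, observed = set(predicted), set(observed)
        confirmed += len(predicted & observed)
        total += len(predicted)
    if total == 0:
        raise ValueError("no residues were predicted as binding")
    return confirmed / total
```

For example, if residues 1 to 4 are predicted and residues 2, 3, 4 and 9 are observed to bind, three of the four predictions are confirmed and the precision is 0.75.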
Predicting transmembrane beta-barrels in proteomes
, 2004
Cited by 28 (1 self)
Very few methods address the problem of predicting beta-barrel membrane proteins directly from sequence. One reason is that only very few high-resolution structures for transmembrane beta-barrel (TMB) proteins have been determined thus far. Here we introduced the design, statistics and results of a novel profile-based hidden Markov model for the prediction and discrimination of TMBs. The method carefully attempts to avoid over-fitting the sparse experimental data. While our model training and scoring procedures were very similar to a recently published work, the architecture and structure-based labelling were significantly different. In particular, we introduced a new definition of beta-hairpin motifs, explicit state modelling of transmembrane strands, and a log-odds whole-protein discrimination score. The resulting method reached an overall four-state (up-, down-strand, periplasmic-, outer-loop) accuracy as high as 86%. Furthermore, the method accurately discriminated TMB from non-TMB proteins (45% coverage at 100% accuracy). This high precision enabled the application to 72 entirely sequenced Gram-negative bacteria. We found over 164 previously uncharacterized TMB proteins at high confidence. Database searches did not implicate any of these proteins with membranes. We challenge that the vast majority of our 164 predictions will eventually be verified experimentally. All proteome predictions and the PROFtmb prediction method are available at