Results 1 - 10 of 68
PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria
, 2003
UniqueProt: creating representative protein sequence sets
- Nucleic Acids Res
, 2003
Cited by 58 (20 self)
UniqueProt is a practical and easy-to-use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm that uses the HSSP-value to establish sequence similarity. UniqueProt is not a true clustering program: the ‘representatives’ are not at the centres of well-defined clusters, because the definition of such clusters is problem-specific. Overall, UniqueProt is a reasonably fast solution to bias in data sets. The service is accessible at
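The greedy selection described above can be sketched as follows. This is a minimal Python sketch, not UniqueProt's code: it assumes a caller-supplied predicate `is_similar` (hypothetical; in UniqueProt's setting it would encode an HSSP-value cut-off, which is not reproduced here) and uses one common greedy heuristic, keeping the sequence with the fewest similar neighbours first and discarding everything similar to it.

```python
from typing import Callable, Dict, List, Set


def greedy_representatives(
    ids: List[str],
    is_similar: Callable[[str, str], bool],
) -> List[str]:
    """Return a subset of ids in which no two members are similar.

    Min-degree greedy heuristic for a large independent set in the
    similarity graph (an assumption, not UniqueProt's exact procedure).
    """
    order = {s: k for k, s in enumerate(ids)}  # deterministic tie-breaking

    # Build the similarity graph once: O(n^2) predicate calls.
    neighbours: Dict[str, Set[str]] = {s: set() for s in ids}
    for k, a in enumerate(ids):
        for b in ids[k + 1:]:
            if is_similar(a, b):
                neighbours[a].add(b)
                neighbours[b].add(a)

    remaining: Set[str] = set(ids)
    chosen: List[str] = []
    while remaining:
        # Keep the candidate with the fewest still-available similar neighbours.
        best = min(remaining, key=lambda s: (len(neighbours[s] & remaining), order[s]))
        chosen.append(best)
        # Drop the kept sequence and everything similar to it.
        remaining -= neighbours[best] | {best}
    return chosen


if __name__ == "__main__":
    # Hypothetical toy similarity: A~B and B~C; D is similar to nothing.
    similar_pairs = {frozenset(p) for p in [("A", "B"), ("B", "C")]}
    rep = greedy_representatives(
        ["A", "B", "C", "D"],
        lambda x, y: frozenset((x, y)) in similar_pairs,
    )
    print(rep)  # ['D', 'A', 'C']
```

Choosing low-degree sequences first tends to keep more members than taking sequences in input order, which matches the goal of finding the largest possible representative set; it is still a heuristic, not a guaranteed maximum.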
Protein–Protein Interactions More Conserved within Species than across Species
Cited by 51 (2 self)
Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. About 55,000 further pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions have been observed experimentally in prokaryotes and simple eukaryotes; very few have been observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms: for example, the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding human proteins also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale, comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and formed the basis for the two surprising results of our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only at extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between
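The homology transfer described above (interolog inference) can be illustrated with a small Python sketch. Everything here is an assumption for illustration: proteins are plain string identifiers, `homolog` is a hypothetical one-to-one mapping (real homology assignments come from sequence comparison and are often one-to-many), and `confirmed_fraction` is a simple precision-style check, not the novel overlap score introduced in the paper.

```python
from typing import Dict, FrozenSet, Set

Pair = FrozenSet[str]


def transfer_interactions(source_pairs: Set[Pair], homolog: Dict[str, str]) -> Set[Pair]:
    """Infer interologs by mapping each source pair through a homolog table."""
    inferred: Set[Pair] = set()
    for pair in source_pairs:
        if len(pair) != 2:
            continue  # this toy version ignores self-interactions
        a, b = tuple(pair)
        # Transfer only if both partners have a homolog and they stay distinct.
        if a in homolog and b in homolog and homolog[a] != homolog[b]:
            inferred.add(frozenset((homolog[a], homolog[b])))
    return inferred


def confirmed_fraction(inferred: Set[Pair], observed: Set[Pair]) -> float:
    """Share of inferred pairs that are also experimentally observed."""
    if not inferred:
        return 0.0
    return len(inferred & observed) / len(inferred)


if __name__ == "__main__":
    # Hypothetical identifiers: Y* are yeast proteins, H* their human homologs.
    yeast = {frozenset(p) for p in [("Y1", "Y2"), ("Y2", "Y3"), ("Y3", "Y4")]}
    homolog = {"Y1": "H1", "Y2": "H2", "Y3": "H3"}  # Y4 has no homolog here
    human_observed = {frozenset(p) for p in [("H1", "H2"), ("H3", "H5")]}
    inferred = transfer_interactions(yeast, homolog)
    print(sorted(sorted(p) for p in inferred))
    print(confirmed_fraction(inferred, human_observed))  # 0.5
```

Pairs are stored as `frozenset`s so that (a, b) and (b, a) count as the same undirected interaction.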
PSORTdb: a protein subcellular localization database for bacteria
- Nucleic Acids Res
, 2005
Cited by 31 (8 self)
Information about bacterial subcellular localization (SCL) is important for protein function prediction and identification of suitable drug/vaccine/diagnostic targets. PSORTdb
Better Prediction of Sub-Cellular Localization by Combining Evolutionary and Structural Information
- PROTEINS: STRUCTURE, FUNCTION, AND GENETICS 53:917-930 (2003)
, 2003
Cited by 26 (2 self)
The native sub-cellular compartment of a protein is one aspect of its function. Thus, predicting localization is an important step toward predicting function. Short zip code-like sequence fragments regulate some of the shuttling between compartments. Cataloguing and predicting such motifs is the most accurate means of determining localization in silico. However, only a few motifs are currently known, and not all trafficking appears to be regulated in this way. The amino acid composition of a protein correlates with its localization, and all general prediction methods have exploited this observation. Here, we explored the evolutionary information contained in multiple alignments and aspects of protein structure to predict localization in the absence of homology and targeting motifs. Our final system combined statistical rules and a variety of neural networks to achieve an overall four-state accuracy above 65%, a significant improvement over systems using only composition. The system performed best for extra-cellular and nuclear proteins; it was significantly less accurate than TargetP for mitochondrial proteins. Interestingly, all methods that were developed on SWISS-PROT sequences failed grossly when fed with sequences from proteins of known structure taken from the PDB. We therefore developed two separate systems: one for proteins of known structure and one for proteins of unknown structure. Finally, we applied the PDB-based system along with homology-based inferences and automatic text analysis to annotate all eukaryotic proteins in the PDB.
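The composition feature that such baseline methods rely on is easy to state precisely: the fraction of each of the 20 standard amino acids in the sequence. The Python sketch below computes that vector; it illustrates the composition-only baseline the abstract compares against, not the authors' system, which additionally used alignment profiles, structural features, statistical rules and neural networks. Ignoring non-standard residue letters is an assumption of this sketch.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> List[float]:
    """Return the 20-dimensional amino acid composition of a protein sequence.

    Non-standard letters (e.g. X, B, Z) are ignored; an empty or all-ambiguous
    sequence yields a zero vector.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vec = composition("MKKLLA")  # hypothetical toy sequence
    print(dict(zip(AMINO_ACIDS, vec)))
```

Because the vector ignores residue order entirely, two proteins with the same residues in different orders get identical features, which is one reason composition alone limits accuracy.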
An automated combination of kernels for predicting protein subcellular localization
, 2007
Cited by 15 (6 self)
Protein subcellular localization is a crucial ingredient in many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. Here we use the multiclass support vector machine (m-SVM) method to solve the protein subcellular localization problem directly, without resorting to the common approach of splitting it into several binary classification problems. We further propose a general class of protein sequence kernels that considers all motifs, including motifs with gaps. Instead of heuristically selecting one or a few kernels from this family, we use a recent extension of SVMs that optimizes over multiple kernels simultaneously. In this way, we automatically search over families of possible amino acid motifs. We compare our automated approach to three other predictors on four different datasets and show that it performs better than the current state of the art. Further, our method provides some insight into which sequence motifs are most useful for determining subcellular localization, and these agree with biological reasoning. Data files, kernel matrices and open source software are available at
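A much simplified instance of the idea above, motif-count kernels including gapped motifs combined into one kernel, is sketched below in Python. The feature family is an assumption (pairs of residues separated by `gap` positions, rather than the paper's general class of motifs), and the kernel weights are fixed by hand; the multiple kernel learning step the abstract describes would instead optimize these weights together with the SVM. Keeping the weights non-negative guarantees that the weighted sum is still a valid positive semidefinite kernel.

```python
from collections import Counter
from typing import Dict


def gapped_pair_counts(seq: str, gap: int) -> Counter:
    """Count residue pairs (seq[i], seq[i + gap + 1]); gap=0 gives plain 2-mers."""
    step = gap + 1
    return Counter((seq[i], seq[i + step]) for i in range(len(seq) - step))


def gapped_pair_kernel(x: str, y: str, gap: int) -> int:
    """Dot product of the gapped-pair count vectors of x and y."""
    cx, cy = gapped_pair_counts(x, gap), gapped_pair_counts(y, gap)
    return sum(n * cy[f] for f, n in cx.items())


def combined_kernel(x: str, y: str, weights: Dict[int, float]) -> float:
    """Weighted sum of gapped-pair kernels, keyed by gap length."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * gapped_pair_kernel(x, y, g) for g, w in weights.items())


if __name__ == "__main__":
    # Hypothetical hand-picked weights: contiguous pairs count twice as much as gap-1 pairs.
    w = {0: 1.0, 1: 0.5}
    print(combined_kernel("MKMK", "MKMK", w))  # 6.0
```

Each base kernel is an explicit dot product of count vectors, so it is positive semidefinite by construction; learning `weights` from data is exactly the part this sketch leaves out.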
LOC3D: annotate sub-cellular localization for protein structures
- Nucleic Acids Res
Eukaryotic protein subcellular localization based on local pairwise profile alignment SVM
- in 2006 IEEE International Workshop on Machine Learning for Signal Processing (MLSP’06), 2006
, 2006
Cited by 12 (8 self)
The subcellular locations of proteins are important functional annotations. An effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, to automatically predict the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile-alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. PairProSVM was found to outperform methods based on sequence alignment and amino-acid composition, even when most of the homologous sequences had been removed. PairProSVM was evaluated on Huang and Li’s and Gardy et al.’s protein datasets. The overall accuracies on these datasets reach 75.3% and 91.9%, respectively, which are higher than or comparable to those obtained by sequence alignment and composition-based methods. Index Terms: protein subcellular localization; sequence alignment; profile alignment; kernel methods; support vector machines.
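The feature construction described above, where each protein is represented by its alignment scores against every training sequence, can be sketched in Python. As a stand-in for PSI-BLAST profile-profile alignment (which needs external tools and databases), this sketch assumes a plain Smith-Waterman local alignment with a linear gap penalty and hypothetical scoring values; the resulting vectors could then be given to any SVM implementation.

```python
from typing import List, Sequence


def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # DP row for the previous prefix of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def alignment_feature_vector(query: str, training: Sequence[str]) -> List[int]:
    """Represent query by its alignment score against each training sequence."""
    return [local_alignment_score(query, t) for t in training]


if __name__ == "__main__":
    train = ["ACGT", "CCCC"]  # hypothetical toy "training set"
    print(alignment_feature_vector("ACGT", train))  # [8, 2]
```

The vector length equals the training-set size, so every new query costs one alignment per training sequence; that is the price of this representation.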
eSLDB: eukaryotic subcellular localization database
- Nucleic Acids Res
, 2007
Cited by 8 (0 self)
Eukaryotic Subcellular Localization DataBase collects the annotations of subcellular localization of eukaryotic proteomes. So far five proteomes have been processed and stored: Homo sapiens, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae and Arabidopsis thaliana. For each sequence, the database lists the localization obtained by three different approaches: (i) experimentally determined (when available); (ii) homology-based (when possible); and (iii) predicted. The predictions are computed with a suite of machine-learning-based methods developed in house. All the data are available at our website and can be searched by sequence, by protein code and/or by protein description. Furthermore, more complex searches can be performed by combining different search fields and keys. All the data contained in the database can be freely downloaded in flat file format. The database is available at