Protein homology detection by HMM-HMM comparison (2005)

by J Soding
Venue: Bioinformatics

Results 1 - 10 of 401

Pfam protein families database

by Robert D. Finn, John Tate, Jaina Mistry, Penny C. Coggill, Stephen John Sammut, Hans-Rudolf Hotz, Goran Ceric, Kristoffer Forslund, Sean R. Eddy, Erik L. L. Sonnhammer, Alex Bateman - Nucleic Acids Research, 2008, 36(Database issue): D281–D288
Cited by 771 (13 self)
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK

Citation Context

...amilies, we now use three different tools. In addition to the profile–profile comparison tool PRC (http://supfam.mrc-lmb.cam.ac.uk/PRC/), we now use another profile–profile comparison tool, HHsearch (7), and the simple comparison of outputs program, SCOOP (8). We use three different computational methods because each is sensitive to a slightly different set of relationships and, more importantly, th...

MODBASE, a database of annotated comparative protein structure models and associated resources

by Ursula Pieper, Narayanan Eswar, Hannes Braberg, M. S. Madhusudhan, Fred P. Davis, Ashley C. Stuart, Nebojsa Mirkovic, Andrea Rossi, Marc A. Marti-Renom, Andras Fiser, Ben Webb, Daniel Greenblatt, Conrad C. Huang, Thomas E. Ferrin, Andrej Sali - Nucleic Acids Res, 2000
Cited by 210 (38 self)

Citation Context

...deling, ModPipe (17). ModPipe relies mostly on modules of Modeller (18) as well as fold assignment and sequence-structure alignment by PSI-BLAST (19) and the HHSuite modules HHBlits (20) and HHSearch (21). To be able to process a large number of sequences, it is implemented on a Linux cluster. ModPipe uses sequence–sequence (22), sequence–profile (19,23) and profile–profile (5,24) methods for fold ass...

The HHpred interactive server for protein homology detection and structure prediction

by Johannes Söding, Andreas Biegert, Andrei N. Lupas - Nucleic Acids Res, 2005
Cited by 162 (7 self)
doi:10.1093/nar/gki408

Citation Context

...pred2.1 with the servers that took part in the recent structure prediction benchmark CAFASP4 (27) can be viewed at http://protevo.eb.tuebingen.mpg.de/hhpred/hhpred_in_CAFASP4.html. In a recent study (28), in which we benchmarked HHsearch, the method for HMM–HMM comparison employed by our server, together with PSI-BLAST, HMMER, PROF_SIM and COMPASS (3,6,7,29), HHsearch was found to possess the highe...

A Machine Learning Information Retrieval Approach to Protein Fold Recognition

by Jianlin Cheng, Pierre Baldi
Cited by 78 (12 self)
Motivation: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than addressing the retrieval problem of fold recognition: finding a proper template for a given query protein. Results: Here we present a two-stage machine learning, information retrieval, approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map, and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable, and effective. Compared to 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is about 85%, 56%, and 27% at the family, superfamily, and fold levels respectively. Using the 5 top-ranked templates, the sensitivity increases to 90%, 70%, and 48%. Availability: The FOLDpro server is available with the SCRATCH

An HMM posterior decoder for sequence feature prediction that includes homology information

by Lukas Käll, Anders Krogh, Erik L. Sonnhammer, 2005
Cited by 45 (3 self)
Motivation: When predicting sequence features like transmembrane topology, signal peptides, coiled-coil structures, protein secondary structure or genes, extra support can be gained from homologs.

Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits

by Joshua White, Zhihua Li, Richa Sardana, Janusz M. Bujnicki, Edward M. Marcotte, Arlen W. Johnson - Mol Cell Biol
Cited by 24 (4 self)
BUD23 was identified from a bioinformatics analysis of Saccharomyces cerevisiae genes involved in ribosome biogenesis. Deletion of BUD23 leads to severely impaired growth, reduced levels of the small (40S) ribosomal subunit, and a block in processing 20S rRNA to 18S rRNA, a late step in 40S maturation. Bud23 belongs to the S-adenosylmethionine-dependent Rossmann-fold methyltransferase superfamily and is related to small-molecule methyltransferases. Nevertheless, we considered that Bud23 methylates rRNA. Methylation of G1575 is the only mapped modification for which the methylase has not been assigned. Here, we show that this modification is lost in bud23 mutants. The nuclear accumulation of the small-subunit reporters Rps2-green fluorescent protein (GFP) and Rps3-GFP, as well as the rRNA processing intermediate, the 5′ internal transcribed spacer 1, indicate that bud23 mutants are defective for small-subunit export. Mutations in Bud23 that inactivated its methyltransferase activity complemented a bud23 mutant. In addition, mutant ribosomes in which G1575 was changed to adenosine supported growth comparable to that of cells with wild-type ribosomes. Thus, Bud23 protein, but not its methyltransferase activity, is important for biogenesis and export of the 40S subunit in yeast. Ribosome biogenesis is a complex and highly ordered process in eukaryotic cells. Most of the events of ribosome assem-

Citation Context

... method, with the JTT substitution matrix and ML model for rate inference. Comparisons between the Bud23 family and families in the Clusters of Orthologous Groups (COG) database were done with HHpred (48). Based on the Bud23 family alignment, a phylogenetic tree was calculated with MEGA 3.1 (26), using the neighbor-joining method, with the JTT model of substitutions and pairwise deletions. The stabili...

COFACTOR: An accurate comparative algorithm for structure-based protein function annotation

by Ambrish Roy, Jianyi Yang, Yang Zhang - Nucleic Acids Res, 2012, 40(Web Server issue): W471–W477
Cited by 24 (2 self)
We have developed a new COFACTOR webserver for automated structure-based protein function annotation. Starting from a structural model, given by either experimental determination or computational modeling, COFACTOR first identifies template proteins of similar folds and functional sites by threading the target structure through three representative template libraries that have known protein–ligand binding interactions, Enzyme Commission number or Gene Ontology terms. The biological function insights in these three aspects are then deduced from the functional templates, the confidence of which is evaluated by a scoring function that combines both global and local structural similarities. The algorithm has been extensively benchmarked by large-scale benchmarking tests and demonstrated significant advantages compared to traditional sequence-based methods. In the recent community-wide CASP9 experiment, COFACTOR was ranked as the best method for protein–ligand binding site predictions. The COFACTOR server and the template libraries are freely available at

Citation Context

...proteins collected from PDB. As experimental controls, we select those commonly used approaches that are based on sequence–profile alignment (26), profile–profile alignment (27) and HMM–HMM alignment (28). In all experiments, close homologs of query proteins were intentionally removed from the template libraries using a sequence identity cut-off 30%, before the predictions were made. Supplementary Fig...

The MPI Bioinformatics Toolkit for protein sequence analysis

by Andreas Biegert, Christian Mayer, Michael Remmert, Andrei N. Lupas - Nucleic Acids Res, 2006
Cited by 22 (4 self)
The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at

Citation Context

...(10) Converts BLAST/PSI-BLAST output to a multiple alignment by realigning gapped regions with Clustal and removing local inconsistencies through comparison to a HMM HHalign* Sequence Analysis Söding (30) Comparison of two alignments using HMMs HHrep* Söding et al. (6) Sensitive de novo repeat identification in protein sequences by HMM-HMM comparison PCOILS* Lupas et al. (31) Coiled-coil prediction REP...

Exploration of uncharted regions of the protein universe

by Lukasz Jaroszewski, Zhanwen Li, S. Sri Krishna, Constantina Bakolitsa, John Wooley, Ashley M, Ian A. Wilson, Adam Godzik - PLoS Biol, 2009
Cited by 22 (1 self)
The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, as most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one third of these show significant substructure similarity to previously characterized folds. These results infer that, despite the enormous increase in the number and the diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be primarily shaped by extreme diversification of known protein families, which then enables organisms to evolve new functions and adapt to particular niches and habitats. Notwithstanding, these DUF families still constitute the richest source

Citation Context

...sured by mutation matrices [22] to sequence profiles [23], position-specific mutation matrices [24] or Hidden Markov Models (HMM) [25,26], to comparisons between such profiles [27] or between the HMM [28], it has become eminently clear that statistically significant sequence similarity between proteins may extend far beyond the intuitive definition based on sequence identity. Such a realization correl...

DOMAC: an accurate, hybrid protein domain prediction server

by Jianlin Cheng, 2007
Cited by 19 (3 self)
Protein domain prediction is important for protein structure prediction, structure determination, function annotation, mutagenesis analysis and protein engineering. Here we describe an accurate protein domain prediction server (DOMAC) combining both template-based and ab initio methods. The preliminary version of the server was ranked among the top domain prediction servers in the seventh edition of Critical

Citation Context

                        ...rget Num   Domain Num Acc. (%)   CASP7 Score
FOLDpro (DOMAC)         95            93.7                  0.963
Baker-RosettaDom (23)   94            86.2                  0.940
Ma-OPUS-DOM             94            87.2                  0.933
ROBETTA-GINZU (23)      94            84.0                  0.932
DomSSEA (7)             94            78.7                  0.910
HHpred3 (38)            95            75.8                  0.910
Meta-DP (24)            95            74.7                  0.907
HHpred1 (38)            93            75.3                  0.902
DomFOLD                 95            75.8                  0.898
DPS (13)                93            75.3                  0.889
Chop (22)               83            56.6                  0.827
Distill (39)            95            70.5                  0.819
NN_PUT-Lab              92            58.7                  0.795

The second ...
