Identification of protein coding regions by database similarity search. (1993)

by W Gish, D J States
Venue: Nat. Genet.
Results 1 - 10 of 262

Prediction of complete gene structures in human genomic DNA

by Chris Burge, Samuel Karlin - J. Mol. Biol., 1997
Cited by 1177 (9 self)
The problem of identifying genes in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. From one point of view, the problem is closely

Citation Context

...e databases, but instead provide information which is independent and complementary to that provided by homology-based gene identification methods such as searching the protein databases with BLASTX (Gish & States, 1993). Additionally, the model takes into account many of the often quite substantial differences in gene density and structure (e.g. intron length) that exist between different C + G% compositional regi...

A discriminative framework for detecting remote protein homologies

by Tommi Jaakkola et al., 1999
Cited by 248 (3 self)
Abstract not found

Using the Fisher kernel method to detect remote protein homologies

by Tommi Jaakkola, Mark Diekhans, David Haussler - In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, 1999
Cited by 208 (4 self)
A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hidden Markov model. The general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.

Citation Context

...set of experiments to determine the ability of SVM-Fisher kernel discriminative models to recognize remote protein homologs. The SVM-Fisher kernel methods were compared to BLAST (Altschul et al. 1990; Gish & States 1993) and the generative HMMs built using the SAM-T98 methodology (Park et al. 1998; Karplus, Barrett, & Hughey 1998; Hughey & Krogh 1995; 1996). The experiments measured the recognition rate for members ...

Entropia: Architecture and Performance of an Enterprise Desktop Grid System

by Andrew A. Chien, Brad Calder, Stephen Elbert, Karan Bhatia, 2003
Cited by 164 (10 self)
The exploitation of idle cycles on pervasive desktop PC systems offers the opportunity to increase the available computing power by orders of magnitude (10x - 1000x). However, for desktop PC distributed computing to be widely accepted within the enterprise, the systems must achieve high levels of efficiency, robustness, security, scalability, manageability, unobtrusiveness, and openness/ease of application integration. We describe the Entropia distributed computing system as a case study, detailing its internal architecture and philosophy in attacking these key problems. Key aspects of the Entropia system include the use of: 1) binary sandboxing technology for security and unobtrusiveness, 2) a layered architecture for efficiency, robustness, scalability and manageability, and 3) an open integration model to allow applications from many sources to be incorporated. Typical applications for the Entropia System include molecular docking, sequence analysis, chemical structure modeling, and risk management. The applications come from a diverse set of domains including virtual screening for drug discovery, genomics for drug targeting, material property prediction, and portfolio management. In all cases, these applications scale to many thousands of nodes and have no dependences between tasks. We present representative performance results from several applications that illustrate the high performance, linear scaling, and overall capability presented by the Entropia system.

Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites

by A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, K.-R. Müller, 2000
Cited by 133 (14 self)
Motivation: In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points at which regions start that code for proteins. These points are called translation initiation sites (TIS). Results: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines (SVMs) for this task, and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques the recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g. ESTScan) could profit from advanced TIS recognition.
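
The classification framing in the abstract above can be illustrated with a small preprocessing sketch: every ATG in a nucleotide sequence becomes a candidate site, and a fixed window around it becomes one example for a classifier. This is only an illustration under stated assumptions. The function names, the window sizes, and the one-hot encoding are hypothetical choices, not the kernel or the features engineered by the authors.

```python
# Illustrative only: window sizes and encoding are assumptions, not the paper's method.

def candidate_tis_windows(seq, upstream=3, downstream=6):
    """Return (position, window) for every ATG that has full flanking context.

    position is the 0-based index of the 'A' in ATG; window covers
    `upstream` bases before the ATG, the ATG itself, and `downstream` bases after.
    """
    seq = seq.upper()
    windows = []
    start = seq.find("ATG")
    while start != -1:
        lo, hi = start - upstream, start + 3 + downstream
        if lo >= 0 and hi <= len(seq):
            windows.append((start, seq[lo:hi]))
        start = seq.find("ATG", start + 1)
    return windows


def one_hot(window):
    """Encode a window as a flat 0/1 list, four entries per base (A, C, G, T).

    Bases other than A, C, G, T are encoded as four zeros.
    """
    alphabet = "ACGT"
    features = []
    for base in window:
        features.extend(1 if base == letter else 0 for letter in alphabet)
    return features


if __name__ == "__main__":
    for pos, win in candidate_tis_windows("CCCATGGCCATGAAATTT"):
        print(pos, win, sum(one_hot(win)))
```

Each encoded window could then be passed, together with a true/false TIS label, to any classifier; the abstract reports that an SVM with a kernel reflecting prior biological knowledge performed best.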

Eukaryotic promoter recognition.

by J W Fickett, A G Hatzigeorgiou - Genome Res., 1997
Cited by 93 (0 self)
Abstract not found

Citation Context

...titioning the exons. The problem is compounded by our incomplete understanding of alternative splicing control elements. Another line of development in gene identification is based on homology (e.g., Gish and States 1993; Gelfand et al. 1996). If there is a close homolog in the databases to one of the genes in the sequence under analysis, sequence similarity will usually group the exons for this gene correctly. Still...

A Framework for Information Visualization Spreadsheets

by Ed Huai-hsin Chi, 1999
Cited by 76 (3 self)
Information has become interactive. Information visualization is the design and creation of interactive graphic depictions of information by combining principles in the disciplines of graphic design, cognitive science, and interactive computer graphics. As the volume and complexity of the data increases, users require more powerful visualization tools that allow them to more effectively explore large abstract datasets. This

Computational Methods for the Identification of Genes in Vertebrate Genomic Sequences

by Jean-Michel Claverie - Hum. Mol. Genet., 1997
Cited by 67 (3 self)
Research into new methods to identify genes in anonymous genomic sequences has been going on for more than 15 years. Over this period of time, the field has evolved from the designing of programs to identify protein coding regions in compact mitochondrial or bacterial genomes, to the challenge of predicting the detailed organization of multi-exon vertebrate genes. The best program currently available perfectly locates more than 80% of the internal coding exons, and only 5% of the predictions do not overlap a real exon. Given such accuracy, computational methods are indeed very useful; however, they do not alleviate the need for experimental validation. If the performances are satisfactory for the identification of the coding moiety of genes (internal coding exons), the determination of the full extent of the transcript (5′ and 3′ extremities of the gene) and the location of promoter regions are still unreliable. As the human and mouse genome sequencing projects enter a production mode, the fully automated annotation of megabase-long anonymous genomic sequences is the next big challenge in bioinformatics.

Citation Context

...ponentially in the GenBank/EMBL/DDBJ databases (27,28). It became increasingly likely that protein coding exons could be simply recognized by a similarity search against the whole translated database (29,30). About 50% (31,32) of all vertebrate genes have retained enough similarity with their pre-metazoan ancestors to exhibit a significant BLASTX (30) match in a database containing the whole yeast genome...

Iha: a novel Escherichia coli O157:H7 adherence-conferring molecule encoded on a recently acquired chromosomal island of conserved structure.

by Phillip I. Tarr, Sima S. Bilge, James C. Vary, Rebecca L. Habeeb, Teresa R. Ward, Michael R, Thomas E. Besser - Infect. Immun., 2000
Cited by 60 (6 self)

Citation Context

...ng kit and a model 373A DNA sequencer (Applied Biosystems, Foster City, Calif.). The sequences were compared to the National Center for Biotechnology Information Geninfo BLAST network server database (13). A 2,088-bp adherence-conferring open reading frame (ORF) was amplified using PCR and the primers 5′GGGGATCCAATTCTGGCATGCCGAGGCAGTGC3′ and 5′GGTCTAGATTCTCGTTGCCACTGTTCCGCCAGG3′ (the boldface nucleo...

OrfPredictor: predicting protein-coding regions in EST-derived sequences

by Xiang Jia Min, Gregory Butler, Reginald Storms, Adrian Tsang - Nucleic Acids Res., 2005
Cited by 59 (2 self)
OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in BLASTX alignments, otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in the FASTA format, and a definition line that includes the query ID, the translation reading frame and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly, for large-scale EST projects. OrfPredictor is available at

Citation Context

...gions within EST-derived sequences. The algorithm uses the translation reading frames predicted by using BLASTX (2) as a guide for the identification of the coding region in sequences that have a hit (3) and predicts a coding region ab initio for sequences without a hit. OVERVIEW OF THE ALGORITHM AND IMPLEMENTATION All eukaryotic mRNAs contain a contiguous sequence of nucleotides coding for protein s...
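
The two-path logic described in this entry (use the reading frame from a BLASTX hit when one exists, otherwise predict from the sequence alone) can be sketched as follows. This is a minimal illustration, not OrfPredictor's algorithm: the fallback here is a plain longest ATG-to-stop search over six frames, which stands in for the "intrinsic signals" the abstract mentions, and the names `predict_coding_region` and `blastx_frame` are hypothetical. The sketch reports the nucleotide ORF and leaves peptide translation out.

```python
# Illustrative only: a longest-ORF heuristic standing in for ab initio prediction.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_in_frame(seq, frame):
    """Longest ATG..stop stretch in one frame (+1..+3 or -1..-3).

    Returns (begin, end, orf) with 1-based positions on the scanned strand
    (the reverse complement for negative frames), or None if no ORF is found.
    """
    strand = seq if frame > 0 else reverse_complement(seq)
    offset = abs(frame) - 1
    best, start = None, None
    for i in range(offset, len(strand) - 2, 3):
        codon = strand[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if best is None or (i + 3 - start) > len(best[2]):
                best = (start + 1, i + 3, strand[start:i + 3])
            start = None
    return best


def predict_coding_region(query_id, seq, blastx_frame=None):
    """Use the BLASTX frame if supplied (assumed input), else scan all six frames."""
    seq = seq.upper()
    frames = [blastx_frame] if blastx_frame else [1, 2, 3, -1, -2, -3]
    best = None
    for frame in frames:
        hit = longest_orf_in_frame(seq, frame)
        if hit and (best is None or len(hit[2]) > len(best[1][2])):
            best = (frame, hit)
    if best is None:
        return None
    frame, (begin, end, orf) = best
    # Definition line carries the query ID, frame, and begin/end positions.
    header = f">{query_id} frame={frame:+d} begin={begin} end={end}"
    return header, orf


if __name__ == "__main__":
    print(predict_coding_region("q1", "CCATGAAATTTTAGGG"))
```

The definition line mirrors the fields listed in the abstract (query ID, reading frame, begin and end positions); its exact layout here is an assumption.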
