Prediction of complete gene structures in human genomic DNA
J. Mol. Biol., 1997
Cited by 1177 (9 self)
The problem of identifying genes in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. From one point of view, the problem is closely ...
Using the Fisher kernel method to detect remote protein homologies
In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, 1999
Cited by 208 (4 self)
A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hidden Markov model. The general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.
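The Fisher kernel maps each sequence to the gradient of a generative model's log-likelihood, U_x = ∇θ log P(x | θ), and compares two sequences through K(x, y) = U_x^T I^{-1} U_y, where I is the Fisher information matrix. The following is a minimal sketch built on several stated assumptions. It replaces the hidden Markov model with a much simpler composition (zeroth-order) model. It treats the residue probabilities as unconstrained parameters. It also approximates I by the identity, so the kernel reduces to a dot product of Fisher scores. All function names are hypothetical.

```python
from collections import Counter


def fit_composition_model(sequences, alphabet, pseudocount=1.0):
    """Estimate residue probabilities theta_a from a training family.

    Hypothetical stand-in for the HMM used in the paper.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


def fisher_score(seq, theta):
    """Gradient of log P(seq | theta) = sum_a c_a * log(theta_a).

    d/d theta_a = c_a / theta_a. The theta_a are treated as unconstrained,
    so this sketch ignores the sum-to-one constraint.
    """
    counts = Counter(seq)
    return [counts[a] / theta[a] for a in sorted(theta)]


def fisher_kernel(x, y, theta):
    """K(x, y) = U_x^T U_y: the Fisher kernel with I approximated by the identity."""
    ux = fisher_score(x, theta)
    uy = fisher_score(y, theta)
    return sum(a * b for a, b in zip(ux, uy))


theta = fit_composition_model(["ACGT"], alphabet="ACGT")
print(fisher_score("AAC", theta))         # [8.0, 4.0, 0.0, 0.0]
print(fisher_kernel("AAC", "AG", theta))  # 32.0
```

A kernel matrix built from such scores can be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.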
Entropia: Architecture and Performance of an Enterprise Desktop Grid System
2003
Cited by 164 (10 self)
The exploitation of idle cycles on pervasive desktop PC systems offers the opportunity to increase the available computing power by orders of magnitude (10x to 1000x). However, for desktop PC distributed computing to be widely accepted within the enterprise, the systems must achieve high levels of efficiency, robustness, security, scalability, manageability, unobtrusiveness, and openness/ease of application integration. We describe the Entropia distributed computing system as a case study, detailing its internal architecture and its philosophy in attacking these key problems. Key aspects of the Entropia system include the use of: 1) binary sandboxing technology for security and unobtrusiveness, 2) a layered architecture for efficiency, robustness, scalability and manageability, and 3) an open integration model to allow applications from many sources to be incorporated. Typical applications for the Entropia system include molecular docking, sequence analysis, chemical structure modeling, and risk management. The applications come from a diverse set of domains including virtual screening for drug discovery, genomics for drug targeting, material property prediction, and portfolio management. In all cases, these applications scale to many thousands of nodes and have no dependencies between tasks. We present representative performance results from several applications that illustrate the high performance, linear scaling, and overall capability of the Entropia system.
Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites
2000
Cited by 133 (14 self)
Motivation: An important step in extracting protein sequences from nucleotide sequences is recognizing the points at which protein-coding regions start. These points are called translation initiation sites (TIS). Results: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines (SVMs) to this task and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques, recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g., ESTScan) could profit from advanced TIS recognition.
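Here is one way to make "engineering an appropriate kernel" concrete. The sketch below compares two aligned, fixed-length windows centred on candidate start codons position by position, and it rewards matches that occur close to other matches. It follows the general shape of what is often called a locality-improved kernel: win_p(x, y) = (Σ_j w_j · match_{p+j}(x, y))^d1 and k(x, y) = (Σ_p win_p(x, y))^d2. The window size, the distance-decaying weights w_j, the exponents d1 and d2, and the function name are all hypothetical choices. This is not claimed to be the paper's exact kernel or parameter setting.

```python
def locality_improved_kernel(x, y, half_window=2, d1=2, d2=1, weights=None):
    """Locality-improved-style kernel on two aligned windows of equal length."""
    if len(x) != len(y):
        raise ValueError("x and y must be aligned windows of equal length")
    if weights is None:
        # Hypothetical weights: a match counts less the farther it is from p.
        weights = [1.0 / (abs(j) + 1) for j in range(-half_window, half_window + 1)]
    n = len(x)
    match = [1.0 if a == b else 0.0 for a, b in zip(x, y)]
    total = 0.0
    for p in range(n):
        # Weighted number of matching nucleotides in the window around p.
        win = 0.0
        for j in range(-half_window, half_window + 1):
            q = p + j
            if 0 <= q < n:
                win += weights[j + half_window] * match[q]
        # With d1 > 1, clusters of nearby matches score higher than scattered ones.
        total += win ** d1
    return total ** d2


# Two hypothetical 10-nt windows around an ATG.
print(locality_improved_kernel("GCCACCATGG", "GCCAACATGG"))
```

With d1 = d2 = 1 the kernel reduces to a weighted count of matching positions. Raising d1 above 1 is what lets it prefer local correlations, such as a conserved motif next to the start codon, over the same number of matches spread across the window.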
A Framework for Information Visualization Spreadsheets
1999
Cited by 76 (3 self)
Information has become interactive. Information visualization is the design and creation of interactive graphic depictions of information by combining principles in the disciplines of graphic design, cognitive science, and interactive computer graphics. As the volume and complexity of the data increase, users require more powerful visualization tools that allow them to explore large abstract datasets more effectively. This ...
Computational Methods for the Identification of Genes in Vertebrate Genomic Sequences
Hum. Mol. Genet., 1997
Cited by 67 (3 self)
Research into new methods to identify genes in anonymous genomic sequences has been going on for more than 15 years. Over this period, the field has evolved from designing programs to identify protein-coding regions in compact mitochondrial or bacterial genomes to the challenge of predicting the detailed organization of multi-exon vertebrate genes. The best program currently available perfectly locates more than 80% of internal coding exons, and only 5% of its predictions do not overlap a real exon. Given such accuracy, computational methods are indeed very useful; however, they do not eliminate the need for experimental validation. While performance is satisfactory for identifying the coding portion of genes (internal coding exons), determining the full extent of the transcript (the 5′ and 3′ ends of the gene) and locating promoter regions remain unreliable. As the human and mouse genome sequencing projects enter production mode, fully automated annotation of megabase-long anonymous genomic sequences is the next big challenge in bioinformatics.
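The two accuracy figures quoted above are exon-level measures. The first is the fraction of real exons whose boundaries are predicted exactly. The second is the fraction of predicted exons that overlap no real exon at all. The minimal sketch below computes both, assuming exons are given as 1-based inclusive (start, end) pairs on a single strand. The function name and example coordinates are hypothetical.

```python
def exon_level_accuracy(real, predicted):
    """Return (exact-match sensitivity, fraction of predictions overlapping no real exon)."""
    real_set = set(real)
    predicted_set = set(predicted)
    # Real exons whose start and end are both predicted exactly.
    exact = len(real_set & predicted_set)
    sensitivity = exact / len(real_set)

    def overlaps(a, b):
        # Inclusive intervals overlap if each starts no later than the other ends.
        return a[0] <= b[1] and b[0] <= a[1]

    # Predicted exons that do not touch any real exon.
    wrong = sum(1 for p in predicted_set if not any(overlaps(p, r) for r in real_set))
    wrong_rate = wrong / len(predicted_set)
    return sensitivity, wrong_rate


real = [(100, 200), (300, 400), (500, 600)]
predicted = [(100, 200), (300, 410), (700, 800)]
# One of three real exons is found exactly; one of three predictions overlaps nothing.
print(exon_level_accuracy(real, predicted))
```

The prediction (300, 410) shows why the two measures differ. It misses the exact boundary, so it does not count toward sensitivity, but it overlaps a real exon, so it is not counted as wrong either.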
Iha: a novel Escherichia coli O157:H7 adherence-conferring molecule encoded on a recently acquired chromosomal island of conserved structure
Infect. Immun., 2000
OrfPredictor: predicting protein-coding regions in EST-derived sequences
Nucleic Acids Res., 2005
Cited by 59 (2 self)
OrfPredictor is a web server designed to identify protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in the BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequence. The output consists of the predicted peptide sequences in FASTA format, each with a definition line that includes the query ID, the translation reading frame, and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly for large-scale EST projects. OrfPredictor is available at
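When there is no BLASTX hit, the coding region must be inferred from the sequence alone. The abstract does not describe OrfPredictor's intrinsic-signal model, so the sketch below substitutes the simplest baseline. It scans all six reading frames and reports the longest ATG-initiated open reading frame. The result is a FASTA record whose definition line carries the query ID, frame, and start/end positions. The function names and the header layout (`frame=`, `start=`, `end=`) are hypothetical. Positions are 1-based and exclude the stop codon, and reverse-strand hits are reported with start > end. An ORF that runs off the end of the read is kept, since EST-derived sequences are often partial.

```python
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; ambiguous codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def longest_orf(query_id, seq):
    """Return a FASTA record for the longest ATG-initiated ORF in six frames, or None."""
    seq = seq.upper()
    n = len(seq)
    best = None  # (peptide length, frame, start, end, peptide)
    for sign, strand in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            protein = translate(strand[offset:])
            aa_pos = 0  # index in `protein` where the current stop-delimited segment starts
            for segment in protein.split("*"):
                m = segment.find("M")
                if m != -1:
                    peptide = segment[m:]
                    # Half-open nucleotide coordinates on this strand (stop codon excluded).
                    nt_start = offset + 3 * (aa_pos + m)
                    nt_end = nt_start + 3 * len(peptide)
                    if sign == 1:
                        start, end = nt_start + 1, nt_end
                    else:  # map back to 1-based coordinates on the input strand
                        start, end = n - nt_start, n - nt_end + 1
                    if best is None or len(peptide) > best[0]:
                        best = (len(peptide), sign * (offset + 1), start, end, peptide)
                aa_pos += len(segment) + 1  # skip past the '*'
    if best is None:
        return None
    _, frame, start, end, peptide = best
    return f">{query_id} frame={frame} start={start} end={end}\n{peptide}"


print(longest_orf("q1", "CCATGAAATTTTAGCC"))
# >q1 frame=3 start=3 end=11
# MKF
```

A longest-ORF rule uses no statistical model of coding sequence, so treat it only as a baseline for the intrinsic-signal step, not as a description of how OrfPredictor works.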