Neural Network Prediction of Translation Initiation Sites in Eukaryotes: Perspectives for EST and Genome analysis
- In ISMB’97, 1997
Cited by 44 (1 self)
Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role. This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known. In this paper, we employ artificial neural networks to predict which AUG triplet in an mRNA sequence is the start codon. The trained networks correctly classified 88% of Arabidopsis and 85% of vertebrate AUG triplets. We find that our trained neural networks use a combination of local start codon context and global sequence information. Furthermore, analysis of false predictions shows that AUGs in frame with the actual start codon are more frequently selected than out-of-frame AUGs, suggesting that our networks use reading frame detection. A number of conflicts between neural network predictions and database annotations are analysed in detail, leading to identification of possible database errors.
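The in-frame versus out-of-frame distinction mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' neural network; it only enumerates candidate AUG triplets and labels each by its reading frame relative to an annotated start. The function name `aug_candidates` and the 0-based `annotated_start` parameter are assumptions made for illustration.

```python
def aug_candidates(mrna: str, annotated_start: int):
    """List every AUG in the sequence as (position, is_annotated, in_frame).

    Hypothetical helper: positions are 0-based, and `annotated_start` is the
    position of the AUG that the database annotation marks as the start codon.
    """
    # Accept DNA-style input by converting T to U.
    mrna = mrna.upper().replace("T", "U")
    candidates = []
    pos = mrna.find("AUG")
    while pos != -1:
        # An AUG shares the reading frame of the annotated start when the
        # offset between the two positions is a multiple of three.
        in_frame = (pos - annotated_start) % 3 == 0
        candidates.append((pos, pos == annotated_start, in_frame))
        pos = mrna.find("AUG", pos + 1)
    return candidates


if __name__ == "__main__":
    for position, is_annotated, in_frame in aug_candidates("GGAUGCCAUGGAUGA", 2):
        print(position, is_annotated, in_frame)
```

A classifier such as the one described would score each of these candidates; the frame label is only what one would need to repeat the paper's analysis of which false predictions fall in frame.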
Efficient Selection of Unique and Popular Oligos For Large EST databases
- 2003
Cited by 6 (1 self)
EST databases have grown exponentially in recent years and now represent the largest collection of genetic sequences. An important application of these databases is that they contain information useful for the design of gene-specific oligonucleotides (or simply, oligos) that can be used in PCR primer design, microarray experiments, and genomic library screening. In this paper, we study two complementary problems concerning the selection of short oligos, e.g., 20-50 bases, from a large database of tens of thousands of EST sequences: (i) selection of oligos each of which appears (exactly) in one EST sequence but does not appear (exactly or approximately) in any other EST sequence and (ii) selection of oligos that appear (exactly or approximately) in many ESTs. The first problem is called the unique oligo problem and has applications in PCR primer and microarray probe designs. The second is called the popular oligo problem and is useful in screening genomic libraries (such as BAC libraries) for gene-rich regions. We present an efficient algorithm to identify all unique oligos in the ESTs and an efficient heuristic algorithm to enumerate the most popular oligos. By taking into account the distribution of the frequencies of the words in the EST database, the algorithms have been carefully engineered to achieve remarkable running times on regular PCs. Each of the algorithms takes only a couple of hours (on a 1.2 GHz CPU, 1 GB RAM machine) to run on a dataset of 28 Mbases of barley ESTs from the HarvEST database. We present simulation results on synthetic data and a preliminary analysis of the barley EST database.
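The two problem statements in the abstract above can be made concrete with a naive sketch restricted to exact matches. It is not the paper's algorithm: it ignores approximate occurrences and the word-frequency engineering the authors describe, and it would not scale to a full EST database. The names `oligo_owners`, `unique_oligos`, and `popular_oligos` are hypothetical.

```python
from collections import defaultdict


def oligo_owners(ests, k):
    """Map each length-k substring to the set of EST indices containing it."""
    owners = defaultdict(set)
    for idx, est in enumerate(ests):
        for i in range(len(est) - k + 1):
            owners[est[i:i + k]].add(idx)
    return owners


def unique_oligos(ests, k):
    """Oligos that occur (exactly) in one EST only, mapped to that EST's index."""
    return {
        oligo: next(iter(ids))
        for oligo, ids in oligo_owners(ests, k).items()
        if len(ids) == 1
    }


def popular_oligos(ests, k, top):
    """The `top` oligos occurring (exactly) in the most ESTs, ties broken alphabetically."""
    counts = [(oligo, len(ids)) for oligo, ids in oligo_owners(ests, k).items()]
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:top]


if __name__ == "__main__":
    ests = ["ACGTAC", "CGTACC", "TTTTGG"]
    print(unique_oligos(ests, 4))
    print(popular_oligos(ests, 4, 2))
```

Handling the "approximately" clause of either problem would require an additional mismatch-tolerant comparison, which this sketch deliberately leaves out.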
in Int. Conf. Intell. Syst. for Mol. Biol. (ISMB) '98, 203-211, 1998
- Proc Int Conf Intell Syst Mol Biol, 1998
The availability of large EST (Expressed Sequence Tag) databases has led to a revolution in the way new genes are cloned. Difficulties arise, however, due to high error rates and redundancy of raw EST data.

