Identification of Genes in Human Genomic DNA (1997)
BibTeX
@TECHREPORT{Burge97identificationof,
author = {Christopher Burge},
title = {Identification of Genes in Human Genomic DNA},
institution = {},
year = {1997}
}
Abstract
A general probabilistic model of the gene structural and compositional properties of human genomic DNA is introduced and applied to the problem of identifying genes in unannotated human genomic sequences. The model uses a "Hidden semi-Markov" or semi-Markov source architecture which incorporates probabilistic descriptions of fundamental transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived which account for many of the substantial differences in gene density and structure observed in distinct C+G compositional regions ("isochores") of the human genome. A novel model building procedure, termed Maximal Dependence Decomposition, is introduced which captures potentially important dependencies between non-adjacent as well as adjacent positions in a biological signal. Application of this model to the donor splice signal not only gives better discrimination of potential donor sites than previous probabilistic models, but also reveals subtle properties of this signal which suggest aspects of its biochemical function. Acceptor







