  Analysis Using Information Theoretic and Machine Learning Approaches

by Christina L. Zheng, Virginia R. Sa, Michael Gribskov, T. Murlidharan Nair
http://hc.ims.u-tokyo.ac.jp/JSBi/journal/GIW03/GIW03F008.pdf

Abstract:

Recognizing precise splice junctions computationally is a challenge in the analysis of newly sequenced genomes, because the distribution of sequence patterns in these regions is not always distinct. Our objective is to understand the sequence signatures at the splice junctions, not simply to create an artificial recognition system. To this end, we combine a neural network based calliper randomization approach with an information theoretic feature selection approach, in order to identify regions that harbor information content and to extract features relevant for the prediction of splice junctions. The calliper randomization analysis revealed regions important in the internal representation of the network model, and it captured both correlated and independently important features. The feature selection approach, by contrast, captures features that are independently informative. Because the two methods capture features with different properties, comparing their results helps to infer the kind of information present in a region.
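
The contrast between the two kinds of analysis can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-position mutual information with the class label as the information theoretic score, and it stands in for the trained neural network with a hypothetical `predict` function, so that randomizing a window of positions and measuring the loss in accuracy only loosely mirrors the calliper idea. The toy sequences, labels, and all function names are invented for illustration.

```python
import math
import random
from collections import Counter


def position_mutual_information(seqs, labels, pos):
    """Mutual information (in bits) between the base at `pos` and the class label."""
    n = len(seqs)
    joint = Counter((s[pos], y) for s, y in zip(seqs, labels))
    base_counts = Counter(s[pos] for s in seqs)
    label_counts = Counter(labels)
    mi = 0.0
    for (base, label), count in joint.items():
        p_xy = count / n
        p_x = base_counts[base] / n
        p_y = label_counts[label] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def window_randomization_drop(predict, seqs, labels, start, width, rng, alphabet="ACGT"):
    """Accuracy lost when positions [start, start + width) are replaced by random bases."""
    def accuracy(data):
        return sum(predict(s) == y for s, y in zip(data, labels)) / len(labels)

    scrambled = [
        s[:start] + "".join(rng.choice(alphabet) for _ in range(width)) + s[start + width:]
        for s in seqs
    ]
    return accuracy(seqs) - accuracy(scrambled)


# Toy data (hypothetical): label 1 marks sequences with "GT" at positions 2-3.
seqs = ["AAGTAA", "CCGTCC", "TTGTGG", "GAGTTC", "AACAAA", "CCACCC", "TTCTGG", "GATATC"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def predict(seq):
    """Hypothetical stand-in for a trained classifier."""
    return int(seq[2:4] == "GT")


mi_by_position = [position_mutual_information(seqs, labels, p) for p in range(6)]
rng = random.Random(0)
drop_flank = window_randomization_drop(predict, seqs, labels, 0, 2, rng)
drop_site = window_randomization_drop(predict, seqs, labels, 2, 2, rng)
```

In this toy setting, position 2 carries the most information about the label on its own, while randomizing the flanking window at positions 0-1 leaves the stand-in classifier's accuracy unchanged. A per-position score of this kind sees only independently informative positions, whereas a window-level perturbation can also expose positions that matter jointly.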

Citations

4364 Elements of Information Theory – Cover, Thomas - 1991
3051 Neural Networks for Pattern Recognition – Bishop - 1995
431 Learning representations by back-propagating errors – Rumelhart, Hinton, et al. - 1986
255 Toward optimal feature selection – Koller, Sahami - 1996
92 Comparison of the predicted and observed secondary structure of T4 phage lysozyme – Matthews - 1975
53 Prediction of human mRNA donor and acceptor sites from the DNA sequence – Brunak, Engelbrecht, et al. - 1991
39 Long range correlations in nucleotide sequences – Peng, Buldyrev, et al. - 1992
35 Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis – Pedersen, Nielsen - 1997
30 Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts – Xing, Karp, et al.
20 A computational analysis of sequence features involved in recognition of short introns – Lim, Burge - 2001
17 Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information – Hebsgaard, Korning, et al. - 1996
15 Computational prediction of eukaryotic protein-coding genes – Zhang - 2002
13 Statistical features of human exons and their flanking regions – Zhang - 1998
5 A clean data set of EST-confirmed splice sites from Homo sapiens and standards for clean-up procedures – Thanaraj - 1999
4 Splicing of Messenger RNA Precursors – Sharp - 1987
3 Statistical analysis and prediction of the exonic structure of human genes – Gelfand - 1992
3 The role of small nuclear ribonucleoprotein particles in pre-mRNA splicing – Maniatis, Reed - 1987
2 Application of artificial neural networks for prokaryotic transcription terminator – Nair, Tambe, et al. - 1994
2 Interfering contexts of regulatory sequence elements – Trifonov - 1995
1 Using feature selection to find inputs that work better as outputs – Caruana, Sa - 1998
1 Calliper randomization: an artificial neural network based analysis of E. coli ribosome binding sites – Nair - 1997
1 Decision trees and neural nets: two complementary representations for learning – Paredis - 1991
1 A statistical analytical approach to decipher information from biological sequences: application to murine splice-site analysis and prediction – Reddy, Pandit - 1995