Automatic discovery of protein motifs using genetic programming (1996) [18 citations, 11 of them self-citations]

by John R. Koza and David Andre (Visiting Scholar)
Evolutionary Computation: Theory and Applications. Singapore: World Scientific
http://www.genetic-programming.com/ECTA.ps

Abstract:

Automated methods of machine learning may prove useful in discovering biologically meaningful information hidden in the rapidly growing databases of DNA and protein sequences. Genetic programming is an extension of the genetic algorithm in which a population of computer programs is bred, over a series of generations, to solve a problem. Genetic programming can evolve complicated problem-solving expressions of unspecified size and shape. Moreover, when automatically defined functions are added, genetic programming becomes capable of efficiently capturing and exploiting recurring sub-patterns. This chapter describes how genetic programming with automatically defined functions successfully evolved motifs for detecting the D-E-A-D box family of proteins and for detecting the manganese superoxide dismutase family. Both motifs were evolved without prespecifying their length, and both employed automatically defined functions to capture the repeated use of common subexpressions. When tested against the SWISS-PROT database of proteins, the two genetically evolved consensus motifs detect the two families either as well as, or slightly better than, the comparable human-written motifs in the PROSITE database.
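The abstract compares motifs by how well they detect family members in a sequence database. The Python sketch below shows that kind of scoring step. It assumes a simplified PROSITE-style pattern syntax, uses made-up toy sequences, and summarizes detection with the Matthews correlation coefficient. That metric is an assumption, because the abstract names none. The pattern is only loosely modelled on a D-E-A-D box and is not the actual PROSITE entry. The sketch evolves nothing. It only scores one fixed motif, which is the step a genetic-programming fitness function would repeat for every candidate program.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence

# One residue class per motif position; None stands for the wildcard 'x'.
Motif = List[Optional[FrozenSet[str]]]


def parse_pattern(pattern: str) -> Motif:
    """Parse a simplified PROSITE-style pattern, e.g. 'D-E-A-D-[RKEN]-x'.

    Only single residues, bracketed alternatives and 'x' are handled;
    repeats such as x(2) and exclusions such as {P} are not supported.
    """
    motif: Motif = []
    for element in pattern.split("-"):
        if element == "x":
            motif.append(None)  # any residue matches here
        elif element.startswith("[") and element.endswith("]"):
            motif.append(frozenset(element[1:-1]))  # any listed residue
        else:
            motif.append(frozenset(element))  # exactly this residue
    return motif


def matches(motif: Motif, sequence: str) -> bool:
    """Return True if the motif matches at some offset of the sequence."""
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(cls is None or aa in cls for cls, aa in zip(motif, window)):
            return True
    return False


@dataclass
class Confusion:
    tp: int  # family members the motif detects
    fp: int  # non-members the motif wrongly detects
    tn: int  # non-members correctly rejected
    fn: int  # family members the motif misses

    @property
    def mcc(self) -> float:
        """Matthews correlation coefficient; 0.0 when it is undefined."""
        denom = math.sqrt((self.tp + self.fp) * (self.tp + self.fn)
                          * (self.tn + self.fp) * (self.tn + self.fn))
        if denom == 0:
            return 0.0
        return (self.tp * self.tn - self.fp * self.fn) / denom


def evaluate(motif: Motif, family: Sequence[str],
             others: Sequence[str]) -> Confusion:
    """Count motif hits among family members and non-members."""
    tp = sum(matches(motif, s) for s in family)
    fp = sum(matches(motif, s) for s in others)
    return Confusion(tp=tp, fp=fp, tn=len(others) - fp, fn=len(family) - tp)


if __name__ == "__main__":
    # Illustrative pattern and synthetic sequences, not real database entries.
    motif = parse_pattern("[LIVMF]-[LIVMF]-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]")
    family = ["MSGLIDEADRMLSK", "AAVFDEADKQGTT", "MKTAYQQQ"]
    others = ["MSGLIDEAHRMLSK", "AAVFDEADPQGTT", "QQIVDEADNAS"]
    result = evaluate(motif, family, others)
    print(result, round(result.mcc, 3))
    # prints: Confusion(tp=2, fp=1, tn=2, fn=1) 0.333
```

In a real comparison, the same `evaluate` call would be run once for an evolved motif and once for a human-written one over the same labeled sequence set. The function names and the toy data here are hypothetical.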