by John R. Koza, David Andre, Visiting Scholar
Evolutionary Computation: Theory and Applications. Singapore: World Scientific
http://www.genetic-programming.com/ECTA.ps
Abstract:
Automated methods of machine learning may prove to be useful in discovering biologically meaningful information hidden in the rapidly growing databases of DNA sequences and protein sequences. Genetic programming is an extension of the genetic algorithm in which a population of computer programs is bred, over a series of generations, in order to solve a problem. Genetic programming is capable of evolving complicated problem-solving expressions of unspecified size and shape. Moreover, when automatically defined functions are added, genetic programming becomes capable of efficiently capturing and exploiting recurring sub-patterns. This chapter describes how genetic programming with automatically defined functions successfully evolved motifs for detecting the D-E-A-D box family of proteins and for detecting the manganese superoxide dismutase family. Both motifs were evolved without prespecifying their length. Both evolved motifs employed automatically defined functions to capture the repeated use of common subexpressions. When tested against the SWISS-PROT database of proteins, the two genetically evolved consensus motifs detect the two families as well as, or slightly better than, the comparable human-written motifs found in the PROSITE database.
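As a rough illustration of what a consensus motif with a reused subexpression can look like, the Python sketch below scans a protein sequence with a list of per-position tests. It is not the chapter's evolved motif, nor the PROSITE pattern: the names `member_of`, `HYDROPHOBIC`, `TOY_MOTIF`, and `find_motif` are hypothetical, and the toy motif was picked only so that the shared `HYDROPHOBIC` test is used twice, loosely mirroring how an automatically defined function captures a repeated subexpression.

```python
from typing import Callable, Sequence

# A position test takes one residue (one-letter amino acid code)
# and reports whether that residue is acceptable at the position.
PositionTest = Callable[[str], bool]


def member_of(residues: str) -> PositionTest:
    """Build a test that accepts any residue in the given set."""
    allowed = frozenset(residues)
    return lambda residue: residue in allowed


# Assumption: a toy residue class, defined once and reused below,
# standing in for a reusable subexpression.
HYDROPHOBIC = member_of("LIVMF")

# Assumption: an illustrative motif, NOT the evolved or PROSITE motif.
TOY_MOTIF = [
    HYDROPHOBIC,
    HYDROPHOBIC,
    member_of("D"),
    member_of("E"),
    member_of("A"),
    member_of("D"),
]


def find_motif(sequence: str, motif: Sequence[PositionTest]) -> list[int]:
    """Return every 0-based start index where all position tests pass."""
    width = len(motif)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if all(test(sequence[start + i]) for i, test in enumerate(motif))
    ]
```

Calling `find_motif("GKTLVLDEADRML", TOY_MOTIF)` reports the single window starting at index 4 (`VLDEAD`). A genetic programming run would search over the structure of such tests rather than fixing them by hand.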