General Schema Theory for Genetic Programming with Subtree-Swapping Crossover
- In Genetic Programming, Proceedings of EuroGP 2001, LNCS
, 2001
Cited by 44 (28 self)
In this paper a new, general and exact schema theory for genetic programming is presented. The theory includes a microscopic schema theorem applicable to crossover operators which replace a subtree in one parent with a subtree from the other parent to produce the offspring. A more macroscopic schema theorem is also provided which is valid for crossover operators in which the probability of selecting any two crossover points in the parents depends only on their size and shape. The theory is based on the notions of Cartesian node reference systems and variable-arity hyperschemata, both introduced here for the first time. In the paper we provide examples which show how the theory can be specialised to specific crossover operators and how it can be used to derive an exact definition of effective fitness and a size-evolution equation for GP.
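For readers unfamiliar with the operator class this abstract refers to, the following is a minimal sketch of subtree-swapping crossover: a subtree in one parent is replaced with a subtree taken from the other parent. It is an illustration only, not the paper's formulation. The nested-tuple tree representation, the uniform choice of crossover points, and all function names (`paths`, `get_subtree`, `replace_subtree`, `subtree_crossover`, `size`) are assumptions made for this example.

```python
import random

# Assumed representation: a tree is a tuple (label, child, child, ...);
# a leaf is a one-element tuple such as ("x",).

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    for i, child in enumerate(tree[1:]):
        yield from paths(child, prefix + (i,))

def get_subtree(tree, path):
    """Return the subtree rooted at the node addressed by path."""
    for i in path:
        tree = tree[i + 1]  # +1 skips the label stored at index 0
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0] + 1
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(parent1, parent2, rng=random):
    """Replace a random subtree of parent1 with a random subtree of parent2.

    Crossover points are chosen uniformly over nodes (an assumption; other
    selection probabilities are possible).
    """
    point1 = rng.choice(list(paths(parent1)))
    point2 = rng.choice(list(paths(parent2)))
    return replace_subtree(parent1, point1, get_subtree(parent2, point2))

def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1:])

if __name__ == "__main__":
    p1 = ("+", ("x",), ("*", ("y",), ("1",)))
    p2 = ("-", ("a",), ("b",))
    print(subtree_crossover(p1, p2, random.Random(0)))
```

Because the inserted subtree can be larger or smaller than the one it replaces, offspring size varies from generation to generation, which is why a size-evolution equation is a natural by-product of a theory for this operator class.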
Quadratic Bloat in Genetic Programming
- Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000), pages 451–458, Las Vegas
, 2000
Cited by 32 (3 self)
In earlier work we predicted program size would grow in the limit at a quadratic rate and up to fifty generations we measured bloat $O(\text{generations}^{1.2\text{--}1.5})$. On two simple benchmarks we test the prediction of bloat $O(\text{generations}^{2.0})$ up to generation 600. In continuous problems the limit of quadratic growth is reached but convergence in the discrete case limits growth in size. Measurements indicate subtree crossover ceases to be disruptive with large programs (1,000,000) and the population effectively converges (even though variety is near unity). Depending upon implementation, we predict run time $O(\text{no. generations}^{2.0\text{--}3.0})$ and memory $O(\text{no. generations}^{1.0\text{--}2.0})$. 1 INTRODUCTION It has been known for some time that programs within GP populations tend to rapidly increase in size as the population evolves [Koza, 1992, Altenberg, 1994, Tackett, 1994, Blickle and Thiele, 1994, Nordin and Banzhaf, 1995, Nordin, 1997, McPhee and Miller, 1995, Langdon,...
Exact Schema Theory for Genetic Programming and Variable-length Genetic Algorithms with One-Point Crossover
, 2001
Cited by 27 (16 self)
A few schema theorems for Genetic Programming (GP) have been proposed in the literature in the last few years. Since they consider schema survival and disruption only, they can only provide a lower bound for the expected value of the number of instances of a given schema at the next generation rather than an exact value. This paper presents theoretical results for GP with one-point crossover which overcome this problem. Firstly, we give an exact formulation for the expected number of instances of a schema at the next generation in terms of microscopic quantities. Thanks to this formulation we are then able to provide an improved version of an earlier GP schema theorem in which some (but not all) schema creation events are accounted for. Then, we extend this result to obtain an exact formulation in terms of macroscopic quantities which makes all the mechanisms of schema creation explicit. This theorem allows the exact formulation of the notion of effective fitness in GP and opens the way to future work on GP convergence, population sizing, operator biases, and bloat, to mention only some of the possibilities.
Solving High-Order Boolean Parity Problems with Smooth Uniform Crossover, Sub-Machine Code GP and Demes
, 2000
Cited by 24 (2 self)
We propose and study new search operators and a novel node representation that can make GP fitness landscapes smoother. Together with a tree evaluation method known as sub-machine code GP and the use of demes, these make up a recipe for solving very large parity problems using GP. We tested this recipe on parity problems with up to 22 input variables, solving them with a very high success probability.
Repeated Sequences in Linear Genetic Programming Genomes
- Complex Systems
, 2005
Cited by 15 (8 self)
Introduction. It has long been noticed that there are emergent phenomena in genetic programming (GP) runs unintended by the human designer of the algorithm. Early on it was observed that code which does not change the output of the program (i.e. non-effective code) appears in many GP runs [34, 38, 2]. It was also noted that bloat affects many GP systems. Reasons for bloat and non-effective code have been examined in years past [25, 4, 7] and remedies have been developed, more or less effective under particular circumstances (e.g. [29, 15, 22, 17]). Here we would like to argue that non-effective code and bloat are only the tip of an iceberg and that there is more to be discovered about "emergent phenomena" in GP runs. Particularly, we would like to study repetition of patterns in GP-evolved programs. These are instructions, or more interestingly, groups of instructions, that occur several times in a program. In fact long sequences of instructions which are repeated can sometimes be decompose
Problem Difficulty and Code Growth in Genetic Programming
, 2004
Cited by 14 (3 self)
This paper investigates the relationship between code growth and problem difficulty in genetic programming. The symbolic regression problem domain is used to investigate this relationship using two different types of increased instance difficulty. Results are supported by a simplified model of genetic programming and show that increased difficulty induces higher selection pressure and less genetic diversity, which both contribute toward an increased rate of code growth.
Genetic programming for mining DNA chip data from cancer patients
- Genetic Programming and Evolvable Machines
, 2004
Cited by 14 (7 self)
In machine learning terms DNA (gene) chip data is unusual in having thousands of attributes (the gene expression values) but few (< 100) records (the patients). A GP based method for both feature selection and generating simple models based on a few genes is demonstrated on cancer data.
Genetic Programming in Data Mining for Drug Discovery
- Evolutionary Computing in Data Mining, A. Ghosh and L. C. Jain (editors)
, 2004
Cited by 13 (7 self)
Genetic programming (GP) is used to extract from rat oral bioavailability (OB) measurements simple, interpretable and predictive QSAR models which both generalise to rats and to marketed drugs in humans. Receiver Operating Characteristics (ROC) curves for the binary classifier produced by machine learning show no statistical difference between rats (albeit without known clearance differences) and man. Thus evolutionary computing offers the prospect of in silico ADME screening, e.g. for “virtual” chemicals, for pharmaceutical drug discovery.
Exact Schema Theory for GP and Variable-length GAs with Homologous Crossover
- In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001)
, 2001
Cited by 11 (8 self)
In this paper we present a new exact schema theory for genetic programming and variable-length genetic algorithms which is applicable to the general class of homologous crossovers. These are a group of operators, including GP one-point crossover and GP uniform crossover, where the offspring are created preserving the position of the genetic material taken from the parents. The theory is based on the concepts of GP crossover masks and GP recombination distributions (both introduced here for the first time), as well as the notions of hyperschema and node reference systems introduced in other recent research. This theory generalises and refines previous work in GP and GA theory.

