Whitley D., Dominic S., Das R., Anderson C., "Genetic reinforcement learning for neurocontrol problems", Machine Learning, 13, p. 259-284, 1993.
....that is used to optimize the parameterized shared-weight network. EAs have been applied extensively to neural network design and optimization in a number of different ways. We use an EA to optimize the parameters of a fixed-topology network. This is similar to the optimization of neural network weights in [32] and [33]. The flexibility of EAs means the topology can also be optimized for a particular problem [34]. Developmental encoding has also been suggested, which optimizes a program whose instructions dictate the placement and connectivity of network nodes [35]. When optimizing network architectures, ....
Whitley, D., S. Dominic, R. Das, and C. Anderson, Genetic Reinforcement Learning for Neurocontrol problems. Machine Learning, 1993. 13: p. 259-284.
....to adapt to the demands of the environment. They find that mutation rates adapt to an optimal level, which depends on the evolutionary demands of the environment for novelty. My model is similar, in that mutation rates are also allowed to adapt. Other work with adaptive mutation rates includes [6, 11, 14, 41]. Bedau and Seymour's model and my model are distinct from this other work in that we share an interest in the relationship between the adaptive mutation rates and the evolutionary demands of the environment for novelty. The main difference between this paper and previous work is the ....
Whitley, D., Dominic, S., Das, R., and Anderson, C.W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13, 259-284.
....no mutation operator is used. The User module changes to reflect this as follows: mate :: [Chromosome] -> Prob [Chromosome]; mate c = edgerecombination. The selection operator used is linear selection. The combination used is steady state. 3.3 Inverted pendulum. This problem was taken from [6]. The problem is to balance an inverted pendulum that sits over a cart. This is achieved by pushing the cart to the right or to the left. The cart can only move in one dimension on a finite track. A representation of the system can be seen in Figure 1. The genetic algorithm is used in this case to ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. Genetic reinforcement learning for neurocontrol problems. Machine Learning (13), 259-284 (1993).
....when the neural network is evaluated. This way, ESP decomposes the problem of finding a successful network into several smaller subproblems, resulting in more efficient evolution. In several robot control benchmark tasks, ESP was compared to other neuroevolution methods such as SANE, GENITOR [28], and Cellular Encoding [9, 29], as well as to other reinforcement learning methods such as Adaptive Heuristic Critic [3, 1], Q-learning [25, 20], and VAPS [14]. ESP turns out to be consistently the most powerful, solving problems faster and solving harder problems [8]. It therefore forms a solid ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.
....biological evolution. 1 Introduction. Neuroevolution (NE), the artificial evolution of neural networks using genetic algorithms, has shown great promise in complex reinforcement learning tasks (Gomez and Miikkulainen 1999; Gruau et al. 1996; Moriarty and Miikkulainen 1997; Potter et al. 1995; Whitley et al. 1993). Neuroevolution searches through the space of behaviors for a network that performs well at a given task. This approach to solving complex control problems represents an alternative to statistical techniques that attempt to estimate the utility of particular actions in particular states of the ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.
....The quality of an encoding strategy, however, has to be measured by the speed of convergence to sufficiently good solutions. Most papers report good results on small-scale toy problems such as XOR, sine, parity, or low-bit adding. The convergence time competes well with backpropagation [28, 13]. The largest problems that have been tackled successfully range from high-bit parity [4], pole balancing [28, 10], and simplified pattern recognition [1] to high-order classification tasks [26]. If convergence times are reported at all, they are usually quite high: 3 days on 11 workstations [1] or one week [26, in conversation; 10, in conversation] for the mentioned tasks. These are tasks that can be ....
Darrell Whitley, Stephen Dominic, Rajarshi Das, and Charles W. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284, 1993.
....in principle, of optimizing any classification structure or set of structures. In fact, people have tried optimizing most traditional machine learning structures as well as some nontraditional structures using GAs. These structures have ranged from neural network weights and topologies (Gruau & Whitley, 1993; Whitley et al., 1990, 1991, 1993; Whitley & Schaffer, 1992) to LISP programs (Koza, 1992) to regions of the instance space similar to decision trees induced by a splitting algorithm (Rendell, 1983, 1985; Sikora & Shaw, 1994) to expert-system rules (Montana, 1990) to weights for a game's ....
Whitley, D., S. Dominic, R. Das, and C. W. Anderson. 1993. Genetic reinforcement learning for neurocontrol problems. Machine Learning 13(2/3):259-284.
....on the neural network directly, and rely exclusively on mutation [10, 11, 12, 13] or combine mutation with training [14]. Methods based on genetic algorithms usually represent the structure and the weights of ANNs as a string of bits or as a combination of bits, integers and real numbers [15, 16, 17, 18, 19, 20], and perform the crossover operation as if the network were a linear structure. However, neural networks cannot naturally be represented as vectors. They are oriented graphs, whose nodes are neurons and whose arcs are synaptic connections. Therefore, it is arguable that any efficient approach to ....
D. Whitley, S. Dominic, R. Das, and C. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284, 1993.
....approach to the development of neural networks [6], it misses the exchange of genetic material which is the driving force in GAs. Methods based on genetic algorithms [4] usually represent the structure and the weights of NNs as a string of bits or as a combination of bits and real numbers [7, 8, 9], and perform the crossover operation as if the network were a linear structure. However, neural networks cannot naturally be represented as binary vectors. They are oriented graphs, whose nodes are neurons and whose arcs are synaptic connections. Therefore, it is arguable that any efficient ....
D. Whitley, S. Dominic, R. Das, and C. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284, 1993.
....a more suitable approach to the development of neural networks, it misses the exchange of genetic material which is the driving force in GAs. Methods based on genetic algorithms usually represent the structure and the weights of NNs as a string of bits or as a combination of bits and real numbers [7, 8, 9], and perform the crossover operation as if the network were a linear structure. However, neural networks cannot naturally be represented as binary vectors. They are oriented graphs, whose nodes are neurons and whose arcs are synaptic connections. Therefore, it is arguable that any efficient ....
....and this may be used to reduce or increase the complexity of the network, within predefined limits. 5 Pole balancing problem. To assess the performance of the method proposed, it was applied to the pole balancing problem. This problem is a well-studied benchmark for control methods [17, 18, 8, 19, 20, 21, 22]. The task consists of balancing a pole hinged at the center of a moving cart, by applying a force to the cart exclusively. The pole is only allowed to move in a vertical plane and the cart moves along a one-dimensional track. The system is illustrated in Figure 4. The only forces acting on the ....
D. Whitley, S. Dominic, R. Das, and C. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284, 1993.
.... Chemistry [394, 461]; Journal of Microcomputer Applications [288, 668]; Journal of Qing Hua University [277]; Journal of Systems Engineering [696]; Journal of Technical Physics (Poland) [300]; Journal of the Society of Instrument and Control Engineers [140]; Lancet [332]; Machine Learning [959]; Mem. Tohoku Inst. Technol. I, Sci. Eng. (Japan) [565]; Methods of Information in Medicine [901]; Midwest Symp Circuits Syst [273]; Mini Micro Syst. (China) [438]; Modell Simul Mater Sci Eng [284]; Network: Computation in Neural Systems [836]; Neural Computat. Appl. [558]; Neural Computing ....
....Montes, José Francisco [594, 595, 596, 598]; Alderighi, M. [578]; Alexander, D. M. [279]; Alkadhimi, K. [212]; Allen, E. B. [347, 508]; Alpert, Bradley K. [599, 600, 601]; Altiparmak, Fulya [164]; Amari, Sun-ichi [623]; Anand, Vic [921]; Andersen, Tim L. [223, 235]; Anderson, C. W. [956, 959]; Anderson, Charles W. [132]; Anderson, John [368]; Anderson, P. G. [433]; Anderson, Peter G. [856]; Angeline, Peter J. [13, 135, 315]; Ansari, Nirwan [426]; Anthony, Denis [602]; Aoki, Takeshi [219]; Aoyagi, Yuji [316, 427]; Apolinario Jr., J. A. [317]; Apolinário Jr., J. A. [480]; ....
Darrell Whitley, Stephen Dominic, Rajarshi Das, and C. W. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13(2-3):259-284, 1993.
.... of the Institute of Systems, Control, and Information Engineers (Japan) [205, 380, 996]; Journal of the Operational Research Society [352]; Journal of the Society of Instrument and Control Engineers [375, 509, 571]; KI Lexikon [190]; Kikai Gijutsu Kenkyusho Shoho [997]; Machine Learning [313, 386, 391, 516, 1065]; Machine Learning Journal [216]; Mech. Syst. Signal Process. (UK) [193]; Methods of Information in Medicine [881]; Microprocessing and Microprogramming [1073]; Neural Computing and Applications [847]; Neural Network World [802, 1004]; New Electronics (UK) [799]; New Scientist [271, 661]; ....
....Mamory [51]; Alander, Jarmo T. [27, 28, 29, 30, 22, 23]; Alba Torres, Enrique A. [52, 53, 54]; Aldana Montes, José Francisco [52, 54]; Alippi, Cesare [55]; Allen, Franklin [550]; Allen, L. [152]; Alliot, Jean-Marc [56]; Altman, Erik R. [1026]; Amari, Sun-ichi [121]; Anderson, C. W. [1065]; Anderson, Peter G. [779]; Angeline, Peter J. [57, 817, 818]; Anheyer, Thomas [1035]; Annicchiarico, W. [366]; Anon. [34, 66, 260, 489, 509, 776, 778]; Ara, K. [58]; Arai, Fumihito [326, 340, 341]; Araki, Miyuhiko [995]; Arena, P. [156]; Argast, J. D. [700]; Arnone, Salvatore [1004]; ....
Darrell Whitley, Stephen Dominic, Rajarshi Das, and C. W. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13(2-3):259-284, 1993. (CCA 2398/94, EI M050056/94)
....Keywords: Symbiotic adaptive neuroevolution, coevolution, neural networks, diversity. 1. Introduction. Artificial evolution has become an increasingly popular method for forming control policies in difficult decision problems (Grefenstette, Ramsey, & Schultz, 1990; Moriarty & Miikkulainen, 1996a; Whitley, Dominic, Das, & Anderson, 1993). Such applications are very different from the function optimization tasks to which evolutionary algorithms (EAs) have been traditionally applied. For example, it is no longer desirable to converge the population to the best solution, since convergence will hinder adaptation of the population in ....
....3.1 Evolving Symbiotic Neurons. In almost all approaches to neuroevolution, each individual in the population represents a complete neural network that is evaluated independently of other networks in the population (Belew, McInerney, & Schraudolph, 1991; Koza & Rice, 1991; Nolfi & Parisi, 1992; Whitley et al., 1993). As described in the previous section, by treating each member as a separate, full solution, the EA focuses the search toward a single dominant individual. Such concentration can greatly impede search progress in both complex and dynamic tasks. In contrast, the SANE method restricts the scope of ....
Whitley, D., Dominic, S., Das, R., & Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13, 259-284.
....that crossover is an important mechanism provided by GAs for the exploitation of information on different regions of the search space. Methods based on genetic algorithms usually represent the structure and the weights of NNs as a string of bits or as a combination of bits and real numbers [9, 10, 11], and perform the crossover operation as if the network were a linear structure. However, neural networks cannot naturally be represented as binary vectors. They are oriented graphs, whose nodes are neurons and whose arcs are synaptic connections. Therefore, it is arguable that any efficient ....
D. Whitley, S. Dominic, R. Das, and C. Anderson, "Genetic reinforcement learning for neurocontrol problems," Machine Learning, vol. 13, pp. 259-284, 1993.
....operate on the neural network directly, and rely exclusively on mutation [10, 11, 12, 13] or combine mutation with training [14]. Methods based on genetic algorithms usually represent the structure and the weights of ANNs as a string of bits or as a combination of bits, integers and real numbers [15, 16, 17, 18, 19, 20], and perform the crossover operation as if the network were a linear structure. However, neural networks cannot naturally be represented as vectors. They are oriented graphs, whose nodes are neurons and whose arcs are synaptic connections. Therefore, it is arguable that any efficient approach to ....
D. Whitley, S. Dominic, R. Das, and C. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284, 1993.
....think that crossover is an important mechanism provided by GAs for the exploitation of information on different regions of the search space. Methods based on genetic algorithms usually represent the structure and the weights of NNs as a string of bits or as a combination of bits and real numbers [6, 7], and perform the crossover operation as if the network were a linear structure. However, neural networks cannot naturally be represented as binary vectors. They are oriented graphs, whose nodes are neurons and whose arcs are synaptic connections. Therefore, it is arguable that any efficient ....
....Node a. (b) Node c. (c) New node created to replace node a in the offspring. Note the multiple connection created. 4 Experimental results. To assess the performance of the method proposed, it was applied to the pole balancing problem. This problem is a well-studied benchmark for control methods [7, 14, 15]. The task consists of balancing a pole hinged at the center of a moving cart, by applying a force to the cart exclusively. The pole is only allowed to move in a vertical plane and the cart moves along a one-dimensional track. The only forces acting on the system are a control force applied to the ....
D. Whitley, S. Dominic, R. Das, and C. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284, 1993.
....In the simplest case (see Figure 7), a neural network for the agent's decision policy is represented as a sequence of real-valued connection weights. A straightforward EA for parameter optimization can be used to optimize the weights of the neural network (Belew, McInerney, & Schraudolph, 1991; Whitley, Dominic, Das, & Anderson, 1993; Yamauchi & Beer, 1993). This representation thus requires the least modification of the standard EA. We now turn to distributed representations of policies in EARL systems. 5.2 Distributed Representation of Policies. In the previous section we outlined EARL approaches that treat the agent's ....
....optimize system performance, this method shows much promise for scaling up to realistic tasks. 10.3 Genitor. Genitor (Whitley & Kauth, 1988; Whitley, 1989) is an aggressive, general-purpose genetic algorithm that has been shown effective when specialized for use on reinforcement learning problems. Whitley et al. (1993) demonstrated how Genitor can efficiently evolve decision policies represented as neural networks using only limited reinforcement from the domain. Genitor relies solely on its evolutionary algorithm to adjust the weights in neural networks. In solving RL problems, each member of the population ....
Whitley, D., Dominic, S., Das, R., & Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13, 259-284.
....Figure 3.5: Backpropagation and the n-Bit Adder. This finding is consistent with reports of other research [Lindgren, 1992]. The training of NN connection weights with a GA is generally feasible only for small networks. D. Whitley suggests in [Whitley, 1993] applying GANN systems to control problems that rely on reinforcement learning. Since no target output is given, backpropagation cannot be used as a training method. In general, control problems demand the training of a real-valued function. In the case of the pole balancing problem [Wieland, 1991] ....
Darrell Whitley, Stephen Dominic, Rajarshi Das, and Charles W. Anderson: "Genetic Reinforcement Learning for Neurocontrol Problems", in Machine Learning, 13, p. 259-284, Kluwer.
....generate offspring. Pole balancing has established itself as the standard benchmark for reinforcement learning methods. Unlike many other dynamical systems, it is conceptually simple and intuitive to humans, yet a good representative of real-world control tasks (Barto et al., 1983; Anderson, 1989; Whitley et al., 1993; Pendrith, 1994; Moriarty and Miikkulainen, 1996). However, it is no longer challenging enough for modern reinforcement learning methods, and more difficult variants need to be found. One particularly difficult variant is a cart with two poles of different lengths that have to be balanced ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.
....ability of SANE was compared to those of the best-known reinforcement learning approaches to this problem: the single-layer Adaptive Heuristic Critic (AHC) of Barto et al. (1983), the two-layer Adaptive Heuristic Critic of Anderson (1987, 1989), and the GENITOR neuroevolution system of Whitley et al. (1993). SANE was found to be considerably faster (in CPU time) and more efficient (in training episodes) than the two-layer AHC and GENITOR implementations. Compared to the single-layer AHC, SANE was an order of magnitude faster even though it required more training episodes. The generalization ....
....a string $s$ is selected for mating with probability proportional to $f_s/F$, where $f_s$ is the fitness of string $s$ and $F$ is the average fitness of the population. This strategy can often cause a population to converge prematurely. As the average fitness of the strings increases, the variance in fitness decreases (Whitley 1993). Without sufficient variance between the best- and worst-performing strings, the genetic algorithm will be unable to assign significant bias towards the best strings. By selecting strings based on their rank in the population, the best strings will always receive significant bias over the worst ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.
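The variance argument in the SANE excerpt above can be checked numerically. The Python sketch below is an illustration only, not code from any of the cited papers. It compares fitness-proportionate selection probabilities with linear rank-based probabilities for two hypothetical populations that have the same fitness spread but very different average fitness. The linear-ranking formula and its selective-pressure parameter `sp` are assumptions chosen for the example.

```python
def proportional_probs(fitness):
    """Fitness-proportionate selection: P(s) = f_s / (sum of all fitnesses)."""
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_rank_probs(fitness, sp=1.5):
    """Linear ranking: probabilities depend only on each string's rank.

    sp (1 < sp <= 2) is the expected number of copies of the best string;
    the worst string receives 2 - sp, with linear values in between.
    """
    n = len(fitness)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    probs = [0.0] * n
    for rank, i in enumerate(order):
        probs[i] = (2 - sp + 2 * (sp - 1) * rank / (n - 1)) / n
    return probs


early = [1.0, 2.0, 3.0, 4.0]          # hypothetical early population
late = [101.0, 102.0, 103.0, 104.0]   # same spread, much higher average
for name, pop in (("early", early), ("late", late)):
    p, r = proportional_probs(pop), linear_rank_probs(pop)
    print(name, "proportional best/worst:", round(p[-1] / p[0], 3),
          "rank best/worst:", round(r[-1] / r[0], 3))
```

When run, the proportional best-to-worst probability ratio falls from 4.0 to about 1.03 once the fitness differences are small relative to the average. The rank-based ratio stays at 3.0 (that is, `sp / (2 - sp)`) for both populations.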
....on two populations which interact through their fitness. The first population contains NNs. We use a simple genetic representation of a NN: a linear string of its weights, with weights belonging to links feeding into the same node located next to each other. In accordance with (Paredis 1991) and (Whitley 1993), the weights are encoded directly as real numbers. Furthermore, the weights of the NNs in the initial population are in the range [-1, 1]. This is done in order to keep weight contributions in a range to which the (sigmoidal) transfer function is sensitive (Wieland 1990). The second population consists of ....
Whitley, D. (1993). Genetic Reinforcement Learning for Neurocontrol Problems. Machine Learning, 13: p. 259-284, Kluwer Academic Publishers.
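The direct encoding described in the excerpt above can be sketched as follows. In that encoding, the weights form a linear string of real values, the weights of links feeding the same node sit next to each other, and initial values lie in [-1, 1]. This is a hypothetical Python illustration, not the authors' code. It assumes:

- a fully connected feed-forward network with one hidden layer,
- one bias weight stored after each node's input weights,
- a logistic sigmoid transfer function.

`genome_length`, `random_genome` and `forward` are illustrative names.

```python
import math
import random


def genome_length(n_in, n_hidden, n_out):
    # Each hidden node stores n_in input weights plus a bias; each output
    # node stores n_hidden input weights plus a bias.
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)


def random_genome(n_in, n_hidden, n_out, rng=random):
    # Initial weights drawn uniformly from [-1, 1].
    return [rng.uniform(-1.0, 1.0) for _ in range(genome_length(n_in, n_hidden, n_out))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(genome, inputs, n_hidden, n_out):
    """Decode the flat weight string node by node and run one forward pass."""
    expected = genome_length(len(inputs), n_hidden, n_out)
    if len(genome) != expected:
        raise ValueError(f"expected {expected} weights, got {len(genome)}")
    pos = 0

    def layer(values, n_nodes):
        nonlocal pos
        outputs = []
        for _ in range(n_nodes):
            # Weights of links feeding this node are contiguous, then its bias.
            weights = genome[pos:pos + len(values)]
            bias = genome[pos + len(values)]
            pos += len(values) + 1
            total = sum(w * x for w, x in zip(weights, values)) + bias
            outputs.append(sigmoid(total))
        return outputs

    hidden = layer(list(inputs), n_hidden)
    return layer(hidden, n_out)


genome = random_genome(4, 5, 1, random.Random(0))
print(len(genome), forward(genome, [0.1, -0.2, 0.3, 0.0], n_hidden=5, n_out=1))
```

Because each node's weights are contiguous, a crossover cut that falls between node groups exchanges whole neurons. A cut inside a group mixes the incoming weights of a single neuron.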
....and generalization ability of SANE was contrasted with those of the single-layer Adaptive Heuristic Critic (AHC) of Barto et al. (1983), the two-layer Adaptive Heuristic Critic of Anderson (1987, 1989), the Q-learning method of Watkins and Dayan (1992), and the GENITOR neuroevolution system of Whitley et al. (1993). SANE was found to be considerably faster (in CPU time) and more efficient (in training episodes) than the two-layer AHC, Q-learning, and GENITOR implementations. Compared to the single-layer AHC, SANE was an order of magnitude faster even though it required more training episodes. The ....
....learning approaches to this problem. 5.1. The Inverted Pendulum Problem. The inverted pendulum or pole balancing problem is a classic control problem that has received much attention in the reinforcement learning literature (Anderson, 1989; Barto et al., 1983; Michie and Chambers, 1968; Whitley et al., 1993). A single pole is centered on a cart (figure 2), which may move left or right on a horizontal track. Naturally, any movement of the cart tends to unbalance the pole. The objective is to push the cart either left or right with a fixed-magnitude force such that the pole remains balanced and the ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.
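Several excerpts above describe the pole-balancing benchmark, in which a controller pushes the cart left or right with a fixed-magnitude force. The Python sketch below simulates that task so that a controller's fitness can be taken as the number of time steps it keeps the pole up. It is an illustration under assumptions, not the setup of any cited paper. It uses the frictionless cart-pole equations in the form commonly attributed to Barto et al. (1983), with these commonly used constants:

- a 1 kg cart and a 0.1 kg pole with a 0.5 m half-length,
- 10 N pushes and 0.02 s Euler steps,
- failure beyond 2.4 m of cart travel or 12 degrees of pole angle.

The two policies are hand-written examples.

```python
import math

# Assumed constants (frictionless cart-pole, commonly used values).
GRAVITY = 9.8                      # m/s^2
CART_MASS = 1.0                    # kg
POLE_MASS = 0.1                    # kg
HALF_LENGTH = 0.5                  # m, hinge to the pole's centre of mass
FORCE = 10.0                       # N, fixed magnitude of each push
TAU = 0.02                         # s, Euler integration step
X_LIMIT = 2.4                      # m, track half-width
THETA_LIMIT = 12 * math.pi / 180   # rad, failure angle


def step(state, push_right):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step under a bang-bang force."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if push_right else -FORCE
    total_mass = CART_MASS + POLE_MASS
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS * HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - POLE_MASS * HALF_LENGTH * theta_acc * cos_t / total_mass
    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


def balance_steps(policy, max_steps=10_000, state=(0.0, 0.0, 0.0, 0.0)):
    """Assumed fitness: steps completed before the pole or cart leaves its limits."""
    for t in range(max_steps):
        state = step(state, policy(state))
        x, _, theta, _ = state
        if abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT:
            return t
    return max_steps


def always_right(state):
    return True


def lean_policy(state):
    # Push towards the side the pole is falling to.
    _, _, theta, theta_dot = state
    return theta + 0.5 * theta_dot > 0


print(balance_steps(always_right), balance_steps(lean_policy))
```

An evolved network would be passed in as `policy`, with its output thresholded to choose the push direction. The push-right-only policy fails within a fraction of a second of simulated time, which gives a baseline for comparison.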
....framework frees the implementor from generating training examples and provides a highly adaptive mechanism for dynamic environments. Recent work has shown evolved neurocontrollers to be effective in unstable, dynamic control tasks (Cliff et al. 1993; Moriarty and Miikkulainen 1996; Nolfi et al. 1994; Whitley et al. 1993; Yamauchi and Beer 1993). The bane of the evolutionary methods, however, has been the large number of fitness evaluations that must be performed to achieve a high level of performance. Recently, Moriarty and Miikkulainen (1996) have developed a more efficient neuroevolution system called SANE ....
....SANE to better evaluate subcomponents of the final solution. In the pole balancing benchmark, SANE compared favorably to temporal difference methods such as the Adaptive Heuristic Critic (Barto et al. 1983) and Q-learning (Watkins 1989), as well as to network-level neuroevolution systems such as GENITOR (Whitley et al. 1993). The purpose of this paper is to compare and contrast the merits of evolution at the neuron level and evolution at the network level. SANE's neuron-level approach is very adept at finding quick, effective solutions, but often has difficulty pinpointing the best solutions. The diverse neuron ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.
....ALGORITHMS. Evolutionary methods have found applications that span the range of architectures for intelligent robotics. For example, evolutionary algorithms have been used to learn rule sets for rule-based autonomous agents [2], topologies and weights for neural nets for robotic control [3, 4], fuzzy logic control systems [5], programs for LISP-controlled robots [6], and rules for behavior-based robots [7]. There are at least two different research paradigms evident in the literature of evolutionary algorithms for robotics, which might be called the artificial life (ALife) paradigm, ....
Whitley, D., S. Dominic, R. Das, C. Anderson (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning 13(2/3), 259-284.
....cannot significantly improve the genetic algorithms, as additional unreported experiments suggest. This finding is consistent with reports of other research [Lindgren, 1992]. The training of NN connection weights with a GA is generally feasible only for small networks. D. Whitley suggests in [Whitley, 1993] applying GANN systems to control problems that rely on reinforcement learning. Since no target output is given, backpropagation cannot be used as a training method. In general, control problems demand the training of a real-valued function. In the case of the pole balancing problem [Wieland, 1991] ....
Darrell Whitley, Stephen Dominic, Rajarshi Das, and Charles W. Anderson: "Genetic Reinforcement Learning for Neurocontrol Problems", in Machine Learning, 13, p. 259-284, Kluwer.
....in the problem space that contain above-average solutions. According to Whitley, genetic algorithms are capable of performing a global search of a space because they can rely on hyperplane sampling to guide the search instead of searching along the gradient of a function as backpropagation does (Whitley et al., 1993). Combining genetic algorithms and connectionist networks. There are many interesting possibilities for applying genetic algorithms to connectionist networks. GAs have been used to find good initial network weights, to tune network learning parameters, to determine network structure, to evolve ....
....to connectionist networks. GAs have been used to find good initial network weights, to tune network learning parameters, to determine network structure, to evolve network learning algorithms, and to learn network weights (Belew et al., 1992; Harp et al., 1989; Harvey, 1993b; Chalmers, 1990; Whitley et al., 1993). It is the last option that will be used here: the network architecture is fixed and the GA works to adapt an appropriate set of weights. Applying genetic algorithms to networks has not been as straightforward as other types of GA applications. Traditionally, individuals in GA populations have ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.
....traditional evolutionary approach generates an initial population of NNs with randomly assigned weights. We use a simple genetic representation of a NN: a linear string of its weights, with weights belonging to links feeding into the same node located next to each other. In accordance with [8] and [15], the weights are encoded directly as real numbers. Their value is in the interval [-100, 100]. The FITNESS of a NN is equal to the number of the 200 examples it classifies correctly. After the creation of the initial population, reproduction gives good NNs a higher chance to reproduce. The particular algorithm used here is based on GENITOR ([14], [15]). At each cycle it executes the following steps: 1) two parents are SELECTed (this selection is biased towards the fitter individuals), 2) a child is generated (here, two-point crossover and adaptive mutation [14] are used), 3) its FITNESS is calculated, 4) if this fitness is higher than the ....
Whitley, D. (1993). Genetic Reinforcement Learning for Neurocontrol Problems. Machine Learning, 13: p. 259-284, Kluwer Academic Publishers.
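The four numbered steps in the excerpt above describe a steady-state genetic algorithm that produces one child per cycle. The Python sketch below follows those steps under stated assumptions; it is not GENITOR itself. The following parts are illustrative assumptions:

- the linear rank bias,
- the Gaussian mutation, a stand-in for the adaptive mutation the excerpt cites,
- the toy fitness function `toy_fitness` and its hypothetical `TARGET` weights,
- the replacement rule in step 4. The excerpt is cut off there; here the child replaces the worst member only if it is fitter.

```python
import random


def rank_select(pop, rng, bias=1.5):
    """Pick one genome; pop is sorted best-first and weighted linearly by rank."""
    n = len(pop)
    weights = [bias - (2 * bias - 2) * i / (n - 1) for i in range(n)]
    return rng.choices(pop, weights=weights, k=1)[0][1]


def two_point_crossover(a, b, rng):
    """Child takes parent a's genes outside two cut points and b's inside."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(genome, rng, rate=0.1, scale=0.3):
    """Gaussian noise on a random subset of weights (stand-in for adaptive mutation)."""
    return [w + rng.gauss(0.0, scale) if rng.random() < rate else w for w in genome]


def steady_state_ga(fitness, genome_len, pop_size=30, cycles=2000, seed=0):
    rng = random.Random(seed)
    pop = []
    for _ in range(pop_size):
        genome = [rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
        pop.append((fitness(genome), genome))
    pop.sort(key=lambda p: p[0], reverse=True)
    for _ in range(cycles):
        # 1) two parents, selection biased towards fitter individuals
        parent_a, parent_b = rank_select(pop, rng), rank_select(pop, rng)
        # 2) one child from two-point crossover followed by mutation
        child = mutate(two_point_crossover(parent_a, parent_b, rng), rng)
        # 3) evaluate the child
        f = fitness(child)
        # 4) assumed rule: the child displaces the worst member only if fitter
        if f > pop[-1][0]:
            pop[-1] = (f, child)
            pop.sort(key=lambda p: p[0], reverse=True)
    return pop[0]  # (best fitness, best genome)


TARGET = [0.5, -0.25, 0.75, 0.0]  # hypothetical weights to recover


def toy_fitness(genome):
    return -sum((w - t) ** 2 for w, t in zip(genome, TARGET))


best_fitness, best_genome = steady_state_ga(toy_fitness, len(TARGET))
print(best_fitness, best_genome)
```

Because a child can only displace the current worst member, the best fitness in the population never decreases from one cycle to the next.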
....upon the genetic algorithm will be investigated, along with the validity of the operator productivity metric. 2.4 Hybrid Methods. There is, of course, no reason why the two approaches described here cannot be combined. A hybrid of both the coevolutionary and learning rule approaches was proposed in [35]. This bears many similarities to the co-evolution of the crossover probability studied here; however, only one co-evolution operator was used. This operator increased the crossover probability if the parent was fitter than the child, and decreased the crossover probability otherwise. 2.5 ....
D. Whitley. Genetic Reinforcement Learning for Neurocontrol Problems. Machine Learning, 13:259-284, 1993.
....ROBOTICS APPLICATIONS. Evolutionary methods have found applications that span the range of architectures for intelligent robotics. For example, evolutionary algorithms have been used to learn rule sets for rule-based autonomous agents [4], topologies and weights for neural nets for robotic control [11, 12], fuzzy logic control systems [13], programs for LISP-controlled robots [5], and rules for behavior-based robots [14]. There are at least two different research paradigms evident in the literature of evolutionary algorithms for robotics, which might be called the artificial life (ALife) paradigm, ....
Whitley, D., S. Dominic, R. Das, C. Anderson (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning 13(2/3), 259-284.
....methods such as backpropagation. The evolutionary framework frees the implementor from generating training examples and provides a highly adaptive mechanism for dynamic environments. Recent work has shown evolved neurocontrollers to be effective in several unstable, dynamic control tasks [3, 6, 8, 13]. The bane of the evolutionary methods, however, has been the large number of fitness evaluations that must be performed to achieve a high level of performance. Recently, we have developed a more efficient evolutionary approach called SANE (Symbiotic, Adaptive NeuroEvolution) [6], which explicitly ....
....explicitly decomposes the evolutionary search for a complete solution into several parallel searches for partial solutions. In most approaches to neuroevolution, each individual represents a complete neural network that is evaluated independently of other networks in the population [2, 5, 9, 13]. By treating each member as a separate full solution, the genetic algorithm focuses the search towards a single type of individual, which normally leads to convergence on a single solution [4]. In contrast, SANE's individuals do not represent complete solutions; they represent partial solutions. ....
Darrell Whitley, Stephen Dominic, Rajarshi Das, and Charles W. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259--284, 1993.
....an unmanageable mess. For ERL methods to handle such problems, efficient mechanisms for adaptive behavior based on immediate rewards need to be developed. Several researchers have investigated the use of local learning after each action in an evolutionary algorithm (Grefenstette 1991; Gruau and Whitley 1993; Littman 1995; Nolfi and Parisi 1995). Such approaches have been termed Lamarckian if the local revisions are written back to the genetic chromosomes, or Baldwinian if the revisions are not preserved. Local learning affords the evolutionary algorithm a quicker response to environmental shifts, ....
....is perhaps its most important contribution. In almost all approaches to neuro-evolution, each individual in the genetic population represents a complete neural network that is evaluated independently of other networks in the population (Belew et al. 1991; Koza and Rice 1991; Nolfi and Parisi 1992; Whitley et al. 1993). As described in section 3.2.3, by treating each member as a separate, full solution, the evolutionary algorithm focuses the search towards a single dominant individual. Such concentration can greatly impede search progress in both complex and dynamic tasks. In contrast, the SANE method restricts ....
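The first snippet of this entry distinguishes Lamarckian from Baldwinian local learning. The difference comes down to what happens to the chromosome after learning. The sketch below assumes simple hill climbing as the local learner and a caller-supplied fitness function to be maximized. `local_search` and `evaluate_with_learning` are hypothetical names, and none of the cited works is claimed to use this exact procedure.

```python
def local_search(genome, fitness, step=0.1, iters=20):
    """Hypothetical local learner: coordinate-wise hill climbing on a real-valued genome."""
    best = list(genome)
    best_fit = fitness(best)
    for _ in range(iters):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[i] += delta
                cand_fit = fitness(cand)
                if cand_fit > best_fit:
                    best, best_fit, improved = cand, cand_fit, True
        if not improved:
            break  # local optimum at this step size
    return best, best_fit


def evaluate_with_learning(genome, fitness, mode):
    """Return (chromosome kept in the population, fitness used for selection)."""
    learned, learned_fit = local_search(genome, fitness)
    if mode == "lamarckian":
        # Local revisions are written back into the chromosome.
        return learned, learned_fit
    if mode == "baldwinian":
        # Selection sees the learned fitness, but the chromosome is left unchanged.
        return list(genome), learned_fit
    raise ValueError(f"unknown mode: {mode}")
```

Both modes give selection the same learned fitness. Only the Lamarckian variant passes the improved parameters on to offspring.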
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259--284.
....discovery, they became linked with a reinforcement procedure [21, 5, 17, 11, 10]. But as will be seen below, a GA was, in a sense, already a reinforcement procedure prior to the advent of classifier systems, because it operates on information about the relative performance of potential solutions [32]. Goldberg provides a thorough introduction to both the optimization and reinforcement applications of GAs [16]. Genetic algorithms are based on the theory of natural selection in evolution. GAs work on a population of individuals, where each individual represents a possible solution to the given ....
....in the next generation than individuals with below-average fitness. Therefore, genetic algorithms are capable of performing a global search of a space, because they can rely on hyperplane sampling to guide the search instead of searching along the gradient of a function as backpropagation does [32]. Combining Genetic Algorithms and Neural Networks. There are many interesting possibilities for applying genetic algorithms to neural networks. GAs have been used to find good initial network weights, to tune network learning parameters, to determine network structure, to evolve network learning ....
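The truncated sentence at the start of the second snippet appears to describe individuals with above-average fitness receiving more offspring than those with below-average fitness. A classic way to achieve this is fitness-proportionate (roulette-wheel) selection. The sketch below assumes non-negative fitness values. The snippet does not say which selection scheme the citing paper uses, and `expected_copies` and `roulette_select` are illustrative names.

```python
import random


def expected_copies(fitnesses):
    """Expected offspring per individual under fitness-proportionate selection: f_i / f_avg."""
    avg = sum(fitnesses) / len(fitnesses)
    return [f / avg for f in fitnesses]


def roulette_select(population, fitnesses, k, rng):
    """Draw k parents, each with probability proportional to its (non-negative) fitness."""
    return rng.choices(population, weights=fitnesses, k=k)


parents = roulette_select(["a", "b", "c"], [1.0, 2.0, 3.0], k=6, rng=random.Random(0))
```

An individual whose fitness equals the population average expects exactly one copy. Individuals above the average expect more than one copy, and those below it expect fewer.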
D. Whitley, S. Dominic, R. Das, and C. W. Anderson, "Genetic reinforcement learning for neurocontrol problems," Machine Learning, vol. 13, no. 2/3, pp. 259--284, 1993.
....function is maximized. This function is actually proportional to the number of successful decisions (i.e., actions that do not lead to the receipt of a penalty signal). A previous direct approach to delayed reinforcement problems employs real-valued genetic algorithms to perform the optimization task [16]. In the present study we propose another optimization strategy that is based on the polytope method with random restarts. Details concerning such an approach are presented in the next section, while section 3 provides experimental results from the application of the proposed method to the pole ....
....of optimization approach will be suitable for training. As already stated, a previous reinforcement learning approach that follows a direct strategy employs optimization techniques based on genetic algorithms and provides very good results in terms of training speed (required number of cycles) [16]. In this work, we present a different optimization strategy based on the polytope algorithm [12, 8, 13], which is described next. 2.1 The Polytope Algorithm. The Polytope algorithm belongs to the class of direct search methods for nonlinear optimization. It is also known by the name Simplex, ....
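The polytope (Nelder-Mead simplex) method named in the second snippet needs only function values, not gradients. That is why it suits the direct reinforcement setting described in the first snippet. Below is a minimal sketch of the textbook variant, with reflection, expansion, contraction, and shrink coefficients of 1, 2, 0.5, and 0.5. These coefficients, the initial-simplex step, and the stopping rule are assumptions, and the cited papers may use a different variant. The function minimizes `f`, so a fitness to be maximized would be passed in negated.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=500):
    """Minimize f(x) with the Nelder-Mead polytope (simplex) method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break

        # Centroid of every vertex except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def point(t):
            # Point on the line through the centroid and the worst vertex.
            return [centroid[j] + t * (worst[j] - centroid[j]) for j in range(n)]

        reflected = point(-1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = point(-2.0)
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            # Outside contraction if the reflection beat the worst vertex, else inside.
            t = -0.5 if f_r < values[-1] else 0.5
            contracted = point(t)
            f_c = f(contracted)
            if f_c < min(f_r, values[-1]):
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [[best[j] + 0.5 * (v[j] - best[j]) for j in range(n)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    k = min(range(n + 1), key=lambda k: values[k])
    return simplex[k], values[k]
```

The random restarts mentioned in the first snippet would amount to calling `nelder_mead` from several random starting points `x0` and keeping the best result.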
D. Whitley, S. Dominic, R. Das and C.W. Anderson, Genetic Reinforcement Learning for Neurocontrol Problems, Machine Learning, Vol. 13, pp. 259-284, 1993.
....was supported in part by the National Science Foundation under grant #IRI 9504317. 1 Introduction. Artificial evolution has become an increasingly popular method for forming control policies in difficult decision problems (Grefenstette, Ramsey, & Schultz, 1990; Moriarty & Miikkulainen, 1996a; Whitley, Dominic, Das, & Anderson, 1993). Such applications are very different from the function optimization tasks to which evolutionary algorithms (EA) have traditionally been applied. For example, it is no longer desirable to converge the population to the best solution, since convergence will hinder adaptation of the population in ....
....3.1 Evolving Symbiotic Neurons. In almost all approaches to neuro-evolution, each individual in the population represents a complete neural network that is evaluated independently of other networks in the population (Belew, McInerney, & Schraudolph, 1991; Koza & Rice, 1991; Nolfi & Parisi, 1992; Whitley et al., 1993). As described in the previous section, by treating each member as a separate, full solution, the evolutionary algorithm focuses the search towards a single dominant individual. Such concentration can greatly impede search progress in both complex and dynamic tasks. In contrast, the SANE method ....
Whitley, D., Dominic, S., Das, R., & Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13, 259--284.
....and nondifferentiable space. It does not depend on the gradient information of the error (or fitness) function, and thus is particularly appealing when the gradient information is unavailable or very costly to obtain. For example, the evolutionary approach has been used in reinforcement learning [47, 64, 69, 70, 71], recurrent network learning [45, 64, 72], and higher-order network learning [56, 57]. Moreover, the same evolutionary algorithm can be used to train many different networks, regardless of whether they are feedforward networks, recurrent networks, or higher-order networks. The general ....
D. Whitley, S. Dominic, R. Das, and C. W. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13(2/3):259--284, 1993.
....to be optimized is integer-valued, gradient-based optimization techniques cannot be employed. A previous reinforcement learning approach that follows the direct strategy uses genetic algorithms to perform optimization, with very good results in terms of training speed (required number of cycles) [15]. In our case, we have considered the derivative-free optimization procedures provided by Merlin. Among them, the SIMPLEX method has been found to be very effective. The simplex algorithm (or polytope algorithm) starts with an initial simplex, which is subsequently adapted in order to reach the ....
....(|F| = 10 N) and the controller must decide only the direction of the force at each time step. Obviously, the control problem is more difficult compared to the case where any value for the magnitude is allowed. Details concerning the equations of motion of the cart-pole system can be found in [11, 15, 13]. These motion equations are unknown to the controller. According to the specifications of [15, 11], the action network is a multilayer perceptron with four input units (accepting the system state), one hidden layer with five sigmoid units, and one sigmoid unit in the output layer. There are also ....
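The action network in the last snippet can be sketched as follows. It has four state inputs, five sigmoid hidden units, and one sigmoid output, and it applies a force of fixed magnitude 10 N whose direction is chosen at each step. Several details are assumptions: the flat 31-value genome layout with one bias per unit, the hypothetical `decode` and `action_network` names, and the rule that an output above 0.5 means pushing in the positive direction. The snippet is cut off before describing any further connections.

```python
import math

N_IN, N_HIDDEN = 4, 5
GENOME_LENGTH = N_IN * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1  # 31 values (assumed layout)


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode(genome):
    """Split a flat genome into (hidden weights, hidden biases, output weights, output bias)."""
    if len(genome) != GENOME_LENGTH:
        raise ValueError(f"expected {GENOME_LENGTH} values, got {len(genome)}")
    w_hidden = [genome[i * N_IN:(i + 1) * N_IN] for i in range(N_HIDDEN)]
    split = N_IN * N_HIDDEN
    b_hidden = genome[split:split + N_HIDDEN]
    w_out = genome[split + N_HIDDEN:split + 2 * N_HIDDEN]
    b_out = genome[split + 2 * N_HIDDEN]
    return w_hidden, b_hidden, w_out, b_out


def action_network(state, w_hidden, b_hidden, w_out, b_out, force=10.0):
    """4-5-1 sigmoid MLP that returns +force or -force (bang-bang control)."""
    hidden = [sigmoid(sum(w * s for w, s in zip(row, state)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    y = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    # Only the direction is decided; the magnitude stays fixed at |F| = 10 N.
    return force if y > 0.5 else -force
```

A genetic algorithm working directly on the weights would evolve the 31-value genome, and each candidate would be scored by running `action_network(state, *decode(genome))` in the simulator.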
Whitley D, Dominic S, Das R. and Anderson CW, Genetic Reinforcement Learning for Neurocontrol Problems, Machine Learning vol. 13 pp. 259-284, 1993.
....combined to form better solutions in subsequent generations. In neuro-evolution, the solutions take the form of neural networks. Most approaches to neuro-evolution operate on a population of complete neural networks that are encoded in separate chromosomes (Belew et al. 1991; Koza and Rice 1991; Whitley et al. 1993). By evolving full solutions to the problem (i.e., complete neural networks), the algorithm typically converges the population towards a single dominant individual. Such concentration is desirable if it occurs at the global optimum; however, populations often converge prematurely to a local ....
....generalization ability of SANE was compared to those of the best-known reinforcement learning approaches to this problem: the single-layer Adaptive Heuristic Critic (AHC) of Barto et al. (1983), the two-layer Adaptive Heuristic Critic of Anderson (1989), and the GENITOR neuro-evolution system of Whitley et al. (1993). SANE was found to be considerably faster (in CPU time) and more efficient (in training episodes) than the two-layer AHC and GENITOR implementations. Compared to the single-layer AHC, SANE was an order of magnitude faster, even though it required more training episodes. The generalization ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259--284.
....learning can thus integrate sophisticated behaviors such as obstacle avoidance into a robot arm control policy. Since artificial evolution of neural networks has been shown to be competitive with, and in many cases more efficient than, other reinforcement learning methods (Moriarty and Miikkulainen 1996a; Whitley et al. 1993), the approach in this paper is based on neuro-evolution as the reinforcement learning method. 3 Evolving Neuro-Controllers. Recently there has been much interest in evolving neural networks with genetic algorithms in control tasks (Cliff et al. 1993; Moriarty and Miikkulainen 1996a; Nolfi et al. ....
....approach in this paper is based on neuro-evolution as the reinforcement learning method. 3 Evolving Neuro-Controllers. Recently there has been much interest in evolving neural networks with genetic algorithms in control tasks (Cliff et al. 1993; Moriarty and Miikkulainen 1996a; Nolfi et al. 1994; Whitley et al. 1993; Yamauchi and Beer 1993). Genetic algorithms (Holland 1975; Goldberg 1989) are global search techniques patterned after Darwin's theory of natural evolution. Numerous potential solutions are encoded in strings, called chromosomes, and evaluated in a specific task. Substrings, or genes, of the ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259--284.
....reinforcement learning: methods that learn through temporal differences (Sutton, 1988; Watkins & Dayan, 1992; Kaelbling, Littman, & Moore, 1996) and methods that learn through evolutionary algorithms (Grefenstette, Ramsey, & Schultz, 1990; Holland & Reitman, 1978; Moriarty & Miikkulainen, 1996a; Whitley, Dominic, Das, & Anderson, 1993; Wilson, 1994). This paper adopts an evolutionary algorithm as the primary reinforcement learning mechanism, but also employs a technique similar to temporal difference learning for smaller strategy refinements. 3 An Approach to Learning Lane Selection Strategies. To test our hypothesis that ....
Whitley, D., Dominic, S., Das, R., & Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13, 259--284.
....used to study the difficult credit assignment problem that arises when performance feedback is provided only by a failure signal. This problem has often been used to test new approaches to learning control (from early work by Widrow and Smith, 1964, to recent studies such as Jordan and Jacobs, 1990, and Whitley, Dominic, Das, and Anderson, 1993). It involves a pendulum hinged to the top of a wheeled cart that travels along a track of limited length. The pendulum is constrained to move within the vertical plane. The state is specified by the position and velocity of the cart and the angle between the pendulum and vertical and the angular ....
D. Whitley, S. Dominic, R. Das, and C. Anderson. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, to appear.
....410 | 510 | 2 poles, 3 inputs | 840,000 | 300. Table 1: Results of the experiments. The Learning column reports the average number of evaluations needed to find the solution; the Gen. column reports the performance in generalization. We ran a generalization test based on the one used by Whitley et al. [7], in which 625 initial settings of the cart and the pole are generated. Each of the 4 normalized input variables, representing cart position, cart velocity, pole position, and pole velocity, takes the following 5 values: 0.05, 0.25, 0.5, 0.75, 0.95. Note that these values actually scale to ....
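The 625 start states follow from taking every combination of the five listed values for each of the four normalized variables (5^4 = 625). Here is a minimal sketch. `balances` is a hypothetical callable that runs a trained controller from a given start state and reports success. The snippet does not say how long a controller must balance, so that criterion is left to the caller.

```python
from itertools import product

# Five normalized levels per state variable, as listed in the snippet.
LEVELS = (0.05, 0.25, 0.5, 0.75, 0.95)


def generalization_start_states():
    """Every (cart position, cart velocity, pole position, pole velocity)
    combination over the five levels: 5 ** 4 = 625 start states."""
    return list(product(LEVELS, repeat=4))


def generalization_score(balances):
    """Count the start states from which the hypothetical `balances` callable succeeds."""
    return sum(1 for state in generalization_start_states() if balances(state))
```

The values are normalized, so a real test would first map each level back to physical units. The snippet is cut off before giving that scaling.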
Whitley, D., Dominic, S., Das, R. and Anderson, C. (1993) Genetic reinforcement learning for neurocontrol problems, Machine Learning 13:259--284.
....reproduce results similar to Wieland's for 1 and 2 poles and also evolve cellular encodings for neural networks to solve these same problems. Cellular encoding is a language for local graph transformations that controls the division of cells that grow an artificial neural network (ANN) (Gruau and Whitley, 1993; Gruau, 1994). Earlier versions of the cellular encoding language described only Boolean neural networks. In the current version of cellular encoding, real-valued weights are evolved. In addition, some constraints are imposed on the size of networks during the development process. Our results ....
....in this paper as well as the use of incremental evaluation functions that utilize multiple start states. 2 Cellular Encoding with real weights. Cellular encoding is a language for local graph transformations that controls the division of cells which grow into an artificial neural network (Gruau and Whitley, 1993; Gruau, 1994). Each cell has an input site and an output site and can be linked to other cells. A cell also possesses a list of internal registers that represent local memory. The registers contain neuron attributes such as weights or the threshold. The graph transformations can be classified ....
Whitley, D., Dominic, S., Das, R., and Anderson, C. (1993). Genetic Reinforcement Learning for Neurocontrol Problems. Machine Learning, 13:259--284.
....to study the difficult credit assignment problem that arises when performance feedback is provided only by a failure signal. This problem has often been used to test new approaches to learning control (from early work by Widrow and Smith, 1964, to recent studies such as Jordan and Jacobs, 1990, and Whitley, Dominic, Das, and Anderson, 1993). It involves a pendulum hinged to the top of a wheeled cart that travels along a track of limited length. The pendulum is constrained to move within the vertical plane. The state is specified by the position and velocity of the cart and the angle between the pendulum and vertical and the angular ....
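For readers unfamiliar with the cart-pole task described in this snippet, here is a minimal simulation sketch. It uses the frictionless form of the cart-pole equations of motion with commonly used parameter values: cart mass 1.0 kg, pole mass 0.1 kg, half-pole length 0.5 m, and a 0.02 s Euler step. It also uses the usual failure bounds of ±2.4 m for the cart and ±12 degrees for the pole. These values are assumptions and are not taken from the cited paper, which may include friction terms or other settings.

```python
import math

GRAVITY = 9.8        # m/s^2
CART_MASS = 1.0      # kg
POLE_MASS = 0.1      # kg
HALF_LENGTH = 0.5    # m, half the pole length
TAU = 0.02           # s, Euler integration step
TOTAL_MASS = CART_MASS + POLE_MASS


def step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one explicit Euler step; friction is ignored."""
    x, x_dot, theta, theta_dot = state
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS * HALF_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLE_MASS * HALF_LENGTH * theta_acc * cos_t / TOTAL_MASS
    return (x + TAU * x_dot, x_dot + TAU * x_acc,
            theta + TAU * theta_dot, theta_dot + TAU * theta_acc)


def failed(state):
    """Failure signal: the cart leaves the track or the pole falls past 12 degrees."""
    x, _, theta, _ = state
    return abs(x) > 2.4 or abs(theta) > math.radians(12)
```

The only feedback a learner receives is `failed(...)` becoming true, possibly many steps after the action responsible. This delay is the credit assignment difficulty the snippet describes.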
D. Whitley, S. Dominic, R. Das, and C. Anderson. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, to appear.
....to determine an optimal policy which can determine the actions that the agent is going to take given each state. Q-learning is a family of reinforcement learning algorithms initially developed by Watkins [6]. The prevalence of these algorithms is partially due to the existence of convergence proofs [7]. In Q-learning, the predicted long-term cumulative reinforcement, called the Q-value, is a function of actions as well as input states. The Q function acts as an evaluation function that predicts the discounted cumulative reinforcement. The action selection policy is based on the Q-value and the ....
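The Q-value described here is learned with Watkins' one-step update, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. In this rule, γ is the discount factor behind the "discounted cumulative reinforcement" in the snippet. A minimal tabular sketch follows. The learning rate α, the default γ, and the purely greedy `greedy_action` helper (no exploration) are illustrative assumptions, since the snippet is cut off before describing its action selection policy.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]


def greedy_action(Q, state, actions):
    """Pick the action with the highest current Q-value (hypothetical helper, no exploration)."""
    return max(actions, key=lambda a: Q[(state, a)])


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

The table is keyed by (state, action) pairs, which reflects the snippet's point that the Q-value is a function of actions as well as input states.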
Whitley, Dominic, Das, and Anderson. Genetic reinforcement learning for neurocontrol problems. Technical Report CS92-102, CSU, 1992.
Whitley D., Dominic S., Das R., Anderson C., "Genetic reinforcement learning for neurocontrol problems", Machine Learning, 13, p. 259-284, 1993.
D. Whitley, S. Dominic, R. Das, and C. W. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259--284, 1993.
Whitley, D., Dominic, S., Das, R., and Anderson, C. W.: "Genetic Reinforcement Learning for Neurocontrol Problems", Machine Learning, Vol. 13, pp. 259-284, Kluwer Academic Publishers, 1993.
Darrell Whitley, S. Dominic, R. Das, and C. Anderson. Genetic Reinforcement Learning for Neurocontrol Problems. Machine Learning, 1992.
CiteSeer.IST - Copyright Penn State and NEC