Gullapalli, V. Reinforcement Learning and its Application to Control. PhD thesis, University of Massachusetts, Amherst, MA, 1992.
....and thus to reduce the amount of exploration necessary with the real robot [105, 80]. Off-line learning, however, depends heavily on the quality of the model. Moreover, while off-line learning can improve learning speed, it does not generally address safety. To cope with these requirements, shaping [47] and teacher inputs [25] have been used to guide exploration towards a good control policy. However, these techniques have not yet produced formal methods to facilitate the interaction with the learning system and the exploration process. 1.2.5 Hybrid Discrete-Continuous Systems. Research in ....
....knowledgeable outside teacher provides a powerful means of speeding up the learning process [24]. In these techniques, outside information about correct action strategies makes it possible to shortcut the exploration process. Shaping is another means of guiding a reinforcement learning process in robots [47] as well as in biological organisms. Here, the reward function is modified incrementally, from a very general reinforcement that rewards the system if it partially performs the task to a more specific reinforcement defining the complete task. This permits the system to efficiently discover an ....
Gullapalli, V. Reinforcement Learning and its Application to Control. PhD thesis, University of Massachusetts, Amherst, MA, 1992.
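The shaping idea quoted above is that the reward function is tightened in stages, from one that rewards partial performance to one that defines the complete task. The following is a minimal Python sketch of that schedule. The one-dimensional reaching task, the tolerance values, and the names `make_reward` and `shaped_reward` are all hypothetical illustrations and are not taken from the thesis.

```python
# Minimal sketch of shaping as a schedule of reward functions.
# Assumption: a hypothetical 1-D reaching task in which the agent ends at
# `position` and should be at `target`; the tolerances are illustrative only.

def make_reward(tolerance: float):
    """Return a reward function paying 1.0 when the agent ends within
    `tolerance` of the target and 0.0 otherwise."""
    def reward(position: float, target: float) -> float:
        return 1.0 if abs(position - target) <= tolerance else 0.0
    return reward

# Stages run from a very general reinforcement (large tolerance, so partial
# performance is rewarded) to the reinforcement defining the full task.
SHAPING_TOLERANCES = [1.0, 0.5, 0.1, 0.01]

def shaped_reward(stage: int):
    """Reward function for a shaping stage; stages past the end use the final task."""
    stage = min(stage, len(SHAPING_TOLERANCES) - 1)
    return make_reward(SHAPING_TOLERANCES[stage])

if __name__ == "__main__":
    # The same behaviour (ending 0.3 away from the target) is rewarded in the
    # early stages but not after the task specification has been tightened.
    for stage in range(len(SHAPING_TOLERANCES)):
        r = shaped_reward(stage)(position=0.3, target=0.0)
        print(f"stage {stage}: tolerance={SHAPING_TOLERANCES[stage]}, reward={r}")
```

Because the same behaviour earns different reinforcement at different stages, the learner faces a non-stationary reward. One later context in this list raises exactly this point about shaping.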
....(Heise, 1989; Kaiser and Kreuziger, 1994). The approaches to skill learning mainly aim at identifying a control function for a given task. They can be found both in the robotics community (Asada and Yang, 1989; Delson and West, 1993; Liu and Asada, 1993) and in the machine learning community (Gullapalli, 1992; Lin, 1993). In general, all works share the same principal view of skills as the ability of the robot to safely change the world from a given state to a defined one in the presence of uncertainty, with the individual control functions applied using only initialization data and direct sensorial ....
Gullapalli, V. (1992). Reinforcement Learning and its application to control. PhD thesis. University of Massachusetts, Department of Computer and Information Science.
....state and action as input and Q-value as the output. Training such a network is not conceptually difficult, but using the network to find the optimal action can be a challenge. One method is to do a local gradient ascent search on the action in order to find one with high value (Baird & Klopf, 1993). Gullapalli (1990, 1992) has developed a neural reinforcement learning unit for use in continuous action spaces. The unit generates actions with a normal distribution; it adjusts the mean and variance based on previous experience. When the chosen actions are not performing well, the variance is high, resulting in ....
Gullapalli, V. (1992). Reinforcement learning and its application to control. Ph.D. thesis, University of Massachusetts, Amherst, MA.
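The context above mentions one way to pick an action from a network that maps (state, action) to a Q-value: a local gradient ascent search over the action. The following is a generic Python sketch of that idea, not the cited authors' method. The function `q_value` is a hypothetical stand-in for a trained network, and the gradient is estimated by central finite differences. The step size and the action bounds are assumptions.

```python
import math

def q_value(state: float, action: float) -> float:
    """Hypothetical stand-in for a trained Q network; it peaks at action = sin(state)."""
    return -(action - math.sin(state)) ** 2

def greedy_action(q, state, action0=0.0, step=0.1, iters=200, eps=1e-4,
                  low=-1.0, high=1.0):
    """Local gradient ascent on the action for a fixed state.

    The gradient dQ/da is estimated with a central finite difference, so any
    black-box q(state, action) works. After each step the action is clipped
    to [low, high], an assumed action range.
    """
    a = action0
    for _ in range(iters):
        grad = (q(state, a + eps) - q(state, a - eps)) / (2 * eps)
        a = min(high, max(low, a + step * grad))
    return a

if __name__ == "__main__":
    print(greedy_action(q_value, state=0.5))
```

The search is only local. For a Q-function with several peaks in the action dimension, it returns whichever maximum is uphill from `action0`. That limitation is one reason the quoted text calls action selection "a challenge".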
....information about the task to stop exploring and to start relying on the domain decision policy it has learned. Much of the research in reinforcement learning is aimed at solving these two problems (Sutton, 1984; Kaelbling, 1990; Barto, Bradtke & Singh, 1991; Gullapalli, 1992; Whitehead, 1992; Thrun & Moller, 1992, and many more). The remainder of this section describes three approaches to reinforcement learning. First, another of Samuel's checkers players will be explored. Then, two particular reinforcement learning algorithms that have been applied to many ....
....and the identification of the subtasks to the learning agent. Shaping, a method used in operant conditioning (Skinner, 1938), has recently been adopted as a method for training automated learning agents. The learning agent is trained on increasingly complex approximations of the required task (Gullapalli, 1992), and would not be expected to learn to perform the task without this aid. In this case, the human expert specifies the approximations that allow the learning agent to learn its task. Task decomposition and shaping each take advantage of human expertise to lessen the difficulties of learning in ....
Gullapalli, Vijaykumar (1992). Reinforcement learning and its application to control. Doctoral dissertation, Computer and Information Science, University of Massachusetts, Amherst, MA.
....a fine level of control is required, the boxes are made smaller, which considerably increases the number of states. As the number of states grows, the computational overhead increases and the system suffers. An alternative is the Stochastic Real-Valued (SRV) approach proposed by Gullapalli (1990, 1992), which takes real values of variables directly from the environment and uses these to determine the required control action. The method does not suffer from heavy computational burdens and permits continuous values for the control action. The SRV approach attempts to determine the optimal control ....
Gullapalli, V. (1992). Reinforcement Learning and its Application to Control, University of Massachusetts.
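Several contexts in this list describe the same mechanism. An SRV unit:
- draws a real-valued action from a normal distribution;
- moves the mean toward actions that earned more reinforcement than expected;
- keeps the spread wide while performance is poor.

The following is a minimal Python sketch under stated assumptions:
- the mean and a reinforcement predictor r_hat are both linear in the input vector;
- reinforcement lies in [0, 1];
- the standard deviation is `sigma_scale * (1 - r_hat)` with a small floor;
- the updates follow the form in which the SRV rule is commonly summarized.

The class name, learning rates, and the toy task in `train_on_target` are hypothetical and are not taken from the thesis.

```python
import random

class SRVUnit:
    """Sketch of a Stochastic Real-Valued (SRV) unit with a linear mean and a
    linear reinforcement predictor over an input vector x (assumed design)."""

    def __init__(self, n_inputs, alpha=0.1, rho=0.1, sigma_scale=1.0, seed=0):
        self.theta = [0.0] * n_inputs  # weights for the action mean
        self.w = [0.0] * n_inputs      # weights for the expected reinforcement
        self.alpha, self.rho, self.sigma_scale = alpha, rho, sigma_scale
        self.rng = random.Random(seed)

    @staticmethod
    def _dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def expected_reinforcement(self, x):
        # r_hat is clipped to [0, 1], the assumed range of the reinforcement signal.
        return min(1.0, max(0.0, self._dot(self.w, x)))

    def sigma(self, r_hat):
        # Exploration is large when expected reinforcement is low and shrinks
        # as it rises; the floor keeps the standard deviation positive.
        return max(1e-3, self.sigma_scale * (1.0 - r_hat))

    def act(self, x):
        mu = self._dot(self.theta, x)
        sigma = self.sigma(self.expected_reinforcement(x))
        return self.rng.gauss(mu, sigma), mu, sigma

    def update(self, x, action, mu, sigma, r):
        delta = r - self.expected_reinforcement(x)  # better or worse than expected
        for i, xi in enumerate(x):
            # Shift the mean toward the tried action if it beat expectations,
            # away from it otherwise; also move r_hat toward the observed r.
            self.theta[i] += self.alpha * delta * ((action - mu) / sigma) * xi
            self.w[i] += self.rho * delta * xi

def train_on_target(target=0.7, steps=5000, seed=42):
    """Hypothetical toy task: one constant input; reward peaks at action == target."""
    unit = SRVUnit(n_inputs=1, seed=seed)
    x = [1.0]
    for _ in range(steps):
        a, mu, sigma = unit.act(x)
        r = max(0.0, 1.0 - abs(a - target))
        unit.update(x, a, mu, sigma, r)
    return unit

if __name__ == "__main__":
    unit = train_on_target()
    print("learned mean:", unit.theta[0])
```

This sketch keeps a single continuous action and never discretizes the state into boxes. That is the property the context above contrasts with box-based methods.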
....learning schemes do not provide a mechanism to support the explicit incorporation of knowledge about the task and the robot mechanism into the learning and exploration process. While there have been attempts to incorporate such information implicitly in the form of shaping procedures [9], or by means of teaching [3], most such systems require all knowledge to be designed into the behaviors or composition mechanisms a priori. The approach presented in this paper addresses the problems of complexity and safety, and provides a means of introducing domain knowledge by means of a ....
V. Gullapalli. Reinforcement Learning and its Application to Control. PhD thesis, Univ. of Mass., Amherst, MA, 1992.
....discretization (with grids or triangulations) or general approximation (such as neural networks, polynomial functions, fuzzy sets, etc.) methods. RL algorithms for continuous state spaces have been implemented with neural networks (see for example (Barto, Sutton & Anderson, 1983), (Barto, 1990), (Gullapalli, 1992), (Williams, 1992), (Lin, 1993), (Sutton & Whitehead, 1993), (Harmon, Baird & Klopf, 1996) and (Bertsekas & Tsitsiklis, 1996)), fuzzy sets (see (Nowé, 1995), (Glorennec & Jouffe, 1997)), approximators based on state aggregation (see (Singh, Jaakkola & Jordan, 1994)), clustering (see (Mahadevan ....
Gullapalli, V. (1992). Reinforcement Learning and its application to control. Ph.D.
....without the direct influence of an outside teacher, making the reinforcement learning paradigm an attractive option, since it allows sequences of behavior to be learned from simple reinforcement signals [1, 17]. However, while these techniques have been applied to simple robot systems and in simulation [2, 5, 7, 10, 11, 12, 6], the complexity of the primitive action and state spaces of most robots leads to a need for large amounts of experience to learn a given task, thus rendering these methods impracticable for on-line learning on such systems. Furthermore, most such learning systems do not provide a means for ....
V. Gullapalli. Reinforcement Learning and its Application to Control. PhD thesis, University of Massachusetts, Amherst, MA, 1992.
....Obviously (see, for instance, [63]). Reinforcement Learning tackles problems that are also closely related to those of adaptive control (e.g. [29]). Formalisms such as Albus's CMAC ([1]) have already been adopted by the RL community ([37]). Gullapalli's work also exploits these similarities ([21]). 7 Conclusion. In this paper the problem of automated synthesis of a control program for a robot starting from examples of correct behaviour supplied by a human teacher was addressed. The problem is interesting because of the impact it could have on the technology of the field. We have shown how ....
V. Gullapalli. Reinforcement Learning and its application to control. PhD thesis, University of Massachusetts, Department of Computer and Information Science, 1992.
....have a biological plausibility that backpropagation seems to lack [32], and they are applicable to problems more difficult than supervised learning. The two approaches can be effectively combined by using a backpropagation network with associative learning automata as output units. Gullapalli [45] applied this architecture successfully to a difficult robotics problem. He used SRV (Stochastic Real-Valued) units, a generalization of the A_R-P (associative reward-penalty) unit that produces real-valued outputs. 3.2 Sequential Reinforcement Learning. Sequential decision problems differ from single-stage problems in ....
Gullapalli, V. Reinforcement Learning and Its Application to Control. PhD thesis, University of Massachusetts, 1992. COINS Technical Report 92-10.
....with the same generality and under similar constraints. Indeed, there is some evidence that RL algorithms may be faster than their only known competitor that is applicable with the same level of generality, namely classical DP methods (Barto and Singh [12, 11]; Moore and Atkeson [79]; Gullapalli [48]). The second misconception is the view that RL algorithms can only be used as weak methods. This misconception was perhaps generated inadvertently by the early developmental work on RL that used, as illustrations, applications with very little domain knowledge (Barto et al. [13]; Sutton [106] ....
....MDT with (R + γ[P]V) playing the role of the immediate payoff function R (cf. Equation 2.2). If the size of the action set A is large, finding the best action in a state can itself become computationally expensive and is the subject of current research (e.g. Gullapalli [48]). .... complete agents in real-life environments. This dissertation will deal exclusively with learning tasks that can be formulated as MDTs. In particular, the next two chapters will present theoretical results about the application of DP and RL algorithms to abstract MDTs without reference to ....
V. Gullapalli. Reinforcement Learning and its application to control. PhD thesis, University of Massachusetts, Amherst, MA 01003, 1992.
....transition function of the system in order to determine the optimal control law. IDP algorithms such as Q-learning or TD(λ), on the other hand, do not require such a model. Under what conditions would it be advantageous to use a model-free method instead of a model-based method? Gullapalli [31] discusses these questions in detail. We will only present an overview here, under the assumption that no model is initially available. If the model-based approach is chosen, then it will first be necessary to build a model. There are two primary arguments for taking a model-based approach. First, ....
....Quadratic control problem. The primary argument for taking the model-free approach is that it may be less expensive to find the optimal (or at least an acceptable) controller through direct interaction with the system than by building a model and then deriving a controller from the model. Gullapalli [31] describes experiments that support this argument. A model of the state transition function may be relatively easy to build for some dynamic systems, such as linear systems. However, an accurate model may be very difficult to obtain for other systems, such as financial markets, home heating and ....
Gullapalli, V. Reinforcement Learning and Its Application to Control. PhD thesis, University of Massachusetts, Amherst, MA, 1992.
....Figure 5: Learning time for different values of λ for accumulating eligibility traces (left) and replacing traces (right). Each point is an average of 30 simulations. ....not eliminate the need for solving the delayed reinforcement problem. Gullapalli has studied two implementations of shaping [Gullapalli, 1992]. In the first, the complexity of the control task is gradually increased during learning, and the reinforcement function used is changed accordingly. In this way, most of a training run is used in learning the approximation to the current target behavior. This system was used to make a simulated ....
....be considered a very successful example of the use of shaping. Self-play is a sort of shaping, since at first the agent plays against a nearly random opponent and thereby solves an easy task. The complexity of the task then grows as the agent gets better at playing. In Gullapalli's experiments [Gullapalli, 1992] and Selfridge, Sutton and Barto's [Selfridge et al. 1985], as well as in Dorigo, Colombetti and Borghi's [Colombetti et al. 1996, Dorigo and Colombetti, 1997], the agent received a different reinforcement signal over time for the same behavior. This is not in agreement with the original ....
Gullapalli, V. (1992). Reinforcement Learning and Its Application to Control. PhD thesis, University of Massachusetts. COINS Technical Report 92-10.
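The figure caption quoted above compares accumulating and replacing eligibility traces. The following Python sketch shows the usual tabular bookkeeping for the two variants. On each step:
- every trace decays by gamma * lambda;
- the visited state's trace is then either incremented by 1 (accumulating) or reset to 1 (replacing).

The function name and the dictionary representation are assumptions made for illustration. The sketch is not code from the cited work.

```python
def decay_and_visit(traces, visited, gamma, lam, replacing):
    """One step of tabular eligibility-trace bookkeeping (illustrative sketch).

    traces:    dict mapping state -> current trace value (updated in place)
    visited:   the state visited on this step
    replacing: True for replacing traces, False for accumulating traces
    """
    for s in traces:
        traces[s] *= gamma * lam          # every trace decays each step
    if replacing:
        traces[visited] = 1.0             # replacing: reset to 1
    else:
        traces[visited] = traces.get(visited, 0.0) + 1.0  # accumulating: add 1
    return traces

if __name__ == "__main__":
    acc, rep = {}, {}
    for _ in range(3):  # revisit the same state three times in a row
        decay_and_visit(acc, "A", gamma=1.0, lam=0.5, replacing=False)
        decay_and_visit(rep, "A", gamma=1.0, lam=0.5, replacing=True)
    print(acc, rep)
```

With repeated visits, an accumulating trace can grow above 1, while a replacing trace is capped at 1. This difference is why the two variants can respond differently to λ in learning-time curves like those described in the caption.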
....subsequent process of determining an optimal policy because it does not involve the temporal credit assignment problem. However, it is important to note that determining an optimal policy from the optimal value function can also be computationally intensive, particularly for large action sets. See Gullapalli (1992) for a discussion of the scaling issues involved in deriving the optimal policies for MDTs with large action sets. 2 SCALING ISSUES FOR LEARNING ALGORITHMS BASED ON DYNAMIC PROGRAMMING. Adaptive critic architectures based on Watkins's (1989) Q-learning algorithm, or on Sutton's (1988) temporal ....
Gullapalli, V. (1992). Reinforcement Learning and its application to control. PhD thesis, University of Massachusetts, Amherst, MA 01003.
....some problems due to the resolution of the HJB equation are highlighted and references to viscosity solutions are indicated. A number of reinforcement learning algorithms have been implemented with neural networks or other approximation systems (see Barto, Sutton and Anderson (1983), Barto (1990), Gullapalli (1992), Lin (1993) and many others). However, as Baird (1995) has pointed out, these learning methods do not converge in general. The residual-gradient advantage updating proposed by Harmon, Baird and Klopf (1996) is a convergent algorithm in the sense of the convergence of gradient descent ....
Gullapalli, V.: 1992, Reinforcement Learning and its application to control, PhD thesis, University of Massachusetts, Amherst.
....the state and action as input and Q-value as the output. Training such a network is not conceptually difficult, but using the network to find the optimal action can be a challenge. One method is to do a local gradient ascent search on the action in order to find one with high value [5]. Gullapalli [31, 32] has developed a neural reinforcement learning unit for use in continuous action spaces. The unit generates actions with a normal distribution; it adjusts the mean and variance based on previous experience. When the chosen actions are not performing well, the variance is high, resulting in ....
Vijay Gullapalli. Reinforcement learning and its application to control. PhD thesis, University of Massachusetts, Amherst, MA, 1992.
....intense exploration to find a route out of the almost entirely enclosed start region. Having eventually reached a sufficiently high resolution, it discovers the gap and proceeds greedily towards the goal, only to be temporarily blocked by the goal's barrier region. (c) The second trial. Gullapalli [43, 44] has developed a neural reinforcement learning unit for use in continuous action spaces. The unit generates actions with a normal distribution; it adjusts the mean and variance based on previous experience. When the chosen actions are not performing well, the variance is high, resulting in ....
Vijay Gullapalli. Reinforcement learning and its application to control. PhD thesis, University of Massachusetts, Amherst, MA, 1992.
....some kind of generalization. Generalization replaces costly training experience. Indeed, generalizing approximation techniques such as artificial neural networks and instance based methods have been used in practice with some remarkable success in domains such as game playing [17, 2] and robotics [5, 8, 10]. For example, Tesauro [17] reports a backgammon computer program that reaches grand master level of play, which has been constructed using a combination of reinforcement learning techniques and artificial neural networks. Despite these encouraging empirical results, little is known about the ....
V. Gullapalli, Reinforcement Learning and its Application to Control. PhD thesis, Department of Computer and Information Science, University of Massachusetts, 1992.
....and action as input and Q-value as the output. Training such a network is not conceptually difficult, but using the network to find the optimal action can be a challenge. One method is to do a local gradient ascent search on the action in order to find one with high value (Baird & Klopf, 1993). Gullapalli (1990, 1992) has developed a neural reinforcement learning unit for use in continuous action spaces. The unit generates actions with a normal distribution; it adjusts the mean and variance based on previous experience. When the chosen actions are not performing well, the variance is high, resulting in ....
Gullapalli, V. (1992). Reinforcement learning and its application to control. Ph.D. thesis, University of Massachusetts, Amherst, MA.
....which the controller was trained using the appropriate evaluation criterion at each time step as described above. The sensory feedback to the controller was also updated at each time step. Details of the implementation of the controller and the simulation of the calculator hand system are given in [3]. For the purpose of comparison, we also attempted to train a controller on the key pressing task without resorting to shaping. An identical training procedure was followed in this case, with the sole modification being the use of a single evaluation criterion that only rewarded pressing of the ....
V. Gullapalli. Reinforcement Learning and its application to control. PhD thesis, University of Massachusetts, Amherst, MA 01003, 1992.
No context found.
V. Gullapalli. Reinforcement Learning and its application to control. PhD thesis, University of Massachusetts, Amherst, MA 01003, 1992.
No context found.
V. Gullapalli, 1992a, Reinforcement learning and its application to control. Technical Report, COINS, 92-10, Ph. D. Thesis, University of Massachusetts, Amherst, MA, USA.
No context found.
Gullapalli, V. (1992). Reinforcement Learning and Its Application to Control.
No context found.
Vijay Gullapalli. Reinforcement Learning and its application to control. PhD thesis, University of Massachussetts, Amherst., 1992.
CiteSeer.IST - Copyright Penn State and NEC