S. Schaal and C. Atkeson. Robot juggling: An implementation of memory-based learning. In IEEE Control Systems, volume 14, pages 57--71, 1994.
....is tedious, we may not know the environment ahead of time, and we want the robots to adapt their behaviours as the environment changes. Reinforcement learning (RL) [10] has been used by a number of researchers as a computational tool for building robots that improve themselves with experience [11, 16, 17, 19, 27]. Strictly speaking, reinforcement learning is a problem formulation. It defines the interaction between a learning agent and its environment in terms of states, actions, and rewards. A reinforcement learning agent improves its performance on sequential tasks using reward and punishment received ....
S. Schaal and C. G. Atkeson. Robot juggling: An implementation of memory-based learning. Control Systems Magazine, 14, 1994.
....that work concurrently, where the output from a unit can subsume the output of other units or alter the activities of other units. The composition of these units must be carefully designed to achieve the required behavior. Reactive systems have been used successfully in many robotic systems [6,7,8,9,10], including multi-agent robots or robots that work in teams, such as [11]. Koza used Genetic Programming [12] to evolve programs to control robots to perform the desired task. Robot programs can be regarded as a plan. The structure of a plan is in the form of a tree where the internal nodes are ....
Schaal, S. and Atkeson, C.G. Robot Juggling: Implementation of Memory-Based Learning, IEEE Control Systems, vol. 14, no. 1 (Feb. 1994) 57-71.
....has been successfully applied to numerous motor control problems, either simulated or real [1, 6, 11, 19]. These results are rather spectacular, but successes are restricted to the control of mechanical systems with few degrees of freedom, such as the cart-pole task or the acrobot. Schaal et al. [15, 16] successfully taught robots with many degrees of freedom to perform motor tasks, but the learning consisted only in estimating a model of state dynamics, and no value function was estimated (a linear quadratic regulator was used to generate controls). As far as I know, no value function with more ....
Stefan Schaal and Christopher G. Atkeson. Robot juggling: An implementation of memory-based learning. Control Systems Magazine, 14:57--71, 1994.
....real. Morimoto and Doya [42] combined simulated experience with real experience to teach a robot to stand up using the Q-learning algorithm. Schaal and Atkeson also successfully used model-based reinforcement learning in their robot juggling experiments [59]. Almost all reinforcement learning algorithms rely on estimating value functions that indicate how good it is to be in a given state (in terms of total expected long-term reward) or how good it is ....
.... and gradually build a strategy that tends to obtain a maximum reward [67, 33]. These algorithms have been successfully applied to complex problems such as board games [69], job shop scheduling [80], elevator dispatching [20] and, of course, motor control tasks, either simulated [66, 24] or real [41, 59]. Model-Based versus Model-Free: These reinforcement learning algorithms can be divided into two categories: model-based (or indirect) algorithms, which use an estimation of the system's dynamics, and model-free (or direct) algorithms, which do not. ....
[Article contains additional citation context not shown here]
Stefan Schaal and Christopher G. Atkeson. Robot juggling: An implementation of memory-based learning. Control Systems Magazine, 14:57--71, 1994.
....to be similar to SAM in philosophy. The MBR model is also based on a set of stored data and rules, some form of matching of a new input stimulus with similar stored data, and interpolation between them. MBR has also been employed successfully in pattern classification problems [2] and control [10], often drawing from a database of rules. SAM, on the other hand, takes the idea further structurally and computationally, conforming the idea to an adaptive system identification model with different variations of local interpolative functions. Looking at its weaknesses, we see that SAM system ....
Schaal, S. and C.G. Atkeson, "Robot Juggling: Implementation of Memory-Based Learning," IEEE Control Systems, vol. 14, no. 1 (Feb. 1994) 57-71.
....in other domains. 8.2 Robotics and Control In recent years there have been many robotics and control applications that have used reinforcement learning. Here we will concentrate on the following four examples, although many other interesting ongoing robotics investigations are underway. 1. Schaal and Atkeson (1994) constructed a two-armed robot, shown in Figure 11, that learns to juggle a device known as a devil stick. This is a complex non-linear control task involving a six-dimensional state space and less than 200 msecs per control decision. After about 40 initial attempts the robot learns to keep ....
Schaal, S., & Atkeson, C. (1994). Robot juggling: An implementation of memory-based learning. Control Systems Magazine, 14.
....regression (LWR) is a memory-based method that performs a regression around a point of interest using only training data that are "local" to that point. One recent study demonstrated that LWR was suitable for real-time control by constructing an LWR-based system that learned a difficult juggling task (Schaal & Atkeson, 1994). Figure 2: In locally weighted regression, points are weighted by proximity to the current x in question using a kernel. A regression is then computed using the weighted points. We consider here a form of locally weighted regression that is a variant of the LOESS ....
....is to set k such that the reference point being predicted has a predetermined amount of support, that is, k is set so that n is close to some target value. This has the disadvantage of requiring assumptions about the noise and smoothness of the function being learned. Another technique, used by Schaal and Atkeson (1994), sets k to minimize the cross-validated error on the training set. A disadvantage of this technique is that it assumes the distribution of the training set is representative of P(x), which it may not be in an active learning situation. A third method, also described by Schaal and Atkeson (1994), is ....
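The weighting-then-regression idea described in this snippet can be sketched in a few lines. This is only an illustrative one-dimensional version under stated assumptions: the Gaussian kernel, the `bandwidth` parameter, and the function names are choices made for this example, not details taken from the cited papers.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Weight that decays smoothly with distance from the query (assumed kernel)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def lwr_predict(xs, ys, x_query, bandwidth=1.0):
    """Predict y at x_query by fitting a kernel-weighted straight line to (xs, ys)."""
    weights = [gaussian_kernel(x - x_query, bandwidth) for x in xs]
    w_sum = sum(weights)
    # Weighted means of the stored inputs and outputs.
    x_mean = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    # Weighted variance and covariance determine the local slope.
    var_x = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    cov_xy = sum(
        w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys)
    )
    slope = cov_xy / var_x if var_x > 1e-12 else 0.0
    return y_mean + slope * (x_query - x_mean)


# Stored "memory" of training points; no model is fit until a query arrives.
memory_x = [0.0, 1.0, 2.0, 3.0, 4.0]
memory_y = [1.0, 3.0, 5.0, 7.0, 9.0]
prediction = lwr_predict(memory_x, memory_y, 2.5)
```

Because the data in this toy memory lie exactly on a line, any positive weighting reproduces that line at the query point; with curved data the bandwidth controls how local the fit is.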
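The cross-validation approach mentioned here can be illustrated with a small leave-one-out search. The sketch below is an assumption-laden toy: it treats the smoothing parameter k as the `bandwidth` of a Gaussian kernel, works in one dimension, and picks from a hand-chosen candidate list; none of these specifics come from the cited work.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth):
    """Kernel-weighted linear fit evaluated at x_query (Gaussian kernel assumed)."""
    weights = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in xs]
    w_sum = sum(weights)
    if w_sum == 0.0:
        # All weights underflowed; fall back to the plain mean.
        return sum(ys) / len(ys)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    var_x = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    cov_xy = sum(
        w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys)
    )
    slope = cov_xy / var_x if var_x > 1e-12 else 0.0
    return y_mean + slope * (x_query - x_mean)


def loo_error(xs, ys, bandwidth):
    """Mean squared leave-one-out prediction error for a given bandwidth."""
    total = 0.0
    for i in range(len(xs)):
        # Hold out point i and predict it from the remaining points.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (lwr_predict(rest_x, rest_y, xs[i], bandwidth) - ys[i]) ** 2
    return total / len(xs)


def select_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out error."""
    return min(candidates, key=lambda k: loo_error(xs, ys, k))


train_x = [0.5 * i for i in range(12)]
train_y = [x * x for x in train_x]
candidates = [0.2, 0.5, 1.0, 3.0]
best = select_bandwidth(train_x, train_y, candidates)
```

As the snippet notes, this selection is only as good as the training set: if the stored points are not representative of where queries will fall, the chosen value may be poorly suited to them.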
[Article contains additional citation context not shown here]
Schaal, S., & Atkeson, C. (1994). Robot juggling: An implementation of memory-based learning. Control Systems, 14, 57--71.
....to domains from real-time control. These domains require real-time action selection and convergence to optimal behaviors but, at the same time, the setup for each trial is expensive and thus it is important to keep the number of trials small. For learning how to balance poles or juggle devil sticks (Schaal & Atkeson 1994), for example, the pole needs to be picked up and brought into the initial position before every trial. Domains from real-time control are typically directed and sometimes probabilistic, and we have not yet applied FALCONS to domains with these properties. The main difficulty of applying FALCONS ....
Schaal, S., and Atkeson, C. 1994. Robot juggling: An implementation of memory-based learning. Control Systems Magazine 14.
....developed in the machine learning field (Aha, 1989; Bottou & Vapnik, 1992). Recent work on lazy learning (a.k.a. just-in-time learning) gave a new impetus to the adoption of local techniques for modeling (Atkeson et al. 1997a; Stenman et al. 1996) and control problems (Tolle et al. 1992; Schaal & Atkeson, 1994; Atkeson et al. 1997b). The new promising feature of this local paradigm is the adoption of enhanced statistical procedures to identify the local approximator. One example is the PRESS statistic (Myers, 1990), which is a simple, well-founded and economical way to perform leave-one-out cross ....
Schaal S. & Atkeson C. G. 1994. Robot Juggling: Implementation of Memory-Based Learning.
....and developed in the machine learning field (Aha, 1989; Bottou & Vapnik, 1992). Recent work on lazy learning (a.k.a. just-in-time learning) gave a new impetus to the adoption of local techniques for modeling (Atkeson et al. 1997a; Stenman et al. 1996) and control problems (Tolle et al. 1992; Schaal & Atkeson, 1994; Atkeson et al. 1997b). The new promising feature of this local paradigm is the adoption of enhanced statistical procedures to identify the local approximator. One example is the PRESS statistic (Myers, 1990), which is a simple, well-founded and economical way to perform leave-one-out cross ....
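To make the "economical" point about the PRESS statistic concrete, the sketch below computes leave-one-out residuals from a single least-squares fit using the standard leverage identity, and checks the result against explicit refitting. It assumes a global simple linear regression in one variable purely for illustration; the cited work applies the idea to local approximators, and all function names here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one input variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return slope, y_mean - slope * x_mean


def press_statistic(xs, ys):
    """PRESS from one fit: sum of (residual / (1 - leverage))^2 over all points."""
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope, intercept = fit_line(xs, ys)
    press = 0.0
    for x, y in zip(xs, ys):
        residual = y - (slope * x + intercept)
        # Leverage of this point in simple linear regression.
        leverage = 1.0 / n + (x - x_mean) ** 2 / sxx
        press += (residual / (1.0 - leverage)) ** 2
    return press


def press_brute_force(xs, ys):
    """Same quantity computed the slow way, refitting once per held-out point."""
    total = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return total


sample_x = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_y = [0.1, 0.9, 2.3, 2.8, 4.2]
fast = press_statistic(sample_x, sample_y)
slow = press_brute_force(sample_x, sample_y)
```

The two values agree up to floating-point rounding, but the first needs one fit instead of one fit per data point.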
Schaal S. & Atkeson C. G. 1994. Robot Juggling: Implementation of Memory-Based Learning. IEEE Control Systems, February, 57--71.
....declarative. See Matarić (1990a) for examples. (Footnote 2: Author's bias: declarative learning can be further divided into as many interesting categories, but is not the area pursued here.) other approaches have also been studied (e.g. Atkeson, Aboaf, McIntyre & Reinkensmeyer (1988), Atkeson (1990), Schaal & Atkeson (1994)). Adaptive control problems typically deal with learning complex dynamical systems with non-linearly coupled degrees of freedom usually involved in moving multi-jointed manipulators, objects, and physical bodies. 6.2.3 Learning New Behaviors Learning new behaviors deals with the problem of ....
....and adaptivity that are difficult to state precisely without domain-specific grounding. Consequently, most learning control problems appear to be instances of behavior learning, such as learning to balance a pole (Barto, Sutton & Anderson 1983), to play billiards (Moore 1992), and to juggle (Schaal & Atkeson 1994). Furthermore, work on action selection, deciding what action to make in each state, can be viewed as learning a higher-level behavior as an abstraction on the state-action space. For example, a maze-learning system can be said to learn a specific maze-solving behavior. Genetic learning has ....
[Article contains additional citation context not shown here]
Schaal, S. & Atkeson, C. G. (1994), `Robot Juggling: An Implementation of Memory-Based Learning', Control Systems Magazine.
....like all variance minimization, becomes an approximation of the optimal approach. The learning model discussed in this paper is a form of locally weighted regression (LWR) [Cleveland et al. 1988], which has been used in difficult machine learning tasks, notably the robot juggler of Schaal and Atkeson [1994]. Previous work [Cohn et al. 1995] discussed all-variance query selection for LWR; in the remainder of this paper, I describe a method for performing all-bias query selection. Section 2 describes the criterion that must be optimized for all-bias query selection. Section 3 describes the locally ....
....to points in the training set, from which y(x) is computed via linear regression. The low cost of incorporating new training examples makes this form of locally weighted regression appealing for learning systems which must operate in real time, or with time-varying target functions (e.g. [Schaal and Atkeson 1994]). 3.1 Computing $\Delta y$ for LWR: If we know what new point (x, y) we're going to add, computing $\Delta y$ for LWR is straightforward. Defining h as the weight given to x, we can write $\Delta y = y' - y = \mu'_y + \frac{\sigma'_{xy}}{\sigma'^{2}_{x}} (x - \mu'_x) - y$ (7) ....
Schaal, S. & Atkeson, C. (1994). Robot Juggling: An Implementation of Memory-based Learning. Control Systems 14, 57--71.
....control in general, and in manipulation in specific. Manipulator work largely focuses on the difficult problems of inverse kinematics and dynamics (Paul 1981, Brady et al. 1982), and learning techniques are often employed to overcome the high dimensionality of the problem (Atkeson 1989, Schaal & Atkeson 1994). By using generic behaviors, we hope to develop stereotyped parametric solutions to common motor control problems, such as reaching to a point and achieving postures, so that a motor system can perform less run-time computation (i.e. avoid solving the find-path problem (Lozano-Pérez 1982) for ....
Schaal, S. & Atkeson, C. G. (1994), `Robot Juggling: An Implementation of Memory-Based Learning', Control Systems Magazine 14, 57--71.
.... solving the IK for the manipulator's joint angles (Paul 1981, Brady et al. 1982). Various neural network approaches to learning IK for simple manipulators have been explored, and more sophisticated learning methods for dynamic tasks and higher-DOF systems are being developed (Atkeson 1989, Schaal & Atkeson 1994). The work most similar to the force field approach we describe was performed by Williamson (1996), who presented a controller for a 6-DOF robot arm, based on the same biological evidence we describe next. It consists of four behaviors: three reaching and one resting posture; intermediate targets ....
Schaal, S. & Atkeson, C. G. (1994), `Robot Juggling: An Implementation of Memory-Based Learning', Control Systems Magazine 14, 57--71.
.... learning to teach a vision-based robot how to shoot a ball into a goal [2]. Both the genetic algorithm and reinforcement learning are essentially discrete-state techniques; it seems like continuous techniques taken from control theory such as adaptive control might be best for motor control tasks [12, 13]. 4 Administrative details The majority of this work will be carried out in the Yale Vision and Robotics Laboratory under the supervision of Professor Gregory Hager of the Computer Science Department. I expect to complete it in time to defend and receive my degree in the spring of 1998. ....
S. Schaal and C. Atkeson. Robot Juggling: Implementation of Memory-Based Learning. IEEE Control Systems, Vol. 14, No. 1, pp. 57-71, 1994.
....1 1 (x - x_i)^2, but any function that weights the nearer neighbors more will work. The shape of the weighting function also affects the form of the resulting approximation. Examples of robot learning with local linear models can be found in [Atkeson, 1991; Schaal and Atkeson, 1993a; Schaal and Atkeson, 1994]. Retrieval efficiency and other limitations: Memory-based function approximation is attractive because it follows a policy of least commitment. When it receives training data it makes no decision about how to use it for future queries, and just stores the data instead. This policy allows any ....
S. Schaal and C. Atkeson, "Robot Juggling: An Implementation of Memory-Based Learning," IEEE Control Systems Magazine, 14(1), 1994.
....function in a compact form. There have been several function approximation methods studied in the discounted RL literature, including neural network learning, clustering, memory-based methods, and locally weighted regression (Lin, 1992; Boyan & Moore, 1994; Mahadevan & Connell, 1992; Moore, 1990; Schaal & Atkeson, 1994). Two characteristics of the AGV scheduling domain attracted us to local linear regression as the method of choice. First, the location of the AGV is one of the most important features of the state in this domain. Any function approximation scheme must be able to generalize specific locations of ....
....is represented by a set of k linear features and n - k nonlinear features. Our value function approximation is limited to generalizing the values of the k linear features. This is similar to Locally Weighted Regression (LWR) where the nonlinear features are given infinitely large weights (Schaal & Atkeson, 1994). It also means that the value function h may only be locally linear. We represent the value function using a select set of states and their h values, called the exemplars. Suppose we need an estimate of h(p) which has values x_p1, ..., x_pk, ..., x_pn for its n features, where the ....
[Article contains additional citation context not shown here]
Schaal, S., & Atkeson, C. (1994). Robot juggling: An implementation of memory-based learning. In IEEE Control Systems, pp. 14:57--71.
....problem termed instance-based state identification. The approach was inspired by the successes of instance-based (also called memory-based) methods for learning in continuous perception spaces (i.e. [Atkeson, 1992; Moore, 1992]). These methods have also had success with physical robots [Schaal and Atkeson, 1994; Schneider, 1994]. The application of instance-based learning to memory for state identification is driven by the important insight that learning in continuous spaces and learning with hidden state have a crucial feature in common: they both begin learning without knowing the final granularity ....
Stefan Schaal and Christopher Atkeson. Robot juggling: An implementation of memory-based learning. Control Systems Mag., Feb 1994.
....state problem we term instance-based state identification. The approach was inspired by the successes of instance-based (also called memory-based) methods for learning in continuous perception spaces, i.e. [Atkeson, 1992; Moore, 1992]. These methods have also had success with physical robots [Schaal and Atkeson, 1994; Schneider, 1994]. The term memory-based introduces an unfortunate conflict of vocabulary. Here memory does not refer to short-term memory of past perceptions [Figure: panels "Learning in a Geometric Space" and "Learning in a Sequence Space"; k-nearest neighbor, k = 3] ....
Stefan Schaal and Christopher G. Atkeson. Robot juggling: An implementation of memory-based learning. Control Systems Magazine, February 1994.
....3.1 Learning Devil Sticking: Devil sticking is a juggling task where a center stick is batted back and forth between two handsticks (Figure 1a). Figure 1b shows a sketch of our devil sticking robot. The robot uses its top two joints to perform planar devil sticking; more details can be found in (Schaal and Atkeson, 1994). The task of the robot is to learn a continuous left-right-left etc. juggling pattern. For learning, the task is modeled as a discrete function that maps impact states on one hand to impact states on the other hand. A state is given as a five-dimensional vector x = pxy T , ....
....ideally suited for LWR as it is not too high-dimensional and new training data are only generated at about 1-2 Hz, i.e. whenever the center stick hits one of the handsticks. Moreover, memory-based learning also supports efficient search of the state-action space for statistically good new commands (Schaal and Atkeson, 1994). As a result, successful (i.e. more than 1000 consecutive hits) devil sticking could be achieved in about 40-80 trials, corresponding to about 300-800 training points in memory (Figure 2). This is a remarkable learning rate given that humans need about one week of one hour of practice a day before ....
Schaal, S, Atkeson, CG (1994). Robot juggling: An implementation of memory-based learning. Control Systems Magazine 14:57-71.
....average and locally higher-order models; the former tend to introduce too much bias while the latter require fitting many parameters, which is computationally expensive and needs a lot of data. The algorithm which we explore here, locally weighted regression (LWR) (Atkeson, 1992; Moore, 1991; Schaal & Atkeson, 1994), is closely related to versions suggested by Cleveland et al. (1979, 1988) and Farmer & Sidorowich (1987). An LWR model is trained by simply storing every experience as an input-output pair in memory. If an output y_q is to be generated from a given input x_q, then it is computed by fitting a ....
....before they achieve decent juggling performance. In comparison to this, the learning algorithm performed very well. However, it has to be pointed out that the learned controllers were only local and could not cope with larger perturbations. A detailed description of this experiment can be found in Schaal & Atkeson (1994). Figure 4: (a) illustration of devil sticking, (b) a devil sticking robot, (c) learning curve of the robot (number of hits per trial versus trial number). 8 CONCLUSIONS One of the advantages of ....
Schaal, S., Atkeson, C.G. (1994), "Robot Juggling: An Implementation of Memory-based Learning", to appear in: Control Systems Magazine, Feb. (1994).
....3.1 Learning of Devil Sticking: Devil sticking is a juggling task where a center stick is batted back and forth between two handsticks (Figure 1a). Figure 1b shows a sketch of our devil sticking robot. The robot uses its top two joints to perform planar devil sticking; more details can be found in [21]. The task of the robot is to learn a continuous left-right-left etc. juggling pattern. For the purpose of learning, the task is modeled as a discrete function that maps impact states on one hand to impact states on the other hand. A state is given as a 5-dimensional vector x = pxy T ....
....to five-dimensional output function. This task is ideally suited for LWR as it is not too high-dimensional and new training data are only generated at about 1-2 Hz. Moreover, memory-based learning also allows an efficient search of the state-action space for statistically good new commands [21]. As a result, successful devil sticking can be achieved in about 40-80 trials, corresponding to about 300-800 training points in memory (Figure 2). This is a remarkable learning speed given that humans need about one week of one hour of practice a day before they learn to juggle the ....
S. Schaal and C. G. Atkeson, "Robot juggling: An implementation of memory-based learning," Control Systems Magazine, vol. 14, pp. 57-71, 1994.
....experiment design strategy is to choose actions at random. Far more effective, however, is to choose datapoints which, given the uncertainty inherent in the prediction, are considered most likely to achieve the desired behavior. This can considerably reduce the exploration required [Moore, 1991a; Schaal and Atkeson, 1994b; Cohn et al. 1995]. Example: (robotic) Billiards. Some experiments were performed with the billiards robot shown in Figure 3. See [Moore, 1992; Moore et al. 1992] for more details of the experiment. The equipment consists of a small (1.5 m × 0.75 m) pool table, a spring-actuated cue ....
....system at a fixed setpoint. If the location of the setpoint were known in advance we could use conventional LQR in combination with our model. If the location of the setpoint is unknown, we have an additional part of the task to learn, which can be addressed by the shifting setpoint algorithm [Schaal and Atkeson, 1994a]. The shifting setpoint algorithm (SSA) attempts to decompose the control problem into two separate control tasks on different time scales. At the fast time scale, it acts as a nonlinear regulator by trying to keep the controlled system at some chosen setpoints. On a slower time scale, the ....
[Article contains additional citation context not shown here]
S. Schaal and C. Atkeson. Robot Juggling: An Implementation of Memory-based Learning. Control Systems Magazine, 14, 1994.
No context found.
S. Schaal and C. Atkeson. Robot juggling: An implementation of memory-based learning. In IEEE Control Systems, volume 14, pages 57--71, 1994.
No context found.
Schaal, S. and Atkeson, C. G. Robot Juggling: Implementation of Memory-Based Learning. IEEE Control Systems, 14(1), 1994.