A New Approach to Manipulator control: the Cerebellar Model Articulation Controller (1975)

by J S Albus
Results 1 - 10 of 164

Reinforcement learning: a survey

by Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore - Journal of Artificial Intelligence Research , 1996
Cited by 1134 (21 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

Classifier Fitness Based on Accuracy

by Stewart W. Wilson , 1995
Cited by 239 (14 self)
In many classifier systems, the classifier strength parameter serves as a predictor of future payoff and as the classifier's fitness for the genetic algorithm. We investigate a classifier system, XCS, in which each classifier maintains a prediction of expected payoff, but the classifier's fitness is given by a measure of the prediction's accuracy. The system executes the genetic algorithm in niches defined by the match sets, instead of panmictically. These aspects of XCS result in its population tending to form a complete and accurate mapping X x A => P from inputs and actions to payoff predictions. Further, XCS tends to evolve classifiers that are maximally general subject to an accuracy criterion. Besides introducing a new direction for classifier system research, these properties of XCS make it suitable for a wide range of reinforcement learning situations where generalization over states is desirable. Key words Classifier systems, strength, fitness, accuracy, mapping, generalizati...

Task Decomposition Through Competition in a Modular Connectionist Architecture

by Robert A. Jacobs - COGNITIVE SCIENCE , 1990
Cited by 167 (4 self)
A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to compute different functions. The architecture performs task decomposition in the sense that it learns to partition a task into two or more functionally independent tasks and allocates distinct networks to learn each task. In addition, the architecture tends to allocate to each task the network whose topology is most appropriate to that task, and tends to allocate the same network to similar tasks and distinct networks to dissimilar tasks. Furthermore, it can be easily modified so as to...

Efficient Memory-based Learning for Robot Control

by Andrew William Moore, Trinity Hall , 1990
Cited by 94 (1 self)
This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data received through its sensors, an approach which is formalized here as the SAB (State-Action-Behaviour) control cycle. A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered. The experiences are stored in a manner which permits fast recall of the closest previous experience to any new situation, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action. The learning can take place in high-dimensional non-linear control spaces with real-valued ranges of variables. Furthermore, the method avoids a number of shortcomings of earlier learning methods in which the controller can become trapped in inadequate performance which does not improve. Also considered is how the system is made resistant to noisy inputs and how it adapts to environmental changes. A well-founded mechanism for choosing actions is introduced which solves the experiment/perform dilemma for this domain with adequate computational efficiency, and with fast convergence to the goal behaviour. The dissertation explains in detail how the SAB control cycle can be integrated into both low and high complexity tasks. The methods and algorithms are evaluated with numerous experiments using both real and simulated robot domains. The final experiment also illustrates how a compound learning task can be structured into a hierarchy of simple learning tasks.
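
The recall step described in this abstract can be illustrated with a minimal sketch. It assumes states, actions, and behaviours are short numeric vectors and uses a plain linear scan with Euclidean distance; the class and method names are hypothetical, and the dissertation's own storage structure (designed for fast recall) is not reproduced here.

```python
import math


class ExperienceMemory:
    """Stores (state, action, behaviour) triples and recalls the closest one.

    Illustrative only: a linear scan stands in for whatever fast-recall
    structure the original work uses.
    """

    def __init__(self):
        self.experiences = []  # list of (state, action, behaviour) tuples

    def remember(self, state, action, behaviour):
        # Every experience is kept explicitly; nothing is summarised away.
        self.experiences.append((tuple(state), tuple(action), tuple(behaviour)))

    def predict(self, state, action):
        """Predict the effect of a proposed action from the closest stored (state, action)."""
        query = tuple(state) + tuple(action)
        nearest = min(self.experiences, key=lambda e: math.dist(e[0] + e[1], query))
        return nearest[2]

    def candidate_action(self, state, goal_behaviour):
        """Suggest the action of the stored experience closest in (state, behaviour)."""
        query = tuple(state) + tuple(goal_behaviour)
        nearest = min(self.experiences, key=lambda e: math.dist(e[0] + e[2], query))
        return nearest[1]


memory = ExperienceMemory()
memory.remember([0.0], [1.0], [1.0])
memory.remember([0.0], [2.0], [2.0])
memory.remember([1.0], [1.0], [2.0])
print(memory.predict([0.1], [1.9]))           # behaviour of the closest experience
print(memory.candidate_action([0.0], [1.1]))  # action that came closest to the goal
```

A linear scan costs time proportional to the number of stored experiences, so a spatial index would be needed to get the fast recall the abstract refers to.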

NeuroAnimator: Fast Neural Network Emulation and Control of Physics-Based Models

by Radek Grzeszczuk , 1998
Cited by 78 (3 self)
Animation through the numerical simulation of physics-based graphics models offers unsurpassed realism, but it can be computationally demanding. Likewise, finding controllers that enable physics-based models to produce desired animations usually entails formidable computational cost. This paper demonstrates the possibility of replacing the numerical simulation and control of model dynamics with a dramatically more efficient alternative. In particular, we propose the NeuroAnimator, a novel approach to creating physically realistic animation that exploits neural networks. NeuroAnimators are automatically trained off-line to emulate physical dynamics through the observation of physics-based models in action. Depending on the model, its neural network emulator can yield physically realistic animation one or two orders of magnitude faster than conventional numerical simulation. Furthermore, by exploiting the network structure of the NeuroAnimator, we introduce a fast algorithm for learning controllers that enables either physics-based models or their neural network emulators to synthesize motions satisfying prescribed animation goals. We demonstrate NeuroAnimators for passive and active (actuated) rigid body, articulated, and deformable physics-based models.

Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces

by Juan Carlos Santamaría, Richard S. Sutton, Ashwin Ram , 1996
Cited by 77 (6 self)
A key element in the solution of reinforcement learning problems is the value function. The purpose of this function is to measure the long-term utility or value of any given state and it is important because an agent can use it to decide what to do next. A common problem in reinforcement learning when applied to systems having continuous states and action spaces is that the value function must operate with a domain consisting of real-valued variables, which means that it should be able to represent the value of infinitely many state and action pairs. For this reason, function approximators are used to represent the value function when a closed-form solution of the optimal policy is not available. In this paper, we extend a previously proposed reinforcement learning algorithm so that it can be used with function approximators that generalize the value of individual experiences across both state and action spaces. In particular, we discuss the benefits of using sparse coarse-coded funct...
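
As a rough illustration of how a function approximator can generalize a value estimate across both state and action, the sketch below uses tile coding, the coarse-coding idea behind Albus's CMAC. It is not the algorithm from this paper: the class name, the unit-interval ranges for state and action, and the parameter values are all assumptions made for the example.

```python
class TileCodedValue:
    """Linear value estimate over several offset tilings of a (state, action) pair.

    Assumes state and action are both scalars in [0, 1); names and defaults
    are hypothetical.
    """

    def __init__(self, num_tilings=4, tiles_per_dim=8, step_size=0.1):
        self.num_tilings = num_tilings
        self.tiles_per_dim = tiles_per_dim
        self.step_size = step_size
        self.weights = {}  # sparse storage: only visited tiles get an entry

    def active_tiles(self, state, action):
        """Return one active tile per tiling; each tiling is shifted by a fraction of a tile."""
        tiles = []
        for tiling in range(self.num_tilings):
            offset = tiling / (self.num_tilings * self.tiles_per_dim)
            row = int((state + offset) * self.tiles_per_dim)
            col = int((action + offset) * self.tiles_per_dim)
            tiles.append((tiling, row, col))
        return tiles

    def value(self, state, action):
        # The estimate is the sum of the weights of the active tiles.
        return sum(self.weights.get(t, 0.0) for t in self.active_tiles(state, action))

    def update(self, state, action, target):
        """Move the estimate toward target, sharing the correction across active tiles."""
        error = target - self.value(state, action)
        share = self.step_size * error / self.num_tilings
        for t in self.active_tiles(state, action):
            self.weights[t] = self.weights.get(t, 0.0) + share


q = TileCodedValue()
for _ in range(200):
    q.update(0.5, 0.5, 1.0)
print(q.value(0.5, 0.5))   # close to the target
print(q.value(0.52, 0.5))  # nearby pair shares tiles, so it inherits part of the value
print(q.value(0.1, 0.9))   # distant pair shares no tiles
```

In a reinforcement learning setting the `target` would typically be a bootstrapped return rather than a fixed number; a constant is used here only to keep the example short.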

Robust Non-linear Control through Neuroevolution

by Faustino John Gomez , 2003
Cited by 75 (18 self)
Abstract not found

Subspace Methods for Robot Vision

by Shree K. Nayar, Sameer A. Nene, Hiroshi Murase , 1995
Cited by 72 (2 self)
In contrast to the traditional approach, visual recognition is formulated as one of matching appearance rather than shape. For any given robot vision task, all possible appearance variations define its visual workspace. A set of images is obtained by coarsely sampling the workspace. The image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the visual workspace is represented as a continuous appearance manifold. Given an unknown input image, the recognition system first projects the image to eigenspace. The parameters of the vision task are recognized based on the exact location of the projection on the appearance manifold. Efficient algorithms for finding the closest manifold point are discussed. The proposed appearance representation has several applications in robot vision. As examples, a precise visual positioning system, a real-time visual tracking system, and a real-time temporal inspection system are described. The performance of these syst...

Biped dynamic walking using reinforcement learning

by Hamid Benbrahim, Judy A. Franklin - Robotics and Autonomous Systems , 1997
Cited by 29 (0 self)
Keywords: biped robot, legged robot. This paper presents some results from a study of biped dynamic walking using reinforcement learning. During this study a hardware biped robot was built, and a new reinforcement learning algorithm and a new learning architecture were developed. The biped learned dynamic walking without any previous knowledge about its dynamic model. The Self Scaling Reinforcement learning algorithm was developed in order to deal with the problem of reinforcement learning in continuous action domains. The learning architecture was developed in order to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows for easy incorporation of new modules that represent new knowledge, or new requirements for the desired task.

DENFIS: Dynamic Evolving Neural-Fuzzy Inference System and Its Application for Time-Series Prediction

by Nikola Kasabov, Qun Song , 2001
Cited by 28 (7 self)
This paper introduces a new type of fuzzy inference system, denoted as DENFIS (dynamic evolving neural-fuzzy inference system), for adaptive on-line and off-line learning, and its application for dynamic time series prediction. DENFIS evolves through incremental, hybrid (supervised/unsupervised) learning and accommodates new input data, including new features, new classes, etc., through local element tuning. New fuzzy rules are created and updated during the operation of the system. At each time moment the output of DENFIS is calculated through a fuzzy inference system based on m-most activated fuzzy rules which are dynamically chosen from a fuzzy rule set. Two approaches are proposed: (1) dynamic creation of a first-order Takagi-Sugeno type fuzzy rule set for a DENFIS on-line model; (2) creation of a first-order Takagi-Sugeno type fuzzy rule set, or an expanded high-order one, for a DENFIS off-line model. A set of fuzzy rules can be inserted into DENFIS before, or during its learning process. Fuzzy rules can also be extracted during the learning process or after it. An evolving clustering method (ECM), which is employed in both on-line and off-line DENFIS models, is also introduced. It is demonstrated that DENFIS can effectively learn complex temporal sequences in an adaptive way and outperform some well known, existing models.
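
The inference step mentioned in this abstract, an output computed from the m most activated first-order Takagi-Sugeno rules, might look roughly like the sketch below. The Gaussian activation, the rule layout `(centre, width, coefficients)`, and the function name are assumptions for illustration and are not taken from the paper; rule creation and the evolving clustering method are not shown.

```python
import math


def takagi_sugeno_output(x, rules, m=2):
    """Weighted average of the linear consequents of the m most activated rules.

    Each rule is (centre, width, coefficients) with coefficients = (b0, b1, ..., bn),
    so its first-order consequent is b0 + b1*x1 + ... + bn*xn.
    """
    scored = []
    for centre, width, coeffs in rules:
        # Gaussian activation is an illustrative assumption, not the paper's choice.
        squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
        activation = math.exp(-squared_distance / (2.0 * width ** 2))
        consequent = coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))
        scored.append((activation, consequent))
    # Keep only the m rules that fire most strongly for this input.
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:m]
    total = sum(activation for activation, _ in top)
    return sum(activation * y for activation, y in top) / total


rules = [
    ((0.0,), 1.0, (0.0, 1.0)),    # near x = 0: y = x
    ((1.0,), 1.0, (1.0, 0.0)),    # near x = 1: y = 1
    ((10.0,), 1.0, (100.0, 0.0)), # far away: excluded when m = 2
]
print(takagi_sugeno_output((0.5,), rules, m=2))
```

The sketch does not guard against every selected activation underflowing to zero, which would make the final division fail for inputs far from all rule centres.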