A Survey of Multi-Objective Sequential Decision-Making
"... Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-makin ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
(Show Context)
Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
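To make the role of the scalarization function concrete, here is a minimal sketch of linear scalarization in Python; the value vectors and preference weights are hypothetical illustrations, not taken from the survey. Sweeping the weight vector over the simplex recovers different points on the convex hull of policies the abstract mentions.

```python
# Minimal sketch of linear scalarization for multi-objective policy values.
# The value matrix and weight vector are hypothetical, for illustration only.
import numpy as np

def scalarize(values: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Project multi-objective values (n_policies x n_objectives) to scalars."""
    return values @ weights

# Three candidate policies evaluated on two objectives (e.g. speed, energy).
values = np.array([[1.0, 0.2],
                   [0.6, 0.6],
                   [0.1, 1.0]])
weights = np.array([0.7, 0.3])  # user preference over objectives (sums to 1)

scores = scalarize(values, weights)
best = int(np.argmax(scores))
print(f"scalarized scores: {scores}, best policy index: {best}")
```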
Fast damage recovery in robotics with the T-resilience algorithm
- Int. J. Robot. Res
"... Damage recovery is critical for autonomous robots that need to operate for a long time without assistance. Most current methods are complex and costly because they require antici-pating each potential damage in order to have a contingency plan ready. As an alternative, we introduce the T-resilience ..."
Abstract
-
Cited by 9 (5 self)
- Add to MetaCart
Damage recovery is critical for autonomous robots that need to operate for a long time without assistance. Most current methods are complex and costly because they require anticipating each potential damage in order to have a contingency plan ready. As an alternative, we introduce the T-resilience algorithm, a new algorithm that allows robots to quickly and autonomously discover compensatory behaviors in unanticipated situations. This algorithm equips the robot with a self-model and discovers new behaviors by learning to avoid those that perform differently in the self-model and in reality. Our algorithm thus does not identify the damaged parts but implicitly searches for efficient behaviors that do not use them. We evaluate the T-resilience algorithm on a hexapod robot that needs to adapt to leg removal, broken legs, and motor failures; we compare it to stochastic local search, policy gradient, and the self-modeling algorithm proposed by Bongard et al. The behavior of the robot is assessed on-board thanks to an RGB-D sensor and a SLAM algorithm. Using only 25 tests on the robot and an overall running time of 20 minutes, T-resilience consistently leads to substantially better results than the other approaches.
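The core loop can be hedged into a short sketch: keep testing on the real robot the behavior that looks best in the self-model after discounting its predicted sim-to-real gap. The simulate/execute stand-ins and the nearest-neighbour gap predictor below are illustrative simplifications (the paper uses an evolutionary search and a learned transferability model), not the authors' implementation.

```python
# Hedged sketch of the T-resilience idea: prefer behaviors whose simulated
# (self-model) performance matches reality. All functions are toy stand-ins.
import random

def simulate(behavior):          # self-model performance (hypothetical)
    return sum(behavior) / len(behavior)

def execute_on_robot(behavior):  # real performance; "damage" penalizes gene 0
    return simulate(behavior) - 0.8 * behavior[0]

def predicted_gap(behavior, tested):
    """Estimate the |sim - real| gap from the nearest already-tested behavior."""
    if not tested:
        return 0.0
    nearest = min(tested, key=lambda t: sum((a - b) ** 2
                                            for a, b in zip(t[0], behavior)))
    return nearest[1]  # gap observed for the nearest tested behavior

tested = []  # (behavior, observed sim-vs-real gap)
population = [[random.random() for _ in range(6)] for _ in range(100)]
for trial in range(25):  # the paper reports ~25 on-robot tests
    # Pick the behavior that looks good in the self-model AND is predicted
    # to transfer well (small expected sim-to-real gap).
    best = max(population,
               key=lambda b: simulate(b) - predicted_gap(b, tested))
    real = execute_on_robot(best)
    tested.append((best, abs(simulate(best) - real)))
print("best real performance found:",
      max(execute_on_robot(t[0]) for t in tested))
```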
Fact-Finding Review of
- ICT in Development in a Rural-Urban Setting. 2003, United Nations Human Settlements Programme: Hong Kong
"... doi: 10.3389/fnene.2010.00015 Does neural input or processing play a greater role in the magnitude of neuroimaging signals? ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
(Show Context)
doi: 10.3389/fnene.2010.00015 Does neural input or processing play a greater role in the magnitude of neuroimaging signals?
Learning Complex Neural Network Policies with Trajectory Optimization
"... Direct policy search methods offer the promise of automatically learning controllers for com-plex, high-dimensional tasks. However, prior ap-plications of policy search often required spe-cialized, low-dimensional policy classes, limit-ing their generality. In this work, we introduce a policy search ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.
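A minimal sketch of this alternation, with a linear policy and a quadratic toy cost standing in for the neural network policy and task cost (both assumptions, not the paper's setup):

```python
# Hedged sketch of alternating trajectory/policy optimization: trajectories
# move toward low cost while staying near the policy, then the policy is
# regressed onto the trajectories. Toy linear/quadratic stand-ins throughout.
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(50, 4))          # visited states (hypothetical)
target = np.zeros(4)                       # cost: drive actions toward 0
K = rng.normal(size=(4, 4))                # linear "policy" parameters
actions = states @ K.T                     # trajectory actions, initialized

for iteration in range(20):
    # (1) Trajectory step: minimize ||a - target||^2 + ||a - policy(s)||^2,
    #     which for this toy cost has the closed-form midpoint solution.
    policy_actions = states @ K.T
    actions = 0.5 * (target + policy_actions)
    # (2) Policy step: least-squares regression of the policy onto trajectories.
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    K = W.T

print("policy norm after training (shrinks toward 0):", np.linalg.norm(K))
```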
Design of a Control Architecture for Habit Learning in Robots
"... Abstract. Researches in psychology and neuroscience have identified multiple decision systems in mammals, enabling control of behavior to shift with training and familiarity of the environment from a goal-directed system to a habitual system. The former relies on the explicit estimation of future co ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
(Show Context)
Research in psychology and neuroscience has identified multiple decision systems in mammals, enabling control of behavior to shift with training and familiarity of the environment from a goal-directed system to a habitual system. The former relies on the explicit estimation of the future consequences of actions through planning towards a particular goal, which makes decision time longer but produces rapid adaptation to changes in the environment. The latter learns to associate values with particular stimulus-response associations, leading to quick reactive decision-making but slow relearning in response to environmental changes. Computational neuroscience models have formalized this as a coordination of model-based and model-free reinforcement learning. From this inspiration, we hypothesize that such a coordination could enable robots to learn habits, detect when these habits are appropriate, and thus avoid the long and costly computations of the planning system. We illustrate this in a simple repetitive cube-pushing task on a conveyor belt, where a speed-accuracy trade-off is required. We show that the two systems have complementary advantages in these tasks, which can be combined for performance improvement.
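A hedged sketch of one such arbitration: let the habitual (model-free) system act once its Q-values are confident, and fall back on the expensive planner otherwise. The two-action task and the entropy criterion are illustrative assumptions, not the paper's architecture.

```python
# Toy coordination of a planner (goal-directed) and Q-learning (habitual):
# skip the planner when the habit's softmax action entropy is low enough.
import math
import random

q = {}                         # model-free Q-values per (state, action)
ACTIONS = ["push", "wait"]

def softmax_entropy(state):
    vals = [math.exp(q.get((state, a), 0.0)) for a in ACTIONS]
    probs = [v / sum(vals) for v in vals]
    return -sum(p * math.log(p) for p in probs)

def plan(state):               # stand-in for an expensive planning system
    return "push" if state == "cube_present" else "wait"

def act(state, entropy_threshold=0.5):
    # Habitual system takes over once it is confident (low action entropy).
    if softmax_entropy(state) < entropy_threshold:
        return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
    return plan(state)

for step in range(200):        # simple repetitive task: cube is usually there
    state = "cube_present" if random.random() < 0.9 else "empty"
    a = act(state)
    reward = 1.0 if (a == "push") == (state == "cube_present") else -1.0
    q[(state, a)] = q.get((state, a), 0.0) + 0.1 * (reward - q.get((state, a), 0.0))
```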
Policy Search For Learning Robot Control Using Sparse Data
"... Abstract — In many complex robot applications, such as grasping and manipulation, it is difficult to program desired task solutions beforehand, as robots are within an uncertain and dynamic environment. In such cases, learning tasks from experience can be a useful alternative. To obtain a sound lear ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
In many complex robot applications, such as grasping and manipulation, it is difficult to program desired task solutions beforehand, as robots operate in an uncertain and dynamic environment. In such cases, learning tasks from experience can be a useful alternative. To obtain sound learning and generalization performance, machine learning, especially reinforcement learning, usually requires sufficient data. However, in cases where only limited data is available for learning, due to system constraints and practical issues, reinforcement learning can act suboptimally. In this paper, we investigate how model-based reinforcement learning, in particular the probabilistic inference for learning control method (PILCO), can be tailored to cope with sparse data to speed up learning. The basic idea is to include further prior knowledge in the learning process. As PILCO is built on the probabilistic Gaussian process framework, additional system knowledge can be incorporated by defining appropriate prior distributions, e.g. a linear mean Gaussian prior. The resulting PILCO formulation remains in closed form and analytically tractable. The proposed approach is evaluated in simulation as well as on a physical robot, the Festo Robotino XT. For the robot evaluation, we employ the approach to learn an object pick-up task. The results show that by including prior knowledge, policy learning can be sped up in the presence of sparse data.
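The key mechanism, GP regression with a non-zero (here linear) prior mean, fits in a few lines. The data, kernel, and mean coefficients below are hypothetical; this sketches the mechanism, not the PILCO implementation.

```python
# Hedged sketch: encoding prior system knowledge as a linear mean function
# in GP regression. Data and hyperparameters are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(15, 1))               # sparse training inputs
y = 1.5 * X[:, 0] + 0.3 * np.sin(3 * X[:, 0])      # true map: mostly linear

def rbf(A, B, ell=0.5, sf=1.0):
    d = A[:, None, :] - B[None, :, :]
    return sf**2 * np.exp(-0.5 * (d**2).sum(-1) / ell**2)

# Prior mean m(x) = a*x + b, e.g. from a rough physical model of the system.
a, b = 1.5, 0.0
mean = lambda Z: a * Z[:, 0] + b

Xs = np.linspace(-3, 3, 7)[:, None]                # test inputs
K = rbf(X, X) + 1e-4 * np.eye(len(X))              # kernel matrix + noise
Ks = rbf(Xs, X)
# GP posterior mean with a non-zero prior mean: m(x*) + k* K^-1 (y - m(X))
post = mean(Xs) + Ks @ np.linalg.solve(K, y - mean(X))
print(np.round(post, 2))  # extrapolates along the linear prior off the data
```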
Planning for Decentralized Control of Multiple Robots Under Uncertainty
"... We describe a probabilistic framework for synthesizing con-trol policies for general multi-robot systems, given environ-ment and sensor models and a cost function. Decentral-ized, partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes where a team of ag ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
We describe a probabilistic framework for synthesizing control policies for general multi-robot systems, given environment and sensor models and a cost function. Decentralized, partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes in which a team of agents must cooperate to optimize some objective (specified by a shared reward or cost function) in the presence of uncertainty, but where communication limitations mean that the agents cannot share their state, so execution must proceed in a decentralized fashion. While Dec-POMDPs are typically intractable to solve for real-world problems, recent research on the use of macro-actions in Dec-POMDPs has significantly increased the size of problems that can be practically solved as Dec-POMDPs. We describe this general model and show how, in contrast to most existing methods that are specialized to a particular problem class, it can synthesize control policies that exploit whatever opportunities for coordination are present in the problem, while balancing uncertainty in outcomes, sensor information, and information about other agents. We use three variations on a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate.
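For reference, the Dec-POMDP tuple the abstract relies on can be written down as a plain data structure. The field names follow the standard formulation; the tiny two-robot example is an illustrative assumption, not the paper's warehouse domain.

```python
# Hedged sketch of the Dec-POMDP tuple (agents, states, per-agent actions,
# transition T, observation O, shared reward R) as a plain data structure.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[str, ...]
JointObs = Tuple[str, ...]

@dataclass
class DecPOMDP:
    agents: List[str]
    states: List[str]
    actions: Dict[str, List[str]]                    # per-agent action sets
    transition: Callable[[str, JointAction], Dict[str, float]]        # T(s'|s,a)
    observation: Callable[[str, JointAction], Dict[JointObs, float]]  # O(o|s',a)
    reward: Callable[[str, JointAction], float]      # shared team reward

# Minimal two-robot example: both must "lift" together to earn the reward.
model = DecPOMDP(
    agents=["r1", "r2"],
    states=["box_down", "box_up"],
    actions={"r1": ["lift", "wait"], "r2": ["lift", "wait"]},
    transition=lambda s, a: {"box_up": 1.0} if a == ("lift", "lift") else {s: 1.0},
    observation=lambda s2, a: {(s2, s2): 0.9, ("noisy", "noisy"): 0.1},
    reward=lambda s, a: 10.0 if (s, a) == ("box_down", ("lift", "lift")) else -1.0,
)
print(model.reward("box_down", ("lift", "lift")))  # -> 10.0
```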
Learning neural network policies with guided policy search under unknown dynamics
- In Advances in Neural Information Processing Systems, 2014
"... We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems. These tra-jectory distributions can be used within the framework of guided policy search to learn policies with an arbitrary parameterization. Our ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems. These trajectory distributions can be used within the framework of guided policy search to learn policies with an arbitrary parameterization. Our method fits time-varying linear dynamics models to speed up learning, but does not rely on learning a global model, which can be difficult when the dynamics are complex and discontinuous. We show that this hybrid approach requires many fewer samples than model-free methods, and can handle complex, nonsmooth dynamics that can pose a challenge for model-based techniques. We present experiments showing that our method can be used to learn complex neural network policies that successfully execute simulated robotic manipulation tasks in partially observed environments with numerous contact discontinuities and underactuation.
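A hedged sketch of the model-fitting step: regress x_{t+1} on [x_t, u_t, 1] separately for each time step across sampled rollouts. The random data and plain least squares are simplifications of the iteratively refitted models the abstract describes.

```python
# Hedged sketch: fit time-varying linear dynamics x_{t+1} ~ A_t x_t + B_t u_t + c_t
# per time step from sampled trajectories (hypothetical random data).
import numpy as np

rng = np.random.default_rng(2)
N, T, dx, du = 40, 10, 3, 2               # rollouts, horizon, state/action dims
X = rng.normal(size=(N, T + 1, dx))       # hypothetical sampled states
U = rng.normal(size=(N, T, du))           # hypothetical sampled actions

dynamics = []
for t in range(T):
    # Stack [x_t, u_t, 1] and regress onto x_{t+1} across all rollouts.
    Z = np.concatenate([X[:, t], U[:, t], np.ones((N, 1))], axis=1)
    W, *_ = np.linalg.lstsq(Z, X[:, t + 1], rcond=None)
    A_t, B_t, c_t = W[:dx].T, W[dx:dx + du].T, W[-1]
    dynamics.append((A_t, B_t, c_t))

print("fitted", len(dynamics), "time-varying models; A_0 shape:",
      dynamics[0][0].shape)
```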
An Intelligent Control System for Mobile Robot Navigation Tasks in Surveillance
"... Abstract. In recent years, the autonomous mobile robot has found diverse applications such as home/health care system, surveillance system in civil and military applications and exhibition robot. For surveillance tasks such as moving target pursuit or following and patrol in a region using mobile ro ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
In recent years, autonomous mobile robots have found diverse applications such as home/health care systems, surveillance systems in civil and military settings, and exhibition robots. For surveillance tasks such as moving-target pursuit or following and patrolling a region with a mobile robot, this paper presents fuzzy Q-learning as an intelligent controller for cost-based navigation, enabling autonomous learning of suitable behaviors without supervision or external human commands. Q-learning is used to select the appropriate rule from an interval type-2 fuzzy rule base. Initial testing of the intelligent controller is demonstrated in simulation as well as in an experiment on a simple wall-following-based patrolling task with an autonomous mobile robot.
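A rough sketch of Q-learning selecting a rule's consequent action, with a crisp lookup table standing in for the interval type-2 fuzzy machinery; the wall-following dynamics, states, and rewards are hypothetical.

```python
# Hedged sketch: Q-learning picks the consequent action for each (discretized)
# rule antecedent in a toy wall-following task. All quantities are made up.
import random

RULES = ["too_close", "ok", "too_far"]        # discretized rule antecedents
ACTIONS = ["turn_away", "straight", "turn_toward"]
Q = {(r, a): 0.0 for r in RULES for a in ACTIONS}

def wall_distance_to_rule(d):
    return "too_close" if d < 0.3 else ("too_far" if d > 0.7 else "ok")

d = 0.5
for step in range(2000):
    rule = wall_distance_to_rule(d)
    # epsilon-greedy selection of the rule's consequent action
    a = random.choice(ACTIONS) if random.random() < 0.1 else \
        max(ACTIONS, key=lambda x: Q[(rule, x)])
    d += {"turn_away": 0.05, "straight": 0.0, "turn_toward": -0.05}[a]
    d += random.uniform(-0.02, 0.02)          # sensor/actuation noise
    d = min(max(d, 0.0), 1.0)
    reward = 1.0 if 0.3 <= d <= 0.7 else -1.0  # stay in the wall-following band
    new_rule = wall_distance_to_rule(d)
    best_next = max(Q[(new_rule, x)] for x in ACTIONS)
    Q[(rule, a)] += 0.1 * (reward + 0.9 * best_next - Q[(rule, a)])
```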
Towards Robot Skill Learning: From Simple Skills to Table Tennis
"... Abstract. Learning robots that can acquire new motor skills and refine existing ones have been a long-standing vision of both robotics, and machine learning. However, off-the-shelf machine learning appears not to be adequate for robot skill learning, as it neither scales to anthropomorphic robotics ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Learning robots that can acquire new motor skills and refine existing ones have been a long-standing vision of both robotics and machine learning. However, off-the-shelf machine learning appears not to be adequate for robot skill learning, as it neither scales to anthropomorphic robotics nor fulfills the crucial real-time requirements. As an alternative, we propose to divide the generic skill learning problem into parts that can be well understood from a robotics point of view. In this context, we have developed machine learning methods applicable to robot skill learning. This paper discusses recent progress ranging from simple skill learning problems to a game of robot table tennis.