Results 1 - 10 of 1,713
Reinforcement Learning I: Introduction, 1998
"... In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search ..."
Abstract
-
Cited by 5614 (118 self)
- Add to MetaCart
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs from and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, and control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
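As a concrete illustration of the "trial and error plus learning" intuition, the sketch below pairs epsilon-greedy exploration (variation and selection, search) with an incremental value update (association, memory) on a toy bandit problem. The environment, reward means, and step sizes are illustrative assumptions, not taken from the paper.

```python
# Trial and error + learning on a 3-armed bandit: explore occasionally,
# otherwise exploit the current value estimates, and update estimates
# incrementally from observed rewards. All numbers are assumptions.
import random

true_means = [0.2, 0.5, 0.8]       # hypothetical reward means for 3 actions
q = [0.0] * len(true_means)        # learned action-value estimates (memory)
epsilon, alpha = 0.1, 0.1          # exploration rate and learning step size

for t in range(10_000):
    # Trial and error: usually exploit the best current guess, sometimes explore.
    if random.random() < epsilon:
        a = random.randrange(len(q))
    else:
        a = max(range(len(q)), key=lambda i: q[i])
    reward = random.gauss(true_means[a], 1.0)   # noisy feedback from the world
    # Learning: move the estimate toward the observed reward.
    q[a] += alpha * (reward - q[a])

print([round(v, 2) for v in q])    # estimates should approach the true means
```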
Ant Colony System: A cooperative learning approach to the traveling salesman problem
- IEEE Transactions on Evolutionary Computation, 1997
"... This paper introduces the ant colony system (ACS), a distributed algorithm that is applied to the traveling salesman problem (TSP). In the ACS, a set of cooperating agents called ants cooperate to find good solutions to TSP’s. Ants cooperate using an indirect form of communication mediated by a pher ..."
Abstract
-
Cited by 1029 (53 self)
- Add to MetaCart
This paper introduces the ant colony system (ACS), a distributed algorithm that is applied to the traveling salesman problem (TSP). In the ACS, a set of cooperating agents called ants work together to find good solutions to TSPs. Ants cooperate using an indirect form of communication mediated by a pheromone they deposit on the edges of the TSP graph while building solutions. We study the ACS by running experiments to understand its operation. The results show that the ACS outperforms other nature-inspired algorithms such as simulated annealing and evolutionary computation, and we conclude by comparing ACS-3-opt, a version of the ACS augmented with a local search procedure, to some of the best-performing algorithms for symmetric and asymmetric TSPs.
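The pheromone-mediated cooperation the abstract describes can be sketched in a few dozen lines. The following is a minimal ACS-style construction loop using the customary parameter names (q0, beta, rho, tau0) on a small random instance; it is a sketch of the mechanism, not the paper's tuned implementation.

```python
# ACS sketch: ants build tours with the pseudorandom-proportional rule,
# apply a local pheromone decay on traversed edges, and the best tour so
# far receives a global reinforcement. Instance and settings are assumptions.
import math, random

n = 10
pts = [(random.random(), random.random()) for _ in range(n)]
# Pairwise distances (diagonal padded to avoid division by zero).
dist = [[math.dist(pts[i], pts[j]) or 1e-9 for j in range(n)] for i in range(n)]
tau0 = 1.0 / sum(dist[0])                       # rough pheromone baseline
tau = [[tau0] * n for _ in range(n)]
q0, beta, rho = 0.9, 2.0, 0.1

def build_tour():
    tour = [random.randrange(n)]
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        i = tour[-1]
        score = {j: tau[i][j] * (1.0 / dist[i][j]) ** beta for j in unvisited}
        if random.random() < q0:                # exploit: greedy choice
            j = max(score, key=score.get)
        else:                                   # explore: biased sampling
            r, total = random.random() * sum(score.values()), 0.0
            for j, s in score.items():
                total += s
                if total >= r:
                    break
        tau[i][j] = (1 - rho) * tau[i][j] + rho * tau0  # local pheromone decay
        tour.append(j)
        unvisited.discard(j)
    return tour

best, best_len = None, float("inf")
for _ in range(100):
    t = build_tour()
    length = sum(dist[t[k]][t[(k + 1) % n]] for k in range(n))
    if length < best_len:
        best, best_len = t, length
    for k in range(n):                          # global update: reinforce best tour
        i, j = best[k], best[(k + 1) % n]
        tau[i][j] = (1 - rho) * tau[i][j] + rho / best_len

print(round(best_len, 3))
```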
Evolving Neural Networks through Augmenting Topologies
- Evolutionary Computation
"... An important question in neuroevolution is how to gain an advantage from evolving neural network topologies along with weights. We present a method, NeuroEvolution of Augmenting Topologies (NEAT), which outperforms the best fixed-topology method on a challenging benchmark reinforcement learning task ..."
Abstract
-
Cited by 536 (112 self)
- Add to MetaCart
(Show Context)
An important question in neuroevolution is how to gain an advantage from evolving neural network topologies along with weights. We present a method, NeuroEvolution of Augmenting Topologies (NEAT), which outperforms the best fixed-topology method on a challenging benchmark reinforcement learning task. We claim that the increased efficiency is due to (1) employing a principled method of crossover of different topologies, (2) protecting structural innovation using speciation, and (3) incrementally growing from minimal structure. We test this claim through a series of ablation studies demonstrating that each component is necessary to the system as a whole and to the others. The result is significantly faster learning. NEAT is also an important contribution to genetic algorithms (GAs) because it shows how evolution can both optimize and complexify solutions simultaneously, offering the possibility of evolving increasingly complex solutions over generations and strengthening the analogy with biological evolution.
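Item (2), protecting innovation through speciation, rests on a compatibility measure between genomes built from excess genes, disjoint genes, and average weight differences. The sketch below shows that measure under a simplified genome encoding (a dict from innovation number to connection weight); the encoding and coefficients are assumptions for illustration, not NEAT's full implementation.

```python
# NEAT-style compatibility distance: genomes whose distance falls under a
# threshold are placed in the same species, shielding new topology from
# immediate competition with the whole population.

def compatibility(genome_a, genome_b, c1=1.0, c2=1.0, c3=0.4):
    """Genomes are dicts mapping innovation number -> connection weight."""
    innovations_a, innovations_b = set(genome_a), set(genome_b)
    matching = innovations_a & innovations_b
    cutoff = min(max(innovations_a), max(innovations_b))
    # Disjoint genes fall inside the other genome's innovation range;
    # excess genes lie beyond it.
    non_matching = innovations_a ^ innovations_b
    disjoint = sum(1 for i in non_matching if i <= cutoff)
    excess = len(non_matching) - disjoint
    avg_weight_diff = (
        sum(abs(genome_a[i] - genome_b[i]) for i in matching) / len(matching)
        if matching else 0.0
    )
    n = max(len(genome_a), len(genome_b), 1)
    return (c1 * excess + c2 * disjoint) / n + c3 * avg_weight_diff

# Two genomes that share early genes but diverge structurally later:
a = {1: 0.5, 2: -0.3, 4: 0.8}
b = {1: 0.4, 2: -0.2, 3: 0.9, 5: 0.1, 6: -0.7}
print(compatibility(a, b))  # small value -> same species under a threshold
```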
The dynamics of reinforcement learning in cooperative multiagent systems
- In Proceedings of the National Conference on Artificial Intelligence (AAAI-98), 1998
"... Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that a ..."
Abstract
-
Cited by 377 (1 self)
- Add to MetaCart
(Show Context)
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of the game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
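The distinction between learners that ignore other agents and those that learn joint-action values, and the effect of optimism, can be made concrete on a small cooperative matrix game. In the sketch below, both players are joint-action learners that value each own action by the best payoff achievable over the partner's actions; the payoff matrix and schedules are assumptions in the spirit of the setting, not the paper's exact experiments.

```python
# Joint-action Q-learning with optimistic action evaluation on a
# climbing-game-like payoff matrix: coordinating on (0, 0) is optimal
# but risky, and pessimistic learners tend to avoid it.
import random
from collections import defaultdict

PAYOFF = [[11, -30, 0],
          [-30, 7, 6],
          [0, 0, 5]]                 # common payoff: a cooperative game
alpha, epsilon, steps = 0.1, 0.1, 20_000

q_joint = [defaultdict(float), defaultdict(float)]   # Q[(my_action, other_action)]

def optimistic_choice(player):
    # Value each own action by the best learned payoff over the other's actions.
    def value(a):
        return max(q_joint[player][(a, b)] for b in range(3))
    if random.random() < epsilon:
        return random.randrange(3)
    return max(range(3), key=value)

for _ in range(steps):
    a0, a1 = optimistic_choice(0), optimistic_choice(1)
    r = PAYOFF[a0][a1]
    q_joint[0][(a0, a1)] += alpha * (r - q_joint[0][(a0, a1)])
    q_joint[1][(a1, a0)] += alpha * (r - q_joint[1][(a1, a0)])

for p in range(2):
    best = max(range(3), key=lambda a: max(q_joint[p][(a, b)] for b in range(3)))
    print(f"player {p} greedy action: {best}")   # optimism favors the risky optimum
```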
Multiagent Systems: A Survey from a Machine Learning Perspective
- Autonomous Robots, 1997
"... Distributed Artificial Intelligence (DAI) has existed as a subfield of AI for less than two decades. DAI is ..."
Abstract
-
Cited by 372 (24 self)
- Add to MetaCart
(Show Context)
Distributed Artificial Intelligence (DAI) has existed as a subfield of AI for less than two decades. DAI is ...
AntNet: Distributed stigmergetic control for communications networks
- Journal of Artificial Intelligence Research, 1998
"... This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents co ..."
Abstract
-
Cited by 336 (31 self)
- Add to MetaCart
(Show Context)
This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile-agent-based Monte Carlo system inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents concurrently explore the network and exchange collected information. The communication among the agents is indirect and asynchronous, mediated by the network itself. This form of communication, typical of social insects, is called stigmergy. We compare our algorithm with six state-of-the-art routing algorithms from the telecommunications and machine learning fields. The algorithms' performance is evaluated over a set of realistic testbeds. We run many experiments over real and artificial IP datagram networks with an increasing number of nodes and under several paradigmatic spatial and temporal traffic distributions. The results are very encouraging: AntNet showed superior performance under all experimental conditions with respect to its competitors. We analyze the main characteristics of the algorithm and try to explain the reasons for its superiority.
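The stigmergic mechanism is straightforward to sketch: nodes keep per-destination probability tables over neighbors, forward ants sample routes from those tables, and backward ants reinforce the hops on fast routes. The toy topology, trip-time model, and reinforcement rule below are simplifying assumptions; AntNet proper maintains statistical models of trip times.

```python
# Stigmergic routing sketch: routing[node][dest][neighbor] is the probability
# of forwarding via that neighbor. Ants bias the tables toward low-delay paths.
import random

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
delay = {(0, 1): 1.0, (0, 2): 5.0, (1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
delay.update({(b, a): d for (a, b), d in delay.items()})

routing = {n: {d: {nb: 1 / len(graph[n]) for nb in graph[n]}
               for d in graph if d != n} for n in graph}

def launch_ant(src, dst, r=0.3):
    node, path, trip = src, [src], 0.0
    while node != dst and len(path) < 10:        # forward ant: sample next hops
        probs = routing[node][dst]
        node = random.choices(list(probs), weights=probs.values())[0]
        trip += delay[(path[-1], node)]
        path.append(node)
    if node != dst:
        return                                   # ant lost; drop it
    reinforcement = r / (1.0 + trip)             # faster trips reinforce more
    for i in range(len(path) - 1):               # backward ant: update tables
        table = routing[path[i]][dst]
        for nb in table:                         # renormalizing update rule
            if nb == path[i + 1]:
                table[nb] += reinforcement * (1 - table[nb])
            else:
                table[nb] *= 1 - reinforcement

for _ in range(2_000):
    launch_ant(0, 3)
print(routing[0][3])   # probability mass shifts toward the low-delay hop via 1
```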
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998
"... In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework. We design a multiagent Q-learning method under this framework, and prove that it converges to a Na ..."
Abstract
-
Cited by 331 (4 self)
- Add to MetaCart
In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework. We design a multiagent Q-learning method under this framework, and prove that it converges to a Nash equilibrium under specified conditions. This algorithm is useful for finding the optimal strategy when there exists a unique Nash equilibrium in the game. When there exist multiple Nash equilibria in the game, this algorithm should be combined with other learning techniques to find optimal strategies.
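Structurally, the method keeps Q-tables over joint actions and bootstraps from a Nash equilibrium of the stage game defined by the next state's Q-values. The sketch below shows that update shape; for brevity it only searches for pure-strategy equilibria, whereas the paper's algorithm handles mixed equilibria, so treat all names and structures here as illustrative assumptions.

```python
# Nash-Q-style update sketch for two players with joint-action Q-tables
# Q[s][a][b]: the bootstrap term is the players' value at a Nash equilibrium
# of the next state's stage game.

def pure_nash_value(q1, q2):
    """q1[a][b], q2[a][b]: stage-game payoff tables for players 1 and 2.
    Return (v1, v2) at the first pure-strategy Nash equilibrium found."""
    n_a, n_b = len(q1), len(q1[0])
    for a in range(n_a):
        for b in range(n_b):
            best_a = all(q1[a][b] >= q1[a2][b] for a2 in range(n_a))
            best_b = all(q2[a][b] >= q2[a][b2] for b2 in range(n_b))
            if best_a and best_b:
                return q1[a][b], q2[a][b]
    return q1[0][0], q2[0][0]   # fallback; the real algorithm solves for mixed NE

def nash_q_update(Q1, Q2, s, a, b, r1, r2, s_next, alpha=0.1, gamma=0.9):
    v1, v2 = pure_nash_value(Q1[s_next], Q2[s_next])
    Q1[s][a][b] += alpha * (r1 + gamma * v1 - Q1[s][a][b])
    Q2[s][a][b] += alpha * (r2 + gamma * v2 - Q2[s][a][b])

# Usage on a toy problem with two states and two actions per player:
Q1 = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
Q2 = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
nash_q_update(Q1, Q2, s=0, a=1, b=0, r1=1.0, r2=0.5, s_next=1)
print(Q1[0][1][0], Q2[0][1][0])
```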
Experiences with an Interactive Museum Tour-Guide Robot, 1998
"... This article describes the software architecture of an autonomous, interactive tour-guide robot. It presents a modular and distributed software architecture, which integrates localization, mapping, collision avoidance, planning, and various modules concerned with user interaction and Web-based telep ..."
Abstract
-
Cited by 329 (72 self)
- Add to MetaCart
This article describes the software architecture of an autonomous, interactive tour-guide robot. It presents a modular and distributed software architecture, which integrates localization, mapping, collision avoidance, planning, and various modules concerned with user interaction and Web-based telepresence. At its heart, the software approach relies on probabilistic computation, on-line learning, and any-time algorithms. It enables robots to operate safely, reliably, and at high speeds in highly dynamic environments, and does not require any modifications of the environment to aid the robot's operation. Special emphasis is placed on the design of interactive capabilities that appeal to people's intuition. The interface provides new means for human-robot interaction with crowds of people in public places, and it also provides people all around the world with the ability to establish a "virtual telepresence" using the Web. To illustrate our approach, we report results obtained in mid-...
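The "probabilistic computation" the architecture relies on is exemplified by Monte Carlo (particle filter) localization, sketched below in one dimension. The motion and sensor models are stand-in assumptions for illustration, not the robot's actual modules.

```python
# One particle-filter step: predict each particle with a noisy motion model,
# weight by sensor likelihood, then resample in proportion to weight.
import math, random

particles = [(random.uniform(0, 10), 1.0) for _ in range(500)]  # (position, weight)

def mcl_step(particles, control, measurement, noise=0.3):
    updated = []
    for x, _ in particles:
        x_new = x + control + random.gauss(0, noise)     # motion model: predict
        # Stand-in sensor model: likelihood of a direct position reading.
        w = math.exp(-((measurement - x_new) ** 2) / (2 * noise ** 2))
        updated.append((x_new, w))
    total = sum(w for _, w in updated) or 1.0
    xs = [x for x, _ in updated]
    ws = [w / total for _, w in updated]
    # Importance resampling concentrates particles on likely poses.
    return [(random.choices(xs, weights=ws)[0], 1.0) for _ in updated]

particles = mcl_step(particles, control=1.0, measurement=4.2)
print(sum(x for x, _ in particles) / len(particles))  # estimate near 4.2
```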
Video Textures, 2000
"... This paper introduces a new type of medium, called a video texture, which has qualities somewhere between those of a photograph and a video. A video texture provides a continuous infinitely varying stream of images. While the individual frames of a video texture may be repeated from time to time, th ..."
Abstract
-
Cited by 276 (8 self)
- Add to MetaCart
This paper introduces a new type of medium, called a video texture, which has qualities somewhere between those of a photograph and a video. A video texture provides a continuous, infinitely varying stream of images. While the individual frames of a video texture may be repeated from time to time, the video sequence as a whole is never repeated exactly. Video textures can be used in place of digital photos to infuse a static image with dynamic qualities and explicit action. We present techniques for analyzing a video clip to extract its structure, and for synthesizing a new, similar-looking video of arbitrary length. We combine video textures with view morphing techniques to obtain 3D video textures. We also introduce video-based animation, in which the synthesis of video textures can be guided by a user through high-level interactive controls. Applications of video textures and their extensions include the display of dynamic scenes on web pages, the creation of dynamic backdrops for sp...
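The analysis step can be sketched compactly: compute pairwise frame distances, then let the probability of jumping from frame i to frame j favor frames j that resemble i's natural successor. The random feature vectors, sigma choice, and sampling loop below are illustrative assumptions (the paper adds further machinery, e.g. preserving dynamics and avoiding dead ends); requires numpy.

```python
# Video-texture-style analysis and synthesis on stand-in "frames":
# a jump i -> j should look like the natural step i -> i+1, so transition
# probabilities compare frame j with frame i+1.
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((60, 16))        # 60 stand-in frames as feature vectors
n = len(frames)

# D[i, j] = L2 distance between frame i and frame j.
D = np.linalg.norm(frames[:, None, :] - frames[None, :, :], axis=-1)

sigma = 0.1 * D.mean()
# P[i, j] ~ probability of showing frame j after frame i (i, j in 0..n-2).
P = np.exp(-D[1:n, 0:n - 1] / sigma)
P /= P.sum(axis=1, keepdims=True)

# Synthesize an "endless" frame sequence by following the transition matrix.
i, sequence = 0, [0]
for _ in range(200):
    i = int(rng.choice(n - 1, p=P[i]))
    sequence.append(i)
print(sequence[:20])
```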
Learning About a New Technology: Pineapple
- Yale University, 2000
"... This paper investigates the role of social learning in the diffusion of a new agricultural technology in Ghana. We use unique data on farmers ’ communication patterns to define each individual’s information neighborhood, the set of others from whom he might learn. Our empirical strategy is to test w ..."
Abstract
-
Cited by 241 (8 self)
- Add to MetaCart
This paper investigates the role of social learning in the diffusion of a new agricultural technology in Ghana. We use unique data on farmers' communication patterns to define each individual's information neighborhood, the set of others from whom he might learn. Our empirical strategy is to test whether farmers adjust their inputs to align with those of their information neighbors who were surprisingly successful in previous periods. We present evidence that farmers adopt surprisingly successful information neighbors' practices, conditional on many potentially confounding factors including common growing conditions, credit arrangements, clan membership, and religion. The relationship of these input adjustments to experience further supports their interpretation as resulting from social learning. In ad-...