Results 1–10 of 89
Revisiting Log-Linear Learning: Asynchrony, Completeness and Payoff-Based Implementation
, 2008
Abstract

Cited by 42 (11 self)
Log-linear learning is a learning algorithm with equilibrium selection properties. Log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of log-linear learning has centered around explicitly computing the stationary distribution. This analysis relied on a highly structured setting: i) players' utility functions constitute a potential game, ii) players update their strategies one at a time, which we refer to as asynchrony, iii) at any stage, a player can select any action in the action set, which we refer to as completeness, and iv) each player is endowed with the ability to assess the utility he would have received for any alternative action provided that the actions of all other players remain fixed. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we seek to address to what degree one can relax the structural assumptions while maintaining that only potential function maximizers are the stochastically stable action profiles. In this paper, we introduce slight variants of log-linear learning to include both synchronous updates and incomplete action sets. In both settings, we prove that only potential function maximizers are stochastically stable. Furthermore, we introduce a payoff-based version of log-linear learning, in which players are only aware of the utility they received and the action that they played. Note that log-linear learning in its original form is not a payoff-based learning algorithm. In payoff-based log-linear learning, we also prove that only potential maximizers are stochastically stable. The key enabler for these results is to change the focus of the analysis away from deriving the explicit form of the stationary distribution of the learning process towards characterizing the stochastically stable states. The resulting analysis uses the theory of resistance trees for regular perturbed Markov decision processes, thereby allowing a relaxation of the aforementioned structural assumptions.
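The asynchronous update at the heart of log-linear learning is easy to state: at each stage a single player is selected, and it plays each action with probability proportional to exp(utility / τ). A minimal sketch (the 2×2 coordination game, temperature, and horizon below are illustrative assumptions, not taken from the paper):

```python
import math
import random

def log_linear_step(U, profile, player, tau, rng):
    """One asynchronous log-linear update: the selected player samples an
    action with probability proportional to exp(utility / tau)."""
    weights = []
    for a in range(len(U)):
        trial = list(profile)
        trial[player] = a
        weights.append(math.exp(U[trial[0]][trial[1]][player] / tau))
    r = rng.random() * sum(weights)
    for a, w in enumerate(weights):
        r -= w
        if r <= 0:
            profile[player] = a
            break
    return profile

def fraction_at_maximizer(T=20000, tau=0.2, seed=0):
    # Illustrative 2x2 coordination (hence potential) game:
    # joint action (0, 0) pays (2, 2) and is the potential maximizer.
    U = [[(2, 2), (0, 0)],
         [(0, 0), (1, 1)]]
    rng = random.Random(seed)
    profile = [1, 1]          # start at the inferior equilibrium
    hits = 0
    for t in range(T):
        profile = log_linear_step(U, profile, t % 2, tau, rng)
        if profile == [0, 0]:
            hits += 1
    return hits / T

print(fraction_at_maximizer())
```

At low temperature the process spends the bulk of its time at the potential maximizer, which is exactly the equilibrium-selection property the abstract refers to.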
Dynamic vehicle routing for robotic systems
 Proceedings of the IEEE,
, 2010
Abstract

Cited by 38 (12 self)
Recent years have witnessed great advancements in the science and technology of autonomy, robotics and networking. This paper surveys recent concepts and algorithms for dynamic vehicle routing (DVR), that is, for the automatic planning of optimal multi-vehicle routes to perform tasks that are generated over time by an exogenous process. We consider a rich variety of scenarios relevant for robotic applications. We begin by reviewing the basic DVR problem: demands for service arrive at random locations at random times and a vehicle travels to provide on-site service while minimizing the expected wait time of the demands. Next, we treat different multi-vehicle scenarios based on different models for demands (e.g., demands with different priority levels and impatient demands), vehicles (e.g., motion constraints, communication and sensing capabilities), and tasks. The performance criterion used in these scenarios is either the expected wait time of the demands or the fraction of demands serviced successfully. In each specific DVR scenario, we adopt a rigorous technical approach that relies upon methods from queueing theory, combinatorial optimization and stochastic geometry. First, we establish fundamental limits on the achievable performance, including limits on stability and quality of service. Second, we design algorithms and provide provable guarantees on their performance with respect to the fundamental limits.
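To make the basic DVR problem concrete, here is a toy Monte Carlo estimate of the mean wait under the simplest conceivable policy, first-come first-served at constant speed (the arrival rate, horizon, and depot location are illustrative assumptions; the paper's policies and analysis are far more refined):

```python
import random

def mean_wait_fcfs(arrival_rate=0.5, horizon=200.0, speed=1.0, seed=0):
    """Toy single-vehicle DVR simulation: demands arrive as a Poisson
    process at uniform locations in the unit square; the vehicle serves
    them first-come first-served. Returns the mean wait of the demands."""
    rng = random.Random(seed)
    # Generate Poisson arrival times and uniform locations over the horizon.
    demands, t = [], 0.0
    while True:
        t += rng.expovariate(arrival_rate)
        if t > horizon:
            break
        demands.append((t, rng.random(), rng.random()))
    # Serve in arrival order, starting from the center of the square.
    pos, clock, waits = (0.5, 0.5), 0.0, []
    for (arrival, x, y) in demands:
        clock = max(clock, arrival)          # idle until the demand appears
        clock += ((pos[0] - x) ** 2 + (pos[1] - y) ** 2) ** 0.5 / speed
        waits.append(clock - arrival)        # time from arrival to service
        pos = (x, y)
    return sum(waits) / len(waits) if waits else 0.0

print(mean_wait_fcfs())
```

Varying `arrival_rate` toward the stability limit makes the estimated wait blow up, which is the kind of fundamental limit the survey characterizes rigorously.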
Connections Between Cooperative Control and Potential Games Illustrated on the Consensus Problem
, 2007
Abstract

Cited by 36 (12 self)
This paper presents a view of cooperative control using the language of learning in games. We review the game-theoretic concepts of potential games and weakly acyclic games and demonstrate how the specific cooperative control problem of consensus can be formulated in these settings. Motivated by this connection, we build upon game-theoretic concepts to better accommodate a broader class of cooperative control problems. In particular, we introduce sometimes weakly acyclic games for time-varying objective functions and action sets, and provide distributed algorithms for convergence to an equilibrium. Finally, we illustrate how to implement these algorithms for the consensus problem in a variety of settings, most notably, in an environment with non-convex obstructions.
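The consensus-as-potential-game connection can be illustrated with a hypothetical quadratic-utility formulation: give agent i the utility -Σⱼ (xᵢ - xⱼ)² over its neighbors j, which admits the exact potential φ(x) = -Σ over edges (xᵢ - xⱼ)². A best reply is then the mean of the neighbors' values, and asynchronous best replies drive a connected network to agreement (a sketch under these assumptions, not the paper's exact construction):

```python
def consensus_best_reply(values, edges, sweeps=500):
    """Asynchronous best-reply dynamics in a hypothetical 'consensus game':
    agent i's utility is -sum over neighbors j of (x_i - x_j)^2, so its
    best reply is the mean of its neighbors' current values. Each sweep
    updates the agents one at a time (round-robin, in place)."""
    n = len(values)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    x = list(values)
    for _ in range(sweeps):
        for i in range(n):                     # asynchronous updates
            x[i] = sum(x[j] for j in neighbors[i]) / len(neighbors[i])
    return x

# Path graph 0-1-2-3: repeated best replies drive the values to agreement.
final = consensus_best_reply([0.0, 0.0, 0.0, 3.0], [(0, 1), (1, 2), (2, 3)])
print(final)
```

Each best reply weakly increases the potential, which is what rules out cycling and yields convergence in this class of games.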
Payoff-based dynamics for multi-player weakly acyclic games
 SIAM J. Control Optim.
, 2009
Abstract

Cited by 33 (12 self)
We consider repeated multi-player games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multi-agent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff-based” processes in which, at any stage, players know only their own actions and (noise-corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff-based processes for increasingly general scenarios and prove that, after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
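One way to picture a payoff-based process of the kind described here is a "safe experimentation" style rule for an identical-interest game: each player remembers only a baseline action and the best payoff it has produced, occasionally explores a random action, and adopts the exploration only when its realized payoff beats the baseline. The sketch below is a hedged simplification under those assumptions, not the paper's exact specification:

```python
import random

def safe_experimentation(U, T=5000, eps=0.1, seed=0):
    """Hedged sketch of a payoff-based process in the spirit of 'safe
    experimentation': players observe only their own actions and realized
    payoffs, never the opponent's action or the payoff functions.
    U is a common payoff matrix (an identical-interest game)."""
    rng = random.Random(seed)
    n_actions = len(U)
    baseline = [0, 0]
    best_payoff = [float("-inf"), float("-inf")]
    for _ in range(T):
        action = []
        for i in range(2):
            if rng.random() < eps:
                action.append(rng.randrange(n_actions))   # explore
            else:
                action.append(baseline[i])                # exploit baseline
        payoff = U[action[0]][action[1]]                  # common payoff
        for i in range(2):
            if payoff > best_payoff[i]:
                best_payoff[i] = payoff                   # adopt improvement
                baseline[i] = action[i]
    return baseline

# Illustrative identical-interest 2x2 game whose best joint action is (1, 1).
U = [[1, 0],
     [0, 2]]
print(safe_experimentation(U))
```

With small exploration rates, joint explorations eventually discover the best joint action, after which unilateral explorations can never displace it.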
Cooperative control and potential games
 IEEE Trans. Syst., Man, Cybern. B
, 2009
Abstract

Cited by 32 (6 self)
We present a view of cooperative control using the language of learning in games. We review the game-theoretic concepts of potential and weakly acyclic games, and demonstrate how several cooperative control problems, such as consensus and dynamic sensor coverage, can be formulated in these settings. Motivated by this connection, we build upon game-theoretic concepts to better accommodate a broader class of cooperative control problems. In particular, we extend existing learning algorithms to accommodate restricted action sets caused by the limitations of agent capabilities and group-based decision making. Furthermore, we also introduce a new class of games called sometimes weakly acyclic games for time-varying objective functions and action sets, and provide distributed algorithms for convergence to an equilibrium. Index Terms—Cooperative control, game theory, learning in games, multi-agent systems.
Distributed Multi-Target Tracking in a Self-Configuring Camera Network
Abstract

Cited by 30 (7 self)
This paper deals with the problem of tracking multiple targets in a distributed network of self-configuring pan-tilt-zoom cameras. We focus on applications where events unfold over a large geographic area and need to be analyzed by multiple overlapping and non-overlapping active cameras without a central unit accumulating and analyzing all the data. The overall goal is to keep track of all targets in the region of deployment of the cameras, while selectively focusing at a high resolution on some particular target features. To acquire all the targets at the desired resolutions while keeping the entire scene in view, we use cooperative network control ideas based on multi-player learning in games. For tracking the targets as they move through the area covered by the cameras, we propose a special application of the distributed estimation algorithm known as the Kalman-Consensus filter, through which each camera comes to a consensus with its neighboring cameras about the actual state of the target. This leads to a camera network topology that changes with time. Combining these ideas with single-view analysis, we have a completely distributed approach for multi-target tracking and camera network self-configuration. We show performance analysis results with real-life experiments on a network of 10 cameras.
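The Kalman-Consensus idea can be caricatured in one dimension: each camera corrects its estimate with its own noisy measurement (the Kalman-style innovation) and simultaneously pulls toward its neighbors' estimates (the consensus term). The scalar sketch below, with a ring topology and illustrative gains, is an assumption-laden simplification, not the filter used in the paper:

```python
import random

def kalman_consensus_sketch(true_state=5.0, n_cams=4, steps=50,
                            gain=0.3, consensus=0.2, noise=0.5, seed=0):
    """Drastically simplified scalar sketch of the Kalman-Consensus idea:
    each camera updates its estimate with its own noisy measurement plus
    a consensus term toward its two ring neighbors. All gains and the
    scenario are illustrative, not from the paper."""
    rng = random.Random(seed)
    est = [rng.uniform(0.0, 10.0) for _ in range(n_cams)]
    for _ in range(steps):
        z = [true_state + rng.gauss(0.0, noise) for _ in range(n_cams)]
        new = []
        for i in range(n_cams):
            left, right = est[(i - 1) % n_cams], est[(i + 1) % n_cams]
            innovation = gain * (z[i] - est[i])
            agreement = consensus * ((left - est[i]) + (right - est[i]))
            new.append(est[i] + innovation + agreement)
        est = new
    return est

print(kalman_consensus_sketch())
```

After a few dozen steps the cameras both agree with each other and track the true state, which is the qualitative behavior the distributed estimator provides.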
Payoff-Based Dynamics for Multi-Player Weakly Acyclic Games
 SIAM Journal on Control and Optimization, special issue on Control and Optimization in Cooperative Networks
, 2007
Abstract

Cited by 28 (15 self)
We consider repeated multi-player games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multi-agent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff-based” processes, in which at any stage, players only know their own actions and (noise-corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff-based processes for increasingly general scenarios and prove that after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
Regret-based dynamics: Convergence in weakly acyclic games
 In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
, 2007
Abstract

Cited by 28 (11 self)
Regret-based algorithms have been proposed to control a wide variety of multi-agent systems. The appeal of regret-based algorithms is that (1) these algorithms are easily implementable in large-scale multi-agent systems and (2) there are existing results proving that the behavior will asymptotically converge to a set of points of “no regret” in any game. We illustrate, through a simple example, that no-regret points need not reflect desirable operating conditions for a multi-agent system. Multi-agent systems often exhibit an additional structure (i.e., being “weakly acyclic”) that has not been exploited in the context of regret-based algorithms. In this paper, we introduce a modification of regret-based algorithms by (1) exponentially discounting the memory and (2) bringing in a notion of inertia in players' decision processes. We show how these modifications can lead to an entire class of regret-based algorithms that provide almost sure convergence to a pure Nash equilibrium in any weakly acyclic game.
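A hedged sketch of the two modifications, discounted memory and inertia, on a 2×2 coordination game (which is weakly acyclic): each player tracks exponentially discounted regrets against the opponent's current action, repeats its last action with the inertia probability, and otherwise moves to its highest positive-regret action. The game, discount factor, and horizon are illustrative assumptions:

```python
import random

def discounted_regret_play(U, T=2000, delta=0.8, inertia=0.5, seed=0):
    """Hedged sketch of regret-based play with discounting and inertia.
    regret[i][a] is player i's exponentially discounted regret for
    alternative action a, measured against the opponent's current action."""
    rng = random.Random(seed)
    n = len(U)                                  # actions per player
    regret = [[0.0] * n, [0.0] * n]
    action = [rng.randrange(n), rng.randrange(n)]
    for _ in range(T):
        # Update discounted regrets given the current joint action.
        for i in range(2):
            opp = action[1 - i]
            played = U[action[0]][action[1]][i]
            for a in range(n):
                alt = U[a][opp][i] if i == 0 else U[opp][a][i]
                regret[i][a] = delta * regret[i][a] + (1 - delta) * (alt - played)
        # Choose next actions: inertia, else best positive-regret action.
        for i in range(2):
            if rng.random() < inertia:
                continue                        # inertia: repeat last action
            best = max(range(n), key=lambda a: regret[i][a])
            if regret[i][best] > 0:
                action[i] = best
    return tuple(action)

# Illustrative coordination game; its pure Nash equilibria are (0,0) and (1,1).
U = [[(2, 2), (0, 0)],
     [(0, 0), (1, 1)]]
print(discounted_regret_play(U))
```

Inertia breaks the simultaneous-switching cycles that plain regret matching can fall into, and the discounting lets stale regrets die out once a pure equilibrium is reached.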
Decentralized camera network control using game theory
 in Proc. ACM/IEEE Int. Conf. Distributed Smart Cameras
, 2008
Achieving Pareto Optimality Through Distributed Learning
, 2012
Abstract

Cited by 21 (5 self)
We propose a simple payoff-based learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any n-person finite strategic-form game with generic payoffs. The algorithm follows the theme of exploration versus exploitation and is hence stochastic in nature. We prove that if all agents adhere to this algorithm, then the agents will select the action profile that maximizes the sum of the agents' payoffs a high percentage of time. The algorithm requires no communication. Agents respond solely to changes in their own realized payoffs, which are affected by the actions of other agents in the system in ways that they do not necessarily understand. The method can be applied to the optimization of complex systems with many distributed components, such as the routing of information in networks and the design and control of wind farms. The proof of the proposed learning algorithm relies on the theory of large deviations for perturbed Markov chains.