Results 1-10 of 330
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
, 2003
Abstract

Cited by 298 (4 self)
Convex programming involves a convex set F ⊆ ℝⁿ and a convex function c : F → ℝ. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent; the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
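A minimal sketch of the kind of update the abstract describes, assuming a greedy-projection rule: commit to x_t, only then observe the gradient of c_t at x_t, step against it with η_t = 1/√t, and project back onto F. To keep it short, F is assumed to be a box [lo, hi]^n so projection is coordinate-wise clipping (in repeated games F would instead be the simplex of mixed strategies). Function and parameter names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def project_box(x: Sequence[float], lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^n: clip every coordinate."""
    return [min(max(xi, lo), hi) for xi in x]


def online_gradient_descent(
    grads: Sequence[Callable[[Vector], Vector]],
    x0: Sequence[float],
    lo: float,
    hi: float,
) -> List[Vector]:
    """Return the point played in each round.

    grads[t](x) is the gradient of round t's cost at x; it is queried only
    after x_t has been committed, mirroring the online setting.
    """
    x = project_box(x0, lo, hi)
    played: List[Vector] = []
    for t, grad in enumerate(grads, start=1):
        played.append(x)                  # commit before seeing the cost
        g = grad(x)                       # this round's cost is revealed
        eta = 1.0 / math.sqrt(t)          # decreasing step size
        x = project_box([xi - eta * gi for xi, gi in zip(x, g)], lo, hi)
    return played
```

For example, with costs c_t(x) = (x − a_t)² on [0, 1] and targets alternating between 0.2 and 0.8, the best fixed point in hindsight is 0.5; with η_t = 1/√t the regret against it grows only like √T, so the average regret vanishes as T grows.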
Regret in the Online Decision Problem
, 1999
Abstract

Cited by 129 (2 self)
At each point in time a decision maker must choose a decision. The payoff in a period from the decision chosen depends on the decision as well as the state of the world that obtains at that time. The difficulty is that the decision must be made in advance of any knowledge, even probabilistic, about which state of the world will obtain. A range of problems from a variety of disciplines can be framed in this way. In this
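The abstract is cut off before it defines regret. The sketch below assumes the standard external-regret notion: the payoff of the best single decision in hindsight minus the payoff actually collected. The function name and the umbrella example are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

D = TypeVar("D")  # decision type
S = TypeVar("S")  # state-of-the-world type


def external_regret(
    decisions: Sequence[D],
    states: Sequence[S],
    payoff: Callable[[D, S], float],
    options: Sequence[D],
) -> float:
    """Best fixed decision in hindsight minus the realized total payoff."""
    realized = sum(payoff(d, s) for d, s in zip(decisions, states))
    best_fixed = max(sum(payoff(o, s) for s in states) for o in options)
    return best_fixed - realized
```

With decisions umbrella, none, none against states rain, rain, sun (payoffs: umbrella earns 1 in rain and 0 in sun; no umbrella earns -1 in rain and 1 in sun), the realized payoff is 1 while always taking the umbrella would have earned 2, so the regret is 1.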
Calibrated Learning and Correlated Equilibrium
 Games and Economic Behavior
, 1996
Abstract

Cited by 114 (5 self)
Suppose two players meet each other in a repeated game where: 1. each uses a learning rule with the property that it is a calibrated forecast of the other's plays, and 2. each plays a best response to this forecast distribution.
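The second condition, playing a best response to the forecast distribution, is easy to state in code. A sketch, assuming a two-player game described by the player's own payoff matrix; the calibrated forecaster itself (the nontrivial part) is not shown, and the function name is hypothetical.

```python
from typing import Sequence


def best_response(payoffs: Sequence[Sequence[float]], forecast: Sequence[float]) -> int:
    """Index of the own action with the highest expected payoff under the forecast.

    payoffs[i][j] is the own payoff for playing i when the opponent plays j;
    forecast[j] is the forecast probability that the opponent plays j.
    Ties go to the lowest index, because max() returns the first maximum.
    """
    expected = [sum(p * q for p, q in zip(row, forecast)) for row in payoffs]
    return max(range(len(expected)), key=expected.__getitem__)
```

In the coordination game [[2, 0], [0, 1]], a forecast of (0.3, 0.7) makes action 1 the best response, while (0.5, 0.5) makes action 0 the best response.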
Shopbots and Pricebots
, 1999
Abstract

Cited by 108 (13 self)
Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is intended to quantify some of the likely impacts of a proliferation of shopbots and other economically motivated software agents. In addition, this paper reports on simulations of pricebots (adaptive, price-setting agents which firms may well implement to combat, or even take advantage of, the growing community of shopbots). This study forms part of a larger research program that aims to provide insights into the impact of agent technology on the nascent information economy.
A general class of adaptive strategies
 Journal of Economic Theory
Abstract

Cited by 102 (4 self)
We exhibit and characterize an entire class of simple adaptive strategies, in the repeated play of a game, having the Hannan-consistency property: in the long run, the player is guaranteed an average payoff as large as the best-reply payoff to the empirical distribution of play of the other players; i.e., there is no “regret.” Smooth fictitious play (Fudenberg and Levine [1995]) and regret-matching (Hart and Mas-Colell [2000]) are particular cases. The motivation and application of the current paper come from the study of procedures whose empirical distribution of play is, in the long run, (almost) a correlated equilibrium. For the analysis we first develop a generalization of Blackwell’s [1956a] approachability strategy for games with vector payoffs.
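Regret-matching, one of the particular cases named above, plays each action with probability proportional to the positive part of its cumulative regret. Below is a minimal sketch of the unconditional version against a fixed sequence of opponent actions, assuming a two-player game with our payoff matrix payoff[k][j]. To keep it deterministic, regrets accumulate using the expected payoff of the mixed action instead of a sampled action, which is a simplification. Names are hypothetical.

```python
from typing import List, Sequence


def regret_matching_strategy(cum_regret: Sequence[float]) -> List[float]:
    """Mixed action proportional to the positive part of cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret: any mixed action is acceptable; use uniform.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def run_regret_matching(
    payoff: Sequence[Sequence[float]], opponent_plays: Sequence[int]
) -> float:
    """Average of the largest cumulative regret after all rounds.

    payoff[k][j] is our payoff for action k when the opponent plays j.
    """
    n = len(payoff)
    cum_regret = [0.0] * n
    for j in opponent_plays:
        sigma = regret_matching_strategy(cum_regret)
        expected = sum(sigma[k] * payoff[k][j] for k in range(n))
        # Regret of each pure action relative to what the mixed action earned.
        cum_regret = [cum_regret[k] + payoff[k][j] - expected for k in range(n)]
    return max(cum_regret) / len(opponent_plays)
```

The positive regret vector is orthogonal to each round's expected regret increment (Blackwell's approachability condition), so its squared norm grows at most linearly in T and the returned average regret is O(1/√T).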
Intrinsic Robustness of the Price of Anarchy
 STOC'09
, 2009
Abstract

Cited by 101 (12 self)
The price of anarchy (POA) is a worst-case measure of the inefficiency of selfish behavior, defined as the ratio of the objective function value of a worst Nash equilibrium of a game and that of an optimal outcome. This measure implicitly assumes that players successfully reach some Nash equilibrium. This drawback motivates the search for inefficiency bounds that apply more generally to weaker notions of equilibria, such as mixed Nash and correlated equilibria; or to sequences of outcomes generated by natural experimentation strategies, such as successive best responses or simultaneous regret-minimization. We prove a general and fundamental connection between the price of anarchy and its seemingly stronger relatives in classes of games with a sum objective. First, we identify a “canonical sufficient condition” for an upper bound of the POA for pure Nash equilibria, which we call a smoothness argument. Second, we show that every bound derived via a smoothness argument extends automatically, with no quantitative degradation in the bound, to mixed Nash equilibria, correlated equilibria, and the average objective function value of regret-minimizing players (or “price of total anarchy”). Smoothness arguments also have automatic implications for the inefficiency of approximate and Bayesian-Nash equilibria and, under mild additional assumptions, for bicriteria bounds and for polynomial-length best-response sequences. We also identify classes of games (most notably, congestion games with cost functions restricted to an arbitrary fixed set) that are tight, in the sense that smoothness arguments are guaranteed to produce an optimal worst-case upper bound on the POA, even for the smallest set of interest (pure Nash equilibria). Byproducts of our proof of this result include the first tight bounds on the POA in congestion games with non-polynomial cost functions, and the first
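The abstract does not spell out the smoothness condition. The derivation below assumes the standard (λ, μ)-smoothness definition for cost-minimization games whose objective is the sum of player costs, C(s) = Σ_i C_i(s), with parameters λ > 0 and μ < 1 that are not named in the abstract. It shows how the bound for pure Nash equilibria follows in one line.

```latex
% (\lambda,\mu)-smoothness: for every pair of outcomes s and s^*,
\sum_{i} C_i(s_i^*, s_{-i}) \le \lambda\, C(s^*) + \mu\, C(s).
% If s is a pure Nash equilibrium, no player gains by deviating to s_i^*, so
C(s) = \sum_{i} C_i(s) \le \sum_{i} C_i(s_i^*, s_{-i}) \le \lambda\, C(s^*) + \mu\, C(s)
\quad\Longrightarrow\quad
\frac{C(s)}{C(s^*)} \le \frac{\lambda}{1-\mu}.
```

For instance, congestion games with affine cost functions are (5/3, 1/3)-smooth, which yields the POA bound 5/2.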
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
, 2003
Abstract

Cited by 97 (5 self)
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent’s (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players’ actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others’ strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing other multiagent learning algorithms also.
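The core idea (adapt when the others appear stationary, otherwise retreat to a precomputed equilibrium) can be illustrated with a heavily simplified rule. This is not AWESOME itself: as we understand it, the actual algorithm runs in epochs with growing lengths and shrinking thresholds to obtain its guarantees. The sketch assumes a single opponent whose actions are observed, and simply compares empirical action frequencies in two consecutive nonempty windows against a fixed tolerance epsilon. All names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def empirical(actions: Sequence[int], n: int) -> List[float]:
    """Empirical frequency of each of the n opponent actions (window must be nonempty)."""
    counts = Counter(actions)
    return [counts[a] / len(actions) for a in range(n)]


def choose_strategy(
    prev_window: Sequence[int],
    curr_window: Sequence[int],
    payoff: Sequence[Sequence[float]],
    equilibrium: Sequence[float],
    epsilon: float,
) -> List[float]:
    """Best-respond if the opponent looks stationary, else play the equilibrium.

    payoff[k][j] is our payoff for action k when the opponent plays j.
    """
    n_opp = len(payoff[0])
    p = empirical(prev_window, n_opp)
    q = empirical(curr_window, n_opp)
    if max(abs(a - b) for a, b in zip(p, q)) <= epsilon:
        # Opponent appears stationary: best response to its recent frequencies.
        expected = [sum(u * x for u, x in zip(row, q)) for row in payoff]
        best = max(range(len(payoff)), key=expected.__getitem__)
        return [1.0 if k == best else 0.0 for k in range(len(payoff))]
    # Opponent appears to be changing: retreat to the precomputed equilibrium.
    return list(equilibrium)
```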
Autonomous vehicle-target assignment: a game-theoretical formulation
 ASME JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT AND CONTROL
, 2007
Abstract

Cited by 89 (22 self)
We consider an autonomous vehicle-target assignment problem where a group of vehicles are expected to optimally assign themselves to a set of targets. We introduce a game-theoretical formulation of the problem in which the vehicles are viewed as self-interested decision makers. Thus, we seek the optimization of a global utility function through autonomous vehicles that are capable of making individually rational decisions to optimize their own utility functions. The first important aspect of the problem is to choose the utility functions of the vehicles in such a way that the objectives of the vehicles are localized to each vehicle yet aligned with a global utility function. The second important aspect of the problem is to equip the vehicles with an appropriate negotiation mechanism by which each vehicle pursues the optimization of its own utility function. We present several design procedures and accompanying caveats for vehicle utility design. We present two new negotiation mechanisms, namely, “generalized regret monitoring with fading memory and inertia” and “selective spatial adaptive play,” and provide accompanying proofs of their convergence. Finally, we present simulations that illustrate how vehicle negotiations can consistently lead to near-optimal assignments provided that the utilities of the vehicles are designed appropriately.
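One standard way to make each vehicle's utility local yet aligned with the global utility is the marginal-contribution ("wonderful life") design: a vehicle's utility is the global utility minus what the global utility would be without that vehicle. The abstract does not say which designs the paper uses, so treat this as an assumed example. The coverage model (a target's value counts once if at least one vehicle is assigned to it) and all names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def global_utility(assignment: Sequence[Optional[str]], values: Dict[str, float]) -> float:
    """Total value of the targets covered by at least one vehicle."""
    return sum(values[t] for t in set(assignment) if t is not None)


def marginal_utility(
    i: int, assignment: Sequence[Optional[str]], values: Dict[str, float]
) -> float:
    """Vehicle i's utility: global utility with i minus global utility without i."""
    without_i = list(assignment)
    without_i[i] = None  # remove vehicle i from the assignment
    return global_utility(assignment, values) - global_utility(without_i, values)
```

Because the "without i" term does not depend on vehicle i's own choice, any unilateral change by vehicle i changes its utility by exactly the change in global utility, which is the alignment property the abstract asks for. Vehicle i also needs to know only which other vehicles share its target.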
Convergence and no-regret in multiagent learning
 In Advances in Neural Information Processing Systems 17
, 2005
Abstract

Cited by 85 (0 self)
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
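GIGA-WoLF builds on GIGA, which is projected gradient ascent on the simplex of mixed strategies. The sketch below follows our reading of the update: a fast strategy x̂ takes a full gradient step, a slow strategy z takes a step one third as large, and the played strategy is pulled from x̂ toward z by δ = min(1, ‖z_{t+1} − z_t‖ / ‖z_{t+1} − x̂_{t+1}‖). Treat the exact constants as assumptions to be checked against the paper. The reward vector holds the payoff of each pure action this round (the gradient of expected payoff with respect to x); all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def project_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j satisfying the condition fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def giga_wolf_step(
    x: Sequence[float], z: Sequence[float], reward: Sequence[float], eta: float
) -> Tuple[List[float], List[float]]:
    """One update; returns (played strategy, slow strategy) for the next round."""
    x_hat = project_simplex([xi + eta * r for xi, r in zip(x, reward)])
    z_new = project_simplex([zi + eta * r / 3.0 for zi, r in zip(z, reward)])
    gap = norm([a - b for a, b in zip(z_new, x_hat)])
    step = norm([a - b for a, b in zip(z_new, z)])
    delta = 1.0 if gap == 0.0 else min(1.0, step / gap)
    # Move the fast strategy toward the slow one by a fraction delta.
    x_new = [xh + delta * (zn - xh) for xh, zn in zip(x_hat, z_new)]
    return x_new, z_new
```

Against an opponent that always plays the same action, both strategies climb toward the best response; once z stops moving, δ drops to zero and the played strategy follows plain GIGA steps.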