Potential-based Algorithms in On-line Prediction and Game Theory
Cited by 24 (3 self)
In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the Λ-strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view on a large family of algorithms, we establish a connection between potential-based analysis in learning and its counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
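As a rough illustration of one algorithm named in this abstract, the sketch below implements the standard exponential-weights (Hedge) forecaster, which is commonly described as the strategy induced by an exponential potential. It is not taken from the paper: the function names `hedge_weights` and `run_hedge`, the fixed learning rate `eta`, and the assumption that losses lie in [0, 1] are all choices made here for illustration.

```python
import math

def hedge_weights(cumulative_losses, eta):
    """Exponential-weights distribution: p_i proportional to exp(-eta * L_i).

    `eta` is an assumed fixed learning rate (hypothetical parameter name).
    """
    shift = min(cumulative_losses)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (loss - shift)) for loss in cumulative_losses]
    total = sum(weights)
    return [w / total for w in weights]

def run_hedge(loss_rounds, eta):
    """Run Hedge on a list of per-round loss vectors (assumed to lie in [0, 1]).

    Returns the learner's expected cumulative loss and each expert's cumulative loss.
    """
    n = len(loss_rounds[0])
    cumulative = [0.0] * n
    learner_loss = 0.0
    for losses in loss_rounds:
        p = hedge_weights(cumulative, eta)
        # Expected loss of the randomized prediction in this round.
        learner_loss += sum(pi * li for pi, li in zip(p, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, cumulative
```

With T rounds and n experts, the textbook guarantee for this forecaster is a regret of at most ln(n)/eta + eta*T/8, so a choice such as eta = sqrt(8 ln(n) / T) keeps the regret on the order of sqrt(T ln n).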
Bounds for Regret-Matching Algorithms
, 2006
Cited by 4 (3 self)
We introduce a general class of learning algorithms, regret-matching algorithms, and a regret-based framework for analyzing their performance in online decision problems. Our analytic framework is based on a set Φ of transformations over the set of actions. Specifically, we calculate a Φ-regret vector by comparing the average reward obtained by an agent over some finite sequence of rounds to the average reward that could have been obtained had the agent instead played each transformation φ ∈ Φ of its sequence of actions. The regret-matching algorithms analyzed here select the agent's next action based on the vector of Φ-regrets, along with a link function f. Many well-studied learning algorithms are seen to be instances of regret matching. We derive bounds on the regret experienced by (f, Φ)-regret-matching algorithms for polynomial and exponential link functions (though we consider polynomial link 2). Although we do not improve upon the bounds reported in past work (except in special cases), our means of analysis is more general, in part because we do not rely directly on Taylor's theorem. Hence, we can analyze algorithms based on a larger class of link functions, particularly non-differentiable link functions. In ongoing work, we are indeed studying regret matching with alternative link functions, other than polynomial and exponential.
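To make the idea of regret matching concrete, here is a minimal sketch of its simplest instance: external regret with the link function taken to be the positive part of each regret, which corresponds to the Hart and Mas-Colell style rule. This is an assumption-laden illustration, not the paper's general (f, Φ) framework: the names `regret_matching_distribution` and `run_regret_matching` are hypothetical, rewards are assumed to lie in [0, 1], and regrets are tracked against the expected reward of the mixed strategy rather than a sampled action so that the example is deterministic.

```python
def regret_matching_distribution(cumulative_regret):
    """Play each action with probability proportional to its positive regret.

    Falls back to the uniform distribution when no action has positive regret.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [x / total for x in positive]

def run_regret_matching(reward_rounds):
    """Run regret matching on a list of per-round reward vectors (assumed in [0, 1]).

    Returns the learner's expected total reward and the final cumulative
    external-regret vector (one entry per action).
    """
    n = len(reward_rounds[0])
    cumulative_regret = [0.0] * n
    total_reward = 0.0
    for rewards in reward_rounds:
        p = regret_matching_distribution(cumulative_regret)
        expected = sum(pi * ri for pi, ri in zip(p, rewards))
        total_reward += expected
        # Regret of action i: what i would have earned minus what was earned.
        cumulative_regret = [c + r - expected
                             for c, r in zip(cumulative_regret, rewards)]
    return total_reward, cumulative_regret
```

Under these assumptions, a Blackwell-style argument bounds the largest cumulative regret after T rounds with n actions by sqrt(n T), so the average regret vanishes as T grows.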
Learning, Regret minimization, and Equilibria
- In
, 2007
Cited by 1 (0 self)
Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive to work each day, or repeated play of a game against an opponent with an unknown strategy. In this chapter we describe learning algorithms with strong guarantees for settings of this type, along with connections to game-theoretic equilibria when all players in a system are simultaneously adapting in such a manner.
No-regret with bounded computational capacity
- CMS-EMS Discussion Paper 1373, Kellogg School of Management, Northwestern
, 2003
Cited by 1 (1 self)
We deal with no regret and related aspects of vector-payoff games when one of the players is limited in computational capacity. We show that player 1 can almost approach with bounded-recall strategies, or with finite automata, any convex set which is approachable when no capacity bound is present. In particular we deduce that with bounded computational capacity player 1 can ensure having almost no regret.
Online Learning for Global Cost Functions Eyal
We consider an online learning setting where at each time step the decision maker has to choose how to distribute the future loss between k alternatives, and then observes the loss of each alternative. Motivated by load balancing and job scheduling, we consider a global cost function (over the losses incurred by each alternative), rather than a summation of the instantaneous losses as done traditionally in online learning. Such global cost functions include the makespan (the maximum over the alternatives) and the L_d norm (over the alternatives). Based on approachability theory, we design an algorithm that guarantees vanishing regret for this setting, where the regret is measured with respect to the best static decision that selects the same distribution over alternatives at every time step. For the special case of makespan cost we devise a simple and efficient algorithm. In contrast, we show that for concave global cost functions, such as L_d norms for d < 1, the worst-case average regret does not vanish.
EXPRESSIBLE INSPECTIONS
A decision maker needs predictions about the realization of a repeated experiment in each period. An expert provides a theory that, conditional on each finite history of outcomes, supplies a probabilistic prediction about the next outcome. However, there may be false experts without any knowledge of the data-generating process who deliver theories strategically. Hence, empirical tests for predictions are necessary. A test is manipulable if a false expert can pass the test with a high probability. Like contracts, tests have to be computable to be implemented. Considering only computable tests, we show that there is a test which passes true experts with a high probability yet is not manipulable by any computable strategy. In particular, the constructed test is both prequential and future-independent. On the other hand, any computable test is manipulable by a strategy that is computable relative to the halting problem. Our conclusion overturns earlier results that prequential or future-independent tests are manipulable, and shows that computability considerations have significant effects in these problems.
Decision Making in Uncertain and Changing Environments
, 2009
We consider an agent who has to repeatedly make choices in an uncertain and changing environment, who has full information of the past, who discounts future payoffs, but who has no prior. We provide a learning algorithm that performs almost as well as the best of a given finite number of experts or benchmark strategies and does so at any point in time, provided the agent is sufficiently patient. The key is to find the appropriate degree of forgetting the distant past. Standard learning algorithms that treat the recent and distant past equally do not have the sequential epsilon-optimality property.

