Results 1 -
6 of
6
A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting
, 1997
"... In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic set ..."
Abstract
-
Cited by 1714 (53 self)
- Add to MetaCart
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weightupdate rule of Littlestone and Warmuth [20] can be adapted to this model yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games and prediction of points in R n . In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of...
A Primal-Dual Perspective of Online Learning Algorithms
"... We describe a novel framework for the design and analysis of online learning algorithms based on the notion of duality in constrained optimization. We cast a sub-family of universal online bounds as an optimization problem. Using the weak duality theorem we reduce the process of online learning to ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
We describe a novel framework for the design and analysis of online learning algorithms based on the notion of duality in constrained optimization. We cast a sub-family of universal online bounds as an optimization problem. Using the weak duality theorem we reduce the process of online learning to the task of incrementally increasing the dual objective function. The amount by which the dual increases serves as a new and natural notion of progress for analyzing online learning algorithms. We are thus able to tie the primal objective value and the number of prediction mistakes using the increase in the dual.
On-line Learning with Imperfect Monitoring
- in Proceedings of the 16th Annual Conference on Learning Theory
, 2003
"... We study on-line play of repeated matrix games in which the observations of past actions of the other player and the obtained reward are partial and stochastic. We define the Partial Observation Bayes Envelope (POBE) as the best reward against the worst-case stationary strategy of the opponent t ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
We study on-line play of repeated matrix games in which the observations of past actions of the other player and the obtained reward are partial and stochastic. We define the Partial Observation Bayes Envelope (POBE) as the best reward against the worst-case stationary strategy of the opponent that agrees with past observations. Our goal is to have the (unobserved) average reward above the POBE. For the case where the observations (but not necessarily the rewards) depend on the opponent play alone, an algorithm for attaining the POBE is derived.
Adaptive Strategies and Regret Minimization in arbitrarily varying Markov Environments
- In Proc. of 14th COLT
, 2001
"... We consider the problem of maximizing the average reward in a controlled Markov environment, which also contains some arbitrarily varying elements. This problem is captured by a two-person stochastic game model involving the reward maximizing agent and a second player, which is free to use an arbitr ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We consider the problem of maximizing the average reward in a controlled Markov environment, which also contains some arbitrarily varying elements. This problem is captured by a two-person stochastic game model involving the reward maximizing agent and a second player, which is free to use an arbitrary (non-stationary and unpredictable) control strategy. While the minimax value of the associated zero-sum game provides a guaranteed performance level, the fact that the second player's behavior is observed as the game unfolds opens up the opportunity to improve upon this minimax value if the second player is not playing a worst-case strategy. This basic idea has been formalized in the context of repeated matrix games by the classical notions of regret minimization with respect to the Bayes envelope, where an attainable performance goal is defined in terms of the empirical frequencies of the opponent's actions. This paper presents an extension of these ideas to problems with Markovian dynamics, under appropriate recurrence conditions. The Bayes envelope is first defined in a natural way in terms of the observed state action frequencies. As this envelope may not be attained in general, we define a proper convexification thereof as an attainable solution concept. In the specific case of single-controller games, where the opponent alone controls the state transitions, the Bayes envelope itself turns out to be convex and attainable. Some concrete examples are shown to fit in this framework.
A Note on Kuhn’s Theorem
, 2007
"... We revisit Kuhn’s classic theorem on mixed and behavior strategies in games. We frame Kuhn’s work in terms of two questions in decision theory: What is the relationship between global and local assessment of uncertainty? What is the relationship between global and local optimality of strategies? ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
We revisit Kuhn’s classic theorem on mixed and behavior strategies in games. We frame Kuhn’s work in terms of two questions in decision theory: What is the relationship between global and local assessment of uncertainty? What is the relationship between global and local optimality of strategies?

