38 citations found.
D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7--35, 1999.


This paper is cited in the following contexts:
Efficient Algorithms for Online Decision Problems - Kalai, Vempala (2003)   (2 citations)  (Correct)

....algorithm as well as a lazy one that rarely switches between decisions. 1 Introduction In an online decision problem, one has to make a sequence of decisions without knowledge of the future. Exponential weighting schemes for these problems have been discovered and rediscovered in many areas [7]. Even in learning, there are too many results to mention (for a survey, see [1]). We show that Hannan's original idea of doing what worked best against the past (with perturbed totals) gives efficient and simple algorithms for online decision problems. We extend his algorithm to get ....

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, vol.29, pp.1084-1090, 1999.


On Optimal Sequential Prediction for General Processes - Nobel (2001)   (Correct)

..... Aggregating methods, and corresponding bounds on the difference between the loss of the aggregate scheme and that of the best scheme in the family, have been established in a variety of settings. Representative work and further references can be found in [41, 17, 27, 10, 9, 25]. Foster and Vohra [19] give an account of the aggregating problem and its history. Merhav and Feder [28] give an overview of prediction from individual sequences. Weissman and Merhav [43] establish finite sample aggregation bounds for the prediction of individual binary sequences observed in additive, independent noise, ....

D.P. Foster and R. Vohra, Regret in the on-line decision problem, Games and Economic Behavior, vol.29, pp.1084-1090, 1999.


The Non-Stochastic Multi-Armed Bandit Problem - Auer, Cesa-Bianchi (2002)   (10 citations)  (Correct)

....A desirable property for a player is Hannan consistency, which is similar to saying (in our bandit framework) that the weak regret per time step of the player converges to 0 with probability 1. Examples of Hannan consistent player strategies have been provided by several authors in the past (see [18] for a survey of these results). By applying (slight extensions of) Theorems 6.3 and 6.4, we can provide an example of a simple Hannan consistent player whose convergence rate is optimal up to logarithmic factors. Our player algorithms are based in part on an algorithm presented by Freund ....

....is Hannan consistency [8], defined as follows. Player i is Hannan consistent if $\limsup_{T \to \infty} \frac{1}{T} \max_{j \in S_i} R_i^{(j)}(T) \le 0$ with probability 1. The existence and properties of Hannan consistent players were first investigated by Hannan [10] and Blackwell [2] and later by many others (see [18] for a nice survey). Hannan consistency can also be studied in the so-called unknown game setup, where it is further assumed that: 1) each player knows neither the total number of players nor the payoff function of any player (including itself); 2) after each round each player sees its own ....

[Article contains additional citation context not shown here]

Dean P. Foster and Rakesh Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7--36, 1999.
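
The per-time-step regret in the excerpts above can be illustrated with a minimal Python sketch. It assumes the simpler full-information setting (the payoff of every action is revealed each round), not the bandit setting of the cited article, and the function names `average_external_regret` and `exponential_weights_play` and the parameter `eta` are hypothetical choices made for this illustration. The player is a generic exponential-weights scheme, not the specific algorithm of any paper listed here.

```python
import math
import random


def average_external_regret(payoffs, actions):
    """Best fixed action's total payoff minus realized payoff, divided by T."""
    T = len(payoffs)
    n = len(payoffs[0])
    realized = sum(payoffs[t][actions[t]] for t in range(T))
    best_fixed = max(sum(payoffs[t][j] for t in range(T)) for j in range(n))
    return (best_fixed - realized) / T


def exponential_weights_play(payoffs, eta, rng):
    """Each round, sample an action with probability proportional to
    exp(eta * cumulative payoff so far), then observe the full payoff row.
    Assumption: full information, unlike the bandit framework."""
    n = len(payoffs[0])
    totals = [0.0] * n
    actions = []
    for row in payoffs:
        top = max(totals)  # subtract the max for numerical stability
        weights = [math.exp(eta * (s - top)) for s in totals]
        actions.append(rng.choices(range(n), weights=weights)[0])
        totals = [s + x for s, x in zip(totals, row)]
    return actions
```

A player is Hannan consistent when the quantity returned by `average_external_regret` has a limit superior of at most 0 as the number of rounds grows, with probability 1.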


On No-Regret Learning, Fictitious Play, and Nash Equilibrium - Jafari, Greenwald, Gondek   (Correct)

....i (s 0 i ; s i js i ) Correlated equilibrium generalizes the notion of Nash equilibrium by allowing for correlations among the players' strategies. An algorithm achieves no conditional regret iff its empirical distribution of play converges to correlated equilibrium (see, for example, [3, 11]). In general, no conditional regret implies no regret, and these two properties are equivalent in two strategy games. Hence, no regret algorithms are guaranteed to converge to correlated equilibrium in 2 × 2 games. By studying the conditional regret matrices given opposing sequence of ....

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


Potential-based Algorithms in On-line Prediction and Game.. - Cesa-Bianchi, Lugosi   (Correct)

....k. Variants of the learning with experts framework, such as shifting experts or the more general specialists [11], can be analyzed using generalized regret. Example 16. An important special case of the generalized regret (9) is the so-called internal or conditional regret [19] (see also [7] for a survey). In this case the $N = m(m-1)$ experts are labeled by pairs $(i, j)$ for $i \neq j$. Expert $(i, j)$ predicts always $i$, that is, $f^{(i,j)}_t = i$ for all $t$, and it is active only when the predictor's guess is $j$, that is, $A^{(i,j)}(k, t) = 1$ if and only if $k = j$. Thus, component $(i, j)$ of the ....

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7-36, 1999.
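
The internal (conditional) regret described in Example 16 above can be made concrete with a short Python sketch. It assumes a payoff formulation (higher is better) with every action's payoff known after each round; the function name `internal_regret` and the dictionary layout are hypothetical choices for this illustration, not taken from the cited article.

```python
def internal_regret(payoffs, actions):
    """Return a dict keyed by (i, j), i != j.

    Entry (i, j) is the total payoff gained by playing i instead of j on
    exactly those rounds where j was played, so "expert" (i, j) is active
    only when the player's choice is j.
    """
    n = len(payoffs[0])
    regret = {(i, j): 0.0 for i in range(n) for j in range(n) if i != j}
    for row, played in zip(payoffs, actions):
        for i in range(n):
            if i != played:
                regret[(i, played)] += row[i] - row[played]
    return regret
```

With m actions the dictionary has m(m - 1) entries, one per ordered pair, which matches the expert count in the excerpt. A player has no internal regret when the largest entry grows sublinearly in the number of rounds.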


On Optimal Sequential Decisions Schemes for General Processes - Nobel (2000)   (Correct)

....in the family for every random process. Aggregating methods, and corresponding bounds on the difference between the loss of the aggregate scheme and that of the best scheme in the family, have been established in a variety of settings. Representative work and further references can be found in [28, 12, 22, 6, 5, 23, 20, 14]. Here we describe a simple aggregate decision scheme that is based on weighted majority methods [28, 22] for predicting individual binary sequences. Let F be a fixed, countable family of decision schemes and let $x = x_1, x_2, \ldots$ be a sequence with values $x_i \in X$. Fix 2 (0; 1) and let fF j ....

D.P. Foster and R. Vohra. Regret in the on-line decision problem, Games and Economic Behavior, vol. 29, pp. 1084-1090, 1999.


Probabilistic Pricebots - Greenwald, Kephart (2000)   (3 citations)  (Correct)

.... pricebots [7] was introduced, and a variety of (mostly deterministic) pricing algorithms were simulated [8]. Motivated in part by a game-theoretic analysis of this model which yields only mixed-strategy Nash equilibria, this paper explores the use of probabilistic pricing based on no regret learning [4, 5], in various informational settings. Among the deterministic algorithms studied previously, one requires complete information about ....

....of their competitors' strategies. In order to be deemed profitable, pricebots will need to learn from and adapt to changing market conditions. In this paper, we study adaptive pricebot algorithms based on variants of no regret learning, specifically no external [5] and no internal regret [4], emphasizing the differing levels of information on which the algorithms depend. An agent algorithm that requires as input the relevant profits at all its possible price points (including the expected profits that would have been obtained by prices that are not set) are referred to as informed ....

[Article contains additional citation context not shown here]

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


Efficient Algorithms for Universal Portfolios - Kalai, Vempala (2000)   (Correct)

....Abstract A constant rebalanced portfolio is an investment strategy which keeps the same distribution of wealth among a set of stocks from day to day. There has been much work on Cover's Universal algorithm, which is competitive with the best constant rebalanced portfolio determined in hindsight [3, 9, 2, 8, 14, 4, 5, 6]. While this algorithm has good performance guarantees, all known implementations are exponential in the number of stocks, restricting the number of stocks used in experiments [9, 4, 2, 5, 6]. We present an efficient implementation of the Universal algorithm that is based on non-uniform random ....

....the same distribution of wealth among a set of stocks from day to day. That is, the proportion of total wealth in a given stock is the same at the beginning of each day. Recently there has been work on on-line investment strategies which are competitive with the best CRP determined in hindsight [3, 9, 2, 8, 14, 4, 5, 6]. Specifically, the daily performance of these algorithms on a market approaches that of the best CRP for that market, chosen in hindsight, as the lengths of these markets increase without bound. As an example of a useful CRP, consider the following market with just two stocks [9, 5]. The price ....

D.P. Foster and R.V. Vohra. Regret in the On-line Decision Problem. Games and Economic Behavior, Vol. 29, No. 1/2, pp. 7-35, Nov 1999.


Minimizing Regret: The General Case. - Rustichini (1998)   (4 citations)  (Correct)

.... see [3]. Other contributions are: [2], [12], [4]. In particular the basic intuition of regret has been extended to game theory (see Fudenberg and Levine, [7]). For a detailed and informative discussion of these results, and the related literature on calibrated forecasting, see Foster and Vohra [6]. In a recent paper Auer, Cesa Bianchi, Freund and Schapire (see [1]) have extended Hannan's result to the case in which the player is informed only of his own payoff: so the choice of action of the opponent can only be inferred from the realized payoff. This result can be considered a surprising ....

Foster, D. and R. Vohra, (1996), Regret in the On-line Decision Problem, Discussion Paper.


Online Searching - Jaillet, Stafford (1999)   (Correct)

....of an unknown distribution. In [6] the authors gave an abstract formulation (called task systems) and a formal definition for the study of the competitive analysis of online algorithms and problems. In [16] another abstract formulation, called k-server problems, was introduced. More recently, in [7], the authors present the interesting notion of regret in the online decision problem setting. Over the past ten years, online algorithms have received considerable research interest in computer science, and to a lesser extent in operations research. There are many interesting application areas ....

D.P. Foster and R. Vohra. Regret in the on-line decision problem. Working paper, Department of Management Science, Ohio State University, 1997.


Shopbots and Pricebots - Greenwald, Kephart (1999)   (34 citations)  (Correct)

....probability, and exploit successful actions by increasing the probability of employing those actions that generate high profits. In this study, we confine our attention to the no external regret algorithm due to Freund and Schapire [14] and the no internal regret algorithm of Foster and Vohra [12]. 9 As the no regret algorithms are inherently non deterministic, they are candidates for learning mixed strategy equilibria. 8 In the game theoretic literature, this strategy is often referred to as Cournot best-reply dynamics [7]; however, price is being set, rather than quantity. 9 For ....

.... (NIR) As described in [16], there are a number of learning algorithms that satisfy the no external regret optimality criterion (e.g., Foster and Vohra [11] and Freund and Schapire [14]); similarly, the no internal regret optimality criterion is satisfied by algorithms due to both Foster and Vohra [12] and Hart and Mas Colell [22]. In this section, we discuss simulations of NER pricebots a la Freund and Schapire and NIR pricebots a la Foster and Vohra. Rather than consider 5 pricebots as above, we limit our attention to merely 2 NR pricebots, since the dynamics of 2 such pricebots converges ....

[Article contains additional citation context not shown here]

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


Learning in Network Contexts: Experimental Results from.. - Greenwald, Friedman, al. (1999)   (1 citation)  (Correct)

....insignificant. By remapped we mean that there is a mapping f of the strategy space into itself such that for every occurrence of a given strategy s in the original sequence the mapped strategy f(s) appears in the remapped sequence of strategies. The learning procedures described in Foster and Vohra [13] and Hart and Mas Colell [27] satisfy the property of no internal regret. Early no external regret algorithms were discovered by Blackwell [4], Hannan [26], Banos [2], and Megiddo [35]; recently, no external regret algorithms appeared in Cover [8], Freund and Schapire [14], and Auer, Cesa Bianchi, ....

....12 strategies in the new game; finally, take the union of these actions for all orderings. Since the ordering is non strict, asynchronicity defined in this way incorporates synchronicity, from which it follows that D 1 S 1 O 1 . Another result of interest, due to Foster and Vohra [13], is that a set of no internal regret learners converges to a correlated equilibrium. Note that the support of a set of correlated equilibria is a subset of D 1 ; in other words, correlated equilibria do not assign positive probabilities to strategies outside D 1 , but neither do they ....

[Article contains additional citation context not shown here]

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


Shopbots and Pricebots - Greenwald, Kephart (1999)   (34 citations)  (Correct)

....probability, and exploit successful actions by increasing the probability of employing those actions that generate high profits. In this study, we confine our attention to the no external regret algorithm due to Freund and Schapire [14] and the no internal regret algorithm of Foster and Vohra [12]. 9 As the no regret algorithms are inherently non deterministic, they are candidates for learning mixed strategy equilibria. 8 In the game theoretic literature, this strategy is often referred to as Cournot best-reply dynamics [7]; however, price is being set, rather than quantity. 9 For ....

.... (NIR) As described in [16], there are a number of learning algorithms that satisfy the no external regret optimality criterion (e.g., Foster and Vohra [11] and Freund and Schapire [14]); similarly, the no internal regret optimality criterion is satisfied by algorithms due to both Foster and Vohra [12] and Hart and Mas Colell [22]. In this section, we discuss simulations of NER pricebots a la Freund and Schapire and NIR pricebots a la Foster and Vohra. Rather than consider 5 pricebots as above, we limit our attention to merely 2 NR pricebots, since the dynamics of 2 such pricebots converges ....

[Article contains additional citation context not shown here]

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40--55, 1997.


Strategic Pricebot Dynamics - Greenwald, Kephart, Tesauro (1999)   (16 citations)  (Correct)

....pricing as well. GT is a constant function since it makes no use of historical observations. Nonetheless, it is of interest in our simulation studies in part because there exist learning algorithms that converge to stage game theoretic equilibria over repeated play (see Foster and Vohra [6] and Greenwald [8]). MY: The myopically optimal, or myoptimal, 4 pricing strategy (see, for example, [11]) uses information about all the buyer characteristics that factor into the buyer demand function, as well as competitors' prices, but makes no attempt to account for competitors' pricing ....

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40--55, 1997.


Automated Learning in Network Games - Mishra, Parikh, Greenwald (1998)   (Correct)

....into two camps. The high rationality approach involves learning algorithms which aim to predict the strategies of their opponents, and myopically optimize with respect to those predictions. The prediction methods can be Bayesian (as in Kalai and Lehrer [33]), calibrated (as in Foster and Vohra [18]), or consistent (as in Fudenberg and Levine [22, 21]). Typically the asymptotic play of such algorithms is either a correlated or a Nash equilibrium. Since these algorithms depend on knowledge of the underlying structure of the game, they are not applicable in the network contexts which we are ....

D. Foster and R. Vohra. Regret in the on-line decision problem. Preprint, 1997.


On-Line Algorithms for Combining Language Models - Kalai, Chen, Blum, Rosenfeld (1998)   (3 citations)  (Correct)

.... algorithm associated with model $p(w \mid h)$. The perplexity of a text T, $PP_p(T)$, which we use to report our results, is defined as follows: $PP_p(T) = 2^{H_p(T)} = \left(\frac{1}{p(T)}\right)^{1/t}$. This first goal is addressed by what we call the SELECTOR algorithm, which has been analyzed in several fields [6]. We view it as a special case of the problem of predicting from expert advice, described further in [1]. 2.1. Selector The SELECTOR is a supermodel which does almost as well as the single best of its constituent language models, regardless of the text. One way to describe a language model is to ....

.... 2 Deltag Delta : This combination of infinitely many language models is a description of a probability distribution rather than a description of an algorithm for computing $p_{mix}(w \mid h)$. An $O(t^{m-1})$ implementation is described in [4], but there are faster approximations such as tiling [6] or sampling [2]. Cover's algorithm, translated to the language modeling domain, comes with the following guarantee for all j: $H_{p_{mix}}(T) \le H_{p_j}(T) + \frac{(m-1) \log_2 t}{t}$. The above cross entropy overhead of $\frac{(m-1) \log_2 t}{t}$ is small for large test sets. For example, over 10,000 words ....

D. Foster and R. Vohra. Regret in the on-line decision problem. In Something for Nothing Workshop, May 1995.


On-Line Algorithms for Combining Language Models - Kalai, Chen, Blum, Rosenfeld (1998)   (3 citations)  (Correct)

.... compression algorithm associated with model $p(w \mid h)$. The perplexity of a text T, $PP_p(T)$, which we use to report our results, is defined as follows: $PP_p(T) = 2^{H_p(T)} = \left(\frac{1}{p(T)}\right)^{1/t}$. This first goal is addressed by what we call the SELECTOR algorithm, which has been analyzed in several fields [6]. We view it as a special case of the problem of predicting from expert advice, described further in [1]. 2.1. Selector The SELECTOR is a supermodel which does almost as well as the single best of its constituent language models, regardless of the text. One way to describe a language model is to ....

.... 2 Deltag Delta : This combination of infinitely many language models is a description of a probability distribution rather than a description of an algorithm for computing $p_{mix}(w \mid h)$. An $O(t^{m-1})$ implementation is described in [4], but there are faster approximations such as tiling [6] or sampling [2]. Cover's algorithm, translated to the language modeling domain, comes with the following guarantee for all j: $H_{p_{mix}}(T) \le H_{p_j}(T) + \frac{(m-1) \log_2 t}{t}$. Guarantees of the performance of Cover's algorithm and the two approximations can be found in [4, 5], [2], and [6] ....

[Article contains additional citation context not shown here]

D. Foster and R. Vohra. Regret in the on-line decision problem. In Something for Nothing Workshop, May 1995.


Computing Equilibria in Multi-Player Games - Christos Papadimitriou Tim   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7--35, 1999.


Correlated-Q Learning - Amy Greenwald Amy   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


Probabilistic Pricebots - Amy Greenwald Department (2000)   (3 citations)  (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


From External to Internal Regret - Avrim Blum Avrim   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7--36, 1999.


On-Line Algorithms For Combining Language Models - Adam Kalai Stanley (1998)   (3 citations)  (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. In Something for Nothing Workshop, May 1995.


Bounds for Regret-Matching Algorithms - Amy Greenwald Amy   (Correct)

No context found.

Dean Foster and Rakesh Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29: 7--35, 1999.


Bounds for Regret-Matching Algorithms - Amy Greenwald Amy   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem, 1995.


From External to Internal Regret - Blum, Mansour (2004)   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40--55, 1997.


Correlated-Q Learning - Amy   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


Learning in the Santa Fe Bar Problem - Amy Greenwald Amy   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40--55, 1997.


Online Learning of Non-stationary Sequences - Monteleoni, Jaakkola (2003)   (1 citation)  (Correct)

No context found.

D. P. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7--35, 1999.


Computing Equilibria in Multi-Player Games - Papadimitriou, Roughgarden (2004)   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7--35, 1999.


Potential-based Algorithms in On-line Prediction and Game.. - Cesa-Bianchi, Lugosi (2001)   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7-36, 1999.


A General Class of No-Regret Learning Algorithms and.. - Greenwald, Jafari (2003)   (1 citation)  (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


QnR-Learning in Markov Games - David Gondek Amy   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


On No-Regret Learning, Fictitious Play, and Nash.. - Greenwald, Jafari.. (2001)   (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40-55, 1997.


Online Convex Programming and Generalized Infinitesimal Gradient .. - Zinkevich (2003)   (6 citations)  (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29(1):7-35, 1999.


Efficient Algorithms for Universal Portfolios - Kalai, Vempala (2002)   (Correct)

No context found.

D.P. Foster and R.V. Vohra. Regret in the On-line Decision Problem. Games and Economic Behavior, 29(1/2):7-35, 1999.


Online Learning of Non-stationary Sequences - Claire Monteleoni And (2003)   (1 citation)  (Correct)

No context found.

D. P. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7--35, 1999.


Learning and Implementation on the Internet - Friedman, Shenker (1998)   (18 citations)  (Correct)

No context found.

D. Foster and R. Vohra. Regret in the on-line decision problem. Mimeo, 1997.


Foresight-Based Pricing Algorithms in an Economy of Software.. - Tesauro, Kephart (1998)   (6 citations)  (Correct)

No context found.

D. Foster and R. Vohra, "Regret in the on-line decision problem." Games and Economic Behavior, to appear, 1998.


CiteSeer.IST - Copyright Penn State and NEC