Abstract:
Convex programming involves a convex set F ⊆ R^n and a convex function c: F → R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain and apply it to repeated games. We show that the algorithm is a generalization of infinitesimal gradient ascent, and that the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
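The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the online protocol it describes: a point in F is committed to first, the cost function for that step is revealed afterwards, and the next point is obtained by a projected gradient step. Everything specific here is an assumption made for the example: F is taken to be a Euclidean ball, the costs are linear functions c_t(x) = <g_t, x>, the step size is 1/sqrt(t), and the function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project_to_ball(x, radius=1.0):
    """Euclidean projection onto F = {x : ||x|| <= radius} (assumed feasible set)."""
    norm = math.sqrt(dot(x, x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def online_gradient_descent(cost_vectors, dim, radius=1.0):
    """Return the points played against linear costs c_t(x) = <g_t, x>.

    Each point is recorded before the matching cost vector is used,
    mirroring the requirement to choose before seeing the cost.
    """
    x = [0.0] * dim
    plays = []
    for t, g in enumerate(cost_vectors, start=1):
        plays.append(list(x))              # commit to x_t without looking at g_t
        eta = 1.0 / math.sqrt(t)           # assumed step-size schedule
        stepped = [xi - eta * gi for xi, gi in zip(x, g)]
        x = project_to_ball(stepped, radius)
    return plays


def regret(plays, cost_vectors, radius=1.0):
    """Total cost incurred minus the cost of the best fixed point in the ball."""
    incurred = sum(dot(g, x) for g, x in zip(cost_vectors, plays))
    total = [sum(col) for col in zip(*cost_vectors)]
    # For linear costs the best fixed point is -radius * total / ||total||,
    # whose cumulative cost is -radius * ||total||.
    best_fixed = -radius * math.sqrt(dot(total, total))
    return incurred - best_fixed
```

A typical use would be to generate a sequence of cost vectors, call `online_gradient_descent` to obtain the plays, and then inspect `regret(plays, cost_vectors) / len(cost_vectors)`, which is the average regret per step against the best single point chosen in hindsight.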