Playing games with approximation algorithms
In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 2007
Cited by 16 (2 self)
Abstract. In an online linear optimization problem, on each period t, an online algorithm chooses s_t ∈ S from a fixed (possibly infinite) set S of feasible decisions. Nature (who may be adversarial) chooses a weight vector w_t ∈ ℝ^n, and the algorithm incurs cost c(s_t, w_t), where c is a fixed cost function that is linear in the weight vector. In the full-information setting, the vector w_t is then revealed to the algorithm; in the bandit setting, only the cost experienced, c(s_t, w_t), is revealed. The goal of the online algorithm is to perform nearly as well as the best fixed s ∈ S in hindsight. Many repeated decision-making problems with weights fit naturally into this framework, such as online shortest path, online TSP, online clustering, and online weighted set cover. Previously, it was shown how to convert any efficient exact offline optimization algorithm for such a problem into an efficient online algorithm in both the full-information and the bandit settings, with average cost nearly as good as that of the best fixed s ∈ S in hindsight. However, when the offline algorithm is an approximation algorithm with ratio α > 1, the previous approach worked only for special types of approximation algorithms. We show how to convert any offline approximation algorithm for a linear optimization problem into a corresponding online approximation algorithm, with a polynomial blowup in runtime. If the offline algorithm has an α-approximation guarantee, then the expected cost of the online algorithm on any sequence is not much larger than α times that of the best s ∈ S, where the best is chosen with the benefit of hindsight. Our main innovation is combining Zinkevich’s algorithm for convex optimization with a geometric transformation that can be applied to any approximation algorithm. Standard techniques generalize the above result to the bandit setting, except that a “Barycentric Spanner” for the problem is also (provably) necessary as input. Our algorithm can also be viewed as a method for playing large repeated games, where one can compute only approximate best responses rather than exact best responses.
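The abstract builds on Zinkevich's algorithm for online convex optimization. As a rough illustration of that building block only, the following sketch runs projected online gradient descent on linear costs c(s, w) = ⟨s, w⟩ in the full-information setting. The feasible set [0, 1]^n, the 1/√t step size, and all function names are assumptions made for the example; the paper's geometric transformation for approximation algorithms is not shown.

```python
import math
import random
from typing import List, Sequence


def project_to_box(x: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Euclidean projection onto the box [lo, hi]^n is coordinate-wise clipping."""
    return [min(hi, max(lo, xi)) for xi in x]


def online_gradient_descent(weights: Sequence[Sequence[float]], n: int) -> List[List[float]]:
    """Projected online gradient descent for linear costs c(s, w) = <s, w> on [0, 1]^n.

    The gradient of <s, w_t> with respect to s is w_t itself, so each step is
    s <- Proj(s - eta_t * w_t). Returns the decisions s_1, ..., s_T.
    """
    s = [0.5] * n                       # start at the center of the box
    decisions: List[List[float]] = []
    for t, w in enumerate(weights, start=1):
        decisions.append(list(s))       # commit to s_t before w_t is revealed
        eta = 1.0 / math.sqrt(t)        # step size eta_t = 1 / sqrt(t)
        # Full-information feedback: w_t is now known, so step against it and project.
        s = project_to_box([si - eta * wi for si, wi in zip(s, w)])
    return decisions


def total_cost(decisions: Sequence[Sequence[float]], weights: Sequence[Sequence[float]]) -> float:
    """Sum over rounds of the linear cost <s_t, w_t>."""
    return sum(sum(si * wi for si, wi in zip(s, w)) for s, w in zip(decisions, weights))


if __name__ == "__main__":
    random.seed(0)
    T, n = 2000, 3
    ws = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(T)]
    ds = online_gradient_descent(ws, n)
    # For linear costs on the box, the best fixed decision sets s_i = 1 exactly where sum_t w_t[i] < 0.
    best = sum(min(0.0, sum(w[i] for w in ws)) for i in range(n))
    print("average regret:", (total_cost(ds, ws) - best) / T)
```

Because the cost is linear, the gradient with respect to s is just w_t, and projecting onto a box reduces to clipping. For this set, the standard analysis bounds the regret by (3/2)·n·√T when every coordinate of w_t lies in [−1, 1].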
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
2010
Cited by 12 (0 self)
Stochastic subgradient methods are widely used, well analyzed, and constitute effective tools for optimization and online learning. Stochastic gradient methods’ popularity and appeal are largely due to their simplicity, as they mostly follow predetermined procedural schemes. However, most common subgradient approaches are oblivious to the characteristics of the data being observed. We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. The adaptation, in essence, allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning, which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that can be chosen in hindsight. In a companion paper, we experimentally validate our theoretical analysis and show that the adaptive subgradient approach outperforms state-of-the-art, but non-adaptive, subgradient algorithms.
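One common way to make the adaptive idea concrete is the diagonal variant, in which each coordinate's step is divided by the square root of that coordinate's accumulated squared gradients. The sketch below assumes an unconstrained problem with no regularizer, a fixed base rate eta, a small eps for numerical safety, and a hypothetical grad(x, t) callback; it does not cover full-matrix or regularized variants.

```python
import math
from typing import Callable, List, Sequence


def adagrad_diagonal(
    grad: Callable[[List[float], int], Sequence[float]],
    x0: Sequence[float],
    steps: int,
    eta: float = 0.5,
    eps: float = 1e-8,
) -> List[float]:
    """Diagonal adaptive subgradient steps (unconstrained, no regularizer).

    Coordinate i moves by eta * g_i / sqrt(sum of past g_i^2), so coordinates whose
    gradients are rarely nonzero keep comparatively large steps.
    `grad(x, t)` returns a subgradient at x on round t (hypothetical callback).
    """
    x = list(x0)
    g2 = [0.0] * len(x)                 # per-coordinate sum of squared gradients
    for t in range(steps):
        g = grad(x, t)
        for i, gi in enumerate(g):
            g2[i] += gi * gi
            x[i] -= eta * gi / (math.sqrt(g2[i]) + eps)
    return x
```

A coordinate whose gradient is zero on most rounds accumulates little in g2, so its occasional updates stay large, which matches the abstract's point about rarely seen but predictive features.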
High-Probability Regret Bounds for Bandit Online Linear Optimization
Cited by 11 (0 self)
We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n^(3/2)) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings.
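The optimistic-estimate idea attributed to Auer et al. [2] is easiest to see in the simpler K-armed case: every importance-weighted gain estimate receives an extra bonus β/p_i, so the estimates err on the high side with high probability, which is the kind of control a high-probability analysis relies on. The following is a simplified sketch of that idea only; the parameters eta, gamma, and beta are free and not the tuned values from any paper, and it does not implement the linear-bandit algorithm of Dani et al. [8] or its modification.

```python
import math
import random
from typing import Callable, List


def exp_weights_optimistic(
    gain: Callable[[int, int], float],
    K: int,
    T: int,
    eta: float,
    gamma: float,
    beta: float,
    rng: random.Random,
) -> List[int]:
    """K-armed bandit: exponential weights with optimistic gain estimates.

    gain(t, i) in [0, 1] is the gain of arm i at round t; only the pulled arm's gain
    is ever queried. Returns the sequence of pulled arms.
    """
    log_w = [0.0] * K
    pulls: List[int] = []
    for t in range(T):
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]      # shift by the max to avoid overflow
        total = sum(w)
        p = [(1 - gamma) * wi / total + gamma / K for wi in w]  # mix in uniform exploration
        i = rng.choices(range(K), weights=p)[0]
        pulls.append(i)
        g = gain(t, i)                                # bandit feedback: one number
        for j in range(K):
            # Importance-weighted estimate (unbiased) plus a bonus beta / p_j that makes
            # the estimate optimistic, most strongly for rarely played arms.
            est = (g / p[j] if j == i else 0.0) + beta / p[j]
            log_w[j] += eta * est
    return pulls
```

Without the beta term the estimates are unbiased but can fall far below the true cumulative gains on unlucky runs; the bonus trades a small, controlled bias for that protection.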
Combinatorial Bandits
Cited by 11 (4 self)
We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0, 1}^d and suffers a loss that is the sum of the losses of those vector components that equal one. The goal of the forecaster is to ensure that, in the long run, the accumulated loss is not much larger than that of the best possible vector in the class. We consider the “bandit” setting, in which the forecaster only has access to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order √(nd ln |S|), where n is the time horizon. This is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations exist for various interesting choices of S.
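The feedback model can be illustrated with an exponentially weighted forecaster over an explicitly enumerated S. Each round it observes only v_t · ℓ_t and rebuilds an estimate of the full loss vector as P^+ v_t (v_t · ℓ_t), where P = E[v v^T] under the current sampling distribution; this estimate is unbiased whenever P is invertible. The uniform exploration over S, the parameters, and the names are assumptions of this sketch, which is not claimed to be the forecaster of the paper, and enumerating S is exactly what an efficient implementation would avoid.

```python
import itertools
import numpy as np


def k_sets(d: int, k: int) -> np.ndarray:
    """All vectors in {0, 1}^d with exactly k ones: one example choice of S."""
    rows = []
    for idx in itertools.combinations(range(d), k):
        v = np.zeros(d)
        v[list(idx)] = 1.0
        rows.append(v)
    return np.array(rows)


def loss_estimate(S: np.ndarray, p: np.ndarray, k: int, observed: float) -> np.ndarray:
    """Estimate the whole loss vector from the single number observed = S[k] . loss.

    With P = E_p[v v^T], the estimate P^+ S[k] * observed is unbiased when P is invertible.
    """
    P = (S.T * p) @ S                   # sum_j p_j * S[j] S[j]^T, a d x d matrix
    return np.linalg.pinv(P) @ S[k] * observed


def play(losses: np.ndarray, S: np.ndarray, eta: float, gamma: float,
         rng: np.random.Generator) -> float:
    """Run for len(losses) rounds; returns the forecaster's cumulative loss."""
    m = len(S)
    log_q = np.zeros(m)                 # log-weights over the vectors in S
    total = 0.0
    for loss in losses:
        q = np.exp(log_q - log_q.max())
        p = (1 - gamma) * q / q.sum() + gamma / m   # uniform exploration over S (assumed)
        k = rng.choice(m, p=p)
        observed = float(S[k] @ loss)   # bandit feedback: only the chosen vector's loss
        total += observed
        log_q -= eta * (S @ loss_estimate(S, p, k, observed))
    return total
```

Here the rows of `losses` are the rounds (n of them) and its columns are the d components, matching the abstract's notation.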
Hedging structured concepts
In COLT, 2010
Cited by 10 (2 self)
We develop an online algorithm called Component Hedge for learning structured concept classes when the loss of a structured concept sums over its components. Example classes include paths through a graph (composed of edges) and partial permutations (composed of assignments). The algorithm maintains a parameter vector with one non-negative weight per component, which always lies in the convex hull of the structured concept class. The algorithm predicts by decomposing the current parameter vector into a convex combination of concepts and choosing one of those concepts at random. The parameters are updated by first performing a multiplicative update and then projecting back into the convex hull. We show that Component Hedge has optimal regret bounds for a large variety of structured concept classes.
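For the simplest structured class, k-sets (choosing k of n components), the convex hull is {w ∈ [0, 1]^n : Σ_i w_i = k}, and projecting in (unnormalized) relative entropy gives w_i = min(1, c·v_i) for a single scalar c. The sketch below shows one multiplicative update followed by that projection. For the prediction step it swaps in systematic sampling, a different technique from the decomposition described in the abstract, which draws a k-set whose inclusion probabilities equal w; this preserves the expected loss because the loss sums over components. The class, the sampler, and the names are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence


def project_capped_simplex(v: Sequence[float], k: int) -> List[float]:
    """Relative-entropy projection of positive v onto {w : 0 <= w_i <= 1, sum(w) = k}.

    The minimizer has the form w_i = min(1, c * v_i); c is found by capping the
    largest entries one at a time. Assumes 1 <= k <= len(v) and every v_i > 0.
    """
    order = sorted(range(len(v)), key=lambda i: -v[i])
    rest = float(sum(v))                # mass of the entries not yet capped at 1
    for m, i in enumerate(order):
        c = (k - m) / rest
        if c * v[i] <= 1.0:             # the largest uncapped entry fits under the cap
            return [min(1.0, c * vi) for vi in v]
        rest -= v[i]
    raise ValueError("need 1 <= k <= len(v)")


def component_hedge_step(w: Sequence[float], losses: Sequence[float], eta: float, k: int) -> List[float]:
    """Multiplicative update on each component, then project back into the hull."""
    v = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
    return project_capped_simplex(v, k)


def sample_k_set(w: Sequence[float], rng: random.Random) -> List[int]:
    """Systematic sampling: a random k-set whose inclusion probabilities equal w.

    Item i owns the interval [W_(i-1), W_i) of the cumulative sums; the k points
    u, u + 1, ..., u + k - 1 each land in one interval, and an interval of length
    at most 1 never receives two of them.
    """
    k = round(sum(w))
    u = rng.random()
    chosen: List[int] = []
    cum = 0.0
    for i, wi in enumerate(w):
        cum += wi
        if len(chosen) < k and u + len(chosen) < cum:
            chosen.append(i)
    return chosen
```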
Beating the adaptive bandit with high probability
2009
Cited by 6 (0 self)
We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over convex decision sets. First, we prove a regret guarantee for the full-information problem in terms of “local” norms, both for entropy and self-concordant barrier regularization, unifying these methods. Given one such algorithm as a black box, we can convert a bandit problem into a full-information problem using a sampling scheme. The main result states that a high-probability Õ(√T) bound holds whenever the black box, the sampling scheme, and the estimates of missing information satisfy a number of conditions, which are relatively easy to check. At the heart of the method is a construction of linear upper bounds on confidence intervals. As applications of the main result, we provide the first known efficient algorithm for the sphere with an Õ(√T) high-probability bound. We also derive the result for the n-simplex, improving the O(√(nT log(nT))) bound of Auer et al. [3] by replacing the log T term with log log T and closing the gap to the lower bound of Ω(√(nT)). The guarantees we obtain hold for adaptive adversaries (unlike the in-expectation results of [1]), and the algorithms are efficient, given that the linear upper bounds on confidence can be computed.
Better Algorithms for Benign Bandits
Cited by 6 (3 self)
The online bandit problem is a repeated decision-making problem, where the goal is to select one of several possible decisions in every round and incur a cost associated with the decision, in such a way that the total cost incurred over all iterations is close to the cost of the best fixed decision in hindsight. The difference in these costs is known as the regret of the algorithm. The term bandit refers to the setting where one obtains only the cost of the decision used in a given iteration and no other information. Perhaps the most general form of this problem is the non-stochastic bandit linear optimization problem, where the set of decisions is a convex set in some Euclidean space and the cost functions are linear. Only recently was an efficient algorithm attaining Õ(√T) regret discovered in this setting. In this paper we propose a new algorithm for the bandit linear optimization problem that obtains a regret bound of Õ(√Q), where Q is the total variation in the cost functions. This regret bound shows that it is possible to incur much less regret in a slowly changing environment; in fact, it matches regret bounds that are attainable in the simpler stochastic setting, where the cost functions are drawn from a probability distribution. This regret bound, previously conjectured to hold in the full-information case, is surprisingly attainable even in the bandit setting. Our algorithm is efficient and applies several new ideas to bandit optimization, such as reservoir sampling.
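Two ingredients mentioned in the abstract can be illustrated in isolation. For Q, this sketch assumes the definition Σ_t ‖f_t − μ‖², where f_t is the cost vector at round t and μ is the average cost vector; the paper's exact definition may differ in detail. For reservoir sampling, it uses the textbook Algorithm R; how the paper uses the sample inside its bandit algorithm is not shown.

```python
import random
from typing import List, Sequence


def quadratic_variation(costs: Sequence[Sequence[float]]) -> float:
    """Q = sum_t ||f_t - mu||^2 with mu the average cost vector (assumed definition)."""
    T, n = len(costs), len(costs[0])
    mu = [sum(f[i] for f in costs) / T for i in range(n)]
    return sum(sum((f[i] - mu[i]) ** 2 for i in range(n)) for f in costs)


class Reservoir:
    """Reservoir sampling (Algorithm R): a uniform size-k sample of a stream of vectors."""

    def __init__(self, k: int, rng: random.Random) -> None:
        self.k = k
        self.rng = rng
        self.seen = 0
        self.items: List[Sequence[float]] = []

    def add(self, x: Sequence[float]) -> None:
        self.seen += 1
        if len(self.items) < self.k:
            self.items.append(x)                   # fill the reservoir first
        else:
            j = self.rng.randrange(self.seen)      # uniform in {0, ..., seen - 1}
            if j < self.k:
                self.items[j] = x                  # keep x with probability k / seen

    def mean(self) -> List[float]:
        """Average of the sampled vectors, a cheap estimate of the running mean cost."""
        n = len(self.items[0])
        return [sum(v[i] for v in self.items) / len(self.items) for i in range(n)]
```

A constant cost sequence has Q = 0, which is the extreme of the slowly changing environments the abstract describes.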
Contextual Bandits with Similarity Information
In the 24th Annual Conference on Learning Theory, 2011
Cited by 6 (1 self)
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature has considered information on similarity between arms. We consider similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem in which, before each round, an algorithm is given the context: a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a similarity distance between the context-arm pairs which bounds from above the difference between the respective expected payoffs. Prior work ...
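The similarity condition can be read as a Lipschitz bound, |μ(x, a) − μ(x′, a′)| ≤ D((x, a), (x′, a′)), where μ(x, a) is the expected payoff of arm a in context x. A simple way to exploit it is to discretize the context-arm space uniformly and run UCB1 separately within each context cell. This is a standard baseline given for illustration, not the algorithm from the paper (whose description is truncated here); the unit interval for contexts and arms, the L1 distance, and the cell count are all assumptions.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def uniform_contextual_ucb(
    mu: Callable[[float, float], float],
    contexts: List[float],
    cells: int,
    rng: random.Random,
) -> float:
    """Uniform-discretization baseline for contextual bandits with similarity information.

    Assumed setting: contexts x and arms a both lie in [0, 1], mu(x, a) in [0, 1] is the
    expected payoff, and mu is 1-Lipschitz w.r.t. D((x, a), (x', a')) = |x - x'| + |a - a'|.
    Each context cell runs its own UCB1 over the midpoints of the arm cells.
    Returns the total realized (Bernoulli) payoff.
    """
    pulls: Dict[Tuple[int, int], int] = {}
    sums: Dict[Tuple[int, int], float] = {}
    total = 0.0
    for x in contexts:
        cx = min(int(x * cells), cells - 1)              # which context cell x falls in
        t = 1 + sum(pulls.get((cx, a), 0) for a in range(cells))

        def index(a: int) -> float:
            n = pulls.get((cx, a), 0)
            if n == 0:
                return math.inf                          # try every arm cell once
            return sums[(cx, a)] / n + math.sqrt(2.0 * math.log(t) / n)

        a = max(range(cells), key=index)
        arm = (a + 0.5) / cells                          # play the cell's midpoint
        reward = 1.0 if rng.random() < mu(x, arm) else 0.0
        pulls[(cx, a)] = pulls.get((cx, a), 0) + 1
        sums[(cx, a)] = sums.get((cx, a), 0.0) + reward
        total += reward
    return total
```

The Lipschitz bound is what justifies the discretization: playing a cell's midpoint instead of the best arm in that cell costs at most about half a cell width per round, and a coarser grid trades that error for fewer arms to explore.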
Self-concordant analysis for logistic regression
Cited by 5 (2 self)
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2-norm and regularization by the ℓ1-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
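As a point of reference for the ℓ2-regularized case, the following fits the estimator by Newton's method with backtracking. For the logistic loss φ(m) = log(1 + e^(−m)), with σ the logistic function, one has φ''(m) = σ(−m)(1 − σ(−m)) and |φ'''(m)| ≤ φ''(m), the self-concordance-like inequality that makes Newton-type reasoning natural here. The solver, names, and parameters are assumptions; the code shows the estimator being studied, not the paper's non-asymptotic analysis.

```python
import numpy as np


def objective(w: np.ndarray, X: np.ndarray, y: np.ndarray, lam: float) -> float:
    """(1/n) sum_i log(1 + exp(-y_i <w, x_i>)) + (lam / 2) ||w||^2, computed stably."""
    return float(np.mean(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * lam * (w @ w))


def logistic_l2_newton(X: np.ndarray, y: np.ndarray, lam: float, iters: int = 25) -> np.ndarray:
    """L2-regularized logistic regression (labels in {-1, +1}) by Newton's method
    with Armijo backtracking line search."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ w)
        s = 0.5 * (1.0 - np.tanh(0.5 * margins))   # sigma(-m) = 1 / (1 + e^m), computed stably
        grad = -(X.T @ (y * s)) / n + lam * w
        H = (X.T * (s * (1.0 - s))) @ X / n + lam * np.eye(d)
        step = np.linalg.solve(H, grad)
        f0, dec = objective(w, X, y, lam), float(grad @ step)
        t = 1.0
        while t > 1e-10 and objective(w - t * step, X, y, lam) > f0 - 0.25 * t * dec:
            t *= 0.5                                # shrink until sufficient decrease
        w = w - t * step
    return w
```

The ℓ2 term keeps the Hessian positive definite, so the Newton system always has a unique solution even when the data are separable.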
Characterizing truthful multi-armed bandit mechanisms
In ACM-EC, 2009
Cited by 5 (0 self)
We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer’s goal is to design a (dominant-strategy) truthful mechanism that (approximately) maximizes the social welfare. If the advertisers bid their true private values, our problem is equivalent to the multi-armed bandit problem, and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by regret, the difference in social welfare between the algorithm and the benchmark that always selects the same “best” advertisement. We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that truthful mechanisms have certain strong structural properties (essentially, they must separate exploration from exploitation) and that they incur much higher regret than the optimal multi-armed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.
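To picture what separating exploration from exploitation can look like, here is a hypothetical allocation rule: a fixed round-robin exploration phase whose choices depend on neither bids nor clicks, followed by a frozen choice of the ad with the highest bid times empirical click-through rate. Payments are omitted, and this is not the paper's mechanism; whether a rule of this shape is truthful depends on the payment scheme, which the sketch does not address.

```python
import random
from typing import List, Sequence


def explore_then_exploit(bids: Sequence[float], ctrs: Sequence[float], T: int,
                         explore_per_ad: int, rng: random.Random) -> List[int]:
    """Allocation rule that separates exploration from exploitation (payments omitted).

    ctrs are the hidden true click probabilities, used only to simulate clicks.
    Exploration rounds show ads in a fixed round-robin order, so the choice in those
    rounds depends on neither bids nor clicks. Afterwards, the ad maximizing
    bid * empirical CTR is shown in every remaining round.
    """
    k = len(bids)
    if T < k * explore_per_ad:
        raise ValueError("horizon too short for the exploration phase")
    clicks = [0] * k
    shown: List[int] = []
    for t in range(k * explore_per_ad):
        i = t % k
        shown.append(i)
        if rng.random() < ctrs[i]:             # simulated click on ad i
            clicks[i] += 1
    est = [c / explore_per_ad for c in clicks]
    best = max(range(k), key=lambda i: bids[i] * est[i])
    shown.extend([best] * (T - len(shown)))    # exploitation: frozen choice
    return shown
```

A fixed exploration length makes the regret grow faster than for adaptive bandit algorithms, which is consistent in spirit with the abstract's finding that truthfulness costs regret.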

