Results 1-10 of 27
Multi-Armed Bandits in Metric Spaces
 STOC'08
, 2008
Abstract

Cited by 88 (12 self)
In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of n trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions which enable the design of efficient solutions. In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem. We present a complete solution for the multi-armed bandit problem in this setting. That is, for every metric space (L, X) we define an isometry invariant MaxMinCOV(X) which bounds from below the performance of Lipschitz MAB algorithms for X, and we present an algorithm which comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions.
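To make the Lipschitz MAB setting concrete, here is a minimal sketch assuming strategies in [0, 1] with the metric |x - y|, a hypothetical payoff function, and Bernoulli rewards. It uniformly discretizes the strategy set and runs UCB1 on the grid. This is a standard baseline swapped in for illustration, not the algorithm from the paper.

```python
import math
import random

def discretized_ucb1(payoff, n_arms, horizon, seed=0):
    """Uniform-discretization baseline for a Lipschitz bandit on [0, 1].

    payoff(x) is a hypothetical expected reward in [0, 1]; each play of x
    returns a Bernoulli(payoff(x)) reward. Returns grid points and play counts."""
    rng = random.Random(seed)
    grid = [(i + 0.5) / n_arms for i in range(n_arms)]  # cell midpoints
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                                    # play every arm once
        else:
            a = max(range(n_arms), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < payoff(grid[a]) else 0.0
        counts[a] += 1
        sums[a] += reward
    return grid, counts

# Hypothetical payoff, Lipschitz (constant 3) in |x - y|, peaked at x = 0.35.
def payoff(x):
    return max(0.0, 0.9 - 3 * abs(x - 0.35))

grid, counts = discretized_ucb1(payoff, n_arms=10, horizon=5000)
best = max(range(len(counts)), key=counts.__getitem__)
print(grid[best])
```

The grid size is the design knob. With midpoints, every strategy lies within half a cell width of a grid point, so the discretization loses at most the Lipschitz constant times that distance, while more cells cost more exploration.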
Approximation algorithms and online mechanisms for item pricing
, 2007
Abstract

Cited by 75 (9 self)
We present approximation and online algorithms for problems of pricing a collection of items for sale so as to maximize the seller’s revenue in an unlimited supply setting. Our first result is an $O(k)$-approximation algorithm for pricing items to single-minded bidders who each want at most $k$ items. This improves over work of Briest and Krysta (2006), who achieve an $O(k^2)$ bound. For the case $k = 2$, where we obtain a 4-approximation, this can be viewed as the following graph vertex pricing problem: given a (multi)graph $G$ with valuations $w_{ij}$ on the edges, find prices $p_i \ge 0$ for the vertices to maximize $\sum_{(i,j):\, w_{ij} \ge p_i + p_j} (p_i + p_j)$. We also improve the approximation of Guruswami et al. (2005) for the “highway problem”, in which all desired subsets are intervals on a line, from $O(\log m + \log n)$ to $O(\log n)$, where $m$ is the number of bidders and $n$ is the number of items. Our approximation algorithms can ...
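The graph vertex pricing objective can be evaluated directly. The sketch below computes the revenue of given vertex prices and adds a hypothetical partition-style heuristic: a random split into free vertices (price 0) and priced vertices, where each priced vertex gets the best single-item price for edges whose other endpoint is free. The heuristic, its names, and the example graph are assumptions for illustration and are not claimed to be the paper's 4-approximation.

```python
import random

def revenue(edges, prices):
    """Revenue of vertex prices in the graph vertex pricing problem.

    edges: list of (i, j, w_ij); the bidder on edge (i, j) buys both items
    iff w_ij >= p_i + p_j, and then pays p_i + p_j."""
    return sum(prices[i] + prices[j]
               for i, j, w in edges if w >= prices[i] + prices[j])

def partition_pricing(n, edges, seed=0):
    """Hypothetical heuristic: random split into free (price 0) and priced
    vertices; each priced vertex gets the single price that maximizes
    revenue from edges whose other endpoint is free."""
    rng = random.Random(seed)
    priced = [rng.random() < 0.5 for _ in range(n)]
    prices = [0.0] * n
    for v in range(n):
        if not priced[v]:
            continue
        vals = [w for i, j, w in edges
                if (i == v and not priced[j]) or (j == v and not priced[i])]
        # For a single item, some valuation is an optimal price: max p * #{w >= p}.
        best_p, best_rev = 0.0, 0.0
        for p in vals:
            rev = p * sum(1 for w in vals if w >= p)
            if rev > best_rev:
                best_p, best_rev = float(p), rev
        prices[v] = best_p
    return prices

edges = [(0, 1, 5.0), (1, 2, 3.0), (0, 2, 4.0)]
print(revenue(edges, [2.0, 3.0, 0.0]))  # all three bidders buy: 5 + 3 + 2 = 10.0
```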
Regret minimization and the price of total anarchy
, 2008
Abstract

Cited by 59 (10 self)
We propose weakening the assumption made when studying the price of anarchy: rather than assume that self-interested players will play according to a Nash equilibrium (which may even be computationally hard to find), we assume only that selfish players play so as to minimize their own regret. Regret minimization can be done via simple, efficient algorithms even in many settings where the number of action choices for each player is exponential in the natural parameters of the problem. We prove that despite our weakened assumptions, in several broad classes of games, this “price of total anarchy” matches the Nash price of anarchy, even though play may never converge to Nash equilibrium. In contrast to the price of anarchy and the recently introduced price of sinking [15], which require all players to behave in a prescribed manner, we show that the price of total anarchy is in many cases resilient to the presence of Byzantine players, about whom we make no assumptions. Finally, because the price of total anarchy is an upper bound on the price of anarchy even in mixed strategies, for some games our results yield as corollaries previously unknown bounds on the price of anarchy in mixed strategies.
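As one example of a simple, efficient regret minimizer, here is a minimal sketch of the multiplicative-weights (Hedge) learner with full-information feedback. It is a standard algorithm chosen for illustration, not necessarily the one used in the paper. With losses in [0, 1] and step size $\eta = \sqrt{8 \ln n / T}$, its regret against the best fixed action is at most $\sqrt{T \ln n / 2}$. The loss sequence below is hypothetical.

```python
import math

def hedge(losses, eta):
    """Multiplicative-weights (Hedge) learner with full-information feedback.

    losses[t][a] in [0, 1] is the loss of action a in round t. Returns the
    learner's expected cumulative loss and the best fixed action's loss."""
    n = len(losses[0])
    weights = [1.0] * n
    totals = [0.0] * n
    learner = 0.0
    for row in losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        learner += sum(p * l for p, l in zip(probs, row))
        for a, l in enumerate(row):
            totals[a] += l
            weights[a] *= math.exp(-eta * l)   # shrink weight of costly actions
    return learner, min(totals)

T, n = 1000, 3
losses = [[1.0 if t < 300 else 0.0, 0.4, 0.6] for t in range(T)]  # hypothetical
eta = math.sqrt(8 * math.log(n) / T)
learner, best = hedge(losses, eta)
regret = learner - best
print(round(regret, 2))
```

In a game, each player would run such a learner on its own costs. The price of total anarchy then bounds the social cost of the resulting play without assuming convergence to equilibrium.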
An online algorithm for maximizing submodular functions
, 2007
Abstract

Cited by 59 (12 self)
We present an algorithm for solving a broad class of online resource allocation problems, in which jobs arrive one at a time and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submodular function of a set of pairs (v, τ), where τ is the time invested in activity v. Under this assumption, our online algorithm performs near-optimally according to two natural metrics: (i) the fraction of jobs completed within time T, for some fixed deadline T > 0, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition.
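The sketch below is a minimal offline illustration of the objective, assuming a hypothetical job model. A job records how much time each activity needs to complete it. A job counts as completed if some pair (v, τ) in the schedule gives activity v at least that time. This is a coverage function and therefore monotone and submodular. The greedy rule (largest gain in fraction completed per unit time, within a time budget) is swapped in for illustration. The paper's algorithm is online and is not reproduced here.

```python
def fraction_completed(schedule, jobs):
    """Fraction of jobs completed: a job (dict activity -> time it needs) is
    completed if some (v, tau) in the schedule runs v for at least that time."""
    done = sum(1 for job in jobs
               if any(v in job and tau >= job[v] for v, tau in schedule))
    return done / len(jobs)

def greedy_schedule(jobs, activities, taus, budget):
    """Repeatedly append the (activity, time) pair with the largest marginal
    gain per unit time, while the total time stays within the budget."""
    schedule, used = [], 0
    while True:
        base = fraction_completed(schedule, jobs)
        best, best_rate = None, 0.0
        for v in activities:
            for tau in taus:
                if used + tau > budget:
                    continue
                gain = fraction_completed(schedule + [(v, tau)], jobs) - base
                if gain / tau > best_rate:
                    best, best_rate = (v, tau), gain / tau
        if best is None:          # nothing affordable improves the objective
            return schedule
        schedule.append(best)
        used += best[1]

jobs = [{"a": 1}, {"a": 1}, {"b": 2}, {"a": 5, "b": 3}]  # hypothetical jobs
plan = greedy_schedule(jobs, ["a", "b"], [1, 2, 3, 5], budget=4)
print(plan, fraction_completed(plan, jobs))   # [('a', 1), ('b', 3)] 1.0
```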
Contextual Bandits with Similarity Information
 24TH ANNUAL CONFERENCE ON LEARNING THEORY
, 2011
Abstract

Cited by 56 (8 self)
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem where before each round an algorithm is given the context: a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on web pages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a similarity distance between the context-arm pairs which bounds from above the difference between the respective expected payoffs. Prior work ...
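The following minimal sketch shows how a similarity distance can be used in the simplest way. Assume contexts in [0, 1], a finite arm set, and payoffs that are Lipschitz in the context. Contexts that fall in the same cell of a uniform partition then have nearly equal expected payoffs, so UCB1 is run separately within each cell. This uniform-partition baseline and the payoff function are assumptions for illustration, not the paper's algorithm.

```python
import math
import random

def contextual_ucb(T, n_buckets, n_arms, payoff, seed=0):
    """Uniform-partition baseline: one independent UCB1 instance per context
    cell. payoff(x, a) is a hypothetical expected reward in [0, 1].
    Returns counts[bucket][arm]."""
    rng = random.Random(seed)
    counts = [[0] * n_arms for _ in range(n_buckets)]
    sums = [[0.0] * n_arms for _ in range(n_buckets)]
    for _ in range(T):
        x = rng.random()                              # context shown first
        b = min(int(x * n_buckets), n_buckets - 1)    # its cell
        t_b = sum(counts[b]) + 1                      # rounds seen in this cell

        def index(a):
            if counts[b][a] == 0:
                return float("inf")                   # try each arm once per cell
            return (sums[b][a] / counts[b][a]
                    + math.sqrt(2 * math.log(t_b) / counts[b][a]))

        a = max(range(n_arms), key=index)
        reward = 1.0 if rng.random() < payoff(x, a) else 0.0
        counts[b][a] += 1
        sums[b][a] += reward
    return counts

# Hypothetical payoffs, 1-Lipschitz in the context: arm 0 pays x, arm 1 pays 1 - x.
counts = contextual_ucb(4000, 4, 2, lambda x, a: x if a == 0 else 1 - x, seed=1)
print(counts)
```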
Combinatorial Multi-Armed Bandit: General Framework, Results and Applications
Abstract

Cited by 27 (4 self)
We define a general framework for a large class of combinatorial multi-armed bandit (CMAB) problems, where simple arms with unknown distributions form super arms. In each round, a super arm is played and the outcomes of its related simple arms are observed, which helps the selection of super arms in future rounds. The reward of the super arm depends on the outcomes of played arms, and it only needs to satisfy two mild assumptions, which allow a large class of nonlinear reward instances. We assume the availability of an (α, β)-approximation oracle that takes the means of the distributions of arms and outputs a super arm that with probability β generates an α fraction of the optimal expected reward. The objective of a CMAB algorithm is to minimize (α, β)-approximation regret, which is the difference in total expected reward between the αβ fraction of expected reward when always playing the optimal super arm, and the expected reward of playing super arms according to the algorithm. We provide the CUCB algorithm, which achieves O(log n) regret, where n is the number of rounds played, and we further provide distribution-independent bounds for a large class of reward functions. Our regret analysis is tight in that it matches the bound for the classical MAB problem up to a constant factor, and it significantly improves the regret bound ...
Proceedings of the 30th ...
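A minimal CUCB-style sketch illustrates the framework under simplifying assumptions. The rewards are linear (the sum of the played arms' outcomes). The oracle is exact top-k selection, i.e. α = β = 1. Simple arms are Bernoulli with hypothetical means. Each simple arm's empirical mean is inflated by a confidence radius, assumed here to be $\sqrt{3 \ln t / (2 T_i)}$ and clipped to 1. The oracle is called on these adjusted means, and every played simple arm is updated.

```python
import math
import random

def cucb_top_k(means, k, horizon, seed=0):
    """CUCB-style sketch with an exact top-k oracle and linear reward.

    means: hypothetical Bernoulli means of the simple arms (unknown to the
    learner). Returns how often each simple arm was played."""
    rng = random.Random(seed)
    m = len(means)
    counts = [0] * m
    est = [0.0] * m
    for t in range(1, horizon + 1):
        def adjusted(i):
            if counts[i] == 0:
                return float("inf")              # force an initial sample
            radius = math.sqrt(3 * math.log(t) / (2 * counts[i]))
            return min(est[i] + radius, 1.0)     # clip to the mean range [0, 1]
        # Oracle: the super arm is the k simple arms with the largest adjusted means.
        super_arm = sorted(range(m), key=adjusted, reverse=True)[:k]
        for i in super_arm:                      # semi-bandit feedback
            outcome = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            est[i] += (outcome - est[i]) / counts[i]   # running mean
    return counts

counts = cucb_top_k([0.9, 0.8, 0.2, 0.1], k=2, horizon=2000)
print(counts)
```

Swapping `sorted(...)[:k]` for another oracle (e.g. a shortest-path or matching routine on the adjusted means) is the point of the framework. The learner only needs oracle access.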
Approximation Algorithms for Reliable Stochastic Combinatorial Optimization
, 2010
Abstract

Cited by 13 (4 self)
We consider optimization problems that can be formulated as minimizing the cost of a feasible solution $w^T x$ over an arbitrary combinatorial feasible set $F \subset \{0, 1\}^n$. For these problems we describe a broad class of corresponding stochastic problems where the cost vector $W$ has independent random components, unknown at the time of solution. A natural and important objective that incorporates risk in this stochastic setting is to look for a feasible solution whose stochastic cost has a small tail or a small convex combination of mean and standard deviation. Our models can be equivalently reformulated as nonconvex programs for which no efficient algorithms are known. In this paper, we make progress on these hard problems. Our results are several efficient general-purpose approximation schemes. They use as a black box (exact or approximate) the solution to the underlying deterministic problem and thus immediately apply to arbitrary combinatorial problems. For example, from an available $\delta$-approximation algorithm for the linear problem, we construct a $\delta(1 + \epsilon)$-approximation algorithm for the stochastic problem, which invokes the linear algorithm only a logarithmic number of times in the problem input (and polynomial in $1/\epsilon$), for any desired accuracy level $\epsilon > 0$. The algorithms are based on a geometric analysis of the curvature and approximability of the nonlinear level sets of the objective functions.
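The black-box idea can be illustrated with a minimal sketch. The objective is mean plus c standard deviations, with independent item costs. It is concave and nondecreasing in the pair (mean, variance), so good candidates are solutions that minimize a linear cost $\mu + \lambda \sigma^2$ for some $\lambda \ge 0$. The sketch calls a linear oracle on a finite grid of λ values and keeps the best candidate. The toy feasible set (all k-subsets), the oracle, and the λ grid are assumptions. A finite grid can miss the optimum, so this is a heuristic, not the paper's $\delta(1+\epsilon)$ scheme.

```python
import math

def k_cheapest(weights, k):
    """Linear oracle for a toy feasible set: all subsets of exactly k items."""
    return sorted(range(len(weights)), key=lambda i: weights[i])[:k]

def mean_std_objective(sol, mu, var, c):
    """Mean plus c standard deviations of the total cost (independent items)."""
    return sum(mu[i] for i in sol) + c * math.sqrt(sum(var[i] for i in sol))

def parametric_search(mu, var, k, c, lambdas):
    """Call the linear oracle on mu + lam * var for each lam and keep the
    solution with the best nonlinear objective."""
    best, best_val = None, float("inf")
    for lam in lambdas:
        sol = k_cheapest([m + lam * v for m, v in zip(mu, var)], k)
        val = mean_std_objective(sol, mu, var, c)
        if val < best_val:
            best, best_val = sol, val
    return best, best_val

lambdas = [0.0] + [2.0 ** e for e in range(-10, 11)]
mu, var = [1.0, 1.0, 2.0, 2.0], [9.0, 9.0, 0.0, 0.0]   # hypothetical costs
sol, val = parametric_search(mu, var, k=2, c=1.0, lambdas=lambdas)
print(sorted(sol), val)   # [2, 3] 4.0
```

Here the cheapest-on-average pair {0, 1} has objective 2 + √18 ≈ 6.24. The riskless pair {2, 3} costs 4.0, and it is found once λ exceeds 1/9.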
Multi-armed bandits on implicit metric spaces
Abstract

Cited by 12 (3 self)
slivkins at microsoft.com
The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the tradeoff between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of a small number of arms is by now well understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit: it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem.
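The following minimal sketch shows how a tree-based classification can guide arm selection without any numerical similarity values. It uses UCT-style descent: apply UCB1 at each internal node to choose a child, play the leaf reached, and credit the reward to every node on the path. UCT is a swapped-in technique for illustration, not the paper's algorithm. The tree and the leaf means are hypothetical.

```python
import math
import random

class Node:
    def __init__(self, children=None, mean=None):
        self.children = children or []   # empty list means this node is an arm
        self.mean = mean                 # hypothetical Bernoulli mean (leaves only)
        self.n, self.total = 0, 0.0      # plays and reward collected in the subtree

def ucb(node, t):
    if node.n == 0:
        return float("inf")              # visit every child at least once
    return node.total / node.n + math.sqrt(2 * math.log(t) / node.n)

def play_round(root, t, rng):
    """Descend by UCB from the root to a leaf, play it, update the path."""
    path, node = [root], root
    while node.children:
        node = max(node.children, key=lambda c: ucb(c, t))
        path.append(node)
    reward = 1.0 if rng.random() < node.mean else 0.0
    for v in path:                       # each ancestor aggregates its subtree
        v.n += 1
        v.total += reward
    return node

rng = random.Random(0)
good = Node([Node(mean=0.9), Node(mean=0.8)])
bad = Node([Node(mean=0.2), Node(mean=0.1)])
root = Node([good, bad])
for t in range(1, 3001):
    play_round(root, t, rng)
print(good.n, bad.n)
```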
NetworkWide Deployment of Intrusion Detection and Prevention Systems
, 2010
Abstract

Cited by 10 (3 self)
Traditional research efforts for scaling NIDS and NIPS systems using parallelization and hardware-assisted acceleration have largely focused on a single-vantage-point view. In this chapter, we explore a different design alternative that exploits spatial, network-wide opportunities for distributing NIDS and NIPS functions throughout a network. We present systematic models that capture the operational constraints and requirements in deploying network-wide NIDS and NIPS capabilities. These formulations enable network administrators to optimally leverage their infrastructure toward their security objectives. For the NIDS case, we design a linear programming formulation for partitioning NIDS functions across a network to ensure that no node is overloaded. We also describe and evaluate a prototype implementation using Bro. For NIPS, we show how to maximally reduce unwanted traffic using special hardware-assisted capabilities. In this case, the hardware constraints make the optimization problem NP-hard, and we design and implement practical approximation algorithms based on randomized rounding. These results have immediate practical implications as (1) enterprise networks become larger and their traffic volumes increase, and (2) ISPs increasingly deploy NIDS/NIPS capabilities as in-network defenses. By leveraging network-wide opportunities for distributing NIDS/NIPS responsibilities, our work effectively complements efforts to scale ...
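To illustrate the load-balancing goal, consider a simplified, hypothetical model. Each traffic class has a volume and a path of nodes, any node on the path may process it, and each node has a capacity. The LP would choose per-node fractions that minimize the maximum relative load. The sketch below substitutes a greedy approximation of that LP: each class is split into equal chunks, and each chunk goes to the on-path node with the lowest resulting relative load. This greedy rule is an assumption for illustration, not the paper's formulation.

```python
def partition_load(classes, capacity, chunks=100):
    """Greedy stand-in for the min-max-load LP.

    classes: list of (volume, path) pairs; capacity: dict node -> capacity.
    Returns per-class fractions {node: share} and the max relative load."""
    load = {v: 0.0 for v in capacity}
    fractions = [{v: 0.0 for v in path} for _, path in classes]
    for _ in range(chunks):                  # interleave classes chunk by chunk
        for k, (volume, path) in enumerate(classes):
            piece = volume / chunks
            v = min(path, key=lambda u: (load[u] + piece) / capacity[u])
            load[v] += piece
            fractions[k][v] += 1.0 / chunks
    return fractions, max(load[v] / capacity[v] for v in capacity)

classes = [(10.0, ["a", "b"]), (10.0, ["b", "c"])]   # hypothetical traffic
capacity = {"a": 10.0, "b": 10.0, "c": 10.0}
fractions, worst = partition_load(classes, capacity)
print(round(worst, 3))   # close to the LP optimum of 2/3 for this instance
```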
Online Learning for Global Cost Functions
, 2009
Abstract

Cited by 10 (3 self)
We consider an online learning setting where at each time step the decision maker has to choose how to distribute the future loss between k alternatives, and then observes the loss of each alternative. Motivated by load balancing and job scheduling, we consider a global cost function (over the losses incurred by each alternative), rather than a summation of the instantaneous losses as done traditionally in online learning. Such global cost functions include the makespan (the maximum over the alternatives) and the $L_d$ norm (over the alternatives). Based on approachability theory, we design an algorithm that guarantees vanishing regret for this setting, where the regret is measured with respect to the best static decision that selects the same distribution over alternatives at every time step. For the special case of makespan cost we devise a simple and efficient algorithm. In contrast, we show that for concave global cost functions, such as $L_d$ norms for $d < 1$, the worst-case average regret does not vanish.
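The regret notion for makespan can be worked through in a small example. Given cumulative losses $L_i > 0$, the best static distribution minimizes $\max_i \alpha_i L_i$. Equalizing $\alpha_i L_i$ gives $\alpha_i \propto 1/L_i$ and cost $1/\sum_i (1/L_i)$. The loss sequence and the "always first" learner below are hypothetical. The sketch only evaluates regret and is not the paper's approachability-based algorithm.

```python
def makespan(alphas, losses):
    """Makespan of a run: the largest cumulative load sum_t alpha_t[i] * l_t[i]
    over alternatives i."""
    k = len(losses[0])
    totals = [0.0] * k
    for a, l in zip(alphas, losses):
        for i in range(k):
            totals[i] += a[i] * l[i]
    return max(totals)

def best_static_makespan(losses):
    """Best fixed distribution for makespan. With cumulative losses L_i > 0,
    alpha_i proportional to 1 / L_i equalizes alpha_i * L_i, giving cost
    1 / sum_i (1 / L_i)."""
    k = len(losses[0])
    cum = [sum(l[i] for l in losses) for i in range(k)]
    z = sum(1.0 / c for c in cum)
    return [(1.0 / c) / z for c in cum], 1.0 / z

losses = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha_star, opt = best_static_makespan(losses)   # [0.5, 0.5], 1.0
always_first = [[1.0, 0.0]] * len(losses)
regret = makespan(always_first, losses) - opt     # 2.0 - 1.0 = 1.0
print(alpha_star, opt, regret)
```

Note that the global cost is taken once, over cumulative loads. This is why a per-step sum of losses would mis-measure it.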