Results 1–10 of 46
Learning permutations with exponential weights
In 20th Annual Conference on Learning Theory, 2007
Cited by 45 (7 self)
Abstract. We give an algorithm for learning a permutation online. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic matrix. This matrix is updated by multiplying the current matrix entries by exponential factors. These factors destroy the doubly stochastic property of the matrix and an iterative procedure is needed to renormalize the rows and columns. Even though the result of the normalization procedure does not have a closed form, we can still bound the additional loss of our algorithm over the loss of the best permutation chosen in hindsight.
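The update described above can be sketched as follows: multiply entries by exponential factors, then alternately renormalize rows and columns (Sinkhorn iteration) until the matrix is again approximately doubly stochastic. This is a minimal illustration, not the authors' implementation; the function name, learning rate, and fixed iteration count are assumptions.

```python
import numpy as np

def exp_update_and_renormalize(W, loss, eta=1.0, iters=200):
    """Multiplicative update followed by row/column renormalization.

    W    : current doubly stochastic weight matrix (n x n)
    loss : per-entry loss matrix (n x n)

    The exponential factors break double stochasticity; alternating
    row/column normalization restores it. As the abstract notes, the
    result of this normalization has no closed form, so we simply
    iterate a fixed number of times (an illustrative choice).
    """
    W = W * np.exp(-eta * loss)
    for _ in range(iters):
        W = W / W.sum(axis=1, keepdims=True)  # normalize rows
        W = W / W.sum(axis=0, keepdims=True)  # normalize columns
    return W
```

Entries hit by large losses shrink relative to the rest, while the renormalization keeps the matrix a valid "uncertainty" over permutations.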
An Optimization-Based Framework for Automated Market-Making
In EC'11, 2011
Cited by 25 (10 self)
We propose a general framework for the design of securities markets over combinatorial or infinite state or outcome spaces. The framework enables the design of computationally efficient markets tailored to an arbitrary, yet relatively small, space of securities with bounded payoff. We prove that any market satisfying a set of intuitive conditions must price securities via a convex cost function, which is constructed via conjugate duality. Rather than deal with an exponentially large or infinite outcome space directly, our framework only requires optimization over a convex hull. By reducing the problem of automated market making to convex optimization, where many efficient algorithms exist, we arrive at a range of new polynomial-time pricing mechanisms for various problems. We demonstrate the advantages of this framework with the design of some particular markets. We also show that by relaxing the convex hull we can gain computational tractability without compromising the market institution’s bounded budget.
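One classic instance of pricing via a convex cost function is Hanson's logarithmic market scoring rule (LMSR), which can serve as a concrete illustration of the mechanism described above (a sketch; the function names and the liquidity parameter b are illustrative choices, not the paper's API):

```python
import math

def lmsr_cost(q, b=10.0):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)).

    q is the vector of outstanding shares per outcome; b > 0 is the
    liquidity parameter. The max-shift keeps the exponentials stable.
    """
    m = max(qi / b for qi in q)
    return b * (m + math.log(sum(math.exp(qi / b - m) for qi in q)))

def lmsr_price(q, i, b=10.0):
    """Instantaneous price of security i: the partial derivative
    dC/dq_i, which is a softmax over q / b. Prices sum to 1."""
    m = max(qj / b for qj in q)
    exps = [math.exp(qj / b - m) for qj in q]
    return exps[i] / sum(exps)

def trade_cost(q, delta, b=10.0):
    """Amount charged for buying the share bundle delta: C(q+delta) - C(q)."""
    q_new = [qi + di for qi, di in zip(q, delta)]
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)
```

Convexity of C is what guarantees the market maker's bounded worst-case loss; the framework above generalizes this construction to other securities spaces via conjugate duality.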
From Bandits to Experts: On the Value of Side-Observations
Cited by 22 (0 self)
We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker gets side observations on the reward he would have obtained had he chosen some of the other actions. The observation structure is encoded as a graph, where node i is linked to node j if sampling i provides information on the reward of j. This setting naturally interpolates between the well-known “experts” setting, where the decision maker can view all rewards, and the multi-armed bandits setting, where the decision maker can only view the reward of the chosen action. We develop practical algorithms with provable regret guarantees, which depend on non-trivial graph-theoretic properties of the information feedback structure. We also provide partially matching lower bounds.
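The interpolation between experts and bandits can be illustrated with the standard importance-weighted loss estimator for graph feedback: divide each observed loss by the probability that it was observed at all. This is a sketch of the estimator idea in a plain Exp3-style update, under the assumption that each action observes itself; the names and tuning are illustrative, not the paper's exact algorithms.

```python
import math
import random

def exp3_graph_step(weights, losses, neighbors, eta=0.1, rng=random):
    """One round of an Exp3-style update with graph side-observations.

    neighbors[j] is the set of actions whose losses are revealed when
    action j is played (every action observes itself). Dividing an
    observed loss by the probability that it was observed keeps the
    estimate unbiased. With a complete graph this reduces to the
    experts setting; with self-loops only, to plain bandits.
    """
    total = sum(weights)
    p = [w / total for w in weights]
    played = rng.choices(range(len(p)), weights=p)[0]
    new_weights = []
    for i, w in enumerate(weights):
        # probability that action i's loss is observed this round
        obs_prob = sum(p[j] for j in range(len(p)) if i in neighbors[j])
        est = losses[i] / obs_prob if i in neighbors[played] else 0.0
        new_weights.append(w * math.exp(-eta * est))
    return played, new_weights
```

With the complete observation graph the estimator equals the true loss vector, so the update is deterministic regardless of which action was played, recovering Hedge.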
Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach
In International Conference on Machine Learning (ICML), 2014
Cited by 16 (0 self)
Markov chain Monte Carlo (MCMC) methods are often deemed far too computationally intensive to be of any practical use for large datasets. This paper describes a methodology that aims to scale up the Metropolis-Hastings (MH) algorithm in this context. We propose an approximate implementation of the accept/reject step of MH that only requires evaluating the likelihood of a random subset of the data, yet is guaranteed to coincide with the accept/reject step based on the full dataset with a probability exceeding a user-specified tolerance level. This adaptive subsampling technique is an alternative to the recent approach developed in (Korattikara et al., 2014), and it allows us to establish rigorously that the resulting approximate MH algorithm samples from a perturbed version of the target distribution of interest, whose total variation distance to the target is explicitly controlled. We explore the benefits and limitations of this scheme on several examples.
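The subsampled accept/reject test can be sketched as follows. The MH decision reduces to checking whether the average log-likelihood ratio over all n data points exceeds a threshold; a concentration bound lets that comparison be settled from a growing random subsample. Here a crude Hoeffding-style radius with range bound C stands in for the paper's empirical Bernstein bound, and the names, batch schedule, and C are assumptions.

```python
import math
import random

def approx_mh_accept(loglik_ratio, data, log_u, batch=50, delta=0.05, C=1.0):
    """Sketch of an adaptive-subsampling MH accept/reject test.

    loglik_ratio(x) = log p(x | theta') - log p(x | theta) for one datum;
    log_u folds together the log of the uniform draw and the
    prior/proposal terms. We must decide whether the mean of
    loglik_ratio over ALL data exceeds psi = log_u / n, and we stop as
    soon as a confidence interval around the subsample mean excludes psi.
    """
    n = len(data)
    psi = log_u / n
    idx = list(range(n))
    random.shuffle(idx)
    seen, total = 0, 0.0
    while seen < n:
        for i in idx[seen:seen + batch]:
            total += loglik_ratio(data[i])
        seen = min(seen + batch, n)
        mean = total / seen
        # Hoeffding-style radius (illustrative stand-in for empirical Bernstein)
        eps = C * math.sqrt(math.log(2.0 / delta) / (2.0 * seen))
        if mean - eps > psi:
            return True    # confident accept using only `seen` points
        if mean + eps < psi:
            return False   # confident reject using only `seen` points
    return mean > psi      # fell through to the full dataset
```

When the likelihood ratio is far from the threshold, the test terminates after a small fraction of the data, which is where the computational savings come from.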
Supplement: Non-Stochastic Bandit Slate Problems
Cited by 15 (0 self)
Recall our special variant of Hedge: we are allowed to use only distributions p(t) from some fixed convex subset P of the simplex of all distributions. The goal then is to minimize regret relative to an arbitrary distribution p ∈ P. Such a version of Hedge is given in Figure 1, and a statement of its performance below. This algorithm is implicit in the work of [4, 6].

Algorithm MW(P)
Initialization: an arbitrary probability distribution p(1) ∈ P on the experts, and some η > 0.
For t = 1, 2, ..., T:
1. Choose distribution p(t) over experts, and observe the cost vector ℓ(t).
2. Compute the probability vector p̂(t+1) using the following multiplicative update rule: for every expert i,
   p̂_i(t+1) = p_i(t) exp(−η ℓ_i(t)) / Z(t),   (1)
   where Z(t) = ∑_i p_i(t) exp(−η ℓ_i(t)) is the normalization factor.
3. Set p(t+1) to be the projection of p̂(t+1) onto the set P using the RE as a distance function, i.e. p(t+1) = arg min_{p ∈ P} RE(p ‖ p̂(t+1)).

Figure 1: The Multiplicative Weights Algorithm with Restricted Distributions

Theorem 1.1. Assume that η > 0 is chosen so that for all t and i, η ℓ_i(t) ≥ −1. Then algorithm MW(P) generates distributions p(1), ..., p(T) ∈ P such that for any p ∈ P,
   ∑_{t=1}^T ℓ(t) · p(t) − ∑_{t=1}^T ℓ(t) · p ≤ η ∑_{t=1}^T (ℓ(t))² · p(t) + RE(p ‖ p(1)) / η.
Here (ℓ(t))² is the vector that is the coordinate-wise square of ℓ(t).

Proof. We use the relative entropy between p and p(t), RE(p ‖ p(t)) := ∑_i p_i ln(p_i / p_i(t)), as a “potential” function. We have RE(p ‖ p̂(t+1)) − RE(p ‖ p(t)) = ∑_i p_i ln(p_i(t) / p̂_i(t+1)) ...
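The multiplicative update followed by an RE projection can be sketched in code. The projection step is illustrated for one concrete choice of P, the capped simplex {p : p_i ≤ cap}, whose RE projection has a known cap-and-rescale closed form; this particular P is an assumption for illustration, not taken from the supplement.

```python
import math

def project_capped(q, cap):
    """Relative-entropy (KL) projection of distribution q onto the
    capped simplex {p : sum_i p_i = 1, p_i <= cap}: cap the largest
    coordinates at `cap` and rescale the remaining mass."""
    n = len(q)
    assert cap * n >= 1.0, "capped simplex must be non-empty"
    order = sorted(range(n), key=lambda i: -q[i])
    for k in range(n):
        rest = order[k:]
        scale = (1.0 - k * cap) / sum(q[i] for i in rest)
        if all(q[i] * scale <= cap + 1e-12 for i in rest):
            p = [0.0] * n
            for i in order[:k]:
                p[i] = cap
            for i in rest:
                p[i] = q[i] * scale
            return p
    raise RuntimeError("unreachable for a valid cap")

def mw_step(p, loss, eta, cap=None):
    """One round of MW(P): update rule (1), then RE projection onto P.
    cap=None means P is the full simplex, so the projection is the identity."""
    w = [pi * math.exp(-eta * li) for pi, li in zip(p, loss)]
    z = sum(w)  # normalization factor Z(t)
    q = [wi / z for wi in w]
    return q if cap is None else project_capped(q, cap)
```

For richer choices of P the arg min has no closed form and must be computed with a convex solver; the cap-and-rescale form above is a special case where it stays cheap.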
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit
2012
Cited by 13 (0 self)
We consider a linear stochastic bandit problem where the dimension K of the unknown parameter θ is larger than the sampling budget n. In such cases, it is in general impossible to derive sub-linear regret bounds since usual linear bandit algorithms have a regret in O(K√n). In this paper we assume that θ is S-sparse, i.e. has at most S nonzero components, and that the space of arms is the unit ball for the ℓ2 norm. We combine ideas from Compressed Sensing and Bandit Theory and derive algorithms with regret bounds in O(S√n). We detail an application to the problem of optimizing a function that depends on many variables but among which only a small number of them (initially unknown) are relevant.
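The role of S-sparsity can be illustrated with a hard-thresholding step of the compressed-sensing flavor the abstract alludes to: keep only the S largest-magnitude coordinates of a parameter estimate, so later exploration can focus on the surviving support. This is a toy illustration only; the paper's algorithms combine such sparsification with bandit exploration in a more careful way.

```python
import numpy as np

def hard_threshold(theta_hat, S):
    """Zero out all but the S largest-magnitude coordinates of an
    estimate theta_hat, exploiting the assumption that the true theta
    has at most S nonzero components."""
    out = np.zeros_like(theta_hat)
    keep = np.argsort(np.abs(theta_hat))[-S:]  # indices of S largest |values|
    out[keep] = theta_hat[keep]
    return out
```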
Minimax policies for combinatorial prediction games
2011
Cited by 12 (3 self)
We address the online linear optimization problem when the actions of the forecaster are represented by binary vectors. Our goal is to understand the magnitude of the minimax regret for the worst possible set of actions. We study the problem under three different assumptions for the feedback: full information, and the partial information models of the so-called “semi-bandit” and “bandit” problems. We consider both L∞- and L2-type restrictions for the losses assigned by the adversary. We formulate a general strategy using Bregman projections on top of a potential-based gradient descent, which generalizes the ones studied in the series of papers György et al. (2007), Dani et al. (2008), Abernethy et al. (2008), Cesa-Bianchi and Lugosi (2009), Helmbold and Warmuth (2009), Koolen et al. (2010), Uchiya et al. (2010), Kale et al. (2010) and Audibert and Bubeck (2010). We provide simple proofs that recover most of the previous results. We propose new upper bounds for the semi-bandit game. Moreover, we derive lower bounds for all three feedback assumptions. With the only exception of the bandit game, the upper and lower bounds are tight, up to a constant factor. Finally, we answer a question asked by Koolen et al. (2010) by showing that the exponentially weighted average forecaster is suboptimal against L∞ adversaries.
An Efficient Algorithm for Learning with Semi-Bandit Feedback
Cited by 11 (4 self)
Abstract. We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m nonzero entries, we show that the expected regret of our algorithm after T rounds is O(m√(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the full information setting to O(m^(3/2)√(T log d)), gaining a factor of √(d/m) over previous bounds for this algorithm.
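The Geometric Resampling idea can be sketched in a few lines: instead of computing the probability q_i that component i appears in a sampled action (which may be intractable for combinatorial sets), redraw actions until one contains i and use the number of draws, capped at some M, as an estimate of 1/q_i. The function name and cap value here are illustrative.

```python
import random

def geometric_resampling_count(sample_action, i, cap=1000):
    """Estimate 1/q_i by resampling, where q_i is the probability that a
    freshly sampled action contains component i.

    sample_action() draws an independent action (a set of components)
    from the same distribution the learner plays from. The count K of
    draws until i appears is geometric with mean 1/q_i (exactly, when
    uncapped), so K * observed_loss_i * 1{i in played action} serves as
    a nearly unbiased loss estimate; the cap M bounds running time at
    the price of a small, controllable bias.
    """
    for k in range(1, cap + 1):
        if i in sample_action():
            return k
    return cap
```

This only requires the ability to sample actions, which is exactly what makes the approach implementable whenever efficient offline optimization over the decision set is possible.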
Efficient Tracking of Large Classes of Experts
2011
Cited by 10 (1 self)
In the framework for prediction of individual sequences, sequential prediction methods are to be constructed that perform nearly as well as the best expert from a given class. We consider prediction strategies that compete with the class of switching strategies that can segment a given sequence into several blocks, and follow the advice of a different “base” expert in each block. As usual, the performance of the algorithm is measured by the regret defined as the excess loss relative to the best switching strategy selected in hindsight for the particular sequence to be predicted. In this paper we construct prediction strategies of low computational cost for the case where the set of base experts is large. In particular we derive a family of efficient tracking algorithms that, for any prediction algorithm A designed for the base class, can be implemented with time and space complexity O(n^γ log n) times larger than that of A, where n is the time horizon and γ ≥ 0 is a parameter of the algorithm. With A properly chosen, our algorithm achieves a regret bound of optimal order for γ > 0, and only O(log n) times larger than the optimal order for γ = 0, for all typical regret bound types we examined. For example, for predicting binary sequences with switching parameters, our method achieves the optimal O(log n) regret rate with time complexity O(n^(1+γ) log n) for any γ ∈ (0, 1).
Combinatorial partial monitoring game with linear feedback and its applications
In ICML, 2014
Cited by 10 (4 self)
Abstract. In online learning, a player chooses actions to play and receives reward and feedback from the environment with the goal of maximizing her reward over time. In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of the player. We present the Global Confidence Bound (GCB) algorithm, which integrates ideas from both combinatorial multi-armed bandits and finite partial monitoring games to handle all the above issues. GCB only requires feedback on a small set of actions and achieves O(T^(2/3) log T) distribution-independent regret and O(log T) distribution-dependent regret (the latter assuming a unique optimal action), where T is the total number of time steps played. Moreover, the regret bounds depend only linearly on log |X| rather than |X|, where X is the action space. GCB isolates offline optimization tasks from online learning and avoids explicit enumeration of all actions in the online learning part. We demonstrate that our model and algorithm can be applied to a crowdsourcing application leading to both an efficient learning algorithm and low regret, and argue that they can be applied to a wide range of combinatorial applications constrained with limited feedback.