Results

21 - 29 of 29

### Incentive Design for Adaptive Agents

2016

"... (Article begins on next page) The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. ..."

Abstract
- Add to MetaCart

(Article begins on next page) The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters.


### Bandit Market Makers

"... We propose a flexible framework for profit-seeking market making by combining cost function based automated market makers with bandit learning algorithms. The key idea is to consider each parametrisation of the cost function as a bandit arm, and the minimum expected profits from trades executed duri ..."

Abstract
- Add to MetaCart

(Show Context)
We propose a flexible framework for profit-seeking market making by combining cost-function-based automated market makers with bandit learning algorithms. The key idea is to consider each parametrisation of the cost function as a bandit arm, and the minimum expected profit from trades executed during a period as the reward. This allows for the creation of market makers that can adjust liquidity and bid-ask spreads dynamically to maximise profits.
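The core idea in this abstract — treating each cost-function parametrisation as a bandit arm, with a period's trading profit as the reward — can be illustrated with a simple epsilon-greedy sketch. The paper does not specify this particular bandit algorithm, and the two candidate arms, Gaussian profit model, and epsilon value below are hypothetical stand-ins:

```python
import random

def choose_parametrisation(profits, counts, epsilon=0.1):
    """Epsilon-greedy selection over cost-function parametrisations
    (each candidate parametrisation is one bandit arm).
    profits[i] / counts[i] track total profit and plays for arm i."""
    if random.random() < epsilon or 0 in counts:
        return random.randrange(len(counts))          # explore
    return max(range(len(counts)),
               key=lambda i: profits[i] / counts[i])  # exploit best mean profit

# Hypothetical trading periods: arm 1's parametrisation yields
# higher profit on average (Gaussian stand-in rewards).
random.seed(1)
profits, counts = [0.0, 0.0], [0, 0]
for _ in range(500):
    arm = choose_parametrisation(profits, counts)
    profits[arm] += random.gauss([0.5, 1.5][arm], 1.0)
    counts[arm] += 1
```

Over enough periods, the market maker's play concentrates on the parametrisation with the higher mean profit.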

### Robustness of anytime bandit policies

"... This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. [2] exhibit a policy such that with probability at least 1 − 1/n, the regret of the policy is of order log n. They have also ..."

Abstract
- Add to MetaCart

(Show Context)
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. [2] exhibit a policy such that with probability at least 1 − 1/n, the regret of the policy is of order log n. They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. [3]. This work first answers an open question: it extends this negative result to any anytime policy. Another contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms. We also show that, for any policy (i.e. when the number of plays is known), the regret is of order log n with probability at least 1 − 1/n, so that the policy of Audibert et al. has the best possible deviation properties.
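For reference, the ucb1 index policy of Auer et al. discussed above can be sketched in a few lines — a toy illustration of the index rule only, not the anytime-robust policies this paper constructs; the Bernoulli arms and horizon are invented for the demo:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1 index policy: after one initial pull per arm, play the arm
    maximising empirical mean + sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t  # initialisation round: pull each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

# Two hypothetical Bernoulli arms with means 0.2 and 0.8: the policy
# concentrates its plays on the better arm.
random.seed(0)
counts = ucb1(lambda i: float(random.random() < [0.2, 0.8][i]), 2, 2000)
```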

### Machine Learning in an Auction Environment

"... We consider a model of repeated online auctions in which an ad with an uncertain click-through rate faces a random distribution of competing bids in each auction and there is discounting of payoffs. We formulate the optimal solution to this explore/exploit problem as a dynamic programming problem an ..."

Abstract
- Add to MetaCart

(Show Context)
We consider a model of repeated online auctions in which an ad with an uncertain click-through rate faces a random distribution of competing bids in each auction and there is discounting of payoffs. We formulate the optimal solution to this explore/exploit problem as a dynamic programming problem and show that efficiency is maximized by making a bid for each advertiser equal to the advertiser’s expected value for the advertising opportunity plus a term proportional to the variance in this value divided by the number of impressions the advertiser has received thus far. We then use this result to illustrate that the value of incorporating active exploration into a machine learning system in an auction environment is exceedingly small.
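The bidding rule this abstract describes — expected value plus a term proportional to the value's variance divided by the impressions received so far — can be written out directly. The proportionality constant `c` and the numbers below are illustrative, not taken from the paper:

```python
def explore_bid(expected_value, value_variance, impressions, c=1.0):
    """Bid = expected value + c * variance / impressions-so-far.
    The exploration bonus shrinks as the click-through-rate
    estimate is measured over more impressions."""
    return expected_value + c * value_variance / max(impressions, 1)

# A new ad (few impressions) bids above its expected value;
# a well-measured ad bids nearly at it.
new_ad = explore_bid(2.0, 0.5, 1)       # 2.0 + 0.5/1    = 2.5
seasoned = explore_bid(2.0, 0.5, 1000)  # 2.0 + 0.5/1000 = 2.0005
```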

### Lectures on Approximation in Mechanism Design

2010

"... 1 Approximation and Mechanism Design 9 1.1 Economics and Computer Science........................ 9 ..."

Abstract
- Add to MetaCart

Contents (excerpt):
1 Approximation and Mechanism Design
1.1 Economics and Computer Science

### Teaching Bandits How to Behave

"... Consider a setting in which an agent selects an action in each time period and there is an interested party who seeks to induce a particular action. The interested party can associate incentives with actions to perturb their value to the agent. The agent’s decision problem is modeled as a multi-arme ..."

Abstract
- Add to MetaCart

Consider a setting in which an agent selects an action in each time period and there is an interested party who seeks to induce a particular action. The interested party can associate incentives with actions to perturb their value to the agent. The agent’s decision problem is modeled as a multi-armed bandit process where the intrinsic value for an action updates independently of the state of other actions and only when the action is selected. The agent selects the action in each period with the maximal perturbed value. In particular, this models the problem of a learning agent with the interested party as a teacher. For inducing the goal action as soon as possible, or as often as possible over a fixed time period, it is optimal for an interested party with a per-period incentive budget to assign the budget to the goal action and wait for the agent to learn to want to make that choice. Teaching is easy in this case. In contrast, with an across-period budget, no algorithm can provide good performance on all instances without knowledge of the agent’s update process, except in the particular case in which the goal is to induce the agent to select the goal action once.
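The agent's choice rule in this abstract — select the action with maximal perturbed value, with the interested party's per-period budget placed entirely on the goal action — can be sketched as follows (the intrinsic values, budget, and goal index are hypothetical):

```python
def agent_choice(values, incentives):
    """The agent picks the action with maximal perturbed value:
    intrinsic value plus the incentive attached to that action."""
    return max(range(len(values)), key=lambda a: values[a] + incentives[a])

# Per-period budget B assigned entirely to the goal action (index 2),
# as in the paper's easy-teaching case.
B = 1.0
incentives = [0.0, 0.0, B]
choice = agent_choice([0.4, 0.9, 0.3], incentives)  # 0.3 + 1.0 beats 0.9
```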

### Truthful Prioritization for Dynamic Bandwidth Sharing

"... We design a protocol for dynamic prioritization of data on shared routers such as untethered 3G/4G devices. The mechanism prioritizes bandwidth in favor of users with the highest value, and is incentive compatible, so that users can simply report their true values for network access. A revenue pooli ..."

Abstract
- Add to MetaCart

(Show Context)
We design a protocol for dynamic prioritization of data on shared routers such as untethered 3G/4G devices. The mechanism prioritizes bandwidth in favor of users with the highest value, and is incentive compatible, so that users can simply report their true values for network access. A revenue pooling mechanism also aligns incentives for sellers, so that they will choose to use prioritization methods that retain the incentive properties on the buy-side. In this way, the design allows for an open architecture. In addition to revenue pooling, the technical contribution is to identify a class of stochastic demand models and a prioritization scheme that provides allocation monotonicity. Simulation results confirm efficiency gains from dynamic prioritization relative to prior methods, as well as the effectiveness of revenue pooling.
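The buy-side allocation idea — serve users in order of reported value — can be sketched as a toy priority rule. This models only the ordering, not the paper's incentive-compatible payments or revenue pooling; the user names and capacity are invented:

```python
def prioritize(reports, capacity):
    """Allocate the router's capacity (in users served) in decreasing
    order of reported value. reports is a list of (user, value) pairs."""
    order = sorted(reports, key=lambda r: -r[1])
    return [user for user, _ in order[:capacity]]

# Three hypothetical users reporting their values; capacity for two.
served = prioritize([("a", 3.0), ("b", 9.0), ("c", 5.0)], 2)
```

Incentive compatibility in the paper means users can report these values truthfully; this sketch takes the reports as given.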