Results 1–10 of 29
Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising
Abstract
Cited by 17 (0 self)
We study the Cost-Per-Action or Cost-Per-Acquisition (CPA) charging scheme in online advertising. In this scheme, instead of paying per click, the advertisers pay only when a user takes a specific action (e.g. fills out a form) or completes a transaction on their websites. We focus on designing efficient and incentive compatible mechanisms that use this charging scheme. We describe a mechanism based on a sampling-based learning algorithm that under suitable assumptions is asymptotically individually rational, asymptotically Bayesian incentive compatible and asymptotically ex-ante efficient. In particular, we demonstrate our mechanism for the case where the utility functions of the advertisers are independent and identically-distributed random variables as well as the case where they evolve like independent reflected Brownian motions.
Deviations of stochastic bandit regret
In Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT’11), 2011
Abstract
Cited by 11 (2 self)
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that with probability at least 1 − 1/n, the regret of the policy is of order log n. They have also shown that such a property is not shared by the popular UCB1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.
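The UCB1 policy of Auer et al. (2002) discussed in this abstract can be sketched in a few lines. The Bernoulli arms, horizon, and seed below are illustrative assumptions, not part of the paper; the index rule (empirical mean plus sqrt(2 ln t / n_i)) is the standard UCB1 formula.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Minimal UCB1 sketch on Bernoulli arms; returns the realized
    pseudo-regret (sum of mean gaps over the pulled arms)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # cumulative reward per arm
    pulls = []
    for t in range(1, horizon + 1):
        if t <= k:            # initialize: play each arm once
            arm = t - 1
        else:                 # index = empirical mean + sqrt(2 ln t / n_i)
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    best = max(means)
    return sum(best - means[a] for a in pulls)

regret = ucb1([0.9, 0.5, 0.4], horizon=2000)
```

The paper's point is that UCB1's regret, while logarithmic in expectation, has heavy upper deviations; a simulation like this only shows a typical run, not the tail behavior.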
A Truthful Learning Mechanism for Contextual Multi–Slot Sponsored Search Auctions with Externalities
In EC ’12: 13th ACM Conference on Electronic Commerce, 2012
Abstract
Cited by 9 (4 self)
Sponsored search auctions constitute one of the most successful applications of microeconomic mechanisms. In mechanism design, auctions are usually designed to incentivize advertisers to bid their truthful valuations and, at the same time, to assure both the advertisers and the auctioneer a non–negative utility. Nonetheless, in sponsored search auctions, the click–through–rates (CTRs) of the advertisers are often unknown to the auctioneer and thus standard incentive compatible mechanisms cannot be directly applied and must be paired with an effective learning algorithm for the estimation of the CTRs. This introduces the critical problem of designing a learning mechanism able to estimate the CTRs at the same time as implementing a truthful mechanism with a revenue loss as small as possible compared to an optimal mechanism designed with the true CTRs. Previous works showed that in single–slot auctions the problem can be solved using a suitable exploration–exploitation mechanism able to achieve a per–step regret of order O(T^(−1/3)) (where T is the number of times the auction is repeated). In this paper we extend these results to the general case of contextual multi–slot auctions with position– and ad–dependent externalities. In particular, we prove novel upper bounds on the revenue loss w.r.t. a VCG auction and we report numerical simulations investigating their accuracy in predicting the dependency of the regret on the number of rounds T, the number of slots K, and the number of advertisements n.
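The O(T^(−1/3)) per-step regret mentioned above typically arises from an explore-then-exploit split with an exploration phase of roughly T^(2/3) rounds. The toy below illustrates only that split for a single slot with unit values; it is a hypothetical simplification, not the paper's truthful multi-slot mechanism.

```python
import math
import random

def explore_then_exploit(true_ctrs, T, seed=0):
    """Toy single-slot explore-then-exploit: estimate each ad's CTR
    round-robin for ~T^(2/3) rounds, then always show the empirically
    best ad. Returns the realized per-step regret in clicks."""
    rng = random.Random(seed)
    k = len(true_ctrs)
    explore = int(T ** (2 / 3))       # exploration budget ~ T^(2/3)
    clicks = [0] * k
    shows = [0] * k
    total = 0.0
    for t in range(T):
        if t < explore:
            ad = t % k                # uniform round-robin exploration
        else:                         # commit to the empirical best
            ad = max(range(k), key=lambda i: clicks[i] / max(shows[i], 1))
        click = rng.random() < true_ctrs[ad]
        shows[ad] += 1
        clicks[ad] += click
        total += click
    return max(true_ctrs) - total / T
```

With this split, exploration wastes O(T^(2/3)) rounds while estimation error contributes comparably, giving per-step regret of order T^(−1/3); the paper's contribution is making such a scheme truthful in the multi-slot setting with externalities.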
Learning and incentives in user-generated content: Multi-armed bandits with endogenous arms.
, 2012
Abstract
Cited by 7 (1 self)
Motivated by the problem of learning the qualities of user-generated content on the Web, we study a multi-armed bandit problem where the number and success probabilities of the arms of the bandit are endogenously determined by strategic agents in response to the incentives provided by the learning algorithm. We model the contributors of user-generated content as attention-motivated agents who derive benefit when their contribution is displayed, and have a cost to quality, where a contribution's quality is the probability of its receiving a positive viewer vote. Agents strategically choose whether and what quality contribution to produce in response to the algorithm that decides how to display contributions. The algorithm, which would like to eventually only display the highest quality contributions, can only learn a contribution's quality from the viewer votes the contribution receives when displayed. The problem of inferring the relative qualities of contributions using viewer feedback, to optimize for overall viewer satisfaction over time, can then be modeled as the classic multi-armed bandit problem, except that the arms available to the bandit and therefore the achievable regret are endogenously determined by strategic agents: a good algorithm for this setting must not only quickly identify the best contributions, but also incentivize high-quality contributions to choose amongst in the first place. We first analyze the well-known UCB algorithm ...
Truthful multi–armed bandit mechanisms for multi–slot sponsored search auctions
In Current Science, Special Issue on Game Theory, 103(9), 2012
Multi-Armed Bandit with Budget Constraint and Variable Costs
Abstract
Cited by 4 (0 self)
We study the multi-armed bandit problems with budget constraint and variable costs (MAB-BV). In this setting, pulling an arm will receive a random reward together with a random cost, and the objective of an algorithm is to pull a sequence of arms in order to maximize the expected total reward with the costs of pulling those arms complying with a budget constraint. This new setting models many Internet applications (e.g., ad exchange, sponsored search, and cloud computing) in a more accurate manner than previous settings where the pulling of arms is either costless or with a fixed cost. We propose two UCB based algorithms for the new setting. The first algorithm needs prior knowledge about the lower bound of the expected costs when computing the exploration term. The second algorithm eliminates this need by estimating the minimal expected costs from empirical observations, and therefore can be applied to more real-world applications where prior knowledge is not available. We prove that both algorithms have nice learning abilities, with regret bounds of O(ln B). Furthermore, we show that when applying our proposed algorithms to a previous setting with fixed costs (which can be regarded as our special case), one can improve the previously obtained regret bound. Our simulation results on real-time bidding in ad exchange verify the effectiveness of the algorithms and are consistent with our theoretical analysis.
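A budget-constrained UCB of the kind this abstract describes can be sketched as follows. This is a hedged illustration, not the paper's exact index: it pulls the arm with the largest optimistic reward-to-cost ratio until the budget runs out, and the `cost_min` floor plays the role of the lower bound on expected costs that the paper's first algorithm assumes as prior knowledge. The Bernoulli rewards and uniform cost noise are toy assumptions.

```python
import math
import random

def budget_ucb(reward_means, cost_means, budget, cost_min=0.1, seed=0):
    """Sketch of a budget-constrained UCB: optimistic reward estimate
    over a pessimistic (floored) cost estimate. Returns total reward
    collected before the budget is exhausted (the last pull may
    slightly overshoot the budget in this toy version)."""
    rng = random.Random(seed)
    k = len(reward_means)
    n = [0] * k               # pulls per arm
    r = [0.0] * k             # cumulative reward per arm
    c = [0.0] * k             # cumulative cost per arm
    total_reward, spent, t = 0.0, 0.0, 0
    while spent < budget:
        t += 1
        if t <= k:            # play each arm once to initialize
            arm = t - 1
        else:
            def index(i):
                bonus = math.sqrt(2 * math.log(t) / n[i])
                # optimistic reward / pessimistic cost, floored at cost_min
                return (r[i] / n[i] + bonus) / max(c[i] / n[i] - bonus, cost_min)
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < reward_means[arm] else 0.0
        cost = cost_means[arm] + rng.uniform(-0.05, 0.05)
        n[arm] += 1
        r[arm] += reward
        c[arm] += cost
        total_reward += reward
        spent += cost
    return total_reward
```

The paper's second algorithm replaces the known `cost_min` with an estimate learned from observed costs, which is the variant applicable when no prior knowledge is available.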
Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits
, 2014
Abstract
Cited by 4 (4 self)
We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values. We model the decision-maker’s prior knowledge with the Bayesian prior on the mean reward. We develop the upper credible limit (UCL) algorithm for the standard multi-armed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation structure among arms can greatly enhance decision-making performance, even over short time horizons. We extend to the stochastic UCL algorithm and draw several connections to human decision-making behavior. We present empirical data from human experiments and show that human performance is efficiently captured by the stochastic UCL algorithm with appropriate parameters. For the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm, respectively. We show that these algorithms also achieve logarithmic cumulative expected regret and require a sublogarithmic expected number of transitions among arms. We further illustrate the performance of these algorithms with numerical examples.
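The core of an upper-credible-limit rule as described above is a conjugate Gaussian posterior update plus a credibility bonus. The sketch below is a simplification under stated assumptions: it uses a fixed 95% quantile (z = 1.96) for the credible limit rather than the time-varying quantile of the paper, and independent arms with a common scalar prior.

```python
import math
import random

def ucl(prior_mean, prior_var, noise_var, true_means, horizon, seed=0):
    """Sketch of an upper-credible-limit (UCL) rule for Gaussian bandits:
    pick the arm with the largest posterior mean plus a credibility
    bonus, then update that arm's conjugate Gaussian posterior.
    Returns the list of chosen arms."""
    rng = random.Random(seed)
    k = len(true_means)
    mean = [prior_mean] * k   # posterior mean per arm
    var = [prior_var] * k     # posterior variance per arm
    picks = []
    for _ in range(horizon):
        # upper credible limit with fixed z = 1.96 (assumption)
        arm = max(range(k), key=lambda i: mean[i] + 1.96 * math.sqrt(var[i]))
        reward = rng.gauss(true_means[arm], math.sqrt(noise_var))
        # conjugate Gaussian posterior update for the pulled arm
        precision = 1 / var[arm] + 1 / noise_var
        mean[arm] = (mean[arm] / var[arm] + reward / noise_var) / precision
        var[arm] = 1 / precision
        picks.append(arm)
    return picks
```

An informative prior (small `prior_var` centered near the true best arm) shrinks the bonus early and reduces exploration, which is the mechanism behind the paper's claim that good priors enhance short-horizon performance.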