Citations
804 | Finite-time analysis of the multiarmed bandit problem
- Auer, Cesa-Bianchi, et al.
Citation Context ...formation can help the search engine find a better allocation of the ads in future. Multi-Armed Bandit (MAB) problems effectively characterize such explore-exploit situations faced in sequential decision problems. The performance of any MAB algorithm is measured in terms of regret which is the expected loss of reward when compared to the optimal algorithm which pulls the best arms every time. The single play MAB (SPMAB) problem, where the decision maker pulls one arm every round, is a well studied problem. The UCB1 (Upper Confidence Bound) algorithm proposed by Auer, Cesa-Bianchi, and Fischer [1] has distribution independent regret of $O(\sqrt{nT \log T})$ for this problem. However, UCB1 algorithm cannot be applied directly to design a regret minimizing SPMAB mechanism because the advertisers are strategic and may misreport their valuations. In fact, Babaioff, Sharma, and Slivkins [3] have shown that any ex-post truthful and normalized MAB mechanism must suffer a regret of at least $\Omega(n^{1/3} T^{2/3})$. This increased dependence by a factor of $T^{1/6}$ on time horizon $T$ is termed as the price of truthfulness. The multiple play MAB problems (MPMAB) more realistically capture the problem of SSA where... |
30 | Characterizing truthful multi-armed bandit mechanisms: extended abstract
- Babaioff, Sharma, et al.
- 2009
Citation Context ...is the expected loss of reward when compared to the optimal algorithm which pulls the best arms every time. The single play MAB (SPMAB) problem, where the decision maker pulls one arm every round, is a well studied problem. The UCB1 (Upper Confidence Bound) algorithm proposed by Auer, Cesa-Bianchi, and Fischer [1] has distribution independent regret of $O(\sqrt{nT \log T})$ for this problem. However, UCB1 algorithm cannot be applied directly to design a regret minimizing SPMAB mechanism because the advertisers are strategic and may misreport their valuations. In fact, Babaioff, Sharma, and Slivkins [3] have shown that any ex-post truthful and normalized MAB mechanism must suffer a regret of at least $\Omega(n^{1/3} T^{2/3})$. This increased dependence by a factor of $T^{1/6}$ on time horizon $T$ is termed as the price of truthfulness. The multiple play MAB problems (MPMAB) more realistically capture the problem of SSA where multiple slots are up for auction. Gatti, Lazaric, and Trovo [4] have come up with an MPMAB mechanism for multi-slot SSA. They have extended the notions of click probabilities from single slot to the multiple slot case and considered two types of externalities. The position-dependent ex... |
17 | Truthful mechanisms with implicit payment computation
- Babaioff, Kleinberg, et al.
Citation Context ...ation rule having sublinear regret with time when the click through rates (CTR) of the advertisements (ads) are affected by ad-dependent externality or position-dependent externality. The above impossibility results motivate our second contribution: when the CTRs are affected by only position-dependent externality and follow click-precedence property, we design a novel ex-post truthful mechanism for multi-slot SSAs with sublinear regret. The ex-post monotone allocation rule in the proposed mechanism non-trivially generalizes the NewCB allocation rule presented by Babaioff, Sharma, and Slivkins [2]. We derive regret bounds for this allocation rule. When a strong property such as ex-post truthfulness is required, our allocation rule performs as well as the A-VCG mechanism presented by Gatti, Lazaric, and Trovo [4] and in the special case of identical slots, our allocation rule in fact outperforms the A-VCG mechanism and has a regret of $O(\sqrt{T})$ with time. Categories and Subject Descriptors: 500 [Information systems]: Sponsored search advertising. General Terms: Economics, Algorithms. Keywords: Mechanism Design; Sponsored Search; Multi-Armed Bandit. 1. INTRODUCTION. Sponsored search auction (SSA... |
9 | A truthful learning mechanism for contextual multi-slot sponsored search auctions with externalities
- Gatti, Lazaric, et al.
- 2012
Citation Context ...ivate our second contribution: when the CTRs are affected by only position-dependent externality and follow click-precedence property, we design a novel ex-post truthful mechanism for multi-slot SSAs with sublinear regret. The ex-post monotone allocation rule in the proposed mechanism non-trivially generalizes the NewCB allocation rule presented by Babaioff, Sharma, and Slivkins [2]. We derive regret bounds for this allocation rule. When a strong property such as ex-post truthfulness is required, our allocation rule performs as well as the A-VCG mechanism presented by Gatti, Lazaric, and Trovo [4] and in the special case of identical slots, our allocation rule in fact outperforms the A-VCG mechanism and has a regret of $O(\sqrt{T})$ with time. Categories and Subject Descriptors: 500 [Information systems]: Sponsored search advertising. General Terms: Economics, Algorithms. Keywords: Mechanism Design; Sponsored Search; Multi-Armed Bandit. 1. INTRODUCTION. Sponsored search auction (SSA) provides an environment where a mechanism design problem is inherently coupled with a learning problem. As a first step, the auctioneer tries to elicit the true valuations of the advertisers to design a truthful searc... |