## A Contextual-Bandit Approach to Personalized News Article Recommendation

Citations: 178 (16 self)

### Citations

1936 | Information Theory, Inference and Learning Algorithms.
- MacKay
- 2003
Citation Context: ...model, the predictive variance of the expected payoff x_{t,a}^T θ_a^* is evaluated as x_{t,a}^T A_a^{-1} x_{t,a}, and then √(x_{t,a}^T A_a^{-1} x_{t,a}) becomes the standard deviation. Furthermore, in information theory [19], the differential entropy of p(θ_a) is defined as −(1/2) ln((2π)^d det A_a). The entropy of p(θ_a) when updated by the inclusion of the new point x_{t,a} then becomes −(1/2) ln((2π)^d det(A_a + x_{t,a} x_{t,a}^T)). T...
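The variance and entropy expressions in this snippet can be checked numerically. The sketch below is illustrative only, with a made-up design matrix standing in for A_a and a random vector standing in for the new context x_{t,a}.

```python
import numpy as np

# Toy check of the quantities in the snippet above (all data made up):
# A plays the role of A_a, x the role of the new context x_{t,a}.
rng = np.random.default_rng(0)
d = 3
X = rng.normal(size=(10, d))             # previously observed contexts
A = np.eye(d) + X.T @ X                  # A_a-style ridge design matrix

x = rng.normal(size=d)
var = x @ np.linalg.solve(A, x)          # predictive variance x^T A^{-1} x
std = np.sqrt(var)                       # standard deviation (the UCB width)

# Differential entropy -(1/2) ln((2*pi)^d det A), before and after adding x.
H_old = -0.5 * np.log((2 * np.pi) ** d * np.linalg.det(A))
H_new = -0.5 * np.log((2 * np.pi) ** d * np.linalg.det(A + np.outer(x, x)))

# det(A + x x^T) = det(A) * (1 + x^T A^{-1} x) >= det(A), so entropy drops:
assert std > 0.0 and H_new <= H_old
```

The final assertion is the matrix-determinant-lemma fact the snippet relies on: including a new observation never increases the entropy of p(θ_a).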

817 | Finite-time analysis of the multi-armed bandit problem.
- Auer, Cesa-Bianchi, et al.
- 2002
Citation Context: ...ing traffic. This strategy, with random exploration on an ε fraction of the traffic and greedy exploitation on the rest, is known as ε-greedy. Advanced exploration approaches such as EXP3 [8] or UCB1 [7] could be applied as well. Intuitively, we need to distribute more traffic to new content to learn its value more quickly, and fewer users to track temporal changes of existing content. Recently, pers...

509 | Asymptotically efficient adaptive allocation rules.
- Lai, Robbins
- 1985
Citation Context: ...ret, R_A(T)/T, converges to 0 with probability 1. In contrast to the unguided exploration strategy adopted by ε-greedy, another class of algorithms, generally known as upper confidence bound algorithms [4, 7, 17], uses a smarter way to balance exploration and exploitation. Specifically, in trial t, these algorithms estimate both the mean payoff μ̂_{t,a} of each arm a as well as a corresponding confidence interval ...
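The selection rule described in this snippet fits in a few lines. The sketch below is a generic UCB1-style index in the spirit of [7], with illustrative variable names; it is not code from any of the cited papers.

```python
import math

# UCB1-style arm selection: play each arm once, then pick the arm maximizing
# the empirical mean plus a confidence width sqrt(2 ln t / n_a).
def ucb1_choose(counts, means, t):
    for a, n in enumerate(counts):
        if n == 0:
            return a                                   # unplayed arm first
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))

# A rarely tried arm gets a wide interval and can beat a better-looking arm:
assert ucb1_choose([100, 2], [0.55, 0.50], t=102) == 1
```

This is the "smarter" balance the snippet mentions: the confidence width shrinks as an arm is played more, so exploration concentrates on arms that are either promising or undersampled.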

491 | The nonstochastic multiarmed bandit problem.
- Auer, Cesa-Bianchi, et al.
- 2002
Citation Context: ...n the remaining traffic. This strategy, with random exploration on an ε fraction of the traffic and greedy exploitation on the rest, is known as ε-greedy. Advanced exploration approaches such as EXP3 [8] or UCB1 [7] could be applied as well. Intuitively, we need to distribute more traffic to new content to learn its value more quickly, and fewer users to track temporal changes of existing content. Re...

475 | Some Aspects of the Sequential Design of Experiments.
- Robbins
- 1952
Citation Context: ...er a purely exploring nor a purely exploiting algorithm works best in general, and a good tradeoff is needed. The context-free K-armed bandit problem has been studied by statisticians for a long time [9, 24, 26]. One of the simplest and most straightforward algorithms is ε-greedy. In each trial t, this algorithm first estimates the average payoff μ̂_{t,a} of each arm a. Then, with probability 1 − ε, it chooses ...
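The ε-greedy rule described in this snippet can be sketched as a toy selection function; the payoff estimates here are hypothetical inputs, not the cited papers' code.

```python
import random

# eps-greedy: with probability eps pick a uniformly random arm (explore),
# otherwise pick the arm with the highest estimated payoff (exploit).
def eps_greedy_choose(mean_estimates, eps, rng=random):
    if rng.random() < eps:
        return rng.randrange(len(mean_estimates))
    return max(range(len(mean_estimates)), key=mean_estimates.__getitem__)

# With eps = 0 the rule is purely greedy:
assert eps_greedy_choose([0.1, 0.7, 0.3], eps=0.0) == 1
```

The single parameter ε is exactly the exploration/exploitation tradeoff the snippet describes: ε = 0 never explores, ε = 1 never exploits.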

319 | Bandit problems: Sequential allocation of experiments.
- Berry, Fristedt
- 1985
Citation Context: ...er a purely exploring nor a purely exploiting algorithm works best in general, and a good tradeoff is needed. The context-free K-armed bandit problem has been studied by statisticians for a long time [9, 24, 26]. One of the simplest and most straightforward algorithms is ε-greedy. In each trial t, this algorithm first estimates the average payoff μ̂_{t,a} of each arm a. Then, with probability 1 − ε, it chooses ...

298 | Recommender systems in e-commerce.
- Schafer, Konstan, et al.
- 1999
Citation Context: ...ent-based filtering and hybrid approaches, can provide meaningful recommendations at an individual level by leveraging users’ interests as demonstrated by their past activity. Collaborative filtering [25], by recognizing similarities across users based on their consumption history, provides a good recommendation solution to the scenarios where overlap in historical consumption across users is relative...

278 | Google news personalization: scalable online collaborative filtering.
- Das, Datar, et al.
- 2007
Citation Context: ...l 26–30, 2010, Raleigh, North Carolina, USA. ACM 978-1-60558-799-8/10/04. service vendors acquire and maintain a large amount of content in their repository, for instance, for filtering news articles [14] or for the display of advertisements [5]. Moreover, the content of such a web-service repository changes dynamically, undergoing frequent insertions and deletions. In such a setting, it is crucial to...

269 | Bandit processes and dynamic allocation indices.
- Gittins
- 1979
Citation Context: ...s has a regret of Õ(√T), a significant improvement over earlier algorithms [1]. Finally, we note that there exists another class of bandit algorithms based on Bayes rule, such as Gittins index methods [15]. With appropriately defined prior distributions, Bayesian approaches may have good performance. These methods require extensive offline engineering to obtain good prior models, and are often computat...

182 | Using Confidence Bounds for Exploitation-exploration Trade-offs.
- Auer
- 2003
Citation Context: ...ee of Õ(T^{2/3}). Algorithms with stronger regret guarantees may be designed under various modeling assumptions about the bandit. Assuming the expected payoff of an arm is linear in its features, Auer [6] describes the LinRel algorithm that is essentially a UCB-type approach and shows that one of its variants has a regret of Õ(√T), a significant improvement over earlier algorithms [1]. Finally, we not...

168 | On the likelihood that one unknown probability exceeds another in view of the evidence of two samples.
- Thompson
- 1933
Citation Context: ...er a purely exploring nor a purely exploiting algorithm works best in general, and a good tradeoff is needed. The context-free K-armed bandit problem has been studied by statisticians for a long time [9, 24, 26]. One of the simplest and most straightforward algorithms is ε-greedy. In each trial t, this algorithm first estimates the average payoff μ̂_{t,a} of each arm a. Then, with probability 1 − ε, it chooses ...

126 | Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem. Advances in Applied Probability.
- Agrawal
- 1995
Citation Context: ...ret, R_A(T)/T, converges to 0 with probability 1. In contrast to the unguided exploration strategy adopted by ε-greedy, another class of algorithms, generally known as upper confidence bound algorithms [4, 7, 17], uses a smarter way to balance exploration and exploitation. Specifically, in trial t, these algorithms estimate both the mean payoff μ̂_{t,a} of each arm a as well as a corresponding confidence interval ...

90 | Text-learning and related intelligent agents: A survey.
- Mladenic
- 1999
Citation Context: ...tic. Content-based filtering helps to identify new items which well match an existing user’s consumption profile, but the recommended items are always similar to the items previously taken by the user [20]. Hybrid approaches [11] have been developed by combining two or more recommendation techniques; for example, the inability of collaborative filtering to recommend new items is commonly alleviated by ...

84 | The epoch-greedy algorithm for contextual multi-armed bandits.
- Langford, Zhang
- 2008
Citation Context: ...2.1 A Multi-armed Bandit Formulation. The problem of personalized news article recommendation can be naturally modeled as a multi-armed bandit problem with context information. Following previous work [18], we call it a contextual bandit. Formally, a contextual-bandit algorithm A proceeds in discrete trials t = 1, 2, 3, ... In trial t: 1. The algorithm observes the current user u_t and a set A_t of ar...
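The trial protocol in this snippet can be written as a schematic interaction loop. Everything below (the policy interface and the callbacks) is a hypothetical stand-in for illustration, not the cited paper's code.

```python
# Schematic contextual-bandit loop following the numbered steps in the
# snippet above: observe user and arms, choose an arm, receive a payoff,
# then improve the arm-selection strategy with the new observation.
def run_contextual_bandit(policy, get_user, get_arms, payoff, T):
    total = 0.0
    for t in range(1, T + 1):
        user = get_user(t)             # observe the current user u_t
        arms = get_arms(t)             # observe the arm set A_t with features
        a = policy.choose(user, arms)  # the algorithm picks an arm a_t
        r = payoff(user, a)            # environment reveals payoff r_{t,a_t}
        policy.update(user, a, r)      # learn from the triple (context, a, r)
        total += r
    return total
```

A policy only needs `choose` and `update`; the returned total payoff over T trials is the quantity bandit algorithms try to maximize, and what regret definitions compare against the optimal arm's payoff.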

73 | Eligibility Traces for Off-Policy Policy Evaluation.
- Precup, Sutton, et al.
- 2000
Citation Context: ...clear how to evaluate π based only on such logged data. This evaluation problem may be viewed as a special case of the so-called “off-policy evaluation problem” in reinforcement learning (cf. [23]). One solution is to build a simulator to model the bandit process from the logged data, and then evaluate π with the simulator. However, the modeling step will introduce bias in the simulator and so...
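A simulator-free alternative is a replay-style estimator: keep only the logged events on which the evaluated policy agrees with the logged action, and average their rewards. The sketch below assumes the logged actions were chosen uniformly at random (the setting in which such an estimate is unbiased); the log format is hypothetical.

```python
# Replay-style off-policy estimate: average the rewards of logged events
# where the candidate policy would have chosen the same arm as the logger.
# Assumes uniformly random logging; otherwise the estimate is biased.
def replay_estimate(policy, log):
    """log: iterable of (context, arms, logged_arm, reward) tuples."""
    kept, total = 0, 0.0
    for context, arms, logged_arm, reward in log:
        if policy(context, arms) == logged_arm:   # keep only agreeing events
            kept += 1
            total += reward
    return total / kept if kept else 0.0

log = [(None, [0, 1], 0, 1.0), (None, [0, 1], 1, 0.0), (None, [0, 1], 0, 0.5)]
assert abs(replay_estimate(lambda c, arms: arms[0], log) - 0.75) < 1e-12
```

The cost is data efficiency: with K arms and uniform logging, roughly a 1/K fraction of the log survives the agreement filter.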

54 | The Adaptive Web.
- Brusilovsky, Kobsa, et al.
- 2007
Citation Context: ...s of existing content. Recently, personalized recommendation has become a desirable feature for websites to improve user satisfaction by tailoring content presentation to suit individual users’ needs [10]. Personalization involves a process of gathering and storing user attributes, managing content assets, and, based on an analysis of current and past users’ behavior, delivering the individually best ...

54 | Personalized Recommendation on Dynamic Content Using Predictive Bilinear Models,”
- Chu, Park
- 2009
Citation Context: ...orical consumption record whatsoever; this is known as a cold-start situation [21]. These issues make traditional recommender-system approaches difficult to apply, as shown by prior empirical studies [12]. It thus becomes indispensable to learn the goodness of match between user interests and content when one or both of them are new. However, acquiring such information can be expensive and may reduce ...

32 | Online models for content optimization
- Agarwal, Chen, et al.
- 2008
Citation Context: ...lely on content information. In practice, we usually explore the unknown by collecting consumers’ feedback in real time to evaluate the popularity of new content while monitoring changes in its value [3]. For instance, a small amount of traffic can be designated for such exploration. Based on the users’ response (such as clicks) to randomly selected content on this small slice of traffic, the most po...

30 | Naïve filterbots for robust cold-start recommendations
- Park, Pennock, et al.
- 2006
Citation Context: ...arity changing over time as well. Furthermore, a significant number of visitors are likely to be entirely new with no historical consumption record whatsoever; this is known as a cold-start situation [21]. These issues make traditional recommender-system approaches difficult to apply, as shown by prior empirical studies [12]. It thus becomes indispensable to learn the goodness of match between user in...

28 | Exploring compact reinforcement-learning representations with linear regression.
- Walsh, Szita, et al.
- 2009
Citation Context: ...imate of the coefficients: θ̂_a = (D_a^T D_a + I_d)^{-1} D_a^T c_a, (3) where I_d is the d × d identity matrix. When components in c_a are independent conditioned on corresponding rows in D_a, it can be shown [27] that, with probability at least 1 − δ, |x_{t,a}^T θ̂_a − E[r_{t,a}|x_{t,a}]| ≤ α √(x_{t,a}^T (D_a^T D_a + I_d)^{-1} x_{t,a}) (4) for any δ > 0 and x_{t,a} ∈ R^d, where α = 1 + √(ln(2/δ)/2) is a constant. In other words, t...
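Equations (3) and (4) in this snippet translate directly into a short per-arm score of the kind a LinUCB-style rule maximizes. The sketch below uses made-up toy data; it is an illustration of the two equations, not the paper's implementation.

```python
import numpy as np

# Ridge estimate from Eq. (3) and a UCB score built from the confidence
# width in Eq. (4): score = x^T theta_hat + alpha * sqrt(x^T A^{-1} x).
def linucb_score(D, c, x, alpha):
    d = D.shape[1]
    A = D.T @ D + np.eye(d)                  # A_a = D_a^T D_a + I_d
    theta_hat = np.linalg.solve(A, D.T @ c)  # Eq. (3), without forming A^{-1}
    width = np.sqrt(x @ np.linalg.solve(A, x))
    return float(x @ theta_hat + alpha * width)

# Tiny worked case: D = I_2, c = (1, 1) gives A = 2I and theta_hat = (0.5, 0.5),
# so with alpha = 0 the score for x = (1, 0) is exactly 0.5.
score = linucb_score(np.eye(2), np.array([1.0, 1.0]), np.array([1.0, 0.0]), alpha=0.0)
assert abs(score - 0.5) < 1e-12
```

Setting α per Eq. (4) makes the width a high-probability bound on the estimation error, which is what justifies treating the score as an upper confidence bound.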

23 | Explore/exploit schemes for web content optimization.
- Agarwal, Chen, et al.
- 2009
Citation Context: ...es may have good performance. These methods require extensive offline engineering to obtain good prior models, and are often computationally prohibitive without coupling with approximation techniques [2]. 3. ALGORITHM. Given asymptotic optimality and the strong regret bound of UCB methods for context-free bandit algorithms, it is tempting to devise similar algorithms for contextual bandit problems. Gi...

21 | Reinforcement learning with immediate rewards and linear hypotheses
- Abe, Biermann, et al.
Citation Context: ...s features, Auer [6] describes the LinRel algorithm that is essentially a UCB-type approach and shows that one of its variants has a regret of Õ(√T), a significant improvement over earlier algorithms [1]. Finally, we note that there exists another class of bandit algorithms based on Bayes rule, such as Gittins index methods [15]. With appropriately defined prior distributions, Bayesian approaches may ...

14 | Just-in-time contextual advertising.
- Anagnostopoulos, Broder, et al.
- 2007
Citation Context: ...A. ACM 978-1-60558-799-8/10/04. service vendors acquire and maintain a large amount of content in their repository, for instance, for filtering news articles [14] or for the display of advertisements [5]. Moreover, the content of such a web-service repository changes dynamically, undergoing frequent insertions and deletions. In such a setting, it is crucial to quickly identify interesting content for...

12 | Efficient bandit algorithms for online multiclass prediction
- Kakade, Shalev-Shwartz, et al.
- 2008
Citation Context: ...pool is large. In the future, we plan to investigate bandit approaches to other similar web-based services such as online advertising, and compare our algorithms to related methods such as Banditron [16]. A second direction is to extend the bandit formulation and algorithms in which an “arm” may refer to a complex object rather than an item (like an article). An example is ranking, where an arm corre...

9 | Hybrid systems for personalized recommendations.
- Burke
- 2005
Citation Context: ...ing helps to identify new items which well match an existing user’s consumption profile, but the recommended items are always similar to the items previously taken by the user [20]. Hybrid approaches [11] have been developed by combining two or more recommendation techniques; for example, the inability of collaborative filtering to recommend new items is commonly alleviated by combining it with conten...

9 | A case study of behavior-driven conjoint analysis on yahoo!: front page today module
- Chu, Park, et al.
- 2009
Citation Context: ...nd capture nonlinearity in these raw features, we carried out conjoint analysis based on random exploration data collected in September 2008. Following a previous approach to dimensionality reduction [13], we projected user features onto article categories and then clustered users with similar preferences into groups. More specifically: • We first used logistic regression (LR) to fit a bilinear model ...

8 | Simulation Studies of Multi-armed Bandits with Covariates.
- Pavlidis, Tasoulis, et al.
- 2008
Citation Context: ...Finally, we note that, under the assumption that input features x_{t,a} were drawn i.i.d. from a normal distribution (in addition to the modeling assumption in Eq. (2)), Pavlidis et al. [22] came up with a similar algorithm that uses a least-squares solution θ̃_a instead of our ridge-regression solution (θ̂_a in Eq. (3)) to compute the UCB. However, our approach (and theoretical analysi...