Regret minimization under partial monitoring
Mathematics of Operations Research, 2004
Cited by 24 (5 self)
We consider repeated games in which the player, instead of observing the action chosen by the opponent in each game round, receives feedback generated by the combined choice of the two players. We study Hannan consistent players for these games; that is, randomized playing strategies whose per-round regret vanishes with probability one as the number n of game rounds goes to infinity. We prove a general lower bound of Ω(n^(−1/3)) on the convergence rate of the regret, and exhibit a specific strategy that attains this rate on any game for which a Hannan consistent player exists.
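For readers unfamiliar with the term, the standard definition of Hannan consistency can be written out as follows. This is a sketch in assumed notation, not taken from the abstract: \ell is the loss function, I_t the player's (randomized) action in round t, y_t the opponent's action, and R_n the per-round regret against the best fixed action.

```latex
\[
R_n \;=\; \frac{1}{n}\sum_{t=1}^{n} \ell(I_t, y_t)
      \;-\; \min_{i}\,\frac{1}{n}\sum_{t=1}^{n} \ell(i, y_t),
\qquad
\limsup_{n\to\infty} R_n \;\le\; 0 \quad \text{with probability one.}
\]
```

Read in this notation, the abstract's lower bound says that, in the worst case, R_n cannot decay faster than order n^{-1/3} under this kind of feedback.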
Regret bounds for sleeping experts and bandits
21st COLT, 2008
Cited by 13 (2 self)
We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adaptive adversaries. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sublogarithmic factor) with respect to the best-ordering benchmark.
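The best-ordering benchmark can be made concrete with a small sketch. It assumes that a fixed ordering is evaluated by playing, in each round, the highest-ranked action that is currently available. The function names, the toy data, and the brute-force search over permutations are illustrative assumptions, not the paper's algorithms.

```python
from itertools import permutations

def ordering_payoff(ordering, available, payoffs):
    """Total payoff from always playing the highest-ranked available action."""
    total = 0.0
    for avail, pay in zip(available, payoffs):
        if not avail:
            continue  # no action is awake in this round
        choice = next(a for a in ordering if a in avail)
        total += pay[choice]
    return total

def best_ordering(actions, available, payoffs):
    """Brute-force the best fixed ordering (feasible only for a few actions)."""
    return max(permutations(actions),
               key=lambda o: ordering_payoff(o, available, payoffs))

# Hypothetical toy instance: three actions, four rounds.
actions = ["a", "b", "c"]
available = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"c"}]
payoffs = [
    {"a": 1.0, "b": 0.0, "c": 0.0},
    {"a": 0.0, "b": 0.2, "c": 0.9},
    {"a": 0.5, "b": 0.0, "c": 0.4},
    {"a": 0.0, "b": 0.0, "c": 0.3},
]
best = best_ordering(actions, available, payoffs)
best_value = ordering_payoff(best, available, payoffs)
```

An algorithm's regret in this setting would then be measured as best_value minus the payoff the algorithm itself collected over the same rounds.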
DSybil: Optimal Sybil-Resistance for Recommendation Systems
2009
Cited by 13 (3 self)
Recommendation systems can be attacked in various ways, and the ultimate attack form is reached with a sybil attack, where the attacker creates a potentially unlimited number of sybil identities to vote. Defending against sybil attacks is often quite challenging, and the nature of recommendation systems makes it even harder. This paper presents DSybil, a novel defense for diminishing the influence of sybil identities in recommendation systems. DSybil provides strong provable guarantees that hold even under the worst-case attack and are optimal. DSybil can defend against an unlimited number of sybil identities over time. DSybil achieves its strong guarantees by i) exploiting the heavy-tail distribution of the typical voting behavior of the honest identities, and ii) carefully identifying whether the system is already getting “enough help” from the (weighted) voters already taken into account or whether more “help” is needed. Our evaluation shows that DSybil would continue to provide high-quality recommendations even when a million-node botnet uses an optimal strategy to launch a sybil attack.
No-regret learning in convex games
2007
Cited by 13 (3 self)
Quite a bit is known about minimizing different kinds of regret in experts problems, and how these regret types relate to types of equilibria in the multiagent setting of repeated matrix games. Much less is known about the possible kinds of regret in online convex programming problems (OCPs), or about equilibria in the analogous multiagent setting of repeated convex games. This gap is unfortunate, since convex games are much more expressive than matrix games, and since many important machine learning problems can be expressed as OCPs. In this paper, we work to close this gap: we analyze a spectrum of regret types which lie between external and swap regret, along with their corresponding equilibria, which lie between coarse correlated and correlated equilibrium. We also analyze algorithms for minimizing these regret types. As examples of our framework, we derive algorithms for learning correlated equilibria in polyhedral convex games and extensive-form correlated equilibria in extensive-form games. The former is exponentially more efficient than previous algorithms, and the latter is the first of its type.
The communication complexity of uncoupled Nash equilibrium procedures
Games and Economic Behavior, 2006
Cited by 11 (1 self)
We study the question of how long it takes players to reach a Nash equilibrium in uncoupled setups, where each player initially knows only his own payoff function. We derive lower bounds on the communication complexity of reaching a Nash equilibrium, i.e., on the number of bits that need to be transmitted, and thus also on the required number of steps. Specifically, we show lower bounds that are exponential in the number of players in each one of the following cases: (1) reaching a pure Nash equilibrium; (2) reaching a pure Nash equilibrium in a Bayesian setting; and (3) reaching a mixed Nash equilibrium. We then show that, in contrast, the communication complexity of reaching a correlated equilibrium is polynomial in the number of players.
New Techniques for Algorithm Portfolio Design
Cited by 4 (0 self)
We present and evaluate new techniques for designing algorithm portfolios. In our view, the problem has both a scheduling aspect and a machine learning aspect. Prior work has largely addressed one of the two aspects in isolation. Building on recent work on the scheduling aspect of the problem, we present a technique that addresses both aspects simultaneously and has attractive theoretical guarantees. Experimentally, we show that this technique can be used to improve the performance of state-of-the-art algorithms for Boolean satisfiability, zero-one integer programming, and A.I. planning.
Using Online Algorithms to Solve NP-Hard Problems More Efficiently in Practice
2007
Cited by 3 (2 self)
as representing the official policies of the U.S. Government.
Tracking Dynamic Sources of Malicious Activity at Internet-Scale
Cited by 3 (1 self)
We formulate and address the problem of discovering dynamic malicious regions on the Internet. We model this problem as one of adaptively pruning a known decision tree, but with additional challenges: (1) severe space requirements, since the underlying decision tree has over 4 billion leaves, and (2) a changing target function, since malicious activity on the Internet is dynamic. We present a novel algorithm that addresses this problem by combining a number of different “experts” algorithms and online paging algorithms. We prove guarantees on our algorithm’s performance as a function of the best possible pruning of a similar size, and our experiments show that our algorithm achieves high accuracy on large real-world data sets, with significant improvements over existing approaches.
Mortal Multi-Armed Bandits
Cited by 3 (0 self)
We formulate and study a new variant of the k-armed bandit problem, motivated by e-commerce applications. In our model, arms have a (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard k-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with near-certainty. The main motivation for our setting is online advertising, where ads have a limited lifetime due to, for example, the nature of their content and their campaign budgets. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the typical ad lifetime. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform the standard multi-armed bandit approaches applied to these settings.

