Improved second-order bounds for prediction with expert advice
- In COLT, 2005
Cited by 31 (6 self)
This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a new forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no preliminary knowledge about the payoff sequence is needed, not even its range; second, our bounds are expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds. In addition, our most refined bounds have the natural and desirable property of being stable under rescalings and general translations of the payoff sequence.
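As a rough illustration of the exponentially weighted average forecaster named in this abstract, the following minimal sketch assumes a fixed learning rate `eta` and full-information payoffs. The function name and parameters are hypothetical, and the paper's own tuning and second-order bounds are not reproduced here.

```python
import math

def exp_weights(payoffs, eta):
    """Return the forecaster's weight vector used in each round.

    payoffs: list of rounds, each a list of per-action payoffs (any sign).
    eta: fixed learning rate (an assumption; not the paper's tuning).
    """
    n_actions = len(payoffs[0])
    cumulative = [0.0] * n_actions
    history = []
    for round_payoffs in payoffs:
        # Weights are proportional to exp(eta * cumulative payoff);
        # subtracting the max keeps the exponentials numerically stable.
        top = max(cumulative)
        raw = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(raw)
        history.append([r / total for r in raw])
        # Payoffs are revealed only after the weights for this round are fixed.
        for i, x in enumerate(round_payoffs):
            cumulative[i] += x
    return history

# Three actions with constant payoffs 1, -1 and 0 over five rounds.
weights = exp_weights([[1.0, -1.0, 0.0]] * 5, eta=0.5)
```

The first weight vector is uniform, and later vectors concentrate on the action with the largest cumulative payoff.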
Knows What It Knows: A Framework For Self-Aware Learning
Cited by 25 (14 self)
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and open problems.
Regret minimization under partial monitoring
- Mathematics of Operations Research, 2004
Cited by 24 (5 self)
We consider repeated games in which the player, instead of observing the action chosen by the opponent in each game round, receives feedback generated by the combined choice of the two players. We study Hannan consistent players for these games; that is, randomized playing strategies whose per-round regret vanishes with probability one as the number n of game rounds goes to infinity. We prove a general lower bound of Ω(n^{−1/3}) on the convergence rate of the regret, and exhibit a specific strategy that attains this rate on any game for which a Hannan consistent player exists.
An online algorithm for maximizing submodular functions
- 2007
Cited by 21 (8 self)
We present an algorithm for solving a broad class of online resource allocation problems in which jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submodular function of a set of pairs (v, τ), where τ is the time invested in activity v. Under this assumption, our online algorithm performs near-optimally according to two natural metrics: (i) the fraction of jobs completed within time T, for some fixed deadline T > 0, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition.
Combining multiple heuristics online
- In Proceedings of the Twenty-Second Conference on Artificial Intelligence, 2007
Cited by 17 (6 self)
We present black-box techniques for learning how to interleave the execution of multiple heuristics in order to improve average-case performance. In our model, a user is given a set of heuristics whose only observable behavior is their running time. Each heuristic can compute a solution to any problem instance, but its running time varies across instances. The user solves each instance by interleaving runs of the heuristics according to a task-switching schedule. We present (i) exact and approximation algorithms for computing an optimal task-switching schedule offline, (ii) sample complexity bounds for learning a task-switching schedule from training data, and (iii) a no-regret strategy for selecting task-switching schedules online. We demonstrate the power of our results using data from recent solver competitions. We outline how to extend our results to the case in which the heuristics are randomized, and the user may periodically restart each heuristic with a fresh random seed.
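To make the task-switching model in this abstract concrete, here is a small sketch of how the time to solve one instance might be computed for a given schedule. It assumes that a heuristic is suspended and later resumed (not restarted) between its time slices; the function name and the schedule encoding are hypothetical and not taken from the paper.

```python
def time_to_solve(schedule, run_times):
    """Total time spent until the first heuristic finishes.

    schedule: list of (heuristic_index, time_slice) pairs, run in order.
    run_times: run_times[h] is the time heuristic h needs on this instance.
    Returns None if the schedule ends before any heuristic finishes.
    """
    progress = [0.0] * len(run_times)  # time already invested in each heuristic
    elapsed = 0.0
    for h, time_slice in schedule:
        remaining = run_times[h] - progress[h]
        if remaining <= time_slice:
            # Heuristic h finishes partway through (or exactly at) this slice.
            return elapsed + remaining
        progress[h] += time_slice
        elapsed += time_slice
    return None

schedule = [(0, 1.0), (1, 2.0), (0, 4.0)]
```

With this schedule, an instance with run times `[4.0, 10.0]` is solved by heuristic 0 during its second slice, while an instance with run times `[9.0, 2.0]` is solved by heuristic 1.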
Online Active Learning Methods for Fast Label-Efficient Spam Filtering
- 2007
Cited by 15 (2 self)
Active learning methods seek to reduce the number of labeled examples needed to train an effective classifier, and have natural appeal in spam filtering applications where trustworthy labels for messages may be costly to acquire. Past investigations of active learning in spam filtering have focused on the pool-based scenario, where there is assumed to be a large, unlabeled data set and the goal is to iteratively identify the best subset of examples for which to request labels. However, even with optimizations this is a costly approach. We investigate an online active learning scenario where the filter is exposed to a stream of messages which must be classified one at a time. The filter may only request a label for a given message immediately after it has been classified. The goal is to achieve strong online classification performance with few label requests. This is a novel scenario for low-cost active spam filtering, fitting for application in large-scale systems. We draw from the label efficient machine learning literature to investigate several approaches to selective sampling in this scenario using linear classifiers. We show that online active learning can dramatically reduce labeling and training costs with negligible additional overhead while maintaining high levels of classification performance.
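The online scenario described in this abstract can be sketched with a margin-based randomized label-request rule of the kind studied in the label-efficient literature. This is only an illustration under stated assumptions: a perceptron-style linear classifier, a request probability of b / (b + |margin|), and hypothetical names throughout. Whether it matches any of the variants evaluated in the paper is not confirmed by the abstract.

```python
import random

def selective_sampling(stream, b, seed=0):
    """Online perceptron that requests labels at random, favoring small margins.

    stream: iterable of (features, label) pairs with label in {-1, +1}.
    b: positive constant; larger values request more labels.
    Returns (weights, number_of_label_requests).
    """
    rng = random.Random(seed)
    w = None
    requests = 0
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        margin = sum(wi * xi for wi, xi in zip(w, x))
        prediction = 1 if margin >= 0 else -1  # classify before any label is seen
        # Request the label with probability b / (b + |margin|):
        # confident predictions (large margins) are queried less often.
        if rng.random() < b / (b + abs(margin)):
            requests += 1
            if prediction != y:
                # Perceptron update, applied only on a queried mistake.
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return (w if w is not None else []), requests

w, requests = selective_sampling([([1.0, 2.0], -1)], b=1.0)
```

On the very first message the margin is zero, so the label is always requested; as the classifier grows more confident, fewer labels are requested.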
Improved Rates for the Stochastic Continuum-Armed Bandit Problem
- In 20th Conference on Learning Theory (COLT), 2007
Cited by 12 (2 self)
Considering one-dimensional continuum-armed bandit problems, we propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improved rates. In particular, we introduce a novel assumption that is complementary to the previous smoothness conditions, while at the same time smoothness of the mean payoff function is required only at the maxima. Under these new assumptions, new bounds on the expected regret are derived. In particular, we show that apart from logarithmic factors, the expected regret scales with the square root of the number of trials, provided that the mean payoff function has finitely many maxima and its second derivatives are continuous and non-vanishing at the maxima. This improves a previous result of Cope by weakening the assumptions on the function. We also derive matching lower bounds. To complement the bounds on the expected regret, we provide high probability bounds which exhibit similar scaling.
Restart schedules for ensembles of problem instances
- In Proceedings of the Twenty-Second Conference on Artificial Intelligence, 2007
Cited by 9 (5 self)
The mean running time of a Las Vegas algorithm can often be dramatically reduced by periodically restarting it with a fresh random seed. The optimal restart schedule depends on the Las Vegas algorithm’s run length distribution, which in general is not known in advance and may differ across problem instances. We consider the problem of selecting a single restart schedule to use in solving each instance in a set of instances. We present offline algorithms for computing an (approximately) optimal restart schedule given knowledge of each instance’s run length distribution, generalization bounds for learning a restart schedule from training data, and online algorithms for selecting a restart schedule adaptively as new problem instances are encountered.
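The effect of restarting described in this abstract can be checked numerically for the simplest kind of schedule, a single fixed cutoff. The sketch below assumes independent restarts and a known discrete run length distribution, and uses the standard identity E[min(T, c)] / P(T ≤ c) for the expected total time; the function name and the example distribution are hypothetical.

```python
def expected_time_with_cutoff(distribution, cutoff):
    """Expected total running time when restarting after `cutoff` steps.

    distribution: dict mapping run length -> probability (sums to 1).
    Uses E[min(T, cutoff)] / P(T <= cutoff), valid for independent restarts.
    """
    success = sum(p for t, p in distribution.items() if t <= cutoff)
    if success == 0:
        return float("inf")  # no run ever finishes within the cutoff
    per_attempt = sum(p * min(t, cutoff) for t, p in distribution.items())
    return per_attempt / success

# Hypothetical heavy-tailed instance: half the runs take 1 step, half take 100.
dist = {1: 0.5, 100: 0.5}
```

For this distribution, a cutoff of 1 gives an expected total time of 2 steps, compared with 50.5 steps when the algorithm is never restarted before finishing.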
The on-line shortest path problem under partial monitoring
- Journal of Machine Learning Research, 2007
Cited by 8 (3 self)
The on-line shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, with the goal of keeping the loss of the chosen path (defined as the sum of the weights of its composing edges) small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to 1/√n and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds n and in the number of edges. This result improves on earlier bandit algorithms, whose performance bounds either depend exponentially on the number of edges or converge to zero at a slower rate than O(1/√n). An extension to the so-called label-efficient setting is also given, where the decision maker is informed about the weight of the chosen path only with probability ε < 1. Applications to routing in packet-switched networks, along with simulation results, are also presented.
FPL analysis for adaptive bandits
- In 3rd Symposium on Stochastic Algorithms, Foundations and Applications (SAGA’05), 2005
Cited by 5 (0 self)
A main problem of “Follow the Perturbed Leader” strategies for online decision problems is that regret bounds are typically proven against an oblivious adversary. In partial observation cases, it was not clear how to obtain performance guarantees against an adaptive adversary without worsening the bounds. We propose a conceptually simple argument to resolve this problem. Using this, a regret bound of O(t^{2/3}) for FPL in the adversarial multi-armed bandit problem is shown. This bound holds for the common FPL variant using only the observations from designated exploration rounds. Using all observations allows for the stronger bound of O(√t), matching the best bound known so far (and essentially the known lower bound) for adversarial bandits. Surprisingly, this variant does not even need explicit exploration; it is self-stabilizing. However, the sampling probabilities have to be either externally provided or approximated to sufficient accuracy, using O(t^2 log t) samples in each step.

