Results 1–10 of 20
Combinatorial Bandits
Abstract

Cited by 46 (6 self)
We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0,1}^d and suffers a loss that is the sum of the losses of those vector components that equal one. The goal of the forecaster is to ensure that, in the long run, the accumulated loss is not much larger than that of the best possible vector in the class. We consider the “bandit” setting in which the forecaster only has access to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order √(nd ln |S|), where n is the time horizon. This is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations exist for various interesting choices of S.
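The forecaster described above maintains a distribution over the action set and learns from bandit feedback. As an illustrative sketch only (the paper's forecaster uses more refined per-component loss estimates; here each vector of a small S is treated as a single arm, EXP3-style, and all names and parameters are assumptions), the core mechanism looks like this:

```python
import math
import random

def exp3_combinatorial(actions, loss_fn, n_rounds, eta=0.1, seed=0):
    """Toy EXP3 over a small combinatorial action set S ⊆ {0,1}^d.

    `actions` is a list of binary tuples; `loss_fn(t, v)` returns the loss
    of playing vector v at round t (sum of its component losses).
    Returns the forecaster's cumulative loss.
    """
    rng = random.Random(seed)
    K = len(actions)
    weights = [1.0] * K
    total = 0.0
    for t in range(n_rounds):
        z = sum(weights)
        probs = [w / z for w in weights]
        i = rng.choices(range(K), weights=probs)[0]  # sample an action
        loss = loss_fn(t, actions[i])                # bandit feedback only
        total += loss
        # importance-weighted unbiased loss estimate for the chosen arm
        est = loss / probs[i]
        weights[i] *= math.exp(-eta * est)
    return total
```

With a fixed component-loss vector, the sampler quickly concentrates on the vector with the smallest total loss.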
Selective sampling and active learning from single . . .
Abstract

Cited by 14 (3 self)
We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem.
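The selective-sampling idea above can be sketched with a margin-based coin flip. This is an illustrative stand-in, not the paper's RLS-based algorithm: the perceptron update, the b/(b + |margin|) query rule, and all names are assumptions chosen for clarity.

```python
import random

def selective_sampling_perceptron(stream, b=1.0, seed=0):
    """Query a label with probability b / (b + |margin|), and run a
    perceptron update only on queried mistakes.

    `stream` yields (x, y) with x a sequence of floats and y in {-1, +1}.
    Returns (mistakes, queries).
    """
    rng = random.Random(seed)
    w = None
    mistakes = queries = 0
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        margin = sum(wi * xi for wi, xi in zip(w, x))
        yhat = 1 if margin >= 0 else -1
        if yhat != y:
            mistakes += 1
        # query more often when the margin is small (low confidence)
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            if yhat != y:
                for j in range(len(w)):
                    w[j] += y * x[j]  # perceptron update on queried mistakes
    return mistakes, queries
```

On a separable stream the mistake count stays small while the query rate drops as the margin grows.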
Using More Data to Speed-up Training Time
Abstract

Cited by 8 (0 self)
In many recent applications, data is plentiful. By now, we have a rather clear understanding of how more data can be used to improve the accuracy of learning algorithms. Recently, there has been a growing interest in understanding how more data can be leveraged to reduce the required training runtime. In this paper, we study the runtime of learning as a function of the number of available training examples and underscore the main high-level techniques. We provide the first formal positive result showing that, even in the unrealizable case, the runtime can decrease exponentially while requiring only polynomial growth in the number of examples. Our construction corresponds to a synthetic learning problem, and an interesting open question is whether the tradeoff can be shown for more natural learning problems. We spell out several interesting candidates of natural learning problems for which we conjecture that there is a tradeoff between computational and sample complexity.
NEWTRON: an Efficient Bandit Algorithm for Online Multiclass Prediction
Abstract

Cited by 7 (0 self)
We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar α. We prove that the regret of NEWTRON is O(log T) when α is a constant that does not vary with the horizon T, and at most O(T^{2/3}) if α is allowed to increase to infinity with T. For α = O(log T), the regret is bounded by O(√T), thus solving the open problem of [KSST08, AR09]. Our algorithm is based on a novel application of the online Newton method [HAK07]. We test our algorithm and show that it performs well in experiments, even when α is a small constant.
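NEWTRON builds on the online Newton method of [HAK07]. A one-dimensional sketch of that underlying second-order update for logistic loss may help; this is not the NEWTRON algorithm itself, and the parameter names, step rule, and projection radius are illustrative assumptions:

```python
import math

def ons_logistic_1d(stream, gamma=0.25, eps=1.0, D=5.0):
    """Minimal 1-D Online Newton Step for logistic loss.

    `stream` yields (x, y) with y in {-1, +1}. Returns (theta, total_loss).
    """
    theta, A, total = 0.0, eps, 0.0
    for x, y in stream:
        z = y * theta * x
        total += math.log1p(math.exp(-z))   # logistic loss at this round
        g = -y * x / (1.0 + math.exp(z))    # gradient of the logistic loss
        A += g * g                          # rank-one curvature accumulation
        theta -= g / (gamma * A)            # Newton-style step
        theta = max(-D, min(D, theta))      # project onto [-D, D]
    return theta, total
```

On data with y = sign(x), the parameter quickly moves to a large positive value, as a correct classifier requires.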
On multilabel classification and ranking with partial feedback
In NIPS, 2012
Abstract

Cited by 2 (1 self)
We present a novel multilabel/ranking algorithm working in partial-information settings. The algorithm is based on second-order descent methods and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partially adversarial setting, where covariates can be adversarial but multilabel probabilities are ruled by (generalized) linear models. We show O(T^{1/2} log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting it against full-information baselines on real-world multilabel datasets, often obtaining comparable performance.
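The explore/exploit trade-off via upper-confidence bounds can be illustrated in a heavily simplified form. The paper's scheme is a second-order linear-model method over covariates; the sketch below ignores covariates entirely and only shows a count-based UCB bonus over labels, with all names and constants assumed for illustration:

```python
import math

def ucb_label_selection(n_labels, reward_fn, n_rounds, k=1):
    """Keep a mean relevance estimate per label, add an exploration bonus
    sqrt(2 ln t / n_i), and play the top-k labels each round.

    `reward_fn(t, label)` returns the observed relevance in {0, 1}.
    Returns (total_reward, per-label play counts).
    """
    counts = [0] * n_labels
    means = [0.0] * n_labels
    total = 0
    for t in range(1, n_rounds + 1):
        scores = []
        for i in range(n_labels):
            if counts[i] == 0:
                scores.append((float("inf"), i))  # try every label once
            else:
                bonus = math.sqrt(2.0 * math.log(t) / counts[i])
                scores.append((means[i] + bonus, i))
        scores.sort(reverse=True)
        for _, i in scores[:k]:        # play top-k labels, observe feedback
            r = reward_fn(t, i)
            total += r
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]  # incremental mean
    return total, counts
```

With one consistently relevant label, the plays concentrate on it while the others are sampled only logarithmically often.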
Doubly Aggressive Selective Sampling Algorithms for Classification
Abstract

Cited by 1 (0 self)
Online selective sampling algorithms learn to perform binary classification and additionally decide whether to ask, or query, for the label of any given example. We introduce two stochastic linear algorithms and analyze them in the worst-case mistake-bound framework. Although stochastic, our algorithms query with probability 1 for some inputs, and they make an update even when there is no mistake but the margin is small; hence they are doubly aggressive. We prove bounds in the worst-case setting that may be lower than previous bounds in some settings. Experiments with 33 document classification datasets, some with hundreds of thousands of examples, show the superiority of doubly-aggressive algorithms in both performance and number of queries.
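The "doubly aggressive" idea, querying with probability 1 on some inputs and updating even without a mistake when the margin is small, can be sketched as a variant of a selective-sampling perceptron. This is not the paper's algorithms; the threshold rule, update, and all names are assumptions for illustration:

```python
import random

def aggressive_selective_sampler(stream, b=1.0, tau=0.5, seed=0):
    """Query with probability b / (b + |margin|) (probability 1 at zero
    margin), and update not only on mistakes but also on queried rounds
    whose margin y * (w . x) falls below the threshold tau.

    `stream` yields (x, y) with y in {-1, +1}. Returns (mistakes, updates).
    """
    rng = random.Random(seed)
    w = None
    mistakes = updates = 0
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        margin = sum(wi * xi for wi, xi in zip(w, x))
        yhat = 1 if margin >= 0 else -1
        if yhat != y:
            mistakes += 1
        if rng.random() < b / (b + abs(margin)):   # probabilistic query
            # aggressive: update on a mistake OR a small-margin correct round
            if yhat != y or y * margin < tau:
                updates += 1
                for j in range(len(w)):
                    w[j] += y * x[j]
    return mistakes, updates
```

The small-margin update lets the sampler learn from correctly classified but uncertain examples, so it can converge before making any mistake at all.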
Upper Confidence Weighted Learning for Efficient Exploration in Multiclass Prediction with Binary Feedback
In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013
Abstract

Cited by 1 (0 self)
We introduce a novel algorithm called Upper Confidence Weighted Learning (UCWL) for online multiclass learning from binary feedback. UCWL combines the Upper Confidence Bound (UCB) framework with the Soft Confidence Weighted (SCW) online learning scheme. UCWL achieves state-of-the-art performance (especially on noisy and non-separable data) with low computational costs. Estimated confidence intervals are used for informed exploration, which enables faster learning than uninformed exploration or no exploration at all. The targeted application setting is human-robot interaction (HRI), in which a robot learns to classify its observations while a human teaches it by providing only binary feedback (e.g., right/wrong). Results in an HRI experiment and with two benchmark datasets show that UCWL outperforms other algorithms in the online binary feedback setting, and surprisingly sometimes even beats state-of-the-art algorithms that receive full feedback, while UCWL gets only binary feedback on the same data.
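The score-plus-uncertainty prediction with binary feedback can be sketched as follows. This is not UCWL (none of the SCW machinery is present); the per-class count bonus, the reinforce/penalize update, and all names are illustrative assumptions:

```python
import math

def ucb_multiclass_binary(stream, n_classes, c=0.5):
    """Each class gets a linear score plus a count-based uncertainty bonus;
    predict the argmax, then learn from binary right/wrong feedback only.

    `stream` yields (x, y_true); the learner observes only (yhat == y_true).
    Returns the number of correct predictions.
    """
    W = None
    counts = [1] * n_classes
    correct = 0
    t = 0
    for x, y in stream:
        t += 1
        if W is None:
            W = [[0.0] * len(x) for _ in range(n_classes)]

        def score(k):
            s = sum(wkj * xj for wkj, xj in zip(W[k], x))
            return s + c * math.sqrt(math.log(t + 1) / counts[k])

        yhat = max(range(n_classes), key=score)
        counts[yhat] += 1
        right = yhat == y              # the only feedback the learner sees
        correct += right
        sign = 1.0 if right else -1.0  # reinforce or penalize yhat only
        for j in range(len(x)):
            W[yhat][j] += sign * x[j]
    return correct
```

Note that on a "wrong" round the learner never sees the true class; it only pushes the guessed class's score down, which is exactly the constraint binary feedback imposes.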
Efficient Interactive Multiclass Learning from Binary Feedback
Abstract
We introduce a novel algorithm called Upper Confidence Weighted Learning (UCWL) for online multiclass learning from binary feedback (e.g., feedback that indicates whether the prediction was right or wrong). UCWL combines the Upper Confidence Bound (UCB) framework with the Soft Confidence Weighted (SCW) online learning scheme. In UCB, each instance is classified using both score and uncertainty. For a given instance in the sequence, the algorithm might guess its class label primarily to reduce the class uncertainty. This is a form of informed exploration, which enables performance to improve with lower sample complexity than in the case without exploration. Combining UCB with SCW leads to the ability to deal well with noisy and non-separable data, and state-of-the-art performance is achieved without increasing the computational cost. A potential application setting is human-robot interaction (HRI), where the robot learns to classify some set of inputs while the human teaches it by providing only binary feedback (or sometimes even the wrong answer entirely). Experimental results in the HRI setting and with two benchmark datasets from other settings show that UCWL outperforms other state-of-the-art algorithms in the online binary feedback setting, and surprisingly sometimes even outperforms state-of-the-art algorithms that get full feedback (e.g., the true class label) while UCWL gets only binary feedback on the same data sequence.
Coactive Learning
Abstract
We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. Interactions in the Coactive Learning model take the following form: at each step, the system (e.g. search engine) receives a context (e.g. query) and predicts an object (e.g. ranking); the user responds by correcting the system if necessary, providing a slightly improved – but not necessarily optimal – object as feedback. We argue that such preference feedback can be inferred in large quantity from observable user behavior (e.g., clicks in web search), unlike the optimal feedback required in the expert model or the cardinal valuations required for bandit learning. Despite the relaxed requirements for the feedback, we show that it is possible to adapt many existing online learning algorithms to the coactive framework. In particular, we provide algorithms that achieve O(1/√T) average regret in terms of cardinal utility, even though the learning algorithm never observes cardinal utility values directly. We also provide an algorithm with O(log(T)/T) average regret for λ-strongly convex loss functions. An extensive empirical study demonstrates the applicability of our model and algorithms on a movie recommendation task, as well as ranking for web search.
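The coactive protocol admits a simple preference-perceptron sketch: play the object the current model scores highest, receive a (possibly only slightly) improved object from the user, and update toward its features. The function names and feature map below are illustrative assumptions, not the paper's notation:

```python
def preference_perceptron(rounds, phi, user_feedback, d):
    """Coactive-style preference perceptron.

    `rounds` yields (context, candidates); `phi(context, obj)` maps to a
    length-d feature list; `user_feedback(context, y)` returns an improved
    (not necessarily optimal) object. Returns (weights, predictions made).
    """
    w = [0.0] * d
    history = []
    for context, candidates in rounds:
        # predict the object the current model scores highest
        y = max(candidates,
                key=lambda o: sum(wi * fi
                                  for wi, fi in zip(w, phi(context, o))))
        ybar = user_feedback(context, y)  # improved object as feedback
        fy, fbar = phi(context, y), phi(context, ybar)
        for j in range(d):                # move toward the preferred features
            w[j] += fbar[j] - fy[j]
        history.append(y)
    return w, history
```

Even when the user only ever nudges the prediction one step better, the weight vector drifts toward the utility-maximizing direction.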
Distributed Online Big Data Classification Using Context Information
Abstract
Distributed, online data mining systems have emerged from applications requiring analysis of large amounts of correlated, high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework in which data is gathered by distributed sources and processed by a heterogeneous set of distributed learners, which learn online, at runtime, how to classify the different data streams, either by using their locally available classification functions or by helping each other and classifying each other's data. Importantly, since the data is gathered at different locations, sending data to another learner for processing incurs additional costs such as delays, so forwarding is beneficial only if the gain from a better classification exceeds the costs. We model the problem of joint classification by the distributed, heterogeneous learners from multiple data sources as a distributed contextual bandit problem in which each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, ours is the first to provide analytic regret results characterizing the performance of the proposed algorithm.
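The routing decision each learner faces (classify locally vs. forward to a peer at a cost) can be sketched as a per-context bandit. Below is a toy epsilon-greedy stand-in; the paper's algorithm and regret analysis are different, and the discrete-context assumption, the cost model, and all names are illustrative:

```python
import random

def contextual_route(stream, arms, accuracy_fn, cost, eps=0.1, seed=0):
    """For each discrete context, estimate the net reward of each arm
    (local processing vs. forwarding) and pick greedily, exploring with
    probability eps.

    `accuracy_fn(arm, ctx)` returns 1 if that learner classifies the
    instance correctly, else 0. Arm 0 is "local" and pays no cost.
    Returns (total net reward, per-(context, arm) statistics).
    """
    rng = random.Random(seed)
    stats = {}  # (context, arm index) -> (pulls, reward sum)
    total = 0.0
    for ctx in stream:
        if rng.random() < eps:
            a = rng.randrange(len(arms))             # explore uniformly
        else:
            def mean(i):
                n, s = stats.get((ctx, i), (0, 0.0))
                return s / n if n else float("inf")  # try untried arms first
            a = max(range(len(arms)), key=mean)      # exploit best estimate
        # forwarding to a peer pays the communication/delay cost
        r = accuracy_fn(arms[a], ctx) - (0.0 if a == 0 else cost)
        n, s = stats.get((ctx, a), (0, 0.0))
        stats[(ctx, a)] = (n + 1, s + r)
        total += r
    return total, stats
```

With two contexts where the local learner handles only one of them well, the router learns to process that context locally and forward the other, despite the forwarding cost.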