Results 1 - 10 of 66
The greedy miser: Learning under test-time budgets
In ICML, 2012
"... As machine learning algorithms increasingly enter real-world settings, there is rising interest in controlling the cpu-cost during test-time. In industry, computational resources must be budgeted and costs must be strictly accounted for. At its very core, this problem is inherently a tradeoff betwee ..."
Abstract
-
Cited by 24 (8 self)
- Add to MetaCart
(Show Context)
As machine learning algorithms increasingly enter real-world settings, there is rising interest in controlling the CPU cost at test time. In industry, computational resources must be budgeted and costs must be strictly accounted for. At its very core, this problem is inherently a tradeoff between accuracy and test-time computation. Test-time computation consists of two components: 1. the actual running time of the algorithm; 2. the time required for feature extraction. The latter can vary drastically if the feature set is diverse. In this abstract, we propose a novel algorithm that explicitly considers the feature extraction cost during training. We first state the (non-continuous) global objective, which explicitly trades off feature cost and accuracy, and then relax it into a continuous loss function. Subsequently, we derive an update rule that shows the resulting loss lends itself naturally to greedy optimization with stage-wise regression [4]. The resulting learning algorithm is much simpler than any prior work, yet leads to superior test-time performance. Its accuracy matches that of the unconstrained baseline (with unlimited resources) while achieving an order of magnitude reduction of test-time cost. Cost-sensitive learning. We use gradient boosting [4] to learn a classifier $H(x) = \sum_{t=1}^{T} \beta_t h_t(x)$ to minimize some loss $\ell(H)$. Here, $h_t \in \mathcal{H}$, where $\mathcal{H}$ is the set of all possible regression trees [1] of some limited ...
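In a generic form, assumed here for illustration (the paper's exact relaxed objective may differ), the accuracy/cost trade-off described above can be written as a cost-regularized risk:
\[
\min_{H}\; \ell(H) + \lambda\, c(H), \qquad H(x) = \sum_{t=1}^{T} \beta_t h_t(x),
\]
where $\ell(H)$ is the classification loss, $c(H)$ the expected test-time cost of evaluating $H$ (tree evaluation plus the extraction cost of every feature used by any tree, each feature paid at most once per input), and $\lambda$ the accuracy/cost trade-off parameter. Under such an objective, a greedy stage-wise choice of trees naturally favors trees that reuse features already extracted by earlier stages.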
Parallel Boosted Regression Trees for Web Search Ranking
"... Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine learned websearch ranking — a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the constructio ..."
Abstract
-
Cited by 21 (2 self)
- Add to MetaCart
(Show Context)
Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web-search ranking — a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the construction of the individual regression trees and operates using the master-worker paradigm as follows. The data are partitioned among the workers. At each iteration, each worker summarizes its data partition using histograms. The master processor uses these to build one layer of a regression tree, and then sends this layer to the workers, allowing the workers to build histograms for the next layer. Our algorithm carefully orchestrates overlap between communication and computation to achieve good performance.
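As a rough illustration of the histogram summaries described above, here is a minimal Python sketch. The function names, the binning scheme, and the squared-loss split criterion are assumptions for this example, not the paper's exact algorithm: each worker bins its partition per feature and accumulates gradient sums and counts, and the master merges the histograms to choose the best split.

import numpy as np

def worker_histograms(X_part, grad_part, bin_edges):
    """Each worker summarizes its data partition: per feature and per bin,
    accumulate the sum of gradients (residuals) and the sample count."""
    n_features = X_part.shape[1]
    n_bins = bin_edges.shape[1] - 1
    grad_sum = np.zeros((n_features, n_bins))
    count = np.zeros((n_features, n_bins))
    for f in range(n_features):
        bins = np.clip(np.digitize(X_part[:, f], bin_edges[f]) - 1, 0, n_bins - 1)
        np.add.at(grad_sum[f], bins, grad_part)
        np.add.at(count[f], bins, 1)
    return grad_sum, count

def master_best_split(histograms):
    """The master merges worker histograms and picks the split that maximizes
    a squared-loss (variance-reduction) gain."""
    grad_sum = sum(h[0] for h in histograms)
    count = sum(h[1] for h in histograms)
    best = (None, None, -np.inf)  # (feature, threshold bin index, gain)
    for f in range(grad_sum.shape[0]):
        g_left, c_left = 0.0, 0.0
        g_total, c_total = grad_sum[f].sum(), count[f].sum()
        for b in range(grad_sum.shape[1] - 1):
            g_left += grad_sum[f, b]
            c_left += count[f, b]
            c_right = c_total - c_left
            if c_left == 0 or c_right == 0:
                continue
            gain = g_left ** 2 / c_left + (g_total - g_left) ** 2 / c_right
            if gain > best[2]:
                best = (f, b, gain)
    return best

Because only the fixed-size histograms travel between workers and master, the communication cost per layer is independent of the number of training examples, which is what makes overlapping communication with computation worthwhile.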
Online structured prediction via coactive learning
, 2012
"... We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. At each step, the system (e.g. search engine) receives a context (e.g. query) and predicts an object (e.g. ranking). ..."
Abstract
-
Cited by 19 (3 self)
- Add to MetaCart
We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. At each step, the system (e.g. search engine) receives a context (e.g. query) and predicts an object (e.g. ranking). The user responds by correcting the system if necessary, providing a slightly improved – but not necessarily optimal – object as feedback. We argue that such feedback can often be inferred from observable user behavior, for example, from clicks in web search. Evaluating predictions by their cardinal utility to the user, we propose efficient learning algorithms that have $O(1/\sqrt{T})$ average regret, even though the learning algorithm never observes cardinal utility values as in conventional online learning. We demonstrate the applicability of our model and learning algorithms on a movie recommendation task, as well as ranking for web search.
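A minimal sketch of one interaction round in this setting, assuming a linear utility model w · phi(x, y) and a perceptron-style update toward the user's improved object; the specific update shown is an illustration of the feedback loop, not necessarily the paper's algorithm:

import numpy as np

def coactive_step(w, phi, x, candidates, user_feedback):
    """One round: predict the highest-scoring object, receive a slightly
    improved object from the user, and move the weights toward it."""
    scores = [w @ phi(x, y) for y in candidates]
    y_pred = candidates[int(np.argmax(scores))]
    y_bar = user_feedback(x, y_pred)          # improved, but not necessarily optimal
    w = w + phi(x, y_bar) - phi(x, y_pred)    # update toward the preferred object
    return w, y_pred

Note that the update only needs the joint feature vectors of the predicted and the improved object; no cardinal utility value is ever observed, matching the feedback model described above.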
Cost-Sensitive Tree of Classifiers
"... Recently, machine learning algorithms have successfully entered large-scale real-world industrial applications (e.g. search engines and email spam filters). Here, the CPU cost during testtime must be budgeted and accounted for. In this paper, we address the challenge of balancing the test-time cost ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
(Show Context)
Recently, machine learning algorithms have successfully entered large-scale real-world industrial applications (e.g. search engines and email spam filters). Here, the CPU cost during test time must be budgeted and accounted for. In this paper, we address the challenge of balancing the test-time cost and the classifier accuracy in a principled fashion. The test-time cost of a classifier is often dominated by the computation required for feature extraction—which can vary drastically across features. We decrease this extraction time by constructing a tree of classifiers, through which test inputs traverse along individual paths. Each path extracts different features and is optimized for a specific sub-partition of the input space. By only computing features for inputs that benefit from them the most, our cost-sensitive tree of classifiers can match the high accuracies of the current state-of-the-art at a small fraction of the computational cost.
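A hypothetical sketch of how such a tree could be evaluated at test time with lazy feature extraction; the node structure and thresholding rule below are assumptions for illustration, not the paper's construction. Features are computed only when a node on the input's path actually needs them, and cached so each extraction cost is paid at most once:

class Node:
    def __init__(self, weights, threshold=0.0, left=None, right=None):
        self.weights = weights        # {feature_name: weight} used by this node's classifier
        self.threshold = threshold
        self.left, self.right = left, right   # both None for a leaf

def classify(node, raw_input, extractors, cache=None):
    """Traverse the tree, extracting each feature only the first time it is needed."""
    cache = {} if cache is None else cache
    score = 0.0
    for name, w in node.weights.items():
        if name not in cache:                 # pay the extraction cost once per input
            cache[name] = extractors[name](raw_input)
        score += w * cache[name]
    if node.left is None and node.right is None:
        return score                          # leaf: return the classifier score
    child = node.left if score <= node.threshold else node.right
    return classify(child, raw_input, extractors, cache)

Different inputs follow different root-to-leaf paths, so cheap, easy inputs trigger only a few inexpensive extractions while harder inputs pay for more features.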
E.: Integrating the Content and Process of Strategic MIS Planning with Competitive Strategy. Decision Sciences 22(5), 1991
"... We review here the recent success in quantum annealing, i.e., optimization of the cost or energy functions of complex systems utilizing quantum fluctuations. The concept is introduced in successive steps through the studies of mapping of such computationally hard problems to the classical spin glass ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
(Show Context)
We review here the recent success in quantum annealing, i.e., optimization of the cost or energy functions of complex systems utilizing quantum fluctuations. The concept is introduced in successive steps through studies of the mapping of such computationally hard problems to the classical spin glass problems. The quantum spin glass problems arise with the introduction of quantum ...
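For context, quantum annealing is usually described with a transverse-field Ising Hamiltonian; the form below is the standard textbook setting, assumed here for illustration rather than quoted from this review:
\[
H(t) = -\sum_{\langle i,j \rangle} J_{ij}\, \sigma_i^z \sigma_j^z \;-\; \Gamma(t) \sum_i \sigma_i^x ,
\]
where the first term encodes the classical cost function as a spin glass and the transverse field $\Gamma(t)$ is annealed from a large initial value toward zero, so that quantum fluctuations (tunneling between configurations) are gradually switched off and the system settles into a low-cost classical configuration.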
Learning Scoring Functions with Order-Preserving Losses and Standardized Supervision
"... We address the problem of designing surrogate losses for learning scoring functions in the context of label ranking. We extend to ranking problems a notion of orderpreserving losses previously introduced for multiclass classification, and show that these losses lead to consistent formulations with r ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
We address the problem of designing surrogate losses for learning scoring functions in the context of label ranking. We extend to ranking problems a notion of order-preserving losses previously introduced for multiclass classification, and show that these losses lead to consistent formulations with respect to a family of ranking evaluation metrics. An order-preserving loss can be tailored for a given evaluation metric by appropriately setting some weights depending on this metric and the observed supervision. These weights, called the standard form of the supervision, do not always exist, but we show that previous consistency results for ranking were proved in special cases where they do. We then evaluate a new pairwise loss consistent with the (Normalized) Discounted Cumulative Gain on benchmark datasets.
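As a concrete, hypothetical example of tailoring a pairwise surrogate to a metric through weights, the sketch below weights each mis-ordered document pair by a DCG-derived quantity. This particular weighting is an assumption for illustration only and is not claimed to be the paper's standard-form construction:

import numpy as np

def ndcg_weighted_pairwise_loss(scores, relevances):
    """Weighted pairwise hinge loss over all pairs with a relevance gap."""
    n = len(scores)
    order = np.argsort(-np.asarray(scores))          # predicted ranking
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(1, n + 1)
    gain = 2.0 ** np.asarray(relevances) - 1.0       # DCG gains
    discount = 1.0 / np.log2(1.0 + rank)             # DCG position discounts
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if relevances[i] > relevances[j]:
                weight = (gain[i] - gain[j]) * abs(discount[i] - discount[j])
                loss += weight * max(0.0, 1.0 - (scores[i] - scores[j]))
    return loss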
Top-N Recommendations from Implicit Feedback leveraging Linked Open Data
"... The advent of the Linked Open Data (LOD) initiative gave birth to a variety of open knowledge bases freely accessible on the Web. They provide a valuable source of information that can improve conventional recommender systems, if properly exploited. In this paper we present SPrank, a novel hybrid re ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
(Show Context)
The advent of the Linked Open Data (LOD) initiative gave birth to a variety of open knowledge bases freely accessible on the Web. They provide a valuable source of information that can improve conventional recommender systems, if properly exploited. In this paper we present SPrank, a novel hybrid recommendation algorithm able to compute top-N item recommendations from implicit feedback, exploiting the information available in the so-called Web of Data. We leverage DBpedia, a well-known knowledge base in the LOD cloud, to extract semantic path-based features and to eventually compute recommendations using a learning-to-rank algorithm. Experiments with datasets on two different domains show that the proposed approach outperforms, in terms of prediction accuracy, several state-of-the-art top-N recommendation algorithms for implicit feedback in situations affected by different degrees of data sparsity.
Learning to Suggest: A Machine Learning Framework for Ranking Query Suggestions
"... We consider the task of suggesting related queries to users after they issue their initial query to a web search engine. We propose a machine learning approach to learn the probability that a user may find a follow-up query both useful and relevant, given his initial query. Our approach is based on ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
We consider the task of suggesting related queries to users after they issue their initial query to a web search engine. We propose a machine learning approach to learn the probability that a user may find a follow-up query both useful and relevant, given his initial query. Our approach is based on a machine learning model which enables us to generalize to queries that have never occurred in the logs as well. The model is trained on co-occurrences mined from the search logs, with novel utility and relevance models, and the machine learning step is done without any data labeled by human judges. The learning step allows us to generalize from past observations and generate query suggestions that go beyond queries that co-occurred in the past. This brings significant gains in coverage while yielding modest gains in relevance. Both offline (based on human judges) and online (based on millions of user interactions) evaluations demonstrate that our approach significantly outperforms strong baselines.
On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking
"... We study surrogate losses for learning to rank, in a framework where the rankings are induced by scores and the task is to learn the scoring function. We focus on the calibration of surrogate losses with respect to a ranking evaluation metric, where the calibration is equivalent to the guarantee tha ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
We study surrogate losses for learning to rank, in a framework where the rankings are induced by scores and the task is to learn the scoring function. We focus on the calibration of surrogate losses with respect to a ranking evaluation metric, where calibration is equivalent to the guarantee that near-optimal values of the surrogate risk imply near-optimal values of the risk defined by the evaluation metric. We prove that if a surrogate loss is a convex function of the scores, then it is not calibrated with respect to two evaluation metrics widely used for search engine evaluation, namely the Average Precision and the Expected Reciprocal Rank. We also show that such convex surrogate losses cannot be calibrated with respect to the Pairwise Disagreement, an evaluation metric used when learning from pairwise preferences. Our results cast light on the intrinsic difficulty of some ranking problems, as well as on the limitations of learning-to-rank algorithms based on the minimization of a convex surrogate risk.
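A common generic way to state the calibration guarantee described above (written here as an illustration; the paper's precise definitions may differ) is:
\[
R_\phi(f_n) \;\to\; \inf_{f} R_\phi(f) \quad \Longrightarrow \quad R_\ell(f_n) \;\to\; \inf_{f} R_\ell(f)
\]
for every sequence of scoring functions $f_n$, where $R_\phi$ denotes the surrogate risk and $R_\ell$ the risk induced by the target evaluation metric (e.g. Average Precision, Expected Reciprocal Rank, or Pairwise Disagreement). The non-existence results above say that no surrogate that is convex in the scores can satisfy this implication for those metrics.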