Results 1–10 of 53
Logistic Model Trees, 2006
"... Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there has been work on combining these two schemes into ‘model trees’, i.e. trees that contain linear regr ..."
Abstract

Cited by 154 (2 self)
 Add to MetaCart
Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there has been work on combining these two schemes into ‘model trees’, i.e. trees that contain linear regression functions at the leaves. In this paper, we present an algorithm that adapts this idea for classification problems, using logistic regression instead of linear regression. We use a stagewise fitting process to construct the logistic regression models, which can select relevant attributes in the data in a natural way, and show how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree. We compare the performance of our algorithm to several other state-of-the-art learning schemes on 36 benchmark UCI datasets, and show that it produces accurate and compact classifiers.
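As a toy illustration of the core idea (a tree with logistic regression models at the leaves), the following sketch fits a hypothetical one-split tree with a plain logistic regression in each leaf. This is not the LMT algorithm itself, which selects splits and refines the leaf models stagewise; the data, the fixed split point, and the gradient-descent settings are invented for the example.

```python
import math

def sigmoid(z):
    # numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(pts, lr=0.2, iters=20000):
    # plain gradient descent on the logistic log-loss (one feature plus bias)
    w, b = 0.0, 0.0
    n = len(pts)
    for _ in range(iters):
        gw = gb = 0.0
        for x, y in pts:
            p = sigmoid(w * x + b)
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def fit_model_tree(pts, split):
    # one fixed split with a logistic model in each leaf (illustration only)
    left = [p for p in pts if p[0] < split]
    right = [p for p in pts if p[0] >= split]
    return split, fit_logistic(left), fit_logistic(right)

def tree_predict(tree, x):
    split, (wl, bl), (wr, br) = tree
    w, b = (wl, bl) if x < split else (wr, br)
    return 1 if sigmoid(w * x + b) > 0.5 else 0

# invented data: class 1 near both ends, class 0 in the middle --
# a single logistic regression cannot separate this, but a split
# plus per-leaf logistic models can
pts = [(0.0, 1), (0.5, 1), (1.5, 0), (2.0, 0), (2.5, 0),
       (3.5, 0), (4.5, 1), (5.0, 1), (5.5, 1)]
tree = fit_model_tree(pts, split=3.0)
```

Each leaf only has to model a locally monotone class boundary, which is what makes the combination more expressive than either component alone.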
Top 10 algorithms in data mining, 2007
"... Abstract This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, kMeans, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining a ..."
Abstract

Cited by 113 (2 self)
 Add to MetaCart
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-means, SVM, Apriori, EM, PageRank, AdaBoost, k-NN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. For each algorithm, we provide a description, discuss its impact, and review current and further research on it. These 10 algorithms cover classification, …
Methods and uncertainties in bioclimatic envelope modelling under climate change. Prog. Phys. Geogr., 2006
"... Abstract: Potential impacts of projected climate change on biodiversity are often assessed using singlespecies bioclimatic ‘envelope ’ models. Such models are a special case of species distribution models in which the current geographical distribution of species is related to climatic variables so ..."
Abstract

Cited by 65 (3 self)
 Add to MetaCart
Potential impacts of projected climate change on biodiversity are often assessed using single-species bioclimatic ‘envelope’ models. Such models are a special case of species distribution models in which the current geographical distribution of species is related to climatic variables so as to enable projections of distributions under future climate change scenarios. This work reviews a number of critical methodological issues that may lead to uncertainty in predictions from bioclimatic modelling. Particular attention is paid to recent developments of bioclimatic modelling that address some of these issues, as well as to the topics where more progress needs to be made. Developing and applying bioclimatic models in an informative way requires a good understanding of a wide range of methodologies, including the choice of modelling technique, model validation, collinearity, autocorrelation, biased sampling of explanatory variables, scaling and impacts of non-climatic factors. A key challenge for future research is integrating factors such as land cover, direct CO2 effects, biotic interactions and dispersal mechanisms into species-climate models. We conclude that, although bioclimatic envelope models have a number of important advantages, they should be applied only when users have a thorough understanding of their limitations and uncertainties.
Optimal prediction pools. Working Paper No. 1017, European Central Bank, 2009
"... Abstract: A prediction model is any statement of a probability distribution for an outcome not yet observed. This study considers the properties of weighted linear combinations of n prediction models, or linear pools, evaluated using the conventional log predictive scoring rule. The log score is a c ..."
Abstract

Cited by 40 (3 self)
 Add to MetaCart
A prediction model is any statement of a probability distribution for an outcome not yet observed. This study considers the properties of weighted linear combinations of n prediction models, or linear pools, evaluated using the conventional log predictive scoring rule. The log score is a concave function of the weights and, in general, an optimal linear combination will include several models with positive weights, despite the fact that exactly one model has limiting posterior probability one. The paper derives several interesting formal results: for example, a prediction model with positive weight in a pool may have zero weight if some other models are deleted from that pool. The results are illustrated using S&P 500 returns with prediction models from the ARCH, stochastic volatility, and Markov mixture families. In this example, models that are clearly inferior by the usual scoring criteria have positive weights in optimal linear pools, and these pools substantially outperform their best components.
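The central point, that the log-score-optimal pool can put interior weight on both models even though one of them is better on its own, can be reproduced with a small numerical sketch. The realized outcomes and the two normal predictive models below are invented for illustration:

```python
import math

def npdf(y, mu, sigma):
    # normal predictive density
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def pool_log_score(ys, w):
    # average log predictive density of the linear pool w*p1 + (1-w)*p2,
    # where p1 = N(0, 0.5^2) is tight and p2 = N(0, 3^2) is diffuse
    return sum(math.log(w * npdf(y, 0.0, 0.5) + (1.0 - w) * npdf(y, 0.0, 3.0))
               for y in ys) / len(ys)

# hypothetical outcomes: mostly small moves the tight model explains well,
# plus two large moves only the diffuse model can accommodate
ys = [0.1, -0.2, 0.05, 3.0, -3.2]
grid = [i / 100.0 for i in range(101)]
best_w = max(grid, key=lambda w: pool_log_score(ys, w))
```

Because the log score is concave in w, the grid maximum sits in the interior: the pool outscores both of its components.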
Variance and bias for general loss functions
"... Abstract. When using squared error loss, bias and variance and their decomposition of prediction error are well understood and widely used concepts. However, there is no universally accepted de nition for other loss functions. Numerous attempts have been made to extend these concepts beyond squared ..."
Abstract

Cited by 36 (0 self)
 Add to MetaCart
When using squared error loss, bias and variance and their decomposition of prediction error are well understood and widely used concepts. However, there is no universally accepted definition for other loss functions. Numerous attempts have been made to extend these concepts beyond squared error loss. Most approaches have focused solely on 0-1 loss functions and have produced significantly different definitions. These differences stem from disagreement as to the essential characteristics that variance and bias should display. This paper suggests an explicit list of rules that we feel any “reasonable” set of definitions should satisfy. Using this framework, bias and variance definitions are produced which generalize to any symmetric loss function. We illustrate these statistics on several loss functions, with particular emphasis on 0-1 loss. We conclude with a discussion of the various definitions that have been proposed in the past, as well as a method for estimating these quantities on real data sets.
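For the squared-error case that the abstract takes as its starting point, the decomposition can be checked by Monte Carlo. The data-generating process and the deliberately biased estimator (a constant fit) below are invented for illustration:

```python
import random

def simulate(reps=4000, n=20, sigma=1.0, x0=1.0, seed=1):
    # true regression function f(x) = x on [0, 1]; the estimator is the mean
    # of the training responses (a constant fit, hence biased at x0 = 1)
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    fhats, sq_errs = [], []
    for _ in range(reps):
        ys = [x + rng.gauss(0.0, sigma) for x in xs]
        fhat = sum(ys) / n                    # the (biased) estimate at x0
        y0 = x0 + rng.gauss(0.0, sigma)       # independent test response at x0
        fhats.append(fhat)
        sq_errs.append((y0 - fhat) ** 2)
    m = sum(fhats) / reps
    bias2 = (m - x0) ** 2                     # squared bias at x0
    var = sum((f - m) ** 2 for f in fhats) / reps
    noise = sigma ** 2                        # irreducible error
    total = sum(sq_errs) / reps               # estimated prediction error
    return total, noise, bias2, var

total, noise, bias2, var = simulate()
```

Up to Monte Carlo error, total matches noise + bias² + variance; the paper's contribution is precisely that no such universally agreed additive split exists for other losses.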
A CLUE for CLUster Ensembles. Journal of Statistical Software, 2005
"... Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structu ..."
Abstract

Cited by 31 (7 self)
 Add to MetaCart
Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including methods for measuring proximity and obtaining consensus and “secondary” clusterings.
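A minimal sketch of one consensus idea, co-association voting (not the optimization-based consensus methods that clue itself implements), with invented partitions:

```python
def consensus(partitions, threshold=0.5):
    # co-association consensus: merge items that co-cluster in more than
    # `threshold` of the input partitions
    n = len(partitions[0])
    m = len(partitions)

    def coassoc(i, j):
        # fraction of partitions placing items i and j in the same cluster
        return sum(p[i] == p[j] for p in partitions) / m

    # union-find over majority co-clustered pairs
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if coassoc(i, j) > threshold:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    # relabel clusters 0..k-1 in order of first appearance
    seen = {}
    return [seen.setdefault(r, len(seen)) for r in roots]

partitions = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
labels = consensus(partitions)
```

Note that the third partition groups the items identically under different label ids; co-association compares groupings, not labels, which is exactly the invariance a consensus method needs.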
Weighted k-nearest-neighbor techniques and ordinal classification. Discussion Paper 399, SFB 386, 2006
"... In the field of statistical discrimination knearest neighbor classification is a wellknown, easy and successful method. In this paper we present an extended version of this technique, where the distances of the nearest neighbors can be taken into account. In this sense there is a close connection ..."
Abstract

Cited by 21 (1 self)
 Add to MetaCart
In the field of statistical discrimination, k-nearest neighbor classification is a well-known, simple and successful method. In this paper we present an extended version of this technique, in which the distances of the nearest neighbors can be taken into account. In this sense there is a close connection to LOESS, a local regression technique. In addition, we show ways to use nearest neighbor methods for classification in the case of an ordinal class structure. Empirical studies show the advantages of the new techniques.
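A minimal sketch of the distance-weighting idea on a single feature, with invented data; inverse distance stands in here for the kernel weight functions the paper studies:

```python
def weighted_knn(train, x, k=3):
    # classify x from its k nearest neighbors, each weighted by inverse
    # distance (one simple choice of weight function)
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = {}
    for xi, yi in neighbors:
        w = 1.0 / (abs(xi - x) + 1e-9)   # small constant avoids division by zero
        votes[yi] = votes.get(yi, 0.0) + w
    return max(votes, key=votes.get)

train = [(0.0, "a"), (4.0, "a"), (5.0, "a"), (6.0, "b")]
# at x = 5.8 the 3 nearest neighbors are 6.0 ("b"), 5.0 ("a"), 4.0 ("a"):
# an unweighted majority vote says "a", but the much closer "b" point
# dominates once votes are weighted by inverse distance
```

The example shows why the extension matters: the weighted and unweighted rules disagree exactly when the nearest neighbors are unevenly spread.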
Generalized boosted models: A guide to the gbm package, 2005
"... Boosting takes on various forms with different programs using different loss functions, different base models, and different optimization schemes. The gbm package takes the approach described in [2] and [3]. Some of the terminology differs, mostly due to an effort to cast boosting terms into more st ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
(Show Context)
Boosting takes on various forms, with different programs using different loss functions, different base models, and different optimization schemes. The gbm package takes the approach described in [2] and [3]. Some of the terminology differs, mostly due to an effort to cast boosting terms into more standard statistical terminology (e.g. deviance). In addition, the gbm package implements boosting for models commonly used in statistics but not commonly associated with boosting. The Cox proportional hazards model, for example, is an incredibly useful model, and the boosting framework applies quite readily with only slight modification [5]. Also, some algorithms implemented in the gbm package differ from the standard implementation. The AdaBoost algorithm [1] has a particular loss function and a particular optimization algorithm associated with it. The gbm implementation of AdaBoost adopts AdaBoost’s exponential loss function (its bound on misclassification rate) but uses Friedman’s gradient descent algorithm rather than the one originally proposed. So the main purpose of this document is to spell out in detail what the gbm package implements.

1 Gradient boosting

This section essentially presents the derivation of boosting described in [2]. The gbm package also adopts the stochastic gradient boosting strategy, a small but important tweak on the basic algorithm, described in [3].

1.1 Friedman’s gradient boosting machine

Friedman (2001) and the companion paper Friedman (2002) extended the work of Friedman, Hastie, and Tibshirani (2000) and laid the groundwork for a new generation of boosting algorithms. Using the connection between boosting and optimization, this new work proposes the Gradient Boosting Machine. In any function estimation problem we wish to find a regression function, f̂(x), that minimizes the expectation of some loss function, Ψ(y, f):

    f̂(x) = arg min_f E_{y,x} Ψ(y, f(x))

Initialize f̂(x) to be a constant, f̂(x) = arg min_ρ Σ_{i=1}^N Ψ(y_i, ρ). For t in 1, …, T do:

1. Compute the negative gradient as the working response, z_i = −∂Ψ(y_i, f(x_i)) / ∂f(x_i), evaluated at f(x_i) = f̂(x_i).
2. Fit a regression model, g(x), predicting z_i from the covariates x_i.
3. Choose a gradient descent step size as ρ = arg min_ρ Σ_{i=1}^N Ψ(y_i, f̂(x_i) + ρ g(x_i)).
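The loop above can be sketched for squared-error loss, where the negative gradient is just the residual and the line search gives ρ = 1. This is a bare-bones illustration with invented data, not the gbm implementation; gbm adds subsampling and many loss functions, and the shrinkage parameter lam below plays the role of gbm's learning rate:

```python
def fit_stump(xs, zs):
    # least-squares regression stump: one split, a mean on each side
    best = None
    for t in sorted(set(xs))[1:]:
        left = [z for x, z in zip(xs, zs) if x < t]
        right = [z for x, z in zip(xs, zs) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((z - ml) ** 2 for z in left)
               + sum((z - mr) ** 2 for z in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def gradient_boost(xs, ys, T=200, lam=0.1):
    f0 = sum(ys) / len(ys)    # arg min_rho sum_i Psi(y_i, rho) for squared error
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(T):
        zs = [y - p for y, p in zip(ys, preds)]   # negative gradient = residuals
        g = fit_stump(xs, zs)
        # for squared error with a least-squares base learner, rho = 1;
        # lam shrinks each step
        preds = [p + lam * g(x) for p, x in zip(preds, xs)]
        stumps.append(g)
    return f0, stumps, lam

def boosted_predict(model, x):
    f0, stumps, lam = model
    return f0 + lam * sum(g(x) for g in stumps)

# invented toy data: a clear step plus a small within-step trend
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.1, 0.2, 2.0, 2.1, 2.2]
model = gradient_boost(xs, ys)
```

Each round fits a stump to the current residuals and takes a shrunken step toward them, which is exactly the three-step loop in the derivation.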
twang: Toolkit for weighting and analysis of nonequivalent groups. Software for using matching methods in R. Available at http://cran.rproject.org/src/contrib/Descriptions/twang.html, 2006
"... While working on a series of program evaluations for which only observational data were available, we developed methods (written up in McCaffrey et al., 2004) and several R scripts and functions that we believe will be of general interest to analysts developing propensity score models, models of att ..."
Abstract

Cited by 17 (1 self)
 Add to MetaCart
While working on a series of program evaluations for which only observational data were available, we developed methods (written up in McCaffrey et al., 2004) and several R scripts and functions that we believe will be of general interest to analysts developing propensity score models, models of attrition or nonresponse, and some other applications. Because we now regularly use these …
ada: An R Package for Stochastic Boosting
"... Boosting is an iterative algorithm that combines simple classification rules with ‘mediocre’ performance in terms of misclassification error rate to produce a highly accurate classification rule. Stochastic gradient boosting provides an enhancement which incorporates a random mechanism at each boost ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
Boosting is an iterative algorithm that combines simple classification rules with ‘mediocre’ performance in terms of misclassification error rate to produce a highly accurate classification rule. Stochastic gradient boosting provides an enhancement that incorporates a random mechanism at each boosting step, improving performance and speed in generating the ensemble. ada is an R package that implements three popular variants of boosting, together with a version of stochastic gradient boosting. In addition, useful plots for data-analytic purposes are provided, along with an extension to the multiclass case. The algorithms are illustrated with synthetic and real data sets.
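A minimal sketch of discrete AdaBoost with decision stumps, the simplest of the boosting variants of the kind the abstract describes (the one-feature data are invented, and this is not the ada implementation, which also covers real and gentle variants and stochastic versions):

```python
import math

def stump(t, s, x):
    # decision stump: predict s for x < t, -s otherwise (s is +1 or -1)
    return s if x < t else -s

def best_stump(data, w):
    # exhaustive search for the stump with smallest weighted error
    xs = sorted(set(x for x, _ in data))
    best = None
    for t in xs + [xs[-1] + 1]:
        for s in (1, -1):
            err = sum(wi for (x, y), wi in zip(data, w) if stump(t, s, x) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(data, T):
    n = len(data)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        err, t, s = best_stump(data, w)
        err = max(err, 1e-12)                       # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)   # classifier weight
        ensemble.append((alpha, t, s))
        # upweight the misclassified points, then renormalize
        w = [wi * math.exp(-alpha * y * stump(t, s, x))
             for (x, y), wi in zip(data, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ada_predict(ensemble, x):
    score = sum(a * stump(t, s, x) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# invented data no single stump can fit: +1 on the ends, -1 in the middle
data = [(0, 1), (1, 1), (2, -1), (3, -1), (4, 1), (5, 1)]
ensemble = adaboost(data, T=4)
```

Every base rule here misclassifies at least a third of the points, yet the weighted combination classifies all of them, which is the "mediocre rules to accurate rule" effect the abstract refers to.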