The Elements of Statistical Learning: Data Mining, Inference, and Prediction. (2001)

by T Hastie, R Tibshirani, J Friedman

Results 1 - 10 of 1,422

Novel methods improve prediction of species’ distributions from occurrence data

by Jane Elith, Catherine H. Graham, Robert P. Anderson, Miroslav Dudík, Simon Ferrier, Antoine Guisan, Robert J. Hijmans, Falk Huettmann, John R. Leathwick, Anthony Lehmann, Jin Li, Lucia G. Lohmann, Bette A. Loiselle, Glenn Manion, Craig Moritz, Miguel Nakamura, Yoshinori Nakazawa, Jacob McC. Overton, A. Townsend Peterson, Steven J. Phillips, Karen Richardson, Ricardo Scachetti-Pereira, Robert E. Schapire, Jorge Soberón, Stephen Williams, Mary S. Wisz, Niklaus E. Zimmermann - Ecography, 2006
Abstract - Cited by 482 (8 self)

Citation Context

... et al. 2005), are now much more abundant and available. In parallel, development of methods for efficient exploration and summary of patterns in large databases has accelerated in other disciplines (Hastie et al. 2001), but only a few of these have been applied in ecological studies. Given the widespread use of distribution modelling, and the synergy of advances in data availability and modelling methods, a clear ...

Extremely Randomized Trees

by Pierre Geurts - Machine Learning, 2003
Abstract - Cited by 267 (49 self)
This paper presents a new learning algorithm based on decision tree ensembles. In contrast to the classical decision tree induction method, the trees of the ensemble are built by selecting the tests during their induction fully at random. This extreme ...
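For readers who want to try this, the idea of fully randomized split selection is implemented in scikit-learn's ExtraTreesClassifier; a minimal sketch below, with dataset and hyperparameters chosen purely for illustration:

```python
# Minimal sketch of tree ensembles with fully randomized split selection,
# using scikit-learn's ExtraTreesClassifier (which follows the Extra-Trees
# idea of Geurts et al.); dataset and settings are illustrative only.
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Each tree draws candidate split thresholds at random instead of
# optimizing them, which decorrelates the trees in the ensemble.
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```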

Metric Learning by Collapsing Classes

by Amir Globerson, Sam Roweis
Abstract - Cited by 230 (2 self)
We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes infinitely far away. We show that when the metric we learn is used in simple classifiers, it yields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.
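A toy sketch of the collapsing-classes objective, assuming only numpy: projected gradient descent on the KL divergence between the ideal class-collapsing distribution and the distribution induced by a Mahalanobis matrix A. This is an illustrative re-derivation (step size, iteration count, and the requirement of at least two points per class are my assumptions), not the authors' solver:

```python
# Toy sketch of "collapsing classes": minimize sum_i KL(p0(.|i) || pA(.|i))
# over a PSD matrix A, where p0 is uniform over same-class neighbours and
# pA is the softmax distribution induced by Mahalanobis distances under A.
import numpy as np

def mcml_sketch(X, y, n_iter=200, lr=1e-3):
    n, d = X.shape
    A = np.eye(d)                                   # start from the Euclidean metric
    diffs = X[:, None, :] - X[None, :, :]           # (n, n, d) pairwise differences
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    p0 = same / same.sum(axis=1, keepdims=True)     # assumes >= 2 points per class
    for _ in range(n_iter):
        # d_A(i, j) = (x_i - x_j)^T A (x_i - x_j)
        dist = np.einsum('ijk,kl,ijl->ij', diffs, A, diffs)
        np.fill_diagonal(dist, np.inf)
        logits = -dist
        logits -= logits.max(axis=1, keepdims=True)
        pA = np.exp(logits)
        pA /= pA.sum(axis=1, keepdims=True)
        # Gradient of the KL objective w.r.t. A: sum_ij (p0 - pA)_ij d_ij d_ij^T
        grad = np.einsum('ij,ijk,ijl->kl', p0 - pA, diffs, diffs)
        A -= lr * grad
        # Project back onto the PSD cone via eigenvalue clipping
        vals, vecs = np.linalg.eigh(A)
        A = (vecs * np.clip(vals, 0.0, None)) @ vecs.T
    return A

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = np.repeat([0, 1, 2], 10)
A = mcml_sketch(X, y)
print(np.round(A, 3))
```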

The Entire Regularization Path for the Support Vector Machine

by Trevor Hastie, Saharon Rosset, Robert Tibshirani, Ji Zhu, 2004
Abstract - Cited by 204 (11 self)
The Support Vector Machine is a widely used tool for classification. Many efficient implementations exist for fitting a two-class SVM model. The user has to supply values for the tuning parameters: the regularization cost parameter, and the kernel parameters. It seems a common practice is to use a default value for the cost parameter, often leading to the least restrictive model. In this paper we argue that the choice of the cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. We illustrate our algorithm on some examples, and use our representation to give further insight into the range of SVM solutions.
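The path algorithm itself is beyond a short snippet; as a hedged stand-in, the sketch below simply refits an SVM over a grid of cost values to illustrate why the choice of C matters (this brute-force grid is not the paper's path algorithm, which obtains all solutions at roughly the cost of a single fit):

```python
# Illustration of the sensitivity to the regularization cost parameter C:
# refit an SVM for each C on a grid and compare cross-validated accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
for C in np.logspace(-3, 3, 7):
    acc = cross_val_score(SVC(kernel='rbf', C=C, gamma='scale'), X, y, cv=5).mean()
    print(f"C = {C:8.3f}  accuracy = {acc:.3f}")
```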

Citation Context

... and its associated classifier Class(x) = sign[f(x)] (2). There are many ways to fit such a linear classifier, including linear regression, Fisher's linear discriminant analysis, and logistic regression [Hastie et al., 2001, Chapter 4]. If the ... Trevor Hastie and Rob Tibshirani are faculty members of the Statistics and Biostatistics departments at Stanford University; Saharon Rosset is a member of the technical staff at ...

Machine learning classifiers and fMRI: A tutorial overview

by Francisco Pereira, Tom Mitchell, Matthew Botvinick - NeuroImage, 2009
Abstract - Cited by 159 (6 self)
Interpreting brain image experiments requires analysis of complex, multivariate data. In recent years, one analysis approach that has grown in popularity is the use of machine learning algorithms to train classifiers to decode stimuli, mental states, behaviors and other variables of interest from fMRI data and thereby show that the data contain enough information about them. In this tutorial overview we review some of the key choices faced in using this approach as well as how to derive statistically significant results, illustrating each point from a case study. Furthermore, we show how, in addition to answering the question of ‘is there information about a variable of interest’ (pattern discrimination), classifiers can be used to tackle other classes of question, namely ‘where is the information’ (pattern localization) and ‘how is that information encoded’ (pattern characterization).
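A minimal sketch of the decoding workflow described above, with randomly generated features standing in for voxel activations (real fMRI data and preprocessing are of course required in practice):

```python
# Sketch of the decoding workflow: train a classifier to predict a stimulus
# label from (here synthetic) voxel activations and estimate accuracy by
# cross-validation. With random data, accuracy should hover near chance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 120, 500
X = rng.normal(size=(n_trials, n_voxels))   # stand-in for voxel features
y = rng.integers(0, 2, size=n_trials)       # stand-in for stimulus labels

clf = LogisticRegression(penalty='l2', C=1.0, max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)   # chance level is ~0.5 here
print(scores.mean())
```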

Piecewise linear regularized solution paths

by Saharon Rosset, Ji Zhu - The Annals of Statistics, 2007
Abstract - Cited by 140 (9 self)
We consider the generic regularized optimization problem β̂(λ) = arg min_β L(y, Xβ) + λJ(β). Recently, ...
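The LASSO (squared-error loss with an L1 penalty J(β) = ||β||₁) is the canonical example of such a piecewise-linear solution path; a minimal sketch tracing it with scikit-learn's lasso_path on a synthetic dataset:

```python
# Minimal sketch of a piecewise-linear regularization path: the LASSO
# traced over a decreasing grid of penalty values with lasso_path.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=100, n_features=8, noise=5.0, random_state=0)
alphas, coefs, _ = lasso_path(X, y)   # coefs has shape (n_features, n_alphas)
# Each coefficient is a piecewise-linear function of the penalty parameter.
for a, c in zip(alphas[::20], coefs.T[::20]):
    print(f"lambda = {a:7.3f}  beta = {np.round(c, 2)}")
```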

Citation Context

...erized LASSO to that of the LASSO on the original data and after we artificially “contaminate” the data by adding large constants to a small number of responses. We used the training-test split as in [8]. The training set consists of 67 observations and the test set of 30 observations. We ran the LASSO and the Huberized LASSO with a knot at t = 1 on the original dataset, and on the “contaminated” dat...

An introduction to boosting and leveraging

by Ron Meir, Gunnar Rätsch - Advanced Lectures on Machine Learning, LNCS, 2003
Abstract - Cited by 138 (10 self)
Abstract not found

Citation Context

...riefly mention some weak learners which have been used successfully in applications. Decision trees and stumps. Decision trees have been widely used for many years in the statistical literature (e.g. [29, 144, 85]) as powerful, effective and easily interpretable classification algorithms that are able to automatically select relevant features. Hence, it is not surprising that some of the most successful initia...
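A minimal sketch of boosting with decision stumps as the weak learner, as mentioned in the context above, using scikit-learn's AdaBoostClassifier; the dataset and settings are illustrative only:

```python
# Boosting with depth-1 trees (decision stumps) as the weak learner.
# The first argument to AdaBoostClassifier is the base/weak estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
stump = DecisionTreeClassifier(max_depth=1)
booster = AdaBoostClassifier(stump, n_estimators=200, random_state=0)
print(cross_val_score(booster, X, y, cv=5).mean())
```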

Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation

by Steven J. Phillips, Miroslav Dudík
Abstract - Cited by 131 (2 self)
Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use ‘‘default settings’’, tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce ‘‘hinge features’’ that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore ‘‘background sampling’’ strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model ...
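The ‘‘hinge features’’ mentioned above are thresholded linear terms of an environmental variable; a minimal sketch of constructing them (the knot placement at empirical quantiles is an assumption for illustration, not Maxent's exact scheme):

```python
# Sketch of "hinge features": for a variable x and knots t, each feature is
# max(0, x - t) (and optionally max(0, t - x)), which lets a linear model
# fit piecewise-linear responses to the environmental variable.
import numpy as np

def hinge_features(x, n_knots=10):
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    forward = np.maximum(0.0, x[:, None] - knots[None, :])   # max(0, x - t)
    reverse = np.maximum(0.0, knots[None, :] - x[:, None])   # max(0, t - x)
    return np.hstack([forward, reverse])

x = np.random.default_rng(0).normal(size=200)   # e.g. a climate variable
H = hinge_features(x)
print(H.shape)                                  # (200, 20)
```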

Adapting ranking SVM to document retrieval

by Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon - In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006
Abstract - Cited by 124 (21 self)
The paper is concerned with applying learning to rank to document retrieval. Ranking SVM is a typical method of learning to rank. We point out that there are two factors one must consider when applying Ranking SVM, or more generally a “learning to rank” method, to document retrieval. First, correctly ranking documents at the top of the result list is crucial for an Information Retrieval system. One must conduct training in a way that such ranked results are accurate. Second, the number of relevant documents can vary from query to query. One must avoid training a model biased toward queries with a large number of relevant documents. Previously, when existing methods that include Ranking SVM were applied to document retrieval, neither of the two factors was taken into consideration. We show it is possible to make modifications in conventional Ranking SVM, so it can be better used for document retrieval. Specifically, we modify the “Hinge Loss” function in Ranking SVM to deal with the problems described above. We employ two methods to conduct optimization on the loss function: gradient descent and quadratic programming. Experimental results show that our method, referred to as Ranking SVM for IR, can outperform the conventional Ranking SVM and other existing methods for document retrieval on two datasets.
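A toy sketch of the pairwise hinge loss underlying Ranking SVM; the per-query weights used here to illustrate reducing the bias toward queries with many relevant documents are a simplification, not the paper's exact modification:

```python
# Pairwise hinge loss for ranking: for each document pair (x1, x2) with
# preference z in {+1, -1}, the loss is max(0, 1 - z * <w, x1 - x2>),
# plus an L2 penalty on w. q_weight down-weights pairs from queries that
# contribute many pairs (illustrative only).
import numpy as np

def ranking_hinge_loss(w, pairs, lam=0.1):
    """pairs: iterable of (x1, x2, z, q_weight) tuples."""
    loss = lam * np.dot(w, w)
    for x1, x2, z, q_weight in pairs:
        margin = z * np.dot(w, x1 - x2)
        loss += q_weight * max(0.0, 1.0 - margin)   # hinge on the pairwise margin
    return loss

rng = np.random.default_rng(0)
w = rng.normal(size=5)
pairs = [(rng.normal(size=5), rng.normal(size=5), 1, 0.5) for _ in range(4)]
print(ranking_hinge_loss(w, pairs))
```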

Citation Context

...Quadratic Optimization problem:

\min_{w} M(w) = \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \xi_i \quad \text{subject to} \quad \xi_i \ge 0,\; z_i \langle w, x_i^{(1)} - x_i^{(2)} \rangle \ge 1 - \xi_i,\; i = 1, \dots, \ell \quad (6)

Note that the optimization in (6) is equivalent to that in (7), when λ = 1/(2C) [7]:

\min_{w} \sum_{i=1}^{\ell} \bigl[ 1 - z_i \langle w, x_i^{(1)} - x_i^{(2)} \rangle \bigr]_+ + \lambda \|w\|^2 \quad (7)

where subscript “+” indicates the positive part. The first term is the so-called empirical Hinge Loss and the second term is...

Practical selection of SVM parameters and noise estimation for SVM regression

by Vladimir Cherkassky, Yunqian Ma - Neural Networks, 2004
Abstract - Cited by 112 (1 self)
We investigate practical selection of hyper-parameters for support vector machine (SVM) regression (that is, the ε-insensitive zone and the regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than the re-sampling approaches commonly used in SVM applications. In particular, we describe a new analytical prescription for setting the value of the insensitive zone ε as a function of training sample size. Good generalization performance of the proposed parameter selection is demonstrated empirically using several low- and high-dimensional regression problems. Further, we point out the importance of Vapnik's ε-insensitive loss for regression problems with finite samples. To this end, we compare generalization performance of SVM regression (using the proposed selection of ε-values) with regression using 'least-modulus' loss (ε = 0) and standard squared loss. These comparisons indicate superior generalization performance of SVM regression under sparse sample settings, for various types of additive noise.
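The abstract does not reproduce the prescription itself; below is a hedged sketch in the spirit of the paper, with the exact formulas stated from memory and flagged as assumptions to verify against the original:

```python
# Hedged sketch of analytic SVM-regression hyperparameter selection:
# C from the spread of the training responses and epsilon from an estimate
# of the noise level and the sample size. The formulas below are stated
# from memory and should be checked against Cherkassky & Ma (2004).
import numpy as np

def select_svr_parameters(y, noise_std):
    n = len(y)
    C = max(abs(y.mean() + 3 * y.std()), abs(y.mean() - 3 * y.std()))
    eps = 3 * noise_std * np.sqrt(np.log(n) / n)   # epsilon-insensitive zone width
    return C, eps

y = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=100)
print(select_svr_parameters(y, noise_std=0.5))
```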

Citation Context

... are given in Section 7. 2. Support vector regression and SVM parameter selection. We consider the standard regression formulation under the general setting for predictive learning (Cherkassky & Mulier, 1998; Hastie, Tibshirani, & Friedman, 2001; Vapnik, 1999). The goal is to estimate an unknown real-valued function in the relationship y = r(x) + δ, where δ is independent and identically distributed (i.i.d.) zero mean random error (noise), x is a...
