BagBoosting for tumor classification with gene expression data
Bioinformatics, 2004
Cited by 194 (2 self)
Motivation: Microarray experiments are expected to contribute significantly to the progress in cancer treatment by enabling a precise and early diagnosis. They create a need for class prediction tools that can deal with a large number of highly correlated input variables, perform feature selection and provide class probability estimates that serve as a quantification of the predictive uncertainty. A very promising solution is to combine the two ensemble schemes, bagging and boosting, into a novel algorithm called BagBoosting.
Results: When bagging is used as a module in boosting, the resulting classifier consistently improves the predictive performance and the probability estimates of both bagging and boosting on real and simulated gene expression data. This quasi-guaranteed improvement can be obtained simply by spending more computing effort. The advantageous predictive potential is also confirmed by comparing BagBoosting to several established class prediction tools for microarray data.
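The idea of using bagging as a module inside boosting can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the loss or the base learner, so squared-error loss, regression stumps, the step size `nu`, and all function names here are assumptions chosen for brevity.

```python
import numpy as np

def fit_stump(X, r):
    """Least-squares regression stump: best single split over all features."""
    best = None
    for j in range(X.shape[1]):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of the split are non-empty.
        for s in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= s
            cl, cr = r[left].mean(), r[~left].mean()
            sse = ((r[left] - cl) ** 2).sum() + ((r[~left] - cr) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, s, cl, cr)
    if best is None:  # every feature is constant in this sample
        return (0, np.inf, r.mean(), r.mean())
    return best[1:]

def predict_stump(stump, X):
    j, s, cl, cr = stump
    return np.where(X[:, j] <= s, cl, cr)

def bagboost(X, y, n_boost=20, n_bag=10, nu=0.5, seed=0):
    """Boosting on residuals where each base learner is a bagged stump."""
    rng = np.random.default_rng(seed)
    n = len(y)
    f0 = y.mean()
    resid = y - f0
    ensemble = []
    for _ in range(n_boost):
        stumps = []
        for _ in range(n_bag):
            idx = rng.integers(0, n, size=n)  # bootstrap resample
            stumps.append(fit_stump(X[idx], resid[idx]))
        ensemble.append(stumps)
        # The bagged learner is the average of the bootstrap stumps.
        bagged = np.mean([predict_stump(s, X) for s in stumps], axis=0)
        resid = resid - nu * bagged
    return f0, nu, ensemble

def predict(model, X):
    f0, nu, ensemble = model
    out = np.full(len(X), f0)
    for stumps in ensemble:
        out += nu * np.mean([predict_stump(s, X) for s in stumps], axis=0)
    return out
```

For labels coded as -1 and +1, the sign of `predict(model, X)` can serve as the class prediction. Increasing `n_bag` raises the computing cost of each boosting step, which mirrors the trade-off described in the abstract.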
Statistical Behavior and Consistency of Classification Methods based on Convex Risk Minimization
2001
Cited by 158 (6 self)
We study how closely the optimal Bayes error rate can be approached by a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can generally be regarded as a (non-maximum-likelihood) conditional in-class probability estimate, and we use this analysis to compare various convex loss functions that have appeared in the literature. Furthermore, the theoretical insight allows us to design good loss functions with desirable properties. Another aspect of our analysis is to demonstrate the consistency of certain classification methods using convex risk minimization.
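To make the link between convex losses and conditional class probabilities concrete, the block below lists the population minimizers for three common convex surrogate losses. These are standard results, stated under our own notation rather than the paper's: labels $y \in \{-1, +1\}$, $p(x) = P(y = 1 \mid x)$, and $f^*(x) = \arg\min_f \, E[\phi(y f(x)) \mid x]$.

```latex
\begin{align*}
\text{exponential:}   \quad & \phi(v) = e^{-v},            & f^*(x) &= \tfrac{1}{2}\log\frac{p(x)}{1 - p(x)}, \\
\text{logistic:}      \quad & \phi(v) = \log(1 + e^{-v}),  & f^*(x) &= \log\frac{p(x)}{1 - p(x)}, \\
\text{least squares:} \quad & \phi(v) = (1 - v)^2,         & f^*(x) &= 2p(x) - 1.
\end{align*}
```

In each case $f^*(x)$ is a monotone transform of $p(x)$ that changes sign at $p(x) = 1/2$, so inverting the transform yields a class probability estimate and thresholding $f^*$ at zero recovers the Bayes rule.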
An introduction to boosting and leveraging
Advanced Lectures on Machine Learning, LNCS, 2003
Grouped and hierarchical model selection through composite absolute penalties
Annals of Statistics, 2006
Cited by 119 (3 self)
Extracting useful information from high-dimensional data is an important part of the focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized L2 minimization method Lasso has been popular in regression models. In this paper, we combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family which allows the grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for non-overlapping groups. In that case, we give a Bayesian interpretation for CAP penalties. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. In the computation aspect, we propose using the BLASSO and cross-validation to obtain CAP estimates. For a subfamily of CAP estimates involving only the L1 and L∞ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived allowing the regularization parameter to be selected without cross-validation. CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments including cases with p ≫ n and misspecified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments.
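A penalty that combines norms at the within-group and across-group levels might be evaluated as in the sketch below. The abstract does not give the formula, so the form used here is an assumption: each group contributes its within-group norm of order `gamma_k`, raised to a common power `gamma0`, and the contributions are summed. The function name and arguments are hypothetical.

```python
import numpy as np

def cap_penalty(beta, groups, group_norms, gamma0=1.0):
    """Sum over groups of (within-group norm of order gamma_k) ** gamma0.

    beta        : coefficient vector
    groups      : list of index lists, one per group (groups may overlap)
    group_norms : norm order for each group, e.g. 1, 2 or np.inf
    gamma0      : across-group exponent (assumed form, see lead-in)
    """
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for idx, gamma_k in zip(groups, group_norms):
        n_k = np.linalg.norm(beta[list(idx)], ord=gamma_k)
        total += n_k ** gamma0
    return total
```

With `beta = [3, -4, 0, 1]`, groups `[[0, 1], [2, 3]]` and L2 norms within each group, the value is 5 + 1 = 6; putting every coefficient in its own group with the L1 norm gives 8, the ordinary Lasso penalty.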
Kernel Logistic Regression and the Import Vector Machine
Journal of Computational and Graphical Statistics, 2001
Cited by 119 (4 self)
The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multiclass classification is still an ongoing research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multiclass case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
Boosting algorithms: Regularization, prediction and model fitting
Statistical Science, 2007
Cited by 99 (12 self)
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions. Key words and phrases: Generalized linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software.
Theory of classification: A survey of some recent advances
2005
Cited by 96 (3 self)
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
Propensity Score Estimation with Boosted Regression for Evaluating Causal Effects in Observational Studies
Psychological Methods, 2004
Cited by 89 (7 self)
Causal effect modeling with naturalistic rather than experimental data is challenging. In observational studies participants in different treatment conditions may also differ on pretreatment characteristics that influence outcomes. Propensity score methods can theoretically eliminate these confounds for all observed covariates, but accurate estimation of propensity scores is impeded by large numbers of covariates, uncertain functional forms for their associations with treatment selection, and other problems. This paper demonstrates that boosting, a modern statistical technique, can overcome many of these obstacles. We illustrate this approach with a study of adolescent probationers in substance abuse treatment programs. Propensity score weights estimated using boosting eliminate most pretreatment group differences, and substantially alter the apparent relative effects of adolescent substance abuse treatment. Experimental studies offer the most rigorous evidence with which to establish treatment efficacy, but they are not always practical or feasible. Experimental treatment evaluations can be expensive to field and may be too slow to produce answers to pressing questions. In some cases
High dimensional classification using features annealed independence rules
Ann. Statist., 2008
Cited by 80 (18 self)
Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
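The combination of t-statistic screening with an independence rule can be sketched as below. This is an illustration under stated assumptions, not the procedure from the paper: the number of retained features `m` is supplied by the user rather than chosen from the error bound mentioned in the abstract, the independence rule is implemented as a diagonal linear discriminant with pooled variances, and all function names are hypothetical.

```python
import numpy as np

def two_sample_t(X, y):
    """Two-sample t-statistic (unequal variances) for every column of X.

    y is a NumPy array of class labels coded 0 and 1.
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    se = np.sqrt(X1.var(axis=0, ddof=1) / n1 + X0.var(axis=0, ddof=1) / n0)
    return diff / se

def fit_independence_rule(X, y, m):
    """Keep the m features with the largest |t| and fit a diagonal rule."""
    t = two_sample_t(X, y)
    keep = np.argsort(-np.abs(t))[:m]
    X0, X1 = X[y == 0][:, keep], X[y == 1][:, keep]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled_var = ((n0 - 1) * X0.var(axis=0, ddof=1)
                  + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    return keep, mu0, mu1, pooled_var

def predict(model, X):
    """Assign class 1 when the diagonal discriminant score is positive."""
    keep, mu0, mu1, var = model
    Z = X[:, keep]
    score = ((Z - (mu0 + mu1) / 2) * (mu1 - mu0) / var).sum(axis=1)
    return (score > 0).astype(int)
```

Because the covariance matrix is treated as diagonal, each retained feature contributes independently to the score, so only the screened features add estimation noise to the class centroids.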
Boosting for high-dimensional linear models
The Annals of Statistics, 2006
Cited by 80 (4 self)
We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the ℓ1-norm of the regression coefficients. In the language of signal processing, this means consistency for denoising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the ℓ1-norm. We also propose here an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate L2Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
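Boosting with the squared error loss in a linear model can be sketched as repeated componentwise least-squares fits to the current residuals. The version below is a minimal illustration under our own assumptions: a fixed number of iterations `n_iter` stands in for the AIC-based stopping rule mentioned in the abstract, the step size `nu` is an assumed shrinkage factor, no column of `X` is constant, and the function name is hypothetical.

```python
import numpy as np

def l2boost(X, y, n_iter=100, nu=0.1):
    """Componentwise linear L2 boosting; returns (intercept, coefficients)."""
    n, p = X.shape
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centered predictors
    col_ss = (Xc ** 2).sum(axis=0)       # sum of squares per column
    intercept = y.mean()
    coef = np.zeros(p)
    resid = y - intercept
    for _ in range(n_iter):
        b = Xc.T @ resid / col_ss        # least-squares slope per predictor
        j = np.argmax(b ** 2 * col_ss)   # predictor giving the largest RSS drop
        coef[j] += nu * b[j]             # shrunken update of one coefficient
        resid -= nu * b[j] * Xc[:, j]
    # Convert back to the uncentered scale.
    return intercept - x_mean @ coef, coef
```

Only one coefficient changes per iteration, so stopping early leaves most coefficients at exactly zero; this is why the number of iterations acts as the tuning parameter even when the number of predictors exceeds the sample size.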