Results 1  10
of
720
Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
, 2001
"... Variable selection is fundamental to highdimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized ..."
Abstract

Cited by 946 (62 self)
 Add to MetaCart
Variable selection is fundamental to highdimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of parametric models such as generalized linear models and robust regression models. They can also be applied easily to nonparametric modeling by using wavelets and splines. Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques. Furthermore, the standard error formulas are tested to be accurate enough for practical applications.
Flexible smoothing with Bsplines and penalties
 STATISTICAL SCIENCE
, 1996
"... Bsplines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots ..."
Abstract

Cited by 404 (6 self)
 Add to MetaCart
Bsplines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent Bsplines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of Bsplines, their construction, and penalized likelihood is presented. We discuss properties of penalized Bsplines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented.
Prediction With Gaussian Processes: From Linear Regression To Linear Prediction And Beyond
 Learning and Inference in Graphical Models
, 1997
"... The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. Th ..."
Abstract

Cited by 228 (4 self)
 Add to MetaCart
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems. PREDICTION WITH GAUSSIAN PROCESSES: FROM LINEAR REGRESSION TO LINEAR PREDICTION AND BEYOND 2 1 Introduction In the last decade neural networks have been used to tackle regression and classification problems, with some notable successes. It has also been widely recognized that they form a part of a wide variety of nonlinear statistical techniques that can be used for...
Polynomial Splines and Their Tensor Products in Extended Linear Modeling
 Ann. Statist
, 1997
"... ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to m ..."
Abstract

Cited by 221 (16 self)
 Add to MetaCart
(Show Context)
ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to model the main effects, and their tensor products are used to model any interaction components that are included. In the special context of survival analysis, the baseline hazard function is modeled and nonproportionality is allowed. In general, the theory involves the L 2 rate of convergence for the fitted model and its components. The methodology involves least squares and maximum likelihood estimation, stepwise addition of basis functions using Rao statistics, stepwise deletion using Wald statistics, and model selection using BIC, crossvalidation or an independent test set. Publically available software, written in C and interfaced to S/SPLUS, is used to apply this methodology to...
Learning with Labeled and Unlabeled Data
, 2001
"... In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as we ..."
Abstract

Cited by 202 (3 self)
 Add to MetaCart
(Show Context)
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of inputdependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature revi...
A forecast comparison of volatility models: Does anything beat a GARCH(1, 1)
, 2001
"... By using intraday returns to calculate a measure for the timevarying volatility, Andersen and Bollerslev (1998a) established that volatility models do provide good forecasts of the conditional variance. In this paper, we take the same approach and use intraday estimated measures of volatility to ..."
Abstract

Cited by 194 (6 self)
 Add to MetaCart
By using intraday returns to calculate a measure for the timevarying volatility, Andersen and Bollerslev (1998a) established that volatility models do provide good forecasts of the conditional variance. In this paper, we take the same approach and use intraday estimated measures of volatility to compare volatility models. Our objective is to evaluate whether the evolution of volatility models has led to better forecasts of volatility when compared to the first “species” of volatility models. We make an outofsample comparison of 330 different volatility models using daily exchange rate data (DM/$) and IBM stock prices. Our analysis does not point to a single winner amongst the different volatility models, as it is different models that are best at forecasting the volatility of the two types of assets. Interestingly, the best models do not provide a significantly better forecast than the GARCH(1,1) model. This result is established by the tests for superior predictive ability of White (2000) and Hansen (2001). If an ARCH(1) model is selected as the benchmark, it is clearly outperformed. We thank Tim Bollerslev for providing us with the exchange rate data set, and Sivan Ritz for suggesting numerous clarifications. All errors remain our responsibility.
Bayesian Classification with Gaussian Processes
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
"... We consider the problem of assigning an input vector x to one of m classes by predicting P (cjx) for c = 1; : : : ; m. For a twoclass problem, the probability of class 1 given x is estimated by oe(y(x)), where oe(y) = 1=(1 + e ). A Gaussian process prior is placed on y(x), and is combined wi ..."
Abstract

Cited by 178 (1 self)
 Add to MetaCart
We consider the problem of assigning an input vector x to one of m classes by predicting P (cjx) for c = 1; : : : ; m. For a twoclass problem, the probability of class 1 given x is estimated by oe(y(x)), where oe(y) = 1=(1 + e ). A Gaussian process prior is placed on y(x), and is combined with the training data to obtain predictions for new x points.
NonConcave Penalized Likelihood with a Diverging Number of Parameters
, 2003
"... ..."
(Show Context)
SiZer for exploration of structures in curves
 Journal of the American Statistical Association
, 1997
"... In the use of smoothing methods in data analysis, an important question is often: which observed features are "really there?", as opposed to being spurious sampling artifacts. An approach is described, based on scale space ideas that were originally developed in computer vision literatu ..."
Abstract

Cited by 150 (20 self)
 Add to MetaCart
In the use of smoothing methods in data analysis, an important question is often: which observed features are "really there?", as opposed to being spurious sampling artifacts. An approach is described, based on scale space ideas that were originally developed in computer vision literature. Assessment of Significant ZERo crossings of derivatives, results in the SiZer map, a graphical device for display of significance of features, with respect to both location and scale. Here "scale" means "level of resolution", i.e.
Bivariate Quantile Smoothing Splines
 Biometrika
, 1998
"... It has long been recognized that the mean provides an inadequate summary while the set of quantiles can supply a more complete description of a sample. We introduce bivariate quantile smoothing splines, which belong to the space of bilinear tensorproduct splines, as nonparametric estimators for the ..."
Abstract

Cited by 128 (21 self)
 Add to MetaCart
It has long been recognized that the mean provides an inadequate summary while the set of quantiles can supply a more complete description of a sample. We introduce bivariate quantile smoothing splines, which belong to the space of bilinear tensorproduct splines, as nonparametric estimators for the conditional quantile functions in a two dimensional design space. The estimators can be computed using standard linear programming techniques and can further be used as building blocks for conditional quantile estimations in higher dimensions. For moderately large data sets, we recommend using penalized bivariate Bsplines as approximate solutions. We use real and simulated data to illustrate the proposed methodology. KEY WORDS: Conditional quantile; Linear program; Nonparametric regression; Robust regression; Schwarz information criterion; Tensorproduct spline. Xuming He is Associate Professor and Stephen Portnoy is Professor, Department of Statistics, University of Illinois, 725 S Wri...