Results 1  10
of
703
Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
, 2001
"... Variable selection is fundamental to highdimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized ..."
Abstract

Cited by 914 (61 self)
 Add to MetaCart
Variable selection is fundamental to highdimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of parametric models such as generalized linear models and robust regression models. They can also be applied easily to nonparametric modeling by using wavelets and splines. Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques. Furthermore, the standard error formulas are tested to be accurate enough for practical applications.
Flexible smoothing with Bsplines and penalties
 STATISTICAL SCIENCE
, 1996
"... Bsplines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots ..."
Abstract

Cited by 396 (6 self)
 Add to MetaCart
Bsplines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent Bsplines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of Bsplines, their construction, and penalized likelihood is presented. We discuss properties of penalized Bsplines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented.
Prediction With Gaussian Processes: From Linear Regression To Linear Prediction And Beyond
 Learning and Inference in Graphical Models
, 1997
"... The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. Th ..."
Abstract

Cited by 231 (4 self)
 Add to MetaCart
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems. PREDICTION WITH GAUSSIAN PROCESSES: FROM LINEAR REGRESSION TO LINEAR PREDICTION AND BEYOND 2 1 Introduction In the last decade neural networks have been used to tackle regression and classification problems, with some notable successes. It has also been widely recognized that they form a part of a wide variety of nonlinear statistical techniques that can be used for...
Polynomial Splines and Their Tensor Products in Extended Linear Modeling
 Ann. Statist
, 1997
"... ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to m ..."
Abstract

Cited by 217 (16 self)
 Add to MetaCart
(Show Context)
ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to model the main effects, and their tensor products are used to model any interaction components that are included. In the special context of survival analysis, the baseline hazard function is modeled and nonproportionality is allowed. In general, the theory involves the L 2 rate of convergence for the fitted model and its components. The methodology involves least squares and maximum likelihood estimation, stepwise addition of basis functions using Rao statistics, stepwise deletion using Wald statistics, and model selection using BIC, crossvalidation or an independent test set. Publically available software, written in C and interfaced to S/SPLUS, is used to apply this methodology to...
Learning with Labeled and Unlabeled Data
, 2001
"... In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as we ..."
Abstract

Cited by 197 (3 self)
 Add to MetaCart
(Show Context)
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of inputdependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature revi...
A forecast comparison of volatility models: Does anything beat a GARCH(1, 1)
, 2001
"... By using intraday returns to calculate a measure for the timevarying volatility, Andersen and Bollerslev (1998a) established that volatility models do provide good forecasts of the conditional variance. In this paper, we take the same approach and use intraday estimated measures of volatility to ..."
Abstract

Cited by 183 (6 self)
 Add to MetaCart
By using intraday returns to calculate a measure for the timevarying volatility, Andersen and Bollerslev (1998a) established that volatility models do provide good forecasts of the conditional variance. In this paper, we take the same approach and use intraday estimated measures of volatility to compare volatility models. Our objective is to evaluate whether the evolution of volatility models has led to better forecasts of volatility when compared to the first “species” of volatility models. We make an outofsample comparison of 330 different volatility models using daily exchange rate data (DM/$) and IBM stock prices. Our analysis does not point to a single winner amongst the different volatility models, as it is different models that are best at forecasting the volatility of the two types of assets. Interestingly, the best models do not provide a significantly better forecast than the GARCH(1,1) model. This result is established by the tests for superior predictive ability of White (2000) and Hansen (2001). If an ARCH(1) model is selected as the benchmark, it is clearly outperformed. We thank Tim Bollerslev for providing us with the exchange rate data set, and Sivan Ritz for suggesting numerous clarifications. All errors remain our responsibility.
Bayesian Classification with Gaussian Processes
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
"... We consider the problem of assigning an input vector x to one of m classes by predicting P (cjx) for c = 1; : : : ; m. For a twoclass problem, the probability of class 1 given x is estimated by oe(y(x)), where oe(y) = 1=(1 + e ). A Gaussian process prior is placed on y(x), and is combined wi ..."
Abstract

Cited by 178 (1 self)
 Add to MetaCart
We consider the problem of assigning an input vector x to one of m classes by predicting P (cjx) for c = 1; : : : ; m. For a twoclass problem, the probability of class 1 given x is estimated by oe(y(x)), where oe(y) = 1=(1 + e ). A Gaussian process prior is placed on y(x), and is combined with the training data to obtain predictions for new x points.
NonConcave Penalized Likelihood with a Diverging Number of Parameters
, 2003
"... ..."
(Show Context)
SiZer for exploration of structures in curves
 Journal of the American Statistical Association
, 1997
"... In the use of smoothing methods in data analysis, an important question is often: which observed features are "really there?", as opposed to being spurious sampling artifacts. An approach is described, based on scale space ideas that were originally developed in computer vision literatu ..."
Abstract

Cited by 146 (19 self)
 Add to MetaCart
In the use of smoothing methods in data analysis, an important question is often: which observed features are "really there?", as opposed to being spurious sampling artifacts. An approach is described, based on scale space ideas that were originally developed in computer vision literature. Assessment of Significant ZERo crossings of derivatives, results in the SiZer map, a graphical device for display of significance of features, with respect to both location and scale. Here "scale" means "level of resolution", i.e.
Smoothing Spline Models for the Analysis of Nested and Crossed Samples of Curves
 Journal of the American Statistical Association
, 1998
"... We introduce a class of models for an additive decomposition of groups of curves strati ed by crossed and nested factors, generalizing smoothing splines to such samples by associating them with a corresponding mixed e ects model. The models are also useful for imputation of missing data and explorat ..."
Abstract

Cited by 126 (1 self)
 Add to MetaCart
We introduce a class of models for an additive decomposition of groups of curves strati ed by crossed and nested factors, generalizing smoothing splines to such samples by associating them with a corresponding mixed e ects model. The models are also useful for imputation of missing data and exploratory analysis of variance. We prove that the best linear unbiased predictors (BLUP) from the extended mixed e ects model correspond to solutions of a generalized penalized regression where smoothing parameters are directly related to variance components, and we show that these solutions are natural cubic splines. The model parameters are estimated using a highly e cient implementation of the EM algorithm for restricted maximum likelihood (REML) estimation based on a preliminary eigenvector decomposition. Variability of computed estimates can be assessed with asymptotic techniques or with a novel hierarchical bootstrap resampling scheme for nested mixed e ects models. Our methods are applied to menstrual cycle data from studies of reproductive function that measure daily urinary progesterone; the sample of progesterone curves is strati ed by cycles nested within subjects nested within conceptive and nonconceptive groups.