Adapting to unknown smoothness via wavelet shrinkage
Journal of the American Statistical Association, 1995
Cited by 981 (18 self)
Abstract
We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein Unbiased Estimate of Risk (SURE) for threshold estimates. The computational effort of the overall procedure is of order N log(N) as a function of the sample size N. SureShrink is smoothness-adaptive: if the unknown function contains jumps, the reconstruction (essentially) does also; if the unknown function has a smooth piece, the reconstruction is (essentially) as smooth as the mother wavelet will allow. The procedure is in a sense optimally smoothness-adaptive: it is near-minimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet. We know from a previous paper by the authors that traditional smoothing methods (kernels, splines, and orthogonal series estimates), even with optimal choices of the smoothing parameter, would be unable to perform ...
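A minimal sketch of the per-level threshold choice follows, offered under stated assumptions rather than as the authors' code: the coefficients at one resolution level are assumed to be already scaled to unit noise variance, soft thresholding is used, and the search is capped at the universal threshold sqrt(2 log n). For soft thresholding, SURE(t) = n - 2 #{i : |x_i| <= t} + sum_i min(|x_i|, t)^2. Between consecutive values of |x_i| this expression only increases with t, so its minimum is attained at 0 or at one of the |x_i|, which is why only those candidates are checked. The function names are hypothetical.

```python
import numpy as np


def sure_risk(x, t):
    """Stein's unbiased risk estimate for soft thresholding at t (assumes unit noise variance)."""
    a = np.abs(np.asarray(x, dtype=float))
    return a.size - 2 * np.count_nonzero(a <= t) + np.sum(np.minimum(a, t) ** 2)


def sure_threshold(x):
    """Pick the SURE-minimizing threshold among 0 and the |x_i| not exceeding sqrt(2 log n)."""
    a = np.abs(np.asarray(x, dtype=float))
    cap = np.sqrt(2 * np.log(a.size))  # universal threshold used as an upper bound
    candidates = np.concatenate(([0.0], a[a <= cap]))
    risks = [sure_risk(a, t) for t in candidates]  # O(n^2), kept simple for readability
    return float(candidates[int(np.argmin(risks))])


def soft_threshold(x, t):
    """Shrink every coefficient toward zero by t; those with |x| <= t become zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


coeffs = np.array([0.1, -0.2, 5.0, 0.05])  # toy coefficients at one level
t = sure_threshold(coeffs)
print(t, soft_threshold(coeffs, t))
```

In SureShrink a threshold of this kind is chosen separately for each dyadic resolution level. The quadratic candidate loop is for readability only; sorting the |x_i| and using cumulative sums evaluates every candidate in O(n log n).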
Statistical Comparisons of Classifiers over Multiple Data Sets
2006
Cited by 710 (0 self)
Abstract
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust nonparametric tests for statistical comparisons of classifiers: the Wilcoxon signed-ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
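To make the multi-classifier procedure concrete, here is a minimal sketch assuming a score matrix with one row per data set and one column per classifier (higher is better). It computes average ranks (tied classifiers share the mean rank), the Friedman statistic chi2_F = 12N/(k(k+1)) [sum_j R_j^2 - k(k+1)^2/4], and the Nemenyi critical difference CD = q_alpha sqrt(k(k+1)/(6N)). The q_alpha table covers only k = 2..5 at alpha = 0.05, the p-value lookup is omitted, and the function names are hypothetical.

```python
import numpy as np

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05, keyed by number of classifiers k
NEMENYI_Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def average_ranks(scores):
    """Average rank of each classifier over data sets; rank 1 = best, ties get the mean rank."""
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty_like(scores)
    for i, row in enumerate(scores):
        r = np.empty(row.size)
        r[np.argsort(-row, kind="stable")] = np.arange(1, row.size + 1)
        for value in np.unique(row):
            tied = row == value
            r[tied] = r[tied].mean()  # average the ranks of tied classifiers
        ranks[i] = r
    return ranks.mean(axis=0)


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic (k - 1 degrees of freedom) computed from average ranks."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (np.sum(np.square(avg_ranks)) - k * (k + 1) ** 2 / 4)


def critical_difference(k, n_datasets, q_table=NEMENYI_Q_005):
    """Minimum difference in average rank for two classifiers to differ significantly."""
    return q_table[k] * np.sqrt(k * (k + 1) / (6 * n_datasets))


scores = np.array([[0.90, 0.85, 0.80],
                   [0.70, 0.72, 0.60],
                   [0.88, 0.86, 0.86],
                   [0.95, 0.91, 0.90]])  # illustrative accuracies, not data from the article
R = average_ranks(scores)
print(R, friedman_statistic(R, len(scores)), critical_difference(3, len(scores)))
```

Two classifiers whose average ranks differ by at least the CD are judged significantly different at the chosen level; a CD diagram places the average ranks on a single axis and marks this distance.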
Locally weighted learning
Artificial Intelligence Review, 1997
Cited by 594 (52 self)
Abstract
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
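A minimal sketch of locally weighted linear regression, assuming Euclidean distance and a Gaussian weighting function with a single bandwidth h (two of the many choices the survey discusses); the function name is hypothetical. For each query point q, a weighted least-squares line is fitted with weights w_i = exp(-||x_i - q||^2 / (2h^2)). Centering the inputs at q makes the prediction simply the fitted intercept.

```python
import numpy as np


def lwr_predict(X, y, query, bandwidth):
    """Locally weighted linear regression prediction at one query point (Gaussian kernel)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # (n, d); 1-D inputs become (n, 1)
    q = np.asarray(query, dtype=float).reshape(-1)      # (d,)
    dist2 = np.sum((X - q) ** 2, axis=1)
    w = np.exp(-dist2 / (2.0 * bandwidth ** 2))         # Gaussian weighting function
    A = np.hstack([np.ones((len(y), 1)), X - q])        # intercept plus inputs centered at q
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta[0]                                      # local linear model evaluated at q


x = np.linspace(0.0, 2.0 * np.pi, 200)
print(lwr_predict(x, np.sin(x), np.pi / 2, bandwidth=0.3))
```

The bandwidth acts as the smoothing parameter: larger values average over more data and increase bias where the function curves, while very small values leave few effectively nonzero weights and can make the weighted system ill-conditioned, which is where regularization of the local estimates becomes relevant.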
Projection Pursuit Regression
Journal of the American Statistical Association, 1981
Cited by 547 (6 self)
Abstract
A new method for nonparametric multiple regression is presented. The procedure models the regression surface as a sum of general smooth functions of linear combinations of the predictor variables in an iterative manner. It is more general than standard stepwise and stagewise regression procedures, does not require the definition of a metric in the predictor space, and lends itself to graphical interpretation.
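The core idea, finding a direction whose projection explains the response through a smooth function, can be illustrated for two predictors. This is a deliberately simplified sketch, not Friedman and Stuetzle's algorithm: it assumes a brute-force search over unit directions on a grid of angles, a running-mean smoother, and a single term. The full procedure builds the model iteratively, adding further terms fitted to the current residuals. Function names are hypothetical.

```python
import numpy as np


def running_mean_smooth(z, y, span=0.1):
    """Smooth y against z with a centered running mean over roughly span * n neighbours."""
    order = np.argsort(z)
    ys = y[order]
    half = max(1, int(span * len(z)) // 2)
    fitted_sorted = np.array([ys[max(0, i - half): i + half + 1].mean() for i in range(len(z))])
    fitted = np.empty(len(z))
    fitted[order] = fitted_sorted
    return fitted


def fit_single_ridge_term(X, y, n_angles=180, span=0.1):
    """Grid-search directions w = (cos a, sin a) for the smooth g(w . x) that best fits y."""
    best = None
    # Angles in [0, pi) suffice: w and -w give the same fit with the smoother reversed.
    for angle in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(angle), np.sin(angle)])
        fitted = running_mean_smooth(X @ w, y, span)
        sse = np.sum((y - fitted) ** 2)
        if best is None or sse < best[0]:
            best = (sse, angle, w, fitted)
    return best


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 2))
y = np.tanh(3.0 * (X[:, 0] + X[:, 1]) / np.sqrt(2.0))  # depends on X only through the 45-degree direction
sse, angle, w, fitted = fit_single_ridge_term(X, y)
print(np.degrees(angle), sse)
```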
Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments
Statistica Sinica, 2002
Cited by 431 (11 self)
Abstract
DNA microarrays are a new and promising biotechnology which allows the monitoring of expression levels in cells for thousands of genes simultaneously. The present paper describes statistical methods for the identification of differentially expressed genes in replicated cDNA microarray experiments. Although it is not the main focus of the paper, new methods for the important preprocessing steps of image analysis and normalization are proposed. Given suitably normalized data, the biological question of differential expression is restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and responses or covariates of interest. Differentially expressed genes are identified based on adjusted p-values for a multiple testing procedure which strongly controls the familywise Type I error rate and takes into account the dependence structure between the gene expression levels. No specific parametric form is assumed for the distribution of the test statistics, and a permutation procedure is used to estimate adjusted p-values. Several data displays are suggested for the visual identification of differentially expressed genes and of important features of these genes. The above methods are applied to microarray data from a study of gene expression in the livers of mice with very low HDL cholesterol levels. The genes identified using data from multiple slides are compared to those identified by recently published single-slide methods.
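Permutation-based adjusted p-values of this kind can be illustrated with a step-down maxT procedure in the style of Westfall and Young, which controls the familywise error rate in the strong sense and respects the dependence between genes because whole sample columns are permuted together. This is a hedged sketch, not the authors' implementation: it assumes a genes-by-samples matrix, two groups coded 0/1, Welch two-sample t-statistics, and B random label permutations instead of complete enumeration. The function names are hypothetical.

```python
import numpy as np


def welch_t(X, labels):
    """Welch two-sample t-statistic for every row (gene) of X, comparing group 1 with group 0."""
    a, b = X[:, labels == 1], X[:, labels == 0]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se


def maxT_adjusted_pvalues(X, labels, B=1000, seed=0):
    """Step-down maxT adjusted p-values estimated from B random permutations of the labels."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(welch_t(X, labels))
    order = np.argsort(-t_obs)                      # genes from most to least significant
    t_sorted = t_obs[order]
    counts = np.zeros(len(t_obs))
    for _ in range(B):
        t_perm = np.abs(welch_t(X, rng.permutation(labels)))[order]
        u = np.maximum.accumulate(t_perm[::-1])[::-1]  # u_j = max of permuted |t| over genes ranked j or lower
        counts += u >= t_sorted
    p_sorted = np.maximum.accumulate(counts / B)    # enforce monotone adjusted p-values
    p = np.empty_like(p_sorted)
    p[order] = p_sorted
    return p


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 16))                       # 20 simulated genes, 16 arrays
labels = np.array([0] * 8 + [1] * 8)
X[0, labels == 1] += 5.0                            # gene 0 is truly differentially expressed
p = maxT_adjusted_pvalues(X, labels, B=500)
print(np.round(p[:5], 3))
```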
Flexible smoothing with B-splines and penalties
Statistical Science, 1996
Cited by 395 (6 self)
Abstract
B-splines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent B-splines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of B-splines, their construction, and penalized likelihood is presented. We discuss properties of penalized B-splines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented.
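The penalized B-spline idea fits in a few lines. Assuming equally spaced knots on [min x, max x], a cubic B-spline basis matrix B built with the Cox-de Boor recursion, and a difference matrix D of order 2, the coefficients a solve (B^T B + lambda D^T D) a = B^T y for scatterplot smoothing. The function names and the default of 20 segments are illustrative choices, not values taken from the paper.

```python
import numpy as np


def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on nseg equal segments of [xl, xr], evaluated by the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)[:, None]
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)  # nseg + 2*deg + 1 equally spaced knots
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)  # degree 0: knot-interval indicators
    for d in range(1, deg + 1):
        left = (x - knots[:-(d + 1)]) / (knots[d:-1] - knots[:-(d + 1)])
        right = (knots[d + 1:] - x) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (n, nseg + deg)


def pspline_fit(x, y, nseg=20, deg=3, lam=1.0, diff_order=2):
    """Least squares on a rich B-spline basis with a difference penalty on adjacent coefficients."""
    x = np.asarray(x, dtype=float)
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)  # difference matrix of given order
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ np.asarray(y, dtype=float))
    return B @ coef


rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 100)
noisy = np.sin(2 * np.pi * xs) + 0.2 * rng.normal(size=xs.size)
smooth = pspline_fit(xs, noisy, lam=10.0)
print(np.round(smooth[:5], 3))
```

With a second-order penalty, any straight line is fitted exactly whatever the value of lambda, because linear coefficient sequences have zero second differences; increasing lambda therefore pulls the fit toward a line rather than toward zero.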
Principal Curves
1989
Cited by 387 (1 self)
Abstract
Principal curves are smooth one-dimensional curves that pass through the middle of a p-dimensional data set, providing a nonlinear summary of the data. They are nonparametric, and their shape is suggested by the data. The algorithm for constructing principal curves starts with some prior summary, such as the usual principal-component line. The curve in each successive iteration is a smooth or local average of the p-dimensional points, where the definition of local is based on the distance in arc length of the projections of the points onto the curve found in the previous iteration. In this article principal curves are defined, an algorithm for their construction is given, some theoretical results are presented, and the procedure is compared to other generalizations of principal components. Two applications illustrate the use of principal curves. The first describes how the principal-curve procedure was used to align the magnets of the Stanford Linear Collider. The collider uses about 950 magnets in a roughly circular arrangement to bend electron and positron beams and bring them to collision. After construction, it was found that some of the magnets had ended up significantly out of place. As a result, the beams had to be bent too sharply and could not be focused. The engineers realized that the magnets did not have to be moved to their originally planned locations, but rather to a sufficiently smooth arc through the middle of the existing positions. This arc was found using the principal-curve procedure. In the second application, two different assays for gold content in several samples of computer-chip waste appear to show some systematic differences that are blurred by measurement error. The classical approach using linear errors-in-variables regression can detect systematic linear differences but is not able to account for nonlinearities. When the first linear principal component is replaced with a principal curve, a local "bump" is revealed, and bootstrapping is used to verify its presence.
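The iteration described in the abstract (project the points onto the current curve, then smooth them against arc length) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm in full: the curve is represented as a polyline through the smoothed points, the smoother is a running mean over neighbouring points in projection order, and a fixed number of iterations replaces a convergence check. All function names are hypothetical.

```python
import numpy as np


def running_mean_rows(P, span=0.1):
    """Average each row of P with its neighbours; rows are assumed ordered along the curve."""
    n = len(P)
    half = max(1, int(span * n) // 2)
    return np.array([P[max(0, i - half): i + half + 1].mean(axis=0) for i in range(n)])


def project_to_polyline(X, V):
    """Arc-length position and squared distance of each row of X projected onto polyline V."""
    seg = V[1:] - V[:-1]
    seg_len2 = np.sum(seg ** 2, axis=1)
    cum_len = np.concatenate(([0.0], np.cumsum(np.sqrt(seg_len2))))
    rel = X[:, None, :] - V[None, :-1, :]                    # (n points, segments, p)
    t = np.clip(np.sum(rel * seg, axis=2) / np.maximum(seg_len2, 1e-12), 0.0, 1.0)
    proj = V[None, :-1, :] + t[..., None] * seg              # closest point on each segment
    d2 = np.sum((X[:, None, :] - proj) ** 2, axis=2)
    best = np.argmin(d2, axis=1)
    rows = np.arange(len(X))
    return cum_len[best] + t[rows, best] * np.sqrt(seg_len2[best]), d2[rows, best]


def principal_curve(X, n_iter=10, span=0.1):
    """Alternate smoothing and projection, starting from the first principal-component line."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    lam = centered @ vt[0]                                     # projection index on the PC line
    mse = [np.mean(np.sum(centered ** 2, axis=1) - lam ** 2)]  # mean squared distance to that line
    for _ in range(n_iter):
        V = running_mean_rows(X[np.argsort(lam)], span)        # local average along the curve
        lam, d2 = project_to_polyline(X, V)                    # re-project onto the new curve
        mse.append(np.mean(d2))
    return V, mse


rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 200)
X = np.column_stack([u, u ** 2 + 0.02 * rng.normal(size=200)])  # points scattered around a parabola
V, mse = principal_curve(X)
print(mse[0], mse[-1])
```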
Markov chain Monte Carlo convergence diagnostics
JASA, 1996
Cited by 365 (6 self)
Abstract
A critical issue for users of Markov chain Monte Carlo (MCMC) methods in applications is how to determine when it is safe to stop sampling and use the samples to estimate characteristics of the distribution of interest. Research into methods of computing theoretical convergence bounds holds promise for the future but so far has yielded relatively little of practical use in applied work. Consequently, most MCMC users address the convergence problem by applying diagnostic tools to the output produced by running their samplers. After giving a brief overview of the area, we provide an expository review of thirteen convergence diagnostics, describing the theoretical basis and practical implementation of each. We then compare their performance in two simple models and conclude that all the methods can fail to detect the sorts of convergence failure they were designed to identify. We thus recommend a combination of strategies aimed at evaluating and accelerating MCMC sampler convergence, including applying diagnostic procedures to a small number of parallel chains, monitoring autocorrelations and cross-correlations, and modifying parameterizations or sampling algorithms appropriately. We emphasize, however, that it is not possible to say with certainty that a finite sample from an MCMC algorithm is representative of an underlying stationary distribution.
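One widely used diagnostic for the parallel-chain setting recommended above is Gelman and Rubin's potential scale reduction factor, which compares between-chain and within-chain variance. Values near 1 are consistent with convergence but, as the abstract stresses, cannot prove it. The sketch below assumes m chains of equal length n for a single scalar quantity and omits the degrees-of-freedom correction of the original proposal; the function name is hypothetical.

```python
import numpy as np


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin R-hat for an (m chains, n draws) array of one scalar quantity."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance of the chain means
    var_plus = (n - 1) / n * W + B / n       # pooled estimate of the target variance
    return np.sqrt(var_plus / W)


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))              # four chains drawing from the same distribution
stuck = mixed + np.arange(4)[:, None] * 3.0     # chains stuck around different values
print(potential_scale_reduction(mixed), potential_scale_reduction(stuck))
```

The "mixed" chains here are independent draws, so this example exercises only the between-chain comparison; autocorrelated sampler output would additionally call for the autocorrelation monitoring the abstract recommends.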
Constructive Incremental Learning from Only Local Information
1998
Cited by 205 (39 self)
Abstract
... This article illustrates the potential learning capabilities of purely local learning and offers an interesting and powerful approach to learning with receptive fields.