Results 11–20 of 183
Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences
Ann. Statist., 2002
"... An empirical Bayes approach to the estimation of possibly sparse sequences observed in Gaussian white noise is set out and investigated. The prior considered is a mixture of an atom of probability at zero and a heavytailed density, with the mixing weight chosen by marginal maximum likelihood, in ..."
Abstract

Cited by 87 (5 self)
 Add to MetaCart
An empirical Bayes approach to the estimation of possibly sparse sequences observed in Gaussian white noise is set out and investigated. The prior considered is a mixture of an atom of probability at zero and a heavy-tailed density, with the mixing weight chosen by marginal maximum likelihood, in the hope of adapting between sparse and dense sequences. If estimation is then carried out using the posterior median, this is a random thresholding procedure. Other thresholding rules using the same threshold can also be used. Probability bounds on the threshold chosen by the marginal maximum likelihood approach lead to overall bounds on the risk of the method over the class of signal sequences of length n with normalized ℓp norm bounded by η, for η > 0 and 0 < p ≤ 2. Estimation error is measured by mean qth power loss, for 0 < q ≤ 2. For all p and q in (0, 2], the method achieves the optimal estimation rate as n → ∞ and η → 0 at various rates, and in this sense adapts automatically to the sparseness or otherwise of the underlying signal. In addition, the risk is uniformly bounded over all signals. If the posterior mean is used as the estimator, the results still hold for q > 1. Simulations show excellent performance. Computationally, the method is tractable and essentially of O(n) complexity, and software is available. The extension to a modified thresholding method relevant to the wavelet estimation of derivatives of functions is also considered.
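To make the recipe concrete, here is a minimal numerical sketch of the procedure described above (not the authors' released software, which uses closed-form expressions): a spike-at-zero plus Laplace prior with an assumed scale a = 0.5, the mixing weight fitted by marginal maximum likelihood, and each coordinate estimated by its posterior median computed on a grid.

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

a = 0.5                                    # Laplace scale for the heavy-tailed part (assumed)
theta = np.linspace(-30.0, 30.0, 4001)     # grid for the nonzero component
dtheta = theta[1] - theta[0]
laplace = 0.5 * a * np.exp(-a * np.abs(theta))

def nonzero_marginal(x):
    # density of x ~ N(theta, 1) with theta ~ Laplace(a), by numerical integration
    return (norm.pdf(x[:, None] - theta[None, :]) * laplace).sum(axis=1) * dtheta

def neg_log_marginal(w, x, g):
    # negative marginal log-likelihood of the mixing weight w
    return -np.sum(np.log((1.0 - w) * norm.pdf(x) + w * g))

def posterior_median(x, w):
    # posterior of theta given x: an atom at 0 plus a continuous (Laplace * normal) part
    dens = norm.pdf(x - theta) * laplace
    atom = (1.0 - w) * norm.pdf(x)
    total = atom + w * dens.sum() * dtheta
    cdf = w * np.cumsum(dens) * dtheta + atom * (theta >= 0.0)
    return float(np.interp(0.5 * total, cdf, theta))

rng = np.random.default_rng(0)
mu = np.zeros(1000)
mu[:20] = 5.0                              # a sparse signal in Gaussian white noise
x = mu + rng.standard_normal(mu.size)

g = nonzero_marginal(x)
w_hat = minimize_scalar(neg_log_marginal, bounds=(1e-4, 1.0), args=(x, g),
                        method="bounded").x
mu_hat = np.array([posterior_median(xi, w_hat) for xi in x])
print("estimated mixing weight:", round(w_hat, 3))
# estimates are zero only up to the grid resolution dtheta
print("coordinates estimated as nonzero:", int(np.count_nonzero(np.abs(mu_hat) > dtheta)))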
A SELECTIVE OVERVIEW OF VARIABLE SELECTION IN HIGH DIMENSIONAL FEATURE SPACE
2010
"... High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded ..."
Abstract

Cited by 70 (6 self)
 Add to MetaCart
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. The questions of what limits of dimensionality such methods can handle, what the role of penalty functions is, and what their statistical properties are have rapidly driven the advances of the field. The properties of nonconcave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultrahigh dimensional variable selection, with emphasis on independence screening and two-scale methods.
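The independence screening idea mentioned at the end of this abstract lends itself to a short illustration. The sketch below is a toy version, not code from the survey: rank predictors by marginal correlation with the response, keep roughly n / log n of them, and run a penalized fit on the survivors; the screening size and lasso penalty are arbitrary choices.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 200, 5000                            # many more predictors than observations
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 1, 2]] = [3.0, -2.0, 1.5]
y = X @ beta + rng.standard_normal(n)

# screening stage: rank predictors by marginal correlation with the response
corr = np.abs((X - X.mean(0)).T @ (y - y.mean())) / (n * X.std(0) * y.std())
d = int(n / np.log(n))                      # a commonly used screening size
survivors = np.argsort(corr)[::-1][:d]

# selection stage on the much smaller surviving set (lasso as the penalized method)
fit = Lasso(alpha=0.1).fit(X[:, survivors], y)
selected = survivors[np.abs(fit.coef_) > 1e-8]
print("survivors after screening:", d)
print("finally selected columns:", np.sort(selected))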
Distributed detection in sensor networks with packet losses and finite capacity links
IEEE Transactions on Signal Processing, 2006
"... We consider a multiobject detection problem over a sensor network (SNET) with limited range multimodal sensors. Limited range sensing environment arises in a sensing field prone to signal attenuation and path losses. The general problem complements the widely considered decentralized detection pro ..."
Abstract

Cited by 64 (5 self)
 Add to MetaCart
(Show Context)
We consider a multi-object detection problem over a sensor network (SNET) with limited-range multimodal sensors. A limited-range sensing environment arises in a sensing field prone to signal attenuation and path losses. The general problem complements the widely considered decentralized detection problem, where all sensors observe the same object. In this paper we develop a distributed detection approach based on recent developments of the false discovery rate (FDR) and the associated Benjamini-Hochberg (BH) test procedure. The BH procedure is based on rank ordering of scalar test statistics. We first develop scalar test statistics for multidimensional data to handle multimodal sensor observations and establish their optimality in terms of the BH procedure. We then propose a distributed algorithm, in the ideal case of infinite attenuation, for identification of sensors that are in the immediate vicinity of an object. We demonstrate communication message scalability to large SNETs by showing that the upper bound on the communication message complexity scales linearly with the number of sensors that are in the vicinity of objects and is independent of the total number of sensors in the SNET. This brings forth an important principle for evaluating the performance of an SNET, namely the need for scalability of communications and performance with respect to the number of objects or events in an SNET, irrespective of the network size. We then account for finite attenuation by modeling sensor observations as corrupted by uncertain interference arising from distant objects and by developing robust extensions to our idealized distributed scheme. The robustness properties ensure that both the error performance and the communication message complexity degrade gracefully with interference.
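For reference, the centralized Benjamini-Hochberg step-up procedure that this approach builds on can be written in a few lines; the distributed, rank-ordering implementation over a sensor network developed in the paper is not reproduced here, and the example data are synthetic.

import numpy as np
from scipy.stats import norm

def bh_reject(pvalues, q=0.05):
    # Benjamini-Hochberg step-up: reject the k smallest p-values, where k is the
    # largest i such that p_(i) <= i * q / m.
    p = np.asarray(pvalues)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = (np.nonzero(below)[0].max() + 1) if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# synthetic example: 20 sensors near an object (shifted statistics) among 1000 null sensors
rng = np.random.default_rng(2)
z = rng.standard_normal(1000)
z[:20] += 4.0
pvals = norm.sf(z)                          # one-sided p-values for each scalar statistic
print("declared detections:", np.flatnonzero(bh_reject(pvals, q=0.05)))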
Exponential screening and optimal rates of sparse estimation. Available at ArXiv:1003.2654v3
2010
"... In highdimensional linear regression, the goal pursued here is to estimate an unknown regression function using linear combinations of a suitable set of covariates. One of the key assumptions for the success of any statistical procedure in this setup is to assume that the linear combination is sp ..."
Abstract

Cited by 59 (3 self)
 Add to MetaCart
In high-dimensional linear regression, the goal pursued here is to estimate an unknown regression function using linear combinations of a suitable set of covariates. One of the key assumptions for the success of any statistical procedure in this setup is that the linear combination is sparse in some sense, for example, that it involves only a few covariates. We consider a general, not necessarily linear, regression with Gaussian noise and study a related question, that is, to find a linear combination of approximating functions which is at the same time sparse and has small mean squared error (MSE). We introduce a new estimation procedure, called Exponential Screening, that shows remarkable adaptation properties. It adapts to the linear combination that optimally balances MSE and sparsity, whether the latter is measured in terms of the number of nonzero entries in the combination (ℓ0 norm) or in terms of the global weight of the combination (ℓ1 norm). The power of this adaptation result is illustrated by showing that Exponential Screening solves optimally and simultaneously all the problems of aggregation in Gaussian regression that have been discussed in the literature. Moreover, we show that the performance of the Exponential Screening estimator cannot be improved in a minimax sense, even if the optimal sparsity is known in advance. The theoretical and numerical superiority of Exponential Screening compared to state-of-the-art sparse procedures is also discussed.
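The aggregation idea can be illustrated with a brute-force toy: fit least squares on every support S, weight each fit by exp(-RSS / (4 sigma^2)) times a sparsity-penalizing factor, and average. The penalty exp(-|S| log p) used below is a stand-in, and the paper's exact weights and its efficient sampling implementation differ; the data are synthetic, and exhaustive enumeration is only feasible because p is tiny.

import itertools
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 50, 8, 1.0
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:2] = [2.0, -1.5]
y = X @ beta + sigma * rng.standard_normal(n)

log_w, thetas = [], []
for k in range(p + 1):
    for S in itertools.combinations(range(p), k):
        theta = np.zeros(p)
        if S:
            theta[list(S)] = np.linalg.lstsq(X[:, list(S)], y, rcond=None)[0]
        rss = np.sum((y - X @ theta) ** 2)
        log_w.append(-rss / (4 * sigma**2) - k * np.log(p))   # assumed sparsity penalty
        thetas.append(theta)

log_w = np.array(log_w)
log_w -= log_w.max()                       # stabilize before exponentiating
w = np.exp(log_w)
w /= w.sum()
theta_agg = np.array(thetas).T @ w         # exponentially weighted aggregate
print(np.round(theta_agg, 2))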
Variable selection in data mining: Building a predictive model for bankruptcy
Journal of the American Statistical Association, 2004
"... We predict the onset of personal bankruptcy using least squares regression. Although well publicized, only 2,244 bankruptcies occur in our data set of 2.9 million months of creditcard activity. We use stepwise selection to find predictors from a mix of payment history, debt load, demographics, and ..."
Abstract

Cited by 52 (10 self)
 Add to MetaCart
(Show Context)
We predict the onset of personal bankruptcy using least squares regression. Although well publicized, only 2,244 bankruptcies occur in our data set of 2.9 million months of credit-card activity. We use stepwise selection to find predictors from a mix of payment history, debt load, demographics, and their interactions. This combination of rare responses and over 67,000 possible predictors leads to a challenging modeling question: How does one separate coincidental from useful predictors? We show that three modifications turn stepwise regression into an effective methodology for predicting bankruptcy. Our version of stepwise regression (1) organizes calculations to accommodate interactions, (2) exploits modern decision-theoretic criteria to choose predictors, and (3) conservatively estimates p-values to handle sparse data and a binary response. Omitting any one of these leads to poor performance. A final step in our procedure calibrates the regression predictions. With these modifications, stepwise regression predicts bankruptcy as well as, if not better than, recently developed data-mining tools. When sorted, the largest 14,000 resulting predictions hold 1,000 of the 1,800 bankruptcies hidden in a validation sample of 2.3 million observations. If the cost of missing a bankruptcy is 200 times that of a false positive, our predictions incur less than 2/3 of the costs of classification errors produced by the tree-based classifier C4.5. Key Phrases: AIC, Cp, Bonferroni, calibration, hard thresholding, risk inflation criterion (RIC),
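The second and third modifications amount to a very conservative entry criterion for stepwise selection. The fragment below sketches that idea only: forward stepwise regression in which a predictor enters when its approximate t-statistic exceeds a hard-thresholding-style cut-off of about sqrt(2 log p). Interactions, the binary-response adjustments, and the final calibration step from the paper are not reproduced, and all names and data are illustrative.

import numpy as np

def conservative_stepwise(X, y):
    n, p = X.shape
    cutoff = np.sqrt(2 * np.log(p))            # Bonferroni / RIC-style entry threshold
    model = []
    Q = np.ones((n, 1)) / np.sqrt(n)            # orthonormal basis of the current model (intercept)
    for _ in range(p):
        r = y - Q @ (Q.T @ y)                   # current residual
        Xr = X - Q @ (Q.T @ X)                  # predictors orthogonalized against the model
        norms = np.linalg.norm(Xr, axis=0) + 1e-12
        t = np.abs(Xr.T @ r) / (norms * r.std())   # approximate t-statistics for each candidate
        t[model] = 0.0                          # variables already in the model cannot re-enter
        j = int(np.argmax(t))
        if t[j] < cutoff:
            break                               # no candidate clears the conservative threshold
        model.append(j)
        Q = np.column_stack([Q, Xr[:, j] / norms[j]])
    return model

rng = np.random.default_rng(4)
n, p = 500, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)
print("selected columns:", conservative_stepwise(X, y))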
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
J. R. Statist. Soc. B
"... Summary. Analysis of modern biological data often involves illposed problems due to high dimensionality and multicollinearity. Partial Least Squares (pls) regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since 1960s. ..."
Abstract

Cited by 47 (0 self)
 Add to MetaCart
Summary. Analysis of modern biological data often involves ill-posed problems due to high dimensionality and multicollinearity. Partial least squares (PLS) regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. At the core of the PLS methodology lies a dimension reduction technique coupled with a regression model. Although PLS regression has been shown to achieve good predictive performance, it is not particularly tailored for variable/feature selection and therefore often produces linear combinations of the original predictors that are hard to interpret due to high dimensionality. In this paper, we investigate the known asymptotic properties of the PLS estimator and show that its consistency property no longer holds in the very large p and small n paradigm. We then propose a sparse partial least squares (SPLS) formulation which aims to achieve good predictive performance and variable selection simultaneously by producing sparse linear combinations of the original predictors. We provide an efficient implementation of SPLS regression based on the LARS algorithm and benchmark the proposed method against well-known variable selection and dimension reduction approaches via simulation experiments. An additional advantage of SPLS regression is its ability to handle multivariate responses without much additional computational cost. We illustrate this in a joint analysis of gene expression and genome-wide binding data.
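At its core, the sparse formulation replaces the first PLS direction, which for a univariate response is proportional to X'y, with a soft-thresholded and hence sparse version of it. The one-component sketch below illustrates only that step; the actual SPLS algorithm uses a LARS-based fit, multiple components, and cross-validated tuning, and the threshold fraction eta here is an arbitrary choice.

import numpy as np

def sparse_pls_one_component(X, y, eta=0.5):
    # eta in [0, 1): fraction of the largest |X'y| entry used as the soft threshold
    Xc = X - X.mean(0)
    yc = y - y.mean()
    z = Xc.T @ yc                               # direction of the first (dense) PLS component
    lam = eta * np.max(np.abs(z))
    w = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # soft-thresholded, sparse direction
    if not np.any(w):
        return np.zeros(X.shape[1])
    w /= np.linalg.norm(w)
    t = Xc @ w                                  # sparse latent component
    return w * ((t @ yc) / (t @ t))             # implied coefficient vector after regressing y on t

rng = np.random.default_rng(5)
n, p = 100, 500
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.array([2.0, 2.0, -1.0, 1.0, 1.0]) + rng.standard_normal(n)
print("nonzero loadings:", np.flatnonzero(sparse_pls_one_component(X, y)))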
Distilled sensing: Adaptive sampling for sparse detection and estimation
IEEE Trans. Inform. Theory, 2011
"... Adaptive sampling results in dramatic improvements in the recovery of sparse signals in white Gaussian noise. A sequential adaptive samplingandrefinement procedure called distilled sensing (DS) is proposed and analyzed. DS is a form of multistage experimental design and testing. Because of the ad ..."
Abstract

Cited by 46 (10 self)
 Add to MetaCart
(Show Context)
Adaptive sampling results in dramatic improvements in the recovery of sparse signals in white Gaussian noise. A sequential adaptive sampling-and-refinement procedure called distilled sensing (DS) is proposed and analyzed. DS is a form of multistage experimental design and testing. Because of the adaptive nature of the data collection, DS can detect and localize far weaker signals than possible from nonadaptive measurements. In particular, reliable detection and localization (support estimation) using nonadaptive samples is possible only if the signal amplitudes grow logarithmically with the problem dimension. Here it is shown that using adaptive sampling, reliable detection is possible provided the amplitude exceeds a constant, and localization is possible when the amplitude exceeds any arbitrarily slowly growing function of the dimension.
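The multistage refinement is simple enough to simulate directly. The sketch below splits a fixed measurement budget evenly over a few stages, re-observes only the indices retained so far, keeps an index when its noisy observation is positive, and thresholds at the last stage; the equal budget split, the number of stages, and the final threshold are illustrative choices rather than the paper's prescriptions.

import numpy as np

rng = np.random.default_rng(6)
n, k, amplitude, stages = 10_000, 50, 3.0, 4
total_budget = 2 * n                        # total sensing energy, split evenly over stages (assumed)
mu = np.zeros(n)
mu[:k] = amplitude                          # sparse nonnegative signal

retained = np.arange(n)
for s in range(stages):
    precision = (total_budget / stages) / retained.size   # effort per retained index grows each stage
    noise_sd = 1.0 / np.sqrt(precision)
    y = mu[retained] + noise_sd * rng.standard_normal(retained.size)
    if s < stages - 1:
        retained = retained[y > 0]          # distillation: keep indices whose observation is positive
    else:
        detected = retained[y > noise_sd * np.sqrt(2 * np.log(retained.size))]

print("retained at final stage:", retained.size)
print("detected:", detected.size, "true positives:", int(np.count_nonzero(detected < k)))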
Wavelets, Ridgelets, and Curvelets for Poisson Noise Removal
"... Abstract—In order to denoise Poisson count data, we introduce a variance stabilizing transform (VST) applied on a filtered discrete Poisson process, yielding a near Gaussian process with asymptotic constant variance. This new transform, which can be deemed as an extension of the Anscombe transform t ..."
Abstract

Cited by 42 (2 self)
 Add to MetaCart
(Show Context)
In order to denoise Poisson count data, we introduce a variance stabilizing transform (VST) applied to a filtered discrete Poisson process, yielding a near-Gaussian process with asymptotically constant variance. This new transform, which can be viewed as an extension of the Anscombe transform to filtered data, is simple, fast, and efficient in (very) low-count situations. We combine this VST with the filter banks of wavelets, ridgelets, and curvelets, leading to multiscale VSTs (MS-VSTs) and nonlinear decomposition schemes. By doing so, the noise-contaminated coefficients of these MS-VST-modified transforms are asymptotically normally distributed with known variances. A classical hypothesis-testing framework is adopted to detect the significant coefficients, and a sparsity-driven iterative scheme properly reconstructs the final estimate. A range of examples shows the power of this MS-VST approach for recovering important structures of various morphologies in (very) low-count images. These results also demonstrate that the MS-VST approach is competitive relative to many existing denoising methods. Index Terms—Curvelets, filtered Poisson process, multiscale variance stabilizing transform, Poisson intensity estimation, ridgelets, wavelets.
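The classical Anscombe transform that the MS-VST generalizes is easy to demonstrate: stabilize the Poisson counts, apply any Gaussian denoiser, and invert. In the sketch below a plain DCT hard threshold stands in for the wavelet/ridgelet/curvelet machinery of the paper, the test intensity is synthetic, and the simple algebraic inverse is used even though it is biased at very low counts.

import numpy as np
from scipy.fft import dct, idct

def anscombe(x):
    # after this transform, Poisson counts are approximately Gaussian with unit variance
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

def inverse_anscombe(y):
    # simple algebraic inverse (biased for very small counts)
    return (y / 2.0) ** 2 - 3.0 / 8.0

rng = np.random.default_rng(7)
t = np.linspace(0.0, 1.0, 1024)
intensity = 5.0 + 20.0 * np.exp(-((t - 0.5) / 0.05) ** 2)   # smooth synthetic Poisson intensity
counts = rng.poisson(intensity)

z = anscombe(counts)                        # approximately N(., 1)
c = dct(z, norm="ortho")
c[np.abs(c) < np.sqrt(2 * np.log(z.size))] = 0.0   # universal hard threshold on transform coefficients
estimate = inverse_anscombe(idct(c, norm="ortho"))

print("rmse of raw counts :", round(float(np.sqrt(np.mean((counts - intensity) ** 2))), 2))
print("rmse of estimate   :", round(float(np.sqrt(np.mean((estimate - intensity) ** 2))), 2))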
Modern statistical estimation via oracle inequalities
2006
"... A number of fundamental results in modern statistical theory involve thresholding estimators. This survey paper aims at reconstructing the history of how thresholding rules came to be popular in statistics and describing, in a not overly technical way, the domain of their application. Two notions pl ..."
Abstract

Cited by 41 (0 self)
 Add to MetaCart
A number of fundamental results in modern statistical theory involve thresholding estimators. This survey paper aims at reconstructing the history of how thresholding rules came to be popular in statistics and describing, in a not overly technical way, the domain of their application. Two notions play a fundamental role in our narrative: sparsity and oracle inequalities. Sparsity is a property of the object to estimate, which seems to be characteristic of many modern problems, in statistics as well as applied mathematics and theoretical computer science, to name a few. ‘Oracle inequalities’ are a powerful decision-theoretic tool which has served to understand the optimality of thresholding rules, but which has many other potential applications, some of which we will discuss. Our story is also the story of the dialogue between statistics and applied harmonic analysis. Starting with the work of Wiener, we will see that certain representations emerge as being optimal for estimation. A leitmotif throughout
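The two notions the survey is organized around can be seen in a few lines of simulation: soft thresholding at the universal level sqrt(2 log n) applied to a sparse mean vector in Gaussian noise, compared with the coordinatewise oracle risk sum_i min(theta_i^2, 1) that the classical soft-thresholding oracle inequality bounds the risk by, up to a factor of 2 log n + 1 and an additive constant. The sparse vector below is synthetic.

import numpy as np

rng = np.random.default_rng(8)
n = 10_000
theta = np.zeros(n)
theta[:100] = 4.0                           # sparse mean vector
y = theta + rng.standard_normal(n)          # observations in Gaussian white noise

lam = np.sqrt(2 * np.log(n))                # universal threshold
soft = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

loss = float(np.sum((soft - theta) ** 2))
oracle = float(np.sum(np.minimum(theta ** 2, 1.0)))      # ideal coordinatewise risk
bound = (2 * np.log(n) + 1) * (1.0 + oracle)              # classical oracle-inequality bound
print(f"soft-threshold loss : {loss:.1f}")
print(f"oracle risk         : {oracle:.1f}")
print(f"oracle bound        : {bound:.1f}")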