Results 1–10 of 87
Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit
, 2006
Abstract

Cited by 274 (22 self)
Finding the sparsest solution to underdetermined systems of linear equations y = Φx is NP-hard in general. We show here that for systems with ‘typical’/‘random’ Φ, a good approximation to the sparsest solution is obtained by applying a fixed number of standard operations from linear algebra. Our proposal, Stagewise Orthogonal Matching Pursuit (StOMP), successively transforms the signal into a negligible residual. Starting with initial residual r_0 = y, at the s-th stage it forms the ‘matched filter’ Φ^T r_{s−1}, identifies all coordinates with amplitudes exceeding a specially chosen threshold, solves a least-squares problem using the selected coordinates, and subtracts the least-squares fit, producing a new residual. After a fixed number of stages (e.g. 10), it stops. In contrast to Orthogonal Matching Pursuit (OMP), many coefficients can enter the model at each stage in StOMP, while only one enters per stage in OMP; and StOMP takes a fixed number of stages (e.g. 10), while OMP can take many (e.g. n). StOMP runs much faster than competing proposals for sparse solutions, such as ℓ1 minimization and OMP, and so is attractive for solving large-scale problems. We use phase diagrams to compare algorithm performance. The problem of recovering a k-sparse vector x0 from (y, Φ), where Φ is a random n × N matrix and y = Φx0, is represented by a point (n/N, k/n).
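The staged procedure the abstract describes can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's tuned algorithm: the threshold multiplier `t` and the residual-based noise estimate are assumptions standing in for the paper's specially chosen thresholds.

```python
import numpy as np

def stomp(Phi, y, n_stages=10, t=2.0):
    """Sketch of Stagewise OMP: threshold the matched filter against a
    residual-based noise level, grow the support, refit by least squares."""
    n, N = Phi.shape
    support = np.zeros(N, dtype=bool)
    x = np.zeros(N)
    r = y.astype(float).copy()           # initial residual r_0 = y
    for _ in range(n_stages):
        c = Phi.T @ r                    # matched filter Phi^T r_{s-1}
        sigma = np.linalg.norm(r) / np.sqrt(n)
        support |= np.abs(c) > t * sigma # many coordinates may enter at once
        if not support.any():
            break
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x = np.zeros(N)
        x[support] = coef
        r = y - Phi @ x                  # subtract the least-squares fit
    return x
```

With a random Φ with unit-norm columns and an exactly sparse x0, a handful of stages typically drives the residual to numerical zero, illustrating why a fixed stage count suffices.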
Adapting to unknown sparsity by controlling the false discovery rate
, 2000
Abstract

Cited by 183 (23 self)
We attempt to recover a high-dimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓp norm for p small. We obtain a procedure which is asymptotically minimax for ℓr loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = q_n → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping q > 1/2 fixed prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
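The data-adaptive threshold described above can be sketched as a Benjamini–Hochberg step-up rule applied to the two-sided p-values of the observations, thresholding at the last rejected |y_i|. This is a minimal sketch of the FDR-thresholding idea; the asymptotic-minimaxity analysis is the paper's contribution, not reproduced here.

```python
import math
import numpy as np

def fdr_threshold(y, q=0.1):
    """FDR-style data-adaptive hard threshold for y_i = mu_i + N(0,1) noise:
    Benjamini-Hochberg step-up on two-sided p-values, threshold at the
    last rejected observation."""
    n = len(y)
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in y])  # two-sided N(0,1) p-values
    order = np.argsort(p)
    k = 0
    for i, idx in enumerate(order, start=1):
        if p[idx] <= q * i / n:   # step-up comparison p_(i) <= q*i/n
            k = i
    if k == 0:
        return math.inf           # nothing survives: estimate all coordinates as zero
    return abs(y[order[k - 1]])   # threshold = k-th largest |y_i|
```

The estimate is then hard thresholding at the returned level, e.g. `xhat = y * (np.abs(y) >= t)`; the threshold adapts automatically to how many coordinates are nonzero.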
Empirical Bayes selection of wavelet thresholds
, 2005
Abstract

Cited by 117 (5 self)
This paper explores a class of empirical Bayes methods for level-dependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavy-tailed density. The mixing weight, or sparsity parameter, for each level of the transform is chosen by marginal maximum likelihood. If estimation is carried out using the posterior median, this is a random thresholding procedure; the estimation can also be carried out using other thresholding rules with the same threshold. Details of the calculations needed for implementing the procedure are included. In practice, the estimates are quick to compute and there is software available. Simulations on the standard model functions show excellent performance, and applications to data drawn from various fields of application are used to explore the practical performance of the approach. By using a general result on the risk of the corresponding marginal maximum likelihood approach for a single sequence, overall bounds on
Size, power and false discovery rates
, 2007
Abstract

Cited by 53 (4 self)
Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the fdr analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr’s, and theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, with the power diagnostics showing why non-null cases might easily fail to appear on a list of “significant” discoveries. Short title: “Size, Power, and Fdr’s”.
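The empirical Bayes local fdr mentioned above can be illustrated crudely: fdr(z) ≈ p0·f0(z)/f(z), taking the null proportion p0 ≈ 1, a theoretical N(0,1) null f0, and a histogram estimate of the marginal f. This is an illustrative sketch only; the paper's accuracy formulas, tail-area fdr's, and empirical null estimates are not reproduced here.

```python
import numpy as np

def local_fdr(z, bins=50):
    """Crude local fdr: ratio of the theoretical N(0,1) null density to a
    histogram estimate of the marginal density of the z-values."""
    f_hat, edges = np.histogram(z, bins=bins, density=True)  # marginal density f
    centers = 0.5 * (edges[:-1] + edges[1:])
    f0 = np.exp(-centers ** 2 / 2.0) / np.sqrt(2.0 * np.pi)  # theoretical null f0
    fdr = np.clip(f0 / np.maximum(f_hat, 1e-12), 0.0, 1.0)   # p0 taken as ~1
    return centers, fdr
```

Near the bulk of the z-values the ratio is close to 1 (almost surely null), while in regions where non-null cases concentrate the marginal density exceeds the null density and the local fdr drops toward 0.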
High dimensional statistical inference and random matrices
 In: Proceedings of the International Congress of Mathematicians
, 2006
Abstract

Cited by 49 (1 self)
Multivariate statistical analysis is concerned with observations on several variables which are thought to possess some degree of interdependence. Driven by problems in genetics and the social sciences, it first flowered in the earlier half of the last century. Subsequently, random matrix theory (RMT) developed, initially within physics, and more recently widely in mathematics. While some of the central objects of study in RMT are identical to those of multivariate statistics, statistical theory was slow to exploit the connection. However, with vast data collection ever more common, data sets now often have as many or more variables than the number of individuals observed. In such contexts, the techniques and results of RMT have much to offer multivariate statistics. The paper reviews some of the progress to date.
Multiscale Poisson data smoothing
 J. Roy. Stat. Soc. B
, 2006
Abstract

Cited by 33 (2 self)
This paper introduces a framework for nonlinear, multiscale decompositions of Poisson data with piecewise smooth intensity curves. The key concept is conditioning on the sum of the observations that are involved in the computation of a given coefficient. Within this framework, most classical wavelet thresholding schemes for data with additive, homoscedastic noise apply. Any family of wavelet transforms (orthogonal, biorthogonal, second generation) can be incorporated into this framework. The second contribution is a Bayesian shrinkage with an original prior for coefficients of this decomposition. As such, the method combines the advantages of the Fisz wavelet transform and (Bayesian) multiscale likelihood models, with additional benefits, such as the extendibility towards arbitrary wavelet families.
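The "condition on the sum" idea can be illustrated with one level of a Haar–Fisz style transform; this is a sketch of the underlying principle, not the paper's full multiscale scheme. Given the sum a + b of two neighboring Poisson counts, the Haar detail a − b is a centered binomial with variance a + b, so studentizing by √(a + b) yields coefficients with approximately unit variance, to which Gaussian thresholding schemes apply.

```python
import numpy as np

def haar_fisz_level(counts):
    """One level of a Haar-Fisz style transform for Poisson counts:
    pair neighbors, keep their sum as the coarse signal, and studentize
    the Haar detail (a - b) by sqrt(a + b)."""
    counts = np.asarray(counts, dtype=float)
    a, b = counts[0::2], counts[1::2]
    s = a + b                                    # coarse coefficients (sums)
    detail = np.where(s > 0, (a - b) / np.sqrt(np.maximum(s, 1.0)), 0.0)
    return s, detail
```

For a constant intensity, the detail coefficients behave like centered, approximately unit-variance noise, which is exactly what makes classical additive-noise thresholding applicable.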
Nonparametric empirical Bayes and compound decision approaches to estimation of a high-dimensional vector of normal means
, 2007
Abstract

Cited by 28 (11 self)
We consider the classical problem of estimating a vector µ = (µ1,...,µn) based on independent observations Yi ∼ N(µi, 1), i = 1,...,n. Suppose µi, i = 1,...,n, are independent realizations from a completely unknown distribution G. We suggest an easily computed estimator µ̂ such that the ratio of its risk E‖µ̂ − µ‖² with that of the Bayes procedure approaches 1. A related compound decision result is also obtained. Our asymptotics is of a triangular array; that is, we allow the distribution G to depend on n. Thus, our theoretical asymptotic results are also meaningful in situations where the vector µ is sparse and the proportion of zero coordinates approaches 1. We demonstrate the performance of our estimator in simulations, emphasizing sparse setups. In “moderately sparse” situations, our procedure performs very well compared to known procedures tailored for sparse setups. It also adapts well to non-sparse situations.
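One classical route to such a nonparametric empirical Bayes estimator, sketched below under the assumption of unit noise variance, is Tweedie's formula µ̂_i = y_i + (d/dy) log f(y_i), with the unknown marginal f replaced by a kernel density estimate. This is a generic illustration of the mimic-the-Bayes-rule idea, not necessarily the authors' specific construction, and the bandwidth is an illustrative choice.

```python
import numpy as np

def tweedie_estimate(y, bandwidth=0.5):
    """Tweedie's formula for Y_i ~ N(mu_i, 1): mu_hat_i = y_i + f'(y_i)/f(y_i),
    with the marginal density f estimated by a Gaussian kernel."""
    y = np.asarray(y, dtype=float)
    diffs = y[:, None] - y[None, :]
    K = np.exp(-diffs ** 2 / (2.0 * bandwidth ** 2))
    f = K.sum(axis=1)                                   # unnormalized KDE f(y_i)
    fprime = (-diffs / bandwidth ** 2 * K).sum(axis=1)  # its derivative at y_i
    return y + fprime / f                               # normalizations cancel in f'/f
```

Because the score f'/f is learned from the data, the same formula shrinks aggressively when most coordinates cluster near zero (the sparse case) and only mildly otherwise, which is the adaptivity the abstract emphasizes.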
Feature selection by higher criticism thresholding: Optimal phase diagram
, 2008
Abstract

Cited by 26 (6 self)
We consider two-class linear classification in a high-dimensional, low-sample-size setting. Only a small fraction of the features are useful, the useful features are unknown to us, and each useful feature contributes weakly to the classification decision – this setting was called the rare/weak model (RW model) in [11]. We select features by thresholding feature z-scores. The threshold is set by higher criticism (HC) [11]. Let πi denote the P-value associated to the i-th z-score and π(i) denote the i-th order statistic of the collection of P-values. The HC threshold (HCT) is the order statistic of the z-score corresponding to the index i maximizing (i/n − π(i)) / √(π(i)(1 − π(i))). The ideal threshold optimizes the classification error. In [11] we showed that HCT was numerically close to the ideal threshold. We formalize an asymptotic framework for studying the RW model, considering a sequence of problems with increasingly many features and relatively fewer observations. We show that along this sequence, the limiting performance of ideal HCT is essentially just as good as the limiting performance of ideal thresholding. Our results describe two-dimensional
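The HCT recipe above translates directly into code: sort the two-sided p-values, maximize the HC objective over the smaller half, and return the matching |z| order statistic. A sketch under common conventions (restricting to the smaller half and guarding against p-values below 1/n, which destabilize the objective); these guards are standard practice rather than details taken from this abstract.

```python
import math
import numpy as np

def hc_threshold(z):
    """Higher-criticism threshold for feature z-scores: maximize
    (i/n - p_(i)) / sqrt(p_(i)(1 - p_(i))) and return the corresponding
    order statistic of |z| as the selection threshold."""
    n = len(z)
    p = np.sort([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])  # sorted two-sided p-values
    i = np.arange(1, n + 1)
    hc = (i / n - p) / np.sqrt(p * (1.0 - p) + 1e-12)
    hc[p < 1.0 / n] = -np.inf                 # guard against tiny-p instability
    imax = int(np.argmax(hc[: n // 2]))       # search the smaller half only
    return np.sort(np.abs(z))[::-1][imax]     # (imax+1)-th largest |z|
```

Features with |z-score| at or above the returned threshold are kept; the threshold adapts to the unknown fraction and strength of useful features without any tuning parameter.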
Streamwise Feature Selection
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 19 (9 self)
In streamwise feature selection, new features are sequentially considered for addition to a predictive model. When the space of potential features is large, streamwise feature selection offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. In contrast to traditional forward feature selection algorithms such as stepwise regression, in which at each step all possible features are evaluated and the best one is selected, streamwise feature selection only evaluates each feature once, when it is generated. We describe information-investing and α-investing, two adaptive complexity penalty methods for streamwise feature selection which dynamically adjust the threshold on the error reduction required for adding a new feature. These two methods give false discovery rate style guarantees against overfitting. They differ
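The alpha-investing flavor of this idea can be sketched with a simple wealth-update loop. The specific spending rule (`wealth / (j + 2)`) and the parameter names below are illustrative assumptions, not the paper's exact rule: each candidate feature's p-value is tested at a level paid out of a running alpha-wealth, and a rejection earns a payout back, which is what yields the false-discovery-rate-style control against overfitting.

```python
def alpha_investing(pvalues, wealth=0.05, payout=0.05):
    """Sketch of an alpha-investing rule for streamwise selection:
    spend a fraction of the current alpha-wealth on each test; rejections
    replenish the wealth, non-rejections deplete it."""
    selected = []
    for j, p in enumerate(pvalues):
        alpha_j = wealth / (j + 2)          # spend a fraction of current wealth
        if p <= alpha_j:
            selected.append(j)
            wealth += payout - alpha_j      # rejection: wealth is replenished
        else:
            wealth -= alpha_j / (1.0 - alpha_j)
    return selected
```

Because wealth grows after each accepted feature, a stream that starts with genuinely useful features can afford to test later candidates at more generous levels, while a stream of useless features steadily exhausts its budget.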
EbayesThresh: R programs for Empirical Bayes thresholding
 J. Statist. Softw.
, 2005
Abstract

Cited by 18 (4 self)
Suppose that a sequence of unknown parameters is observed subject to independent Gaussian noise. The EbayesThresh package in the S language implements a class of Empirical Bayes thresholding methods that can take advantage of possible sparsity in the sequence, to improve the quality of estimation. The prior for each parameter in the sequence is a mixture of an atom of probability at zero and a heavy-tailed density. Within the package, this can be either a Laplace (double exponential) density or else a mixture of normal distributions with tail behavior similar to the Cauchy distribution. The mixing weight, or sparsity parameter, is chosen automatically by marginal maximum likelihood. If estimation is carried out using the posterior median, this is a random thresholding procedure; the estimation can also be carried out using other thresholding rules with the same threshold, and the package provides the posterior mean, and hard and soft thresholding, as additional options. This paper reviews the method, and gives details (far beyond those previously published) of the calculations needed for implementing the procedures. It explains and motivates