Results 1–10 of 132
Adapting to unknown sparsity by controlling the false discovery rate
, 2000
Cited by 183 (23 self)
We attempt to recover a high-dimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓp norm for p small. We obtain a procedure which is asymptotically minimax for ℓr loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = q_n → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping q > 1/2 fixed prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
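The data-adaptive thresholding scheme described in this abstract can be sketched in a few lines: apply the two-sided Benjamini–Hochberg step-up rule to the observations and hard-threshold at the last crossing point. The known noise level `sigma` and the hard-thresholding form below are simplifying assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np
from scipy.stats import norm

def fdr_threshold(y, q=0.05, sigma=1.0):
    """FDR-driven hard thresholding of a noisy vector (illustrative sketch;
    `q` is the FDR control parameter, `sigma` the assumed noise level)."""
    n = len(y)
    # Two-sided p-values of the observations under the N(0, sigma^2) null.
    p = 2 * norm.sf(np.abs(y) / sigma)
    p_sorted = np.sort(p)
    # Benjamini-Hochberg step-up: largest k with p_(k) <= q*k/n.
    ks = np.nonzero(p_sorted <= q * np.arange(1, n + 1) / n)[0]
    est = np.zeros(n)
    if ks.size:
        k = ks[-1] + 1                          # number of retained coordinates
        t = sigma * norm.isf(q * k / (2 * n))   # data-adaptive threshold
        keep = np.abs(y) >= t
        est[keep] = y[keep]
    return est
```

For sparse vectors with a few large entries, the threshold adapts downward as more coordinates survive, which is exactly the behavior the minimax theory rewards when q → 0 with n.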
Generalizations of the familywise error rate
 Ann. Statist.
, 2005
Cited by 75 (8 self)
Consider the problem of simultaneously testing null hypotheses H1,...,Hs. The usual approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. In many applications, particularly if s is large, one might be willing to tolerate more than one false rejection provided the number of such cases is controlled, thereby increasing the ability of the procedure to detect false null hypotheses. This suggests replacing control of the FWER by controlling the probability of k or more false rejections, which we call the k-FWER. We derive both single-step and step-down procedures that control the k-FWER, without making any assumptions concerning the dependence structure of the p-values of the individual tests. In particular, we derive a step-down procedure that is quite simple to apply, and prove that it cannot be improved without violation of control of the k-FWER. We also consider the false discovery proportion (FDP), defined by the number of false rejections divided by the total number of rejections (defined to be 0 if there are no rejections). The false discovery rate proposed by Benjamini and Hochberg (1995) controls the expectation of the FDP.
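A step-down k-FWER procedure of the kind described can be sketched with generalized-Holm critical values: kα/s for the first k ordered p-values, then kα/(s + k − i). These constants are my reading of the procedure the abstract refers to, so treat this as an illustrative sketch rather than a definitive implementation.

```python
def kfwer_stepdown(pvals, k=2, alpha=0.05):
    """Step-down procedure aiming at P(k or more false rejections) <= alpha,
    valid with no assumption on the dependence of the p-values (sketch).
    Returns the indices of the rejected hypotheses."""
    s = len(pvals)
    order = sorted(range(s), key=lambda i: pvals[i])
    # Critical values: k*alpha/s for steps i <= k, then k*alpha/(s + k - i).
    crit = [k * alpha / s if i <= k else k * alpha / (s + k - i)
            for i in range(1, s + 1)]
    rejected = []
    for step, idx in enumerate(order):
        if pvals[idx] <= crit[step]:
            rejected.append(idx)
        else:
            break  # step-down: stop at the first ordered p-value that fails
    return sorted(rejected)
```

Setting k = 1 recovers the ordinary Holm step-down, so the procedure strictly relaxes FWER control in exchange for power.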
Statistical challenges with high dimensionality: feature selection in knowledge discovery
, 2006
False Discoveries in Mutual Fund Performance: Measuring Luck in Estimated Alphas
 Journal of Finance
, 2010
Cited by 65 (6 self)
Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses
 Ann. Stat.
, 2006
Cited by 60 (4 self)
We consider the problem of estimating the number of false null hypotheses among a very large number of independently tested hypotheses, focusing on the situation in which the proportion of false null hypotheses is very small. We propose a family of methods for establishing lower 100(1 − α)% confidence bounds for this proportion, based on the empirical distribution of the p-values of the tests. Methods in this family are then compared in terms of ability to consistently estimate the proportion by letting α → 0 as the number of hypothesis tests increases and the proportion decreases. This work is motivated by a signal detection problem that occurs in astronomy.
Size, power and false discovery rates
, 2007
Cited by 53 (4 self)
Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the fdr analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr's, theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, with power diagnostics showing why non-null cases might easily fail to appear on a list of “significant” discoveries.
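A minimal version of the empirical Bayes local fdr referred to above can be written down directly from the two-groups model, fdr(z) = π0·f0(z)/f(z). The sketch below assumes the theoretical N(0,1) null and a kernel estimate of the mixture density f; the paper also studies permutation and empirical null estimates, which this sketch omits.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

def local_fdr(z, pi0=1.0):
    """Local false discovery rate fdr(z) = pi0 * f0(z) / f(z), with the
    theoretical N(0,1) null and the mixture density f estimated by a
    Gaussian kernel density estimate (minimal sketch; pi0 = 1 is the
    conservative default)."""
    f_hat = gaussian_kde(z)       # estimate of the mixture density f
    f0 = norm.pdf(z)              # theoretical null density
    fdr = pi0 * f0 / f_hat(z)
    return np.minimum(fdr, 1.0)   # clip: an fdr is a posterior probability
```

Z-values far in the tails get small fdr (likely non-null), while z-values in the bulk are assigned fdr near one, which is the local analogue of the tail-area FDR calculation.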
False discovery control with p-value weighting
, 2006
Cited by 45 (2 self)
We present a method for multiple hypothesis testing that maintains control of the false discovery rate while incorporating prior information about the hypotheses. The prior information takes the form of p-value weights. If the assignment of weights is positively associated with the null hypotheses being false, the procedure improves power, except in cases where power is already near one. Even if the assignment of weights is poor, power is only reduced slightly, as long as the weights are not too large. We also provide a similar method for controlling false discovery exceedance.
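The weighting idea can be sketched as the standard p-value-weighting construction: run Benjamini–Hochberg on the weighted p-values p_i/w_i, with positive weights normalized to average one. This is an illustrative sketch of that construction, not necessarily the paper's exact procedure.

```python
import numpy as np

def weighted_bh(pvals, weights, alpha=0.05):
    """BH step-up applied to weighted p-values p_i / w_i, with positive
    weights rescaled to average one (sketch).  Returns indices of the
    rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w * len(w) / w.sum()      # normalize so the weights average to 1
    q = p / w                     # weighted p-values
    n = len(q)
    order = np.argsort(q)
    # Step-up: largest k with q_(k) <= alpha * k / n.
    below = np.nonzero(q[order] <= alpha * np.arange(1, n + 1) / n)[0]
    if below.size == 0:
        return np.array([], dtype=int)
    return np.sort(order[: below[-1] + 1])
```

In the example below, a well-chosen weight rescues hypothesis 1 (p = 0.04), which plain BH at α = 0.05 would not reject, matching the abstract's claim that informative weights buy power.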
Estimating the null and the proportion of non-null effects in large-scale multiple comparisons
 J. Amer. Statist. Assoc.
, 2007
Cited by 39 (6 self)
 Add to MetaCart
An important issue raised by Efron [7] in the context of large-scale multiple comparisons is that in many applications the usual assumption that the null distribution is known is incorrect, and seemingly negligible differences in the null may result in large differences in subsequent studies. This suggests that a careful study of estimation of the null is indispensable. In this paper, we consider the problem of estimating a null normal distribution, and a closely related problem, estimation of the proportion of non-null effects. We develop an approach based on the empirical characteristic function and Fourier analysis. The estimators are shown to be uniformly consistent over a wide class of parameters. Numerical performance of the estimators is investigated using both simulated and real data. In particular, we apply our …
Control of generalized error rates in multiple testing
 IEW Working Paper No. 245, Institute for Empirical Research in Economics (2005), available at HTTP://IDEAS.REPEC.ORG/P/ZUR/IEWWPX/245.HTML
, 2005
Cited by 37 (7 self)
Consider the problem of testing s hypotheses simultaneously. The usual approach to dealing with the multiplicity problem is to restrict attention to procedures that control the probability of even one false rejection, the familiar familywise error rate (FWER). In many applications, particularly if s is large, one might be willing to tolerate more than one false rejection if the number of such cases is controlled, thereby increasing the ability of the procedure to reject false null hypotheses. One possibility is to replace control of the FWER by control of the probability of k or more false rejections, which is called the k-FWER. We derive both single-step and step-down procedures that control the k-FWER in finite samples or asymptotically, depending on the situation. Lehmann and Romano (2005a) derive some exact methods for this purpose, which apply whenever p-values are available for individual tests; no assumptions are made on the joint dependence of the p-values. In contrast, we construct methods that implicitly take into account the dependence structure of the individual test statistics in order to further increase the ability to detect false null hypotheses. We also consider the false discovery proportion (FDP), defined as the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). The false discovery rate proposed by Benjamini and Hochberg (1995) controls E(FDP). Here, the goal is to construct methods which satisfy, for a given γ and α, P{FDP > γ} ≤ α, at least asymptotically.
Variance of the number of false discoveries
 JRSSB
, 2005
Cited by 36 (1 self)
In high-throughput genomic work, a very large number d of hypotheses are tested based on n ≪ d data samples. The large number of tests necessitates an adjustment for false discoveries in which a true null hypothesis was rejected. The expected number of false discoveries is easy to obtain. Dependencies among the hypothesis tests greatly affect the variance of the number of false discoveries. Assuming that the tests are independent gives an inadequate variance formula. This paper presents a variance formula that takes account of the correlations among test statistics. That formula involves O(d²) correlations, and so a naive implementation has cost O(nd²). A method based on sampling pairs of tests allows the variance to be approximated at a cost independent of d. Keywords: false discovery rate; microarrays; multiple comparisons; SNPs; step-down.
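The pair-sampling idea can be illustrated as follows: write the number of false discoveries as V = Σ X_i for rejection indicators X_i, so Var(V) is a diagonal sum of Bernoulli variances plus a sum of O(d²) pairwise covariances, and approximate the latter by Monte Carlo sampling of test pairs. The (d × B) matrix of rejection indicators over B resamples used here is an illustrative assumption of this sketch, not the paper's exact formula.

```python
import numpy as np

def var_false_discoveries(reject, n_pairs=2000, rng=None):
    """Approximate Var(V), the variance of the number of false discoveries,
    from a (d x B) 0/1 matrix of rejection indicators over B resamples.
    The O(d^2) cross-covariance term is approximated by sampling random
    pairs of tests, at a cost independent of d (sketch)."""
    rng = np.random.default_rng(rng)
    d, B = reject.shape
    means = reject.mean(axis=1)
    # Diagonal term: sum of per-test Bernoulli variances.
    diag = np.sum(means * (1 - means))
    # Off-diagonal term: mean covariance over sampled pairs, scaled to d(d-1).
    i = rng.integers(0, d, size=n_pairs)
    j = rng.integers(0, d, size=n_pairs)
    keep = i != j
    i, j = i[keep], j[keep]
    cov = np.mean((reject[i] * reject[j]).mean(axis=1) - means[i] * means[j])
    return diag + d * (d - 1) * cov
```

When the tests are perfectly correlated the cross term dominates (Var(V) = d²·p(1 − p) for identical Bernoulli(p) indicators), which is exactly the regime where the independence formula is most misleading.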