Results 1–10 of 48
Large-scale multiple testing under dependence
J. Roy. Stat. Soc. B, 2009
Cited by 25 (2 self)
Summary. The paper considers the problem of multiple testing under dependence in a compound decision theoretic framework. The observed data are assumed to be generated from an underlying two-state hidden Markov model. We propose oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate (FNR) subject to a constraint on the false discovery rate (FDR). It is shown that the performance of a multiple-testing procedure can be substantially improved by adaptively exploiting the dependence structure among hypotheses, and hence conventional FDR procedures that ignore this structural information are inefficient. Both theoretical properties and numerical performances of the procedures proposed are investigated. It is shown that the procedures proposed control FDR at the desired level, enjoy certain optimality properties and are especially powerful in identifying clustered non-null cases. The new procedure is applied to an influenza-like illness surveillance study for detecting the timing of epidemic periods.
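The oracle version of such a procedure ranks hypotheses by the posterior probability of the null state under the HMM (the local index of significance) and rejects the smallest values whose running mean stays at or below the FDR level. A minimal sketch, assuming a two-state HMM with known parameters, a standard-normal null and N(mu1, 1) non-null emissions; the transition matrix and mu1 below are illustrative, and this is a simplified oracle sketch rather than the authors' data-driven procedure:

```python
import math

def lis_scores(x, A, pi0, mu1):
    """Posterior P(state = null | x_1..x_n) for a two-state HMM via a scaled
    forward-backward pass.  State 0 emits N(0,1), state 1 emits N(mu1, 1)."""
    pdf = lambda v, m: math.exp(-0.5 * (v - m) ** 2) / math.sqrt(2 * math.pi)
    n = len(x)
    e = [(pdf(v, 0.0), pdf(v, mu1)) for v in x]
    # forward pass with per-step normalisation to avoid underflow
    alpha = [[pi0 * e[0][0], (1 - pi0) * e[0][1]]]
    c = [sum(alpha[0])]
    alpha[0] = [a / c[0] for a in alpha[0]]
    for t in range(1, n):
        row = [e[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in (0, 1))
               for j in (0, 1)]
        c.append(sum(row))
        alpha.append([r / c[t] for r in row])
    # backward pass, scaled with the same constants
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in (0, 1):
            beta[t][i] = sum(A[i][j] * e[t + 1][j] * beta[t + 1][j]
                             for j in (0, 1)) / c[t + 1]
    g = [(alpha[t][0] * beta[t][0], alpha[t][1] * beta[t][1]) for t in range(n)]
    return [g0 / (g0 + g1) for g0, g1 in g]

def lis_threshold(lis, q):
    """Reject the k smallest posterior null probabilities, where k is the
    largest rank at which their running mean is still at or below q."""
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    total, k = 0.0, 0
    for rank, i in enumerate(order, 1):
        total += lis[i]
        if total / rank <= q:
            k = rank
    return set(order[:k])
```

Because the ranking statistic is a posterior probability rather than a marginal p-value, two adjacent strong signals reinforce each other through the transition matrix, which is how the dependence structure is exploited.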
Proportion of non-zero normal means: universal oracle equivalences and uniformly consistent estimators
2007
Cited by 17 (8 self)
Summary. Since James and Stein's seminal work, the problem of estimating n normal means has received plenty of enthusiasm in the statistics community. Recently, driven by the fast expansion of the field of large-scale multiple testing, there has been a resurgence of research interest in the n normal means problem. The new interest, however, is more or less concentrated on testing n normal means: to determine simultaneously which means are 0 and which are not. In this setting, the proportion of the non-zero means plays a key role. Motivated by examples in genomics and astronomy, we are particularly interested in estimating the proportion of non-zero means, i.e. given n independent normal random variables X_j ∼ N(μ_j, 1), j = 1, …, n, to estimate the proportion ε_n = (1/n) #{j : μ_j ≠ 0}. We propose a general approach to construct the universal oracle equivalence of the proportion. The construction is based on the underlying characteristic function. The oracle equivalence reduces the problem of estimating the proportion to the problem of estimating the oracle, which is relatively easier to handle. In fact, the oracle equivalence naturally yields a family of estimators for the proportion, which are consistent under mild conditions, uniformly across a wide class of parameters. The approach compares favourably with recent works by Meinshausen and Rice, and Genovese and Wasserman. In particular, the consistency is proved for an unprecedentedly broad class of situations; the class is almost the largest that can be hoped for without further constraints on the model. We also discuss various extensions of the approach, report results on simulation experiments and make connections between the approach and several recent procedures in large-scale multiple testing, including the false discovery rate approach and the local false discovery rate approach.
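The characteristic-function idea can be illustrated with a toy statistic: for X ∼ N(μ, 1), E[cos(tX)] = cos(tμ)·exp(−t²/2), so E[cos(tX)·exp(t²/2)] = cos(tμ), which equals 1 exactly when μ = 0. Averaging 1 − cos(tX_j)·exp(t²/2) therefore vanishes in expectation on the null component and picks up mass only from the non-zero means. This toy statistic is not the paper's estimator (the actual construction averages over a range of frequencies t with a smoothing kernel to achieve uniform consistency), and the mixture parameters below are illustrative:

```python
import math, random

def cf_signal_mass(x, t=1.0):
    """Average of 1 - cos(t*X_j)*exp(t^2/2).  For X ~ N(mu, 1) each term has
    expectation 1 - cos(t*mu), which is 0 when mu = 0, so only the non-zero
    means contribute on average.  A single fixed t gives a reading biased by
    the factor (1 - cos(t*mu)); the oracle construction in the paper removes
    this by averaging over frequencies."""
    n = len(x)
    return sum(1.0 - math.cos(t * xi) * math.exp(0.5 * t * t) for xi in x) / n

# illustrative mixture: 20% of the means equal 3, the rest are 0
random.seed(0)
eps, mu = 0.2, 3.0
x = [random.gauss(mu if random.random() < eps else 0.0, 1.0)
     for _ in range(20000)]
est = cf_signal_mass(x, t=1.0)
```

Here the single-frequency reading concentrates near ε·(1 − cos(tμ)) rather than ε itself, which is exactly why the paper works with the frequency-averaged oracle instead.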
False discovery rate control with multivariate p-values
Electron. J. Statist., 2008
Cited by 7 (4 self)
In multiple testing, hypotheses can often be assessed by multivariate test statistics. We study how to use such statistics to control the false discovery rate (FDR) with reasonable power and capacity to control the so-called positive FDR (pFDR). We show that this can be done by using nested regions of multivariate p-values derived from the test statistics. If the distributions of the test statistics are known, then, by choosing the regions appropriately, the FDR can be controlled while the power is maximized. Our focus, however, is how to select nested regions when the distributions of the test statistics are only partially known. Under certain conditions, the procedure based on our proposed nested regions has asymptotically maximum power as its pFDR control level approaches the attainable limit. The dependency of the procedure's performance on its parameters is studied. The procedure is also compared with those based on nested regions that are easier to construct, as well as those based on more straightforward combinations of the test statistics. Both analysis and simulation give favorable results for the former.
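One of the "more straightforward combinations" the abstract uses as a comparator can be sketched directly: assuming each hypothesis yields a pair of independent N(0,1) statistics under the null, sum their squares to get a chi-square(2) statistic, convert it to a univariate p-value with the survival function exp(−x/2), and run the Benjamini-Hochberg step-up. This is the simple baseline, not the paper's nested-region procedure:

```python
import math

def chi2_combine(z_pairs):
    """Map each pair of test statistics to one p-value, assuming both
    coordinates are independent N(0,1) under the null: z1^2 + z2^2 is then
    chi-square with 2 df, whose survival function is exp(-x/2)."""
    return [math.exp(-0.5 * (z1 * z1 + z2 * z2)) for z1, z2 in z_pairs]

def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: reject the k smallest p-values for the
    largest k with p_(k) <= q*k/n."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, 1):
        if pvals[i] <= q * rank / n:
            k = rank
    return set(order[:k])
```

The nested-region approach generalizes this: the chi-square contours are just one particular family of nested regions, and it is only optimal when the null really is a spherical bivariate normal.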
Semiparametric Differential Expression Analysis via Partial Mixture Estimation
2007
Cited by 6 (1 self)
We develop an approach for microarray differential expression analysis, i.e. identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situations, however, a full parametric model cannot be justified, or the sample size per group is too small for permutation methods to be valid. We propose a semiparametric framework based on partial mixture estimation which only requires a parametric assumption for the null (equally expressed) distribution and can handle small sample sizes where permutation methods break down. We develop two novel improvements of Scott's minimum integrated square error criterion for partial mixture estimation [Scott, 2004a,b]. As a side benefit, we obtain interpretable and closed-form estimates for the proportion of EE genes. Pseudo-Bayesian and frequentist procedures for controlling the false discovery rate are given. Results from simulations and real datasets indicate that our approach can provide substantial …
Weighted hypothesis testing
2006
Cited by 5 (0 self)
The power of multiple testing procedures can be increased by using weighted p-values (Genovese, Roeder and Wasserman 2005). We derive the optimal weights and we show that the power is remarkably robust to misspecification of these weights. We consider two methods for choosing weights in practice. The first, external weighting, is based on prior information. The second, estimated weighting, uses the data to choose weights.
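Under the external-weighting scheme of Genovese, Roeder and Wasserman, each p-value is divided by a prior weight w_i with mean 1 and the standard Benjamini-Hochberg step-up is applied to p_i/w_i. A minimal sketch; the weights in the usage example are illustrative, whereas the paper derives the optimal choice:

```python
def weighted_bh(pvals, weights, q):
    """Weighted Benjamini-Hochberg: rescale each p-value by its prior weight
    (the weights must average to 1), then run the usual step-up rule on the
    weighted p-values p_i / w_i."""
    n = len(pvals)
    assert abs(sum(weights) / n - 1.0) < 1e-9, "weights must have mean 1"
    wp = [p / w for p, w in zip(pvals, weights)]
    order = sorted(range(n), key=lambda i: wp[i])
    k = 0
    for rank, i in enumerate(order, 1):
        if wp[i] <= q * rank / n:
            k = rank
    return set(order[:k])
```

Up-weighting a hypothesis (w_i > 1) shrinks its weighted p-value and makes it easier to reject; since the weights average to 1, the extra power there is paid for by down-weighted hypotheses, which is why mild misspecification costs little.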
The joint null criterion for multiple hypothesis tests
Stat. Appl. Genet. Mol. Biol., 2011
Cited by 4 (3 self)
Simultaneously performing many hypothesis tests is a problem commonly encountered in high-dimensional biology. In this setting, a large set of p-values is calculated from many related features measured simultaneously. Classical statistics provides a criterion for defining what a "correct" p-value is when performing a single hypothesis test. We show here that even when each p-value is marginally correct under this single hypothesis criterion, it may be the case that the joint behavior of the entire set of p-values is problematic. On the other hand, there are cases where each p-value is marginally incorrect, yet the joint distribution of the set of p-values is satisfactory. Here, we propose a criterion defining a well-behaved set of simultaneously calculated p-values that provides precise control of common error rates, and we introduce diagnostic procedures for assessing whether the criterion is satisfied with simulations. Multiple testing p-values that satisfy our new criterion avoid potentially large study-specific errors, but also satisfy the usual assumptions for strong control of false discovery rates and family-wise error rates. We utilize the new criterion and proposed diagnostics to investigate two common issues in high-dimensional multiple testing for genomics: dependent multiple hypothesis tests and pooled versus test-specific null distributions.
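One elementary check in this spirit: a jointly well-behaved set of null p-values should look uniform as a set, which can be probed with the Kolmogorov-Smirnov distance between their empirical CDF and the Uniform(0,1) CDF. This is a simplified stand-in for the paper's simulation-based diagnostics, not the proposed criterion itself:

```python
def ks_uniform_distance(pvals):
    """Kolmogorov-Smirnov distance between the empirical CDF of a set of
    p-values and the Uniform(0,1) CDF.  Large values flag a set whose joint
    behaviour is off, even if each p-value is marginally valid."""
    n = len(pvals)
    s = sorted(pvals)
    return max(max((i + 1) / n - s[i], s[i] - i / n) for i in range(n))
```

Dependence between tests is exactly what this catches: each marginal p-value can be Uniform(0,1) on its own while a correlated batch piles up near 0 or 1 in any single study, producing the large study-specific errors the abstract describes.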
Robust Bayesian FDR control with Bayes factors
arXiv preprint arXiv:1311.3981, 2013
Multiple testing in disease mapping and descriptive epidemiology
Cited by 1 (0 self)
The problem of multiple testing is rarely addressed in disease mapping or descriptive epidemiology. This issue is relevant when a large number of small areas or diseases are analysed. Control of the family-wise error rate (FWER), for example via the Bonferroni correction, is avoided because it leads to loss of statistical power. To overcome such difficulties, control of the false discovery rate (FDR), the expected proportion of false rejections among all rejected hypotheses, was proposed in the context of clinical trials and genomic data analysis. FDR has a Bayesian interpretation and it is the basis of the so-called q-value, the Bayesian counterpart of the p-value. In the present work, we address the multiplicity problem in disease mapping and show the performance of the FDR approach with two real examples and a small simulation study. The examples consider testing multiple diseases for a given area or multiple areas for a given disease. Using unadjusted p-values for multiple testing, an inappropriately large number of areas or diseases at altered risk is identified, whilst FDR procedures are appropriate and more powerful than control of the FWER with the Bonferroni correction. We conclude that the FDR approach is adequate to screen for high/low-risk areas or for disease excess/deficit, and useful as a complementary procedure to point estimates and confidence intervals.
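The power gap described above is easy to see in a sketch comparing the Bonferroni FWER correction with the Benjamini-Hochberg FDR step-up on the same p-values; the p-values in the usage example are illustrative, e.g. one per small area:

```python
def bonferroni_reject(pvals, alpha):
    """Reject H_i when p_i <= alpha/n (controls the family-wise error rate)."""
    n = len(pvals)
    return {i for i, p in enumerate(pvals) if p <= alpha / n}

def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up (controls the false discovery rate):
    reject the k smallest p-values for the largest k with p_(k) <= q*k/n."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, 1):
        if pvals[i] <= q * rank / n:
            k = rank
    return set(order[:k])
```

With ten areas and p = [0.001, 0.004, 0.008, 0.012, 0.3, …] at level 0.05, Bonferroni's fixed cutoff alpha/n = 0.005 keeps only the two smallest p-values, while the BH step-up also recovers the next two, illustrating the extra power at the cost of controlling FDR rather than FWER.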