Results 1–10 of 45
Oracle and adaptive compound decision rules for false discovery rate control
 J. Am. Statist. Assoc.
, 2007
Abstract

Cited by 48 (7 self)
We develop a compound decision theory framework for multiple-testing problems and derive an oracle rule based on the z-values that minimizes the false nondiscovery rate (FNR) subject to a constraint on the false discovery rate (FDR). We show that many commonly used multiple-testing procedures, which are p-value-based, are inefficient, and propose an adaptive procedure based on the z-values. The z-value-based adaptive procedure asymptotically attains the performance of the z-value oracle procedure and is more efficient than the conventional p-value-based methods. We investigate the numerical performance of the adaptive procedure using both simulated and real data. In particular, we demonstrate our method in an analysis of microarray data from a human immunodeficiency virus study that involves testing a large number of hypotheses simultaneously.
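The adaptive z-value procedure described above ranks hypotheses by a local-FDR-type statistic and rejects the largest set whose average statistic stays below the FDR level. A minimal sketch, assuming a known two-group normal mixture (in practice the mixture parameters would be estimated from the data):

```python
import numpy as np

def norm_pdf(x, loc=0.0, scale=1.0):
    """Normal density, to avoid an external dependency."""
    z = (x - loc) / scale
    return np.exp(-0.5 * z * z) / (scale * np.sqrt(2.0 * np.pi))

def adaptive_z_procedure(z, p0, mu1, sigma1, alpha=0.10):
    """Sketch of a z-value-based (local-FDR) rule in the spirit of the
    abstract.  Assumes a two-group mixture f(z) = p0*N(0,1) +
    (1-p0)*N(mu1, sigma1^2) with known parameters; the real adaptive
    procedure estimates these quantities from the data."""
    z = np.asarray(z)
    f0 = norm_pdf(z)                          # null density
    f1 = norm_pdf(z, loc=mu1, scale=sigma1)   # alternative density
    f = p0 * f0 + (1 - p0) * f1               # mixture density
    lfdr = p0 * f0 / f                        # local FDR statistic
    order = np.argsort(lfdr)
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, len(z) + 1)
    hits = np.nonzero(running_mean <= alpha)[0]
    k = hits.max() + 1 if hits.size else 0    # largest set with avg lfdr <= alpha
    reject = np.zeros(len(z), dtype=bool)
    reject[order[:k]] = True
    return reject
```

Ranking by z-value statistics rather than p-values is exactly what lets the rule exploit asymmetric or structured alternatives that p-value ordering ignores.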
Learning a Prior on Regulatory Potential from eQTL Data
Abstract

Cited by 28 (6 self)
Genome-wide RNA expression data provide a detailed view of an organism’s biological state; hence, a dataset measuring expression variation between genetically diverse individuals (eQTL data) may provide important insights into the genetics of complex traits. However, with data from a relatively small number of individuals, it is difficult to distinguish true causal polymorphisms from the large number of possibilities. The problem is particularly challenging in populations with significant linkage disequilibrium, where traits are often linked to large chromosomal regions containing many genes. Here, we present a novel method, Lirnet, that automatically learns a regulatory potential for each sequence polymorphism, estimating how likely it is to have a significant effect on gene expression. This regulatory potential is defined in terms of “regulatory features”—including the function of the gene and the conservation, type, and position of genetic polymorphisms—that are available for any organism. The extent to which the different features influence the regulatory potential is learned automatically, making Lirnet readily applicable to different datasets, organisms, and feature sets. We apply Lirnet both to the human HapMap eQTL dataset and to a yeast eQTL dataset and provide statistical and biological results demonstrating that Lirnet produces significantly better regulatory programs than other recent approaches. We demonstrate in the yeast data that Lirnet can correctly suggest a specific causal sequence variation within a large, linked chromosomal region. In one example, Lirnet uncovered a novel, experimentally validated connection between Puf3—a
Simultaneous Inference: When Should Hypothesis Testing Problems Be Combined?
Abstract

Cited by 24 (3 self)
Modern statisticians are often presented with hundreds or thousands of hypothesis testing problems to evaluate at the same time, generated by new scientific technologies such as microarrays, medical and satellite imaging devices, or flow cytometry counters. The relevant statistical literature tends to begin with the tacit assumption that a single combined analysis, for instance a false discovery rate assessment, should be applied to the entire set of problems at hand. This can be a dangerous assumption, as the examples in the paper show, leading to overly conservative or overly liberal conclusions within any particular subclass of the cases. A simple Bayesian theory yields a succinct description of the effects of separation or combination on false discovery rate analyses. The theory allows efficient testing within small subclasses, and has applications to “enrichment”, the detection of multi-case effects. Key words: false discovery rates, two-class model, enrichment.
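The combined-versus-separate issue is easy to see with a toy experiment (the data and class split below are illustrative assumptions, not Efron's Bayesian analysis): pooling a signal-rich class with a noise-only class raises the Benjamini–Hochberg threshold for everyone, so the combined analysis treats the noise class more liberally than a separate analysis would.

```python
import numpy as np

def bh_reject(p, alpha=0.10):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where
    k is the largest index with p_(k) <= alpha * k / m."""
    p = np.asarray(p)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

rng = np.random.default_rng(0)
# Class A: 20 strong signals plus noise; class B: pure noise (toy data).
p_a = np.concatenate([rng.uniform(0, 1e-4, 20), rng.uniform(size=80)])
p_b = rng.uniform(size=100)

combined = bh_reject(np.concatenate([p_a, p_b]))
separate_a = bh_reject(p_a)
separate_b = bh_reject(p_b)
# Pooling lets class A's strong signals raise the rejection threshold,
# typically admitting more class-B noise than a separate class-B analysis.
```

Whether such a pooled analysis is appropriate for a given subclass is precisely the question the paper's two-class Bayesian model makes precise.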
Two simple sufficient conditions for FDR control
 Electron. J. Stat.
, 2008
Abstract

Cited by 10 (5 self)
We show that control of the false discovery rate (FDR) for a multiple testing procedure is implied by two coupled simple sufficient conditions. The first, which we call the “self-consistency condition”, concerns the algorithm itself; the second, called the “dependency control condition”, relates to the dependency assumptions on the p-value family. Many standard multiple testing procedures are self-consistent (e.g. step-up, step-down or step-up-down procedures), and we prove that the dependency control condition can be fulfilled by choosing correspondingly appropriate rejection functions, under three classical types of dependency: independence, positive dependency (PRDS) and unspecified dependency. As a consequence, we recover earlier results through simple and unifying proofs while extending their scope in several respects: weighted FDR, p-value reweighting, a new family of step-up procedures under unspecified p-value dependency, and adaptive step-up procedures. We give additional examples of other possible applications. This framework also allows for defining and studying FDR control for multiple testing procedures over a continuous, uncountable space of hypotheses.
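Specialized to the linear (BH-type) rejection function, the self-consistency condition is easy to state and check: every rejected p-value must lie below alpha·|R|/m, where R is the rejection set. A minimal sketch of that check (the linear specialization is an assumption for illustration; the paper treats general rejection functions):

```python
import numpy as np

def is_self_consistent(p, rejected, alpha):
    """Check the self-consistency condition for the linear (BH-type)
    rejection function: each rejected p-value must satisfy
    p_i <= alpha * |R| / m.  BH output satisfies this by construction;
    an over-greedy rejection set fails it."""
    p = np.asarray(p)
    rejected = np.asarray(rejected, dtype=bool)
    m = len(p)
    r = int(rejected.sum())
    return bool(np.all(p[rejected] <= alpha * r / m))
```

The point of the paper is that self-consistency plus a dependency control condition on the p-values already implies FDR control, without analyzing each procedure separately.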
False Discovery Rate Control with Groups
Abstract

Cited by 9 (0 self)
In the context of large-scale multiple hypothesis testing, the hypotheses often possess certain group structures based on additional information such as Gene Ontology in gene expression data and phenotypes in genome-wide association studies. It is hence desirable to incorporate such information when dealing with multiplicity problems to increase statistical power. In this article, we demonstrate the benefit of considering group structure by presenting a p-value weighting procedure which utilizes the relative importance of each group while controlling the false discovery rate under weak conditions. The procedure is easy to implement and shown to be more powerful than the classical Benjamini–Hochberg procedure in both theoretical and simulation studies. By estimating the proportion of true null hypotheses, the data-driven procedure controls the false discovery rate asymptotically. Our analysis of one breast cancer data set confirms that the procedure performs favorably
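The generic shape of a group-weighted BH procedure is: divide each p-value by its group's weight (weights normalized so the average over hypotheses is one), then run ordinary BH on the weighted p-values. A sketch under that generic scheme (the specific weight choice here is an illustrative assumption, not necessarily the paper's data-driven rule):

```python
import numpy as np

def group_weighted_bh(p, groups, weights, alpha=0.05):
    """Group-weighted BH sketch: weight each p-value by its group's
    weight, renormalize so weights average 1, then apply the BH step-up
    rule to the weighted p-values p_i / w_i."""
    p = np.asarray(p, dtype=float)
    w = np.asarray([weights[g] for g in groups], dtype=float)
    w = w * len(p) / w.sum()          # normalize: mean weight = 1
    q = p / w                         # weighted p-values
    m = len(q)
    order = np.argsort(q)
    below = q[order] <= alpha * np.arange(1, m + 1) / m
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

Groups believed to be enriched for signal get weight above one, relaxing their effective threshold at the expense of the other groups.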
Using functional annotation for the empirical determination of Bayes factors for genomewide association study analysis
 PLoS One
, 2011
Abstract

Cited by 7 (0 self)
A genome-wide association study (GWAS) typically results in a few highly significant ‘hits’ and a much larger set of suggestive signals (‘near-hits’). The latter group is expected to be a mixture of true and false associations. One promising strategy to help separate these is to use functional annotations to prioritise variants for follow-up. A key task is to determine which annotations might prove most valuable. We address this question by examining the functional annotations of previously published GWAS hits. We explore three annotation categories: nonsynonymous SNPs (nsSNPs), promoter SNPs, and cis expression quantitative trait loci (eQTLs) in open chromatin regions. We demonstrate that GWAS hit SNPs are enriched for these three functional categories, and that it would be appropriate to give such SNPs a higher weighting when performing Bayesian association analyses. For GWAS studies, our analyses suggest the use of a Bayes
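The weighting idea amounts to scaling a SNP's prior probability of association by an annotation-dependent factor before combining it with the Bayes factor. A toy sketch (the prior and weight values below are illustrative assumptions, not the paper's empirically determined numbers):

```python
def posterior_association(bayes_factor, base_prior=1e-4, annotation_weight=1.0):
    """Toy annotation-weighted Bayesian prioritisation: multiply the
    baseline prior probability of association by a weight reflecting
    the SNP's functional category (e.g. > 1 for nsSNPs, promoter SNPs,
    or eQTLs in open chromatin), then combine with the Bayes factor via
    posterior odds = BF * prior odds."""
    prior = min(base_prior * annotation_weight, 0.99)
    prior_odds = prior / (1.0 - prior)
    post_odds = bayes_factor * prior_odds
    return post_odds / (1.0 + post_odds)
```

Under this scheme, two near-hits with identical Bayes factors can end up with very different posterior probabilities once their annotations differ.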
Weighted hypothesis testing
, 2006
Abstract

Cited by 5 (0 self)
The power of multiple testing procedures can be increased by using weighted p-values (Genovese, Roeder and Wasserman 2005). We derive the optimal weights and we show that the power is remarkably robust to misspecification of these weights. We consider two methods for choosing weights in practice. The first, external weighting, is based on prior information. The second, estimated weighting, uses the data to choose weights.
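The weighted p-value scheme of Genovese, Roeder and Wasserman is simplest to see in its Bonferroni form: with weights averaging one, hypothesis i is rejected when p_i/w_i ≤ α/m, so a weight above one relaxes that hypothesis's threshold at the expense of the others. A minimal sketch:

```python
import numpy as np

def weighted_bonferroni(p, w, alpha=0.05):
    """Weighted Bonferroni in the spirit of Genovese, Roeder and
    Wasserman (2005): normalize the weights to average 1, then reject
    H_i when the weighted p-value p_i / w_i is at most alpha / m."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w * len(p) / w.sum()          # enforce mean weight 1
    return p / w <= alpha / len(p)
```

The robustness result in the abstract says mis-chosen weights cost little power, which is what makes external (prior-information) weighting practical.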
Alpha-investing: A procedure for sequential control of expected false discoveries
, 2007
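The alpha-investing idea: the tester starts with an alpha-wealth, spends part of it on each sequential test, loses the spend on an acceptance, and earns a payout on each rejection, keeping expected false discoveries (mFDR) controlled. A sketch assuming a simple spending rule; the rule of investing half the current wealth is an illustrative choice, not prescribed by the paper:

```python
def alpha_investing(pvalues, alpha=0.05, payout=None):
    """Sketch of an alpha-investing scheme: sequential testing with an
    alpha-wealth that shrinks on acceptances and grows on rejections."""
    wealth = alpha                                # initial alpha-wealth
    omega = alpha if payout is None else payout   # reward per rejection
    decisions = []
    for p in pvalues:
        spend = min(0.5 * wealth, 0.05)           # spending rule (assumed)
        reject = p <= spend
        if reject:
            wealth += omega                       # earn the payout
        else:
            wealth -= spend / (1.0 - spend)       # pay for the test
        decisions.append(reject)
    return decisions
```

Because rejections replenish the wealth, early discoveries buy the power to keep testing, which is what makes the procedure usable on an unbounded stream of hypotheses.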
Graphical-Model-Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies
Abstract

Cited by 3 (2 self)
Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests remains a challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure based on a Markov-random-field-coupled mixture model. The ground truth of hypotheses is represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The parameters in our model can be automatically learned by a novel EM algorithm. We use an MCMC algorithm to infer the posterior probability that each hypothesis is null (termed the local index of significance), and the false discovery rate can be controlled accordingly. Simulations show that the numerical performance of multiple testing can be improved substantially by using our procedure. We apply the procedure to a real-world genome-wide association study on breast cancer, and we identify several SNPs with strong association evidence.
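Once the MCMC step has produced a local index of significance (posterior null probability) per hypothesis, turning those values into an FDR-controlling decision is the standard ranking step: reject the hypotheses with the smallest LIS values while their running average stays below the target level. A sketch of that final step only (the MRF/EM machinery upstream is the paper's contribution and is not reproduced here):

```python
import numpy as np

def lis_threshold(lis, alpha=0.10):
    """Given posterior null probabilities (LIS values), reject the k
    hypotheses with smallest LIS such that the running average of the
    rejected LIS values stays at or below alpha."""
    lis = np.asarray(lis)
    order = np.argsort(lis)
    avg = np.cumsum(lis[order]) / np.arange(1, len(lis) + 1)
    hits = np.nonzero(avg <= alpha)[0]
    k = hits.max() + 1 if hits.size else 0
    reject = np.zeros(len(lis), dtype=bool)
    reject[order[:k]] = True
    return reject
```

The average of the rejected posterior null probabilities is a direct estimate of the FDR of the rejection set, which is why this thresholding controls it.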
Self-consistent multiple testing procedures
, 2008
Abstract

Cited by 3 (3 self)
We study the control of the false discovery rate (FDR) for a general class of multiple testing procedures. We introduce a general condition, called “self-consistency”, on the set of hypotheses rejected by the procedure, which we show is sufficient to ensure control of the corresponding false discovery rate under various conditions on the distribution of the p-values. Maximizing the size of the rejected null hypothesis set under the constraint of self-consistency, we recover various step-up procedures. As a consequence, we recover earlier results through simple and unifying proofs while extending their scope in several respects: arbitrary measures of set size, p-value reweighting, and a new family of step-up procedures under unspecified p-value dependency. Our framework also allows for defining and studying FDR control for multiple testing procedures over a continuous, uncountable space of hypotheses.