Results 1-10 of 25
False discovery control in large-scale spatial multiple testing. Manuscript submitted for publication, 2012
Cited by 3 (2 self)
Summary. This article develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both pointwise and clusterwise spatial analyses, and derive oracle procedures that optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be implemented efficiently using Bayesian computational algorithms for the analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power than conventional methods. We demonstrate our methods by analyzing time trends in tropospheric ozone in the eastern US.
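The oracle procedures in this abstract reduce, in the pointwise case, to ranking hypotheses by their posterior null probabilities (local false discovery rates) and rejecting the largest set whose running average stays below the target level. A minimal sketch of that thresholding step, assuming the lfdr values are already available (the `lfdr_stepup` name and the toy values are illustrative, not from the paper):

```python
import numpy as np

def lfdr_stepup(lfdr, alpha=0.05):
    """Rank hypotheses by local false discovery rate and reject the
    largest prefix whose running mean lfdr stays at or below alpha.
    The running mean of the rejected lfdr values estimates the FDR."""
    order = np.argsort(lfdr)
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, len(lfdr) + 1)
    k = int(np.sum(running_mean <= alpha))   # number of rejections
    reject = np.zeros(len(lfdr), dtype=bool)
    reject[order[:k]] = True
    return reject

# toy example: 5 clear signals (tiny lfdr) among 20 likely nulls
scores = np.concatenate([np.full(5, 0.01), np.full(20, 0.9)])
reject = lfdr_stepup(scores, alpha=0.1)      # rejects exactly the 5 signals
```

Adding a sixth rejection would pull the running mean to (0.05 + 0.9)/6 ≈ 0.16 > 0.1, so the procedure stops at five.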
Graphical-Model-Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies
Cited by 3 (2 self)
Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests remains a challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure based on a Markov-random-field-coupled mixture model. The ground truth of the hypotheses is represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The parameters of our model can be learned automatically by a novel EM algorithm. We use an MCMC algorithm to infer the posterior probability that each hypothesis is null (termed the local index of significance), and the false discovery rate can be controlled accordingly. Simulations show that the numerical performance of multiple testing can be improved substantially by our procedure. We apply the procedure to a real-world genome-wide association study on breast cancer and identify several SNPs with strong association evidence.
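The MCMC step this abstract describes can be illustrated on the simplest version of the model: a chain-structured Ising prior over the latent null/non-null labels, coupled with a two-component Gaussian mixture for the statistics. The Gibbs sampler below estimates each site's local index of significance; the function name, the chain topology, and all parameter values are assumptions made for the sketch, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_lis(x, beta=1.0, mu=2.0, n_iter=2000, burn=500):
    """Estimate LIS_i = P(theta_i = 0 | x) by Gibbs sampling.
    Prior: chain Ising field rewarding agreeing neighbours with beta.
    Emissions: x_i ~ N(0, 1) if theta_i = 0, x_i ~ N(mu, 1) if theta_i = 1."""
    m = len(x)
    theta = rng.integers(0, 2, size=m)
    null_count = np.zeros(m)
    for it in range(n_iter):
        for i in range(m):
            # each neighbour adds +beta to its own side of the log-odds
            field = sum(beta * (2 * theta[j] - 1)
                        for j in (i - 1, i + 1) if 0 <= j < m)
            llr = mu * x[i] - mu**2 / 2   # log-likelihood ratio, alt vs null
            p1 = 1.0 / (1.0 + np.exp(-(llr + field)))
            theta[i] = rng.random() < p1
        if it >= burn:
            null_count += (theta == 0)
    return null_count / (n_iter - burn)

# a block of strong signals embedded in a null background
lis = gibbs_lis(np.array([0., 0., 0., 3., 3., 3., 0., 0.]))
```

The LIS values at the middle block come out near zero and near one elsewhere; feeding them into a step-up rule then controls the FDR, as in the paper.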
Optimality of the Westfall-Young permutation procedure for multiple testing under dependence, 2011
Cited by 3 (0 self)
Test statistics are often strongly dependent in large-scale multiple testing applications. Most corrections for multiplicity are unduly conservative for correlated test statistics, resulting in a loss of power to detect true positives. We show that the Westfall-Young permutation method has asymptotically optimal power for a broad class of testing problems with a block-dependence and sparsity structure among the tests, when the number of tests tends to infinity.
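The Westfall-Young procedure adapts to the dependence automatically because it permutes whole samples and records the maximum statistic across all tests. A minimal single-step (max-T) sketch, with a two-sample mean difference as the per-test statistic; the function name and the toy data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def westfall_young_maxt(X, y, n_perm=500):
    """Single-step Westfall-Young adjustment: permute group labels,
    record the maximum statistic over all tests per permutation, and
    compare each observed statistic to that null-maximum distribution.
    X: n-by-m data matrix; y: +/-1 group labels."""
    def stat(labels):
        return np.abs(X[labels == 1].mean(axis=0) - X[labels == -1].mean(axis=0))
    t_obs = stat(y)
    max_null = np.array([stat(rng.permutation(y)).max() for _ in range(n_perm)])
    # taking the max over tests preserves their joint dependence
    return (1 + (max_null[:, None] >= t_obs[None, :]).sum(axis=0)) / (1 + n_perm)

# toy data: 10 tests, only column 0 carries a real group difference
y = np.repeat([1, -1], 30)
X = rng.standard_normal((60, 10))
X[:, 0] += 1.5 * y
p_adj = westfall_young_maxt(X, y)   # p_adj[0] is small; the rest are not
```

This is the max-T variant; the min-p form in the title permutes p-values instead, which coincides with max-T when the test statistics are identically distributed under the null.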
Variational Bayes approach for model aggregation in unsupervised classification with Markovian dependency. Comput. Statist. & Data Analysis, 2012
Cited by 1 (0 self)
We consider a binary unsupervised classification problem where each observation is associated with an unobserved label that we want to retrieve. More precisely, we assume that there are two groups of observations: normal and abnormal. The ‘normal’ observations come from a known distribution, whereas the distribution of the ‘abnormal’ observations is unknown. Several models have been developed to fit this unknown distribution; in this paper, we propose an alternative based on a mixture of Gaussian distributions. Inference is carried out within a variational Bayesian framework, and our aim is to infer the posterior probability of belonging to the class of interest. Since each mixture model provides more or less relevant information for estimating this posterior probability, there is no need to commit to a single number of mixture components. By computing a weighted average (called the aggregated estimator) over the model collection, Bayesian Model Averaging (BMA) combines the models so as to account for the information provided by each of them. The task is then to estimate the weights and the posterior probability under each specific model. In this work, we derive optimal approximations of these quantities from variational theory and propose further approximations of the weights. Because the data are dependent (Markovian dependency), we work with a hidden Markov model. A simulation study is carried out to evaluate the accuracy of the estimates in terms of classification, and we present an application to the analysis of public health surveillance systems.
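The aggregation step described above is, at its core, an evidence-weighted average of per-model posterior probabilities. A minimal sketch, assuming the (variational approximations of the) log marginal likelihoods are already computed; the function name and the numbers are illustrative:

```python
import numpy as np

def aggregate_posteriors(log_evidences, posteriors):
    """Bayesian model averaging: weight each candidate model's posterior
    class probabilities by its normalised marginal likelihood.  In the
    paper's setting the evidences come from variational lower bounds;
    here they are simply given numbers.
    log_evidences: length-K array; posteriors: K-by-n array."""
    log_w = np.asarray(log_evidences, dtype=float)
    log_w -= log_w.max()              # stabilise the exponentials
    w = np.exp(log_w)
    w /= w.sum()
    return w @ np.asarray(posteriors)  # aggregated posterior per observation

# two models; the second has three times the evidence of the first,
# so the weights are (0.25, 0.75)
agg = aggregate_posteriors([0.0, np.log(3.0)],
                           [[0.2, 0.8],
                            [0.6, 0.4]])   # -> [0.5, 0.5]
```

Subtracting the maximum log evidence before exponentiating is the standard log-sum-exp trick; without it, evidences from long sequences underflow to zero.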
ASYMPTOTIC OPTIMALITY OF THE WESTFALL–YOUNG PERMUTATION PROCEDURE FOR MULTIPLE TESTING UNDER DEPENDENCE
Statistical Methods for Genome-wide Association Studies and Personalized Medicine, 2014
Cited by 1 (1 self)
In genome-wide association studies (GWAS), researchers analyze genetic variation across the entire human genome, searching for variants that are associated with observable traits or certain diseases. GWAS poses several inference challenges, including the huge number of genetic markers to test, the weak association between truly associated markers and the traits, and the correlation structure between the genetic markers. This thesis develops statistical methods suitable for genome-wide association studies and their clinical translation for personalized medicine. After introducing background and related work in Chapters 1 and 2, we discuss the problem of high-dimensional statistical inference, especially capturing the dependence among multiple hypotheses, which has been underutilized in classical multiple testing procedures. Chapter 3 proposes a feature selection approach based on a graphical model that can leverage the correlation structure among the markers; this graphical-model-based feature selection approach significantly outperforms the conventional feature selection methods used in GWAS. Chapter 4 reformulates this feature selection approach as a multiple testing procedure that has many elegant …
Integrating Prior Knowledge in Multiple Testing Under Dependence with Applications in Detecting Differential DNA Methylation. Biometrics, 2012
Cited by 1 (0 self)
DNA methylation has emerged as an important hallmark of epigenetics. Numerous platforms, including tiling arrays and next-generation sequencing, and experimental protocols are available for profiling DNA methylation. Like other tiling array data, DNA methylation data exhibit an inherent correlation structure among nearby probes. However, unlike gene expression or protein-DNA binding data, the varying CpG density, which gives rise to the CpG island, shore and shelf definitions, provides exogenous information for detecting differential methylation. This paper introduces a robust testing and probe-ranking procedure based on a non-homogeneous hidden Markov model that incorporates the above features for detecting differential methylation. We revisit the seminal work of Sun and Cai (2009, J. R. Stat. Soc. B 71, 393-424) and propose modeling the non-null distribution with a nonparametric symmetric distribution in two-sided hypothesis testing. We show that this model improves probe ranking and is robust to model misspecification, based on extensive simulation studies. We further illustrate that our proposed framework achieves good operating characteristics compared with commonly used methods on real DNA methylation data aimed at detecting differential methylation sites.
The optimal power puzzle: scrutiny of the monotone … Biometrika, 2013
In single hypothesis testing, power is a non-decreasing function of the Type I error rate; hence it is desirable to test at exactly the nominal level to achieve optimal power. The optimal power puzzle arises from the fact that, for multiple testing under the false discovery rate paradigm, such a monotonic relationship may not hold. In particular, exact false discovery rate control may lead to a less powerful testing procedure if a test statistic fails to fulfil the monotone likelihood ratio condition. In this article, we identify different scenarios in which the condition fails and give caveats for conducting multiple testing in practical settings. Key words: False discovery rate; Heteroscedasticity; Monotone likelihood ratio; Multiple testing dependence.
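The failure of the monotone likelihood ratio condition can be checked numerically in a toy heteroscedastic two-group model: when the alternative density is narrower than the null, a statistic with a smaller p-value can have a larger local false discovery rate, so the two rankings disagree. This construction is an illustrative assumption, not the article's exact example:

```python
import numpy as np
from math import erfc, sqrt

def lfdr(z, pi1=0.2, sigma1=0.5):
    """Local false discovery rate in a two-group model whose alternative
    N(0, sigma1^2) is narrower than the N(0, 1) null (toy example)."""
    f0 = np.exp(-z**2 / 2) / sqrt(2 * np.pi)
    f1 = np.exp(-z**2 / (2 * sigma1**2)) / (sigma1 * sqrt(2 * np.pi))
    return (1 - pi1) * f0 / ((1 - pi1) * f0 + pi1 * f1)

def pval(z):
    """Two-sided p-value of a z-statistic under the N(0,1) null."""
    return erfc(abs(z) / sqrt(2))

# |z| = 2.0 has the smaller p-value but the LARGER lfdr than |z| = 0.2,
# so ranking by p-value contradicts ranking by lfdr: MLR fails here.
```

With these parameters, lfdr(0.2) ≈ 0.68 while lfdr(2.0) ≈ 0.999: the "more significant" statistic is the one more likely to be a null, which is exactly the puzzle the abstract describes.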