Results 1–10 of 24
Microarrays, empirical Bayes and the two-groups model
Statist. Sci., 2006
Cited by 73 (10 self)
Abstract: The classic frequentist theory of hypothesis testing developed by Neyman, Pearson, and Fisher has a claim to being the Twentieth Century’s most influential piece of applied mathematics. Something new is happening in the Twenty-First Century: high-throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focused on Benjamini and Hochberg’s False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals, and Bayesian competitors to the two-groups model.
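The Benjamini–Hochberg step-up rule this abstract discusses can be sketched in a few lines. This is a minimal illustration of the classical procedure, not code from the paper:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """Benjamini-Hochberg step-up procedure: returns a boolean
    rejection mask controlling the false discovery rate at level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                      # indices of sorted p-values
    thresholds = q * np.arange(1, m + 1) / m   # BH line: k*q/m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest k with p_(k) <= k*q/m
        reject[order[:k + 1]] = True           # reject the k smallest p-values
    return reject
```

Note the step-up character: every p-value at or below the largest crossing point is rejected, even if it individually sits above the BH line.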
L: Genome-wide significance levels and weighted hypothesis testing. Stat Sci 2009
Cited by 15 (1 self)
Abstract: Genetic investigations often involve the testing of vast numbers of related hypotheses simultaneously. To control the overall error rate, a substantial penalty is required, making it difficult to detect signals of moderate strength. To improve the power in this setting, a number of authors have considered using weighted p-values, with the motivation often based upon the scientific plausibility of the hypotheses. We review this literature, derive optimal weights and show that the power is remarkably robust to misspecification of these weights. We consider two methods for choosing weights in practice. The first, external weighting, is based on prior information. The second, estimated weighting, uses the data to choose weights. Key words and phrases: Bonferroni correction, multiple testing, weighted p-values.
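The weighted-p-value idea reviewed here has a simple prototype in the weighted Bonferroni rule: reject H_i when p_i <= w_i * alpha / m, with nonnegative weights averaging one. A minimal sketch (the in-function normalization is an illustrative convenience, not from the paper):

```python
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Weighted Bonferroni: reject H_i when p_i <= w_i * alpha / m.
    Weights are nonnegative and normalized to average 1, which keeps
    the family-wise error rate at alpha."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.mean()              # normalize so the weights average to 1
    m = len(p)
    return p <= w * alpha / m     # up-weighted hypotheses get looser cutoffs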
Tweedie’s Formula and Selection Bias
Cited by 12 (0 self)
Abstract: We suppose that the statistician observes some large number of estimates zi, each with its own unobserved expectation parameter µi. The largest few of the zi’s are likely to substantially overestimate their corresponding µi’s, this being an example of selection bias, or regression to the mean. Tweedie’s formula, first reported by Robbins in 1956, offers a simple empirical Bayes approach for correcting selection bias. This paper investigates its merits and limitations. In addition to the methodology, Tweedie’s formula raises more general questions concerning empirical Bayes theory, discussed here as “relevance” and “empirical Bayes information.” There is a close connection between applications of the formula and James–Stein estimation. Keywords: Bayesian relevance, empirical Bayes information, James–Stein, false discovery rates, regret, winner’s curse
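Tweedie’s formula states that under z_i ~ N(µ_i, σ²), the posterior mean is E[µ_i | z_i] = z_i + σ² (d/dz) log f(z_i), where f is the marginal density of the z’s. It can be illustrated with a kernel estimate of f; the Gaussian kernel and fixed bandwidth below are illustrative choices, not the paper’s implementation:

```python
import numpy as np

def tweedie_correct(z, sigma=1.0, h=0.5):
    """Tweedie/Robbins selection-bias correction under z_i ~ N(mu_i, sigma^2):
      E[mu_i | z_i] = z_i + sigma^2 * d/dz log f(z_i),
    with the marginal density f estimated by a Gaussian kernel of
    bandwidth h (an illustrative choice)."""
    z = np.asarray(z, dtype=float)
    diffs = z[:, None] - z[None, :]               # pairwise z_i - z_j
    k = np.exp(-0.5 * (diffs / h) ** 2)           # unnormalized kernel weights
    f = k.sum(axis=1)                             # proportional to KDE of f
    fprime = (-(diffs / h**2) * k).sum(axis=1)    # proportional to f'
    return z + sigma**2 * fprime / f              # shrinks extreme z inward
```

Because log f slopes downward in the right tail, the largest z’s receive negative corrections: exactly the selection-bias (winner’s curse) adjustment the abstract describes.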
False Discovery Rate Control with Groups
Cited by 10 (0 self)
Abstract: In the context of large-scale multiple hypothesis testing, the hypotheses often possess certain group structures based on additional information such as Gene Ontology in gene expression data and phenotypes in genome-wide association studies. It is hence desirable to incorporate such information when dealing with multiplicity problems to increase statistical power. In this article, we demonstrate the benefit of considering group structure by presenting a p-value weighting procedure which utilizes the relative importance of each group while controlling the false discovery rate under weak conditions. The procedure is easy to implement and shown to be more powerful than the classical Benjamini–Hochberg procedure in both theoretical and simulation studies. By estimating the proportion of true null hypotheses, the data-driven procedure controls the false discovery rate asymptotically. Our analysis of one breast cancer data set confirms that the procedure performs favorably.
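One simple way to use group weights, sketched here for illustration only (the paper’s actual procedure and its weight estimation differ in detail), is to divide each p-value by its group’s weight and then run the ordinary Benjamini–Hochberg rule on the weighted p-values:

```python
import numpy as np

def group_weighted_bh(pvals, groups, group_weights, q=0.10):
    """Illustrative group-weighted BH: divide each p-value by its group's
    weight (weights normalized to average 1 across hypotheses), then apply
    the standard Benjamini-Hochberg step-up rule to the weighted p-values.
    A sketch of the general idea, not the paper's exact procedure."""
    p = np.asarray(pvals, dtype=float)
    w = np.array([group_weights[g] for g in groups], dtype=float)
    w = w / w.mean()                      # normalize weights to average 1
    pw = np.minimum(p / w, 1.0)           # weighted p-values, capped at 1
    m = len(pw)
    order = np.argsort(pw)
    below = pw[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject
```

Hypotheses in up-weighted groups effectively face a relaxed threshold, which is how prior group importance translates into extra power.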
Hakonarson H: Multiple testing in genome-wide association studies via hidden Markov models
Bioinformatics
Cited by 9 (3 self)
Abstract: Motivation: Genome-wide association studies (GWAS) interrogate common genetic variation across the entire human genome in an unbiased manner and hold promise in identifying genetic variants with moderate or weak effect sizes. However, conventional testing procedures, which are mostly p-value based, ignore the dependency and therefore suffer from loss of efficiency. The goal of this article is to exploit the dependency information among adjacent SNPs to improve the screening efficiency in GWAS. Results: We propose to model the linear block dependency in the SNP data using hidden Markov models. A compound decision-theoretic framework for testing HMM-dependent hypotheses is developed. We propose a powerful data-driven procedure (PLIS) that controls the false discovery rate (FDR) at the nominal level. PLIS is shown to be optimal in the sense that it has the smallest
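Once such an HMM has been fitted, the rejection step of procedures in this family reduces to a simple rule on the posterior null probabilities (the LIS-type statistics): sort them in increasing order and reject while their running average stays at or below the target FDR level. The sketch below assumes the HMM fitting has already produced these probabilities; it illustrates the thresholding step only, not the full PLIS procedure:

```python
import numpy as np

def lis_threshold(lis, alpha=0.10):
    """Step-up rule on posterior null probabilities from a fitted HMM:
    reject the k hypotheses with smallest statistics, where k is the
    largest integer whose running average of the sorted statistics is
    at or below alpha (an estimate of the FDR among those rejections)."""
    s = np.asarray(lis, dtype=float)
    order = np.argsort(s)
    running_mean = np.cumsum(s[order]) / np.arange(1, len(s) + 1)
    below = running_mean <= alpha
    reject = np.zeros(len(s), dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject
```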
Power-Enhanced Multiple Decision Functions Controlling Family-Wise Error and False Discovery Rates
, 2009
The Future of Indirect Evidence
Cited by 6 (0 self)
Stanford University. What is Statistics?
• The theory of learning from experience when experience arrives a little bit at a time
• Collecting and combining many small pieces of sometimes contradictory evidence
• Direct and indirect statistical evidence
False Discovery Rates and Copy Number Variation
"... Huge data sets, simple questions ..."
Transferred subgroup false discovery rate for rare post-translational modifications detected by mass spectrometry
Mol. Cell. Proteomics
Cited by 4 (2 self)
Abstract: In shotgun proteomics, high-throughput mass spectrometry experiments and the subsequent data analysis produce thousands to millions of hypothetical peptide identifications. The common way to estimate the false discovery rate (FDR) of peptide identifications is the target-decoy database search strategy, which is efficient and accurate for large datasets. However, the legitimacy of the target-decoy strategy for protein-modification-centric studies has rarely been rigorously validated. It is often the case that a global FDR is estimated for all peptide identifications, including both modified and unmodified peptides, but that only a subgroup of identifications with a certain type of modification is focused on. As revealed recently, the subgroup FDR of modified peptide identifications can differ dramatically from the global FDR at the same score threshold, and thus the former, when it is of interest, should be separately estimated. However, rare modifications often result in a very small number of modified peptide identifications, which makes direct separate FDR estimation inaccurate because of the inadequate sample size. This paper presents a method called the transferred FDR for accurately estimating the FDR of an arbitrary number of modified peptide identifications. Through flexible use of the empirical data from a target-decoy database search, a theoretical relationship between the subgroup FDR and the global FDR is made computable. Through this relationship, the subgroup FDR can be predicted from the global FDR, allowing one to avoid an inaccurate direct estimation from a limited amount of data. The effectiveness of the method is demonstrated with both simulated and real mass
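The basic target-decoy estimate the abstract builds on reduces to a simple ratio at any score threshold: decoy hits above the threshold stand in for false target hits. A minimal sketch of that baseline estimator (scores and threshold are illustrative; the paper’s transferred-FDR method goes beyond this):

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Baseline target-decoy FDR estimate at a score threshold:
    decoy matches passing the threshold estimate the number of false
    target matches, so FDR ~= (#decoys passing) / (#targets passing)."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0                     # nothing accepted, nothing false
    return n_decoy / n_target
```

As the abstract notes, applying this global ratio to a small subgroup of modified peptides can misstate that subgroup’s FDR, which is the gap the transferred FDR is designed to close.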
Integrating Prior Knowledge in Multiple Testing Under Dependence with Applications in Detecting Differential DNA Methylation
Biometrics, 2012
Cited by 1 (0 self)
Abstract: DNA methylation has emerged as an important hallmark of epigenetics. Numerous platforms, including tiling arrays and next-generation sequencing, and experimental protocols are available for profiling DNA methylation. Like other tiling array data, DNA methylation data share the characteristic of inherent correlation structure among nearby probes. However, unlike gene expression or protein-DNA binding data, the varying CpG density that gives rise to the CpG island, shore and shelf definitions provides exogenous information for detecting differential methylation. This paper introduces a robust testing and probe-ranking procedure based on a non-homogeneous hidden Markov model that incorporates the above-mentioned features for detecting differential methylation. We revisit the seminal work of Sun and Cai (2009, J. R. Stat. Soc. B 71, 393–424) and propose modeling the non-null distribution using a nonparametric symmetric distribution in two-sided hypothesis testing. We show that this model improves probe ranking and is robust to model misspecification, based on extensive simulation studies. We further illustrate that our proposed framework achieves good operating characteristics compared to commonly used methods on real DNA methylation data where the aim is to detect differential methylation sites.