Assigning significance to peptides identified by tandem mass spectrometry using decoy databases
J. Proteome Res., 2008
Cited by 65 (13 self)
Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.
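A minimal sketch of the simple decoy-based FDR estimate, assuming separate target and decoy searches against databases of equal size and higher scores being better. At a score threshold t, the FDR is estimated as the number of decoy matches scoring at least t divided by the number of target matches scoring at least t, optionally scaled by `pi0`, the assumed fraction of incorrect target identifications (supplied by the caller here, not estimated). The function name `decoy_fdr_qvalues` is hypothetical; q-values are taken as the minimum estimated FDR over all thresholds at least as permissive.

```python
def decoy_fdr_qvalues(target_scores, decoy_scores, pi0=1.0):
    """Estimate q-values for target PSM scores using a decoy search.

    Assumptions (hypothetical sketch): higher scores are better, target and
    decoy databases have the same size, and pi0 is the assumed fraction of
    incorrect target identifications (1.0 gives the simple estimate).
    """
    targets = sorted(target_scores, reverse=True)
    decoys = sorted(decoy_scores, reverse=True)
    fdrs = []
    n_decoys_above = 0
    for n_targets_above, t in enumerate(targets, start=1):
        # Count decoy matches scoring at least as well as threshold t.
        while n_decoys_above < len(decoys) and decoys[n_decoys_above] >= t:
            n_decoys_above += 1
        fdrs.append(min(1.0, pi0 * n_decoys_above / n_targets_above))
    # q-value: smallest estimated FDR at this or any more permissive threshold.
    qvalues = fdrs[:]
    for k in range(len(qvalues) - 2, -1, -1):
        qvalues[k] = min(qvalues[k], qvalues[k + 1])
    return list(zip(targets, qvalues))
```

For example, with target scores [10, 9, 8, 3, 2] and decoy scores [8.5, 2.5, 1], the q-values come out as 0, 0, 0.25, 0.25 and 0.4; passing `pi0=0.5` halves each of them, which is the source of the extra power when the fraction of incorrect identifications is estimated rather than assumed to be 1.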
Innovated higher criticism for detecting sparse signals in correlated noise
Ann. Statist., 2010
Cited by 41 (8 self)
Higher Criticism is a method for detecting signals that are both sparse and weak. Although first proposed in cases where the noise variables are independent, Higher Criticism also has reasonable performance in settings where those variables are correlated. In this paper we show that, by exploiting the nature of the correlation, performance can be improved by using a modified approach which exploits the potential advantages that correlation has to offer. Indeed, it turns out that the case of independent noise is the most difficult of all, from a statistical viewpoint, and that more accurate signal detection (for a given level of signal sparsity and strength) can be obtained when correlation is present. We characterize the advantages of correlation by showing how to incorporate them into the definition of an optimal detection boundary. The boundary has particularly attractive properties when correlation decays at a polynomial rate or the correlation matrix is Toeplitz.
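For orientation, here is a sketch of the original Higher Criticism statistic for independent N(0, 1) noise, that is, the baseline that the innovated version modifies, not the innovated procedure itself (which additionally exploits the correlation structure). It converts z-scores to one-sided p-values, sorts them, and maximizes the standardized gap $\sqrt{n}\,(i/n - p_{(i)})/\sqrt{p_{(i)}(1 - p_{(i)})}$ over the smallest `alpha0 * n` p-values. The function name and the default `alpha0 = 0.5` are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def higher_criticism(z_scores, alpha0=0.5):
    """Higher Criticism statistic for z-scores that are N(0, 1) under the null.

    Sketch assumptions: independent noise, one-sided (positive) signals.
    """
    n = len(z_scores)
    std_normal = NormalDist()
    # One-sided p-values P(N(0,1) > z); cdf(-z) avoids cancellation for large z.
    pvals = sorted(std_normal.cdf(-z) for z in z_scores)
    m = min(n, max(1, int(alpha0 * n)))
    best = -math.inf
    for i, p in enumerate(pvals[:m], start=1):
        if 0.0 < p < 1.0:  # skip degenerate p-values where the ratio is undefined
            hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, hc)
    return best
```

With four null-like scores `[0, 0, 0, 0]` the statistic is 0, while replacing one of them with 5.0 pushes it above 900: a single strong deviation among otherwise null scores is exactly the sparse-signal pattern HC is built to flag.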
Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons
J. Amer. Statist. Assoc., 2007
Cited by 39 (6 self)
An important issue raised by Efron [7] in the context of large-scale multiple comparisons is that in many applications the usual assumption that the null distribution is known is incorrect, and seemingly negligible differences in the null may result in large differences in subsequent studies. This suggests that a careful study of estimation of the null is indispensable. In this paper, we consider the problem of estimating a null normal distribution, and a closely related problem, estimation of the proportion of nonnull effects. We develop an approach based on the empirical characteristic function and Fourier analysis. The estimators are shown to be uniformly consistent over a wide class of parameters. Numerical performance of the estimators is investigated using both simulated and real data. In particular, we apply our ...
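For contrast with the characteristic-function approach, the simplest estimator of the proportion of nonnull effects is Storey's estimator. It is a swapped-in technique, not the authors' method, and it makes exactly the assumption the paper questions: that the theoretical null is known, so null p-values are Uniform(0, 1). Under that assumption, p-values above a tuning point λ come mostly from nulls, giving $\hat\pi_0 = \#\{p_i > \lambda\}/(n(1 - \lambda))$ and a nonnull proportion of $1 - \hat\pi_0$. The function names and the default `lam = 0.5` are arbitrary choices of this sketch.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the null proportion (expects a nonempty list).

    Assumes null p-values are Uniform(0, 1), i.e. the theoretical null is known.
    """
    n = len(pvalues)
    # Nulls spread uniformly, so about pi0 * n * (1 - lam) of them exceed lam.
    pi0 = sum(1 for p in pvalues if p > lam) / (n * (1.0 - lam))
    return min(1.0, pi0)


def nonnull_proportion(pvalues, lam=0.5):
    """Estimated fraction of nonnull effects, 1 - pi0."""
    return 1.0 - storey_pi0(pvalues, lam)
```

If the true null were, say, slightly wider than N(0, 1), the p-values would no longer be uniform under the null and this estimate would be biased, which is the failure mode that motivates estimating the null itself.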
Estimation and confidence sets for sparse normal mixtures
2006
Cited by 35 (17 self)
Goodness-of-fit tests via phi-divergences
2006
Cited by 27 (2 self)
A unified family of goodness-of-fit tests based on φ-divergences is introduced and studied. The new family of test statistics $S_n(s)$ includes both the supremum version of the Anderson–Darling statistic and the test statistic of Berk and Jones [Z. Wahrsch. Verw. Gebiete 47 (1979) 47–59] as special cases (s = 2 and s = 1, resp.). We also introduce integral versions of the new statistics. We show that the asymptotic null distribution theory of Berk ...
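A sketch of the divergence family behind $S_n(s)$, under these assumptions: the null CDF $F_0$ is continuous, so data can be mapped to Uniform(0, 1); $K_s(u, v)$ is the φ-divergence between Bernoulli(u) and Bernoulli(v), $(1 - u^s v^{1-s} - (1-u)^s (1-v)^{1-s})/(s(1-s))$, with Kullback–Leibler limits at s = 1 and s = 0; and the supremum of $K_s(F_n(x), F_0(x))$ is restricted to $X_{(1)} \le x < X_{(n)}$ so that it stays finite. For s = 2 the divergence reduces to $(u - v)^2/(2v(1 - v))$, the Anderson–Darling-type ratio, and for s = 1 to the Berk–Jones divergence. Because $F_n$ is constant between order statistics and $K_s(u, \cdot)$ is convex with its minimum at u, only interval endpoints need checking. Function names are hypothetical.

```python
import math


def k_s(s, u, v):
    """phi-divergence between Bernoulli(u) and Bernoulli(v), for 0 < u, v < 1."""
    if s == 1:  # Kullback-Leibler limit (Berk-Jones)
        return u * math.log(u / v) + (1 - u) * math.log((1 - u) / (1 - v))
    if s == 0:  # reversed Kullback-Leibler limit
        return k_s(1, v, u)
    return (1 - u**s * v**(1 - s) - (1 - u)**s * (1 - v)**(1 - s)) / (s * (1 - s))


def phi_divergence_stat(sample, s, cdf=lambda x: x):
    """sup of K_s(F_n(x), F_0(x)) over X_(1) <= x < X_(n) (sketch).

    `cdf` maps data to the null CDF scale; the default assumes Uniform(0, 1).
    """
    n = len(sample)
    v = sorted(cdf(x) for x in sample)
    best = 0.0
    for i in range(1, n):
        u = i / n  # F_n(x) = i/n on [v_(i), v_(i+1))
        # K_s(u, .) is convex with minimum at u, so check both interval endpoints.
        best = max(best, k_s(s, u, v[i - 1]), k_s(s, u, v[i]))
    return best
```

For instance, the three points 0.2, 0.5 and 0.8 tested against Uniform(0, 1) with s = 2 give 1/18 at every checked endpoint, and values of s just below 1 agree closely with the s = 1 branch, confirming the Kullback–Leibler limit.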
Genome-wide association for major depressive disorder: a possible role for the presynaptic protein piccolo
2009
Feature selection by higher criticism thresholding: Optimal phase diagram
2008
Cited by 26 (6 self)
We consider two-class linear classification in a high-dimensional, low-sample-size setting. Only a small fraction of the features are useful, the useful features are unknown to us, and each useful feature contributes weakly to the classification decision – this setting was called the rare/weak model (RW Model) in [11]. We select features by thresholding feature z-scores. The threshold is set by higher criticism (HC) [11]. Let $\pi_i$ denote the P-value associated with the $i$th z-score and $\pi_{(i)}$ denote the $i$th order statistic of the collection of P-values. The HC threshold (HCT) is the order statistic of the z-score corresponding to the index $i$ maximizing $(i/n - \pi_{(i)})/\sqrt{\pi_{(i)}(1 - \pi_{(i)})}$. The ideal threshold optimizes the classification error. In [11] we showed that HCT was numerically close to the ideal threshold. We formalize an asymptotic framework for studying the RW model, considering a sequence of problems with increasingly many features and relatively fewer observations. We show that along this sequence, the limiting performance of ideal HCT is essentially just as good as the limiting performance of ideal thresholding. Our results describe two-dimensional ...
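A minimal sketch of HC thresholding as this abstract describes it, under stated assumptions: two-sided P-values $\pi_i = P(|N(0,1)| > |z_i|)$, n taken as the number of features (z-scores), and the maximization restricted to the smallest `alpha0 * n` P-values, a restriction borrowed from the usual Higher Criticism definition rather than stated in the abstract. The functions `hc_threshold` and `select_features` and the `alpha0` default are hypothetical; features whose |z| reaches the returned threshold are selected.

```python
import math
from statistics import NormalDist


def hc_threshold(z_scores, alpha0=0.5):
    """Return the HC threshold on |z|; select features with |z| >= threshold."""
    n = len(z_scores)
    if n == 0:
        return math.inf
    std_normal = NormalDist()
    abs_z = sorted((abs(z) for z in z_scores), reverse=True)
    # Two-sided P-values; ascending because |z| is sorted in descending order.
    pvals = [2.0 * std_normal.cdf(-a) for a in abs_z]
    m = min(n, max(1, int(alpha0 * n)))
    best_i, best_obj = 0, -math.inf
    for i in range(1, m + 1):
        p = pvals[i - 1]
        if 0.0 < p < 1.0:
            obj = (i / n - p) / math.sqrt(p * (1.0 - p))
            if obj > best_obj:
                best_i, best_obj = i, obj
    # No usable P-value: return an infinite threshold, i.e. select nothing.
    return abs_z[best_i - 1] if best_i > 0 else math.inf


def select_features(z_scores, alpha0=0.5):
    """Indices of features whose |z| is at or above the HC threshold."""
    t = hc_threshold(z_scores, alpha0)
    return [j for j, z in enumerate(z_scores) if abs(z) >= t]
```

On the ten z-scores `[4.0, 3.9, 0.1, -0.2, 0.3, 0.05, -0.1, 0.2, 0.0, 0.15]`, where only 4.0 and 3.9 stand out, the objective peaks at i = 2, so the threshold is 3.9 and features 0 and 1 are selected.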
On the false discovery rate and an asymptotically optimal rejection curve
903
Cited by 25 (3 self)
In this paper we introduce and investigate a new rejection curve for asymptotic control of the false discovery rate (FDR) in multiple hypotheses testing problems. We first give a heuristic motivation for this new curve and propose some procedures related to it. Then we introduce a set of possible assumptions and give a unifying short proof of FDR control for procedures based on Simes' critical values, whereby certain types of dependency are allowed. This methodology of proof is then applied to other fixed rejection curves including the proposed new curve. Among others, we investigate the problem of finding least favorable parameter configurations such that the FDR becomes largest. We then derive a series of results concerning asymptotic FDR control for procedures based on the new curve and discuss several example procedures in more detail. A main result will be an asymptotic optimality statement for various procedures based on the new curve in the class of fixed rejection curves. Finally, we briefly discuss strict FDR control for a finite number of hypotheses.
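A generic step-up procedure with fixed critical values, instantiated with Simes' critical values $c_i = i\alpha/n$ (which yields the Benjamini–Hochberg procedure), may help make the terminology concrete. It is only a sketch: the paper's new rejection curve would supply different critical values, which are not reproduced here, and the function names are hypothetical. A step-up procedure rejects the k hypotheses with the smallest p-values, where k is the largest index with $p_{(k)} \le c_k$.

```python
def step_up(pvalues, critical_values):
    """Indices rejected by a step-up test with nondecreasing critical values."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda j: pvalues[j])  # indices by ascending p
    k = 0
    # Find the largest i with p_(i) <= c_i, scanning from the largest p-value down.
    for i in range(n, 0, -1):
        if pvalues[order[i - 1]] <= critical_values[i - 1]:
            k = i
            break
    return sorted(order[:k])


def simes_critical_values(n, alpha):
    """Simes' critical values i * alpha / n for i = 1..n."""
    return [i * alpha / n for i in range(1, n + 1)]
```

With p-values [0.01, 0.04, 0.036] and α = 0.05, the critical values are about 0.0167, 0.0333 and 0.05. Since the largest p-value, 0.04, is below 0.05, all three hypotheses are rejected even though 0.036 exceeds its own critical value; a step-down test with the same critical values would stop after the first rejection.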
Stepup procedures controlling generalized FWER and generalized
2007
Cited by 18 (4 self)
In many applications of multiple hypothesis testing where more than one false rejection can be tolerated, procedures controlling error rates measuring at least k false rejections, instead of at least one, for some fixed k ≥ 1 can potentially increase the ability of a procedure to detect false null hypotheses. The k-FWER, a generalized version of the usual familywise error rate (FWER), is such an error rate that has recently been introduced in the literature and procedures controlling it have been proposed. A further generalization of a result on the k-FWER is provided in this article. In addition, an alternative and less conservative notion of error rate, the k-FDR, is introduced in the same spirit as the k-FWER by generalizing the usual false discovery rate (FDR). A k-FWER procedure is constructed given any set of increasing constants by utilizing the kth order joint null distributions of the p-values without assuming any specific form of dependence among all the p-values. Procedures controlling the k-FDR are also developed by using the kth order joint null distributions of the p-values, first assuming that the sets of null and nonnull p-values are mutually independent or they are jointly positively dependent in the sense of being multivariate totally positive of order two (MTP2) and then discarding that assumption about the overall dependence among the p-values.
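The gain from tolerating k false rejections can be seen in the simplest k-FWER procedure, a generalized Bonferroni rule that rejects $H_i$ when $p_i \le k\alpha/n$. This is a swapped-in illustration, not the step-up procedures developed in the article. For valid p-values, if V counts false rejections among $n_0 \le n$ true nulls, then $E[V] \le n_0 k\alpha/n \le k\alpha$, and Markov's inequality gives $P(V \ge k) \le E[V]/k \le \alpha$, with no assumption on the dependence among the p-values. The function name is hypothetical.

```python
def k_fwer_bonferroni(pvalues, k, alpha):
    """Indices rejected by the single-step rule p_i <= k * alpha / n.

    Controls P(at least k false rejections) <= alpha for valid p-values under
    arbitrary dependence (via Markov's inequality); k = 1 is plain Bonferroni.
    """
    n = len(pvalues)
    cutoff = k * alpha / n
    return [j for j, p in enumerate(pvalues) if p <= cutoff]
```

With p-values [0.001, 0.015, 0.02, 0.5] and α = 0.05, ordinary Bonferroni (k = 1, cutoff 0.0125) rejects only the first hypothesis, while k = 2 (cutoff 0.025) also rejects the second and third.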
On the performance of FDR control: constraints and a partial solution
Ann. Statist.
