Results 11–20 of 1,093
A stochastic process approach to false discovery rates
, 2001
Abstract

Cited by 132 (6 self)
This paper extends the theory of false discovery rates (FDR) pioneered by Benjamini and Hochberg (1995). We develop a framework in which the False Discovery Proportion (FDP) – the number of false rejections divided by the number of rejections – is treated as a stochastic process. After obtaining the limiting distribution of the process, we demonstrate the validity of a class of procedures for controlling the False Discovery Rate (the expected FDP). We construct a confidence envelope for the whole FDP process. From these envelopes we derive confidence thresholds for controlling the quantiles of the distribution of the FDP as well as controlling the number of false discoveries. We also
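The FDR that this abstract defines is the expected FDP, and the baseline procedure the paper builds on is Benjamini–Hochberg. As a point of reference, a minimal sketch of the BH step-up rule (not the paper's stochastic-process machinery):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule.

    Finds the largest rank k with p_(k) <= k * alpha / m and rejects the
    k smallest p-values; this controls the FDR (expected FDP) at alpha
    for independent p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):  # rank runs 1..m
        if pvals[i] <= rank * alpha / m:
            k = rank  # keep the largest passing rank (step-up)
    return sorted(order[:k])

# Example: the three smallest p-values clear their step-up thresholds.
print(benjamini_hochberg([0.001, 0.01, 0.02, 0.04, 0.2, 0.5]))  # → [0, 1, 2]
```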
Empirical Bayes selection of wavelet thresholds
, 2005
Abstract

Cited by 117 (5 self)
This paper explores a class of empirical Bayes methods for level-dependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavy-tailed density. The mixing weight, or sparsity parameter, for each level of the transform is chosen by marginal maximum likelihood. If estimation is carried out using the posterior median, this is a random thresholding procedure; the estimation can also be carried out using other thresholding rules with the same threshold. Details of the calculations needed for implementing the procedure are included. In practice, the estimates are quick to compute and there is software available. Simulations on the standard model functions show excellent performance, and applications to data drawn from various fields of application are used to explore the practical performance of the approach. By using a general result on the risk of the corresponding marginal maximum likelihood approach for a single sequence, overall bounds on
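The abstract's "other thresholding rules with the same threshold" includes the familiar soft rule. A minimal sketch of level-dependent soft thresholding, the shrinkage step into which the empirical-Bayes threshold choice would be plugged (the per-level thresholds below are illustrative constants, not the marginal maximum-likelihood values the paper derives):

```python
def soft_threshold(x, t):
    """Shrink coefficient x toward zero by threshold t (soft rule)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def shrink_levels(coeffs_by_level, thresholds):
    """Apply a possibly different threshold at each resolution level."""
    return [[soft_threshold(c, t) for c in level]
            for level, t in zip(coeffs_by_level, thresholds)]

# Coarse level gets a mild threshold, fine (noisier) level a harsher one.
print(shrink_levels([[2.0, -3.0], [0.5, 1.4]], [0.5, 1.0]))
```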
GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis
 Nucleic Acids Res 2008, 36(Web Server issue):W358–W363
Abstract

Cited by 110 (3 self)
Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although many software tools with GO-related analysis functions already exist, new tools are still needed to meet the requirements of data generated by newly developed technologies or for advanced analysis purposes. Here, we present a Gene Ontology Enrichment
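At the core of GO enrichment tools of this kind is a per-term overrepresentation test, typically the one-sided hypergeometric (Fisher exact) test. A minimal sketch, with an illustrative function name (this is the generic test, not GOEAST's specific implementation):

```python
from math import comb

def hypergeom_enrichment_p(n_genome, n_term, n_study, n_hit):
    """One-sided hypergeometric p-value: P(X >= n_hit) when drawing
    n_study genes from a genome of n_genome genes, of which n_term
    are annotated to the GO term in question."""
    total = comb(n_genome, n_study)
    p = 0.0
    for k in range(n_hit, min(n_term, n_study) + 1):
        # math.comb returns 0 when the second argument exceeds the first,
        # so impossible configurations contribute nothing to the sum.
        p += comb(n_term, k) * comb(n_genome - n_term, n_study - k) / total
    return p

# All 5 study genes carry an annotation held by only 5 of 10 genome genes.
print(hypergeom_enrichment_p(10, 5, 5, 5))  # 1/252, strongly enriched
```

In practice the resulting per-term p-values are then corrected for multiple testing (e.g. FDR), since thousands of GO terms are tested at once.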
Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions
 Bioinformatics
, 2003
Abstract

Cited by 91 (3 self)
Motivation: Two practical realities constrain the analysis of microarray data, mass spectra from proteomics, and biomedical infrared or magnetic resonance spectra. One is the ‘curse of dimensionality’: the number of features characterizing these data is in the thousands or tens of thousands. The other is the ‘curse of dataset sparsity’: the number of samples is limited. The consequences of these two curses are far-reaching when such data are used to classify the presence or absence of disease. Results: Using very simple classifiers, we show for several publicly available microarray and proteomics datasets how these curses influence classification outcomes. In particular, even if the sample-per-feature ratio is increased to the recommended 5–10 by feature extraction/reduction methods, dataset sparsity can render any classification result statistically suspect. In addition, several ‘optimal’ feature sets are typically identifiable for sparse datasets, all producing perfect classification results, both for the training and independent validation sets. This non-uniqueness leads to interpretational difficulties and casts doubt on the biological relevance of any of these ‘optimal’ feature sets. We suggest an approach to assess the relative quality of apparently equally good classifiers.
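The non-uniqueness problem the abstract warns about is easy to reproduce. A toy simulation (not from the paper): with only five samples per class and a few thousand pure-noise features, several features "perfectly" separate the classes by chance alone.

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible
n_per_class, n_features = 5, 2000
perfect = 0
for _ in range(n_features):
    a = [random.gauss(0, 1) for _ in range(n_per_class)]  # class A, pure noise
    b = [random.gauss(0, 1) for _ in range(n_per_class)]  # class B, pure noise
    # A single threshold separates the classes perfectly iff the two
    # value ranges do not overlap at all.
    if max(a) < min(b) or max(b) < min(a):
        perfect += 1
print(perfect, "pure-noise features perfectly separate the classes")
```

With 5 + 5 samples, the chance that one noise feature separates perfectly is 2/C(10,5) ≈ 0.8%, so roughly 16 of the 2,000 features are expected to do so — each a spurious "optimal" marker.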
Stepwise multiple testing as formalized data snooping
 Econometrica
, 2005
Abstract

Cited by 88 (10 self)
It is common in econometric applications that several hypothesis tests are carried out at the same time. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. In this paper, we suggest a stepwise multiple testing procedure which asymptotically controls the familywise error rate at a desired level. Compared to related single-step methods, our procedure is more powerful in the sense that it often will reject more false hypotheses. In addition, we advocate the use of studentization when it is feasible. Unlike some stepwise methods, our method implicitly captures the joint dependence structure of the test statistics, which results in increased ability to detect alternative hypotheses. We prove that our method asymptotically controls the familywise error rate under minimal assumptions. We present our methodology in the context of comparing several strategies to a common benchmark and deciding which strategies actually beat the benchmark. However, our ideas can easily be extended and/or modified to other contexts, such as making inference for the individual regression coefficients in a multiple regression framework. Some simulation studies show the improvements of our methods over previous proposals. We also provide an application to a set of real data.
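A useful point of comparison for this abstract is the classical Holm step-down procedure, which controls the familywise error rate from p-values alone under arbitrary dependence; the bootstrap method described above gains power over such p-value-only rules by exploiting the joint dependence of the test statistics. A minimal sketch of Holm's rule:

```python
def holm_stepdown(pvals, alpha=0.05):
    """Return sorted indices of hypotheses rejected by Holm's step-down rule.

    Examine p-values from smallest to largest; at step j (0-based) the
    threshold is alpha / (m - j). Stop at the first p-value that fails.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = []
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (m - step):  # threshold relaxes as we step down
            rejected.append(i)
        else:
            break  # all remaining (larger) p-values also fail
    return sorted(rejected)

print(holm_stepdown([0.001, 0.01, 0.03, 0.4]))  # → [0, 1]
```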
A novel signaling pathway impact analysis
 Bioinformatics
, 2009
Abstract

Cited by 85 (1 self)
Motivation: Gene expression class comparison studies may identify hundreds or thousands of genes as differentially expressed (DE) between sample groups. Gaining biological insight from the result of such experiments can be approached, for instance, by identifying the signaling pathways impacted by the observed changes. Most of the existing pathway analysis methods focus on either the number of DE genes observed in a given pathway (enrichment analysis methods), or on the correlation between the pathway genes and the class of the samples (functional class scoring methods). Both approaches treat the pathways as simple sets of genes, disregarding the complex gene interactions that these pathways are built to describe. Results: We describe a novel Signaling Pathway Impact Analysis (SPIA) that combines the evidence obtained from the classical enrichment analysis with a novel type of evidence, which measures the actual perturbation on a given pathway under a given condition. A bootstrap procedure is used to assess the significance of the observed total pathway perturbation. Using simulations we show that the evidence derived from perturbations is independent of the pathway enrichment evidence. This allows us to calculate a global pathway significance p-value, which combines the enrichment and perturbation p-values. We illustrate the capabilities of the novel method on four real data sets. The results obtained on these data show that SPIA has better specificity and more sensitivity than several widely used pathway analysis methods. Availability: SPIA was implemented as an R package which is available at
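The key step the abstract describes is combining two independent p-values (enrichment and perturbation) into one global pathway p-value. SPIA's published combination formula differs in detail; as an illustration of the idea only, here is Fisher's method for two independent p-values, where the chi-square survival function with 4 degrees of freedom has a convenient closed form:

```python
from math import exp, log

def fisher_combine(p_enrich, p_perturb):
    """Fisher's method for two independent p-values.

    Under the global null, -2*(ln p1 + ln p2) ~ chi-square with 4 df,
    whose survival function is exactly exp(-x/2) * (1 + x/2).
    """
    x = -2.0 * (log(p_enrich) + log(p_perturb))
    return exp(-x / 2.0) * (1.0 + x / 2.0)

# Two marginally significant p-values combine into strong joint evidence.
print(fisher_combine(0.05, 0.05))  # ≈ 0.0175
```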
Functional interpretation of microarray experiments
 OMICS
, 2006
Abstract

Cited by 75 (24 self)
Over the past few years, due to the popularisation of high-throughput methodologies such as DNA microarrays, the possibility of obtaining experimental data has increased significantly. Nevertheless, the interpretation of the results, which involves translating these data into useful biological knowledge, still remains a challenge. The methods and strategies used for this interpretation are in continuous evolution and new proposals are constantly arising. Initially, a two-step approach was used in which genes of interest were first selected, based on thresholds that consider only experimental values, and then, in a second, independent step, the enrichment of these genes in biologically relevant terms was analysed. For different reasons, these methods are relatively poor in terms of performance, and a new generation of procedures, which draw inspiration from systems biology criteria, is currently under development. Such procedures aim to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes.
Generalizations of the familywise error rate
 Ann. Statist
, 2005
Abstract

Cited by 75 (8 self)
Consider the problem of simultaneously testing null hypotheses H1,...,Hs. The usual approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. In many applications, particularly if s is large, one might be willing to tolerate more than one false rejection provided the number of such cases is controlled, thereby increasing the ability of the procedure to detect false null hypotheses. This suggests replacing control of the FWER by controlling the probability of k or more false rejections, which we call the k-FWER. We derive both single-step and step-down procedures that control the k-FWER, without making any assumptions concerning the dependence structure of the p-values of the individual tests. In particular, we derive a step-down procedure that is quite simple to apply, and prove that it cannot be improved without violation of control of the k-FWER. We also consider the false discovery proportion (FDP) defined by the number of false rejections divided by the total number of rejections (defined to be 0 if there are no rejections). The false discovery rate proposed by Benjamini
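The simplest of the procedures this abstract mentions is the single-step generalized Bonferroni rule: rejecting every H_i with p_i ≤ k·α/s controls the probability of k or more false rejections under arbitrary dependence. A minimal sketch:

```python
def kfwer_single_step(pvals, k, alpha=0.05):
    """Return indices rejected by the single-step k-FWER procedure.

    Generalized Bonferroni: the cutoff k * alpha / s controls the
    probability of k or more false rejections (k = 1 recovers the
    ordinary Bonferroni FWER bound).
    """
    s = len(pvals)
    cutoff = k * alpha / s
    return [i for i, p in enumerate(pvals) if p <= cutoff]

pvals = [0.001, 0.01, 0.02, 0.2]
print(kfwer_single_step(pvals, k=1))  # → [0, 1]   (cutoff 0.0125)
print(kfwer_single_step(pvals, k=2))  # → [0, 1, 2] (cutoff 0.025)
```

Allowing k > 1 visibly loosens the cutoff and rejects more hypotheses, which is exactly the power gain the abstract motivates.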
Statistical challenges with high dimensionality: feature selection in knowledge discovery
, 2006
False discovery rate-adjusted multiple confidence intervals for selected parameters [with comments and rejoinder]
 J Am Stat Assoc
, 2005