Results 1–10 of 138
BiNGO: a Cytoscape plugin to assess overrepresentation of Gene Ontology categories in biological networks. Bioinformatics, 2005. Cited by 530 (4 self).
Summary: The Biological Networks Gene Ontology tool (BiNGO) is an open-source Java tool to determine which Gene Ontology (GO) terms are significantly overrepresented in a set of genes. BiNGO can be used either on a list of genes, pasted as text, or interactively on subgraphs of biological networks visualized in Cytoscape. BiNGO maps the predominant functional themes of the tested gene set onto the GO hierarchy, and takes advantage of Cytoscape’s versatile visualization environment to produce an intuitive and customizable visual representation of the results.
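The standard test behind this kind of GO overrepresentation analysis is an upper-tail hypergeometric test: given a cluster of n genes drawn from a background of N, how surprising is it that k of them carry a GO term annotated to K background genes? A minimal sketch (function name is illustrative, not BiNGO's API):

```python
from math import comb

def go_enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric P(X >= k): the chance of seeing at least k
    genes annotated to a GO term in a cluster of n genes, when K of the
    N background genes carry that annotation."""
    total = comb(N, n)
    upper = min(K, n)  # can't draw more annotated genes than exist or than were drawn
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

For example, `go_enrichment_pvalue(8, 20, 50, 1000)` asks how likely it is that 8 or more of 20 cluster genes hit a term covering 50 of 1000 background genes; small values indicate overrepresentation.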
Correlation and Large-Scale Simultaneous Significance Testing. Journal of the American Statistical Association. Cited by 96 (8 self).
Large-scale hypothesis testing problems, with hundreds or thousands of test statistics “z_i” to consider at once, have become familiar in current practice. Applications of popular analysis methods such as false discovery rate techniques do not require independence of the z_i’s, but their accuracy can be compromised in high-correlation situations. This paper presents computational and theoretical methods for assessing the size and effect of correlation in large-scale testing. A simple theory leads to the identification of a single omnibus measure of correlation. The theory relates to the correct choice of a null distribution for simultaneous significance testing, and its effect on inference.
1. Introduction. Modern computing machinery and improved scientific equipment have combined to revolutionize experimentation in fields such as biology, medicine, genetics, and neuroscience. One effect on statistics has been to vastly magnify the scope of multiple hypothesis testing, now often involving thousands of cases considered simultaneously. The cases themselves are typically of familiar form, each perhaps a simple two-sample comparison, ...
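One natural omnibus summary of dependence among many test statistics, in the spirit of the single measure this abstract refers to (though not Efron's exact estimator), is the root-mean-square of the pairwise correlations. A toy sketch, assuming each row is one statistic's replicated observations and no row is constant:

```python
from math import sqrt
from itertools import combinations

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences
    (assumes neither sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def rms_correlation(rows):
    """Root-mean-square of all pairwise correlations across rows --
    one omnibus scalar summary of how correlated the test statistics are."""
    corrs = [pearson(x, y) for x, y in combinations(rows, 2)]
    return sqrt(sum(r * r for r in corrs) / len(corrs))
```

An RMS correlation near 0 suggests independence-based FDR methods should behave well; values near 1 signal the high-correlation regime the paper warns about.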
Improving false discovery rate estimation. Bioinformatics. Cited by 52 (6 self).
Motivation: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR). However, rigorous control of the FDR at a preselected level is often impractical. Consequently, it has been suggested to use the q-value as an estimate of the proportion of false discoveries among a set of significant findings. However, such an interpretation of the q-value may be unwarranted, considering that the q-value is based on an unstable estimator of the positive FDR (pFDR). Another method proposes estimating the FDR by modeling p-values as arising from a beta-uniform mixture (BUM) distribution. Unfortunately, the BUM approach is reliable only in settings where the assumed model accurately represents the actual distribution of p-values. Methods: A method called the spacings LOESS histogram (SPLOSH) is proposed for estimating the conditional FDR (cFDR), the expected proportion of false positives conditioned on having k ‘significant’ findings. SPLOSH is designed to be more stable than the q-value and applicable in a wider variety of settings than BUM. Results: In a simulation study and data analysis example, SPLOSH exhibits the desired characteristics relative to the q-value and BUM. Availability: The Web site www.stjuderesearch.org/statistics/splosh.html has links to freely available S-Plus code to implement the proposed procedure. Contact:
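For context on what SPLOSH and the q-value are competing with, here is the baseline Benjamini-Hochberg step-up adjustment that most FDR pipelines start from (this is the classic BH procedure, not SPLOSH, and the function name is illustrative):

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values.  Sort p-values, scale the
    rank-r value by m/r, then enforce monotonicity from the largest down;
    rejecting all hypotheses with adjusted p <= q controls the FDR at q
    under independence."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min          # keep adjusted values monotone
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.5])` returns adjusted values in the original order, with the smallest p-value mapped to 0.04.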
Young S: Sample size calculation for multiple testing in microarray data analysis. Biostatistics. Cited by 20 (1 self).
Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error and power precisely.
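The step-down idea the abstract contrasts with plain Bonferroni is easiest to see in Holm's classic (non-resampling) version: test p-values in increasing order, relaxing the Bonferroni threshold at each step, and stop at the first failure. A minimal sketch, not the paper's resampling-based procedure:

```python
def holm_stepdown(pvals, alpha=0.05):
    """Holm step-down procedure for family-wise error control.
    The smallest p-value faces the full Bonferroni cut alpha/m; each
    subsequent one faces alpha/(m - rank), and testing stops at the
    first non-rejection."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail
    return reject
```

Holm rejects everything Bonferroni rejects and often more, which is the sense in which step-down methods are uniformly less conservative.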
Some Nonasymptotic Results on Resampling in High Dimension, I: Confidence Regions. Submitted to The Annals of Statistics, 2009. Cited by 16 (1 self).
We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality ...
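A toy analogue of such a bootstrap confidence region: resample the data, track how far the resampled mean vector strays from the observed mean in sup-norm, and use a quantile of those deviations as the region's radius. This is only a sketch of the generic idea, not the paper's generalized weighting scheme, and the function name is illustrative:

```python
import random

def bootstrap_sup_radius(sample, level=0.95, n_boot=500, seed=0):
    """Bootstrap quantile of the sup-norm deviation of the resampled mean
    from the observed mean.  The confidence region for the mean vector mu
    is then {mu : max_k |mean_k - mu_k| <= radius}."""
    rng = random.Random(seed)
    n, dim = len(sample), len(sample[0])
    mean = [sum(x[k] for x in sample) / n for k in range(dim)]
    devs = []
    for _ in range(n_boot):
        draw = [rng.choice(sample) for _ in range(n)]       # resample with replacement
        boot_mean = [sum(x[k] for x in draw) / n for k in range(dim)]
        devs.append(max(abs(boot_mean[k] - mean[k]) for k in range(dim)))
    devs.sort()
    return devs[min(n_boot - 1, int(level * n_boot))]
```

Because the sup-norm couples all coordinates, the region automatically adapts to the unknown dependency structure rather than treating coordinates one at a time.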
Multiple Testing Part II: Step-down procedures for control of the family-wise error rate. Stat Appl Genet Mol Biol.
"... Copyright © 2003 by the authors. ..."
Association rule interestingness: Measure and statistical validation. In Quality Measures in Data Mining, 2007. Cited by 15 (0 self).
Summary. The search for interesting Boolean association rules is an important topic in knowledge discovery in databases. The set of admissible rules for the selected support and confidence thresholds can easily be extracted by algorithms based on support and confidence, such as Apriori. However, these algorithms may produce a large number of rules, many of which are uninteresting. One has to resolve a two-tier problem: choosing the measures best suited to the problem at hand, then validating the interesting rules against the selected measures. First, the usual measures suggested in the literature are reviewed and criteria for appreciating the qualities of these measures are proposed. Statistical validation of the most interesting rules requires performing a large number of tests; thus, controlling for false discoveries (type I errors) is of prime importance. An original bootstrap-based validation method is proposed which controls, at a given level, the number of false discoveries. The interest of this method for the selection of interesting association rules is illustrated by several examples.
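The support and confidence thresholds mentioned above, plus lift (one of the interestingness measures commonly reviewed in this literature), can be computed directly from a transaction list. A minimal sketch with illustrative function names, transactions modeled as Python sets:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimated P(rhs | lhs) for the rule lhs -> rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """Confidence relative to the baseline frequency of rhs;
    a lift of 1 means lhs carries no information about rhs."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)
```

Thresholding on support and confidence alone is exactly what floods the analyst with rules; measures such as lift, and the statistical validation the paper proposes, are meant to prune that flood.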
Inference with transposable data: modelling the effects of row and column correlations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2012. Cited by 13 (0 self).
Summary. We consider the problem of large-scale inference on the row or column variables of data in the form of a matrix. Many of these data matrices are transposable, meaning that neither the row variables nor the column variables can be considered independent instances. An example of this scenario is detecting significant genes in microarrays when the samples may be dependent due to latent variables or unknown batch effects. By modeling this matrix data using the matrix-variate normal distribution, we study and quantify the effects of row and column correlations on procedures for large-scale inference. We then propose a simple solution to the myriad of problems presented by unanticipated correlations: we simultaneously estimate row and column covariances and use these to sphere or decorrelate the noise in the underlying data before conducting inference. This procedure yields data with approximately independent rows and columns, so that test statistics more closely follow null distributions and multiple testing procedures correctly control the desired error rates. Results on simulated models and real microarray data demonstrate major advantages of this approach: (1) increased statistical power, (2) less bias in estimating the false discovery rate, and (3) reduced variance of the false discovery rate estimators.
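The sphering step can be made concrete in two dimensions: estimate the sample covariance of two correlated measurement streams, Cholesky-factor it, and apply the inverse factor so the transformed streams have unit variance and zero correlation. A toy analogue of the paper's decorrelation idea (the full method estimates row and column covariances of a matrix), assuming the two streams are not perfectly correlated:

```python
from math import sqrt

def sphere_2d(xs, ys):
    """Decorrelate two data streams by applying the inverse Cholesky factor
    of their sample covariance to the centered data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs) / n
    cyy = sum((y - my) ** 2 for y in ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Cholesky factor of [[cxx, cxy], [cxy, cyy]] is L = [[a, 0], [b, c]]
    a = sqrt(cxx)
    b = cxy / a
    c = sqrt(cyy - b * b)  # requires |correlation| < 1
    # Apply L^{-1} to the centered data: u = x/a, v = (y - b*u)/c
    us = [(x - mx) / a for x in xs]
    vs = [((y - my) - b * u) / c for y, u in zip(ys, us)]
    return us, vs
```

After sphering, the two output streams have sample variance 1 and sample covariance 0 by construction, which is exactly the property that lets test statistics follow their nominal null distributions.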
CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform, 2007.
Comparative analysis of gene sets in the gene ontology space under the multiple hypothesis testing framework. In Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference. Cited by 13 (0 self).
The Gene Ontology (GO) resource can be used as a powerful tool to uncover the properties shared among, and specific to, a list of genes produced by high-throughput functional genomics studies, such as microarray studies. In the comparative analysis of several gene lists, researchers may be interested in knowing which GO terms are enriched in one list of genes but relatively depleted in another. Statistical tests such as Fisher’s exact test or the Chi-square test can be performed to search for such GO terms. However, because multiple GO terms are tested simultaneously, individual p-values from individual tests do not serve as good indicators for picking GO terms. Furthermore, these multiple tests are highly correlated, usual ...
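The per-term comparison the abstract describes reduces to a 2x2 table: genes with/without the GO term in list one versus list two. A one-sided Fisher's exact test for that table can be written from the hypergeometric distribution directly (function name is illustrative; the resulting p-values would still need the multiple-testing adjustment the paper is concerned with):

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]
    with fixed margins: the probability of a top-left count at least as
    large as `a`, i.e. the 'enriched in list one' alternative."""
    n = a + b + c + d
    p = 0.0
    for k in range(a, min(a + b, a + c) + 1):
        # hypergeometric probability of a table with top-left count k
        p += comb(a + c, k) * comb(b + d, a + b - k) / comb(n, a + b)
    return p
```

For instance, a table like [[8, 2], [2, 8]] (8 of 10 annotated genes in list one, 2 of 10 in list two) yields a small one-sided p-value, flagging the term as differentially represented.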