Results 11–20 of 797
An Insight-based Longitudinal Study of Visual Analytics
IEEE Trans. on Visualization and Computer Graphics
Sample classification from protein mass spectrometry, by “peak probability contrasts”
, 2004
Abstract
Cited by 53 (0 self)
Motivation: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and ultimately higher cure rates with less treatment-related morbidities. Protein mass spectrometry is a potentially powerful tool for early cancer detection. We propose a novel method for sample classification from protein mass spectrometry data. When applied to spectra from both diseased and healthy patients, the ‘peak probability contrast’ technique provides a list of all common peaks among the spectra, their statistical significance and their relative importance in discriminating between the two groups. We illustrate the method on matrix-assisted laser desorption and ionization mass spectrometry data from a study of ovarian cancers. Results: Compared to other statistical approaches for class prediction, the peak probability contrast method performs as well as or better than several methods that require the full spectra, rather than just labelled peaks. It is also much more interpretable biologically. The peak probability contrast method is a potentially useful tool for sample classification from protein mass spectrometry data.
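The core idea of contrasting peak occurrence between groups can be sketched loosely. This is an informal reading only, assuming peak detection has already reduced each spectrum to presence/absence flags over a shared peak list; the data and the function name are made up, and the paper's actual estimator is more refined (it also accounts for peak heights and significance):

```python
def peak_contrasts(diseased, healthy):
    """diseased, healthy: lists of spectra, each a list of 0/1 flags
    indicating whether each common peak was detected in that spectrum.
    Returns, per peak, the absolute difference in detection rates
    between the two groups (larger = more discriminating)."""
    n_peaks = len(diseased[0])
    out = []
    for p in range(n_peaks):
        rate_d = sum(s[p] for s in diseased) / len(diseased)
        rate_h = sum(s[p] for s in healthy) / len(healthy)
        out.append(abs(rate_d - rate_h))
    return out

# Peak 0 is present in every diseased spectrum but only one healthy
# spectrum, so it receives the largest contrast.
diseased = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0]]
healthy = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 1]]
print(peak_contrasts(diseased, healthy))  # → [0.75, 0.0, 0.0]
```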
Improving false discovery rate estimation
Bioinformatics
Abstract
Cited by 52 (6 self)
Motivation: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR). However, rigorous control of the FDR at a preselected level is often impractical. Consequently, it has been suggested to use the q-value as an estimate of the proportion of false discoveries among a set of significant findings. However, such an interpretation of the q-value may be unwarranted considering that the q-value is based on an unstable estimator of the positive FDR (pFDR). Another method proposes estimating the FDR by modeling p-values as arising from a beta-uniform mixture (BUM) distribution. Unfortunately, the BUM approach is reliable only in settings where the assumed model accurately represents the actual distribution of p-values. Methods: A method called the spacings LOESS histogram (SPLOSH) is proposed for estimating the conditional FDR (cFDR), the expected proportion of false positives conditioned on having k ‘significant’ findings. SPLOSH is designed to be more stable than the q-value and applicable in a wider variety of settings than BUM. Results: In a simulation study and data analysis example, SPLOSH exhibits the desired characteristics relative to the q-value and BUM. Availability: The Web site www.stjuderesearch.org/statistics/splosh.html has links to freely available Splus code to implement the proposed procedure. Contact:
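For context, the q-value that SPLOSH is compared against rests on a simple estimator: with m tests and the k smallest p-values called significant, the FDR among them is roughly m·p_(k)/k (taking the null proportion as 1), with a minimum enforced so q-values are monotone. A minimal pure-Python sketch; SPLOSH itself, which smooths p-value spacings with LOESS, is not reproduced here:

```python
def q_values(p_values):
    """Return a q-value for each p-value: the smallest estimated FDR
    at which that test would be called significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the least to the most significant test, enforcing
    # monotonicity: q_(k) = min over j >= k of m * p_(j) / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, m * p_values[idx] / rank)
        q[idx] = running_min
    return q

qs = q_values([0.001, 0.01, 0.04, 0.5])
print([round(v, 3) for v in qs])  # → [0.004, 0.02, 0.053, 0.5]
```

Instability arises because each q-value inherits the noise of a single order statistic p_(k), which is what SPLOSH's smoothing is meant to address.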
Implementing false discovery rate control: increasing your power
Oikos
, 2005
Abstract
Cited by 51 (0 self)
Popular procedures to control the chance of making type I errors when multiple statistical tests are performed come at a high cost: a reduction in power. As the number of tests increases, power for an individual test may become unacceptably low. This is a consequence of minimizing the chance of making even a single type I error, which is the aim of, for instance, the Bonferroni and sequential Bonferroni procedures. An alternative approach, control of the false discovery rate (FDR), has recently been advocated for ecological studies. This approach aims at controlling the proportion of significant results that are in fact type I errors. Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue. To encourage practical use of the approach, in this note we illustrate how the proposed procedure works, we compare it to more traditional methods that control the familywise error rate, and we discuss some recent useful developments in FDR control.
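The standard FDR-controlling procedure of Benjamini and Hochberg, which this kind of note advocates over Bonferroni-style family-wise control, is short enough to sketch. A minimal illustration (not the note's own code):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha.

    Sort the m p-values, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(p_values)
    # Sort p-values while remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])

# With several small p-values, BH rejects three tests, whereas
# Bonferroni (p <= alpha / m = 0.005) would reject only the first.
pvals = [0.001, 0.008, 0.012, 0.041, 0.042, 0.06, 0.3, 0.5, 0.7, 0.9]
print(benjamini_hochberg(pvals, alpha=0.05))  # → [0, 1, 2]
```

The power gain comes from the per-rank threshold (k/m)·alpha, which is far less stringent than Bonferroni's fixed alpha/m for all but the smallest p-value.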
Supplement to “FIMO: Scanning for occurrences of a given motif”
Abstract
Cited by 49 (2 self)
MATCH, a related, protein scanning tool) are commercial products
The optimal discovery procedure I: A new approach to simultaneous significance testing
UW Biostatistics Working Paper Series, Working Paper 259. http://www.bepress.com/uwbiostat/paper259
, 2005
Abstract
Cited by 48 (7 self)
Significance testing is one of the main objectives of statistics. The Neyman–Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. Just as shrinkage estimation borrows strength across point estimates to improve their overall performance, I show here that borrowing strength across multiple significance tests can improve their performance as well. The ‘optimal discovery procedure’ (ODP) is introduced, which shows how to maximize the number of expected true positives for each fixed number of expected false positives. The optimality achieved by this procedure is shown to be closely related to optimality in terms of the false discovery rate. The ODP motivates a new approach to testing multiple hypotheses, especially when the tests are related. As a simple example, a new simultaneous procedure for testing several Normal means is defined; this is, surprisingly, demonstrated to outperform the optimal single test procedure, showing that an optimal method for single tests may no longer be optimal in the multiple test setting. Connections to other concepts in statistics are discussed, including Stein’s paradox, shrinkage estimation, and Bayesian classification theory.
Next station in microarray data analysis: GEPAS
Nucleic Acids Res
, 2006
Abstract
Cited by 41 (19 self)
The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still-changing world of microarray data analysis. GEPAS has been designed to provide an intuitive yet powerful web-based interface that offers diverse analysis options, from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options) to the final step of functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts, etc.), and includes different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers of many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at
Normalization, testing, and false discovery rate estimation for RNA-sequencing data
Biostat. Oxf. Engl.
, 2012
Abstract
Cited by 34 (1 self)
We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads. To overcome these difficulties, we use a log-linear model with a new approach to normalization. We derive a novel procedure to estimate the false discovery rate (FDR). Our method can be applied to data with quantitative, two-class, or multiple-class outcomes, and the computation is fast even for large data sets. We study the accuracy of our approaches for significance calculation and FDR estimation, and we demonstrate that our method has potential advantages over existing methods that are based on a Poisson or negative binomial model. In summary, this work provides a pipeline for the significance analysis of sequencing data.
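The normalization difficulty described above, in miniature: two samples sequenced at different depths have incomparable raw counts. One widely used remedy, shown purely as an illustration (this paper derives its own normalization inside a log-linear model, which is not reproduced here), is a median-of-ratios size factor per sample:

```python
import math

def size_factors(counts):
    """Median-of-ratios size factors (DESeq-style sketch).

    counts: list of samples, each a list of per-gene read counts.
    The reference for each gene is the geometric mean across samples;
    a sample's factor is the median of its count/reference ratios over
    genes with nonzero counts in every sample.
    """
    n_genes = len(counts[0])
    # Geometric mean per gene across samples (skip genes with a zero).
    ref = []
    for g in range(n_genes):
        vals = [s[g] for s in counts]
        if min(vals) > 0:
            ref.append(math.exp(sum(math.log(v) for v in vals) / len(vals)))
        else:
            ref.append(None)
    factors = []
    for s in counts:
        ratios = sorted(s[g] / ref[g] for g in range(n_genes) if ref[g])
        mid = len(ratios) // 2
        med = ratios[mid] if len(ratios) % 2 else 0.5 * (ratios[mid - 1] + ratios[mid])
        factors.append(med)
    return factors

# A sample sequenced twice as deeply gets a size factor twice as large,
# so counts divided by their factors become comparable across samples.
print(size_factors([[10, 20, 30, 40], [20, 40, 60, 80]]))
```

The median makes the factor robust to a handful of genuinely differentially expressed genes, unlike simple total-count scaling.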
Genetical Genomics Analysis of a Yeast Segregant Population for Transcription Network Inference
, 2005
Abstract
Cited by 33 (0 self)
Genetic analysis of gene expression in a segregating population, which is expression-profiled and genotyped at DNA markers throughout the genome, can reveal regulatory networks of polymorphic genes. We propose an analysis strategy with several steps: (1) genome-wide QTL analysis of all expression profiles to identify eQTL confidence regions, followed by fine-mapping of identified eQTL; (2) identification of regulatory candidate genes in each eQTL region; (3) correlation analysis of the expression profiles of the candidates in any eQTL region with the gene affected by the eQTL, to reduce the number of candidates; (4) drawing directional links from retained regulatory candidate genes to genes affected by the eQTL and joining links to form networks; and (5) statistical validation and refinement of the inferred network structure. Here, we apply an initial implementation of this strategy to a segregating yeast population. In 65%, 7%, and 28% of the identified eQTL regions, a single candidate regulatory gene, no gene, or more than one gene, respectively, was retained in step (3). Overall, 768 putative regulatory links were retained, 331 of which are the strongest candidate links, as they were retained
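Step (3) of the strategy above prunes candidates by correlating each candidate's expression profile with that of the gene affected by the eQTL. A minimal sketch; the gene names, profiles, and the 0.8 cutoff are all hypothetical, and the paper's actual criterion may differ:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def retain_candidates(candidates, target, r_min=0.8):
    """Keep candidates whose |r| with the target profile meets r_min.
    Negative correlation counts too: repressors track the target
    inversely."""
    return [name for name, profile in candidates.items()
            if abs(pearson(profile, target)) >= r_min]

target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "GENE_A": [1.1, 2.0, 2.9, 4.2, 5.0],  # tracks the target
    "GENE_B": [3.0, 1.0, 4.0, 2.0, 3.5],  # essentially uncorrelated
    "GENE_C": [5.0, 4.1, 3.0, 2.2, 0.9],  # strong negative correlation
}
print(retain_candidates(candidates, target))  # → ['GENE_A', 'GENE_C']
```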
An inexact interior point method for l1-regularized sparse covariance selection
, 2010
Abstract
Cited by 32 (3 self)
Sparse covariance selection problems can be formulated as log-determinant (log-det) semidefinite programming (SDP) problems with large numbers of linear constraints. Standard primal-dual interior-point methods that are based on solving the Schur complement equation would encounter severe computational bottlenecks if they were applied to solve these SDPs. In this paper, we consider a customized inexact primal-dual path-following interior-point algorithm for solving large-scale log-det SDP problems arising from sparse covariance selection. Our inexact algorithm solves the large and ill-conditioned linear system of equations in each iteration by a preconditioned iterative solver. By exploiting the structure in sparse covariance selection problems, we are able to design highly effective preconditioners to efficiently solve the large and ill-conditioned linear systems. Numerical experiments on both synthetic and real covariance selection problems show that our algorithm is highly efficient and outperforms other existing algorithms.
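For a concrete sense of the problem being solved: sparse covariance selection maximizes, over precision matrices Theta, the penalized log-likelihood log det(Theta) − trace(S·Theta) − rho·Σ|Theta_ij|, where S is the empirical covariance and rho weights the l1 penalty that induces sparsity. The sketch below only evaluates this objective for a 2x2 case; the solver machinery (inexact interior point, preconditioning) is the paper's contribution and is not reproduced:

```python
import math

def logdet_objective(theta, s, rho):
    """Evaluate the sparse-covariance-selection objective for a 2x2
    precision matrix theta, empirical covariance s, and penalty rho."""
    det = theta[0][0] * theta[1][1] - theta[0][1] * theta[1][0]
    # trace(S @ Theta) = sum_{i,j} S_ij * Theta_ji
    trace_st = sum(s[i][j] * theta[j][i] for i in range(2) for j in range(2))
    l1 = sum(abs(theta[i][j]) for i in range(2) for j in range(2))
    return math.log(det) - trace_st - rho * l1

# With rho = 0 the maximizer is Theta = S^(-1); for S = I that is the
# identity, and scaling Theta away from it lowers the objective.
ident = [[1.0, 0.0], [0.0, 1.0]]
print(logdet_objective(ident, ident, 0.0))  # → -2.0
print(logdet_objective([[2.0, 0.0], [0.0, 2.0]], ident, 0.0) < -2.0)  # → True
```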