Results

**1 - 5** of **5**

### Digital Repository @ Iowa State University

2010

Hidden Markov models for simultaneous testing of multiple gene sets and adaptive and dynamic adaptive procedures for false discovery rate control and estimation

### A Model-Based Analysis to Infer the Functional Content of a Gene List

An important challenge in statistical genomics concerns integrating experimental data with exogenous information about gene function. A number of statistical methods are available to address this challenge, but most do not accommodate complexities in the functional record. To infer activity of a functional category (e.g., a gene ontology term), most methods use gene-level data on that category, but do not use other functional properties of the same genes. Not doing so creates undue errors in inference. Recent developments in model-based category analysis aim to overcome this difficulty, but in attempting to do so they are faced with serious computational problems. This paper investigates statistical properties and the structure of posterior computation in one such model for the analysis of functional category data. We examine the graphical structures underlying posterior computation in the original parameterization and in a new parameterization aimed at leveraging elements of the model. We characterize identifiability of the underlying activation states, describe a new prior distribution, and introduce approximations that aim to support numerical methods for posterior inference.

### METHODOLOGY ARTICLE Open Access

A shortcut for multiple testing on the directed acyclic graph of gene ontology

### Gene expression SimSeq: a nonparametric approach to simulation of RNA-sequence datasets

Motivation: RNA sequencing analysis methods are often derived by relying on hypothetical parametric models for read counts that are not likely to be precisely satisfied in practice. Methods are often tested by analyzing data that have been simulated according to the assumed model. This testing strategy can result in an overly optimistic view of the performance of an RNA-seq analysis method. Results: We develop a data-based simulation algorithm for RNA-seq data. The vector of read counts simulated for a given experimental unit has a joint distribution that closely matches the distribution of a source RNA-seq dataset provided by the user. We conduct simulation experiments based on the negative binomial distribution and our proposed nonparametric simulation algorithm. We compare performance between the two simulation experiments over a small subset of statistical methods for RNA-seq analysis available in the literature. We use as a benchmark the ability of a method to control the false discovery rate. Not surprisingly, methods based on parametric modeling assumptions seem to perform better with respect to false discovery rate control when data are simulated from parametric models rather than using our more realistic nonparametric simulation strategy. Availability and implementation: The nonparametric simulation algorithm developed in this article is implemented in the R package SimSeq, which is freely available under the GNU General Public License (version 2 or later) from the Comprehensive R Archive Network (http://cran.r-project.org/).
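To make the idea of data-based simulation concrete, here is a minimal Python sketch of resampling whole samples from a source count table. It is an illustration only, not the SimSeq algorithm or its R interface: the function name `simulate_from_source`, its parameters, and the toy data are all hypothetical. Source samples are drawn without replacement and kept intact, so the counts of all genes in a simulated sample come from one real sample and retain their joint distribution.

```python
import random


def simulate_from_source(source, group_a, group_b, de_genes, n_per_group, seed=0):
    """Build a two-group simulated dataset by resampling source samples.

    source      : dict mapping gene name -> list of counts, one per source sample
    group_a/b   : lists of sample indices for the two conditions in the source
    de_genes    : genes treated as differentially expressed (an assumption
                  supplied by the caller, not estimated here)
    n_per_group : number of simulated samples per group
    """
    rng = random.Random(seed)
    # Draw distinct source samples: 2n from condition A, n from condition B.
    a_cols = rng.sample(group_a, 2 * n_per_group)
    b_cols = rng.sample(group_b, n_per_group)
    first_a, second_a = a_cols[:n_per_group], a_cols[n_per_group:]

    simulated = {}
    for gene, counts in source.items():
        group1 = [counts[j] for j in first_a]
        if gene in de_genes:
            # Differential genes: second group uses real condition-B counts.
            group2 = [counts[j] for j in b_cols]
        else:
            # Null genes: both groups come from condition A, so no true effect.
            group2 = [counts[j] for j in second_a]
        simulated[gene] = group1 + group2
    return simulated


# Toy source data (hypothetical): samples 0-5 are condition A, 6-8 condition B.
source = {
    "g1": [1, 2, 3, 4, 5, 6, 1001, 1002, 1003],
    "g2": [11, 12, 13, 14, 15, 16, 1011, 1012, 1013],
}
group_a = [0, 1, 2, 3, 4, 5]
group_b = [6, 7, 8]

sim = simulate_from_source(source, group_a, group_b, de_genes={"g1"}, n_per_group=3)
print(sim)
```

Because no distribution is fitted, a method that assumes negative binomial counts gets no built-in advantage from data simulated this way.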

### SimSeq: A Nonparametric Approach to Simulation of RNA-Sequence Datasets

Motivation: RNA sequencing analysis methods are often derived by relying on hypothetical parametric models for read counts that are not likely to be precisely satisfied in practice. Methods are often tested by analyzing data that have been simulated according to the assumed model. This testing strategy can result in an overly optimistic view of the performance of an RNA-seq analysis method. Results: We develop a data-based simulation algorithm for RNA-seq data. The vector of read counts simulated for a given experimental unit has a joint distribution that closely matches the distribution of a source RNA-seq dataset provided by the user. We conduct simulation experiments based on the negative binomial distribution and our proposed nonparametric simulation algorithm. We compare performance between the two simulation experiments over a small subset of statistical methods for RNA-seq analysis available in the literature. We use as a benchmark the ability of a method to control the false discovery rate (FDR). Not surprisingly, methods based on parametric modeling assumptions seem to perform better with respect to FDR control when data are simulated from parametric models rather than using our more realistic nonparametric simulation strategy. Availability: The nonparametric simulation algorithm developed in this paper is implemented in the R package SimSeq, which is freely available under the GNU General Public License (version 2 or later)
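The benchmark named in this abstract, control of the FDR, can be illustrated with a short Python sketch. It uses the Benjamini-Hochberg step-up procedure as one common FDR-controlling method; the abstract does not say which procedures the authors evaluated, so that choice, the function names, and the toy p-values are assumptions. In a simulation the truly null genes are known, so the realized false discovery proportion can be computed directly.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the set of indices rejected by the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value meets its threshold.
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    return set(order[:cutoff])


def false_discovery_proportion(rejected, true_nulls):
    """Fraction of rejected hypotheses that are truly null (0 if none rejected)."""
    if not rejected:
        return 0.0
    return len(rejected & true_nulls) / len(rejected)


# Toy example (hypothetical p-values): genes 1 and 2 are simulated as null.
pvalues = [0.03, 0.02, 0.9]
rejected = benjamini_hochberg(pvalues, alpha=0.05)
print(rejected)                                      # {0, 1}
print(false_discovery_proportion(rejected, {1, 2}))  # 0.5
```

Averaging the false discovery proportion over many simulated datasets gives an estimate of the FDR a method actually achieves, which can then be compared with the nominal level `alpha`.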