Detecting differential usage of exons from RNA-Seq data
, 2012
Cited by 108 (9 self)
RNA-Seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-Seq data. DEXSeq employs generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects genes, and in many cases specific exons, that are subject to differential exon usage with high sensitivity. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.
MJ: Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 2009
Cited by 96 (3 self)
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
Statistical inferences for isoform expression in RNA-Seq
- Bioinformatics
, 2009
Cited by 91 (4 self)
Summary: The development of RNA sequencing (RNA-Seq) makes it possible for us to measure transcription with unprecedented precision and throughput. However, challenges remain in understanding the source and distribution of the reads, modeling the transcript abundance and developing efficient computational methods. In this paper, we develop a method to deal with the isoform expression estimation problem. The count of reads falling into a locus on the genome annotated with multiple isoforms is modeled as a Poisson variable. The expression of each individual isoform is estimated by solving a convex optimization problem, and statistical inferences about the parameters are obtained from the posterior distribution by importance sampling. Our results show that isoform expression inference in RNA-Seq is possible by employing appropriate statistical methods.
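The Poisson model described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a per-exon layout in which the expected count for each exon is its length times the summed expression of the isoforms that contain it, and it swaps in a simple EM-style multiplicative update in place of the paper's convex solver and importance sampling. The function name, exon lengths, membership matrix and counts are all hypothetical.

```python
def estimate_isoform_expression(counts, exon_lengths, membership, n_iter=1000):
    """Maximum-likelihood isoform expression under a Poisson read-count model.

    counts[j]        observed reads in exon j
    exon_lengths[j]  length of exon j (hypothetical units)
    membership[j][i] 1 if isoform i contains exon j, else 0

    Assumed model: counts[j] ~ Poisson(sum_i rate[j][i] * theta[i]),
    with rate[j][i] = exon_lengths[j] * membership[j][i].
    Every isoform is assumed to contain at least one exon.
    """
    n_exons = len(counts)
    n_iso = len(membership[0])
    rate = [[exon_lengths[j] * membership[j][i] for i in range(n_iso)]
            for j in range(n_exons)]

    theta = [1.0] * n_iso  # positive starting point
    for _ in range(n_iter):
        # Expected count per exon under the current estimate.
        mu = [sum(rate[j][i] * theta[i] for i in range(n_iso))
              for j in range(n_exons)]
        updated = []
        for i in range(n_iso):
            # EM step: rescale theta[i] by the rate-weighted ratio of
            # observed to expected counts over the exons it covers.
            num = sum(rate[j][i] * counts[j] / mu[j]
                      for j in range(n_exons) if mu[j] > 0)
            den = sum(rate[j][i] for j in range(n_exons))
            updated.append(theta[i] * num / den)
        theta = updated
    return theta


# Toy locus: isoform 0 uses exons 0, 1, 2; isoform 1 skips exon 1.
toy_estimate = estimate_isoform_expression(
    counts=[300, 100, 300],
    exon_lengths=[100, 50, 100],
    membership=[[1, 1], [1, 0], [1, 1]],
)
```

Because the Poisson log-likelihood is concave in the isoform expressions, an iteration of this kind converges to the same maximum a convex solver would find. The exon that only one isoform contains is what makes the two expressions separately identifiable in the toy example.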
Analysis and design of RNA sequencing experiments for identifying isoform regulation
- Nat Methods
Cited by 76 (4 self)
Through alternative splicing, most human genes express multiple isoforms that often differ in function. To infer isoform regulation from high-throughput sequencing of cDNA fragments (RNA-seq), we developed the mixture-of-isoforms (MISO) model, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates. Incorporation of mRNA fragment length distribution in paired-end RNA-seq greatly improved estimation of alternative-splicing levels. MISO also detects differentially regulated exons or isoforms. Application of MISO implicated the RNA splicing factor hnRNP H1 in the regulation of alternative cleavage and polyadenylation, a role that was supported by UV cross-linking–immunoprecipitation sequencing (CLIP-seq) analysis in human cells. Our results provide a probabilistic framework for RNA-seq analysis, give functional insights into pre-mRNA processing and yield guidelines for the optimal design of RNA-seq experiments for studies of gene and isoform expression. The distinct isoforms expressed from metazoan genes through alternative splicing can be important in development, differentiation and disease. For example, the pyruvate kinase
An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data
- PLoS Comput. Biol
, 2009
Cited by 61 (0 self)
The parts of the genome transcribed by a cell or tissue reflect the biological processes and functions it carries out. We characterized the features of mammalian tissue transcriptomes at the gene level through analysis of RNA deep sequencing (RNA-Seq) data across human and mouse tissues and cell lines. We observed that roughly 8,000 protein-coding genes were ubiquitously expressed, contributing to around 75% of all mRNAs by message copy number in most tissues. These mRNAs encoded proteins that were often intracellular, and tended to be involved in metabolism, transcription, RNA processing or translation. In contrast, genes for secreted or plasma membrane proteins were generally expressed in only a subset of tissues. The distribution of expression levels was broad but fairly continuous: no support was found for the concept of distinct expression classes of genes. Expression estimates that included reads mapping to coding exons only correlated better with qRT-PCR data than estimates which also included 3′ untranslated regions (UTRs). Muscle and liver had the least complex transcriptomes, in that they expressed predominantly ubiquitous genes and a large fraction of the transcripts came from a few highly expressed genes, whereas brain, kidney and testis expressed more complex transcriptomes with the vast majority of genes expressed and relatively small contributions from the most expressed genes. mRNAs expressed in brain had unusually long 3′ UTRs, and mean 3′ UTR length was higher for genes involved in development, morphogenesis and signal transduction, suggesting added complexity of UTR-based regulation for these genes. Our results support a model in
A two-parameter generalized Poisson model to improve the analysis of RNA-seq data
- Nucleic Acids Research
, 2010
Cited by 49 (2 self)
Deep sequencing of RNAs (RNA-seq) has been a useful tool to characterize and quantify transcriptomes. However, there are significant challenges in the analysis of RNA-seq data, such as how to separate signals from sequencing bias and how to perform reasonable normalization. Here, we focus on a fundamental question in RNA-seq analysis: the distribution of the position-level read counts. Specifically, we propose a two-parameter generalized Poisson (GP) model for the position-level read counts. We show that the GP model fits the data much better than the traditional Poisson model. Based on the GP model, we can better estimate gene or exon expression, perform a more reasonable normalization across different samples, and improve the identification of differentially expressed genes and the identification of differentially spliced exons. The usefulness of the GP model is demonstrated by applications to multiple RNA-seq data sets.
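To make the two-parameter model in this abstract concrete, here is a minimal sketch of a generalized Poisson distribution. The abstract does not state the parameterization, so this assumes the standard Consul form with rate `theta > 0` and dispersion `0 <= lam < 1`. The function names are hypothetical, and the code is not taken from the paper.

```python
import math


def gp_log_pmf(x, theta, lam):
    """Log-probability of a count x under the (assumed) Consul generalized Poisson.

    P(X = x) = theta * (theta + x*lam)**(x - 1) * exp(-theta - x*lam) / x!
    Setting lam = 0 recovers the ordinary Poisson with mean theta.
    """
    if theta <= 0 or not (0 <= lam < 1):
        raise ValueError("requires theta > 0 and 0 <= lam < 1")
    if x < 0:
        raise ValueError("x must be a non-negative integer")
    return (math.log(theta)
            + (x - 1) * math.log(theta + x * lam)
            - theta - x * lam
            - math.lgamma(x + 1))


def gp_pmf(x, theta, lam):
    """Probability of a count x under the generalized Poisson."""
    return math.exp(gp_log_pmf(x, theta, lam))


def gp_mean(theta, lam):
    """Mean of the distribution: theta / (1 - lam)."""
    return theta / (1 - lam)


def gp_variance(theta, lam):
    """Variance: theta / (1 - lam)**3, which exceeds the mean whenever lam > 0."""
    return theta / (1 - lam) ** 3
```

The second parameter is what lets the variance exceed the mean, which a plain Poisson cannot do. That is one plausible reading of why a model of this kind would fit position-level read counts better.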
Detection of splice junctions from paired-end RNA-seq data by SpliceMap
- Nucleic Acids Res
, 2010
Cited by 44 (0 self)
Alternative splicing is a prevalent posttranscriptional process, which is not only important to normal cellular function but is also involved in human diseases. The newly developed second-generation sequencing technique provides high-throughput data (RNA-seq data) to study alternative splicing events in different types of cells. Here, we present a computational method, SpliceMap, to detect splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and is capable of finding novel splice junctions with high sensitivity and specificity. It can handle long reads (50–100 nt) and can exploit paired-read information to improve mapping accuracy. Several parameters are included in the output to indicate the reliability of the predicted junction and help filter out false predictions. We applied SpliceMap to analyze 23 million paired 50-nt reads from human brain tissue. The results show that, at this depth of sequencing, RNA-seq can support reliable detection of splice junctions except for those that are present at very low levels. Compared to current methods, SpliceMap can achieve 12% higher sensitivity without sacrificing specificity.
Comprehensive comparative analysis of strand-specific RNA sequencing methods
- Nat. Methods
, 2010
Cited by 40 (4 self)
Strand-specific, massively-parallel cDNA sequencing (RNA-Seq) is a powerful tool for novel transcript discovery, genome annotation, and expression profiling. Despite multiple published methods for strand-specific RNA-Seq, no consensus exists as to how to choose between them. Here, we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-Seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library construction protocols, including both published and our own novel methods. We found marked differences in strand-specificity, library complexity, evenness and continuity of coverage, agreement with known annotations, and accuracy for expression profiling. Weighing each method’s performance and ease, we identify the dUTP second strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.
Most “dark matter” transcripts are associated with known genes
- PLoS Biol
, 2010
Cited by 32 (1 self)
- Add to MetaCart
A series of reports over the last few years have indicated that a much larger portion of the mammalian genome is transcribed than can be accounted for by currently annotated genes, but the quantity and nature of these additional transcripts remain unclear. Here, we have used data from single- and paired-end RNA-Seq and tiling arrays to assess the quantity and composition of transcripts in PolyA+ RNA from human and mouse tissues. Relative to tiling arrays, RNA-Seq identifies many fewer transcribed regions (“seqfrags”) outside known exons and ncRNAs. Most nonexonic seqfrags are in introns, raising the possibility that they are fragments of pre-mRNAs. The chromosomal locations of the majority of intergenic seqfrags in RNA-Seq data are near known genes, consistent with alternative cleavage and polyadenylation site usage, promoter- and terminator-associated transcripts, or new alternative exons; indeed, reads that bridge splice sites identified 4,544 new exons, affecting 3,554 genes. Most of the remaining seqfrags correspond to either single reads that display characteristics of random sampling from a low-level background or several thousand small transcripts (median length = 111 bp) present at higher levels, which also tend to display sequence conservation and originate from regions with open chromatin. We conclude that, while there are bona fide new intergenic transcripts, their number and abundance are generally low in comparison to known exons, and the genome is not as pervasively transcribed as previously reported.