Identification of novel transcripts in annotated genomes using RNA-Seq
- Bioinformatics
, 2011
Cited by 47 (0 self)
Summary: We describe a new ‘reference annotation-based transcript assembly’ problem for RNA-Seq data that involves assembling novel transcripts in the context of an existing annotation. This problem arises in the analysis of expression in model organisms, where it is desirable to leverage existing annotations for discovering novel transcripts. We present an algorithm for reference annotation-based transcript assembly and show how it can be used to rapidly investigate novel transcripts revealed by RNA-Seq in comparison with a reference annotation. Availability: The methods described in this article are implemented in the Cufflinks suite of software for RNA-Seq, freely available from
Tools for mapping high-throughput sequencing data
- Bioinformatics
, 2012
Cited by 27 (0 self)
Motivation: A ubiquitous and fundamental step in high-throughput sequencing analysis is the alignment (mapping) of the generated reads to a reference sequence. To accomplish this task numerous software tools have been proposed. Determining the mappers that are most suitable for a specific application is not trivial. Results: This survey focuses on classifying mappers through a wide number of characteristics. The goal is to allow practitioners to compare the mappers more easily and find those that are most suitable for their specific problem. Availability: A regularly updated compendium of mappers can be found at
Statistical design and analysis of RNA sequencing data
- Genetics
, 2010
Cited by 19 (0 self)
Next-generation sequencing technologies are quickly becoming the preferred approach for characterizing and quantifying entire genomes. Even though data produced from these technologies are proving to be the most informative of any thus far, very little attention has been paid to fundamental design aspects of data collection and analysis, namely sampling, randomization, replication, and blocking. We discuss these concepts in an RNA sequencing framework. Using simulations we demonstrate the benefits of collecting replicated RNA sequencing data according to well-known statistical designs that partition the sources of biological and technical variation. Examples of these designs and their corresponding models are presented with the goal of testing differential expression.
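The design concepts named in this abstract (randomization, replication, blocking) can be illustrated with a small sketch. It is not taken from the paper: treating each sequencing lane as a block that holds one sample per treatment is an assumption made here for illustration, and the function name `blocked_lane_assignment` is hypothetical.

```python
import random

def blocked_lane_assignment(treatments, n_replicates, seed=0):
    """Hypothetical helper: build a randomized complete block layout.

    Assumption (not from the abstract): each lane acts as a block and
    receives exactly one biological replicate of every treatment.
    """
    rng = random.Random(seed)
    # Randomization: shuffle which replicate of each treatment goes to which lane.
    shuffled = {}
    for t in treatments:
        reps = list(range(1, n_replicates + 1))
        rng.shuffle(reps)
        shuffled[t] = reps
    lanes = []
    for lane_idx in range(n_replicates):
        # Blocking: one sample per treatment in every lane.
        lane = [(t, shuffled[t][lane_idx]) for t in treatments]
        rng.shuffle(lane)  # randomize the order within the lane
        lanes.append(lane)
    return lanes

design = blocked_lane_assignment(["control", "treated"], n_replicates=3)
for i, lane in enumerate(design, start=1):
    print(f"lane {i}: {lane}")
```

Because every lane contains all treatments, lane-to-lane technical variation is not confounded with the treatment effect, which is the general motivation for blocking.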
Efficiently identifying genome-wide changes with next-generation sequencing data
, 2011
Inferring gene networks from discrete expression data
- Biostatistics
, 2013
Cited by 6 (0 self)
The modeling of gene networks from transcriptional expression data is an important tool in biomedical research to reveal signaling pathways and to identify treatment targets. Current gene network modeling is primarily based on the use of Gaussian graphical models applied to continuous data, which give a closed-form marginal likelihood. In this paper, we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which generate counts of mRNA transcripts in cell samples. We propose a generalized linear model to fit the discrete gene expression data and assume that the log ratios of the mean expression levels follow a Gaussian distribution. We restrict the gene network structures to decomposable graphs and derive the graphs by selecting the covariance matrix of the Gaussian distribution with the hyper-inverse Wishart priors. Furthermore, we incorporate prior network models based on gene ontology information, which avails existing biological information on the genes of interest. We conduct simulation studies to examine the performance of our discrete graphical model and apply the method to two real datasets for gene network inference.
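To make the idea of count data driven by Gaussian log-scale means concrete, here is a minimal generative sketch for two genes. It is a simplification and not the paper's model: the log means themselves (rather than log ratios) are taken to be bivariate Gaussian, there is no graph selection or hyper-inverse Wishart prior, and the names `simulate_counts`, `mu`, `sigma`, and `rho` are all assumptions introduced for illustration.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the moderate means used in this sketch.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_counts(n_samples, mu=3.0, sigma=0.5, rho=0.8, seed=0):
    """Simulate read counts for two genes whose log means are correlated.

    Assumption: log means are bivariate Gaussian with correlation rho;
    counts are Poisson given those means.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        lam1 = math.exp(mu + sigma * z1)
        lam2 = math.exp(mu + sigma * z2)
        counts.append((sample_poisson(lam1, rng), sample_poisson(lam2, rng)))
    return counts

for pair in simulate_counts(n_samples=5):
    print(pair)
```

In a sketch like this, dependence between genes lives entirely in the latent Gaussian layer, which is why a covariance structure on that layer can be read as a gene network.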
RNASEQR–a streamlined and accurate RNA-seq sequence analysis program
- Nucleic Acids Res
, 2012
Cited by 4 (0 self)
The next-generation sequencing (NGS)-based transcriptomic profiling method, often called RNA-seq, has been widely used to study global gene expression, alternative exon usage, new exon discovery, novel transcriptional isoforms and genomic sequence variations. However, this technique also poses many biological and informatics challenges to extracting meaningful biological information. RNA-seq data analysis is built on the foundation of high-quality initial genome localization and alignment information for RNA-seq sequences. Toward this goal, we have developed RNASEQR to accurately and effectively map millions of RNA-seq sequences. We have systematically compared RNASEQR with four of the most widely used tools using a simulated data set created from the Consensus CDS project and two experimental RNA-seq data sets generated from a human glioblastoma patient. Our results showed that RNASEQR yields more accurate estimates for gene expression, complete gene structures and new transcript isoforms, as well as more accurate detection of single nucleotide variants (SNVs). RNASEQR analyzes raw data from RNA-seq experiments effectively and outputs results in a manner that is compatible with a wide variety of specialized downstream analyses on desktop computers.
Computational methods for transcriptome annotation and quantification using RNA-seq
Defining a precise map of all genes along with their alternative isoforms and expression across diverse cell types is critical for understanding biology. Until recently, production of such data was prohibitively expensive and experimentally laborious. The major method for annotating a transcriptome required the slow and costly process of cloning cDNAs or expressed sequence tag (EST) libraries, followed by capillary sequencing [1-3]. Owing to the high cost and limited data yield intrinsic to this approach, it only provided a glimpse of the true complexity of cell type-specific splicing and transcription. Recent advances in DNA sequencing technology have made it possible to sequence cDNA derived from cellular RNA by massively parallel sequencing technologies, a process termed RNA-seq.
Here we focus on the computational methods needed to address RNA-seq analysis core challenges. First, we describe methods to align reads directly to a reference transcriptome or genome ('read mapping'). Second, we discuss methods to identify expressed genes and isoforms ('transcriptome reconstruction'). Third, we present methods for estimation of gene and isoform abundance, as well as methods for the analysis of differential expression across samples ('expression quantification'). Because of ongoing improvements in RNA-seq data generation, there is great variability in the maturity of available computational tools. In some areas, such as read mapping, a wealth of algorithms exists but in others, such as differential expression analysis, solutions are only beginning to emerge. Rather than comprehensively describing each method, we highlight the key common principles as well as the critical differences underlying
High-throughput RNA sequencing (RNA-seq) promises a comprehensive picture of the transcriptome, allowing for the complete annotation and quantification of all genes and their isoforms across samples. Realizing this promise requires increasingly complex computational methods. These computational challenges fall into three main categories: (i) read mapping, (ii) transcriptome reconstruction and (iii) expression quantification. Here we explain the major conceptual and practical challenges, and the general classes of solutions for each category. Finally, we highlight the interdependence between these categories and discuss the benefits for different biological applications.
Detecting and Comparing Non-Coding RNAs in the High-Throughput Era
, 2013
In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and, more generally, any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino acids. Moreover, for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation, which impedes comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies, and especially on two important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.