A benchmark of multiple sequence alignment programs upon structural RNAs
Nucleic Acids Res, 2005
Cited by 58 (8 self)
To date, few attempts have been made to benchmark alignment algorithms on nucleic acid sequences. Sophisticated PAM- or BLOSUM-like models are frequently used to align proteins, yet no equivalents are considered for nucleic acids; instead, rather ad hoc models are generally favoured. Here, we systematically test the performance of existing alignment algorithms on structural RNAs. This work aimed to achieve two goals: (i) to determine the conditions under which it is appropriate to apply common sequence alignment methods to the structural RNA alignment problem, which indicates where and when researchers should consider augmenting the alignment process with auxiliary information such as secondary structure; and (ii) to determine which sequence alignment algorithms perform well under the broadest range of conditions. We find that sequence alignment alone, using the current algorithms, is generally inappropriate below 50–60% sequence identity. Second, we note that the probabilistic method ProAlign and the aging Clustal algorithms generally outperform other sequence-based algorithms across the broadest range of applications.
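The identity threshold above can be checked on an alignment before choosing a method. The sketch below computes mean pairwise sequence identity and applies a cutoff. It is a minimal illustration under stated assumptions: identity is taken as identical columns divided by columns where neither row is gapped (one of several common definitions, not necessarily the one the benchmark used), and the function `sequence_alignment_may_suffice` with its `cutoff=0.6` default is a hypothetical decision rule, not something the paper prescribes.

```python
from itertools import combinations

GAP_CHARS = {"-", "."}


def pairwise_identity(a: str, b: str) -> float:
    """Identical columns / columns where neither aligned row has a gap."""
    if len(a) != len(b):
        raise ValueError("rows of an alignment must have equal length")
    # Treat DNA 'T' and RNA 'U' as the same nucleotide.
    a = a.upper().replace("T", "U")
    b = b.upper().replace("T", "U")
    aligned = [(x, y) for x, y in zip(a, b)
               if x not in GAP_CHARS and y not in GAP_CHARS]
    if not aligned:
        return 0.0
    return sum(x == y for x, y in aligned) / len(aligned)


def mean_pairwise_identity(rows: list[str]) -> float:
    """Average identity over all unordered pairs of aligned rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def sequence_alignment_may_suffice(rows: list[str], cutoff: float = 0.6) -> bool:
    # Hypothetical rule: cutoff=0.6 is an assumption taken from the upper
    # end of the reported 50-60% range; below it, consider adding
    # secondary-structure information to the alignment process.
    return mean_pairwise_identity(rows) >= cutoff


if __name__ == "__main__":
    rows = ["ACGU-A", "ACGAUA"]
    print(pairwise_identity(*rows))             # 4 identical of 5 ungapped columns
    print(sequence_alignment_may_suffice(rows))
```

Columns where either row has a gap are excluded rather than counted as mismatches; counting them instead would lower identity for gap-rich RNA alignments and shift where the cutoff bites.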
Consensus Folding of Aligned Sequences as a New Measure for the Detection of Functional RNAs by Comparative Genomics
2004
Cited by 45 (12 self)
Facing the ever-growing list of newly discovered classes of functional RNAs, it can be expected that further types of functional RNAs are still hidden in recently completed genomes. The computational identification of such RNA genes is, therefore, of major importance. While most known functional RNAs have characteristic secondary structures, their free energies are generally not statistically significant enough to distinguish RNA genes from the genomic background. Additional information is required. Considering the wide availability of new genomic data from closely related species, comparative studies seem to be the most promising approach. Here we show that prediction of consensus structures of aligned sequences can be a significant measure for detecting functional RNAs. We report a new method for testing multiple sequence alignments for the existence of an unusually structured and conserved fold. For alignments of six types of well-known functional RNA, we show that an energy score consisting of free energy and a covariation term significantly improves sensitivity compared to single-sequence predictions. We further test our method on a number of non-coding RNAs from C. elegans/C. briggsae and seven Saccharomyces species. Most RNAs can be detected with high significance. We provide a Perl implementation that can readily be used to score single alignments, and discuss how the methods described here can be extended to allow for efficient genome-wide screens.
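To make "covariation" concrete, the sketch below scores a pair of alignment columns with mutual information (MI), a standard covariation statistic, alongside the fraction of sequences that can form a canonical pair (Watson-Crick or G-U) at those columns. This is an assumption-laden illustration: it is not the covariation term from the paper's energy score, and it ignores free energy entirely. Compensatory changes (G-C becoming A-U, for example) give high MI with full pairing; a conserved column pair gives zero MI.

```python
import math
from collections import Counter

# Canonical pairs: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def column(rows: list[str], i: int) -> list[str]:
    """Column i of the alignment, upper-cased, with T mapped to U."""
    return [r[i].upper().replace("T", "U") for r in rows]


def mutual_information(rows: list[str], i: int, j: int) -> float:
    """MI in bits between columns i and j, skipping rows gapped ('-') at either."""
    obs = [(x, y) for x, y in zip(column(rows, i), column(rows, j))
           if x != "-" and y != "-"]
    n = len(obs)
    if n == 0:
        return 0.0
    fi = Counter(x for x, _ in obs)
    fj = Counter(y for _, y in obs)
    fij = Counter(obs)
    mi = 0.0
    for (x, y), count in fij.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((fi[x] / n) * (fj[y] / n)))
    return mi


def pairing_fraction(rows: list[str], i: int, j: int) -> float:
    """Fraction of rows whose residues at (i, j) can form a canonical pair."""
    pairs = list(zip(column(rows, i), column(rows, j)))
    return sum(p in CANONICAL for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment: columns 0 and 5 covary, columns 1 and 4 are conserved.
    rows = ["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]
    print(mutual_information(rows, 0, 5), pairing_fraction(rows, 0, 5))
    print(mutual_information(rows, 1, 4), pairing_fraction(rows, 1, 4))
```

MI alone rewards any correlated change, including ones that break pairing, so it is shown here next to the pairing fraction. A score that combines both with folding energy is closer in spirit to the approach described above.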
MIPS Arabidopsis thaliana database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome
Nucleic Acids Res, 2002
Cited by 40 (6 self)
integrated biological knowledge resource for plant genomics
CONTRAfold: RNA secondary structure prediction without physics-based models
Bioinformatics, 2006
doi:10.1093/bioinformatics/btl246
NONCODE: an integrated knowledge database of non-coding RNAs
Nucleic Acids Res, 2005
Cited by 24 (2 self)
NONCODE is an integrated knowledge database dedicated to non-coding RNAs (ncRNAs), that is, RNAs that function without being translated into proteins. All ncRNAs in NONCODE were filtered automatically from the literature and GenBank and were later manually curated. The distinctive features of NONCODE are as follows: (i) the ncRNAs in NONCODE include almost all types of ncRNAs, except transfer RNAs and ribosomal RNAs; (ii) all ncRNA sequences and their related information (e.g. function, cellular role, cellular location, chromosomal information) in NONCODE have been confirmed manually by consulting the relevant literature, and more than 80% of the entries are based on experimental data; (iii) based on the cellular process and function in which a given ncRNA is involved, we introduced a novel classification system, labeled process function class, to integrate existing classification systems; (iv) in addition, some 1100 ncRNAs have been grouped into nine other classes according to whether they are specific to gender or tissue, or associated with tumors and diseases, etc.; (v) NONCODE provides a user-friendly interface, a visualization platform and a convenient search option, allowing efficient retrieval of sequences, regulatory elements in the flanking sequences, secondary structures, related publications and other information. The first release of NONCODE (v1.0) contains 5339 non-redundant sequences from 861 organisms, including eukaryotes, eubacteria, archaebacteria, viruses and viroids. Access is free for all users through a web interface at
Faster Genome Annotation of Non-coding RNA Families without Loss of Accuracy
In Proceedings of the Eighth Annual International Conference on Computational Molecular Biology (RECOMB), 2004
Cited by 23 (7 self)
... RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are slow. This paper shows how to make CMs faster while provably sacrificing none of their accuracy. Specifically, based on the CM, our software builds a profile hidden Markov model (HMM), which filters the genome database. This HMM is a rigorous filter, i.e., its filtering eliminates only sequences that provably could not be annotated as homologs. The CM is run only on what remains. Optimizing the HMM for filtering involves minimizing an exponential objective ...

Author affiliations: Dept. of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA, USA, 98195 (zasha@cs.washington.edu); Depts. of Computer Science & Engineering and Genome Sciences, University of Washington, Box 352350, Seattle, WA, USA, 98195 (ruzzo@cs.washington.edu). © ACM, 2004. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in Proc. Eighth Annual Inter. Conf. on Computational Molecular Biology (RECOMB), 2004. See http://recomb04.sdsc.edu/.
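The "rigorous filter" idea above comes down to one property: the cheap filter score must be an upper bound on the expensive score, so a sequence discarded by the filter could never have passed the expensive test. The sketch below shows that property on a deliberately simple toy, not on the paper's profile HMM or on covariance models. The expensive scorer is a Nussinov-style dynamic program giving the maximum number of Watson-Crick pairs (minimum hairpin loop of 3). The cheap filter is the composition bound min(#A, #U) + min(#G, #C), which is valid because each such pair uses one A and one U, or one G and one C. All function names are hypothetical.

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Expensive scorer: max Watson-Crick pairs in a nested structure (O(n^3))."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best count on seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: position i stays unpaired
            # case 2: i pairs with some k, leaving at least min_loop bases inside
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in WC_PAIRS:
                    inside = dp[i + 1][k - 1]
                    after = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + inside + after)
            dp[i][j] = best
    return dp[0][n - 1]


def composition_bound(seq: str) -> int:
    """Cheap filter: an upper bound on max_pairs, from base counts alone."""
    return min(seq.count("A"), seq.count("U")) + min(seq.count("G"), seq.count("C"))


def rigorous_search(candidates: list[tuple[str, str]], threshold: int) -> list[str]:
    """Run the expensive scorer only on sequences the bound cannot rule out."""
    hits = []
    for name, seq in candidates:
        if composition_bound(seq) < threshold:
            continue  # provably cannot reach threshold, so nothing is lost
        if max_pairs(seq) >= threshold:
            hits.append(name)
    return hits


if __name__ == "__main__":
    db = [("hairpin", "GGGGAAAACCCC"), ("poly_a", "AAAAAAAAAAAA")]
    print(rigorous_search(db, threshold=3))
```

Because `composition_bound(s) >= max_pairs(s)` for every `s`, the filtered search returns exactly the same hits as running `max_pairs` on everything. Its speed-up depends on how tight the bound is, which mirrors the paper's concern with optimizing the filter.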
Abstract shapes of RNA
Nucleic Acids Res, 2004
Cited by 22 (6 self)
The function of a non-protein-coding RNA is often determined by its structure. Since experimental determination of RNA structure is time-consuming and expensive, its computational prediction is of great interest, and efficient solutions based on thermodynamic parameters are known. Frequently, however, the predicted minimum free energy structures are not the native ones, which makes it necessary to generate suboptimal solutions. While this can be accomplished by a number of programs, the user is often confronted with large outputs of similar structures, although he or she is interested in structures with more fundamental differences or, in other words, with different abstract shapes. Here, we formalize the concept of abstract shapes and introduce their efficient computation. Each shape of an RNA molecule comprises a class of similar structures and has a representative structure of minimal free energy within the class. Shape analysis is implemented in the program RNAshapes. We applied RNAshapes to the prediction of optimal and suboptimal abstract shapes of several RNAs. For a given energy range, the number of shapes is considerably smaller than the number of structures, and in all cases the native structures were among the top shape representatives. This demonstrates that the researcher can quickly focus on the structures of interest without processing up to thousands of near-optimal solutions. We complement this study with a large-scale analysis of the growth behaviour of structure and shape spaces. RNAshapes is available for download and as an online version on the Bielefeld Bioinformatics Server.
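To see how many structures can collapse onto one abstract shape, the sketch below maps a dot-bracket structure to a highly abstract shape string. It keeps only the nesting pattern of helices: unpaired bases are dropped, and helices interrupted only by bulges or interior loops are merged into one `[]`. This is our reading of the most abstract shape level and may differ in detail from the definitions implemented in RNAshapes. The function names are hypothetical.

```python
def pair_table(db: str) -> dict[int, int]:
    """Map each '(' index to its matching ')' index in a nested dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at {i}")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at {i}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def inner_pairs(pairs: dict[int, int], i: int, j: int) -> list[int]:
    """Opening indices of the outermost pairs strictly between i and j."""
    kids, k = [], i + 1
    while k < j:
        if k in pairs:
            kids.append(k)
            k = pairs[k] + 1  # skip over this pair's interior
        else:
            k += 1
    return kids


def abstract_shape(db: str) -> str:
    pairs = pair_table(db)

    def helix(i: int) -> str:
        kids = inner_pairs(pairs, i, pairs[i])
        # A loop enclosing exactly one pair is a stack, bulge or interior
        # loop: keep following the same helix instead of opening a new one.
        while len(kids) == 1:
            i = kids[0]
            kids = inner_pairs(pairs, i, pairs[i])
        # Hairpin (no kids) or multiloop (2+ kids) ends the helix.
        return "[" + "".join(helix(k) for k in kids) + "]"

    return "".join(helix(k) for k in inner_pairs(pairs, -1, len(db)))


if __name__ == "__main__":
    for s in ["((((...))))..((...))", "((..((...))..))", "((.((...))((...)).))"]:
        print(s, abstract_shape(s))
```

Under this mapping, the second example (a hairpin helix interrupted by an interior loop) has the same shape as any plain hairpin. This is the kind of many-to-one grouping that lets a shape representative stand in for many near-optimal structures.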
Backofen R: MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons
Bioinformatics, 2005
Human microRNA prediction through a probabilistic co-learning model of sequence and structure
Nucleic Acids Res, 2005
Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures
Bioinformatics, 2005
Cited by 16 (1 self)
Motivation: Since whole-genome sequences for many species are now available, computational prediction of RNA secondary structures and computational identification of non-coding RNA regions by comparative genomics have become important, and they require more advanced alignment methods. Recently, an approach of structural alignment for RNA sequences has been introduced to solve these problems. By structural alignment, we mean a pairwise alignment of an unfolded RNA sequence to a folded RNA sequence of known secondary structure. Pair HMMs on tree structures (PHMMTSs), proposed by Sakakibara, are efficient automata-theoretic models for structural alignment of RNA secondary structures, but they cannot handle pseudoknots. On the other hand, tree adjoining grammars (TAGs) are a subclass of context-sensitive grammars that is suitable for modeling pseudoknots. Our goal is to extend PHMMTSs by incorporating TAGs so that they can handle pseudoknots. Results: We propose pair stochastic tree adjoining grammars (PSTAGs) for modeling RNA secondary structures including pseudoknots, and show strong experimental evidence that modeling pseudoknot structures significantly improves the prediction accuracy of RNA secondary structures. First, we extend the notion of PHMMTSs defined on alignments of "trees" to PSTAGs defined on alignments of "TAG (derivation) trees", which represent a top-down parsing process of TAGs and are functionally equivalent to derived trees of TAGs. Second, we modify PSTAGs so that they take as input a pair consisting of a linear sequence and a TAG tree representing a pseudoknot structure of RNA, and produce a structural alignment. Then, we develop a polynomial-time algorithm for obtaining an optimal structural alignment by PSTAGs, based on a dynamic programming parser. We have performed several computational experiments on predicting pseudoknots with PSTAGs, and our computational experiments suggest that prediction of RNA pseudoknot structures ...
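Nested models such as PHMMTSs fail on pseudoknots because pseudoknotted base pairs cross. Two pairs (i, j) and (k, l) with i < k < j < l cannot both be represented in a single tree of nested pairs. The sketch below only illustrates that definition. It parses an extended dot-bracket string that uses a second bracket type for the pseudoknotted helix (a common convention, assumed here) and lists crossing pairs. It does not implement PSTAGs or any part of the alignment algorithm, and the function names are hypothetical.

```python
def parse_pairs(db: str, brackets: tuple[str, ...] = ("()", "[]", "{}")) -> list[tuple[int, int]]:
    """Base pairs from extended dot-bracket; each bracket type is matched separately."""
    opener = {b[0]: b[1] for b in brackets}
    closer = {b[1]: b[0] for b in brackets}
    stacks: dict[str, list[int]] = {b[0]: [] for b in brackets}
    pairs = []
    for i, ch in enumerate(db):
        if ch in opener:
            stacks[ch].append(i)
        elif ch in closer:
            stack = stacks[closer[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at {i}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def crossing_pairs(pairs: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """All pairs of base pairs (i, j), (k, l) with i < k < j < l."""
    norm = sorted((min(a, b), max(a, b)) for a, b in pairs)
    out = []
    for x, (i, j) in enumerate(norm):
        for k, l in norm[x + 1:]:
            # norm is sorted by opening index, so k >= i; only i < k < j < l can cross.
            if i < k < j < l:
                out.append(((i, j), (k, l)))
    return out


if __name__ == "__main__":
    hairpin = "((..))"
    pseudoknot = "((..[[..))..]]"
    print(crossing_pairs(parse_pairs(hairpin)))          # nested: no crossings
    print(len(crossing_pairs(parse_pairs(pseudoknot))))  # every () pair crosses every [] pair
```

The quadratic scan is fine for illustration. Any structure for which this returns a non-empty list is outside what a nested (context-free) structural model can express.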

