A benchmark of multiple sequence alignment programs upon structural RNAs
- Nucleic Acids Res
, 2005
Cited by 58 (8 self)
To date, few attempts have been made to benchmark the alignment algorithms upon nucleic acid sequences. Frequently, sophisticated PAM- or BLOSUM-like models are used to align proteins, yet equivalents are not considered for nucleic acids; instead, rather ad hoc models are generally favoured. Here, we systematically test the performance of existing alignment algorithms on structural RNAs. This work was aimed at achieving the following goals: (i) to determine conditions where it is appropriate to apply common sequence alignment methods to the structural RNA alignment problem. This indicates where and when researchers should consider augmenting the alignment process with auxiliary information, such as secondary structure and (ii) to determine which sequence alignment algorithms perform well under the broadest range of conditions. We find that sequence alignment alone, using the current algorithms, is generally inappropriate below 50–60% sequence identity. Second, we note that the probabilistic method ProAlign and the aging Clustal algorithms generally outperform other sequence-based algorithms, under the broadest range of applications.
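The sequence identity figure quoted in this abstract can be made concrete with a small sketch. The function name `percent_identity` and the convention used here (matches divided by the number of columns that are not gaps in both sequences) are assumptions for illustration; the benchmark may define identity differently.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Assumed convention: columns where both rows are gaps are ignored;
    a column counts as a match only if both residues are equal and not gaps.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those that are gaps in both rows.
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Toy aligned RNA pair (made-up data): 5 matching columns out of 7.
identity = percent_identity("GCAU-CG", "GCUUACG")
```

Under this convention a pair scoring under roughly 50 to 60 would fall in the range where the abstract reports that sequence-only alignment becomes unreliable.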
An Iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots
, 2004
Cited by 39 (2 self)
Motivation: Pseudoknots have generally been excluded from the prediction of RNA secondary structures due to the difficulty of modeling them. Although several dynamic programming algorithms exist for the prediction of pseudoknots using thermodynamic approaches, they are neither reliable nor efficient. On the other hand, comparative methods are more reliable, but are often done in an ad hoc manner and require expert intervention. Maximum weighted matching, an algorithm for pseudoknot prediction with comparative analysis, suffers from low prediction accuracy in many cases.
Stochastic Context-Free Grammars for Modeling RNA
, 1993
Cited by 36 (4 self)
In this paper, we apply stochastic context-free grammars (SCFGs) to the problems of statistical modeling, database searching, multiple alignment, and prediction of the secondary structure of RNA families. This approach is highly related to our previous work on modeling protein families with HMMs [HKMS93, KBM
Backofen R: MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons
- Bioinformatics
, 2005
A Comparative Genomics Approach to Prediction of New Members of Regulons
, 2001
Cited by 21 (6 self)
In this article, an approach is presented to evaluate the computational predictions of regulatory sites. We combine the prediction of transcription units having orthologous genes with the prediction of transcription factor binding sites based on probabilistic models. We augment the sets of genes in Escherichia coli that are expected to be regulated by two transcription factors, the cAMP receptor protein and the fumarate and nitrate reduction regulatory protein, through a comparison with the Haemophilus influenzae genome. At the same time, we learned more about the regulatory networks of H. influenzae, a species with much less experimental knowledge than E. coli. By studying orthologous genes subject to regulation by the same transcription factor, we also gained understanding of the evolution of the entire regulatory systems. The number of complete microbial genome sequences is increasing at an unprecedented rate. To date, 29 bacterial genomes have been determined, 11 more are in annotation stage, and 83 are in progress. This surge of sequence information provides an enormous amount of data for comparative genomics analysis. During the earlier stage of genomic analysis, most of the effort was devoted to analyses of protein-coding regions because, in the course of evolution, protein-coding sequences change much slower than the noncoding sequences (Koonin et al. 1997, 1998). These comparative genomics studies have proved highly informative, allowing functional assignments for many putative proteins in poorly studied organisms (Overbeek et al. 1999). One surprising result from these analyses was the lack of long-range conservation of gene order in bacterial genomes, with the exception of species within the same genus (Tatusov et al. 1996; Himmelreich et al. 1997). For species of intermediate phylogenetic distance, such as in Escherichi...
The Application of Stochastic Context-Free Grammars to Folding, Aligning and Modeling Homologous RNA Sequences
, 1993
Cited by 9 (2 self)
Stochastic context-free grammars (SCFGs) are applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. The novel aspect of this work is that SCFG parameters are learned automatically from unaligned, unfolded training sequences. A generalization of the HMM forward-backward algorithm is introduced to do this. The new algorithm, Tree-Grammar EM, based on tree grammars and faster than the previously proposed SCFG inside-outside training algorithm, produced a model that we tested on the transfer RNA (tRNA) family. Results show that after having been trained on as few as 20 tRNA sequences from only two tRNA subfamilies (mitochondrial and cytoplasmic), the model can discern general tRNA from similar-length RNA sequences of other kinds, can find secondary structure of new tRNA sequences, and c...
Automatic RNA Secondary Structure Determination with Stochastic Context-Free Grammars
, 1995
Cited by 8 (1 self)
We have developed a method for predicting the common secondary structure of large RNA multiple alignments using only the information in the alignment. It uses a series of progressively more sensitive searches of the data in an iterative manner to discover regions of base pairing; the first pass examines the entire multiple alignment. The searching uses two methods to find base pairings. Mutual information is used to measure covariation between pairs of columns in the multiple alignment and a minimum length encoding method is used to detect column pairs with high potential to base pair. Dynamic programming is used to recover the optimal tree made up of the best potential base pairs and to create a stochastic context-free grammar. The information in the tree guides the next iteration of searching. The method is similar to the traditional comparative sequence analysis technique. The method correctly identifies most of the common secondary structure in 16S and 23S rRNA.
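The covariation measure mentioned in this abstract, mutual information between two alignment columns, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `column_mutual_information`, the choice to skip rows with a gap in either column, and the toy alignment are all assumptions.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j, gap="-"):
    """Mutual information (in bits) between columns i and j of an alignment.

    Assumption: rows with a gap in either column are skipped.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to limit rounding error.
        ratio = p_ab * n * n / (left_counts[a] * right_counts[b])
        mi += p_ab * log2(ratio)
    return mi


# Made-up alignment: columns 0 and 3 covary as Watson-Crick pairs,
# while columns 1 and 2 are fully conserved.
toy = ["GAAC", "CAAG", "AAAU", "UAAA"]
covarying = column_mutual_information(toy, 0, 3)  # 2.0 bits
conserved = column_mutual_information(toy, 1, 2)  # 0.0 bits
```

Perfectly conserved columns score zero even if they do base pair, which is one reason a method like the one described would combine mutual information with a second detection criterion.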
A new method to predict the consensus secondary structure of a set of unaligned RNA sequences
, 1999
Cited by 8 (4 self)
Motivation: Predict the consensus secondary structure, possibly including pseudoknots, of a set of unaligned RNA sequences. Results: We have designed a method based on a new representation of any RNA secondary structure as a set of structural relations between the helices of the structure. We refer to this representation as a structural pattern. In a first step we use thermodynamic parameters to select, for each sequence, the best secondary structures according to energy minimisation and we represent each of them using its corresponding structural pattern. In a second step we search for the repeated structural patterns i.e. the largest structural patterns that occur in at least one sequence, i.e. included in at least one of the structural patterns associated to each sequence. Thanks to an efficient encoding of structural patterns this search comes down to identifying the largest repeated word suffixes in a dictionary. In a third step we compute the plausibility of each repeated structura...
CorreLogo: An online server for 3D sequence logos of RNA and DNA alignments
- Nucleic Acids Res
, 2006
Cited by 8 (3 self)
We present an online server that generates a 3D representation of properties of user-submitted RNA or DNA alignments. The visualized properties are information of single alignment columns, mutual information of two alignment positions as well as the position-specific fraction of gaps. The nucleotide composition of both single columns and column pairs is visualized with the help of color-coded 3D bars labeled with letters. The server generates both VRML and JVX output that can be viewed with a VRML viewer or the JavaView applet, respectively. We show that combining these different features of an alignment into one 3D representation is helpful in identifying correlations between bases and potential RNA and DNA base pairs. Significant known correlations between the tRNA 3′ anticodon cardinal nucleotide and the extended anticodon were observed, as were correlations within the amino acid acceptor stem and between the cardinal nucleotide and the acceptor stem. The online server can be accessed using the
Probabilistic Structure Calculations: A Three-Dimensional tRNA Structure from Sequence Correlation Data
, 1993
Cited by 7 (5 self)
Algorithms based on probability theory can address issues of uncertainty directly through their representational framework and their theory for data combination. In this paper, we discuss the advantages of probabilistic formulations for molecular-structure calculations, describe one implementation of such a formulation, and show its performance on a data set derived from analysis of the statistical correlations within a set of aligned transfer RNA sequences. By assigning reasonable physical interpretations to certain statistical correlations, we are able to calculate three-dimensional structures for tRNA from a random starting structure. The constraints that we use are associated with different variances, and so their effects are not uniform, and must be reconciled by a probabilistic algorithm to yield the most likely structure. As might be predicted, the uncertainty in the position for each base is a function of both the number and strength of the constraints, and is reflected in the variances in atomic position calculated by the algorithm. For example, the hinge region in the tRNA is shown to be the most uncertain. In addition, the algorithm retains information about positional covariation that is useful for understanding the relationships between different parts of the structure.

