Multiseed lossless filtration
- IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (TCBB), 2005
Cited by 35 (7 self)
We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt and Kärkkäinen [1]. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.
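To make the notion of lossless filtration concrete, here is a minimal brute-force sketch. It is not the algorithm from the paper, only a direct check of the definition under stated assumptions: a seed is written as a string in which `#` marks a position that must match and any other character is a don't-care, and a seed family is called lossless for an (m, k) problem if every placement of k mismatches in a window of length m leaves at least one seed, at some offset, touching only matching positions. The names `seed_offsets` and `is_lossless` are hypothetical.

```python
from itertools import combinations


def seed_offsets(seed):
    """Return the positions of the must-match ('#') characters in a seed."""
    return [i for i, c in enumerate(seed) if c == "#"]


def is_lossless(seeds, m, k):
    """Brute-force check that a seed family solves the (m, k) problem.

    Assumes 0 <= k <= m. Enumerates every set of k mismatch positions in a
    window of length m and verifies that some seed, at some start offset,
    covers only matching positions. Exponential in k; for illustration only.
    """
    shapes = [(len(s), seed_offsets(s)) for s in seeds]
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        hit = any(
            all(start + off not in bad for off in offs)
            for span, offs in shapes
            for start in range(m - span + 1)
        )
        if not hit:
            # This mismatch pattern escapes every seed: the filter is lossy.
            return False
    return True
```

For example, the contiguous seed `###` is lossless for (m, k) = (9, 2), since two mismatches split the seven matching positions into at most three runs and one run must have length at least 3. It is not lossless for (8, 2), where mismatches at positions 2 and 5 leave only runs of length 2.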
A Novel Heuristic for Local Multiple Alignment of Interspersed DNA Repeats
Abstract—Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software,
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
ABSTRACT: Bioinformatics, through the sequencing of the full genomes of many species, increasingly relies on efficient global alignment tools that exhibit both high sensitivity and high specificity. Many computational algorithms have been applied to the sequence alignment problem. Dynamic programming, statistical methods, and approximation and heuristic algorithms are the most common approaches. We introduce gpALIGNER, a fast pairwise DNA-DNA global alignment algorithm. gpALIGNER uses a scoring scheme similar to that of DIALIGN-T to produce the final alignment. It also uses the concept of “spaced seeds” to determine locally aligned subsequences, which form a semi-global alignment that serves as the preliminary step of the global alignment computation. This enables gpALIGNER to achieve the precision of the DIALIGN-T algorithm with considerably lower time and space complexity. We benchmarked our approach on numerous datasets from standard benchmarking databases and on real sequences from the NCBI database, where gpALIGNER ran three times faster than DIALIGN-T. gpALIGNER is a new alternative that offers the sensitivity and selectivity of DIALIGN-T at lower computational cost.

