Results 1-10 of 30
The study of correlation structures of DNA sequences: A critical review
 Computers Chem.,
, 1997
Applications of recursive segmentation to the analysis of DNA sequences
 Comput. Chem
, 2002
Cited by 43 (3 self)
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as a binary strong (G + C)/weak (A + T) sequence, a binary sequence indicating the presence or absence of the dinucleotide CpG, or a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
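The conversion schemes named in this abstract can be illustrated with a minimal Python sketch. The function names `strong_weak` and `cpg_indicator` are hypothetical and are not taken from the paper; the sketch only shows what a binary strong/weak sequence and a CpG presence/absence sequence might look like before segmentation is applied.

```python
def strong_weak(seq):
    """Map each base to 'S' (strong: G or C) or 'W' (weak: A or T)."""
    return "".join("S" if base in "GC" else "W" for base in seq.upper())


def cpg_indicator(seq):
    """Return 1 at every position where the dinucleotide CG starts, else 0."""
    seq = seq.upper()
    return [1 if seq[i:i + 2] == "CG" else 0 for i in range(len(seq))]
```

Either converted sequence could then be passed to the same segmentation procedure as the original four-letter sequence.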
Spectral Repeat Finder (SRF): Identification of Repetitive Sequences using Fourier Transformation
 Bioinformatics
Cited by 20 (3 self)
Motivation: Repetitive DNA sequences, besides having a variety of regulatory functions, are one of the principal causes of genomic instability. Understanding their origin and evolution is of fundamental importance for genome studies. The identification of repeats and their units helps in deducing the intragenomic dynamics as an important feature of comparative genomics. A major difficulty in identification of repeats arises from the fact that the repeat units can be either exact or imperfect, in tandem or dispersed, and of unspecified length. Results: The Spectral Repeat Finder program circumvents these problems by using a discrete Fourier transformation to identify significant periodicities present in a sequence. The specific regions of the sequence that contribute to a given periodicity are located through a sliding window analysis, and an exact search method is then used to find the repetitive units. Efficient and complete detection of repeats is provided together with interactive and detailed visualization of the spectral analysis of input sequence. We demonstrate the utility of our method with various examples that contain previously unannotated repeats. A Web server has been developed for convenient access to the automated program.
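The first step described above, finding significant periodicities with a discrete Fourier transformation, can be sketched as follows. This is not the SRF implementation: the per-base 0/1 indicator encoding, the summed power over the four bases, and the names `power_spectrum` and `dominant_period` are assumptions made for illustration, and the sliding-window and exact-search steps are omitted.

```python
import cmath


def power_spectrum(seq, base):
    """DFT power of the 0/1 indicator sequence of one base (assumed encoding)."""
    n = len(seq)
    x = [1.0 if c == base else 0.0 for c in seq]
    spectrum = []
    # Skip k = 0 (the mean) and stop at n // 2 (real input is symmetric).
    for k in range(1, n // 2 + 1):
        s = sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        spectrum.append((k, abs(s) ** 2))
    return spectrum


def dominant_period(seq):
    """Period n / k of the frequency index k with the largest summed power."""
    n = len(seq)
    total = {}
    for base in "ACGT":
        for k, p in power_spectrum(seq, base):
            total[k] = total.get(k, 0.0) + p
    k_best = max(total, key=total.get)
    return n / k_best
```

For a perfect tandem repeat such as `"ACG" * 20`, the strongest peak sits at period 3. The direct summation is quadratic in sequence length; a real tool would use a fast Fourier transform.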
Application of Information Theory to DNA sequence analysis: a review
 Pattern Recognition
, 1996
Cited by 19 (0 self)
The analysis of DNA sequences through information theory methods is reviewed from the beginning in the 70s. The subject is addressed within a broad context, describing in some detail the cornerstone contributions in the field. The emerging interest concerning long-range correlations and the mosaic structure of DNA sequences is considered from our own point of view. A recent procedure developed by the authors is also outlined. Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd.
Keywords: Information theory; DNA sequences; Entropy; Chaos-game representation
Quantification of DNA Patchiness Using Long-Range Correlation Measures
Cited by 5 (0 self)
We introduce and develop new techniques to quantify DNA patchiness, and to quantify characteristics of its mosaic structure. These techniques, which involve calculating two functions, $\alpha(\ell)$ and $\beta(\ell)$, measure correlations at length scale $\ell$ and detect distinct characteristic patch sizes embedded in scale-invariant patch size distributions. Using these new methods, we address a number of issues relating to the mosaic structure of genomic DNA. We find several distinct characteristic patch sizes in certain genomic sequences, and compare, contrast, and quantify the correlation properties of different sequences, including a number of yeast, human, and prokaryotic sequences. We exclude the possibility that the correlation properties and the known mosaic structure of DNA can be explained either by simple Markov processes or by tandem repeats of dinucleotides. We find that the distinct patch sizes in all 16 yeast chromosomes are similar. Furthermore, we test the hypothesis that, for yeast, patchiness is caused by the alternation of coding and noncoding regions, and the hypothesis that in human sequences patchiness is related to repetitive sequences. We find that, by themselves, neither the alternation of coding and noncoding regions, nor repetitive sequences, can fully explain the long-range correlation properties of DNA.
Fractal properties of DNA walks
, 1998
Cited by 2 (0 self)
We describe two-dimensional DNA walks, and analyze their fractal properties. We show results for the complete genome of S. cerevisiae. We find that the mean square deviation of the walks is superdiffusive, corresponding to a fractal structure of dimension lower than two. Furthermore, the coding part of the genome seems to have smaller fractal dimension, and longer correlations, than noncoding parts.
Key words: DNA sequences. Random walks. Fractals. S. cerevisiae.
1 Introduction There is a growing interest in the scientific community in studying DNA sequences from a physical or mathematical point of view. It has been claimed that long-range correlations exist in DNA (Peng et al., 1992; Voss, 1992; Mantegna et al., 1994; Li et al., 1994; Mantegna et al., 1995), as well as the contrary (Azbel, 1995), and several controversial points have been discussed in recent years. (See, for example, the series of comments that followed (Mantegna et al., 1994) in (Israeloff et al., 1996; Bonhoef...
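A two-dimensional DNA walk and its mean square deviation can be sketched in a few lines of Python. The base-to-step mapping in `STEPS` is an assumption chosen for illustration, since the excerpt does not state which mapping the authors use, and the function names are hypothetical.

```python
# Assumed mapping of bases to unit steps on the plane (not from the paper).
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(seq):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS[base]
        x += dx
        y += dy
        path.append((x, y))
    return path


def mean_square_displacement(path, lag):
    """Average squared distance between positions `lag` steps apart."""
    n = len(path) - lag
    total = 0
    for i in range(n):
        dx = path[i + lag][0] - path[i][0]
        dy = path[i + lag][1] - path[i][1]
        total += dx * dx + dy * dy
    return total / n
```

For an uncorrelated walk this quantity grows linearly with the lag; growth faster than linear is what the abstract calls superdiffusive.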
Delineating relatively homogeneous G + C domains in DNA sequences
, 2001
Cited by 1 (0 self)
The concept of homogeneity of G + C content is always relative and subjective. This point is emphasized and quantified in this paper using a simple example of one sequence segmented into two subsequences. Whether the sequence is homogeneous or not can be answered by whether the two-subsequence model describes the DNA sequence better than the one-sequence model. There are at least three equivalent ways of looking at the 1-to-2 segmentation: Jensen–Shannon divergence measure, log likelihood ratio test, and model selection using Bayesian information criterion. Once a criterion is chosen, a DNA sequence can be recursively segmented into multiple domains. We use one subjective criterion called segmentation strength based on the Bayesian information criterion. Whether or not a sequence is homogeneous and how many domains it has depend on this criterion. We compare six different genome sequences (yeast S. cerevisiae chromosome III and IV, bacterium M. pneumoniae, human major histocompatibility complex sequence, longest contigs in human chromosome 21 and 22) by recursive segmentations at different strength criteria. Results by recursive segmentation confirm that yeast chromosome IV is more homogeneous than yeast chromosome III, human chromosome 21 is more homogeneous than human chromosome 22, and bacterial genomes may not be homogeneous due to short segments with distinct base compositions. The recursive segmentation also provides a quantitative criterion for identifying isochores in human sequences. Some features of our recursive segmentation, such as the possibility of
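The 1-to-2 segmentation step can be sketched with the Jensen–Shannon divergence as follows. This is a minimal illustration, not the authors' code: the names `entropy` and `best_split` are hypothetical, entropies are in nats, and the segmentation-strength criterion based on the Bayesian information criterion is deliberately left out because the excerpt does not give its formula.

```python
import math
from collections import Counter


def entropy(seq):
    """Shannon entropy (in nats) of the symbol composition of seq."""
    n = len(seq)
    return -sum((c / n) * math.log(c / n) for c in Counter(seq).values())


def best_split(seq):
    """Return (cut, D): the cut position maximising the Jensen-Shannon
    divergence D between the left part seq[:cut] and right part seq[cut:]."""
    n = len(seq)
    h_all = entropy(seq)
    best_cut, best_d = None, -1.0
    for i in range(1, n):
        # Length-weighted entropies of the two candidate subsequences.
        d = h_all - (i / n) * entropy(seq[:i]) - ((n - i) / n) * entropy(seq[i:])
        if d > best_d:
            best_cut, best_d = i, d
    return best_cut, best_d
```

With maximum-likelihood compositions, `2 * len(seq) * D` equals twice the log likelihood ratio of the two-subsequence model over the one-sequence model, which is the link between the divergence and likelihood-ratio views mentioned in the abstract. A recursive version would call `best_split` again on each side whenever the chosen criterion is met.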
Are isochore sequences homogeneous?
 www.elsevier.com/locate/gene
Three statistical/mathematical analyses are carried out on isochore sequences: spectral analysis, analysis of variance, and segmentation analysis. Spectral analysis shows that there are GC content fluctuations at different length scales in isochore sequences. The analysis of variance shows that the null hypothesis (the mean value of a group of GC contents remains the same along the sequence) may or may not be rejected for an isochore sequence, depending on the subwindow sizes at which GC contents are sampled, and the window size within which group members are defined. The segmentation analysis shows that there are stronger indications of GC content changes at isochore borders than within an isochore. These analyses support the notion of isochore sequences, but reject the assumption that isochore sequences are homogeneous at the base level. An isochore sequence may pass a homogeneity test when GC content fluctuations at smaller length scales are
Quantum-like Chaos in the Frequency Distributions of Bases A, C, G, T in Human Chromosome 1 DNA
, 2004
Introduction: DNA topology is of fundamental importance for a wide range of biological processes [1]. Since the topological state of genomic DNA is of importance for its replication, recombination and transcription, there is an immediate interest to obtain information about the supercoiled state from sequence periodicities [2,3]. Identification of dominant periodicities in DNA sequence will help understand the important role of coherent structures in genome sequence organization [4,5]. Li [6] has discussed meaningful applications of spectral analyses in DNA sequence studies. Recent studies indicate that the DNA sequence of letters A, C, G and T exhibits the inverse power law form $1/f^{\alpha}$ frequency spectrum where f is the frequency and $\alpha$ the exponent. It is possible, therefore, that the sequences have long-range order [7-14]. Inverse power-law form for power spectra of fractal space-time fluctuations is generic to dynamical systems in nature and is identified as self-organized criticality
Article URL
, 2007
This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon. miRNAminer: a tool for homologous microRNA gene search. BMC Bioinformatics 2008, 9:39. doi:10.1186/1471-2105-9-39