Information Theory and Statistics, 1968
Cited by 1805 (2 self)
Entropy and relative entropy are proposed as features extracted from symbol sequences. First, a proper Iterated Function System is driven by the sequence, producing a fractal-like representation (CSR) with a low computational cost. Then, two entropic measures are applied to the histogram of the CSR and theoretically justified. Examples are included.
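As a rough illustration of the idea in this abstract, the sketch below drives a simple iterated function system with a DNA string (the classic chaos game on the unit square), bins the resulting points into a histogram, and computes the Shannon entropy of that histogram. The corner assignment, the contraction ratio of 0.5, the grid resolution, and the function names are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter

# Assumed corner assignment for the four nucleotides (illustrative only).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(seq, start=(0.5, 0.5)):
    """Move halfway toward the corner of each symbol and record the point."""
    x, y = start
    points = []
    for symbol in seq:
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def histogram_entropy(points, bins=4):
    """Shannon entropy (in bits) of the points binned on a bins x bins grid."""
    counts = Counter(
        (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))
        for x, y in points
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

With `bins=2` each cell corresponds to the most recent symbol, so a constant sequence gives zero entropy while a sequence that visits all four symbols equally often gives 2 bits. Finer grids capture longer suffixes.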
The spectrum of genomic signatures: from dinucleotides to chaos game representation
Gene
Cited by 34 (4 self)
In the post-genomic era, access to complete genome sequence data for numerous diverse species has triggered multiple avenues for examining and comparing primary DNA sequence organization of entire genomes. Previously, the concept of a genomic signature was introduced with the observation of species-type specific Dinucleotide Relative Abundance Profiles (DRAPs); dinucleotides were identified as the subsequence with the greatest bias in representation in a majority of genomes. Herein, we demonstrate that DRAP is one particular genomic signature contained within a broader spectrum of signatures. In this spectrum, an alternative genomic signature, Chaos Game Representation (CGR), provides a unique visualization of patterns in sequence organization. A genomic signature is associated with a particular integer order or subsequence length that represents a measure of the resolution or granularity in the analysis of primary DNA sequence organization. We quantitatively explore the organizational information provided by genomic signatures of different orders through different distance measures, including a novel Image Distance. The Image Distance and other existing distance measures are evaluated by comparing the phylogenetic trees they generate for 26 complete mitochondrial genomes from a diversity of species. The phylogenetic tree generated by the Image Distance is compatible with the known relatedness of species. Quantitative evaluation of the spectrum of genomic signatures may ultimately be used to gain insight into the determinants and biological relevance of the genome signatures.
Spatial Representation of Symbolic Sequences through Iterative Function Systems
IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 1998
Cited by 30 (10 self)
Jeffrey proposed a graphic representation of DNA sequences using Barnsley's iterative function systems. In spite of further developments in this direction, the proposed graphic representation of DNA sequences has lacked a rigorous connection between its spatial scaling characteristics and the statistical characteristics of the DNA sequences themselves. We 1) generalize Jeffrey's graphic representation to accommodate (possibly infinite) sequences over an arbitrary finite number of symbols, 2) establish a direct correspondence between the statistical characterization of symbolic sequences via Rényi entropy spectra and the multifractal characteristics (Rényi generalized dimensions) of the sequences' spatial representations, and 3) show that for general symbolic dynamical systems, the multifractal $f_H$ spectra in the sequence space coincide with the $f_H$ spectra on spatial sequence representations.
Keywords: Multifractal theory, Iterative function systems, Chaos game representation...
Application of Information Theory to DNA sequence analysis: a review
Pattern Recognition, 1996
Cited by 19 (0 self)
The analysis of DNA sequences through information theory methods is reviewed from its beginnings in the 1970s. The subject is addressed within a broad context, describing in some detail the cornerstone contributions in the field. The emerging interest concerning long-range correlations and the mosaic structure of DNA sequences is considered from our own point of view. A recent procedure developed by the authors is also outlined. Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd.
Keywords: Information theory, DNA sequences, Entropy, Chaos-game representation
Universal Sequence Map (USM) of Arbitrary Discrete Sequences
Cited by 10 (3 self)
For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but has not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space, conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis without the context of fixed memory length. A simple example would be the ability to infer the homology between two sequences solely by comparing the coordinates of any two homologous units.
Dynamics and topographic organization in recursive self-organizing map
Neural Computation, 2006
Cited by 9 (2 self)
Recently, there has been an outburst of interest in extending topographic maps of vectorial data to more general data structures, such as sequences or trees. However, at present, there is no general consensus as to how best to process sequences using topographic maps, and this topic remains a very active focus of current neurocomputational research. The representational capabilities and internal representations of the models are not well understood. We rigorously analyze a generalization of the Self-Organizing Map (SOM) for processing sequential data, Recursive SOM (RecSOM) (Voegtlin, 2002), as a nonautonomous dynamical system consisting of a set of fixed input maps. We argue that contractive fixed input maps are likely to produce Markovian organizations of receptive fields on the RecSOM map. We derive bounds on parameter β (weighting the importance of importing past information when processing sequences) under which contractiveness of the fixed input maps is guaranteed. Some generalizations of SOM contain a dynamic module responsible for processing temporal contexts as an integral part of the model. We show that Markovian topographic maps of sequential data can be produced using a simple fixed (non-adaptable) dynamic module externally feeding a standard topographic model designed to process static vectorial data of fixed dimensionality (e.g. SOM). However, by allowing trainable feedback connections one can obtain Markovian maps with superior memory depth and topography preservation. We elaborate upon the importance of non-Markovian organizations in topographic maps of sequential data.
Constructing Finite-Context Sources From Fractal Representations of Symbolic Sequences, 1998
Cited by 5 (3 self)
We propose a novel approach to constructing predictive models on long complex symbolic sequences. The models are constructed by first transforming the training sequence n-block structure into a spatial structure of points in a unit hypercube. The transformation between the symbolic and Euclidean spaces embodies a natural smoothness assumption (n-blocks with long common suffixes are likely to produce similar continuations) in that the longer the common suffix shared by any two n-blocks, the closer their point representations lie. Finding a set of prediction contexts is then formulated as a resource allocation problem solved by vector quantizing the spatial representation of the training sequence n-block structure. Our predictive models are similar in spirit to variable memory length Markov models (VLMMs). We compare the proposed models with both the classical and variable memory length Markov models on two chaotic symbolic sequences with different levels of subsequence distribution ...
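The suffix property described in this abstract can be illustrated with a small sketch: each n-block is mapped to a point by repeatedly contracting toward the corner assigned to each symbol, so blocks that share a longer suffix end up closer together. The four-symbol alphabet, the corner assignment, the contraction ratio `k = 0.5`, and the function names are hypothetical choices for illustration and are not the paper's actual construction.

```python
import math

# Hypothetical assignment of a 4-symbol alphabet to corners of the unit square.
CORNERS = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 0.0), "d": (1.0, 1.0)}


def block_point(block, k=0.5, start=(0.5, 0.5)):
    """Map an n-block to a point by contracting toward each symbol's corner."""
    x = list(start)
    for symbol in block:
        target = CORNERS[symbol]
        # Each step shrinks the previous position by k, so older symbols
        # contribute less and the most recent symbols dominate the location.
        x = [k * xi + (1.0 - k) * ti for xi, ti in zip(x, target)]
    return tuple(x)


def distance(p, q):
    """Euclidean distance between two point representations."""
    return math.dist(p, q)


# Blocks sharing the suffix "bcd" lie closer than blocks sharing only "d".
near = distance(block_point("abcd"), block_point("dbcd"))
far = distance(block_point("abcd"), block_point("abad"))
```

Under these assumptions, two blocks with a common suffix of length L are at most `sqrt(2) * k**L` apart, which is the kind of closeness that vector quantization of the point cloud can then exploit to group prediction contexts.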
Extracting Finite State Representations from Recurrent Neural Networks trained on Chaotic Symbolic Sequences
IEEE Transactions on Neural Networks, 1999
Cited by 3 (3 self)
While much work has been done in neural-based modeling of real-valued chaotic time series, little effort has been devoted to addressing similar problems in the symbolic domain. We investigate the knowledge induction process associated with training recurrent neural networks (RNNs) on single long chaotic symbolic sequences. Even though training RNNs to predict the next symbol leaves the standard performance measures, such as the mean square error on the network output, virtually unchanged, the networks nevertheless do extract a lot of knowledge. We monitor the knowledge extraction process by considering the networks as stochastic sources and letting them generate sequences, which are then confronted with the training sequence via information-theoretic entropy and cross-entropy measures. We also study the possibility of reformulating the knowledge gained by RNNs in a compact and easy-to-analyze form of finite state stochastic machines. The experiments are performed on two sequences with different...
cell development
Cited by 3 (0 self)
This peer-reviewed science paper includes accomplishments of a) guiding one of the earliest proponents [M.J.S., in the form of filed patents and even in the movie of ABC producer Sony Pemberton] from his definitive but general statement that “Junk DNA” cannot be there “for the importance of doing nothing”, since evolution would have eliminated such a burden. “If not Junk, what is it?” M.J.S. came to the conclusion of the specific statement [by A.J.P.] of FractoGene: “Fractal genome grows fractal organisms”; b) pointing to and reproducing “the Heureka-figure” of the FractoGene, now in force as a utility patent with enormously far-reaching Claims. (#1: Utility of relation of genomic and organismal fractals)