Probabilistic and Statistical Properties of Words: An Overview
- Journal of Computational Biology, 2000
Cited by 68 (1 self)
In the following, an overview is given on statistical and probabilistic properties of words, as occurring in the analysis of biological sequences. Counts of occurrence, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process approximations, and compound Poisson approximations are derived. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The convergence results take the error made by estimating the Markovian transition probabilities into account. The main tools involved are moment generating functions, martingales, Stein’s method, and the Chen-Stein method. Similar results are given for occurrences of multiple patterns, and, as an example, the problem of unique recoverability of a sequence from SBH chip data is discussed. Special emphasis lies on disentangling the complicated dependence structure between word occurrences, due to self-overlap as well as due to overlap between words. The results can be used to derive approximate, and conservative, confidence intervals for tests.
Key words: word counts, renewal counts, Markov model, exact distribution, normal approximation, Poisson process approximation, compound Poisson approximation, occurrences of multiple words, sequencing by hybridization, martingales, moment generating functions, Stein’s method, Chen-Stein method.
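The three kinds of counts distinguished in this abstract can be illustrated with a small sketch. It assumes the usual definitions, which the abstract itself does not spell out: an occurrence is any (possibly overlapping) match, a clump is a maximal chain of overlapping occurrences, and a renewal is an occurrence that does not overlap the previous renewal when scanning left to right. The function names are hypothetical and nothing here is taken from the paper.

```python
def occurrences(seq, word):
    """Start positions of all (possibly overlapping) occurrences of word."""
    m = len(word)
    return [i for i in range(len(seq) - m + 1) if seq[i:i + m] == word]


def count_clumps(positions, m):
    """Count maximal chains of occurrences in which neighbours overlap.

    Assumed definition: a new clump starts whenever an occurrence begins
    at or after the end of the previous occurrence.
    """
    clumps, prev = 0, None
    for i in positions:
        if prev is None or i >= prev + m:
            clumps += 1
        prev = i
    return clumps


def count_renewals(positions, m):
    """Count occurrences that do not overlap the previously counted one."""
    renewals, last = 0, None
    for i in positions:
        if last is None or i >= last + m:
            renewals += 1
            last = i
    return renewals


if __name__ == "__main__":
    seq, word = "AAAAAA", "AA"
    pos = occurrences(seq, word)
    print(len(pos), count_clumps(pos, len(word)), count_renewals(pos, len(word)))
```

For the self-overlapping word AA in AAAAAA the three counts differ (5 occurrences, 1 clump, 3 renewals), which is the dependence caused by self-overlap that the abstract emphasises.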
A Variety Theorem Without Complementation.
- 1995
Cited by 53 (21 self)
this paper is to show that such results are not isolated, but are instances of a result as general as Eilenberg's theorem. On the language side, we consider positive varieties of languages, which have the same properties as varieties of languages except they are not supposed to be closed under complement. On the algebraic side, varieties of finite semigroups are replaced by varieties of finite ordered semigroups. Our main result states there is a one-to-one correspondence between positive varieties of languages and varieties of finite ordered semigroups. Due to the lack of space, we shall just give a few examples of this correspondence and defer to future papers the detailed study of our new types of varieties. For instance, P. Weil and the author have shown that the theorems of Birkhoff and Reiterman can be extended to ordered semigroups by replacing equations by inequations
On the numerical integration of ordinary differential equations by symmetric composition methods
- SIAM J. Sci. Comput., 1995
Cited by 52 (10 self)
Abstract. Differential equations of the form $\dot{x} = X = A + B$ are considered, where the vector fields A and B can be integrated exactly, enabling numerical integration of X by composition of the flows of A and B. Various symmetric compositions are investigated for order, complexity, and reversibility. Free Lie algebra theory gives simple formulae for the number of determining equations for a method to have a particular order. A new, more accurate way of applying the methods thus obtained to compositions of an arbitrary first-order integrator is described and tested. The determining equations are explored, and new methods up to 100 times more accurate (at constant work) than those previously known are given.
1. Composition methods. Composition methods are particularly useful for numerically integrating differential equations when the equations have some special structure which it is advantageous to preserve. They tend to have larger local truncation errors than standard (Runge-Kutta, multistep) methods [4,5], but this defect can be more than compensated for by their superior conservation properties. Capital letters such as X will denote vector fields on some space with coordinates x, with flows exp(tX), i.e., $\dot{x} = X(x) \Rightarrow x(t) = \exp(tX)(x(0))$. The vector field X is given and is to be integrated numerically with fixed time step t. Composition methods apply when one can write X = A + B in such a way that exp(tA), exp(tB) can both be calculated explicitly. Then the most elementary such method is the map (essentially the “Lie-Trotter” formula [26])
$\varphi : x \mapsto x' = \exp(tA)\exp(tB)(x) = x(t) + O(t^2)$. (1.1)
The advantage of composing exact solutions in this way is that many geometric properties of the true flow exp(tX) are preserved: group properties in particular. If X, A, and B are Hamiltonian vector fields then both exp(tX) and the map ϕ
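The elementary composition (1.1) can be tried out on a toy problem. The following is a minimal sketch, assuming the harmonic oscillator with A the drift q' = p and B the kick p' = -q; this test system, the function names, and the choice of symmetric composition exp(tA/2) exp(tB) exp(tA/2) are illustrative assumptions, not the methods derived in the paper.

```python
import math


def flow_a(q, p, t):
    # Exact flow of A (assumed split): q' = p, p' = 0.
    return q + t * p, p


def flow_b(q, p, t):
    # Exact flow of B (assumed split): q' = 0, p' = -q.
    return q, p - t * q


def lie_trotter(q, p, t):
    # exp(tA) exp(tB) applied to (q, p): B acts first, then A.
    q, p = flow_b(q, p, t)
    return flow_a(q, p, t)


def symmetric(q, p, t):
    # One simple symmetric composition: exp(tA/2) exp(tB) exp(tA/2).
    q, p = flow_a(q, p, t / 2)
    q, p = flow_b(q, p, t)
    return flow_a(q, p, t / 2)


def global_error(step, t, t_end=1.0):
    # Integrate from (q, p) = (1, 0) and compare with the exact
    # solution q = cos(t_end), p = -sin(t_end).
    n = round(t_end / t)
    q, p = 1.0, 0.0
    for _ in range(n):
        q, p = step(q, p, t)
    return math.hypot(q - math.cos(t_end), p + math.sin(t_end))


if __name__ == "__main__":
    for name, step in (("Lie-Trotter", lie_trotter), ("symmetric", symmetric)):
        ratio = global_error(step, 0.01) / global_error(step, 0.005)
        print(name, "error ratio when halving the step:", ratio)
```

Halving the step should roughly halve the global error of the Lie-Trotter map (first order) and quarter the error of the symmetric composition (second order), which is one way to see why symmetric compositions are attractive.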
Basic Analytic Combinatorics of Directed Lattice Paths
- Theoretical Computer Science, 2001
Cited by 52 (11 self)
This paper develops a unified enumerative and asymptotic theory of directed 2-dimensional lattice paths in half-planes and quarter-planes. The lattice paths are specified by a finite set of rules that are both time and space homogeneous, and have a privileged direction of increase. (They are then essentially 1-dimensional objects.) The theory relies on a specific "kernel method" that provides an important decomposition of the algebraic generating functions involved, as well as on a generic study of singularities of an associated algebraic curve. Consequences are precise computable estimates for the number of lattice paths of a given length under various constraints (bridges, excursions, meanders) as well as a characterization of the limit laws associated to several basic parameters of paths.
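The constrained path families named here can be counted by brute force for small lengths. The sketch below uses plain dynamic programming over altitudes, not the kernel method of the paper, and assumes the usual definitions: a bridge ends at altitude 0, a meander never goes below 0, and an excursion does both. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def count_paths(steps, n, nonnegative, end_at_zero):
    """Count directed paths of length n whose vertical jumps lie in `steps`.

    nonnegative=True forbids altitudes below 0 (meanders, excursions);
    end_at_zero=True keeps only paths ending at altitude 0 (bridges, excursions).
    """
    counts = {0: 1}  # altitude -> number of paths reaching it
    for _ in range(n):
        nxt = defaultdict(int)
        for height, ways in counts.items():
            for s in steps:
                if nonnegative and height + s < 0:
                    continue
                nxt[height + s] += ways
        counts = nxt
    return counts.get(0, 0) if end_at_zero else sum(counts.values())


if __name__ == "__main__":
    dyck = (1, -1)
    print("bridges   :", count_paths(dyck, 6, False, True))
    print("meanders  :", count_paths(dyck, 6, True, False))
    print("excursions:", count_paths(dyck, 6, True, True))
```

With steps +1 and -1 and length 6 this gives 20 bridges, 20 meanders, and 5 excursions (the Catalan number C_3). Such exact small-case counts are the kind of quantity for which the paper's asymptotic estimates apply.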
Uniform Spectral Properties Of One-Dimensional Quasicrystals, IV. Quasi-Sturmian Potentials
- I. Absence of eigenvalues, Commun. Math. Phys., 2000
Cited by 42 (28 self)
We consider discrete one-dimensional Schrödinger operators with quasi-Sturmian potentials. We present a new approach to the trace map dynamical system which is independent of the initial conditions and establish a characterization of the spectrum in terms of bounded trace map orbits. Using this, it is shown that the operators have purely singular continuous spectrum and their spectrum is a Cantor set of Lebesgue measure zero. We also exhibit a subclass having purely α-continuous spectrum. All these results hold uniformly on the hull generated by a given potential.
Timed Automata and the Theory of Real Numbers
- CONCUR'99, LNCS 1664, 1999
Cited by 34 (0 self)
A configuration of a timed automaton is given by a control state and finitely many clock (real) values. We show here that the binary reachability relation between configurations of a timed automaton is definable in an additive theory of real numbers, which is decidable. This result implies the decidability of model checking for some properties which cannot be expressed in timed temporal logics and provides alternative proofs of some known decidable properties. Our proof is based on two intermediate results:
1. Every timed automaton can be effectively emulated by a timed automaton which does not contain nested loops.
2. The binary reachability relation for counter automata without nested loops (called here flat automata) is expressible in the additive theory of integers (resp. real numbers).
The second result can be derived from [10].
1 Introduction
Timed automata were introduced in [4] to model real-time systems and quickly became a standard. They roughly consist in adding to...
Efficient detection of unusual words
- J. Comp. Biol., 2000
Cited by 34 (7 self)
Words that are, by some measure, over- or underrepresented in the context of larger sequences have been variously implicated in biological functions and mechanisms. In most approaches to such anomaly detections, the words (up to a certain length) are enumerated more or less exhaustively and are individually checked in terms of observed and expected frequencies, variances, and scores of discrepancy and significance thereof. Here we take the global approach of annotating the suffix tree of a sequence with some such values and scores, having in mind to use it as a collective detector of all unexpected behaviors, or perhaps just as a preliminary filter for words suspicious enough to undergo a more accurate scrutiny. We consider in depth the simple probabilistic model in which sequences are produced by a random source emitting symbols from a known alphabet independently and according to a given distribution. Our main result consists of showing that, within this model, full tree annotations can be carried out in a time-and-space optimal fashion for the mean, variance and some of the adopted measures of significance. This result is achieved by an ad hoc embedding in statistical expressions of the combinatorial structure of the periods of a string. Specifically,
Transcendence of Sturmian or morphic continued fractions
- J. Number Theory
"... Communicated by M. Waldschmidt ..."

