Results 1-10 of 21
An analytic approach to the asymptotic variance of trie statistics and related structures
, 2013
Joint String Complexity for Markov Sources
Abstract

Cited by 3 (2 self)
String complexity is defined as the cardinality of the set of all distinct words (factors) of a given string. For two strings, we define joint string complexity as the cardinality of the set of words that are common to both strings. We also relax this definition and introduce joint semicomplexity, restricted to the common words appearing at least twice in both strings. In this paper we analyze joint complexity and joint semicomplexity when both strings are generated by a Markov source. The problem turns out to be quite challenging, requiring subtle singularity analysis and the saddle point method over infinitely many saddle points, leading to novel oscillatory phenomena with single and double periodicities.
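The definitions in this abstract can be illustrated with a brute-force sketch. It is not the paper's method (the paper is an asymptotic analysis), and it rests on two assumptions made here: only non-empty factors are counted, and overlapping occurrences count separately when checking "at least twice". The function names are hypothetical.

```python
from collections import Counter


def factor_counts(s):
    """Count occurrences of every non-empty factor (contiguous substring) of s.

    Overlapping occurrences are counted separately (an assumption of this sketch).
    """
    return Counter(s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1))


def string_complexity(s):
    """Number of distinct non-empty factors of s."""
    return len(factor_counts(s))


def joint_complexity(x, y):
    """Number of distinct non-empty factors common to x and y."""
    return len(factor_counts(x).keys() & factor_counts(y).keys())


def joint_semicomplexity(x, y):
    """Number of common factors that appear at least twice in both x and y."""
    cx, cy = factor_counts(x), factor_counts(y)
    return sum(1 for w in cx if cx[w] >= 2 and cy.get(w, 0) >= 2)
```

The enumeration takes time quadratic in the string length (cubic counting the substring copies), so it is only suitable for short strings; suffix trees or suffix automata would be the usual choice at scale.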
The height of list tries and TST
 In International Conference on Analysis of Algorithms
Abstract

Cited by 3 (3 self)
Sorting using complete subintervals and the maximum number of runs in a randomly evolving se
(Un)Expected Behavior of Digital Search Tree Profile
Abstract

Cited by 3 (0 self)
A digital search tree (DST), one of the most fundamental data structures on words, is a digital tree in which keys (strings, words) are stored directly in (internal) nodes. Such trees find myriad applications, from the popular Lempel-Ziv’78 data compression scheme to distributed hash tables. The profile of a DST measures the number of nodes at the same distance from the root; it is a function of the number of stored strings and the distance from the root. Most parameters of a DST (e.g., height, fill-up level) can be expressed in terms of the profile. However, from the inception of DSTs, the analysis of the profile has been elusive, and it has become a prominent open problem in the area of analysis of algorithms. We make here the first, but decisive, step towards solving this problem. We present a precise analysis of the average profile when stored strings are generated by a biased memoryless source. The main technical difficulty of analyzing the profile lies in solving a sophisticated recurrence equation. We present such a solution for the Poissonized version of the problem (i.e., when the number of stored strings is generated by a Poisson distribution) in the Mellin transform domain. To accomplish this, we introduce a novel functional operator that allows
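To make the notion of a profile concrete, here is a minimal sketch of a DST and its profile. It only illustrates the data structure described in the abstract, not the paper's analysis. The names `Node`, `build_dst`, and `profile` are hypothetical, and the sketch assumes the keys are distinct strings long enough that insertion never runs out of symbols.

```python
from collections import deque


class Node:
    """A DST node: stores one key and at most one child per symbol."""

    def __init__(self, key):
        self.key = key
        self.children = {}


def build_dst(keys):
    """Insert keys in order; the first key becomes the root.

    Each later key follows its own symbols from the root until it reaches
    an empty child slot, where it is stored.
    """
    root = None
    for key in keys:
        if root is None:
            root = Node(key)
            continue
        node, depth = root, 0
        while True:
            symbol = key[depth]
            child = node.children.get(symbol)
            if child is None:
                node.children[symbol] = Node(key)
                break
            node, depth = child, depth + 1
    return root


def profile(root):
    """Return a list whose entry k is the number of nodes at distance k from the root."""
    counts = []
    queue = deque([(root, 0)]) if root is not None else deque()
    while queue:
        node, level = queue.popleft()
        if level == len(counts):
            counts.append(0)
        counts[level] += 1
        for child in node.children.values():
            queue.append((child, level + 1))
    return counts
```

For the keys `000, 010, 110, 011, 100` inserted in that order, the root holds `000`, level 1 holds `010` and `110`, and level 2 holds `011` and `100`. The height is the last index of the returned list, which is one way in which a parameter can be read off the profile.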
On unary nodes in tries
 In 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10), Discrete Math. Theor. Comput. Sci. Proc., AM. Assoc. Discrete
, 2010
Abstract

Cited by 2 (0 self)
The difference between ordinary tries and Patricia tries lies in the fact that all unary nodes are removed in the latter. Their average number is thus easily determined from earlier results on the size of tries/Patricia tries. In a well-known contention resolution algorithm, whose probabilistic model is essentially equivalent to tries, unary nodes correspond to repetitions, i.e., steps in the algorithm that do not resolve anything at all. In this paper, we take an individual’s view on such repetitions: we consider the distribution of the number of repetitions a certain contender encounters in the course of the algorithm, which is equivalent to the number of unary nodes on the path from the root to a random string in a trie. We encounter an example of a sequence of distributions that does not actually converge to a limit distribution, but rather oscillates around a (discrete) limit distribution.
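The quantity studied here, the number of unary nodes on the path from the root to one string, can be computed without building the trie explicitly. The sketch below is only an illustration under assumptions made here: the keys are distinct, `target` is one of them, and every key is long enough to be separated from the others. The function name is hypothetical.

```python
def unary_nodes_on_path(keys, target):
    """Count unary internal nodes on the root-to-leaf path of target.

    At each depth, `group` is the set of keys sharing target's prefix so far.
    The node at that depth is unary when every key in the group continues
    with the same symbol, so nothing is resolved at that step.
    """
    group = list(keys)
    depth = 0
    unary = 0
    while len(group) > 1:
        same = [k for k in group if k[depth] == target[depth]]
        if len(same) == len(group):
            unary += 1
        group = same
        depth += 1
    return unary
```

With keys `0010, 0011, 1000`, the path to `0010` branches at the root, passes through two unary nodes (depths 1 and 2), and branches again at depth 3, so the count is 2; the path to `1000` ends right below the root, giving 0.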
Classification of Markov Sources Through Joint String Complexity: Theory and Experiments
Abstract

Cited by 2 (2 self)
We propose a classification test to discriminate Markov sources [19] based on the joint string complexity. String complexity is defined as the cardinality of the set of all distinct words (factors) of a given string. For two strings, we define joint string complexity as the cardinality of the set of words that are common to both strings. In this paper we analyze the average joint complexity when both strings are generated by a Markov source and provide fast converging asymptotic expansions. We also present some experimental results showing its usefulness for text discrimination.
An analysis of the height of tries with random weights on the edges
 Combinatorics, Probability and Computing
Abstract

Cited by 2 (2 self)
We analyze the weighted height of random tries built from independent strings of i.i.d. symbols on the finite alphabet {1,..., d}. The edges receive random weights whose distribution depends upon the number of strings that visit that edge. Such a model covers the hybrid tries of de la Briandais (1959) and the TST of Bentley and Sedgewick (1997), where the search time for a string can be decomposed as a sum of processing times for each symbol in the string. Our weighted trie model also permits one to study maximal path imbalance. In all cases, the weighted height is shown to be asymptotic to c log n in probability, where c is determined by the behavior of the core of the trie (the part where all nodes have a full set of children) and the fringe of the trie (the part of the trie where nodes have only one child and form spaghetti-like trees). It can be found by maximizing a function that is related to the Cramér exponent of the distribution of the edge weights.
On Correlation Polynomials and Subword Complexity
Abstract

Cited by 2 (0 self)
We consider words with letters from a q-ary alphabet A. The kth subword complexity of a word w ∈ A* is the number of distinct subwords of length k that appear as contiguous subwords of w. We analyze subword complexity from both combinatorial and probabilistic viewpoints. Our first main result is a precise analysis of the expected kth subword complexity of a randomly chosen word w ∈ A^n. Our other main result describes, for w ∈ A*, the degree to which one understands the set of all subwords of w, provided that one knows only the set of all subwords of some particular length k. Our methods rely upon a precise characterization of overlaps between words of length k. We use three kinds of correlation polynomials of words of length k: unweighted correlation polynomials; correlation polynomials associated to a Bernoulli source; and generalized multivariate correlation polynomials. We survey previously known results about such polynomials, and we also present some new results concerning correlation polynomials.
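The kth subword complexity defined in this abstract is straightforward to compute directly. The following sketch (hypothetical function name; it assumes k is a positive integer) simply collects the distinct length-k windows of w and says nothing about the expected value the paper analyzes.

```python
def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w (k >= 1 assumed)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # range() is empty when k > len(w), so the result is 0 in that case.
    return len({w[i:i + k] for i in range(len(w) - k + 1)})
```

For a word of length n over a q-ary alphabet, the result can never exceed min(q^k, n - k + 1), since there are only n - k + 1 windows and q^k possible words of length k.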
The average profile of suffix trees
 In The Fourth Workshop on Analytic Algorithmics and Combinatorics
, 2007
Abstract

Cited by 1 (1 self)
The internal profile of a tree structure denotes the number of internal nodes found at a specific level of the tree. Similarly, the external profile denotes the number of leaves on a level. The profile is of great interest because of its intimate connection to many other parameters of trees. For instance, the depth, fill-up level, height, path length, shortest path, and size of trees can each be interpreted in terms of the profile. The current study is motivated by the work of Park et al. [22], which was a comprehensive study of the profile of tries constructed from independent strings (each string also being generated by a memoryless source). In the present paper, however, we consider suffix trees, which are constructed from suffixes of a common string. The dependency between
On the Average Profile of Symmetric Digital Search Trees, preprint
, 2008
Abstract

Cited by 1 (1 self)
Digital search trees (DSTs) are among the most popular data structures for storing keys, usually represented by strings. The profile of a digital search tree is a parameter that counts the number of nodes at the same distance from the root. It is a function of the number of nodes and the distance from the root. Several, if not all, tree parameters such as height, size, depth, shortest path, and fill-up level can be uniformly analyzed through the profile. In this note we asymptotically analyze the average profile of symmetric digital search trees, in which keys are generated by an unbiased memoryless source.