Results 1–10 of 13
Digital Trees and Memoryless Sources: from Arithmetics to Analysis
 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10), Discrete Math. Theor. Comput. Sci. Proc.
, 2010
Abstract

Cited by 16 (1 self)
Digital trees, also known as “tries”, are fundamental to a number of algorithmic schemes, including radix-based searching and sorting, lossless text compression, dynamic hashing algorithms, communication protocols of the tree or stack type, distributed leader election, and so on. This extended abstract develops the asymptotic form of expectations of the main parameters of interest, such as tree size and path length. The analysis is conducted under the simplest of all probabilistic models; namely, the memoryless source, under which letters that data items are comprised of are drawn independently from a fixed (finite) probability distribution. The precise asymptotic structure of the parameters’ expectations is shown to depend on fine singular properties in the complex plane of a ubiquitous Dirichlet series. Consequences include the characterization of a broad range of asymptotic regimes for error terms associated with trie parameters, as well as a classification that depends on specific arithmetic properties, especially irrationality measures, of the sources under consideration.
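The two parameters the abstract analyzes, tree size and path length, can be made concrete with a minimal sketch (not from the paper; the function name is ours, and the paper's infinite source words are simplified here to finite keys): build the trie induced by a set of keys and report its number of internal nodes and its external path length.

```python
def trie_stats(keys):
    """Return (size, external path length) of the trie splitting `keys`.

    `keys` are strings over a finite alphabet; splitting stops once a
    subset holds at most one key, as in a standard trie. `size` counts
    internal (branching) nodes; external path length sums leaf depths.
    """
    def build(subset, depth):
        if len(subset) <= 1:
            return 0, depth * len(subset)   # leaf (or empty) subtree
        size, epl = 1, 0                    # this node is internal
        buckets = {}
        for k in subset:
            buckets.setdefault(k[depth], []).append(k)
        for sub in buckets.values():
            s, e = build(sub, depth + 1)
            size += s
            epl += e
        return size, epl
    return build(list(keys), 0)
```

For example, over the binary keys `["0011", "010", "0000", "1"]` the induced trie has 3 internal nodes and external path length 9.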
Distributional convergence for the number of symbol comparisons used by QuickSort
, 2012
Abstract

Cited by 8 (3 self)
Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (iid) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild “tameness” condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, i.e., whenever each key is generated as an infinite string of iid symbols. This is somewhat surprising: Even for the classical model that each key is an iid string of unbiased (“fair”) bits, the mean exhibits periodic fluctuations of order n.
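The cost model can be illustrated with a small sketch (our own, with hypothetical names; the paper's keys are infinite symbol strings, truncated here to finite ones): QuickSort on strings where the unit of cost is a single symbol comparison, not a whole-key comparison.

```python
def sym_less(a, b, counter):
    """Lexicographic a < b, charging one unit per symbol comparison."""
    for x, y in zip(a, b):
        counter[0] += 1
        if x != y:
            return x < y
    return len(a) < len(b)   # shorter key sorts first on a tie

def quicksort(keys, counter):
    """First-element-pivot QuickSort; `counter[0]` accumulates the
    total number of symbol comparisons performed."""
    if len(keys) <= 1:
        return list(keys)
    pivot, left, right = keys[0], [], []
    for k in keys[1:]:
        (left if sym_less(k, pivot, counter) else right).append(k)
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)
```

Running it on keys that share long prefixes shows why symbol comparisons, unlike key comparisons, grow with the depth of the distinguishing prefixes.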
Analysis of the Expected Number of Bit Comparisons Required by Quickselect
Abstract

Cited by 7 (4 self)
When algorithms for sorting and searching are applied to keys that are represented as bit strings, we can quantify the performance of the algorithms not only in terms of the number of key comparisons required by the algorithms but also in terms of the number of bit comparisons. Some of the standard sorting and searching algorithms have been analyzed with respect to key comparisons but not with respect to bit comparisons. In this extended abstract, we investigate the expected number of bit comparisons required by Quickselect (also known as Find). We develop exact and asymptotic formulae for the expected number of bit comparisons required to find the smallest or largest key by Quickselect and show that the expectation is asymptotically linear with respect to the number of keys. Similar results are obtained for the average case. For finding keys of arbitrary rank, we derive an exact formula for the expected number of bit comparisons that (using rational arithmetic) requires only finite summation (rather than such operations as numerical integration) and use it to compute the expectation for each target rank.
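The bit-comparison measure can be sketched in the same spirit (a simplification of the paper's setting; `bit_less` and `quickselect_min` are hypothetical names): Quickselect for the smallest key, charging one unit per bit compared.

```python
def bit_less(a, b, counter):
    """Lexicographic a < b over bit strings, counting bit comparisons."""
    for x, y in zip(a, b):
        counter[0] += 1
        if x != y:
            return x < y
    return len(a) < len(b)

def quickselect_min(keys, counter):
    """Find the smallest key, Find/Quickselect style: partition around
    the first key and recurse into the left part; when the left part is
    empty the pivot itself is the minimum."""
    pivot = keys[0]
    left = [k for k in keys[1:] if bit_less(k, pivot, counter)]
    return quickselect_min(left, counter) if left else pivot
```

The asymptotically linear expectation the abstract establishes is for this kind of accumulated bit count, in contrast with the Θ(n) *key* comparisons whose individual costs it refines.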
The Number of Symbol Comparisons in QuickSort and QuickSelect
Abstract

Cited by 7 (4 self)
We revisit the classical QuickSort and QuickSelect algorithms, under a complexity model that fully takes into account the elementary comparisons between symbols composing the records to be processed. Our probabilistic models belong to a broad category of information sources that encompasses memoryless (i.e., independent-symbols) and Markov sources, as well as many unbounded-correlation sources. We establish that, under our conditions, the average-case complexity of QuickSort is O(n log² n) [rather than O(n log n), classically], whereas that of QuickSelect remains O(n). Explicit expressions for the implied constants are provided by our combinatorial–analytic methods.
Compact Hilbert Indices: Space-filling curves for domains with unequal side lengths
, 2007
Abstract

Cited by 6 (0 self)
In this paper we define a new compact Hilbert index which, while maintaining all of the advantages of the standard Hilbert curve, permits spaces with unequal dimension cardinalities. The compact Hilbert index can be used in any application that would have previously relied on Hilbert curves but, in the case of unequal side lengths, provides a more memory-efficient representation. This advantage is particularly important in distributed applications (Parallel, P2P and Grid), in which not only is memory space saved but communication volume is significantly reduced.
Compact Hilbert Indices for Multidimensional Data
 In IEEE CISIS
, 2007
Abstract

Cited by 2 (0 self)
Space-filling curves, particularly Hilbert curves, have proven to be a powerful paradigm for maintaining spatial groupings of multidimensional data in a variety of application areas including database systems, data structures and distributed information systems. One significant limitation in the standard definition of Hilbert curves is the requirement that the grid size (i.e. the cardinality) in each dimension be the same. In the real world, not all dimensions are of equal size, and the workaround of padding all dimensions to the size of the largest dimension wastes memory and disk space, while increasing the time spent manipulating and communicating these “inflated” values. In this paper we define a new compact Hilbert index which maintains all the advantages of the standard Hilbert curve and permits dimension cardinalities of varying sizes. This index can be used in any application that would have previously relied on Hilbert curves but, in the case of unequal side lengths, provides a more memory-efficient representation. This is particularly important in distributed applications (Parallel, P2P and Grid), in which not only is memory space saved but communication volume is reduced.
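For background, the classical equal-side-length Hilbert index that both papers generalize can be computed with the standard bit-manipulation recurrence (a textbook sketch in 2D, not the compact index defined in the papers; the function name is ours):

```python
def hilbert_index(order, x, y):
    """Distance of grid cell (x, y) along the Hilbert curve filling a
    2**order x 2**order grid (the classical, equal-sides case)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)     # quadrant's offset along the curve
        if ry == 0:                       # rotate/reflect so the next level
            if rx == 1:                   # sees a canonically oriented quadrant
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d
```

Padding every dimension up to the largest side length and then applying this map is exactly the memory-wasting workaround the compact index is designed to avoid.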
The Limiting Distribution for the Number of Symbol Comparisons Used by QuickSort is Nondegenerate (Extended Abstract)
Abstract

Cited by 1 (1 self)
In a continuous-time setting, Fill [2] proved, for a large class of probabilistic sources, that the number of symbol comparisons used by QuickSort, when centered by subtracting the mean and scaled by dividing by time, has a limiting distribution, but proved little about that limiting random variable Y, not even that it is nondegenerate. We establish the nondegeneracy of Y. The proof is perhaps surprisingly difficult.
1. The number of symbol comparisons used by QuickSort: brief review of a limiting-distribution result. In this section we briefly review the main theorem of [2]. An infinite sequence of independent and identically distributed keys is generated; each key is a random word (w1, w2, ...) = w1w2 · · ·, that is, an infinite sequence, or “string”, of symbols wi drawn from a totally ordered finite alphabet Σ. The common distribution µ of the keys (called a probabilistic source) is allowed to be any distribution over words, i.e., the distribution of any stochastic process with time parameter set {1, 2, ...} and state space Σ. We know thanks to Kolmogorov’s consistency criterion (e.g.,
Data-Specific Analysis of String Sorting
Abstract

Cited by 1 (0 self)
We consider the complexity of sorting strings in the model that counts comparisons between symbols and not just comparisons between strings. We show that for any set of strings S the complexity of sorting S can naturally be expressed in terms of the trie induced by S. This holds not only for lower bounds but also for the running times of various algorithms. Thus this “data-specific” analysis allows a direct comparison of different algorithms running on the same data. We give such “data-specific” analyses for various versions of quicksort and versions of mergesort. As a corollary we arrive at a very simple analysis of quicksorting random strings, which so far required rather sophisticated mathematical tools. As part of this we provide insights into the analysis of tries of random strings which may be interesting in their own right.
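The trie-induced quantity the abstract alludes to can be illustrated directly (our sketch; names are hypothetical): each key's distinguishing prefix, i.e. the depth of its leaf in the trie induced by S, gives a natural data-specific measure of symbol-comparison cost.

```python
def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def distinguishing_prefixes(keys):
    """For each key, the length of the shortest prefix separating it from
    every other key in the set (capped at the key's own length); this is
    the leaf depth in the induced trie."""
    out = []
    for i, k in enumerate(keys):
        m = max(lcp(k, o) for j, o in enumerate(keys) if j != i)
        out.append(min(len(k), m + 1))
    return out
```

Summing these lengths over S yields a quantity that depends only on the data, letting different sorting algorithms be compared on the same input rather than only in expectation over a random model.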
Analytic Combinatorics: A Calculus of Discrete Structures
, 2007
Abstract
The efficiency of many discrete algorithms crucially depends on quantifying properties of large structured combinatorial configurations. We survey methods of analytic combinatorics that are simply based on the idea of associating numbers to atomic elements that compose combinatorial structures, then examining the geometry of the resulting functions. In this way, an operational calculus of discrete structures emerges. Applications to basic algorithms, data structures, and the theory of random discrete structures are outlined.
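The “operational calculus” the survey describes can be shown on the simplest possible example (ours, not taken from the survey): the symbolic equation B(z) = 1 + z·B(z)² for binary trees, counted by internal nodes, translates mechanically into a coefficient recurrence.

```python
def binary_tree_counts(n):
    """Coefficients b_0..b_n of B(z) = 1 + z*B(z)^2: the number of
    binary trees with k internal nodes (the Catalan numbers), obtained
    by convolving the series with itself, one coefficient at a time."""
    b = [1]                              # b_0: the empty tree
    for k in range(1, n + 1):
        # [z^k] z*B(z)^2 = sum over left/right subtree splits
        b.append(sum(b[i] * b[k - 1 - i] for i in range(k)))
    return b
```

The translation from structure equation to recurrence is purely mechanical, which is exactly the sense in which the methods form a calculus of discrete structures.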
The Limiting Distribution for the Number of Symbol Comparisons Used by QuickSort is Nondegenerate
, 2012
Abstract
In a continuous-time setting, Fill (2012) proved, for a large class of probabilistic sources, that the number of symbol comparisons used by QuickSort, when centered by subtracting the mean and scaled by dividing by time, has a limiting distribution, but proved little about that limiting random variable Y, not even that it is nondegenerate. We establish the nondegeneracy of Y. The proof is perhaps surprisingly difficult.