Fast Algorithms for Sorting and Searching Strings
, 1997
Cited by 166 (0 self)
We present theoretical algorithms for sorting and searching multikey data, and derive from them practical C implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching algorithm blends tries and binary search trees; it is faster than hashing and other commonly used search methods. The basic ideas behind the algorithms date back at least to the 1960s, but their practical utility has been overlooked. We also present extensions to more complex string problems, such as partial-match searching.
1. Introduction
Section 2 briefly reviews Hoare's [9] Quicksort and binary search trees. We emphasize a well-known isomorphism relating the two, and summarize other basic facts. The multikey algorithms and data structures are presented in Section 3. Multikey Quicksort orders a set of n vectors with k components each. Like regular Quicksort, it partitions its input into...
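To make the multikey Quicksort idea concrete, here is a minimal Python sketch (not the paper's C code) of three-way partitioning on the d-th character of each string. It assumes strings are compared by Unicode code point and uses a middle-element pivot; the function names are hypothetical.

```python
def multikey_quicksort(strings):
    """Return a sorted copy of `strings` using three-way radix quicksort."""
    items = list(strings)
    _mkqs(items, 0, len(items), 0)
    return items


def _char_at(s, d):
    # Code point of the d-th character, or -1 past the end so that a
    # string sorts before every longer string sharing its prefix.
    return ord(s[d]) if d < len(s) else -1


def _mkqs(a, lo, hi, d):
    # Sort a[lo:hi]; all of these strings already agree on their first d characters.
    if hi - lo <= 1:
        return
    pivot = _char_at(a[(lo + hi) // 2], d)
    # Three-way partition on the d-th character:
    # a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi] > pivot.
    lt, i, gt = lo, lo, hi - 1
    while i <= gt:
        c = _char_at(a[i], d)
        if c < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif c > pivot:
            a[gt], a[i] = a[i], a[gt]
            gt -= 1
        else:
            i += 1
    _mkqs(a, lo, lt, d)              # smaller d-th character: same position
    if pivot >= 0:                   # equal and not yet at end of string:
        _mkqs(a, lt, gt + 1, d + 1)  # move on to the next character
    _mkqs(a, gt + 1, hi, d)          # larger d-th character: same position
```

For example, `multikey_quicksort(["she", "sells", "sea", "shells"])` returns `['sea', 'sells', 'she', 'shells']`. Only the "equal" partition advances to the next character, which is where the radix-sort half of the blend comes from.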
The influence of caches on the performance of sorting
In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms
, 1997
Cited by 123 (4 self)
We investigate the effect that caches have on the performance of sorting algorithms, both experimentally and analytically. To address the performance problems that high cache-miss penalties introduce, we restructure mergesort, quicksort, and heapsort to improve their cache locality. For all three algorithms the improvement in cache performance leads to a reduction in total execution time. We also investigate the performance of radix sort. Despite the extremely low instruction count of this linear-time sorting algorithm, its relatively poor cache performance results in worse overall performance than the efficient comparison-based sorting algorithms. For each algorithm we provide an analysis that closely predicts the number of cache misses it incurs.
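One way to restructure mergesort for locality, in the spirit of the abstract (often called tiled mergesort), is to sort cache-sized blocks completely before merging them. The Python sketch below shows only that control flow; Python cannot demonstrate the actual cache effects, and the `TILE` value and function names are hypothetical.

```python
TILE = 64  # hypothetical tile size standing in for "as many keys as fit in cache"


def _insertion_sort(a, lo, hi):
    # Sort a[lo:hi] in place; cheap for short, cache-resident runs.
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def tiled_mergesort(data, tile=TILE):
    """Return a sorted copy: sort tile-sized blocks first, then merge them bottom-up."""
    src = list(data)
    n = len(src)
    # Phase 1: finish each tile completely while it would still be in cache.
    for lo in range(0, n, tile):
        _insertion_sort(src, lo, min(lo + tile, n))
    # Phase 2: merge passes over sorted runs, doubling the run width each pass.
    dst = [None] * n
    width = tile
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j, k = lo, mid, lo
            while i < mid and j < hi:
                if src[j] < src[i]:      # strict test keeps the merge stable
                    dst[k] = src[j]
                    j += 1
                else:
                    dst[k] = src[i]
                    i += 1
                k += 1
            dst[k:k + mid - i] = src[i:mid]  # leftover left run
            k += mid - i
            dst[k:hi] = src[j:hi]            # leftover right run
        src, dst = dst, src
        width *= 2
    return src
```

A plain bottom-up mergesort would start merging at width 1 and stream the whole array through memory on every pass; starting at width `tile` removes the early passes that have the worst locality.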
AverageCase Analysis of Algorithms and Data Structures
, 1990
Cited by 105 (8 self)
This report is a contributed chapter to the Handbook of Theoretical Computer Science (North-Holland, 1990). Its aim is to describe the main mathematical methods and applications in the average-case analysis of algorithms and data structures. It comprises two parts. First, we present basic combinatorial enumerations based on symbolic methods, and asymptotic methods with emphasis on complex-analysis techniques (such as singularity analysis, the saddle-point method, and Mellin transforms). Next, we show how to apply these general methods to the analysis of sorting, searching, tree data structures, hashing, and dynamic algorithms. The emphasis is on algorithms for which exact "analytic models" can be derived.
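As a small illustration of the symbolic method (an example chosen here, not taken from the chapter): a binary tree is either empty or a root with two subtrees, which translates directly into the generating-function equation B(z) = 1 + z B(z)^2. Extracting coefficients gives a convolution recurrence, sketched below in Python with hypothetical function names.

```python
from math import comb


def binary_tree_counts(n_max):
    """Coefficients b_0..b_{n_max} of B(z), where B(z) = 1 + z * B(z)**2."""
    b = [0] * (n_max + 1)
    b[0] = 1  # the empty tree: the constant term 1 of the specification
    for n in range(1, n_max + 1):
        # [z^n] z*B(z)^2 = sum over the sizes k and n-1-k of the two subtrees
        b[n] = sum(b[k] * b[n - 1 - k] for k in range(n))
    return b


def catalan_closed_form(n):
    # Closed form obtained by solving the quadratic for B(z) and expanding.
    return comb(2 * n, n) // (n + 1)
```

`binary_tree_counts(5)` returns `[1, 1, 2, 5, 14, 42]`, the Catalan numbers. Singularity analysis at the dominant singularity z = 1/4 gives the asymptotic form b_n ~ 4^n / sqrt(pi n^3).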
A Limit Theorem for "Quicksort"
 Applications/Theoretical Informatics and Applications
, 1999
Cited by 98 (2 self)
Let $X_n$ be the number of comparisons needed by the sorting algorithm Quicksort to sort a list of $n$ numbers into their natural order. We show that $(X_n - E(X_n))/n$ converges weakly to some random variable $Y$. The distribution of $Y$ is characterized as the fixed point of a contraction. It satisfies a recursive equation, which is used to derive recursive relations for the moments. The random variable $Y$ has exponential tails. Therefore the probability that Quicksort performs badly, e.g. that $X_n$ is larger than $2E(X_n)$, converges to zero faster than any inverse power of $n$. Résumé (translated from French): Let $X_n$ be the number of comparisons used by the Quicksort procedure to sort a list of distinct numbers. We prove that $(X_n - E(X_n))/n$ converges weakly to a certain random variable $Y$. The distribution of $Y$ is the fixed point of a contraction and can be computed numerically by iteration. Keywords: sorting algorithm quicksort, fixed point, asymptotic distribut...
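The centering term $E(X_n)$ can be computed exactly. Under the standard model (distinct keys, pivot chosen uniformly at random, $n-1$ comparisons per partitioning step) it satisfies $C_n = (n-1) + \frac{2}{n}\sum_{k<n} C_k$, with the classical solution $2(n+1)H_n - 4n$. The Python sketch below checks that solution and samples the normalized deviation $(X_n - E(X_n))/n$ whose limit is studied here; all function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_comparisons(n_max):
    """Exact E(X_n) for n = 0..n_max from C_n = (n-1) + (2/n) * sum_{k<n} C_k."""
    c = [Fraction(0)] * (n_max + 1)
    prefix = Fraction(0)  # running sum C_0 + ... + C_{n-1}
    for n in range(1, n_max + 1):
        c[n] = (n - 1) + Fraction(2, n) * prefix
        prefix += c[n]
    return c


def closed_form(n):
    # 2(n+1)H_n - 4n, with H_n the n-th harmonic number.
    h = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * h - 4 * n


def quicksort_comparisons(keys, rng):
    """Count key comparisons made by random-pivot quicksort on distinct keys."""
    if len(keys) <= 1:
        return 0
    pivot = rng.choice(keys)
    smaller = [x for x in keys if x < pivot]
    larger = [x for x in keys if x > pivot]
    # The pivot is compared once with each of the other len(keys) - 1 keys.
    return (len(keys) - 1
            + quicksort_comparisons(smaller, rng)
            + quicksort_comparisons(larger, rng))


def normalized_deviation_samples(n, trials, rng):
    """Draw samples of (X_n - E(X_n)) / n by simulation."""
    mean = float(closed_form(n))
    keys = list(range(n))
    return [(quicksort_comparisons(keys, rng) - mean) / n for _ in range(trials)]
```

A histogram of `normalized_deviation_samples(1000, 5000, random.Random(0))` gives an empirical picture of the distribution of $Y$; its sample variance should be near the known limit $7 - 2\pi^2/3 \approx 0.42$.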
General Method of Program Code Obfuscation
, 2002
Cited by 58 (0 self)
Obfuscation can be a simple tool for software protection. In this paper we present a method of machine code obfuscation that can be applied to most present-day processors. The obfuscation method is based on a theory that led to two useful theorems. The proposed obfuscation algorithm was implemented and tested using analytical and empirical approaches. The results obtained give the first estimate of the maximum possible efficiency of the obfuscation process.
Introspective Sorting and Selection Algorithms
Software: Practice and Experience
, 1997
Cited by 51 (1 self)
Quicksort is the preferred in-place sorting algorithm in many contexts, since its average computing time on uniformly distributed inputs is $\Theta(N \log N)$ and it is in fact faster than most other sorting algorithms on most inputs. Its drawback is that its worst-case time bound is $\Theta(N^2)$. Previous attempts to protect against the worst case by improving the way quicksort chooses pivot elements for partitioning have increased the average computing time too much; one might as well use heapsort, which has a $\Theta(N \log N)$ worst-case time bound but is on average 2 to 5 times slower than quicksort. A similar dilemma exists with partitioning-based selection algorithms (for finding the $i$-th largest element). This paper describes a simple solution to this dilemma: limit the depth of partitioning, and for subproblems that exceed the limit, switch to another algorithm with a better worst-case bound. Using heapsort as the "stopper" yields a sorting algorithm that is just as fast as quicksort in the average case but also has a $\Theta(N \log N)$ worst-case time bound. For selection, a hybrid of Hoare's find algorithm, which is linear on average but quadratic in the worst case, and the Blum-Floyd-Pratt-Rivest-Tarjan algorithm is as fast as Hoare's algorithm in practice, yet has a linear worst-case time bound. Also discussed are issues of implementing the new algorithms as generic algorithms and accurately measuring their performance in the framework of the C++ Standard Template Library.
A Dynamically Tuned Sorting Library
, 2004
Cited by 51 (7 self)
Empirical search is a strategy used during the installation of library generators such as ATLAS, FFTW, and SPIRAL to identify the algorithm, or the version of an algorithm, that delivers the best performance. In the past, empirical search has been applied almost exclusively to scientific problems. In this paper, we discuss the application of empirical search to sorting, which is one of the best understood symbolic computing problems. When contrasted with the dense numerical computations of ATLAS, FFTW, and SPIRAL, sorting presents a new challenge, namely that the relative performance of the algorithms depends not only on the characteristics of the target machine and the size of the input data but also on the distribution of values in the input data set. Empirical search is applied in the study reported here as part of a sorting library generator. The resulting routines dynamically adapt to the characteristics of the input data by selecting the best sorting algorithm from a small set of alternatives. To generate the run-time selection mechanism, our generator uses machine learning to predict the best algorithm as a function of the characteristics of the input data set and the performance of the different algorithms on the target machine. This prediction is based on the data obtained through empirical search at installation time. Our results show that our approach is quite effective. When sorting data inputs of 12M keys with various standard deviations, our adaptive approach selected the best algorithm for all the input data sets and all platforms that we tried in our experiments. The wrong decision could have introduced a performance degradation of up to 133%, with an average value of 44%.
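The install-time/run-time split can be sketched as follows. This is an illustration under stated assumptions, not the paper's generator: in place of the learned model it uses a plain lookup table keyed by coarse buckets of input size and standard deviation, and it assumes non-negative integer keys so that an LSD radix sort is a valid candidate. The candidate set, bucketing scheme, and all names are hypothetical.

```python
import random
import statistics
import time


def lsd_radix_sort(xs, bits=8):
    """LSD radix sort for non-negative ints, `bits` bits per pass."""
    out = list(xs)
    if not out:
        return out
    mask, shift, max_val = (1 << bits) - 1, 0, max(out)
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(1 << bits)]
        for x in out:
            buckets[(x >> shift) & mask].append(x)
        out = [x for b in buckets for x in b]
        shift += bits
    return out


CANDIDATES = {"builtin": sorted, "radix": lsd_radix_sort}  # hypothetical alternatives


def features(xs):
    # Coarse buckets: order of magnitude of the size and of the standard deviation.
    spread = statistics.pstdev(xs) if len(xs) > 1 else 0.0
    return (len(xs).bit_length(), int(spread).bit_length())


def install_time_search(sizes, stdevs, rng, repeats=3):
    """Time every candidate on synthetic inputs; record the winner per feature bucket."""
    table = {}
    for n in sizes:
        for sd in stdevs:
            sample = [abs(int(rng.gauss(0, sd))) for _ in range(n)]
            timings = {}
            for name, fn in CANDIDATES.items():
                start = time.perf_counter()
                for _ in range(repeats):
                    fn(sample)
                timings[name] = time.perf_counter() - start
            table[features(sample)] = min(timings, key=timings.get)
    return table


def adaptive_sort(xs, table):
    """Run-time selection: look up the input's bucket, defaulting to the builtin sort."""
    return CANDIDATES[table.get(features(xs), "builtin")](xs)
```

Which candidate wins each bucket depends on the machine the search runs on, which is the point of doing the search at installation time rather than hard-coding a choice.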
On the Analysis of Stochastic Divide and Conquer Algorithms
, 1999
Cited by 48 (1 self)
This paper develops general tools for the analysis of stochastic divide-and-conquer algorithms. We concentrate on the average performance and the distribution of the algorithm's running time. In particular we analyse the average performance and the running-time distribution of the $(2k+1)$-median version of Quicksort.
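For reference, the $(2k+1)$-median variant picks its pivot as the median of a random sample of $2k+1$ keys. The Python sketch below (not from the paper) uses a non-in-place three-way split for clarity; the function name is hypothetical. With $k = 0$ it reduces to random-pivot Quicksort, and with $k = 1$ it is median-of-three, whose expected comparison count is known to drop from about $2n \ln n$ to about $\frac{12}{7} n \ln n$.

```python
import random


def sample_median_quicksort(keys, k=1, rng=None):
    """Return a sorted copy; each pivot is the median of a random sample of 2k+1 keys."""
    rng = rng if rng is not None else random.Random()
    if len(keys) <= 1:
        return list(keys)
    m = min(2 * k + 1, len(keys))          # small subarrays: sample all of them
    sample = sorted(rng.sample(keys, m))
    pivot = sample[m // 2]                 # median of the sample
    smaller = [x for x in keys if x < pivot]
    equal = [x for x in keys if x == pivot]
    larger = [x for x in keys if x > pivot]
    return (sample_median_quicksort(smaller, k, rng)
            + equal
            + sample_median_quicksort(larger, k, rng))
```

Larger $k$ moves the pivot closer to the true median and so balances the recursion better, at the price of the extra work spent sorting each sample.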