Results 1 - 10
of
41
Why simple hash functions work: Exploiting the entropy in a data stream
- In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2008
"... Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealiz ..."
Abstract
-
Cited by 49 (8 self)
- Add to MetaCart
Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or O(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing. In practice, however, it is commonly observed that weak hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source, ” whereby
Linear probing with constant independence
- In STOC ’07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
, 2007
"... Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space ..."
Abstract
-
Cited by 23 (2 self)
- Add to MetaCart
(Show Context)
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for linear probing.
On Dynamic Range Reporting in One Dimension
, 2008
"... We consider the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval. Range searching is a natural and fundamental variant of integer search, and can be solved using predecessor search. However, for a RAM wi ..."
Abstract
-
Cited by 20 (5 self)
- Add to MetaCart
We consider the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval. Range searching is a natural and fundamental variant of integer search, and can be solved using predecessor search. However, for a RAM with w-bit words, we show how to perform updates in O(lg w) time and answer queries in O(lg lg w) time. The update time is identical to the van Emde Boas structure, but the query time is exponentially faster. Existing lower bounds show that achieving our query time for predecessor search requires doubly-exponentially slower updates. We present some arguments supporting the conjecture that our solution is optimal. Our solution is based on a new and interesting recursion idea which is “more extreme” that the van Emde Boas recursion. Whereas van Emde Boas uses a simple recursion (repeated halving) on each path in a trie, we use a nontrivial, van Emde Boas-like recursion on every such path. Despite this, our algorithm is quite clean when seen from the right angle. To achieve linear space for our data structure, we solve a problem which is of independent interest. We develop the first scheme for dynamic perfect hashing requiring sublinear space. This gives a dynamic Bloomier filter (an approximate storage scheme for sparse vectors) which uses low space. We strengthen previous lower bounds to show that these results are optimal.
Strongly history-independent hashing with applications
- In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science
, 2007
"... We present a strongly history independent (SHI) hash table that supports search in O(1) worst-case time, and insert and delete in O(1) expected time using O(n) data space. This matches the bounds for dynamic perfect hashing, and improves on the best previous results by Naor and Teague on history ind ..."
Abstract
-
Cited by 19 (5 self)
- Add to MetaCart
(Show Context)
We present a strongly history independent (SHI) hash table that supports search in O(1) worst-case time, and insert and delete in O(1) expected time using O(n) data space. This matches the bounds for dynamic perfect hashing, and improves on the best previous results by Naor and Teague on history independent hashing, which were either weakly history independent, or only supported insertion and search (no delete) each in O(1) expected time. The results can be used to construct many other SHI data structures. We show straightforward constructions for SHI ordered dictionaries: for n keys from {1,..., n k} searches take O(log log n) worst-case time and updates (insertions and deletions) O(log log n) expected time, and for keys in the comparison model searches take O(log n) worst-case time and updates O(log n) expected time. We also describe a SHI data structure for the order-maintenance problem. It supports comparisons in O(1) worst-case time, and updates in O(1) expected time. All structures use O(n) data space. 1
Hash-Based Techniques for High-Speed Packet Processing
"...
Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
(Show Context)
Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary.
Backyard Cuckoo Hashing: Constant Worst-Case Operations with a Succinct Representation
, 2010
"... The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time there are known constructions that guarantee constant-time operations in the worst case with high probability, and in terms of space consumption ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
(Show Context)
The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time there are known constructions that guarantee constant-time operations in the worst case with high probability, and in terms of space consumption there are known constructions that use essentially optimal space. In this paper we settle two fundamental open problems: • We construct the first dynamic dictionary that enjoys the best of both worlds: we present a two-level variant of cuckoo hashing that stores n elements using (1+ϵ)n memory words, and guarantees constant-time operations in the worst case with high probability. Specifically, for any ϵ = Ω((log log n / log n) 1/2) and for any sequence of polynomially many operations, with high probability over the randomness of the initialization phase, all operations are performed in constant time which is independent of ϵ. The construction is based on augmenting cuckoo hashing with a “backyard ” that handles a large fraction of the elements, together with a de-amortized perfect hashing scheme for eliminating the dependency on ϵ.
Tabulation Based 5-Universal Hashing and Linear Probing
"... Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables a ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the computation. Relatively this leads to even bigger gains since the direct methods for 5-universal hashing use degree 4 polynomials. Experimentally, we find that our method can gain up to an order of magnitude in speed over direct 5-universal hashing. Some of the most popular randomized algorithms have been proved to have the desired expected running time using
De Dictionariis Dynamicis Pauco Spatio Utentibus (lat. On Dynamic Dictionaries Using Little Space)
, 2005
"... We develop dynamic dictionaries on the word RAM that use asymptotically optimal space, up to constant factors, subject to insertions and deletions, and subject to supporting perfect-hashing queries and/or membership queries, each operation in constant time with high probability. When supporting only ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
We develop dynamic dictionaries on the word RAM that use asymptotically optimal space, up to constant factors, subject to insertions and deletions, and subject to supporting perfect-hashing queries and/or membership queries, each operation in constant time with high probability. When supporting only membership queries, we attain the optimal space bound of Θ(n lg u n) bits, where n and u are the sizes of the dictionary and the universe, respectively. Previous dictionaries either did not achieve this space bound or had time bounds that were only expected and amortized. When supporting perfect-hashing queries, the optimal space bound depends on the range {1, 2,..., n + t} of hashcodes allowed as output. We prove that the optimal space bound is Θ(n lg lg u n n + n lg t+1) bits when supporting only perfect-hashing queries, and it is Θ(n lg u n n + n lg t+1) bits when also supporting membership queries. All upper bounds are new,) lower bound. as is the Ω(n lg n t+1 1