Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines
- SIGCOMM '06, 2006
- Cited by 24 (4 self)
Many networking applications require fast state lookups in a concurrent state machine, which tracks the state of a large number of flows simultaneously. We consider the question of how to compactly represent such concurrent state machines. To achieve compactness, we consider data structures for Approximate Concurrent State Machines (ACSMs) that can return false positives, false negatives, or a “don’t know” response. We describe three techniques based on Bloom filters and hashing, and evaluate them using both theoretical analysis and simulation. Our analysis leads us to an extremely efficient hashing-based scheme with several parameters that can be chosen to trade off space, computation, and the impact of errors. Our hashing approach also yields a simple alternative structure with the same functionality as a counting Bloom filter that uses much less space. We show how ACSMs can be used for video congestion control. Using an ACSM, a router can implement sophisticated Active Queue Management (AQM) techniques for video traffic (without the need for standards changes to mark packets or change video formats), with a factor of four reduction in memory compared to full-state schemes and with very little error. We also show that ACSMs show promise for real-time detection of P2P traffic.
Implementing signatures for transactional memory
- 40th Intl. Symp. on Microarchitecture, 2007
- Cited by 23 (4 self)
Transactional Memory (TM) systems must track the read and write sets (the items read and written during a transaction) to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conflicts detected when none exist). This paper examines different organizations to achieve hardware-efficient and accurate TM signatures. First, we find that implementing each signature with a single k-hash-function Bloom filter (True Bloom signature) is inefficient, as it requires multi-ported SRAMs. Instead, we advocate using k single-hash-function Bloom filters in parallel (Parallel Bloom signature), using area-efficient single-ported SRAMs. Our formal analysis shows that both organizations perform equally well in theory, and our simulation-based evaluation shows this to hold approximately in practice. We also show that by choosing high-quality hash functions we can achieve signature designs noticeably more accurate than the previously proposed implementations. Finally, we adapt Pagh and Rodler’s cuckoo hashing to implement Cuckoo-Bloom signatures. While this representation does not support set intersection, it mitigates false positives for the common case of small read/write sets and performs like a Bloom filter for large sets.
Less hashing, same performance: Building a better Bloom filter
- In Proc. of the 14th Annual European Symposium on Algorithms (ESA 2006), 2006
- Cited by 20 (3 self)
A standard technique from the hashing literature is to use two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + i\,h_2(x)$. We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for
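The two-hash construction described in this abstract, g_i(x) = h1(x) + i·h2(x), can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the class name `DoubleHashBloom`, the use of SHA-256 to obtain the two base hashes, and the one-byte-per-bit storage are all assumptions made for readability.

```python
import hashlib


class DoubleHashBloom:
    """Bloom filter whose k probe positions are derived from two base hashes."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # number of bit positions
        self.k = k                # number of simulated hash functions
        self.bits = bytearray(m)  # one byte per bit, chosen for clarity

    def _base_hashes(self, item: bytes) -> tuple[int, int]:
        # Assumption: split one SHA-256 digest into two 64-bit values
        # that play the roles of h1(x) and h2(x).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return h1, h2

    def _positions(self, item: bytes) -> list[int]:
        # g_i(x) = h1(x) + i * h2(x), reduced modulo the table size.
        h1, h2 = self._base_hashes(item)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

For example, after `bf = DoubleHashBloom(m=1024, k=4)` and `bf.add(b"alpha")`, the test `b"alpha" in bf` is true. Only one digest is computed per item regardless of k, which is the computational saving the abstract refers to.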
An Improved Construction for Counting Bloom Filters
- 14th Annual European Symposium on Algorithms, LNCS 4168, 2006
- Cited by 14 (3 self)
A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally.
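To make the baseline concrete, here is a minimal sketch of a standard counting Bloom filter, the structure the dlCBF is compared against. It is not the d-left construction from the paper. The class name, the salted SHA-256 hashing, and the unbounded Python integers used as counters are illustrative assumptions; hardware designs typically use small fixed-width counters.

```python
import hashlib


class CountingBloomFilter:
    """Standard CBF: an array of counters instead of an array of bits."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # number of counters
        self.k = k                # number of hash functions (k < 256 here)
        self.counters = [0] * m

    def _positions(self, item: bytes) -> list[int]:
        # Assumption: k hash functions simulated by salting SHA-256
        # with a one-byte index.
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:8], "big")
            % self.m
            for i in range(self.k)
        ]

    def insert(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: bytes) -> None:
        # Only meaningful for items that were previously inserted;
        # deleting an absent item would corrupt the counters.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def __contains__(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

Inserting and then deleting the same item returns every counter to its previous value, which is what allows deletions without rebuilding the filter.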
Using Bloom filters to refine web search results
- In Proc. 7th WebDB, 2005
- Cited by 13 (0 self)
Search engines have primarily focused on presenting the most relevant pages to the user quickly. A less well-explored aspect of improving the search experience is to remove or group all near-duplicate documents in the results presented to the user. In this paper, we apply a Bloom filter based similarity detection technique to address this issue by refining the search results presented to the user. First, we present and analyze our technique for finding similar documents using content-defined chunking and Bloom filters, and demonstrate its effectiveness in compactly representing and quickly matching pages for similarity testing. We then demonstrate that a number of the results for popular and random search queries retrieved from different search engines (Google, Yahoo, and MSN) are similar and can be eliminated or reorganized.
Distance-Sensitive Bloom Filters
- Proc. Eighth Workshop Algorithm Eng. and Experiments (ALENEX ’06), 2006
- Cited by 12 (1 self)
A Bloom filter is a space-efficient data structure that answers set membership queries with some chance of a false positive. We introduce the problem of designing generalizations of Bloom filters that answer queries of the form, “Is x close to an element of S?” where closeness is measured under a suitable metric. Such a data structure would have several natural applications in networking and databases. We demonstrate how appropriate data structures can be designed using locality-sensitive hash functions as a building block, and we specifically analyze the performance of a natural scheme under the Hamming metric.
External perfect hashing for very large key sets
- In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM’07), 2007
- Cited by 9 (1 self)
A perfect hash function (PHF) h: S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable, and near space-optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs an MPHF for a set of 1.024 billion URLs of average length 64 bytes collected from the Web in approximately 50 minutes using a commodity PC. The parallel implementation proposed here achieves the following performance using 14 commodity PCs: (i) it constructs an MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs an MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines and its evaluation is done in a distributed way, faster than the centralized function.
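The PHF and MPHF definitions at the start of this abstract translate directly into a small check. The sketch below only verifies the two properties for a given function and key set; it does not construct a perfect hash function, and the helper names `is_perfect_hash` and `is_minimal_perfect_hash` are hypothetical. Keys are assumed to be distinct.

```python
from typing import Callable, Hashable, Iterable


def is_perfect_hash(
    h: Callable[[Hashable], int], keys: Iterable[Hashable], m: int
) -> bool:
    """Return True if h maps the (distinct) keys injectively into [0, m - 1]."""
    seen = set()
    for key in keys:
        value = h(key)
        # Reject out-of-range values and collisions.
        if not 0 <= value < m or value in seen:
            return False
        seen.add(value)
    return True


def is_minimal_perfect_hash(
    h: Callable[[Hashable], int], keys: Iterable[Hashable]
) -> bool:
    """Return True if h is a PHF whose range size m equals the key count n."""
    key_list = list(keys)
    return is_perfect_hash(h, key_list, len(key_list))
```

With a lookup table such as `table = {"a": 0, "b": 2, "c": 1}`, the function `table.__getitem__` is a PHF for any m ≥ 3 and an MPHF for these three keys, whereas a constant function fails as soon as there are two keys.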
Simple Summaries for Hashing with Multiple Choices
- Cited by 8 (3 self)
In a multiple-choice hashing scheme, each item is stored in one of d ≥ 2 possible hash table buckets. The availability of these multiple choices allows for a substantial reduction in the maximum load of the buckets. However, a lookup may now require examining each of the d locations. For applications where this cost is undesirable, Song et al. propose keeping a summary that allows one to determine which of the d locations is appropriate for each item, where the summary may allow false positives for items not in the hash table. We propose alternative, simple constructions of such summaries that use less space for both the summary and the underlying hash table. Moreover, our constructions are easily analyzable and tunable.
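The lookup cost that motivates these summaries is easy to see in a plain multiple-choice hash table. The sketch below implements only the underlying table, with each item placed in the least loaded of its d candidate buckets; it does not implement the summaries of Song et al. or of this paper. The class name `MultiChoiceTable` and the salted SHA-256 hashing are assumptions for illustration.

```python
import hashlib
from typing import Optional


class MultiChoiceTable:
    """Hash table in which each key has d candidate buckets."""

    def __init__(self, num_buckets: int, d: int = 2) -> None:
        self.d = d
        self.buckets: list[list[bytes]] = [[] for _ in range(num_buckets)]

    def _choices(self, key: bytes) -> list[int]:
        # Assumption: d hash functions simulated by salting SHA-256
        # with a one-byte index.
        n = len(self.buckets)
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big") % n
            for i in range(self.d)
        ]

    def insert(self, key: bytes) -> int:
        # Place the key in the least loaded candidate bucket
        # (ties go to the earliest choice).
        target = min(self._choices(key), key=lambda b: len(self.buckets[b]))
        self.buckets[target].append(key)
        return target

    def lookup(self, key: bytes) -> Optional[int]:
        # Without a summary, up to d buckets must be examined.
        for bucket in self._choices(key):
            if key in self.buckets[bucket]:
                return bucket
        return None
```

A summary in the sense of the abstract would let `lookup` go directly to the right bucket instead of probing all d candidates.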
Rank-indexed hashing: A compact construction of Bloom filters and variants
- Cited by 7 (0 self)
The Bloom filter and its variants have found widespread use in many networking applications. For these applications, minimizing storage cost is paramount as these filters often need to be implemented using scarce and costly (on-chip) SRAM. Besides supporting membership queries, Bloom filters have been generalized to support deletions and the encoding of information. Although a standard Bloom filter construction has proven to be extremely space-efficient, it is unnecessarily costly when generalized. Alternative constructions based on storing fingerprints in hash tables have been proposed that offer the same functionality as some Bloom filter variants, but using less space. In this paper, we propose a new fingerprint hash table construction called Rank-Indexed Hashing that can achieve very compact representations. A rank-indexed hashing construction that offers the same functionality as a counting Bloom filter can be achieved with a factor of three or more in space savings even for a false positive probability of just 1%. Even for a basic Bloom filter function that only supports membership queries, a rank-indexed hashing construction requires less space for a false positive probability as high as 0.1%, which is significant since a standard Bloom filter construction is widely regarded as extremely space-efficient for approximate membership problems.
Hash-Based Techniques for High-Speed Packet Processing
- Cited by 7 (1 self)
Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary.

