Bitmap algorithms for counting active flows on high speed links
- In Internet Measurement Conference, 2003
Loglog Counting of Large Cardinalities
- In ESA, 2003
Cited by 57 (2 self)
Using an auxiliary memory smaller than the size of this abstract, the LogLog algorithm makes it possible to estimate, in a single pass and to within a few percent, the number of different words in the whole of Shakespeare's works. In general, the LogLog algorithm uses m "small bytes" of auxiliary memory to estimate in a single pass the number of distinct elements (the "cardinality") in a file, and it does so with an accuracy of the order of $1/\sqrt{m}$. The "small bytes" needed to count cardinalities up to $N_{\max}$ comprise about $\log \log N_{\max}$ bits each, so that cardinalities well in the range of billions can be determined using only one or two kilobytes of memory. The basic version of the LogLog algorithm is validated by a complete analysis. An optimized version, super-LogLog, is also engineered and tested on real-life data. The algorithm parallelizes optimally.
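A minimal Python sketch of the basic LogLog estimator described in this abstract may help make the "small bytes" idea concrete. It is an illustration under stated assumptions, not the authors' implementation: the SHA-256-based `hash64` helper, the 64-bit hash width, and the default of m = 1024 registers are choices made here, and only the asymptotic bias-correction constant (about 0.39701) is used.

```python
import hashlib

# Asymptotic bias-correction constant from the LogLog analysis (alpha_infinity).
ALPHA_INF = 0.39701
HASH_BITS = 64  # assumption: 64 hash bits per item


def hash64(item: str) -> int:
    """Map an item to a 64-bit integer (SHA-256 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")


def loglog_estimate(items, k: int = 10) -> float:
    """Estimate the number of distinct items using m = 2**k small registers."""
    m = 1 << k
    registers = [0] * m
    tail_bits = HASH_BITS - k
    for item in items:
        h = hash64(item)
        bucket = h >> tail_bits               # first k bits select a register
        tail = h & ((1 << tail_bits) - 1)     # remaining bits
        # 1-based position of the leftmost 1-bit in the tail
        rank = tail_bits - tail.bit_length() + 1
        if rank > registers[bucket]:
            registers[bucket] = rank
    # Each register only ever holds a value of at most tail_bits + 1,
    # which is why a handful of bits per register suffices.
    return ALPHA_INF * m * 2.0 ** (sum(registers) / m)
```

Because each register keeps only a maximum, feeding the same item twice leaves the state unchanged, and registers built on separate machines can be merged by taking element-wise maxima.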
Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution
- 2004
Cited by 56 (5 self)
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has been focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of flow distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow-distribution.
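The per-packet update step described in this abstract (one counter increment per packet, with several flows possibly colliding into the same counter) can be sketched in Python as follows. This is only an illustration: the function names, the CRC-32 hash, and the array size are assumptions made here, and the Expectation Maximization step that infers the flow size distribution from the counter values is not shown.

```python
import zlib
from collections import Counter


def update_counters(packets, num_counters: int = 1024):
    """Increment exactly one counter per packet, indexed by a hash of its flow ID.

    Flows that hash to the same index share a counter, which is what makes
    the structure lossy.
    """
    counters = [0] * num_counters
    for flow_id in packets:
        index = zlib.crc32(flow_id.encode("utf-8")) % num_counters
        counters[index] += 1
    return counters


def counter_value_histogram(counters):
    """Count how many counters hold each value (the observed data after collisions)."""
    return Counter(counters)
```

For example, `update_counters(["a", "b", "a", "c", "a"], num_counters=8)` yields an array whose entries sum to 5, with the counter for flow "a" holding at least 3.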
The Scaling Hypothesis: Simplifying the Prediction of Network Performance using Scaled-down Simulations
- 2003
Cited by 13 (3 self)
As the Internet grows, so do the complexity and computational requirements of network simulations. This leads to simulation experiments that are either unrealistic or prohibitively expensive.
Experimental Evaluation of an Adaptive Flash Crowd Protection System
- 2003
Cited by 3 (0 self)
Network early warning system (NEWS) is an adaptive flashcrowd protection system. Unlike approaches using manually configured request rate limit, NEWS regulates incoming requests by observing response performance, automatically adapting to changing traffic mixes. We have previously studied NEWS performance through simulation; this paper presents an implementation of NEWS on a Linux-based router. We evaluate this implementation in testbed experiments with HTTP server log recorded during a flash crowd. Our first contribution is to use implementation and testbed experiments to evaluate NEWS performance in a server memory-limited scenario, which was not considered in our previous simulation study. Our results show that NEWS is effective in both network- and server-limited scenarios. Second, we evaluate the run-time cost of NEWS traffic monitoring in practice, and find that it consumes little CPU time and relatively small memory. Finally, we extend core NEWS algorithms to include a simple hot-spot identification function to protect bystander traffic from flash crowds efficiently.
Bitmap Algorithms for Counting Active Flows on High
This paper presents a family of bitmap algorithms that address the problem of counting the number of distinct header patterns (flows) seen on a high speed link. Such counting can be used to detect DoS attacks and port scans, and to solve measurement problems. Counting is especially hard when processing must be done within a packet arrival time (8 nsec at OC-768 speeds) and, hence, must require only a small number of accesses to limited, fast memory. A naive solution that maintains a hash table requires several Mbytes because the number of flows can be above a million. By contrast, our new probabilistic algorithms take very little memory and are fast. The reduction in memory is particularly important for applications that run multiple concurrent counting instances. For example, we replaced the port scan detection component of the popular intrusion detection system Snort with one of our new algorithms. This reduced memory usage on a ten minute trace from 50 Mbytes to 5.6 Mbytes while maintaining a 99.77% probability of alarming on a scan within 6 seconds of when the large-memory algorithm would. The best known prior algorithm (probabilistic counting) takes 4 times more memory on port scan detection and 8 times more on a measurement application. Fundamentally, this is because our algorithms can be customized to take advantage of special features of applications such as a large number of instances that have very small counts or prior knowledge of the likely range of the count.
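To illustrate the general idea of counting distinct flows with a bitmap instead of a hash table, here is a minimal Python sketch of a single direct bitmap with the standard linear-counting estimate b · ln(b / z), where b is the number of bits and z the number of bits still zero. It is offered as an assumption-laden illustration rather than as any specific algorithm from the paper: the function name, the SHA-256 hash, and the bitmap size are choices made here, and a byte per bit is used for readability rather than memory efficiency.

```python
import hashlib
import math


def direct_bitmap_estimate(flow_ids, num_bits: int = 4096) -> float:
    """Estimate the number of distinct flow IDs from a hashed bitmap."""
    bitmap = bytearray(num_bits)  # one byte per bit, for clarity only
    for flow_id in flow_ids:
        digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % num_bits
        bitmap[position] = 1      # one memory access per packet
    zeros = num_bits - sum(bitmap)
    if zeros == 0:
        raise ValueError("bitmap saturated; use more bits")
    # Linear-counting estimate: corrects for flows that collided on a bit.
    return num_bits * math.log(num_bits / zeros)
```

Repeated packets from the same flow set the same bit, so the estimate depends only on the set of distinct flows, and the memory cost is fixed by `num_bits` regardless of how many packets arrive.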

