Results 1 - 10
of
42
Building a Better NetFlow
, 2004
"... Network operators need to determine the composition of the traffic mix on links when looking for dominant applications, users, or estimating traffic matrices. Cisco's NetFlow has evolved into a solution that satisfies this need by reporting flow records that summarize a sample of the traffic travers ..."
Abstract
-
Cited by 102 (5 self)
- Add to MetaCart
Network operators need to determine the composition of the traffic mix on links when looking for dominant applications, users, or estimating traffic matrices. Cisco's NetFlow has evolved into a solution that satisfies this need by reporting flow records that summarize a sample of the traffic traversing the link. But sampled NetFlow has shortcomings that hinder the collection and analysis of traffic data. First, during flooding attacks router memory and network bandwidth consumed by flow records can increase beyond what is available; second, selecting the right static sampling rate is difficult because no single rate gives the right tradeoff of memory use versus accuracy for all traffic mixes; third, the heuristics routers use to decide when a flow is reported are a poor match to most applications that work with time bins; finally, it is impossible to estimate without bias the number of active flows for aggregates with non-TCP traffic. In thi paper we propose...
Bitmap algorithms for counting active flows on high speed links
- In Internet Measurement Conference
, 2003
"... ..."
Inverting Sampled Traffic
- In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
, 2003
"... Routers have the ability to output statistics about packets and flows of packets that traverse them. Since however the generation of detailed tra#c statistics does not scale well with link speed, increasingly routers and measurement boxes implement sampling strategies at the packet level. In this pa ..."
Abstract
-
Cited by 70 (0 self)
- Add to MetaCart
Routers have the ability to output statistics about packets and flows of packets that traverse them. Since however the generation of detailed tra#c statistics does not scale well with link speed, increasingly routers and measurement boxes implement sampling strategies at the packet level. In this paper we study both theoretically and practically what information about the original tra#c can be inferred when sampling, or `thinning', is performed at the packet level. While basic packet level characteristics such as first order statistics can be fairly directly recovered, other aspects require more attention. We focus mainly on the spectral density, a second order statistic, and the distribution of the number of packets per flow, showing how both can be exactly recovered, in theory. We then show in detail why in practice this cannot be done using the traditional packet based sampling, even for high sampling rate. We introduce an alternative flow based thinning, where practical inversion is possible even at arbitrarily low sampling rate. We also investigate the theory and practice of fitting the parameters of a Poisson cluster process, modelling the full packet tra#c, from sampled data.
Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution
, 2004
"... Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating ..."
Abstract
-
Cited by 56 (5 self)
- Add to MetaCart
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has been focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of flow distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow-distribution.
Predicting Resource Usage and Estimation Accuracy in an IP Flow Measurement Collection Infrastructure
, 2003
"... This paper describes a measurement infrastructure used to collect detailed IP traffic measurements from an IP backbone. Usage, i.e, bytes transmitted, is determined from raw NetFlow records generated by the backbone routers. The amount of raw data is immense. Two types of data sampling in order to m ..."
Abstract
-
Cited by 36 (5 self)
- Add to MetaCart
This paper describes a measurement infrastructure used to collect detailed IP traffic measurements from an IP backbone. Usage, i.e, bytes transmitted, is determined from raw NetFlow records generated by the backbone routers. The amount of raw data is immense. Two types of data sampling in order to manage data volumes: (i) (packet) sampled NetFlow in the routers; (ii) sizedependent sampling of NetFlow records. Furthermore, dropping of NetFlow records in transmission can be regarded as an uncontrolled form of sampling.
Locating network monitors: Complexity, heuristics and coverage
- in Proceedings of IEEE Infocom
, 2005
"... Abstract — There is increasing interest in concurrent passive monitoring of IP flows at multiple locations within an IP network. The common objective of such a distributed monitoring system is to sample packets belonging to a large fraction of IP flows in a cost-effective manner by carefully placing ..."
Abstract
-
Cited by 31 (0 self)
- Add to MetaCart
Abstract — There is increasing interest in concurrent passive monitoring of IP flows at multiple locations within an IP network. The common objective of such a distributed monitoring system is to sample packets belonging to a large fraction of IP flows in a cost-effective manner by carefully placing monitors and controlling their sampling rates. In this paper, we consider the problem of where to place monitors within the network and how to control their sampling. To address the tradeoff between monitoring cost and monitoring coverage, we consider minimum cost and maximum coverage problems under various budget constraints. We show that all of the defined problems are NPhard. We propose greedy heuristics, and show that the heuristics provide solutions quite close to the optimal solutions through experiments using synthetic and real network topologies. In addition, our experiments show that a small number of monitors is often enough to monitor most of the traffic in an entire IP network. I.
Reformulating the monitor placement problem: Optimal network-wide sampling
- in Proceedings of ACM CoNEXT
, 2006
"... Confronted with the generalization of monitoring in operational networks, researchers have proposed placement algorithms that can help ISPs deploy their monitoring infrastructure in a cost effective way, while maximizing the benefits of their infrastructure. However, a static placement of monitors c ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
Confronted with the generalization of monitoring in operational networks, researchers have proposed placement algorithms that can help ISPs deploy their monitoring infrastructure in a cost effective way, while maximizing the benefits of their infrastructure. However, a static placement of monitors cannot be optimal given the short-term and longterm variations in traffic due to re-routing events, anomalies and the normal network evolution. In addition, most ISPs already deploy router embedded monitoring functionalities. Despite some limitations (inherent to being part of a router), these monitoring tools give greater visibility on the network traffic but raise the question on how to configure a networkwide monitoring infrastructure that may contain hundreds of monitoring points. We reformulate the placement problem as follows. Given a network where all links can be monitored, which monitors should be activated and which sampling rate should be set on these monitors in order to achieve a given measurement task with high accuracy and low resource consumption? We provide a formulation of the problem, an optimal algorithm to solve it, and we study its performance on a real backbone network. 1.
The power of slicing in internet flow measurement
- In IMC’05
, 2005
"... Abstract – Network service providers use high speed flow measurement solutions in routers to track dominant applications, compute traffic matrices and to perform other such operational tasks. These solutions typically need to operate within the constraints of the three precious router resources – CP ..."
Abstract
-
Cited by 19 (5 self)
- Add to MetaCart
Abstract – Network service providers use high speed flow measurement solutions in routers to track dominant applications, compute traffic matrices and to perform other such operational tasks. These solutions typically need to operate within the constraints of the three precious router resources – CPU, memory and bandwidth. Cisco’s Net-Flow, a widely deployed flow measurement solution, uses a configurable static sampling rate to control these resources. In this paper, we propose Flow Slices, a solution inspired from previous enhancements to NetFlow such as Smart Sampling [8], Adaptive NetFlow (ANF) [10]. Flow Slices, in contrast to NetFlow, controls the three resource bottlenecks at the router using separate “tuning knobs”; it uses packet sampling to control CPU usage, flow sampling to control memory usage and finally multi-factor smart sampling to control reporting bandwidth. The resulting solution has smaller resource requirements than current proposals (up to 80 % less memory usage than ANF), enables more accurate traffic analysis results (up to 10 % less error than ANF) and balances better the error in estimates of byte, packet and flow counts (flow count estimates up to 8 times more accurate than after Smart Sampling). We provide theoretical analyses of the unbiasedness and variances of the estimators based on Flow Slices and experimental comparisons with other flow measurement solutions such as ANF. 1
Reversible sketches for efficient and accurate change detection over network data streams
- in ACM SIGCOMM IMC
, 2004
"... Traffic anomalies such as failures and attacks are increasing in frequency and severity, and thus identifying them rapidly and accurately is critical for large network operators. The detection typically treats the traffic as a collection of flows and looks for heavy changes in traffic patterns (e.g. ..."
Abstract
-
Cited by 18 (5 self)
- Add to MetaCart
Traffic anomalies such as failures and attacks are increasing in frequency and severity, and thus identifying them rapidly and accurately is critical for large network operators. The detection typically treats the traffic as a collection of flows and looks for heavy changes in traffic patterns (e.g., volume, number of connections). However, as link speeds and the number of flows increase, keeping per-flow state is not scalable. The recently proposed sketch-based schemes [14] are among the very few that can detect heavy changes and anomalies over massive data streams at network traffic speeds. However, sketches do not preserve the key (e.g., source IP address) of the flows. Hence, even if anomalies are detected, it is difficult to infer the culprit flows, making it a big practical hurdle for online deployment. Meanwhile, the number of keys is too large to record. To address this challenge, we propose efficient reversible hashing algorithms to infer the keys of culprit flows from sketches without storing any explicit key information. No extra memory or memory accesses are needed for recording the streaming data. Meanwhile, the heavy change detection daemon runs in the background with space complexity and computational time sublinear to the key space size. This short paper describes the conceptual framework of the reversible sketches, as well as some initial approaches for implementation. See [23] for the optimized algorithms in details. Evaluated with netflow traffic traces of a large edge router, we demonstrate that the reverse hashing can quickly infer the keys of culprit flows even for many changes with high accuracy.
Data streaming algorithms for accurate and efficient measurement of traffic and flow matrices
- In Proc. ACM SIGMETRICS
, 2005
"... The traffic volume between origin/destination (OD) pairs in a network, known as traffic matrix, is essential for efficient network provisioning and traffic engineering. Existing approaches of estimating the traffic matrix, based on statistical inference and/or packet sampling, usually cannot achieve ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
The traffic volume between origin/destination (OD) pairs in a network, known as traffic matrix, is essential for efficient network provisioning and traffic engineering. Existing approaches of estimating the traffic matrix, based on statistical inference and/or packet sampling, usually cannot achieve very high estimation accuracy. In this work, we take a brand new approach in attacking this problem. We propose a novel data streaming algorithm that can process traffic stream at very high speed (e.g., 40 Gbps) and produce traffic digests that are orders of magnitude smaller than the traffic stream. By correlating the digests collected at any OD pair using Bayesian statistics, the volume of traffic flowing between the OD pair can be accurately determined. We also establish principles and techniques for optimally combining this streaming method with sampling, when sampling is necessary due to stringent resource constraints. In addition, we propose another data streaming algorithm that estimates flow matrix, a finer-grained characterization than traffic matrix. Flow matrix is concerned with not only the total traffic between an OD pair (traffic matrix), but also how it splits into flows of various sizes. Through rigorous theoretical analysis and extensive synthetic experiments on real Internet traffic, we demonstrate that these two algorithms can produce very accurate estimation of traffic matrix and flow matrix respectively.

