Watch global, cache local: YouTube network traffic at a campus network - measurements and implications
User Generated Content has become very popular since the birth of web services such as YouTube allowing the distribution of such user-produced media content in an easy manner. YouTube-like services are different from existing traditional VoD services because the service provider has only limited control over the creation of new content. We analyze how the content distribution in YouTube is realized and then conduct a measurement study of YouTube traffic in a large university campus network. The analysis of the traffic shows that: (1) no strong correlation is observed between global and local popularity; (2) neither time scale nor user population has an impact on the local popularity distribution; (3) video clips of local interest have a high local popularity. Using our measurement data to drive trace-driven simulations, we also demonstrate the implications of alternative distribution infrastructures on the performance of a YouTube-like VoD service. The results of these simulations show that client-based local caching, P2P-based distribution, and proxy caching can reduce network traffic significantly and allow faster access to video clips. Keywords: Measurement study, Peer-to-peer, Content distribution, Caching
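As a rough illustration of what a trace-driven proxy-caching simulation can look like, the sketch below replays a list of requested video IDs against a fixed-size cache and reports the hit ratio. The LRU replacement policy, the function name `simulate_lru_cache`, and the toy trace are assumptions made for this example; the abstract does not state which policy or trace format the authors used.

```python
from collections import OrderedDict


def simulate_lru_cache(trace, capacity):
    """Replay a request trace against an LRU cache and return the hit ratio.

    trace    -- iterable of video IDs in request order (hypothetical format)
    capacity -- maximum number of videos the cache can hold
    """
    cache = OrderedDict()  # keys are video IDs, ordered from least to most recent
    hits = 0
    total = 0
    for video_id in trace:
        total += 1
        if video_id in cache:
            hits += 1
            cache.move_to_end(video_id)    # mark as most recently used
        else:
            cache[video_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


# Toy trace: "a" is locally popular, the other clips are requested rarely.
trace = ["a", "b", "a", "c", "a", "b", "d", "a"]
hit_ratio = simulate_lru_cache(trace, capacity=2)
print(f"hit ratio: {hit_ratio:.2f}")
```

Every request served from the cache is a transfer that does not have to cross the campus uplink, so the hit ratio gives a first-order estimate of the traffic reduction the abstract refers to.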
Delay bounds in communication networks with heavy-tailed and self-similar traffic
 IEEE Transactions on Information Theory
, 2012
Traffic with self-similar and heavy-tailed characteristics has been widely reported in communication networks, yet, the state-of-the-art of analytically predicting the delay performance of such networks is lacking. We address a particularly difficult type of heavy-tailed traffic where only the first moment can be computed, and present non-asymptotic end-to-end delay bounds for such traffic. The derived performance bounds are non-asymptotic in that they do not assume a steady state, large buffer, or many sources regime. The analysis follows a network calculus approach where traffic is characterized by envelope functions and service is described by service curves. Our analysis considers a multi-hop path of fixed-capacity links with heavy-tailed self-similar cross traffic at each node. A key contribution of the analysis is a novel probabilistic sample-path bound for heavy-tailed arrival and service processes, which is based on a scale-free sampling method. We explore how delays scale as a function of the length of the path, and compare them with lower bounds. A comparison with simulations illustrates pitfalls when simulating self-similar heavy-tailed traffic, providing further evidence for the need of analytical bounds.
Transition from heavy to light tails in retransmission durations
, 2010
Retransmissions serve as the basic building block that communication protocols use to achieve reliable data transfer. Until recently, the number of retransmissions were thought to follow a light-tailed (in particular, a geometric) distribution. However, recent work seems to suggest that when the distribution of the packets have infinite support, retransmission-based protocols may result in heavy-tailed delays and even possibly zero throughput. While this result is true even when the distribution of packet sizes are light-tailed, it requires the assumption that the packet sizes have infinite support. However, in reality, packet sizes are often bounded by the Maximum Transmission Unit (MTU), and thus the aforementioned result merits a deeper investigation. To that end, in this paper, we allow the distribution of the packet size L to have finite support. This packet is sent over an on-off channel {(A_i, U_i)} with alternating available A_i and
Delay Bounds for Networks with Heavy-Tailed and Self-Similar Traffic
, 2009
We provide upper bounds on the end-to-end backlog and delay in a network with heavy-tailed and self-similar traffic. The analysis follows a network calculus approach where traffic is characterized by envelope functions and service is described by service curves. A key contribution of this paper is the derivation of a probabilistic sample path bound for heavy-tailed self-similar arrival processes, which is enabled by a suitable envelope characterization, referred to as htss envelope. We derive a heavy-tailed service curve for an entire network path when the service at each node on the path is characterized by heavy-tailed service curves. We obtain backlog and delay bounds for traffic that is characterized by an htss envelope and receives service given by a heavy-tailed service curve. The derived performance bounds are non-asymptotic in that they do not assume a steady-state, large buffer, or many sources regime. We also explore the scale of growth of delays as a function of the length of the path. The appendix contains an analysis for self-similar traffic with a Gaussian tail distribution.
Modulated Branching Processes, Origins of Power Laws and Queueing Duality
, 2007
Power law distributions have been repeatedly observed in a wide variety of socioeconomic, biological and technological areas. In many of the observations, e.g., city populations and sizes of living organisms, the objects of interest evolve due to the replication of their many independent components, e.g., births-deaths of individuals and replications of cells. Furthermore, the rates of the replication are often controlled by exogenous parameters causing periods of expansion and contraction, e.g., baby booms and busts, economic booms and recessions, etc. In addition, the sizes of these objects often have reflective lower boundaries, e.g., cities do not fall below a certain size, low income individuals are subsidized by the government, companies are protected by bankruptcy laws, etc. Hence, it is natural to propose reflected modulated branching processes as generic models for many of the preceding observations. Indeed, our main results show that the proposed mathematical models result in power law distributions under quite general polynomial Gärtner-Ellis conditions, the generality of which could explain the ubiquitous nature of power law distributions. In addition, on a logarithmic scale, we establish an asymptotic equivalence between the reflected branching processes and the corresponding multiplicative ones. The latter, as recognized by Goldie (1991) [32], is known to be dual to queueing/additive processes. We emphasize this duality further in the generality of stationary and ergodic processes.
Lévy flights and fractal modeling of Internet traffic
 IEEE/ACM Transactions on Networking
, 2009
The relation between burstiness and self-similarity of network traffic was identified in numerous papers in the past decade. These papers suggested that the widely used Poisson based models were not suitable for modeling bursty, local-area and wide-area network traffic. Poisson models were abandoned as unrealistic and simplistic characterizations of network traffic. Recent papers have challenged the accuracy of these results in today’s networks. Authors of these papers believe that it is time to reexamine the Poisson traffic assumption. The explanation is that as the amount of Internet traffic grows dramatically, any irregularity of the network traffic, such as burstiness, might cancel out because of the huge number of different multiplexed flows. Some of these results are based on analyses of particular OC48 Internet backbone connections and other historical traffic traces. We analyzed the same traffic traces and applied new methods to characterize them in terms of packet interarrival times and packet lengths. The major contribution of the paper is the application of two new analytical methods. We apply the theory of smoothly truncated Lévy flights and the linear fractal model in examining the variability of Internet traffic from self-similar to Poisson. The paper demonstrates that the series of interarrival times is still close to a self-similar process, but the burstiness of the packet lengths decreases significantly compared to earlier traces. Index Terms: Burstiness, fractal modelling, Lévy flights, long-range dependence, network traffic.
Enhancing DDoS flood attack detection via intelligent fuzzy logic
, 2010
Distributed denial-of-service (DDoS) flood attacks remain a great threat to the Internet. This kind of attack consumes a large amount of network bandwidth or occupies network equipment resources by flooding them with packets from machines distributed all over the world. To ensure network usability and reliability, real-time and accurate detection of these attacks is critical. To date, various approaches have been proposed to detect these attacks, but with limited success when they are used in the real world. This paper presents a method that can identify the occurrence of a DDoS flood attack in real time and determine its intensity using fuzzy logic. The proposed process consists of two stages: (i) statistical analysis of the network traffic time series using the discrete wavelet transform (DWT) and the Schwarz information criterion (SIC) to find the change point of the Hurst parameter resulting from a DDoS flood attack, and then (ii) adaptive determination of the intensity of the DDoS flood attack by using intelligent fuzzy logic technology to analyze the Hurst parameter and its rate of change. The test results from NS2-based simulation with various network traffic characteristics and attack intensities demonstrate that the proposed method can detect the DDoS flood attack in a timely, effective and intelligent manner. Summary: A procedure for recognizing a DDoS web attack using fuzzy logic is described.
Fractional Gaussian Noise and Network Traffic Modeling
Fractional Gaussian noise (fGn) is a commonly used model of network traffic with long-range dependence (LRD). This paper revisits the basic results of fGn towards noticing its limitation in traffic modeling. Key Words: Fractional Gaussian noise; Long-range dependence; Network traffic.
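For orientation, the textbook autocovariance of fGn is reproduced below; it is the usual starting point for the "basic results" the abstract mentions. The symbols are assumptions for this note and do not come from the abstract: $H$ is the Hurst parameter, $\sigma^2$ the variance, and $k$ the integer lag.

```latex
% Autocovariance of fractional Gaussian noise at lag k
r(k) = \frac{\sigma^{2}}{2}\left( |k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H} \right)

% Large-lag behaviour for H \neq 1/2
r(k) \sim \sigma^{2} H (2H - 1)\, k^{2H-2}, \qquad k \to \infty
```

For $1/2 < H < 1$ the exponent $2H-2$ lies in $(-1, 0)$, so $\sum_k r(k)$ diverges, which is the defining property of long-range dependence. For $H = 1/2$ the expression gives $r(k) = 0$ for all $k \ge 1$, i.e. white noise.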
Generalised entropy maximisation and queues with bursty and/or heavy tails
 in “Network Performance Engineering, A Handbook on Convergent MultiService Networks and Next Generation Internet, Lecture Notes in Computer Science,”
, 2011
An exposition of the 'extensive' (EME) and 'non-extensive' (NME) maximum entropy formalisms is undertaken in conjunction with their applicability into the analysis of queues with bursty and/or heavy tails that are often observed in performance evaluation studies of heterogeneous networks and Internet exhibiting traffic burstiness, self-similarity and long-range dependence (LRD). The credibility of these formalisms, as methods of inductive inference, for the study of physical systems with both short-range and long-range interactions is explored in terms of four potential consistency axioms. Focusing on stable single server queues, it is shown that the EME and NME state probabilities are characterized by generalised types of modified geometric and Zipf-Mandelbrot distributions depicting, respectively, bursty generalized exponential and/or heavy tails with asymptotic power law behaviour. Numerical experiments are included to highlight the credibility of the maximum entropy solutions and assess the combined impact of traffic burstiness and self-similarity on the performance of the queue.
WIKIPEDIA AS DOMAIN KNOWLEDGE NETWORKS: Domain Extraction and Statistical Measurement
This paper investigates knowledge networks of specific domains extracted from Wikipedia and performs statistical measurements on selected domains. In particular, we first present an efficient method to extract a specific domain knowledge network from Wikipedia. We then extract four domain networks on, respectively, mathematics, physics, biology, and chemistry. We compare the mathematics domain network extracted from Wikipedia with MathWorld, the web’s most extensive mathematical resource created and maintained by professional mathematicians, and show that they are statistically similar to each other. This indicates that MathWorld and Wikipedia’s mathematics domain knowledge share a similar internal structure. Such information may be useful for investigating knowledge networks.