| Mark Crovella and Azer Bestavros (1995). "Explaining World Wide Web Traffic Self-Similarity." Technical Report BUCS-TR-95-015. http://citeseer.nj.nec.com/crovella95explaining.html |
.... able to derive a simple estimation of the average rate of such a traffic source, especially since the offtime (reading time) is in general larger than the on time [9] Further discussion of the distributions and their parameter fittings for the generation of self similar traffic can be found in [12,13]. The HTTP object size is drawn from a truncated power tail distribution (TPT) with a truncation level T = 20 [14] We limited the maximum file size to 100 MB additionally to ensure convergence of the simulation. The major contribution to the self similarity of the traffic comes from the ....
....is drawn from a truncated power tail distribution (TPT) with a truncation level T = 20 [14]. We limited the maximum file size to 100 MB additionally to ensure convergence of the simulation. The major contribution to the self similarity of the traffic comes from the distribution of the file size [12,13]. Therefore we deviate from [9] in using a geometric distribution for the number of HTTP inline objects and a negative exponentially distributed off time. [Figure: HTTP on/off source model over time. Labels: Main Object, In-line Obj. 1, ..., In-line Obj. N; HTTP On, HTTP Off; Request Main Obj., Request In-line Objs.] ....
Mark E. Crovella and Azer Bestavros, "Explaining World Wide Web traffic self-similarity," Tech. Rep., Boston University, Oct. 1995.
....Allocation In order to increase the convergence speed of the simulations we changed stochastic distributions that contribute to the mean on time to constant values of their respective mean values. Especially power tailed probability distributions, that are used to generate self similar traffic [3, 4], are known to converge very slowly to their corresponding mean value [5] A distribution without a power tail, like the negative exponential distribution used in our simulations, should be used for the off time in order to desynchronize the sources. 2.1 Estimation of the Source Allocation We ....
Mark E. Crovella and Azer Bestavros, "Explaining World Wide Web traffic self-similarity," Tech. Rep., Boston University, Oct. 1995.
....traffic intensities in the form of a traffic matrix and the stochastic parameters of the traffic. The traffic measured in the Internet is known to be self similar [5, 6, 7] The selection of the distributions and their parameter fittings for the generation of self similar traffic can be found in [8, 3, 9]. One major concern for the simulation of large networks is the establishment of the required traffic intensities since the steady state formulas for the TCP throughput [10] are not applicable for short time connections that are characteristic for WWW traffic. Moreover, even though computing ....
Mark E. Crovella and Azer Bestavros, "Explaining World Wide Web traffic self-similarity," Tech. Rep., Boston University, Oct. 1995, paper-archive/self-sim/tr-version.ps .
....Poisson model was widely used to model Internet traffic, but in the last few years new characteristics have emerged. Long Range Dependence (LRD) in traffic arrival processes has been discovered in LANs [12], [19], WANs [14], and MANs [17]. This dependence spans several applications: World Wide Web [4], [5], Variable Bit Rate (VBR) video traffic [3], and also aggregate traffic [12], [19]. It has been verified in measurements collected on different types of computer networks (see Figure 5): Ethernet [12], [19], ISDN [8], ATM [17], and CCSN/SS7 [6]. LRD traffic is more bursty than traffic generated ....
.... and is bursty over many time scales (self similar). They argue that the Hurst parameter quantifies the degree of self similarity and can be used as a measure of traffic burstiness (the higher the Hurst value, the burstier the aggregate traffic). Other researchers, including Crovella and Bestavros [4], [5], have also found that the H value declines somewhat in light load traffic conditions as compared to busy hours. This is consistent with results found by Leland [12]. Based on the study of the fractional Brownian motion model, Neidhardt and Wang gained further insight on the impact of the Hurst ....
[Article contains additional citation context not shown here]
Mark E. Crovella and Azer Bestavros, Explaining World Wide Web Traffic Self-Similarity. Technical Report TR-95-015, Boston University (1995).
....loaded server but may consume several milliseconds on a heavily loaded web server. This delay will be reflected in the time between the TCP open and TCP ACK open in Figure 1. WORKLOAD CHARACTERIZATION There has been a considerable body of work that characterizes workload on web servers [4,5,6]. In [5] it was shown that web traffic exhibits a highly bursty characteristic known as self similarity [7] A prime cause of web traffic self similarity is the heavy tailed distribution of web file sizes. In other words, the tail of the probability density function of file retrieval probability ....
....server but may consume several milliseconds on a heavily loaded web server. This delay will be reflected in the time between the TCP open and TCP ACK open in Figure 1. WORKLOAD CHARACTERIZATION There has been a considerable body of work that characterizes workload on web servers [4,5,6] In [5], it was shown that web traffic exhibits a highly bursty characteristic known as self similarity [7] A prime cause of web traffic self similarity is the heavy tailed distribution of web file sizes. In other words, the tail of the probability density function of file retrieval probability versus ....
[Article contains additional citation context not shown here]
M.E. Crovella and A. Bestavros, "Explaining World Wide Web traffic self-similarity", TR-95-015, Computer Science Dept., Boston University, October 1995.
....the Weibull model is a good match for the data. Since the Weibull distribution is not long tailed, we conclude that this dataset does not support the hypothesis that the distribution of interarrival times for TCP connections is long tailed. C. HTTP Request Interarrivals Crovella and Bestavros [23] examine the distribution of times between web requests (OFF times) and report that, although it is long tailed, it is less long tailed than the ON time distribution. We obtained their traces, which we call the BU dataset, from their web site. The traces contain logs from web browsers on 37 ....
Mark E. Crovella and Azer Bestavros, "Explaining World Wide Web traffic self-similarity," Tech. Rep. BU-CS-95-015, Boston University, 1995.
....mix show that some properties cannot be explained by Poisson like models. Analysis of these data is challenging since there is strong evidence that the classical modeling assumptions (such as independence or the lack of long memory) do not hold any longer. In recent years, a number of studies, [1], [2], [5], [8], [15] and [19], demonstrated that in certain environments, the traffic appears to exhibit many unusual characteristics such as heavy tailed distributions, long range dependence and self similarity. In some of these publications useful analytical methods which were used to ....
....Figure 5: DeHaan's estimate of index $\alpha$ of WFS data set. We can conclude that the set of file sizes transferred over the Internet seems to fit well the Pareto distribution with index $\alpha$ about 0.7. Similar results were also explored by other researchers in [1]. As discussed in this paper, it may be an evidence of self-similar WWW traffic. 5.2 Testing for LRD and self-similarity. For estimating the Hurst parameter, a number of algorithms have been worked out. Algorithms were described, for example, in [3], [4] and [16]. In this section four widely used ....
[Article contains additional citation context not shown here]
M.E. Crovella and A. Bestavros. Explaining World Wide Web Traffic Self-similarity. Technical Report TR-95-015, Boston University Computer Science Department, Revised, October 12, 1995.
....We know that when a user arrives, he/she clicks on a page or starts the browser; this causes a number of consecutive connections depending on the number of pictures/frames on that requested page. Each connection, containing picture or frame, has a different file size. In our model, we will look at [4]: how users arrive (the user inter-arrivals, $T_k$); the number of the connections due to a request (in one session); the time between two consecutive connections ($t_n$); and the file size ($F$), as defined in figure 11. Traffic from the Web arrives as a burst with different lengths. In ....
[Crovella and Bestavros 1995] Crovella, M., and Bestavros, A., Explaining World Wide Web traffic self-similarity, Tech. Rep. BUCS-TR-95-015, Boston University, CS Dept., Boston, MA 02215, 1995.
....without bound as the base bandwidth increases. In [7] traffic traces were studied, with the conclusion that much of the WAN traffic in the Internet could not be modeled with a Poisson inter arrivals process, but instead exhibited distributions with much larger variances and self similarity [8] [9]. Although it is difficult to predict the load distributions faced by future networks, such results suggest that reservations will be necessary, at least for certain classes of traffic. Providing programmable platforms within the network greatly increases the ways in which end users may consume ....
M. Crovella and A. Bestavros, "Explaining world wide web traffic self-similarity," Technical Report 95-015, Boston Univ., Aug. 1995.
....the bottleneck link is approximated by FGN. In Section IV, we will compare the estimated value of the Hurst parameter with the theoretical value of $(3 - \alpha)/2$. We note that document size distribution of WWW (World Wide Web) traffic is approximated by the Pareto distribution with $\alpha = 1.12$ [16]. Moreover, by using the Least Squares Estimator [15] we have found that file size distribution of a UNIX operating system [5] is approximated by the Pareto distribution with $\alpha = 0.685$, and its tail distribution (over 2 Mbyte) is approximated by the Pareto distribution with $\alpha = 1.294$. In our ....
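The relation quoted in this excerpt, H = (3 - alpha)/2, links the Pareto shape parameter of the file sizes to the Hurst parameter of the aggregate traffic. A minimal sketch of that arithmetic follows. The function name `hurst_from_alpha` is hypothetical, and restricting the input to 1 < alpha < 2 (the range in which the formula gives 0.5 < H < 1) is an assumption of this sketch, not something the excerpt states.

```python
def hurst_from_alpha(alpha: float) -> float:
    """Theoretical Hurst parameter H = (3 - alpha) / 2.

    Assumption of this sketch: the relation is applied only for
    1 < alpha < 2, where it yields 0.5 < H < 1.
    """
    if not 1.0 < alpha < 2.0:
        raise ValueError("this sketch assumes 1 < alpha < 2")
    return (3.0 - alpha) / 2.0


if __name__ == "__main__":
    # alpha = 1.12 is the WWW document size value quoted in the excerpt.
    print(round(hurst_from_alpha(1.12), 2))
```

Under this relation, a heavier tail (alpha closer to 1) gives H closer to 1, which is to say stronger long-range dependence.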
M. E. Crovella and A. Bestavros, "Explaining World Wide Web Traffic Self-similarity," Tech. Rep. TR-95-015, Boston Univ., October 1995.
....and stabilized around the maximum threshold for a large number of connections. Simulations were also conducted with more realistic traffic by using a large number of TCP connections (2,000 3,500) to transfer random size files with a size distribution derived from measurements of Web transfers [5]. Between file transfers, the TCP connections were idle for a think time also based on the same data (but with the mean reduced by a factor of 10 to generate a heavier load) The only results reported from simulations with these traffic conditions, however, were for buffer occupancy in the RED ....
M. Crovella and A. Bestavros, Explaining World Wide Web Traffic Self-Similarity, TR-95-015, Boston University Computer Science Department, Revised, October 12, 1995.
....lasted over six months. They also used Xmosaic. The obtained traces were used widely. [Cunha 1995] and [Crovella 1997] found that the distributions of transmission times and document sizes versus number of requests were Pareto. The distribution of document popularity follows Zipf's distribution. [Crovella 1995] and [Crovella 1997a] demonstrated that WWW traffic had the nature of self similarity. Many researchers like to study WWW traffic by analyzing proxy traffic traces, because there are more proxy traces available than client traces. But many of these studies focus on improving caching algorithms ....
....limitation of the information in our trace files, we cannot provide parameters such as file sizes, file access frequency and number of sessions per user. But we can still derive something comparable. The distribution of session size is similar to the distribution of document size in [Cunha 1995], [Crovella 1995] and [Barford 1997]. They follow a power law distribution that takes a hyperbolic shape. It can be expressed as $f(x) \sim x^{-a}$. When $a$ is between 0 and 2, the distribution is called heavy tailed. For the file size distribution, [Cunha 1995] gave a = 1.35, [Crovella 1995] presented a = 1.0, Arlitt ....
[Article contains additional citation context not shown here]
Mark E. Crovella, Azer Bestavros, Explaining World Wide Web Traffic Self-Similarity, Department of Computer Science, Boston University, Boston, MA, Technical Report BUCS-TR-95-015, 1995.
....exhibit the phenomena of heavy tailed marginal distributions and long range dependence. Tails can be so heavy that only infinite variance models are possible (e.g., [43]) and sometimes, as in file size data, even first moments are infinite. See [1]. Heavy tails have been fit to file lengths ([1], [9], [10]), cpu time to complete a job, call holding times, inter-arrival times between packets in a network ([39]), lengths of on/off cycles ([43], [42]). Other areas where heavy tails abound are finance and economics ([12], [13], [20], [6], [7]) and insurance analysis ([30], [32]). Of course, long ....
....was originally considered in hydrology in connection with the Hurst phenomenon. See [21], [22], [4], [5]. In telecommunications, long range dependence has been found in video conference data ([3]), packet counts per unit time in ethernet traffic ([27]) and in bytes per unit time in WWW traffic ([9], [10]). Tails of many teletraffic quantities are heavy and there is suspicion that they are getting heavier as WWW users become more demanding. However, this hypothesis is yet to be verified statistically and the data currently collected may be inadequate to ....
[Article contains additional citation context not shown here]
M. Crovella and A. Bestavros. Explaining world wide web traffic self-similarity. Preprint available as TR-95-015 from {crovella,best}@cs.bu.edu, 1995.
....behavior at the bottleneck link can be described as an M/G/1 Processor Sharing queue with multiclass customers. The reason we choose a regularly varying file size distribution is from the findings that the distribution of file sizes and the length of web sessions on the Internet is heavy tailed [6] [7]. In this queue, we are interested in the tail of sojourn time distribution, which represents the time required to successfully transmit a file of a customer in class i. In multiclass M/G/1 Processor Sharing queue, we have the following relationship: Theorem 1 (Zwart [8]) Suppose customers in all ....
Mark Crovella and Azer Bestavros, "Explaining World Wide Web traffic self-similarity," Tech. Rep. TR-95-015, Computer Science Department, Boston University, 1995.
.... of 2 KB, images have an average size of 14 KB Traffic properties [Sedayao 1994; Cunha et al. 1995; Mogul 1995; Arlitt and Williamson 1996] Small images account for the majority of the traffic and document size is inversely related to request frequency Self similarity of HTTP traffic [Crovella and Bestavros 1995; Gribble and Brewer 1997; Abdulla 1998] Bursty, self similar traffic between the micro second and minute time range Periodic nature of HTTP traffic [Bolot and Hoschka 1996; Abdulla et al. 1997c; Gribble and Brewer 1997] Periodic traffic patterns able to be modeled by time series ....
....on caching algorithms were also presented. An application level specific caching analysis using the same data can be found in [Bestavros et al. 1995] 4 The same data set was used to demonstrate and explain the self similar nature of WWW traffic for time ranges between 1 second and 100 seconds [Crovella and Bestavros 1995; Crovella and Bestavros 1996] A clear day night cycle of network demand is noted at the 16.6 minute bin level. The number of bytes per unit time was used as the primary metric to gauge burstiness. Their explanation of self similarity utilizes the Pareto distribution of file sizes, transmission ....
[Article contains additional citation context not shown here]
Crovella, M., and Bestavros, A. (1995). Explaining World Wide Web traffic self-similarity.
....enough $m$, $X \overset{d}{=} m^{1-H} X^{(m)}$, where $X^{(m)} = (X^{(m)}(k) : k \geq 1)$ is the aggregated process of order $m$, given by $X^{(m)}(k) = \frac{1}{m}\,(X_{(k-1)m+1} + \cdots + X_{km})$, $k \geq 1$. The process under consideration is the number of connection arrivals per time unit. In the past [LTWW94, GW94, Ber94, PF95, WTLW95, CB95] various graphical tools, such as variance-time plots, pox plots of R/S, and periodogram plots, and statistical tools, such as periodogram based MLE estimates, have been used to estimate the Hurst parameter. Using these tools on the http connection arrival from the busy hour of the external ....
....in ftp connections. Mogul [Mog95] considered the arrival of world wide web requests at a busy world wide web server. His conclusions include that http request arrivals approximately follow Poisson curves for interarrival times below the mean but have much larger tails. Bestavros and Crovella [CB95, CB97] examine world wide web requests using a different data collection scheme. They point out that durations of http connections follow heavy tailed distributions. They explain this observation by pointing out that file size distributions on Web servers are also heavy tailed. 5 Modeling ....
Mark E. Crovella and Azer Bestavros. Explaining world wide web traffic self-similarity. Technical Report BU-CS-95-015, Boston University, MA, 1995.
....scene recorded 15 World Wide Web traffic A group of researchers from the Boston University (M. Crovella, A. Bestavros and others) published a set of papers in 1995 and 1996 where they report measurement studies done of WWW traffic, and where the main result was that this traffic is selfsimilar [8, 9]. Over a half million Web request traces from about 40 workstations at the Boston University were collected and analyzed. The study was divided into three classes of measurements, namely on total WWW traffic, individual user (source) behavior and transmission times. The authors did also additional ....
....behavior (at a session call level and also during the session) is due to (human) user behavior. It was found, for instance, that the distribution of user requests (think time) and preferences for documents on Internet (WWW) show an extreme degree of fluctuations over a wide range of time scales [8, 9]. Furthermore, different flow control mechanisms existent in diverse traffic sources (e.g. VBR video MPEG coded sources, ABR, TCP) which regulate the output traffic rate depending upon the states of the network, may also contribute to an increased burstiness in the network traffic [21] It is ....
[Article contains additional citation context not shown here]
Crovella M.E. and Bestavros A., Explaining World Wide Web Traffic Self-Similarity, Technical Report: TR-95-015, Computer Science Department, Boston University, 1995.
....origins and effects of the self similarity. Willinger et al. ([57, 58, 59, 77, 83, 84, 85]) discussed self similarity of packet counts per unit time in LANs and WANs, and a parallel discussion of self-similarity of bytes per unit time in WWW traffic was conducted by Crovella et al. ([15, 16, 19, 17]). Crovella, Kim and Park ([18]) conducted a large simulation study to assess the causes and effects of self similarity in situations that involved slowdown nodes, buffers, varying rates and varying tail parameters. Errammilli and Willinger ([25]) used experimental queueing analysis to show why ....
....from the fBm model and the $\gamma$ for the transfer times was smaller. We selected a short period with high traffic. Realistic models should include the variation in the number of logged on workstations. An extensive statistical analysis of these data has been carried out by the authors of the trace ([15]). In particular they present similar estimates for tails and traffic rates and explain the discrepancy with the theory through the low traffic level. From Figure 4.2 one sees that the left tail (near 0) of the inter-arrival times looks like an exponential or Weibull tail while the right tail ....
M. Crovella and A. Bestavros. Explaining world wide web traffic self-similarity. Preprint available as TR-95-015 from {crovella,best}@cs.bu.edu, 1995.
....produced strong indications of longrange dependence and self similarity. Several empirical studies present statistical evidence for existence of these non standard dependence structures. See for example Leland, Taqqu, Willinger and Wilson (1993, 1994) Willinger, Taqqu, Leland and Wilson (1995) Crovella and Bestavros (1995); Cunha, Bestavros and Crovella (1995) Seeking an explanation for the observed long range dependence and self similarity, Willinger, Taqqu, Sherman and Wilson, 1995) have modeled traffic between a single source and destination as an on off or packet train process. In their model, an idealized ....
....and self similarity. Several empirical studies present statistical evidence for existence of these non standard dependence structures. See for example Leland, Taqqu, Willinger and Wilson (1993, 1994) Willinger, Taqqu, Leland and Wilson (1995) Crovella and Bestavros (1995) Cunha, Bestavros and Crovella (1995). Seeking an explanation for the observed long range dependence and self similarity, Willinger, Taqqu, Sherman and Wilson, 1995) have modeled traffic between a single source and destination as an on off or packet train process. In their model, an idealized source alternates between an on state, ....
[Article contains additional citation context not shown here]
Crovella, M. and Bestavros, A., Explaining world wide web traffic self-similarity, Preprint available as TR-95-015 from {crovella,best}@cs.bu.edu (1995).
....origins and effects of the self similarity. Willinger et al. ([32], [24], [25], [35], [26], [36], [37]) discussed self similarity of packet counts per unit time in LANs and WANs, and a parallel discussion of self similarity of bytes per unit time in WWW traffic was conducted by Crovella et al. ([6, 7, 10, 8]). Crovella, Kim and Park ([9]) conducted a large simulation study to assess the causes and effects of self similarity in situations that involved slowdown nodes, buffers, varying rates and varying tail parameters. Errammilli and Willinger ([13]) used experimental queueing analysis to show why ....
....without a finite mean, stationary versions of renewal processes do not exist and (uncontrolled) buffer content stochastic processes would not be stable. Despite the prevalence of this assumption that $1 < \alpha < 2$, it is clear that other assumptions have to be considered. The Boston University study ([6], [7], [11]) suggests self similarity of web traffic stems from heavy tailed file sizes and reports an overall estimate for a five month measurement period (see [11]) of $\alpha = 1.05$. However, there is considerable month to month variation in these estimates and, for instance, the estimate for ....
M. Crovella and A. Bestavros. Explaining world wide web traffic self-similarity. Preprint available as TR-95-015 from {crovella,best}@cs.bu.edu, 1995.
....2.2.3 Handling Bursts and Incremental Growth As the load on the system grows, the SNS Manager must pull in idle nodes, and start new workers to deal with excess load. Moreover, the system must be able to deal with bursts in load. Network traffic has been shown to be bursty at varying time scales [37, 18, 33], and a network service must be able to handle such bursts. We deal with short traffic bursts by replicating workers and directing tasks across all replicated workers for greater throughput. More prolonged bursts, however, can result in stressing the system s resources. For example, after the ....
Crovella, M. E., and Bestavros, A. Explaining world wide web traffic self-similarity. Tech. Rep. TR-95-015, Computer Science Department, Boston University, Oct 1995.
.... fact, since we expect relevant data applications serviced by WLANs to very much resemble today s WWW applications, in our performance analysis we use data traffic models which are very similar to those recently proposed in the literature for these type of applications (i.e. WWW applications) 3] [2]. Since we evaluate the HIPERLAN performances with one class of data traffic we only take into consideration the low user priority. Although data integrity is a key requirement for data transmission, we assume an error free radio channel despite the fact that we are dealing with a very unreliable ....
M. E. Crovella, A. Bestavros, "Explaining World Wide Web Traffic Self-Similarity", Technical Report TR 95-015, Computer Science Dept., Boston University, August 29, 1995.
....upon the theory and analysis techniques presented in these two papers to demonstrate the presence of selfsimilarity in file system traffic. Self similarity in various other types of systems (such as wide area traffic [20] ATM networks [8] variable bit rate video [3] and World Wide Web traffic [5]) has been detected using similar techniques. The analysis of file system performance, access characteristics, and traffic patterns has received considerable attention in the past few years. In [24] the effects of file layout and fragmentation of a disk on file system performance are measured ....
Crovella, M. E., and Bestavros, A. Explaining world wide web traffic self-similarity. Tech. Rep. TR-95-015, Computer Science Department, Boston University, Oct 1995.
....given by (5) and the Pareto cumulative distribution by (6): $F(t) = 1 - e^{-\lambda t}$ (5), $F(t) = 1 - (k/t)^{\alpha}, \; t \geq k$ (6). The parameters are $\lambda$, which is the inverse of the mean inter-arrival time, $k$, the minimum packet size in the Pareto distribution, and $\alpha$, a shape parameter [7, 8]. For $\alpha \leq 1$ the distribution has infinite mean, and for $\alpha \leq 2$ infinite variance. In Figures 3 and 4 the outcome of two simulations are plotted. In Figure 3, the top graphs show the bit throughput represented by a vertical bar for each frame, and a different color nuance of gray for each of 25 users. ....
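The Pareto distribution function labelled (6) in this excerpt can be inverted in closed form, which gives a simple way to draw samples from it. The sketch below is illustrative only: the function names are hypothetical, and inverse-transform sampling is one common way to generate such values, not necessarily the method used in the cited simulations.

```python
import math
import random


def pareto_cdf(t: float, k: float, alpha: float) -> float:
    """Pareto CDF: F(t) = 1 - (k / t)**alpha for t >= k, and 0 below k."""
    if t < k:
        return 0.0
    return 1.0 - (k / t) ** alpha


def pareto_sample(k: float, alpha: float, rng: random.Random) -> float:
    """Inverse-transform sample: solve u = F(t) for t, with u uniform on [0, 1)."""
    u = rng.random()
    return k / (1.0 - u) ** (1.0 / alpha)


def pareto_mean(k: float, alpha: float) -> float:
    """Mean of the Pareto distribution; infinite when alpha <= 1."""
    if alpha <= 1.0:
        return math.inf
    return alpha * k / (alpha - 1.0)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    sizes = [pareto_sample(k=1.0, alpha=1.2, rng=rng) for _ in range(5)]
    print(sizes, pareto_mean(1.0, 1.2))
```

With a shape parameter between 1 and 2 the mean is finite but the variance is not, so sample averages converge slowly; this is the regime usually meant by "heavy tailed" in the excerpts on this page.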
Mark E. Crovella and Azer Bestavros "Explaining World Wide Web Traffic Self-Similarity", Technical Report TR-95-015 - Revised, Computer Science Department, Boston University, 1995
....traffic. Empirical evidence on the existence of self similarity and LRD in traffic measurements can be found in [24, 9, 11] A common explanation for observed LRD and self similarity of network traffic is heavy tailed transmission times. Sometimes, this is due to file lengths being heavy tailed [8, 10, 12, 13, 11, 2] and sometimes due to heavy tailed burst lengths, where a burst is a period where packet arrivals are not separated by more than some threshold value [28] Analysts are largely in agreement about the self similar nature of aggregate traffic, at least at time scales above a certain threshold. ....
M. Crovella and A. Bestavros. Explaining world wide web traffic self-similarity. Preprint available as TR-95-015 from {crovella,best}@cs.bu.edu, 1995.
....the trace pAug, which attains a value of 2.6 at a time scale of 1 sec. On the other hand, the activity of the OctExt trace differs significantly from On Off type behavior. For example, its peak to mean ratio is 16.4 at a time scale 10 sec. Further, the apparent non stationarity 0 See, however, [4] which provides a plausible explanation of why World Wide Web traffic is self similar. 1 These traces, collected in 1989, are available at ftp.bellcore.com under the directory pub wel lan traffic. renders the application of stationary models difficult. We choose the FSNDP model for the ....
....traffic under general conditions exists. Considering that major LAN and WAN applications employing TCP exhibit fractal nature, we suspect that one of the origins of self similarity at the network level lies in the functionalities of TCP such as the slow start algorithm and retransmission. Ref. [4] explains the self similarity of the World Wide Web (WWW) traffic via Cox's M/G/1 type model based on the observation that the file size is heavy tailed. But this approach fails to capture the following two characteristics of the WWW traffic: i) the file size and the TCP connection duration ....
Mark Crovella and Azer Bestavros. Explaining World Wide Web traffic self-similarity. Technical Report TR-95-015, Boston University, CS Dept, August 1995.
....extreme risk that banks, insurance companies, governmental institutions and others are trying to control, hence the theoretical interest in modeling heavy tailed phenomena. Empirical evidence seems to indicate that their presence is almost universal. See, for example, Willinger et al. 1995) and Crovella and Bestavros (1995) for the evidence of heavy tails in communication networks (file sizes, on off times) Resnick (1997) for a discussion and measurement of heavy tails in an insurance context and Mittnik and Rachev (1993) for a description of heavy tails in financial markets. The iid heavy tailed ruin problem was ....
M. Crovella and A. Bestavros (1995): Explaining world wide web traffic self-similarity. Preprint available as TR-95-015 from {crovella,best}@cs.bu.edu.
....volume. This is a reasonable model for some of the most important components of future network traffic, for example many independent multi media World Wide Web type connections multiplexed onto a high capacity link. The sizes of documents accessible by the Web are known to be heavy tailed [7]. As developed in the following, such a storage model has points in common with an M/G/1 model where heavy bursts correspond in some sense to service times with infinite second moments. In the case of the M/G/1 queue [2] such heavy service times generate a power law tail for Q(x) as x grows to ....
M.E. Crovella and A.Bestavros, Explaining World Wide Web traffic self-similarity, Boston University, Computer Science Dpt., Technical report TR-95-015, Oct. 1995.
.... is the structure if compare any of the subsequent plots, nevertheless there are significant changes if difference in time scales is large (see upper and lower plots in Figure 6). The formal ways of estimating self similarity via the Hurst parameter are given in [10] and applied, for example, in [14]. We list below some of the main techniques used to estimate the Hurst parameter. Variance-time plot: As indicated before, the variance of the process $X^m_i$ satisfies $\sigma^2(X^m_i) \sim m^{-\beta} = m^{2H-2}$. If $\sigma^2(X^m_i)$ is plotted against $m$ on a log-log plot, then a straight line with slope $\beta$ ....
....Whittle estimator: The Whittle estimator, contrary to the three previous methods, provides confidence intervals. The method assumes some underlying stochastic self similar model. The most commonly used models are fractional Gaussian noise (FGN) with parameter $0.5 < H < 1$ and fractional ARIMA(p,d,q); see [10, 14, 23]. Below, we give definitions for fractional Gaussian noise (FGN), fractional ARIMA(p,q) and fractional ARIMA(p,d,q). Definition 5 [Fractional Gaussian noise] Let $B(t)$ be the fractional Brownian motion process, then $X_t = B(t) - B(t-1)$ is the increment process of the Fractional ....
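The variance-time method described in this excerpt can be sketched in a few lines: aggregate the series over blocks of size m, compute the variance of each aggregated series, fit a straight line to log(variance) against log(m), and read H off the fitted slope, since variance ~ m^(2H-2) means the slope equals 2H - 2. The code below is a minimal sketch under stated assumptions: the function names are hypothetical, the fit is an ordinary least-squares line, and the choice of block sizes is left to the caller. It is not the procedure of any of the cited papers.

```python
import math
import random
import statistics


def aggregate(x, m):
    """Average the series over non-overlapping blocks of size m."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance_time_hurst(x, block_sizes):
    """Estimate H from the slope of log(variance) versus log(m).

    Assumes variance(X^(m)) ~ m**(2H - 2), so H = 1 + slope / 2.
    """
    points = []
    for m in block_sizes:
        agg = aggregate(x, m)
        if len(agg) < 2:
            continue  # too few blocks to estimate a variance
        var = statistics.pvariance(agg)
        if var > 0.0:
            points.append((math.log(m), math.log(var)))
    if len(points) < 2:
        raise ValueError("need at least two usable block sizes")
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    # Independent noise has no long-range dependence, so H should be near 0.5.
    print(variance_time_hurst(noise, [1, 2, 4, 8, 16, 32, 64]))
```

An estimate well above 0.5 on real traffic counts would point toward long-range dependence, although the variance-time plot is a graphical heuristic and gives no confidence interval.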
[Article contains additional citation context not shown here]
Crovella, M. E. and Bestavros, A. Explaining World Wide Web Traffic Self-Similarity. Boston University, Technical Report TR-95-015, October, 1995.
....NASA ClarkNet NCSA Figure 4.1: Distribution of File Sizes, by Server Figure 4.1 shows that the distribution of filesizes is heavy tailed. A distribution is deemed to be heavy tailed if (regardless of the behaviour for small values) the asymptotic shape of the distribution is hyperbolic [18, 33, 55]. The simplest heavy tailed distribution is the Pareto distribution. This distribution was originally applied to socioeconomic applications, such as the distribution of income [33] More recently, this distribution has been used to model the distribution of file sizes [18] and FTPDATA bursts [54] ....
....is hyperbolic [18, 33, 55]. The simplest heavy tailed distribution is the Pareto distribution. This distribution was originally applied to socioeconomic applications, such as the distribution of income [33]. More recently, this distribution has been used to model the distribution of file sizes [18] and FTPDATA bursts [54]. The Pareto distribution has the probability density function: $p(x) = \alpha k^{\alpha} x^{-\alpha-1}, \quad \alpha, k > 0, \; x \geq k$ (4.1). Table 4.6: Maximum Likelihood Estimates of $\alpha$ (All Data Sets). Item: Waterloo, Calgary, Saskatchewan, NASA, ClarkNet, NCSA; $\alpha$: 0.40, 0.49, 0.54, 0.45, 0.63, ....
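For the Pareto density in (4.1), the maximum likelihood estimate of the shape parameter has a closed form when the location k is known: alpha_hat = n / sum(ln(x_i / k)). The sketch below illustrates that textbook estimator. The function name is hypothetical, and falling back to the sample minimum when k is not supplied is an assumption of this sketch; the excerpt does not say how the tabulated estimates were computed.

```python
import math


def pareto_mle_alpha(samples, k=None):
    """Closed-form MLE of the Pareto shape: n / sum(ln(x_i / k)).

    Assumption of this sketch: if k is not given, the sample minimum
    is used as the location parameter.
    """
    xs = [float(x) for x in samples]
    if not xs:
        raise ValueError("need at least one sample")
    if k is None:
        k = min(xs)
    if k <= 0.0 or any(x < k for x in xs):
        raise ValueError("require k > 0 and every sample >= k")
    log_sum = sum(math.log(x / k) for x in xs)
    if log_sum == 0.0:
        raise ValueError("all samples equal k; the estimate is undefined")
    return len(xs) / log_sum


if __name__ == "__main__":
    # Hypothetical file sizes in bytes, for illustration only.
    sizes = [1200, 1500, 3000, 9000, 250000]
    print(pareto_mle_alpha(sizes, k=1000))
```

Estimates below 1, like those in Table 4.6, correspond to a fitted distribution with infinite mean.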
M. Crovella and A. Bestavros, "Explaining World Wide Web Traffic Self-Similarity", Proceedings of the 1996 SIGMETRICS Conference on the Measurement and Modeling of Computer Systems, Philadelphia, PA, pp. 160-169, May 23-26, 1996.
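The Pareto density quoted in the context above, together with the maximum likelihood estimates of the shape parameter reported there, can be illustrated with a short sketch. It assumes the location parameter k is known, in which case the textbook maximum likelihood estimate of the shape is n / sum(ln(x_i / k)). The function names and the synthetic data are hypothetical; this is not the procedure used in the citing work, whose details are not shown in the excerpt.

```python
import math
import random


def pareto_pdf(x, alpha, k):
    """Pareto density: alpha * k^alpha * x^(-alpha-1) for x >= k, else 0."""
    if x < k:
        return 0.0
    return alpha * k ** alpha * x ** (-alpha - 1.0)


def pareto_sample(alpha, k, rng):
    """Draw one Pareto variate by inverting the CDF F(x) = 1 - (k/x)^alpha."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return k * u ** (-1.0 / alpha)


def pareto_alpha_mle(samples, k):
    """Maximum likelihood estimate of alpha for a known location parameter k.

    Only observations at or above k are used, so the function can be
    applied to the tail of a data set.
    """
    tail = [x for x in samples if x >= k]
    return len(tail) / sum(math.log(x / k) for x in tail)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "file sizes" with a tail starting at 1024 bytes (assumed values).
    sizes = [pareto_sample(0.5, 1024.0, rng) for _ in range(20000)]
    print("estimated alpha:", pareto_alpha_mle(sizes, 1024.0))
```

With a shape parameter below 1, as in this synthetic example, the distribution has no finite mean, yet the estimator above remains well behaved because it works on the logarithms of the observations.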
....are used by workers to make local decisions regarding their choice of consumers for tasks. As the load on the system grows, the SNS Manager pulls in idle nodes, and starts new workers. In addition, the system must be able to deal with sudden bursts in load (Leland, Taqqu, Willinger & Wilson 1994, Crovella & Bestavros 1995, Gribble, Manku, Roselli, Brewer, Gibson & Miller 1998). We deal with short traffic bursts by replicating workers and directing tasks across all replicated workers for greater throughput. To handle more prolonged bursts, our design supports the notion of a pool of overflow nodes that are not ....
Crovella, M. E. & Bestavros, A. (1995), Explaining world wide web traffic self-similarity, Technical Report TR-95-015, Computer Science Department, Boston University.
....no large bursts of activity are present. At the scale of tens of seconds, very pronounced bursts of activity can be seen; peak-to-average ratios of more than 5:1 are common. Many studies have explored the self-similarity of network traffic ([4], [16], [21], [22], [24], [30]), including web traffic [9]. Self-similarity implies burstiness at all timescales; this property is not compatible with our observations. One indicator of self-similarity is a heavy-tailed interarrival process. As Figure 5 clearly shows, the interarrival time of GIF requests seen within the traces is exponentially ....
....requests in the traces. Many papers have been written on the topic of web server and client trace analysis. In [32], removal policies for network caches of WWW documents are explored, based in part on simulations driven by traces gathered from the Computer Science department of Virginia Tech. In [9], WWW traffic self-similarity is demonstrated and in part explained through analysis of the Boston University web client traces. In [25], a series of proxy cache experiments are run on a sophisticated proxy cache simulation environment called SPA (Squid Proxy Analysis) using the DEC SQUID proxy ....
Mark E. Crovella and Azer Bestavros. Explaining world wide web traffic self-similarity. Technical Report TR-95-015, Computer Science Department, Boston University, Oct 1995.
.... time series of file lengths, CPU times to complete jobs, call holding times, times between terminal transmissions, inter-arrival times between packets in a network and lengths of on-off cycles (Duffy, et al. 1993, 1994; Meier-Hellstern et al., 1991; Willinger, Taqqu, Sherman and Wilson, 1995; Crovella and Bestavros, 1995; Cunha, Bestavros and Crovella, 1995). The preliminary analysis which confirms the presence of heavy tails is based on results that show that using estimators such as the Hill estimator (see Hill, 1975; Mason, 1982) originally designed for independent and identically distributed (iid) ....
.... jobs, call holding times, times between terminal transmissions, inter-arrival times between packets in a network and lengths of on-off cycles (Duffy, et al. 1993, 1994; Meier-Hellstern et al., 1991; Willinger, Taqqu, Sherman and Wilson, 1995; Crovella and Bestavros, 1995; Cunha, Bestavros and Crovella, 1995). The preliminary analysis which confirms the presence of heavy tails is based on results that show that using estimators such as the Hill estimator (see Hill, 1975; Mason, 1982) originally designed for independent and identically distributed (iid) observations is legitimate for stationary ....
Crovella, M. and Bestavros, A., Explaining world wide web traffic self-similarity, Preprint available as TR-95-015 from {crovella,best}@cs.bu.edu (1995).
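The contexts above mention the Hill estimator as a tool for confirming heavy tails. A minimal sketch of the classical estimator follows: it averages the log-ratios of the k largest observations to the (k+1)-th largest and inverts the result to obtain a tail index. The function names, the synthetic Pareto data, and the choice of k are assumptions made for illustration; the citing paper's own analysis of dependent data is not reproduced here.

```python
import math
import random


def hill_estimator(samples, k):
    """Hill estimate of the tail index alpha from the k largest observations.

    With order statistics X(1) >= X(2) >= ... >= X(n), the estimate is
    1 / ((1/k) * sum_{i=1..k} ln(X(i) / X(k+1))). Samples must be positive.
    """
    ordered = sorted(samples, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < number of samples")
    threshold = ordered[k]  # the (k+1)-th largest observation
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


def draw_pareto(alpha, scale, rng):
    """Hypothetical helper: one Pareto variate via inverse-transform sampling."""
    return scale * (1.0 - rng.random()) ** (-1.0 / alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [draw_pareto(1.5, 1.0, rng) for _ in range(20000)]
    # In practice the estimate is plotted against k and read off where it is stable.
    for k in (500, 1000, 2000):
        print(k, round(hill_estimator(data, k), 3))
```

The estimate depends on how many upper order statistics are used, which is why a range of k values is usually inspected rather than a single one.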
.... A well-known example is Ethernet, in which all layers (with the possible exception of the physical layer) contribute to traffic randomness, giving rise to the self-similarity. It ranges from collision to flow control to error control (i.e. retransmission of TCP), even including human behavior and file size [3], all of which make self-similarity visible over an extremely wide range of time scales. Based on the observation that the Internet traffic [3] is well explained by the self-similarity, one might expect that future B-ISDN will also exhibit self-similarity. However, we note one critical difference ....
....to the self-similarity. It ranges from collision to flow control to error control (i.e. retransmission of TCP), even including human behavior and file size [3], all of which make self-similarity visible over an extremely wide range of time scales. Based on the observation that the Internet traffic [3] is well explained by the self-similarity, one might expect that future B-ISDN will also exhibit self-similarity. However, we note one critical difference between the Internet and the B-ISDN: unlike the Internet, future B-ISDN requires scheduling at the end systems and the intermediate switch nodes ....
Mark Crovella and Azer Bestavros. Explaining world wide web traffic self-similarity. Technical Report TR-95-015, Boston University, CS Dept, August 1995.
....are how to reduce the volume of network traffic produced by Web clients and servers, and how to improve the response time for WWW users. Fundamental to the goal of improving Web performance is a solid understanding of WWW workloads. While there are several studies reported in the literature [3, 4, 6, 7, 12], most studies present data from only one measurement site, making it difficult to generalize results to other sites. Furthermore, most studies focus on characterizing Web clients, rather than Web servers. The purpose of this paper is to present a detailed workload characterization study of ....
....100,000 bytes. This distribution is consistent with the file size distribution reported by Braun and Claffy [4]. A more rigorous study shows that the observed file size distributions match well with the Pareto distribution [11, 17] for $\alpha$ 1. This observation has been noted in the literature [6, 16], and is confirmed in all six of our data sets. In particular, the tails of the distributions (for file sizes larger than 1024 bytes) are Pareto with $0.40 \le \alpha \le 0.63$. This characteristic is present in all six data sets, and is thus added to Table 1. Table 5: Breakdown of Document Types and Sizes ....
M. Crovella and A. Bestavros, "Explaining World Wide Web Traffic Self-Similarity", Proceedings of the 1996 ACM SIGMETRICS Conference, Philadelphia, PA, May 1996.
....involves sets with invariant probability distributions under scaling. Objects are said to be self-similar in a statistical sense when parts of the whole fit the whole in distribution, rather than being exact copies. Hyperbolic distributions are shown to satisfy the requirements of self-similarity [CrBe95, VMHK83]. Fractals and self-similarity are intimately related. Traditional performance models based on Markov characteristics have been extensively used to analyze computer systems [Klei76, MeAD94]. Poisson distributions have been successfully used to construct performance models of computer systems ....
....the fractal-like behavior of Ethernet LANs. The authors point out that analytical results show a clear distinction between predicted performance of certain queueing models with traditional Poisson streams and the same queueing models with self-similar inputs. Recent work by Crovella and Bestavros [CrBe95] shows evidence that World Wide Web traffic may be self-similar. The authors also explain that the self-similarity in wide area network traffic stems from factors such as the underlying distribution of WWW document sizes and user think times. In this paper, we look at a different aspect ....
Crovella, M. E. and Bestavros, A., "Explaining World Wide Web Traffic Self-Similarity", Technical Report TR-95-015, Computer Science Department, Boston University, Boston, October 1995.
....scalability. 2.2.3 Prolonged Bursts and Incremental Growth. Although we would like to assume that there is a well-defined average load and that arriving traffic follows a Poisson distribution, burstiness has been demonstrated for Ethernet traffic [33], file system traffic [26], and Web requests [16], and is confirmed by our traces of web traffic (discussed later). In addition, Internet services can experience relatively rare but prolonged bursts of high load: after the recent landing of Pathfinder on Mars, its web site served over 220 million hits in a 4 day period [42]. Often, it is during ....
....in the trace file. We thus had fine-grained control over both the amount and nature of the load offered to our implementation during our experimentation. 4.2 Burstiness. Burstiness is a fundamental property of a great variety of computing systems, and can be observed across all time scales [16,26,33]. Our HTTP traces show that the offered load to our implementation will contain bursts. Figure 6 shows the request rate observed from the user base across a 24 hour, 3.5 hour, and 3.5 minute time interval. The 24 hour interval exhibits a strong 24 hour cycle that is overlaid with shorter ....
M.E. Crovella and A. Bestavros. Explaining World Wide Web Traffic Self-Similarity. Tech Rep. TR-95-015, Computer Science Department, Boston University, October 1995.
....no connection has been made between user behavior (e.g. connection arrival process) and the fractal nature of Ethernet traffic. As will be shown later, user behavior plays an important role in determining the factors contributing to the interaction-induced fractal phenomenon. Crovella and Bestavros [4] suggest Cox's immigration-death model (also known as the M/G/∞ model) as a means of explaining the self-similarity of World Wide Web (WWW) traffic. They found that the distribution of file sizes at a web server obeys a heavy-tailed form. Assuming that the request arrival process to this web server ....
Mark Crovella and Azer Bestavros. Explaining World Wide Web traffic self-similarity. Technical Report TR-95-015, Boston University, CS Dept, August 1995.
....of heavy-tailed marginal distributions. Examples include file lengths, CPU time to complete a job, call holding times, inter-arrival times between packets in a network and lengths of on-off cycles (Duffy, et al. 1993, 1994; Meier-Hellstern et al., 1991; Willinger, Taqqu, Sherman and Wilson, 1995; Crovella and Bestavros, 1995; Cunha, Bestavros and Crovella, 1995). A key question of course is how to fit models to data which require heavy-tailed marginal distributions. In the traditional setting of a stationary time series with finite variance, every purely non-deterministic process can be expressed as a linear process ....
.... include file lengths, CPU time to complete a job, call holding times, inter-arrival times between packets in a network and lengths of on-off cycles (Duffy, et al. 1993, 1994; Meier-Hellstern et al., 1991; Willinger, Taqqu, Sherman and Wilson, 1995; Crovella and Bestavros, 1995; Cunha, Bestavros and Crovella, 1995). A key question of course is how to fit models to data which require heavy-tailed marginal distributions. In the traditional setting of a stationary time series with finite variance, every purely non-deterministic process can be expressed as a linear process driven by an uncorrelated input ....
Crovella, M. and Bestavros, A., Explaining world wide web traffic self-similarity, Preprint available as TR-95-015 from {crovella,best}@cs.bu.edu (1995).
....no large bursts of activity are present. At the scale of tens of seconds, very pronounced bursts of activity can be seen; peak-to-average ratios of more than 5:1 are common. Many studies have explored the self-similarity of network traffic ([4], [16], [21], [22], [24], [31]), including web traffic [9]. Self-similarity implies burstiness at all timescales, which is not compatible with our observations. One indicator of self-similarity is a heavy-tailed interarrival process. As Figure 6 clearly shows, the interarrival time of GIF requests seen within the traces is exponentially distributed, and ....
....requests in the traces. Many papers have been written on the topic of web server and client trace analysis. In [34], removal policies for network caches of WWW documents are explored, based in part on simulations driven by traces gathered from the Computer Science department of Virginia Tech. In [9], WWW traffic self-similarity is demonstrated and in part explained through analysis of the Boston University web client traces. In [25], a series of proxy cache experiments are run on a sophisticated proxy cache simulation environment called SPA (Squid Proxy Analysis) using the DEC SQUID ....
Mark E. Crovella and Azer Bestavros. Explaining world wide web traffic self-similarity. Technical Report TR-95-015, Computer Science Department, Boston University, Oct 1995.
....was a linearly increasing function of the number of simultaneous operations, with a slope approximately proportional to the size of the original GIF (in bytes). Because N distillers shared the workstation's CPU equally, each distillation operation took N times as long to complete. Recent work [32,11] strongly suggests that access to WWW documents is bursty. For example, an access to a new page causes a flurry of distillation operations. Since a user tends to digest the document for a period of time before moving on, there are variable-length periods of inactivity between distillations ....
M. E. Crovella and A. Bestavros. Explaining World Wide Web Traffic Self-Similarity. Boston University Technical Report TR-95-015.
....the statistics we could come up with. 2.3 Log Statistics. In this section we report some simple descriptive statistics characterizing the data collected. The data reported here is just a small sample of the possibilities that our data permits. For a more complex analysis using this same data see [32, 30]. Although the data collection phase lasted until May 8, 1995, the results described here are based on the traces accumulated over a period starting November 21, 1994, and ending February 28, 1995. During this period, the instrumented version of Mosaic was on all the time 6 , whereby a ....
....two from an academic and one from a commercial environment, searching for common characteristics that servers share. They came up with a list of 10 items that all servers analyzed presented. Among these items, it is interesting to note the confirmation of certain points claimed in other studies [32, 14, 30], for example, that small files are responsible for most of the transfers, and that few files are responsible for most of the bytes transfer and requests received. 2.6 Summary In this Chapter, the trace collection procedure at clients was presented. The modifications made to Mosaic and the types ....
Mark Crovella and Azer Bestavros. Explaining World Wide Web Traffic Self-Similarity. Technical Report TR-95-015, Computer Science Department, Boston University, 111 Cummington St, Boston, MA 02215, August 1995.
....in this study is analyzed in [CB96] and is shown to exhibit characteristics consistent with self-similarity. While transmission times correspond to ON times, the size distribution of OFF times (corresponding to times when the browser is not actively transferring a file) is also important. [CB95] contains further analyses of the data in this paper showing that silent times appear to exhibit heavy-tailed characteristics with $\alpha$ approximately in the range of 1.5. Thus, since the transmission time distribution appears to be heavier tailed than the silent time distribution, it seems more ....
Mark E. Crovella and Azer Bestavros. Explaining World Wide Web traffic self-similarity. Technical Report TR-95-015 (Revised), Boston University Department of Computer Science, October 1995.
.... The fundamental contribution of our work to date is two-fold: 1) We have shown that WWW traffic shows strong indications of self-similarity; and 2) we have shown that self-similarity in general may arise from a surprising direction: the distribution of file sizes being transferred over the network [25, 26, 65]. This latter conclusion draws a causal connection between the heavy-tailed distribution of WWW file sizes and the self-similarity of WWW network traffic. This is interesting because heavy-tailed distributions of information chunks are common in information science (e.g. the distribution of ....
Mark Crovella and Azer Bestavros. Explaining world wide web traffic self-similarity. Technical Report TR-95-015, Boston University, CS Dept, Boston, MA 02215, August 1995.
....in this study is analyzed in [CB96] and is shown to exhibit characteristics consistent with self-similarity. While transmission times correspond to ON times, the size distribution of OFF times (corresponding to times when the browser is not actively transferring a file) is also important. In [CB95], analyses similar to those in this paper are presented, showing that silent times appear to exhibit heavy-tailed characteristics with $\alpha$ approximately in the range of 1.5. Since the transmission time distribution appears to be heavier tailed than the silent time distribution, it seems that the ....
Mark E. Crovella and Azer Bestavros. Explaining World Wide Web traffic self-similarity. Technical Report TR-95-015 (Revised), Boston University Department of Computer Science, October 1995.
....[Figure 9: LLCD of File Sizes of 32 Web Sites. Axes: Log10(P[X > x]) versus Log10(File Size in Bytes). Series: All Files, Image Files, Audio Files, Video Files, Text Files] ....document is accessed and the size of the document. In [7], we showed that there is an inverse correlation between file size and file reuse. This relationship suggests that systems that perform caching on WWW objects will tend to increase the tail weight of the data traffic resulting from misses in the cache as compared to the traffic without caching. To ....
....In subsection 5.2, we attributed the self-similarity of Web traffic to the superimposition of heavy-tailed ON/OFF processes, where the ON times correspond to the transmission durations of individual Web files and OFF times correspond to periods when a workstation is not receiving Web data. In [7], we present analyses similar to those in this paper showing that OFF times exhibit two regimes. The important regime is determined by user behavior and appears to exhibit heavy-tailed characteristics with $\alpha$ approximately 1.5. Comparing the distributions of ON and OFF times, we find that the ON ....
Mark E. Crovella and Azer Bestavros. Explaining world wide web traffic self-similarity. Technical Report TR-95-015 (Revised), Boston University Department of Computer Science, October 1995.
No context found.
Mark Crovella and Azer Bestavros (1995). "Explaining World Wide Web Traffic Self-Similarity." Technical Report BUCS-TR-95-015. http://citeseer.nj.nec.com/crovella95explaining.html
No context found.
M. E. Crovella and A. Bestavros, "Explaining World Wide Web Traffic Self-Similarity," Computer Science Dept., Boston University, Technical Report TR-95-015, August 29 1995.
No context found.
Crovella, M. E. and Bestavros, A. (1995). Explaining World Wide Web Traffic Self-Similarity. Technical Report TR-95-015, Computer Science Department, Boston University, 111 Cummington St, Boston, MA 02215.
No context found.
M. Crovella, A. Bestavros. "Explaining World Wide Web Traffic Self-Similarity", Technical Report TR-95-015, Boston: Boston University, 1995