Results 1 - 10
of
58
Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes
, 1996
"... Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to the self-similarity of network traffic. We present a hypothesized explanation for the possible self-similarity of traffic by using a p ..."
Abstract
-
Cited by 1023 (22 self)
- Add to MetaCart
Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to the self-similarity of network traffic. We present a hypothesized explanation for the possible self-similarity of traffic by using a particular subset of wide area traffic: traffic due to the World Wide Web (WWW). Using an extensive set of traces of actual user executions of NCSA Mosaic, reflecting over half a million requests for WWW documents, we examine the dependence structure of WWW traffic. While our measurements are not conclusive, we show evidence that WWW traffic exhibits behavior that is consistent with self-similar traffic models. Then we show that the self-similarity insuch traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in le transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network. To do this we rely on empirically measured distributions both from our traces and from data independently collected at over thirty WWW sites.
The Design and Implementation of a Log-Structured File System
- ACM Transactions on Computer Systems
, 1992
"... This paper presents a new technique for disk storage management called a log-structured file system. A logstructured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it ..."
Abstract
-
Cited by 808 (6 self)
- Add to MetaCart
This paper presents a new technique for disk storage management called a log-structured file system. A logstructured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains indexing information so that files can be read back from the log efficiently. In order to maintain large free areas on disk for fast writing, we divide the log into segments and use a segment cleaner to compress the live information from heavily fragmented segments. We present a series of simulations that demonstrate the efficiency of a simple cleaning policy based on cost and benefit. We have implemented a prototype logstructured file system called Sprite LFS; it outperforms current Unix file systems by an order of magnitude for small-file writes while matching or exceeding Unix performance for reads and large writes. Even when the overhead for cleaning is included, Sprite LFS can use 70 % of the disk bandwidth for writing, whereas Unix file systems typically can use only 5-10%. 1.
A Trace-Driven Analysis of the UNIX 4.2 BSD File System
, 1985
"... We analyzed the UNIX 4.2 BSD file system by recording userlevel activity in trace files and writing programs to analyze the traces. The tracer did not record individual read and write operations, yet still provided tight bounds on what information was accessed and when. The trace analysis shows that ..."
Abstract
-
Cited by 245 (5 self)
- Add to MetaCart
We analyzed the UNIX 4.2 BSD file system by recording userlevel activity in trace files and writing programs to analyze the traces. The tracer did not record individual read and write operations, yet still provided tight bounds on what information was accessed and when. The trace analysis shows that the average file system bandwidth needed per user is low (a few hundred bytes per second). Most of the files accessed are open only a short time and are accessed sequentially. Most new information is deleted or overwritten within a few minutes of its creation. We also wrote a simulator that uses the traces to predict the performance of caches for disk blocks. The moderate-sized caches used in UNIX reduce disk traffic for file blocks by about 50%, but larger caches (several megabytes) can eliminate 90% or more of all disk traffic. With those large caches, large block sizes (16 kbytes or more) result in the fewest disk accesses. Trace-Driven Analysis of 4.2 BSD File System January 2, 1993 1...
UNIX Disk Access Patterns
, 1993
"... Disk access patterns are becoming ever more important to understand as the gap between processor and disk performance increases. The study presented here is a detailed characterization of every lowlevel disk access generated by three quite different systems over a two month period. The contributions ..."
Abstract
-
Cited by 242 (20 self)
- Add to MetaCart
Disk access patterns are becoming ever more important to understand as the gap between processor and disk performance increases. The study presented here is a detailed characterization of every lowlevel disk access generated by three quite different systems over a two month period. The contributions of this paper are the detailed information we provide about the disk accesses on these systems (many of our results are significantly different from those reported in the literature, which provide summary data only for file-level access on small-memory systems); and the analysis of a set of optimizations that could be applied at the disk level to improve performance. Our traces show that the majority of all operations are writes; disk accesses are rarely sequential; 25-- 50% of all accesses are asynchronous; only 13--41% of accesses are to user data (the rest result from swapping, metadata, and program execution); and I/O activity is very bursty: mean request queue lengths seen by an incoming request range from 1.7 to 8.9 (1.2--1.9 for reads, 2.0--14.8 for writes), while we saw 95th percentile queue lengths as large as 89 entries, and maxima of over 1000. Using a simulator to analyze the effect of write caching at the disk level, we found that using a small non-volatile cache at each disk allowed writes to be serviced considerably faster than with a regular disk. In particular, short bursts of writes go much faster -- and such bursts are common: writes rarely come singly. Adding even 8KB of non-volatile memory per disk could reduce disk traffic by 10-- 18%, and 90% of metadata write traffic can be absorbed with as little as 0.2MB per disk of nonvolatile RAM. Even 128KB of NVRAM cache in each disk can improve write performance by as much as a factor of three. FCFS scheduling...
On the Relationship Between File Sizes, Transport Protocols, and Self-Similar Network Traffic
- In Proc. IEEE International Conference on Network Protocols
, 1996
"... Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits variability at a wide range of scales. In this paper, we examine a mechanism that gives rise to self-similar network traffic and present some of its performance implications. The mechanism we study is th ..."
Abstract
-
Cited by 193 (21 self)
- Add to MetaCart
Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits variability at a wide range of scales. In this paper, we examine a mechanism that gives rise to self-similar network traffic and present some of its performance implications. The mechanism we study is the transfer of files or messages whose size is drawn from a heavy-tailed distribution. First, we show that in a “realistic ” client/server network environment—i.e., one with bounded resources and coupling among traffic sources competing for resources—the degree to which file sizes are heavy-tailed can directly determine the degree of traffic self-similarity at the link level. We show that this causal relationship is robust with respect to changes in network resources (bottleneck bandwidth and
Beating the I/O Bottleneck: A Case for Log-Structured File Systems
- Operating Systems Review
, 1988
"... CPU speeds are improving at a dramatic rate, while disk speeds are not. This technology shift suggests that many engineering and office applications may become so I/O-limited that they cannot benefit from further CPU improvements. This paper discusses several techniques for improving I/O performance ..."
Abstract
-
Cited by 128 (2 self)
- Add to MetaCart
CPU speeds are improving at a dramatic rate, while disk speeds are not. This technology shift suggests that many engineering and office applications may become so I/O-limited that they cannot benefit from further CPU improvements. This paper discusses several techniques for improving I/O performance, including caches, battery-backed-up caches, and cache logging. We then examine in particular detail an approach called log-structured file systems, where the file system's only representation on disk is in the form of an append-only log. Log-structured file systems potentially provide order-of-magnitude improvements in write performance. When log-structured file systems are combined with arrays of small disks (which provide high bandwidth) and large main-memory file caches (which satisfy most read accesses), we believe it will be possible to achieve 1000-fold improvements in I/O performance over today's systems. ############################# The work described here was supported in part b...
Heavy-Tailed Probability Distributions in the World Wide Web
- IN A PRACTICAL GUIDE TO HEAVY TAILS: STATISTICAL TECHNIQUES AND APPLICATIONS
, 1998
"... The explosion of the World Wide Web as a medium for information dissemination has made it important to understand its characteristics, in particular the distribution of its file sizes. This paper presents evidence that a number of file size distributions in the Web exhibit heavy tails, including ..."
Abstract
-
Cited by 117 (10 self)
- Add to MetaCart
The explosion of the World Wide Web as a medium for information dissemination has made it important to understand its characteristics, in particular the distribution of its file sizes. This paper presents evidence that a number of file size distributions in the Web exhibit heavy tails, including files requested by users, files transmitted through the network, transmission durations of files, and files stored on servers. In addition, we argue that because of the presence of caching in the Web, the size distribution of transmitted files is primarily determined by the distribution of files available in the Web, and is relatively insensitive to the distribution of files requested by users. Finally, we discuss some of the implications of heavy-tailed transmission durations and relate these results to selfsimilarity in network traffic.
File System Usage in Windows NT 4.0
- ACM Symposium on Operating System Principles (Kiawah Island Resort
, 1999
"... We have performed a study of the usage of the Windows NT File System through long-term kernel tracing. Our goal was to provide a new data point with respect to the 1985 and 1991 trace-based File System studies, to investigate the usage details of the Windows NT file system architecture, and to study ..."
Abstract
-
Cited by 106 (0 self)
- Add to MetaCart
We have performed a study of the usage of the Windows NT File System through long-term kernel tracing. Our goal was to provide a new data point with respect to the 1985 and 1991 trace-based File System studies, to investigate the usage details of the Windows NT file system architecture, and to study the overall statistical behavior of the usage data. In this paper we report on these issues through a detailed comparison with the older traces, through details on the operational characteristics and through a usage analysis of the file system and cache manager. Next to architectural insights we provide evidence for the pervasive presence of heavy-tail distribution characteristics in all aspect of file system usage. Extreme variances are found in session inter-arrival time, session holding times, read/write frequencies, read/write buffer sizes, etc., which is of importance to system engineering, tuning and benchmarking.
On the Effect of Traffic Self-similarity on Network Performance
, 1997
"... Recent measurements of network traffic have shown that self-similarity is an ubiquitous phenomenon present in both local area and wide area traffic traces. In previous work, we have shown a simple, robust application layer causal mechanism of traffic self-similarity, namely, the transfer of files i ..."
Abstract
-
Cited by 83 (9 self)
- Add to MetaCart
Recent measurements of network traffic have shown that self-similarity is an ubiquitous phenomenon present in both local area and wide area traffic traces. In previous work, we have shown a simple, robust application layer causal mechanism of traffic self-similarity, namely, the transfer of files in a network system where the file size distributions are heavy-tailed. In this paper, we study the effect of scale-invariant burstiness on network performance when the functionality of the transport layer and the nteraction of traffic sources sharing bounded network resources is incorporated. First, we show that transport layer mechanisms are important factors in translating the application layer causality into link traffic self-similarity. Network performance as captured by throughput, packet loss rate, and packet retransmission rate degrades gradually with increased heavy-tailedness while queueing delay, response time, and fairness deteriorate more drastically. The degree to which heavy-tailedness affects self-similarity is determined by how well congestion control is able to shape a source traffic into an on-average constant output stream while conserving information. Second, we show that increasing network resources such as link bandwidth and buffer capacity results in a superlinear improvement in performance. When large file transfers occur with nonnegligible probability, the incremental
Long Term Distributed File Reference Tracing: Implementation and Experience
, 1994
"... DFSTrace is a system to collect and analyze long-term file reference data in a distributed UNIX workstation environment. The design of DFSTrace is unique in that it pays particular attention to efficiency, extensibility, and the logistics of long-term trace data collection in a distributed environme ..."
Abstract
-
Cited by 82 (3 self)
- Add to MetaCart
DFSTrace is a system to collect and analyze long-term file reference data in a distributed UNIX workstation environment. The design of DFSTrace is unique in that it pays particular attention to efficiency, extensibility, and the logistics of long-term trace data collection in a distributed environment. The components of DFSTrace are a set of kernel hooks, a kernel buffer mechanism, a data extraction agent, a set of collection servers, and post-processing tools. Our experience with DFSTrace has been highly positive. Tracing has been virtually unnoticeable, degrading performance 3-7%, depending on the level of detail of tracing. We have collected file reference traces from approximately 30 workstations continuously for over two years. We have implemented a post-processing library to provide a convenient programmer interface to the traces, and have created an on-line database of results from a suite of analysis programs to aid trace selection. Our data has been used for a wide variety of purposes, including file system studies, performance measurement and tuning, and debugging. Extensions of DFSTrace have enabled its use in applications such as field reliability testing and determining disk geometry. This paper presents the design, implementation, and evaluation of DFSTrace and associated tools, and describes how they have been used.

