Results 1 - 10 of 565
The Design and Implementation of a Log-Structured File System
ACM Transactions on Computer Systems, 1992
"... This paper presents a new technique for disk storage management called a log-structured file system. A logstructured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it ..."
Abstract
-
Cited by 1092 (8 self)
- Add to MetaCart
(Show Context)
This paper presents a new technique for disk storage management called a log-structured file system. A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains indexing information so that files can be read back from the log efficiently. In order to maintain large free areas on disk for fast writing, we divide the log into segments and use a segment cleaner to compress the live information from heavily fragmented segments. We present a series of simulations that demonstrate the efficiency of a simple cleaning policy based on cost and benefit. We have implemented a prototype log-structured file system called Sprite LFS; it outperforms current Unix file systems by an order of magnitude for small-file writes while matching or exceeding Unix performance for reads and large writes. Even when the overhead for cleaning is included, Sprite LFS can use 70% of the disk bandwidth for writing, whereas Unix file systems typically can use only 5-10%.
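The cost-and-benefit cleaning policy the abstract mentions ranks segments by how much free space cleaning them recovers per unit of work. A minimal sketch of that idea follows, assuming the benefit-to-cost form (1 - u) * age / (1 + u) described in the paper, with u the segment's live-byte fraction; this is an illustration, not Sprite LFS code.

    # Sketch of LFS-style cost-benefit segment cleaning (illustrative).
    from dataclasses import dataclass

    @dataclass
    class Segment:
        seg_id: int
        utilization: float   # fraction of bytes still live, 0.0..1.0
        age: float           # time since the youngest block was written

    def cost_benefit(seg: Segment) -> float:
        # Reading the segment and rewriting its live data costs 1 + u;
        # cleaning frees (1 - u) of the segment, weighted by data age.
        return (1.0 - seg.utilization) * seg.age / (1.0 + seg.utilization)

    def pick_segments_to_clean(segments, n):
        # Clean the n segments with the highest benefit-to-cost ratio.
        return sorted(segments, key=cost_benefit, reverse=True)[:n]

    segs = [Segment(0, 0.9, 100.0), Segment(1, 0.2, 10.0), Segment(2, 0.5, 50.0)]
    for s in pick_segments_to_clean(segs, 2):
        print(s.seg_id, round(cost_benefit(s), 2))

The age weighting makes the cleaner prefer cold, mostly-empty segments over hot ones whose free space would quickly reappear anyway.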
Serverless Network File Systems
ACM Transactions on Computer Systems, 1995
"... In this paper, we propose a new paradigm for network file system design, serverless network file systems. While traditional network file systems rely on a central server machine, a serverless system utilizes workstations cooperating as peers to provide all file system services. Any machine in the sy ..."
Abstract
-
Cited by 473 (28 self)
- Add to MetaCart
(Show Context)
In this paper, we propose a new paradigm for network file system design, serverless network file systems. While traditional network file systems rely on a central server machine, a serverless system utilizes workstations cooperating as peers to provide all file system services. Any machine in the system can store, cache, or control any block of data. Our approach uses this location independence, in combination with fast local area networks, to provide better performance and scalability than traditional file systems. Further, because any machine in the system can assume the responsibilities of a failed component, our serverless design also provides high availability via redundant data storage. To demonstrate our approach, we have implemented a prototype serverless network file system called xFS. Preliminary performance measurements suggest that our architecture achieves its goal of scalability. For instance, in a 32-node xFS system with 32 active clients, each client receives nearly as much read or write throughput as it would see if it were the only active client.
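The key property the abstract claims is location independence: any machine can store, cache, or control any block, so a failed component's responsibilities can move elsewhere. The sketch below illustrates that idea with an invented manager map and peer names; it is not xFS code, just a picture of the indirection under assumed data structures.

    # Illustrative sketch only: location independence via a manager map.
    PEERS = ["node0", "node1", "node2", "node3"]          # hypothetical cluster

    manager_map = {i: PEERS[i % len(PEERS)] for i in range(16)}

    def manager_for(file_index: int) -> str:
        # Control of a file is just a map entry, not a fixed server.
        return manager_map[file_index % len(manager_map)]

    def fail_over(dead_peer: str) -> None:
        # Any machine can assume a failed component's responsibilities:
        # reassign the dead peer's map entries to the survivors.
        survivors = [p for p in PEERS if p != dead_peer]
        for slot, peer in manager_map.items():
            if peer == dead_peer:
                manager_map[slot] = survivors[slot % len(survivors)]

    print(manager_for(42))   # -> node2
    fail_over("node2")
    print(manager_for(42))   # control reassigned to a survivor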
Informed Prefetching and Caching
In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, 1995
"... The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I/O-int ..."
Abstract
-
Cited by 402 (10 self)
- Add to MetaCart
The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I/O-intensive applications. In particular, we show how to use application-disclosed access patterns (hints) to expose and exploit I/O parallelism and to dynamically allocate file buffers among three competing demands: prefetching hinted blocks, caching hinted blocks for reuse, and caching recently used data for unhinted accesses. Our approach estimates the impact of alternative buffer allocations on application execution time and applies a cost-benefit analysis to allocate buffers where they will have the greatest impact. We implemented informed prefetching and caching in DEC’s OSF/1 operating system and measured its performance on a 150 MHz Alpha equipped with 15 disks running a range of applications including text search, 3D scientific visualization, relational database queries, speech recognition, and computational chemistry. Informed prefetching reduces the execution time of the first four of these applications by 20% to 87%. Informed caching reduces the execution time of the fifth application by up to 30%.
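The allocation loop the abstract describes can be pictured as a greedy auction: each buffer goes to whichever of the three demands currently promises the largest reduction in estimated stall time. A minimal sketch follows; the diminishing-returns benefit estimators are made-up stand-ins, not the paper's models.

    # Sketch of cost-benefit buffer allocation (illustrative estimators).
    def allocate_buffers(total_buffers, estimators):
        alloc = {name: 0 for name in estimators}
        for _ in range(total_buffers):
            # Marginal benefit of one more buffer for each demand.
            best = max(estimators, key=lambda n: estimators[n](alloc[n]))
            alloc[best] += 1
        return alloc

    estimators = {
        "prefetch_hinted": lambda k: 10.0 / (k + 1),   # hypothetical curves
        "cache_hinted":    lambda k: 6.0 / (k + 1),
        "cache_lru":       lambda k: 4.0 / (k + 1),
    }
    print(allocate_buffers(16, estimators))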
Feasibility of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs
Proc. ACM SIGMETRICS, 2000
"... We consider an architecture for a serverless distributed file system that does not assume mutual trust among the client computers. The system provides security, availability, and reliability by distributing multiple encrypted replicas of each file among the client machines. To assess the feasibility ..."
Abstract
-
Cited by 325 (9 self)
- Add to MetaCart
(Show Context)
We consider an architecture for a serverless distributed file system that does not assume mutual trust among the client computers. The system provides security, availability, and reliability by distributing multiple encrypted replicas of each file among the client machines. To assess the feasibility of deploying this system on an existing desktop infrastructure, we measure and analyze a large set of client machines in a commercial environment. In particular, we measure and report results on disk usage and content; file activity; and machine uptimes, lifetimes, and loads. We conclude that the measured desktop infrastructure would passably support our proposed system, providing availability on the order of one unfilled file request per user per thousand days.
Keywords: serverless distributed file system architecture, personal computer
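The availability argument rests on simple probability: if replicas are placed on independently failing machines, a file with k replicas is unreachable only when all k hosts are down. A back-of-the-envelope sketch, with an assumed uptime probability rather than the paper's measured values:

    # If each machine is up with probability p and replica placements are
    # independent, unavailability is (1 - p) ** k for k replicas.
    def unavailability(p_up: float, replicas: int) -> float:
        return (1.0 - p_up) ** replicas

    for k in (1, 2, 3, 4):
        print(k, unavailability(0.90, k))   # e.g. assuming 90% machine uptime

Each added replica shrinks unavailability by another factor of (1 - p), which is why a modest replication factor suffices even on ordinary desktops.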
Frangipani: A Scalable Distributed File System
"... The ideal distributed file system would provide all its users with coherent, shared access to the same set of files,yet would be arbitrarily scalable to provide more storage space and higher performance to a growing user community. It would be highly available in spite of component failures. It woul ..."
Abstract
-
Cited by 320 (1 self)
- Add to MetaCart
(Show Context)
The ideal distributed file system would provide all its users with coherent, shared access to the same set of files, yet would be arbitrarily scalable to provide more storage space and higher performance to a growing user community. It would be highly available in spite of component failures. It would require minimal human administration, and administration would not become more complex as more components were added. Frangipani is a new file system that approximates this ideal, yet was relatively easy to build because of its two-layer structure. The lower layer is Petal (described in an earlier paper), a distributed storage service that provides incrementally scalable, highly available, automatically managed virtual disks. In the upper layer, multiple machines run the same Frangipani file system code on top of a shared Petal virtual disk, using a distributed lock service to ensure coherence. Frangipani is meant to run in a cluster of machines that are under a common administration and can communicate securely. Thus the machines trust one another and the shared virtual disk approach is practical. Of course, a Frangipani file system can be exported to untrusted machines using ordinary network file access protocols. We have implemented Frangipani on a collection of Alphas running DIGITAL Unix 4.0. Initial measurements indicate that Frangipani has excellent single-server performance and scales well as servers are added.
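The two-layer split can be sketched as file-system code on every machine, a shared Petal-like virtual disk below, and a distributed lock taken around any update to shared state. The class names and the coherence rule below are invented for illustration; the real Frangipani lock service is far more involved.

    # Illustrative sketch of the Frangipani layering (not its code).
    import threading

    class LockService:
        # Stand-in for Frangipani's distributed lock service.
        def __init__(self):
            self._locks, self._guard = {}, threading.Lock()
        def lock_for(self, resource: str) -> threading.Lock:
            with self._guard:
                return self._locks.setdefault(resource, threading.Lock())

    class VirtualDisk:
        # Stand-in for a Petal virtual disk: a flat array of blocks.
        def __init__(self, nblocks: int):
            self.blocks = [b""] * nblocks
        def write(self, bno: int, data: bytes):
            self.blocks[bno] = data

    locks, disk = LockService(), VirtualDisk(1024)

    def update_inode(inode_block: int, data: bytes):
        # Coherence rule: hold the lock covering a block while writing it.
        with locks.lock_for(f"inode:{inode_block}"):
            disk.write(inode_block, data)

    update_inode(7, b"metadata")
    print(disk.blocks[7])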
The Zebra striped network file system
ACM Transactions on Computer Systems, 1995
"... Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a single stream, which it then stripes using an approach similar to a log-structured file system. This ..."
Abstract
-
Cited by 302 (5 self)
- Add to MetaCart
Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a single stream, which it then stripes using an approach similar to a log-structured file system. This provides high performance for writes of small files as well as for reads and writes of large files. Zebra also writes parity information in each stripe in the style of RAID disk arrays; this increases storage costs slightly but allows the system to continue operation even while a single storage server is unavailable. A prototype implementation of Zebra, built in the Sprite operating system, provides 4-5 times the throughput of the standard Sprite file system or NFS for large files and a 15% to 300% improvement for writing small files.
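The RAID-style parity the abstract mentions is, in its simplest form, the XOR of a stripe's data fragments, so any single missing fragment can be rebuilt from the survivors. A minimal sketch of that standard construction (not Zebra's actual stripe format):

    # Per-stripe parity as XOR of data fragments (illustrative).
    def xor_fragments(frags):
        out = bytearray(len(frags[0]))
        for frag in frags:
            for i, byte in enumerate(frag):
                out[i] ^= byte
        return bytes(out)

    data = [b"aaaa", b"bbbb", b"cccc"]      # one stripe's data fragments
    parity = xor_fragments(data)

    # Lose fragment 1, then reconstruct it from the survivors plus parity.
    rebuilt = xor_fragments([data[0], data[2], parity])
    assert rebuilt == data[1]
    print(rebuilt)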
The Evolution of the Web and Implications for an Incremental Crawler
1999
"... In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can improve the "freshness" of th ..."
Abstract
-
Cited by 281 (18 self)
- Add to MetaCart
In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can improve the "freshness" of the collection significantly and bring in new pages in a more timely manner. We first present results from an experiment conducted on more than half a million web pages over 4 months, to estimate how web pages evolve over time. Based on these experimental results, we compare various design choices for an incremental crawler and discuss their trade-offs. We propose an architecture for the incremental crawler, which combines the best design choices.
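One way to picture selective, incremental refreshing is a priority queue in which pages estimated to change more often come due for revisit sooner. The change-rate estimates and the revisit rule below are assumptions for illustration, not the paper's proposed architecture.

    # Sketch of a change-rate-driven revisit schedule (illustrative).
    import heapq

    def schedule(pages, now, horizon):
        # pages: {url: estimated changes per day}; revisit each page every
        # 1/rate days, rescheduling it after each visit.
        heap = [(now + 1.0 / rate, url) for url, rate in pages.items()]
        heapq.heapify(heap)
        visits = []
        while heap and heap[0][0] <= horizon:
            due, url = heapq.heappop(heap)
            visits.append((round(due, 2), url))
            heapq.heappush(heap, (due + 1.0 / pages[url], url))
        return visits

    pages = {"news.example/front": 4.0, "docs.example/faq": 0.5}
    print(schedule(pages, now=0.0, horizon=2.0))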
A Trace-Driven Analysis of the UNIX 4.2 BSD File System
1985
"... We analyzed the UNIX 4.2 BSD file system by recording userlevel activity in trace files and writing programs to analyze the traces. The tracer did not record individual read and write operations, yet still provided tight bounds on what information was accessed and when. The trace analysis shows that ..."
Abstract
-
Cited by 277 (5 self)
- Add to MetaCart
(Show Context)
We analyzed the UNIX 4.2 BSD file system by recording user-level activity in trace files and writing programs to analyze the traces. The tracer did not record individual read and write operations, yet still provided tight bounds on what information was accessed and when. The trace analysis shows that the average file system bandwidth needed per user is low (a few hundred bytes per second). Most of the files accessed are open only a short time and are accessed sequentially. Most new information is deleted or overwritten within a few minutes of its creation. We also wrote a simulator that uses the traces to predict the performance of caches for disk blocks. The moderate-sized caches used in UNIX reduce disk traffic for file blocks by about 50%, but larger caches (several megabytes) can eliminate 90% or more of all disk traffic. With those large caches, large block sizes (16 kbytes or more) result in the fewest disk accesses.
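The methodology of replaying a trace against a simulated block cache is easy to reproduce in miniature. The sketch below is a generic LRU simulator in the spirit of that approach, not the paper's simulator; the trace is synthetic.

    # Minimal trace-driven block-cache simulator (illustrative).
    from collections import OrderedDict

    def miss_rate(trace, cache_blocks):
        cache, misses = OrderedDict(), 0
        for block in trace:
            if block in cache:
                cache.move_to_end(block)          # refresh LRU position
            else:
                misses += 1
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)     # evict least recently used
        return misses / len(trace)

    trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4] * 100
    for size in (2, 3, 4):
        print(size, miss_rate(trace, size))

Sweeping the cache size over a real trace yields exactly the kind of miss-ratio curve from which the abstract's 50% and 90% traffic-reduction figures are read off.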
UNIX Disk Access Patterns
1993
"... Disk access patterns are becoming ever more important to understand as the gap between processor and disk performance increases. The study presented here is a detailed characterization of every lowlevel disk access generated by three quite different systems over a two month period. The contributions ..."
Abstract
-
Cited by 277 (20 self)
- Add to MetaCart
Disk access patterns are becoming ever more important to understand as the gap between processor and disk performance increases. The study presented here is a detailed characterization of every low-level disk access generated by three quite different systems over a two month period. The contributions of this paper are the detailed information we provide about the disk accesses on these systems (many of our results are significantly different from those reported in the literature, which provide summary data only for file-level access on small-memory systems); and the analysis of a set of optimizations that could be applied at the disk level to improve performance. Our traces show that the majority of all operations are writes; disk accesses are rarely sequential; 25--50% of all accesses are asynchronous; only 13--41% of accesses are to user data (the rest result from swapping, metadata, and program execution); and I/O activity is very bursty: mean request queue lengths seen by an incoming request range from 1.7 to 8.9 (1.2--1.9 for reads, 2.0--14.8 for writes), while we saw 95th percentile queue lengths as large as 89 entries, and maxima of over 1000. Using a simulator to analyze the effect of write caching at the disk level, we found that using a small non-volatile cache at each disk allowed writes to be serviced considerably faster than with a regular disk. In particular, short bursts of writes go much faster -- and such bursts are common: writes rarely come singly. Adding even 8KB of non-volatile memory per disk could reduce disk traffic by 10--18%, and 90% of metadata write traffic can be absorbed with as little as 0.2MB per disk of nonvolatile RAM. Even 128KB of NVRAM cache in each disk can improve write performance by as much as a factor of three. FCFS scheduling...
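The write-absorption effect behind those NVRAM numbers is that bursty workloads repeatedly overwrite the same blocks, so a small cache in front of the platters coalesces many logical writes into one physical one. The sketch below illustrates that mechanism under assumed sizes and a synthetic burst; it is not the paper's simulator.

    # Sketch of NVRAM write absorption at the disk (illustrative).
    def simulate_nvram(writes, nvram_blocks):
        cache, disk_writes = {}, 0
        for block, data in writes:
            if block not in cache and len(cache) >= nvram_blocks:
                evicted = next(iter(cache))       # simple FIFO-style eviction
                del cache[evicted]
                disk_writes += 1
            cache[block] = data                   # overwrite absorbed in NVRAM
        return disk_writes + len(cache)           # plus a final flush

    burst = [(b % 4, None) for b in range(100)]   # bursty overwrites of 4 blocks
    print(simulate_nvram(burst, nvram_blocks=8))  # 4 disk writes, not 100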
Characteristics of File System Workloads
1998
"... Abstract In this paper, we describe the collection and analysis of file system traces from a variety of different environments, including both UNIX and NT systems, clients and servers, and instructional and production ..."
Abstract
-
Cited by 269 (3 self)
- Add to MetaCart
(Show Context)
In this paper, we describe the collection and analysis of file system traces from a variety of different environments, including both UNIX and NT systems, clients and servers, and instructional and production ...