Results 1 - 10 of 75
MapReduce: simplified data processing on large clusters
- OSDI’04: PROCEEDINGS OF THE 6TH CONFERENCE ON SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION
, 2004
Cited by 913 (3 self)
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.
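The map/reduce split described in this abstract can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, the word-count task is chosen only for illustration, and the partitioning, scheduling, and fault handling that the run-time system provides are left out.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(doc_name: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_counts(word: str, counts: list[int]) -> int:
    """Reduce: merge all intermediate values that share the same key."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[tuple[str, str]],
    map_fn: Callable[[str, str], Iterable[tuple[str, int]]],
    reduce_fn: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Hypothetical sequential driver: group intermediate pairs by key, then reduce.

    A real cluster run would do the grouping across many machines; here it is
    a single in-memory dictionary.
    """
    intermediate: defaultdict[str, list[int]] = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    return {ikey: reduce_fn(ikey, values) for ikey, values in intermediate.items()}


# Illustrative input documents (made up for this sketch).
docs = [("a.txt", "the quick fox"), ("b.txt", "the lazy dog the end")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

Because the user-supplied functions only see one key/value pair or one key's values at a time, a driver is free to run them on different machines, which is the property the abstract relies on for automatic parallelization.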
The Google File System
, 2003
Cited by 637 (3 self)
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.
Hippodrome: Running Circles around Storage Administration
- In Proceedings of the Conference on File and Storage Technologies
, 2002
Cited by 118 (9 self)
Enterprise-scale computer storage systems are extremely difficult to manage due to their size and complexity. It is difficult to generate a good storage system design for a given workload and to correctly implement the selected design. Traditionally, initial system configuration is performed by administrators who are guided by rules of thumb. Unfortunately, this process involves trial and error, and as a result is tedious and error-prone. In this paper, we introduce Hippodrome, an approach to automating initial system configuration. Hippodrome is an iterative loop that analyzes an existing system to determine its requirements, creates a new storage system design to better meet these requirements, and migrates the existing system to the new design. In this paper, we show how Hippodrome automates initial system configuration.
The Kangaroo Approach to Data Movement on the Grid
, 2001
Cited by 84 (19 self)
Access to remote data is one of the principal challenges of grid computing. While performing I/O, grid applications must be prepared for server crashes, performance variations, and exhausted resources. To achieve high throughput in such a hostile environment, applications need a resilient service that moves data while hiding errors and latencies. We illustrate this idea with Kangaroo, a simple data movement system that makes opportunistic use of disks and networks to keep applications running. We demonstrate that Kangaroo can achieve better end-to-end performance than traditional data movement techniques, even though its individual components do not achieve high performance.
Interposed Request Routing for Scalable Network Storage
- IN PROCEEDINGS OF THE FOURTH SYMPOSIUM ON OPERATING SYSTEM DESIGN AND IMPLEMENTATION (OSDI)
, 2000
Cited by 82 (11 self)
This paper presents Slice, a new storage system architecture for high-speed LANs incorporating network-attached block storage. Slice interposes a request switching filter -- called a µproxy -- along the network path between the client and the network storage system (e.g., in a network adapter or switch). The purpose of the µproxy is to route requests among a server ensemble that implements the file service. We present a prototype that uses this approach to virtualize the standard NFS file protocol to provide scalable, high-bandwidth file service to ordinary NFS clients. The paper presents and justifies the architecture, proposes and evaluates several request routing policies realizable within the architecture, and explores the effects of these policies on service structure.
Diamond: A storage architecture for early discard in interactive search
, 2004
Cited by 53 (15 self)
Permission is granted for noncommercial reproduction of the work for educational or research purposes.
Bridging the Information Gap in Storage Protocol Stacks
- In Proceedings of the USENIX Annual Technical Conference (USENIX ’02)
, 2002
Cited by 34 (6 self)
The functionality and performance innovations in file systems and storage systems have proceeded largely independently from each other over the past years. The result is an information gap: neither has information about how the other is designed or implemented, which can result in a high cost of maintenance, poor performance, duplication of features, and limitations on functionality. To bridge this gap, we introduce and evaluate a new division of labor between the storage system and the file system. We develop an enhanced storage layer known as Exposed RAID (ERAID), which reveals information to file systems built above; specifically, ERAID exports the parallelism and failure-isolation boundaries of the storage layer, and tracks performance and failure characteristics on a fine-grained basis. To take advantage of the information made available by ERAID, we develop an Informed Log-Structured File System (ILFS). ILFS is an extension of the standard log-structured file system (LFS) that has been altered to take advantage of the performance and failure information exposed by ERAID. Experiments reveal that our prototype implementation yields benefits in the management, flexibility, reliability, and performance of the storage system, with only a small increase in file system complexity. For example, ILFS/ERAID can incorporate new disks into the system on-the-fly, dynamically balance workloads across the disks of the system, allow for user control of file replication, and delay replication of files for increased performance. Much of this functionality would be difficult or impossible to implement with the traditional division of labor between file systems and storage.
Robustness in Complex Systems
- 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII)
, 2001
Cited by 32 (0 self)
This paper argues that a common design paradigm for systems is fundamentally flawed, resulting in unstable, unpredictable behavior as the complexity of the system grows. In this flawed paradigm, designers carefully attempt to predict the operating environment and failure modes of the system in order to design its basic operational mechanisms. However, as a system grows in complexity, the diffuse coupling between the components in the system inevitably leads to the butterfly effect, in which small perturbations can result in large changes in behavior. We explore this in the context of distributed data structures, a scalable, cluster-based storage server. We then consider a number of design techniques that help a system to be robust in the face of the unexpected, including overprovisioning, admission control, introspection, and adaptivity through closed control loops. Ultimately, however, all complex systems eventually must contend with the unpredictable. Because of this, we believe systems shoul...
A Framework For Building Unobtrusive Disk Maintenance Applications
- In Proceedings of the 3rd USENIX Conference on File and Storage Technologies. USENIX Association
, 2004
Cited by 17 (3 self)
clean construction of disk maintenance applications. They can use it to expose the disk activity to be done, and then process completed requests as they are reported. The system ensures that these applications make steady forward progress without competing for disk access with a system's primary applications. It opportunistically completes maintenance requests by using disk idle time and freeblock scheduling. In this paper, three disk maintenance applications (backup, write-back cache destaging, and disk layout reorganization) are adapted to the system support and evaluated on a FreeBSD implementation. All are shown to successfully execute in busy systems with minimal (e.g., <2%) impact on foreground disk performance. In fact, by modifying FreeBSD's cache to write dirty blocks for free, the average read cache miss response time is decreased by 15-30%. For non-volatile caches, the reduction is almost 50%.
Lachesis: Robust Database Storage Management Based on Device-specific Performance Characteristics
- International Conference on Very Large Databases
, 2003
Cited by 16 (10 self)
Database systems work hard to tune I/O performance, but do not always achieve the full performance potential of modern disk systems. Their abstracted view of storage components hides useful device-specific characteristics, such as disk track boundaries and advanced built-in firmware algorithms. This paper presents a new storage manager architecture, called Lachesis, that exploits and adapts to observable device-specific characteristics in order to achieve and sustain high performance. For DSS queries, Lachesis achieves I/O efficiency nearly equivalent to sequential streaming even in the presence of competing random I/O traffic. In addition, Lachesis simplifies manual configuration and restores the optimizer's assumptions about the relative costs of different access patterns expressed in query plans. Experiments using IBM DB2 I/O traces as well as a prototype implementation show that Lachesis improves standalone DSS performance by 10% on average. More importantly, when running concurrently with an on-line transaction processing (OLTP) workload, Lachesis improves DSS performance by up to 3×, while OLTP also exhibits a 7% speedup.

