Dynamic Metadata Management for Petabyte-scale File Systems
Cited by 35 (8 self)
In petabyte-scale distributed file systems that decouple read and write from metadata operations, behavior of the metadata server cluster will be critical to overall system performance and scalability. We present a dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time. We examine the relative merits of our approach in the context of traditional workload partitioning strategies, and demonstrate the performance, scalability and adaptability advantages in a simulation environment.
OBFS: A File System for Object-based Storage Devices
- In Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies, College Park, MD, 2004
Cited by 28 (6 self)
The object-based storage model, in which files are made up of one or more data objects stored on self-contained Object-Based Storage Devices (OSDs), is emerging as an architecture for distributed storage systems. The workload presented to the OSDs will be quite different from that of general-purpose file systems, yet many distributed file systems employ general-purpose file systems as their underlying file system. We present OBFS, a small and highly efficient file system designed for use in OSDs. Our experiments show that our user-level implementation of OBFS outperforms Linux Ext2 and Ext3 by a factor of two or three, and while OBFS is 1/25 the size of XFS, it provides only slightly lower read performance and 10% to 40% higher write performance.
Efficient Metadata Management in Large Distributed Storage Systems
- 2003
Cited by 20 (2 self)
Efficient metadata management is a critical aspect of overall system performance in large distributed storage systems. Directory subtree partitioning and pure hashing are two common techniques used for managing metadata in such systems, but both suffer from bottlenecks at very high concurrent access rates. We present a new approach called Lazy Hybrid (LH) metadata management that combines the best aspects of these two approaches while avoiding their shortcomings.
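As a rough illustration of the two baseline techniques this abstract names, the sketch below maps a file path to a metadata server either by hashing the full path or by longest-prefix match against a table of directory subtrees. It is not the Lazy Hybrid scheme itself; the function names, the subtree table, and the default server are hypothetical and chosen only for this example.

```python
import hashlib


def hash_placement(path: str, num_servers: int) -> int:
    """Pure hashing: the server is derived from a hash of the full path.

    Spreads load evenly, but files in one directory land on many servers.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def subtree_placement(path: str, subtree_owner: dict, default: int = 0) -> int:
    """Directory subtree partitioning: the longest matching prefix wins.

    Keeps a directory's metadata together, but a hot subtree can overload
    the single server that owns it.
    """
    parts = path.strip("/").split("/")
    for depth in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in subtree_owner:
            return subtree_owner[prefix]
    return default  # hypothetical fallback server for unassigned paths


# Hypothetical subtree assignment used only for illustration.
owners = {"/home": 1, "/home/alice": 2}
alice_server = subtree_placement("/home/alice/notes.txt", owners)
bob_server = subtree_placement("/home/bob/todo.txt", owners)
hashed_server = hash_placement("/home/alice/notes.txt", 4)
```

Under hashing, renaming a directory changes the hash of every path beneath it, whereas under subtree partitioning only the table entry changes; this is one way to see why each baseline has its own bottleneck.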
Asynchronous scheduling of redundant disk arrays
- In 12th ACM Symposium on Parallel Algorithms and Architectures, 2000
Cited by 12 (4 self)
Allocation of data to parallel disks using redundant storage and random placement of blocks can be exploited to achieve low access delays. New algorithms are proposed which improve the previously known shortest queue algorithm by systematically exploiting the fact that scheduling decisions can be deferred until a block access is actually started on a disk. These algorithms are also generalized for coding schemes with low redundancy. Using extensive simulations, practically important quantities are measured which have so far eluded an analytical treatment: the delay distribution when a stream of requests approaches the limit of the system capacity, the system efficiency for parallel disk applications with bounded prefetching buffers, and the combination of both for mixed traffic. A further step toward practice is taken by outlining the system design for an automatically load-balanced parallel hard-disk array. Additional algorithmic measures are proposed that allow variable-sized blocks, seek time reduction, fault tolerance, inhomogeneous systems, and flexible prioritization schemes. Index Terms: Parallel disks, lazy scheduling, asynchronous, random redundant storage, duplicate allocation, soft real time, bipartite matching, queuing theory.
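The baseline that this abstract improves on, shortest queue scheduling over randomly duplicated blocks, can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's lazy algorithm: each block is assumed to have exactly two copies on distinct disks, queue length is simply the count of assigned requests, and all function names are hypothetical.

```python
import random


def place_blocks(num_blocks: int, num_disks: int, rng: random.Random) -> list:
    """Random duplicate allocation: each block gets copies on two distinct disks."""
    return [tuple(rng.sample(range(num_disks), 2)) for _ in range(num_blocks)]


def shortest_queue_schedule(requests: list, placement: list, num_disks: int) -> list:
    """Assign each request to whichever replica disk currently has the shorter queue.

    The decision is made eagerly when the request arrives; ties go to the
    first replica. Returns the final queue length of every disk.
    """
    queues = [0] * num_disks
    for block in requests:
        first, second = placement[block]
        target = first if queues[first] <= queues[second] else second
        queues[target] += 1
    return queues


# Tiny deterministic case: three blocks, all replicated on disks 0 and 1.
small = shortest_queue_schedule([0, 1, 2], [(0, 1), (0, 1), (0, 1)], 2)

# Larger randomized case with a fixed seed for repeatability.
rng = random.Random(0)
placement = place_blocks(100, 4, rng)
queues = shortest_queue_schedule(list(range(100)), placement, 4)
```

Because the choice is fixed at arrival time, a disk can still end up with a longer queue than later information would justify; deferring the choice until a disk is actually free is the idea the abstract describes.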
Nomad: A scalable operating system for clusters of uni and multiprocessors
- In Proceedings of the 1st IEEE International Workshop on Cluster Computing, 1999
Cited by 11 (3 self)
The recent improvements in workstation and interconnection network performance have popularized the clusters of off-the-shelf workstations. However, the usefulness of these clusters is yet to be fully exploited, mostly due to the inadequate management of cluster resources implemented by current distributed operating systems. In order to eliminate this problem and approach the computational power of large clusters of workstations, in this paper we propose Nomad, an efficient operating system for clusters of uni and/or multiprocessors. Nomad includes several important characteristics for modern cluster-oriented operating systems: scalability, efficient resource management across the cluster, efficient scheduling of parallel and distributed applications, distributed I/O, fault detection and recovery, protection, and backward compatibility. Some of the mechanisms used by Nomad, such as process checkpointing and migration, can be found in previously proposed systems. However, our system stands out for its strategy for disseminating information across the cluster and its efficient management of all cluster resources. In addition, Nomad is highly scalable as it uses neither centralized control nor extra messages to implement its functionality, taking advantage of the I/O traffic associated with its distributed file system. Our preliminary evaluation of the load balancing aspect of Nomad shows that the pattern of file accesses in our distributed file system allows for efficient and scalable load balancing. Our main conclusion is that the complete implementation of Nomad will most likely be efficient and will be a nice platform for future research on operating systems for clusters of workstations.
Interconnection Architectures for Petabyte-Scale High-Performance Storage Systems
- In Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies, 2004
Cited by 6 (2 self)
As demand for storage bandwidth and capacity grows, designers have proposed the construction of petabyte-scale storage systems. Rather than relying upon a few very large storage arrays, these petabyte-scale systems have thousands of individual disks working together to provide aggregate storage system bandwidth exceeding 100GB/s. However, providing this bandwidth to storage system clients becomes difficult due to limits in network technology. This paper discusses different interconnection topologies for large disk-based systems, drawing on previous experience from the parallel computing community. By choosing the right network, storage system designers can eliminate the need for expensive high-bandwidth communication links and provide a highly redundant network resilient against single node failures. We analyze several different topology choices and explore the tradeoffs between cost and performance. Using simulations, we uncover potential pitfalls, such as the placement of connections between the storage system network and its clients, that may arise when designing such a large system.
Performance of the IBM General Parallel File System
- In Proceedings of the International Parallel and Distributed Processing Symposium, 2000
Cited by 6 (6 self)
We measure the performance and scalability of IBM's General Parallel File System (GPFS) under a variety of conditions. The measurements are based on benchmark programs that allow us to vary block sizes, access patterns, etc., and to measure aggregate throughput rates. We use the data to give performance recommendations for application development and as a guide to the improvement of parallel file systems.
Algorithms for Scalable Storage Servers
- In SOFSEM 2004: Theory and Practice of Computer Science, 2004
Cited by 5 (1 self)
We survey a set of algorithmic techniques that make it possible to build a high performance storage server from a network of cheap components. Such a storage server offers a very simple programming model. To the clients it looks like a single very large disk that can handle many requests in parallel with minimal interference between the requests.
Cooperative Caching and Prefetching in Parallel/Distributed File Systems
- 1997
Cited by 4 (1 self)
If we examine the structure of the applications that run on parallel machines, we observe that their I/O needs increase tremendously every day. These applications work with very large data sets which, in most cases, do not fit in memory and have to be kept on disk. The input and output data files are also very large and have to be accessed very quickly. These large applications also want to be able to checkpoint themselves without wasting too much time. These facts constantly increase the expectations placed on parallel and distributed file systems. Thus, these file systems have to improve their performance to avoid becoming the bottleneck in parallel/distributed environments. On the other hand, while the performance of new processors, interconnection networks and memory increases very rapidly, disk performance does not. This lack of improvement is due to the mechanical parts used to build the disks. These components are slow and limit both the latency and t...
I/O in Parallel and Distributed Systems
Cited by 4 (0 self)
One is scientific computing with massive datasets, such as those found in seismic processing, climate modeling, and so forth [dC94]. The second is databases [DG92]. The I/O bottleneck continues to be a serious concern for scientific computing, particularly Grand Challenge problems, where it is now commonly recognized as an obstacle. Many scientific applications generate 1 GB of I/O per run [dC94], and applications performing an order of magnitude more are not uncommon: applications in computational physics and fluid dynamics are projected to require I/O on the order of 1 TB [dC94]. It seems clear that these total I/O requirements will keep increasing as scientists continue to study phenomena at larger space and time scales, and at finer space and time resolutions. Since the response time that humans can tolerate for obtaining computational results--- no matter how comprehensive and detailed--- is always bounded, the I/O rates required will continue to increase also. Thus while curre

