On Implementing MPI-IO Portably and with High Performance
- In Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems
, 1999
Cited by 137 (21 self)
We discuss the issues involved in implementing MPI-IO portably on multiple machines and file systems and also achieving high performance. One way to implement MPI-IO portably is to implement it on top of the basic Unix I/O functions (open, lseek, read, write, and close), which are themselves portable. We argue that this approach has limitations in both functionality and performance. We instead advocate an implementation approach that combines a large portion of portable code and a small portion of code that is optimized separately for different machines and file systems. We have used such an approach to develop a high-performance, portable MPI-IO implementation, called ROMIO. In addition to basic I/O functionality, we consider the issues of supporting other MPI-IO features, such as 64-bit file sizes, noncontiguous accesses, collective I/O, asynchronous I/O, consistency and atomicity semantics, user-supplied hints, shared file pointers, portable data representation, and file preallocati...
Data Sieving and Collective I/O in ROMIO
, 1999
Cited by 79 (14 self)
The I/O access patterns of parallel programs often consist of accesses to a large number of small, noncontiguous pieces of data. If an application’s I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access a noncontiguous data set with a single I/O function call. This feature provides MPI-IO implementations an opportunity to optimize data access. We describe how our MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance. We demonstrate the performance and portability with performance results for three applications—an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC)—on five different parallel machines:
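As a rough illustration of the data-sieving idea named in the abstract above, the sketch below serves a list of small noncontiguous requests with one large contiguous read and then extracts the requested pieces in memory. It is a minimal sketch, not ROMIO's implementation: the function names `sieved_read` and `naive_read` are hypothetical, and the sketch does not cap the buffer size, whereas the abstract notes that the real optimizations control their memory requirements.

```python
import io


def naive_read(f, requests):
    """Issue one seek + read per (offset, length) request."""
    pieces = []
    for offset, length in requests:
        f.seek(offset)
        pieces.append(f.read(length))
    return pieces


def sieved_read(f, requests):
    """Serve all requests from a single contiguous read (hypothetical helper).

    Assumption: the span from the lowest offset to the highest end offset
    fits in memory; this sketch does not bound the buffer.
    """
    if not requests:
        return []
    start = min(offset for offset, _ in requests)
    end = max(offset + length for offset, length in requests)
    f.seek(start)
    buf = f.read(end - start)  # one large read, including unwanted gaps
    # Pick the requested pieces out of the in-memory buffer.
    return [buf[offset - start:offset - start + length]
            for offset, length in requests]


if __name__ == "__main__":
    data = bytes(range(256))
    reqs = [(10, 4), (100, 2), (50, 3)]
    assert sieved_read(io.BytesIO(data), reqs) == naive_read(io.BytesIO(data), reqs)
```

The trade-off is that the single read also transfers the gaps between requests, so the approach pays off when the requested pieces are small and close together.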
File System Workload Analysis for Large Scale Scientific Computing Applications
- In Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies
, 2004
Cited by 38 (12 self)
Parallel scientific applications require high-performance I/O support from underlying file systems. A comprehensive understanding of the expected workload is therefore essential for the design of high-performance parallel file systems. We re-examine the workload characteristics in parallel computing environments in the light of recent technology advances and new applications.
Markov Model Prediction of I/O Requests for Scientific Applications
- Proceedings of the 16th international conference on Supercomputing (New
, 2002
Cited by 15 (0 self)
Given the increasing performance disparity between processors and storage devices, exploiting knowledge of spatial and temporal I/O requests is critical to achieving high performance, particularly on parallel systems. Although perfect fore-knowledge of I/O requests is rarely possible, even estimates of request patterns can potentially yield large performance gains. This paper evaluates Markov models to represent the spatial patterns of I/O requests in scientific codes. The paper also proposes three algorithms for I/O prefetching. Evaluation using I/O traces from scientific codes shows that highly accurate prediction of spatial access patterns, resulting in reduced execution times, is possible.
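To make the idea of modeling spatial access patterns concrete, the sketch below builds a generic first-order Markov chain over block numbers from an I/O trace and predicts the most likely next block. This is an assumption-laden illustration only: the abstract does not specify the paper's model order or its three prefetching algorithms, and the names `build_transitions` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(trace):
    """Count how often each block is followed by each other block."""
    counts = defaultdict(Counter)
    for current, following in zip(trace, trace[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, block):
    """Return the most frequently observed successor of `block`, or None."""
    followers = counts.get(block)
    if not followers:
        return None  # block never seen with a successor in the trace
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical trace of accessed block numbers.
    trace = [0, 1, 2, 0, 1, 2, 0, 1, 3]
    model = build_transitions(trace)
    print(predict_next(model, 0))
```

A prefetcher built on such a model would issue a read for the predicted block before the application requests it; how far ahead to prefetch is a separate policy choice not covered here.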
Lightweight I/O for scientific applications
, 2006
Cited by 14 (2 self)
Today’s high-end massively parallel processing (MPP) machines have thousands to tens of thousands of processors, with next-generation systems planned to have in excess of one hundred thousand processors. For systems of such scale, efficient I/O is a significant challenge that cannot be solved using traditional approaches. In particular, general purpose parallel file systems that limit applications to standard interfaces and access policies do not scale and will likely be a performance bottleneck for many scientific applications. In this paper, we investigate the use of a “lightweight” approach to I/O that requires the application or I/O-library developer to extend a core set of critical I/O functionality with the minimum set of features and services required by its target applications. We argue that this approach allows the development of I/O libraries that are both scalable and secure. We support our claims with preliminary results for a lightweight checkpoint operation on a development cluster at Sandia.
A Case Study of Parallel I/O for Biological Sequence Analysis on Linux Clusters
- Proceedings of the 5th IEEE International Conference on Cluster Computing (Cluster 2003), Hong Kong
, 2003
Cited by 13 (7 self)
In this paper we analyze the I/O access patterns of a widely used biological sequence search tool and implement two variations that employ parallel I/O for data access based on PVFS (Parallel Virtual File System) and CEFT-PVFS (Cost-Effective Fault-Tolerant PVFS). Experiments show that the two variations outperform the original tool when equal or even fewer storage devices are used in the former. It is also found that although the performance of the two variations improves consistently when initially increasing the number of servers, this performance gain from parallel I/O becomes insignificant as the number of servers increases further. We examine the effectiveness of two read performance optimization techniques in CEFT-PVFS by using this tool as a benchmark. Performance results indicate: (1) doubling the degree of parallelism boosts the read performance to approach that of PVFS; (2) skipping hotspots can substantially improve the I/O performance when the load on data servers is highly imbalanced. The I/O resource contention due to the sharing of server nodes by multiple applications in a cluster has been shown to degrade the performance of the original tool and the variation based on PVFS by factors of up to 10 and 21, respectively, whereas the variation based on CEFT-PVFS suffered only a two-fold performance degradation. Keywords: parallel I/O, CEFT-PVFS, PVFS, BLAST
Models of Parallel Applications with Large Computation and I/O Requirements
- IEEE Transactions on Software Engineering
, 2002
Cited by 13 (0 self)
... In this paper, we present a formal model of the behavior of CPU and I/O interactions in scientific applications, from which we derive various formulas that characterize application performance. Our model captures the I/O and CPU activity at different levels of granularity, where results from the model are shown to be in excellent agreement with measurement data from a set of I/O-intensive applications. Using the formulas from our model, which explicitly take I/O activity into account, we also present examples of possible applications of the model.
Applications of parallel I/O
, 1996
Cited by 11 (2 self)
Scientific applications are increasingly being implemented on massively parallel supercomputers. Many of these applications have intense I/O demands, as well as massive computational requirements. This paper is essentially an annotated bibliography of papers and other sources of information about scientific applications using parallel I/O. It will be updated periodically.
Improving Parallel Job Scheduling Using Runtime Measurements
Cited by 8 (1 self)
We investigate the use of runtime measurements to improve job scheduling on a parallel machine. Emphasis is on strategies based on gang scheduling. With the information gathered at runtime, we define a task classification scheme based on fuzzy logic and Bayesian estimators. The resulting local task classification is used to provide better service to I/O-bound and interactive jobs under gang scheduling. This is achieved through the use of idle times and also by controlling the spinning time of a task in the spin-block mechanism depending on the node's workload. Simulation results show considerable improvements, in particular for I/O-bound workloads, in both throughput and machine utilization for a gang scheduler using runtime information compared with gang schedulers for which this type of information is not available.
Armada: a parallel I/O framework for computational grids
- Future Generation Computer Systems
, 2002
Cited by 6 (2 self)
High-performance computing increasingly occurs on “computational grids” composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single “virtual” computer. One of the great challenges for this environment is to provide efficient access to data that is distributed across remote data servers in a grid. In this paper, we describe our solution, a framework we call armada. The framework allows applications and dataset providers to flexibly compose graphs of processing modules that describe the distribution, application interfaces, and processing required of the dataset before computation. The armada runtime system then restructures the graph, and places the processing modules at appropriate hosts to reduce network traffic. © 2002 Elsevier Science B.V. All rights reserved. Keywords: Framework; Parallel I/O; Computational grids; Data grids

