D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 194--201, 1993.
....data-intensive applications could be a problem for these systems. Another body of work includes run-time systems such as MPI-IO [38, 36], PASSION [10, 34, 35], PANDA [29], and others [31, 7]. These systems provide high-level structured interfaces on top of low-level native parallel file systems [22] and try to match the application's data structure, which is usually a multidimensional array. They also provide optimizations such as collective I/O and data sieving to solve the problems brought by native parallel file systems for many popular access patterns. Again, these systems do not help when ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 194--201, 1993.
....techniques such as prefetching [19], caching [24, 6], and parallel I/O [17, 12]. However, there are serious obstacles preventing the parallel file systems from becoming a global solution to the data management problem. First of all, user interfaces of the file systems are in general low level [22], allowing the users to express access patterns of their applications using only low-level structures such as file pointers and byte offsets. Second of all, nearly every file system has its own suite of I/O commands, rendering the process of porting a program from one machine to another a very ....
....(Data set size is 2 MB.) 4 procs / 8 procs: Original 2.27 / 1.34; Optimized 1.91 / 1.15. 6 Related Work. Numerous techniques for optimizing I/O accesses have been proposed in the literature. These techniques can be classified into three categories: the parallel file system and run-time system optimizations [22, 8, 10, 19, 21, 16], compiler optimizations [4, 20, 17], and application analysis and optimization [20, 6, 29, 17, 7, 37]. Brown et al. [5] proposed a metadata system on top of HPSS using DB2 DBMS. Our work, in contrast, focuses more on utilizing state-of-the-art I/O optimizations with minimal programming effort. ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 194--201, 1993.
....timization techniques such as prefetching [17], caching [22,5], and parallel I/O [15,10]. However, there are serious obstacles preventing the file systems from becoming a real solution to the high-level data management problem. First of all, user interfaces of the file systems are low level [21]. They force the users to express access patterns of their codes using file pointers, byte offsets, etc., which do not directly match the applications' data structures, which are large multi-dimensional arrays, images, and so forth. Second, every file system comes with its own set of I/O ....
....[Figure 7. Execution times for read operations; x-axis: Access Pattern, A through H.] 6. Related Work. There are many proposed techniques for optimizing I/O accesses. These techniques can be divided into three main groups: the parallel file system and run-time system optimizations [21,6,9,17,19,14], compiler optimizations [3,18,15], and application analysis and optimization [18,5,24,15]. The closest work to ours is the one done by Brown et al. [4]. They propose a similar architecture to ours; however, they do not handle the advanced I/O optimizations proposed in this paper. They ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 1940.
....[6] is built on Linux clusters and it does not employ external storage either. The third group includes run-time systems such as MPI-IO [25, 23], PASSION [7, 22], PANDA [17], and others [18]. These systems provide high-level structured interfaces on top of low-level native parallel file systems [12] and try to match the application's data structure, which is usually a multidimensional array. Again, these systems do not help when application size increases. The fourth group includes metadata management systems [3, 18, 4, 11, 13, 8, 14]. These systems use a database's query capability to ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 194--201, 1993.
....data-intensive applications could be a problem for these systems. Another body of work includes run-time systems such as MPI-IO [35 37], PASSION [8,33,34], PANDA [7,25], and others [4,27,39]. These systems provide high-level structured interfaces on top of low-level native parallel file systems [20] and try to match the application's data structure, which is usually a multidimensional array. They also provide optimizations such as collective I/O and data sieving to solve the problems brought by native parallel file systems for many popular access patterns. Again, these systems do not help ....
D. Kotz, Multiprocessor file system interfaces, in: Proc. of the 2nd International Conference on Parallel and Distributed Information Systems (1993) pp. 194--201.
....techniques such as prefetching [20], caching [25, 7], and parallel I/O [18, 12]. However, there are serious obstacles preventing the parallel file systems from becoming a global solution to the data management problem. First of all, user interfaces of the file systems are in general low level [24], allowing the users to express access patterns of their applications using only low-level structures such as file pointers and byte offsets. Second of all, nearly every file system has its own suite of I/O commands, rendering the process of porting a program from one machine to another a very ....
....prestaging, prefetching, and computation, thereby maximizing the I/O performance. 7 Related Work. Numerous techniques for optimizing I/O accesses have been proposed in the literature. These techniques can be classified into three categories: the parallel file system and run-time system optimizations [24, 8, 10, 20, 22, 17], compiler optimizations [4, 21, 18], and application analysis and optimization [21, 7, 30, 18]. Brown et al. [6] proposed a metadata system on top of HPSS using DB2 DBMS. Our work, in contrast, focuses more on utilizing state-of-the-art I/O optimizations with minimal programming effort. ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 194--201. IEEE Computer Society Press, 1993.
....into several MPI libraries, including the MPI implementations of several vendors (e.g. HP, SGI, NEC) and MPICH and LAM, two widely used, freely available, portable MPI implementations. Numerous techniques have been proposed in the literature to optimize I/O accesses. The run-time system optimizations [14, 12, 6, 4, 5] are the most relevant among these studies. Although these techniques share the same goal with our work (optimizing I/O accesses at run time), they try to optimize only a single file at a time. Several researchers focused on implementing easy-to-use interfaces that include optimizations for ....
Kotz, D. Multiprocessor file system interfaces. In Proc. of the Second International Conference on Parallel and Distributed Information Systems, pp. 194-201, 1993.
....which cannot be satisfied by secondary storage devices and for applications which cannot afford the cost or system complexity of a large number of disk drives. There has been a considerable amount of work in addressing the flow of data to and from secondary storage devices (e.g. magnetic disks) [1, 2, 3, 4, 5, 6, 7, 8, 9]. There has also been a significant amount of work on the management of large-scale data in a storage hierarchy involving tertiary storage devices (e.g. tape devices) [10, 11, 12, 13, 14]. Striping has been studied to improve the response time of tertiary storage devices [15, 16]. The ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second Intl. Conf. on Paral. and Distr. Info. Sys., pages 194--201. IEEE Computer Society Press, 1993.
....when arbitrary range queries are allowed. Jagadish et al. [23] investigated the problem of efficient organization of a data warehouse on secondary storage. There has been a considerable amount of work in addressing the flow of data to and from secondary storage devices (e.g. magnetic disks) [6, 9, 11, 34, 16, 29, 31, 32, 5]. There has also been a significant amount of work on the management of large-scale data in a storage hierarchy involving tertiary storage devices (e.g. tape devices) [37, 22, 24, 25, 30]. Striping has been studied to improve the response time of tertiary storage devices [15, 19]. Also, ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second Intl. Conf. on Paral. and Distr. Info. Sys., pages 194--201. IEEE Computer Society Press, 1993.
....which cannot be satisfied by secondary storage devices and for applications which cannot afford the cost or system complexity of a large number of disk drives. There has been a considerable amount of work in addressing the flow of data to and from secondary storage devices (e.g. magnetic disks) [3, 4, 5, 10, 11, 14, 15, 16, 2]. There has also been a significant amount of work on the management of large-scale data in a storage hierarchy involving tertiary storage devices (e.g. tape devices) [19, 12, 17, 18, 7]. Striping has been studied to improve the response time of tertiary storage devices [13, 6]. The Department ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second Intl. Conf. on Paral. and Distr. Info. Sys., pages 194--201. IEEE Computer Society Press, 1993.
....Kwan and Terstriep [31] show how the data distribution and data representation on the CM2 is a significant hurdle when doing I/O over the HIPPI interface, and how it limits the throughput. Data reorganization has been proposed to achieve high-bandwidth access to disks in distributed-memory systems [5, 29, 38]. For example, in [5] the authors study the problem of implementing high-speed file I/O in the Intel Touchstone Delta. They observed that in order to achieve good performance it is important to send large blocks of data to the file servers that are part of the system. Although their solution ....
....9.3 Data reshuffling. We use data reshuffling inside the distributed-memory system to create large blocks that can be handled efficiently by the network interface. This optimization is widely applicable and has, for example, also been used to optimize disk I/O on distributed-memory systems [5, 29, 38]. Systems that support very efficient communication, e.g. the remote memory reads and writes on the Cray T3D, may not require data reshuffling, but any system with non-negligible per-packet overhead is likely to benefit from data reshuffling for fine-grain data distributions. The iWarp HIPPI ....
D. Kotz. Multiprocessor file system interface. In Proceedings of the Second International Conference on Parallel and Distributed Information Systems, pages 194--201. ACM/IEEE, January 1993.
.... unlikely that the performance of individual disk units will improve significantly in the near future [2]. This has led to a growing interest in parallel I/O subsystems where a number of low-bandwidth disks are organized in a parallel I/O architecture to achieve high aggregate bandwidth [1] [3] [6]. The introduction of database technology to data-intensive applications such as scientific analysis, marketing survey and study, demographic studies, image analysis, etc., with their storage and bandwidth needs, makes it necessary to use parallel I/O subsystems. Parallel I/O architecture has ....
Kotz D. Multiprocessor file system interfaces. PDIS, pages 194--201, Jan 1993.
.... not need to map a program's logical streams onto a conventional serial file, as is the case, for example, with Intel's Concurrent File System [9]. Virtual file pointers also generalize the proposals for multifiles, which are too closely tied to the number of physical processors being employed [3, 5]. Virtual file pointers also maximize asynchrony, which is very important for efficiency on MIMD hardware. Systems that allow i/o operations on parallel aggregates, but with a single file pointer [1, 10], force over-synchronization of many applications. This work is being supported by National ....
D. Kotz. Multiprocessor file system interfaces. Technical Report 92--179, Dartmouth College, 1992.
....I/O optimization techniques such as prefetching [17], caching [22,5], and parallel I/O [15,10]. However, there are serious obstacles preventing the file systems from becoming a real solution to the high-level data management problem. First of all, user interfaces of the file systems are low level [21]. They force the users to express access patterns of their codes using file pointers, byte offsets, etc., which do not directly match the applications' data structures, which are large multi-dimensional arrays, images, and so forth. Second, every file system comes with its own set of I/O calls, ....
....[Figure 7. Execution times for read operations; series: Small Chunk, Big Chunk, Without Sub-Filing.] 6. Related Work. There are many proposed techniques for optimizing I/O accesses. These techniques can be divided into three main groups: the parallel file system and run-time system optimizations [21,6,9,17,19,14], compiler optimizations [3,18,15], and application analysis and optimization [18,5,24,15]. The closest work to ours is the one done by Brown et al. [4]. They propose a similar architecture to ours; however, they do not handle the advanced I/O optimizations proposed in this paper. They build ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 194--201. IEEE Computer Society Press, 1993.
....solution to the data management problem. [Figure 1: A typical computational science analysis cycle.] First of all, user interfaces of the file systems are in general low level [21], allowing the users to express access patterns of their applications using only low-level structures such as file pointers and byte offsets. Second of all, nearly every file system has its own suite of I/O commands, rendering the process of porting a program from one machine to another a very ....
....prestaging, prefetching, and computation, thereby maximizing the I/O performance. 6. RELATED WORK. Numerous techniques for optimizing I/O accesses have been proposed in literature. These techniques can be classified into three categories: the parallel file system and run-time system optimizations [21, 7, 9, 18, 20, 15], compiler optimizations [4, 19, 16], and application analysis and optimiza.... Table 8: Total I/O times (in seconds) for volren on 4 processors (data set size is 64 MB). File No: 1, 2, 3, 4. Original: 31.18, 19.20, 61.86, 40.22. Optimized: 11.90, 11.74, 20.10, 18.38. Table 9: Total I/O times (in seconds) for ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 194--201, 1993.
....code and data to make program's access pattern more local. Another system-software-based technique is built upon the file systems and run-time libraries. Several approaches considered extending the traditional Unix file I/O interface for handling the parallel accesses to parallel disk subsystems [18]. The problem with those approaches was that although the Unix-like semantics is convenient for the portability of the existent user applications, the file I/O interfaces derived from the Unix do not aim to get high I/O performance. More recently a number of parallel file systems both from the ....
....parallel I/O accesses from within her program, those file systems also support several I/O modes which take care of the commonly seen file access patterns [28] in explicitly or implicitly parallelized scientific codes. These patterns can be characterized as regular and non-consecutive I/O accesses [18], and can be described by strided interfaces and their variants. Recently a number of run-time libraries for out-of-core computations and a few file interfaces have been proposed [31, 30, 8, 29]. The SIO initiative [29], led by Caltech, proposed a parallel file system programming interface which is ....
D. Kotz. Multiprocessor file system interfaces. In Proceedings of the Second International Conference on Parallel and Distributed Information Systems, pages 194-201, 1993.
....which cannot be satisfied by secondary storage devices and for applications which cannot afford the cost or system complexity of a large number of disk drives. There has been a considerable amount of work in addressing the flow of data to and from secondary storage devices (e.g. magnetic disks) [1, 2, 3, 4, 5, 6, 7, 8, 9]. There has also been a significant amount of work on the management of large-scale data in a storage hierarchy involving tertiary storage devices (e.g. tape devices) [10, 11, 12, 13, 14]. Striping has been studied to improve the response time of tertiary storage devices [15, 16]. The ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second Intl. Conf. on Paral. and Distr. Info. Sys., pages 194--201. IEEE Computer Society Press, 1993.
....by adopting smart I/O optimization techniques such as prefetching [13], caching [5], and parallel I/O [11]. However, there are serious obstacles preventing the file systems from becoming a real solution to the data management problem. First of all, user interfaces of the file systems are low level [17]. They force the users to express access patterns of their codes using file pointers, byte offsets, etc., which do not directly match the applications' data structures, which are large multi-dimensional arrays, images, and so forth. Second, every file system comes with its own set of I/O calls, ....
....I/O times are likely to be huge (even when optimized), an overhead in this range is acceptable. 6. Related Work. There are many proposed techniques for optimizing I/O accesses. These techniques can be divided into three main groups: the parallel file system and run-time system optimizations [17, 6, 8, 13, 16, 15, 10], compiler optimizations [3, 14, 11], and application analysis and optimization [14, 5, 18, 11]. The closest work to ours is the one done by Brown et al. [4]. They propose a similar architecture to ours; however, they do not handle the advanced I/O optimizations proposed in this paper. They build ....
D. Kotz. Multiprocessor file system interfaces. In Proc. the Second International Conference on Parallel and Distributed Information Systems, pages 194--201. IEEE Computer Society Press, 1993.
.... and parallelization of input/output requests is impossible if they are artificially serialized within the file system (e.g. by file pointers). This is one of the primary reasons that the UNIX file system interface, even with extensions, is generally deemed unsuitable as a parallel file system API [14, 39], although it remains popular for compatibility reasons. For example, a program that can utilize subsections of a file in any order cannot express this flexibility through a file system with a byte-stream interface. Relaxing this constraint for applications can dramatically improve performance by ....
Kotz, D. Multiprocessor File System Interfaces. In Proceedings of the Second International Conference on Parallel and Distributed Information Systems (1993), pp. 194--201.
....per second and are expected to need rates of 1 GBps in the near future. By the time teraflop machines with terabyte primary memories become available, the storage requirements may reach petabytes. Scientific applications on uniprocessor computers typically access large files sequentially (Kotz, 1992). 2.1.2 Transaction System Store. Typical transaction systems include automatic bank teller and airline reservation systems. The access in these systems usually comprises a large number of concurrent requests operating on small data items (Ousterhout & Douglis, 1989). The operations are typically ....
Kotz, D. 1992. Multiprocessor File System Interfaces. Pages 149--150 of: USENIX File System Workshop Proceedings.
....Here we extend this work to include automatic mode detection, design details and tradeoffs, and optimizations. Most proposed parallel file systems for MIMD machines give the programmer one view of a file, namely as a single stream of bytes. Some systems, including Vesta [3], PIOUS [19], and others [12], support multifiles, in which a parallel file is broken into multiple subfiles or segments, typically one per physical processor. The CM-200 supports parallel files, in which each physical processor accesses its own subfile [23]. The notion of parallel VP streams is a large-scale generalization ....
D. Kotz. Multiprocessor file system interfaces. In Proceedings of the Second International Conference on Parallel and Distributed Information Systems, pages 194--201, 1993.
....the seek operation is primarily mechanical, and is the costliest component of the disk access time. File System Interfaces: File systems on existing multiprocessors are usually based on the conventional UNIX-like interface, which provides primitives such as open, close, read, write, seek, etc. [Kotz 1992]. Advantages of such systems include transparency of the underlying parallel disk architecture (and therefore portability) and familiarity of the interface for most programmers. However, the conventional interface does not allow the sophisticated programmer to access the underlying parallelism ....
....support the definition of logical views of the data using mapping functions. The programmer could, for example, impose a column-major view on a matrix which was stored internally in row-major order, and the file system would serve up the data in column-major order. A multiopen call is proposed by Kotz [1992] which opens the file for a predefined process group. All processes in the group are then provided with a file handle to access the shared file. The Bridge file system also provides a parallel open call of this nature [Dibble et al. 1988]. In Vesta, the process of gaining access to a file has been ....
David Kotz. Multiprocessor file system interfaces. Technical Report PCS-TR92179, Dartmouth College, May 1992.
....across many disks, which are accessed in parallel. Most extend a traditional file abstraction (a growable, addressable, linear sequence of bytes) with some parallel file access methods. The most common provide I/O modes that specify whether and how parallel processes share a common file pointer [14, 21, 22, 23, 24, 25]. Some systems are based on a memory-mapped interface [26, 27], and two provide a way for the user to specify per-process logical views of the file [28, 29]. Some provide SIMD-style transfers [30, 31, 25, 18]. Finally, in addition to shared file pointers, MPI-IO allows applications to describe a ....
....current interface forces the programmer to break down large parallel I/O activities into small, non-consecutive requests. We believe that a control-parallel model should support strided I/O requests from the programmer's interface to the compute node, and from the compute node to the I/O node [24, 44]. A strided request can effectively increase the request size, which lowers overhead and introduces opportunities for low-level optimization [45]. Future Work. While we believe that low-level workload analyses such as we have conducted are an important first step towards developing parallel file ....
David Kotz, "Multiprocessor file system interfaces", in Proceedings of the Second International Conference on Parallel and Distributed Information Systems, 1993, pp. 194--201.
....Katz [21] and Pasquale and Polyzos [24] studied I/O-intensive Cray applications. Jensen and Reed traced file archive activity on a Cray at NCSA [16]. Experimental studies of I/O from parallel scientific programs running on multiprocessors have been rather limited. Crockett [7] and Kotz and Ellis [18] described hypothetical characterizations of a parallel scientific file system workload. Cormen and Kotz [6] discussed desirable characteristics of parallel I/O algorithms. Reddy et al. [29] studied I/O from parallelized sequential applications, but their applications were handpicked and I/O was ....
....2.2 Existing Parallel File Systems. Existing parallel I/O models are often closely tied to the machine architecture as well as to the programming model. Typically jobs can access files in different I/O modes, which determine how a file pointer is shared among clients running in individual nodes [7, 4, 18]. The HFS [20] and KSR1 [17] file systems use a memory-mapped interface. The nCUBE [9] and Vesta [5] allow more user control over data layout by providing per-process logical views of the data. In PIFS [11] the file system controls which processor handles which part of the file to exploit memory ....
D. Kotz. Multiprocessor file system interfaces. In Proceedings of the Second International Conference on Parallel and Distributed Information Systems, pages 194--201, 1993.