| R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proc. of the 2nd Int. Conf. on Supercomputing, July 1988. |
....by each processor is integrated into the combine phase which is to be executed anyway at the end of a superstep, hence the additional overhead caused by the runtime scheduling of the update messages is marginal. This should be seen in contrast to the classical inspector-executor technique [10] that is applied to the runtime parallelization of loops with irregular array accesses in data-parallel programming environments. The inspector-executor technique applies two subsequent steps at runtime: first, the inspector (usually a sequential version of the loop restricted to address ....
....cross-processor data dependences can be statically determined, point-to-point communication with blocking receive is usually sufficient to ensure relative synchronization and data consistency. Where senders and receivers cannot be determined statically, runtime techniques such as inspector-executor [10] may be applied. In both cases the single thread of control is an important prerequisite. HPF 2 [12] offers limited task parallelism, which is based on group splitting, but within each group there is still a single thread of control. As NestStep is a more general MIMD language, compiler techniques ....
R. Mirchandaney, J. Saltz, R. M. Smith, D. M. Nicol, and K. Crowley. Principles of run-time support for parallel processors. In Proc. 2nd ACM Int. Conf. Supercomputing, pages 140--152. ACM Press, July 1988.
.... For irregular codes, work on run-time techniques for automatic parallelization considers array accesses of loop indices or indirect references as functions of the loop index, scalars that are not defined in the loop body, or arrays indexed by just the loop index (e.g. i 2, i i 3, ia i a [38, 83]). The constraint on the array accesses is essential for being able to discover independent loop iterations that can execute concurrently. With our loose concurrency model all the accesses are local, and some of them may indirectly result in remote reads. Therefore, these can be found at run time ....
R. Mirchandaney, J. H. Saltz, R. M. Smith, D. M. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Conference proceedings on International conference on supercomputing, pages 140--152. ACM, November 1988.
....to describe an unstructured mesh, will also be packed into local memory. For example, a structure that holds the vertex numbers for each element will locally contain global vertex numbers for each locally numbered element. Global-to-local pointers are consequently required in inspector loops [5] used to renumber the contents of such local pointer arrays into local mesh numbering. But a global-to-local pointer would require a globally dimensioned array and so a scalable solution is required. With locally renumbered sub-domains (core and overlap) on each processor, concurrent execution ....
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proc. Second Int. Conf. on Supercomputing, July 1988.
....added to ensure the semantics of the output code. Nevertheless, if we can prove the dividend is positive, we can use the Fortran division. 6 Related work Techniques to generate distributed code from sequential or parallel code using a uniform memory space have been extensively studied since 1988 [22, 70, 89]. Techniques and prototypes have been developed based on Fortran [38, 39, 47, 18, 69, 88, 19, 20], C [8, 63, 6, 60, 7, 61] or other languages [74, 75, 58, 66, 57]. The most obvious, most general and safest technique is called run-time resolution [22, 70, 74]. Each instruction is guarded by a ....
.... been extensively studied since 1988 [22, 70, 89]. Techniques and prototypes have been developed based on Fortran [38, 39, 47, 18, 69, 88, 19, 20], C [8, 63, 6, 60, 7, 61] or other languages [74, 75, 58, 66, 57]. The most obvious, most general and safest technique is called run-time resolution [22, 70, 74]. Each instruction is guarded by a condition which is only true for processors that must execute it. Each memory address is checked before it is referenced to decide whether the address is local and the reference is executed, whether it is remote, and a receive is executed, or whether it is ....
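The run-time resolution scheme described in the excerpt above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the block distribution, the function name `run_time_resolution`, the statement `a[i] = b[idx[i]]`, and the "owner sends" branch (the excerpt is truncated before its third case) are all hypothetical choices made for illustration, not taken from the cited papers.

```python
def run_time_resolution(b, idx, num_procs, block_size):
    """Simulate a[i] = b[idx[i]] where every processor evaluates ownership
    guards for every statement instance (hypothetical block distribution)."""

    def owner(global_index):
        # Assumed block distribution: consecutive blocks go to consecutive ranks.
        return global_index // block_size

    a = [None] * len(idx)
    messages = 0
    for i in range(len(idx)):
        mailbox = {}  # destination rank -> value in flight for this statement
        # Every processor checks the guards; only the owner of the operand
        # that is needed elsewhere sends it (assumed third case).
        for p in range(num_procs):
            if owner(idx[i]) == p and owner(i) != p:
                mailbox[owner(i)] = b[idx[i]]
                messages += 1
        # Only the owner of a[i] executes the assignment.
        for p in range(num_procs):
            if owner(i) == p:
                if owner(idx[i]) == p:
                    a[i] = b[idx[i]]        # local reference
                else:
                    a[i] = mailbox.pop(p)   # remote reference: receive
    return a, messages


b = [10, 11, 12, 13, 14, 15, 16, 17]
idx = [0, 5, 3, 5, 7, 1, 2, 6]
result, message_count = run_time_resolution(b, idx, num_procs=2, block_size=4)
```

In this sketch every remote reference costs its own message and every processor tests every guard, which is why the scheme is general and safe but slow.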
Ravi Mirchandaney, Joel S. Saltz, Roger M. Smith, David M. Nicol, and Kay Crowley. Principles of Runtime Support for Parallel Processors. In ACM International Conference on Supercomputing, pages 140--152, July 1988.
....with the maximum improvements in Figure 18 and the numbers of transformations also match with the numbers in periodic reordering. Our on-the-fly cost model thus works well in practice with limited information. 6 Related Work The importance of irregular scientific computations is well established [34, 41, 27, 36]. Irregular reductions have been recognized as being particularly vital [21, 31, 38, 44]. Researchers have investigated compiler analyses [28, 29] and ways to provide efficient run-time and compiler support [6, 39] as well as efficient implementations of parallel irregular reductions [14, 16, ....
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proceedings of the Second International Conference on Supercomputing, St. Malo, France, July 1988.
.... is a contribution to the state of the art of compiling programs in languages like FORTRAN D that permit user-defined data decomposition for parallel machines with a memory hierarchy, which is the goal of a number of projects including Parascope, Superb, Id Nouveau, Crystal, and other projects [7, 12, 14, 19, 23, 26, 31, 34, 39]. The emphasis in these projects has been on code generation mechanisms (such as the ownership rule discussed in Section 2) and on recognizing and exploiting special patterns of computation and communication such as reductions. Although it is well-known that loop restructuring before code ....
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proc. of the 2nd Int. Conf. on Supercomputing, July 1988.
....It is necessary because Fortran remainder is not positive for negative numbers. It was added to ensure the semantics of the output code. 6 Related work Techniques to generate distributed code from sequential or parallel code using a uniform memory space have been extensively studied since 1988 [9, 28, 40, 31, 15, 22, 16, 6, 25, 2, 21, 27, 32, 23, 5, 19, 4, 8, 11, 39]. The most obvious, most general and safest technique is called run-time resolution [9, 28, 31]. Each instruction is guarded by a condition that is true for processors that must execute it. Each memory address is checked before it is referenced to decide whether the address is local and the ....
.... to generate distributed code from sequential or parallel code using a uniform memory space have been extensively studied since 1988 [9, 28, 40, 31, 15, 22, 16, 6, 25, 2, 21, 27, 32, 23, 5, 19, 4, 8, 11, 39]. The most obvious, most general and safest technique is called run-time resolution [9, 28, 31]. Each instruction is guarded by a condition that is true for processors that must execute it. Each memory address is checked before it is referenced to decide whether the address is local and the reference is executed, whether it is remote, and a receive is executed, or whether it is remotely ....
Ravi Mirchandaney, Joel S. Saltz, Roger M. Smith, David M. Nicol, and Kay Crowley. Principles of Runtime Support for Parallel Processors. In International Conference on Supercomputing, pages 140--152, July 1988.
....the maximum improvements in Figure 24 and the numbers of transformations also match with the numbers in periodic reordering. Our on-the-fly cost model thus works well in practice with limited information. 8 Related Work The importance of irregular scientific computations is well established [35, 42, 28, 37]. Irregular reductions have been recognized as being particularly vital [21, 32, 39, 47]. Researchers have investigated compiler analyses [29, 30] and ways to provide efficient run-time and compiler support [6, 40] as well as efficient implementations of parallel irregular reductions [14, 16, ....
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proceedings of the Second International Conference on Supercomputing, St. Malo, France, July 1988.
....to a statement instantiation (footnote 2) in order to post-visualize the control flow of a program. • Debugging and trace-based tools ([15]) might require frequency information and/or true ratios of program branches in order to detect program portions which are never executed. Existing profilers ([14, 22, 3, 13, 20]) could not be used for the following reasons: (Footnote 1: In this paper, parallel programs refers to Single Program Multiple Data parallel programs only. Footnote 2: A statement instantiation corresponds to a single statement execution.) 2 The Weight Finder: An Advanced Profiler for Fortran Programs • Most ....
....Computing. Cambridge University Press, 1988. [20] V. Sarkar. Determining Average Program Execution Times. In ACM International Conference on Supercomputing, 1989. [21] V. Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessor. The MIT Press, Cambridge, Massachusetts, 1989. [22] E. Satterthwaite. Debugging Tools for High Level Languages. Software Practice and Experience, 2) 197 217, 1972. [23] K.Y. Wang. A Performance Prediction Model for Parallel Compilers. Technical Report, Computer Science Dept. Purdue University, November 1990. Technical Report CSD TR1041, ....
[Article contains additional citation context not shown here]
R. Mirchandaney, J.H. Saltz, R.M. Smith, D.M. Nicol, K. Crowley. Principles of Runtime Support for Parallel Processors. In Proceedings of the ACM, 1988.
....accessed are unknown. However, a combination of compile-time analysis and run-time processing can be applied to optimize communication. If no loop-carried true dependences are present, inspectors and executors may be created at compile time during code generation to combine messages at run time [33, 39]. The inspector performs the equivalent of message coalescing and aggregation at run time. The executor then utilizes collective communication specialized for irregular computations. Special all-to-all scatter and gather routines collect all the nonlocal data with a small number of messages. ....
....principles apply, but in a different form, for irregular or adaptive computations on sparse matrices. Compilers such as Arf [48] and Kali [33] support irregular computations by creating inspectors and executors for each loop nest to detect and combine messages for nonlocal accesses at run time [39]. This approach may be viewed as a sophisticated version of message vectorization, coalescing, and aggregation. Finally, two groups have analyzed the scalability of optimizations to exploit pipeline parallelism. Naik explores parallelization and optimization of pipelined computations in the ....
Mirchandaney, R., Saltz, J., Smith, R., Nicol, D., and Crowley, K. Principles of runtime support for parallel processors. In Proceedings of the Second International Conference on Supercomputing (St. Malo, France, July 1988).
.... execution and for block transfers, while keeping data accesses local wherever possible [8]. Loop restructuring is followed by a code generation phase that generates parallel code and makes use of block transfers [7]. Compiling for distributed memory machines is the goal of a number of projects [2, 3, 14, 12, 6, 5, 11, 4]. Existing work focuses on code generation techniques, not on loop restructuring, although the importance of loop restructuring in generating good code for vector and parallel machines is widely recognized. Recent work on generalized loop transformations can be found in [1, 13, 10]. 2 Lambda Loop ....
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proc. of the 2nd Int. Conf. on Supercomputing, July 1988.
....repeats across multiple iterations. This feature can be exploited by providing a run-time library which analyzes the structure of the irregular accesses before actual computation in a preprocessing step and generates optimized communication. This is typically referred to as an inspector-executor [1] paradigm. Runtime libraries like CHAOS/PARTI [2] and PILAR [3] simplify implementation of this preprocessing step and subsequent inter-processor communication. We refer to applications containing a mixture of regular and irregular accesses as mixed regular-irregular applications. While most ....
R. Mirchandaney, J. Saltz, R. M. Smith, D. M. Nicol and Kay Crowley, Principles of Run-time Support for Parallel Processors, Proceedings of the 1988 ACM International Conference on Supercomputing, pages 140-152, July 1988.
....This paper presents an irregular force decomposition algorithm that maintains good load balance and achieves good performance. The CHAOS library is designed to facilitate parallelization of irregular applications on distributed memory multiprocessor systems. It is a superset of the PARTI library [7]. This library (1) couples partitioners to the application programs, (2) remaps data and partitions work among processors, and (3) optimizes interprocessor communications. This implementation of parallel CHARMM presents an application of CHAOS that can be used to support efficient execution of ....
....a heuristics test that can be used to decide when it is necessary to regenerate the list. 3 CHAOS Runtime Library The CHAOS library is a set of software primitives that are designed to efficiently handle irregular problems on distributed memory systems. It is a superset of the PARTI library [7]. These primitives have been designed to ease the implementation of computational problems on parallel architecture machines by relieving users of low level machine specific issues. The design philosophy has been to leave the original (sequential) source codes essentially unaltered, with the ....
R. Mirchandaney, J. H. Saltz, R. M. Smith, D. M. Nicol, and Kay Crowley. Principles of runtime support for parallel processors. In Proceedings of the 1988.
No context found.
R. Mirchandaney, J. H. Saltz, R. M. Smith, D. M. Nicol and Kay Crowley, `Principles of runtime support for parallel processors', Proceedings of the 1988 ACM International Conference on Supercomputing, July 1988, pp. 140--152.
.... earlier work, we have developed runtime support libraries, analysis techniques, and compiler prototypes that can handle loops where distributed arrays are accessed through a single level of indirection [10, 18, 24]. Such loops can be transformed into two constructs: an inspector and an executor [21]. During program execution, the inspector examines the data references made by a processor and calculates what off-processor data need to be fetched and where these data will be stored once they are received. The executor loop then uses the information from the inspector to implement the actual ....
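The inspector and executor roles described in the excerpt above can be sketched as a small single-process simulation. This is an illustrative sketch only: the block distribution, the names `inspector`, `executor`, `owner`, and `fetch`, and the ghost-buffer layout are assumptions made here, not the interface of PARTI, CHAOS, or any library named on this page.

```python
def owner(global_index, block_size):
    """Rank owning a global index under an assumed block distribution."""
    return global_index // block_size


def inspector(index_array, my_rank, block_size):
    """Scan the indirection array once. Return the list of off-processor
    global indices to fetch (each listed once) and, for every reference,
    where its value will be found at execution time."""
    ghost_slot = {}   # global index -> slot in the ghost buffer
    translated = []   # ("local", offset) or ("ghost", slot) per reference
    for g in index_array:
        if owner(g, block_size) == my_rank:
            translated.append(("local", g - my_rank * block_size))
        else:
            # setdefault evaluates len() before inserting, so a new index
            # receives the next free slot and a repeated index reuses its slot.
            slot = ghost_slot.setdefault(g, len(ghost_slot))
            translated.append(("ghost", slot))
    schedule = list(ghost_slot)  # insertion order matches slot numbers
    return schedule, translated


def executor(local_block, fetch, schedule, translated):
    """Fetch each off-processor element once, then run the loop body
    (here simply reading x[idx[i]]) against local and ghost storage."""
    ghost = [fetch(g) for g in schedule]  # stands in for one aggregated gather
    return [local_block[pos] if kind == "local" else ghost[pos]
            for kind, pos in translated]


# Hypothetical data: 8 elements over 2 ranks; this process plays rank 0.
x_global = [10, 11, 12, 13, 14, 15, 16, 17]
idx = [0, 5, 3, 5, 7]
schedule, translated = inspector(idx, my_rank=0, block_size=4)
values = executor(x_global[0:4], lambda g: x_global[g], schedule, translated)
```

Because the schedule depends only on the indirection array, it can be reused across iterations as long as that array does not change, which is the usual argument for paying the inspector cost once.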
R. Mirchandaney, J. H. Saltz, R. M. Smith, D. M. Nicol, and Kay Crowley. Principles of runtime support for parallel processors. In Proceedings of the
....This paper presents an irregular force decomposition algorithm that maintains good load balance and achieves good performance. The CHAOS library is designed to facilitate parallelization of irregular applications on distributed memory multiprocessor systems. It is a superset of the PARTI library [7]. This library (1) couples partitioners to the application programs, (2) remaps data and partitions work among processors, and (3) optimizes interprocessor communications. This implementation of parallel CHARMM presents an application of CHAOS that can be used to support efficient execution of ....
....a heuristics test that can be used to decide when it is necessary to regenerate the list. 3 CHAOS Runtime Library The CHAOS library is a set of software primitives that are designed to efficiently handle irregular problems on distributed memory systems. It is a superset of the PARTI library [7]. These primitives have been designed to ease the implementation of computational problems on parallel architecture machines by relieving users of low level machine specific issues. The design philosophy has been to leave the original (sequential) source codes essentially unaltered, with the ....
R. Mirchandaney, J. H. Saltz, R. M. Smith, D. M. Nicol, and Kay Crowley. Principles of runtime support for parallel processors. In Proceedings of the 1988 ACM International Conference on Supercomputing, pages 140--152, July 1988.
....depends on the input data, typically because of some indirection in the code. In this case, it is not possible to predict at compile time which data must be prefetched. We treat this lack of information by transforming the original parallel loop into two constructs called inspector and executor [9, 10]. During program execution, the inspector examines the data references made by a processor, and calculates which off-processor data need to be fetched and where these data will be stored once they are received. The executor loop then uses the information from the inspector to perform the actual ....
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proceedings of the Second International Conference on Supercomputing, St. Malo, France, July 1988.
.... for dense matrix computations are, meanwhile, well understood and sufficiently solved (e.g. [3, 4, 5]), these problems are, for sparse matrix computations, solved, if at all, in a very conservative way (e.g. by run-time parallelization techniques such as the inspector-executor method [6] or run-time analysis of sparsity patterns for load-balanced array distribution [7]). This is not astonishing because such code looks quite awful to the compiler, consisting of indirect array indexing or pointer dereferencing which makes exact static access analysis impossible. For automatic ....
.... the back end should generate two variants of parallel code for the program sections involved: (1) an optimized parallel sparse matrix algorithm (e.g. a library routine) which is executed speculatively, and (2) a conservative parallelization, maybe using the inspector-executor technique [6], or just sequential code, which is executed non-speculatively. These two code variants may even be executed concurrently and overlapped with the evaluation of run-time tests: If the testing processors find out during execution that the hypothesis allowing speculative execution was wrong, they ....
R. Mirchandaney, J. Saltz, R.M. Smith, D.M. Nicol, and K. Crowley. Principles of run-time support for parallel processors. In Proc. 2nd ACM Int. Conf. on Supercomputing, pages 140--152. ACM Press, July 1988.
.... of automatic parallelization for dense matrix computations are, meanwhile, well understood and sufficiently solved, these problems have been attacked for sparse matrix computations only in a very conservative way, e.g. by run-time parallelization techniques such as the inspector-executor method [25] or run-time analysis of sparsity patterns for load-balanced array distribution [36]. This is not astonishing because such code looks quite awful to the compiler, consisting of indirect array indexing or pointer dereferencing which makes exact static access analysis impossible. In this paper we ....
....is large. For automatic parallelization, the back end should generate two variants of parallel code for the recognized program fragments: (1) an optimized parallel library routine that is executed speculatively, and (2) a conservative parallelization, maybe using the inspector-executor technique [25], or just sequential code, which is executed non-speculatively. These two code variants may even be executed concurrently and overlapped with the evaluation of run-time tests: If the testing processors find out during execution that the hypothesis allowing speculative execution was wrong, they ....
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of run-time support for parallel processors. In Proc. 2nd ACM Int. Conf. on Supercomputing, pp. 140--152. ACM Press, July 1988.
No context found.
R. Mirchandaney, J. Saltz, R.M. Smith, D.M. Nicol and K. Crowley. Principles of run-time support for parallel processors. Proc. 1988 ACM International Conference on Supercomputing, pages 140-152, July, 1988.
No context found.
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proc. of the 2nd Int. Conf. on Supercomputing, July 1988.
No context found.
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proc. of the 2nd Int. Conf. on Supercomputing, July 1988.
No context found.
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proceedings of the Second International Conference on Supercomputing, St. Malo, France, July 1988.
No context found.
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proceedings of the Second International Conference on Supercomputing, pages 140--152, St. Malo, France, July 1988. ACM Press.
No context found.
R. Mirchandaney, J. Saltz, R. Smith, D. Nicol, and K. Crowley. Principles of runtime support for parallel processors. In Proceedings of the Second International Conference on Supercomputing, pages 140--152, St. Malo, France, July 1988.