R. Chandra, A. Gupta and J. Hennessy. Data Locality and Load Balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, 1993.
....such as central ready queues [7] also focus solely on processor load. Others, however, incorporate the benefits of co-located data into scheduling decisions. These approaches explore the benefits of not migrating threads whose data is resident in the local cache [6][8] and/or local memory [1][4]. Research in data migration focuses on co-locating data with its accessing threads either by migrating or replicating pages in memory [5][9]. III. A VECTOR MODEL FOR MIGRATION The conflicting goals of improving locality and distributing resource demands can be characterized as vectors. An ....
R. Chandra, A. Gupta, and J. Hennessy, "Data Locality and Load Balancing in COOL," in Proc. of the 4th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, San Diego, CA, pp. 249--259, May 1993.
....scheduler is very similar to the schedulers in some of these other systems, though Cilk's algorithm uses randomness and is provably efficient. Many multithreaded programming languages and runtime systems are based on heuristic scheduling techniques. Though systems such as Charm [91], COOL [27, 28], Id [3, 37, 80], Olden [22], and others [29, 31, 38, 44, 54, 55, 63, 88, 98] are based on sound heuristics that seem to perform well in practice and generally have wider applicability than Cilk, none are able to provide any sort of performance guarantee or accurate machine-independent performance ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 249--259, San Diego, California, May 1993.
.... access and synchronization into a single mechanism is not unlike that provided by monitors, a linguistic mechanism suggested by Hoare [28] and Brinch Hansen [8], or other linguistic mechanisms that integrate synchronization and data access (e.g., mutexes in Argus [53], mutex operations in COOL [15], etc.). 34 CRL Internals This chapter describes the general structure of the prototype CRL implementation used in this thesis. Platform-specific implementation details are discussed in Chapter 5. The prototype CRL implementation supports single-threaded applications in which a single user ....
.... and unmapping regions, the programming interface CRL presents to the end user is similar to that provided by Shared Regions [68]; the same basic notion of synchronized access ("operations") to regions ("objects") also exists in other programming systems for hardware-based DSM systems (e.g., COOL [15]). The Shared Regions work arrived at this interface from a different set of constraints, however: their goal was to provide software coherence mechanisms on machines that support non-cache-coherent shared memory in hardware. CRL could be provided on such systems using similar implementation ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data Locality and Load Balancing in COOL. In Proceedings of the Fourth Symposium on Principles and Practices of Parallel Programming, pages 249--259, May 1993.
....loop that has been parallelized, and in which this loop is repetitively executed. This is a common shared-memory program structure in practice. Integrating both affinity and load-balancing considerations in scheduling has also been considered elsewhere in the context of object-oriented systems [3], and in the operating system rather than at the user level [17]. A context of unbalanced workloads is considered, in which the parallel loop being executed is severely unbalanced with respect to the amounts of computation represented by each iteration, a situation in which simple forms of ....
R. Chandra, A. Gupta, J. L. Hennessy, "Data Locality and Load Balancing in COOL", Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991, pp. 249-259.
.... takes significantly more time to perform the stencil computation in the application than does the serial version [Rinard 1994a]. On DASH, these two factors make the Jade version perform significantly worse than a version of the same computation written in the explicitly parallel language COOL [Chandra et al. 1993]. The Jade version achieves a maximum speedup of 7.57 on 28 processors, while the COOL version achieves a maximum speedup of 21.06 on 28 processors. 7. RELATED WORK Researchers have developed an enormous number of parallel programming languages [America 1987; Bal et al. 1992; Burns 1988; ....
....a maximum speedup of 7.57 on 28 processors, while the COOL version achieves a maximum speedup of 21.06 on 28 processors. 7. RELATED WORK Researchers have developed an enormous number of parallel programming languages [America 1987; Bal et al. 1992; Burns 1988; Carriero and Gelernter 1989; Chandra et al. 1993; Foster and Taylor 1990; Gregory 1987; Hoare 1985; INMOS Limited 1984; Krishnamurthy et al. 1992; Reppy 1992; Yonezawa et al. 1986]. There is a fundamental difference between Jade and almost all of these languages: Jade is a declarative language used to provide information about how a serial ....
Chandra, R., Gupta, A., and Hennessy, J. 1993. Data locality and load balancing in COOL. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, New York.
....is that the parts of the program commute. The goal is to help the implementation exploit concurrency while ensuring that the parallelization does not change the semantics of the program. Many concurrent object-oriented languages support the notion of mutually exclusive operations on objects [Chandra et al. 1993; Yonezawa et al. 1986]. Although the concept of commuting operations is never explicitly identified, the expectation is that all mutually exclusive operations that may attempt to concurrently access the same object commute. Unlike implementations of Jade and concurrent object-oriented ....
CHANDRA, R., GUPTA, A., AND HENNESSY, J. 1993. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, San Diego, CA.
....1. INTRODUCTION Multithreading is a key structuring technique for modern software. Programmers use multiple threads of control for many reasons: to build responsive servers that communicate with multiple parallel clients [15], to exploit the parallelism in shared-memory multiprocessors [5], to produce sophisticated user interfaces [16], and to enable a variety of other program structuring approaches [11]. Research in program analysis has traditionally focused on sequential programs [14]; extensions for multithreaded programs have usually assumed a block-structured, parbegin ....
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
....Chapter 0 Introduction Aim and ambition of workpackage 2.1 is to identify, discuss and evaluate those characteristics of workloads [19, 7] which support the exploitation of affinity scheduling in a distributed system [37, 38, 20, 10, 39]. Our primary interest lies in transaction-based, database-manipulating loads (DB/DC loads) which are regarded as typical for a commercial application of distributed systems. Identification and discussion of workload characteristics are completed in the first project year with the results ....
R. Chandra, A. Gupta, and J. L. Hennessy. Data locality and load balancing in COOL. In PPOPP-93: principles and practice of parallel programming: proceedings of the fourth ACM SIGPLAN Symposium. San Diego, CA. May 19-22, pages 249--259, 1993.
....code has taken higher relevance since multiprocessors have been introduced in the industrial market. Applications can be rewritten in order to make them parallel or can be automatically restructured by a parallelizing compiler. Extensive work is now being developed in this area. [Engl93][Blum95][Chan92] have proposed new languages or extensions (mainly based on C and C++) to help in the parallelization of applications by hand. They are based on asynchronous parallel functions that let the caller thread continue while the function is executed by another thread of control. Programmers have to ....
Rohit Chandra, Anoop Gupta and John L. Hennessy, "Data Locality and Load Balancing in COOL", Fourth ACM SIGPLAN Symposium on the Principles and Practice of Parallel Programming (PPoPP), pp. 249-259, May 1993.
....is being followed by the evolution of the parallel execution environments to take advantage of such new architectural characteristics. Both the user and kernel execution levels are the subject of current research works, trying to achieve a good cooperation between them and with the hardware [2][3][5][7][4][16]. These are the trends in which we have been interested during the development of the NANOS LTR ESPRIT Project (E 21907) [11]. In the Nanos Project, we have focused on the development of a complete parallel environment to support the execution of multiprogrammed workloads. The goal ....
R. Chandra, A. Gupta and J. L. Hennessy, "Data Locality and Load Balancing in COOL", Fourth ACM SIGPLAN Symposium on the Principles and Practice of Parallel Programming (PPoPP), pp. 249-259, May 1993.
....lightweight threads packages written for shared-memory machines. In particular, we are interested in implementing a scheduler that efficiently supports dynamic and irregular parallelism. 2.1 Scheduling lightweight threads A variety of lightweight, user-level threads systems have been developed [6, 11, 14, 15, 25, 29, 33, 37, 40, 45, 53], including mechanisms to provide coordination between the kernel and the user-level threads library [2, 49, 31]. Although the main goal of the threads schedulers in previous systems has been to achieve good load balancing and/or locality, a large body of work has also focused on developing ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, pages 249--259, May 1993.
.... [Footnote: This research has been supported by the Ministry of Education of Spain under contracts TIC 429 95 and TIC 439 94, ESPRIT project NANOS (21907), the CIRIT under program ACI, and by the CEPBA (European Center for Parallelism of Barcelona).] package (Filaments [EAD93], Cilk [BJK+95], COOL [CGH93], Concert [KC93]). Programming models in the first group offer a loosely synchronous programming model in which parallel jobs can be executed fully in parallel and synchronize at global points (by means of barriers or critical sections). Services included in most user-level thread packages allow ....
R. Chandra, A. Gupta, and J.L. Hennessy. Data locality and load balancing in COOL. In Fourth ACM SIGPLAN Symposium on the Principles and Practice of Parallel Programming (PPoPP'93), 1993.
....None. Availability: COOL is available for the following architectures: Stanford DASH, Silicon Graphics 4D/380, and Encore Multimax. Sources and documentation can be found on anonymous ftp: cool.stanford.edu. Email address: Rohit Chandra, rohit@cool.stanford.edu. References: [55][56][57]. 2.34 Coral Developer: IBM Palo Alto Scientific Center Description: oo. Multiple inheritance. memory model. parallelism. Asynchronous message passing. scheduling. mapping. Nothing is published about object thread placement, alignment etc. synchronization. The author states the ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In ACM Sigplan Symp. on Principles and Practice of Parallel Programming, pages 249--259. ACM Press, New York, September 7--8, 1993.
....high performance from modern parallel systems. Several researchers have studied techniques to improve the data locality of multithreaded programs. One class of such techniques is based on software-controlled distribution of data among the local memories of a distributed shared memory system [15, 22, 26]. Another class of techniques is based on hints supplied by the programmer so that similar tasks might be executed on the same processor [15, 31, 34]. Both these classes of techniques rely on the programmer or compiler to determine the data access patterns in the program, which may be very ....
.... One class of such techniques is based on software-controlled distribution of data among the local memories of a distributed shared memory system [15, 22, 26]. Another class of techniques is based on hints supplied by the programmer so that similar tasks might be executed on the same processor [15, 31, 34]. Both these classes of techniques rely on the programmer or compiler to determine the data access patterns in the program, which may be very difficult when the program has complicated data access patterns. Perhaps the earliest class of techniques was to attempt to execute threads that are close ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 249--259, San Diego, California, May 1993.
.... group, info@chorus.com. COOL (NTT) ACOOL [167]: ftp: ftp.ntt.jp/pub/lang; Katsumi Maruyama, maruyama@nttmfs.ntt.jp. COOL (Stanford) [64, 65, 66]: ftp: cool.stanford.edu; Rohit Chandra, rohit@cool.stanford.edu. Coral [69] .... [The feature-matrix symbols of the original table did not survive extraction.]
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In ACM Sigplan Symp. on Principles and Practice of Parallel Programming, pages 249--259. ACM Press, New York, September 7--8, 1993.
....Table 8: Scheduling statistics in CA programs. 5 Related work Our work is related to many efforts focusing on runtime support for efficiently executing irregular computations on stock hardware [19, 20]. It differs from runtime systems for coarse-grained object-oriented languages such as COOL [21] and Mentat [22] by focusing on fine-grained object-level concurrency. The ABCL/onAP1000 implementation [18] is most similar to our work but adopts a traditional design, emphasizing techniques for reducing access latency and object creation costs. The general versions in Concert are as efficient ....
R. Chandra, A. Gupta, and J. L. Hennessy, "Data locality and load balancing in COOL," in Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1993.
....involve large costs to change object properties. Consequently, they appear to be worthwhile only when many objects on a particular node exhibit affinity for another object. The effectiveness of hints is currently unknown, but some related studies show that locality hints can improve performance [6]. While speculative optimizations can dramatically increase the opportunities for optimization, they are limited to cases where the possibilities can be narrowed and specialized code generated for each. When program structure depends strongly on input data or the evolution of computation, static ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1993.
....by means of annotations. Several suggestions have been made; only the most influential are mentioned. In the Emerald system [9], objects can be fixed to other objects, which advises the system to store them together; unfixed objects can be migrated by the system. Similarly, the COOL [5] programmer can provide locality hints by expressing affinity between objects and threads. In addition there are hierarchical schemes, e.g. zones [23], that avoid point-to-point relationships. In Jade [21] the programmer declaratively provides information about how the program accesses data ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In ACM Sigplan Symp. on Principles and Practice of Parallel Programming, pages 249--259. ACM Press, New York, September 7--8, 1993.
....The proposed technique in this paper not only takes into consideration the affinity of parallel tasks to processors, it also uses information on the underlying cache architecture and memory reference patterns of tasks to minimize cache misses and false sharing. In the design of the COOL language [5], the locality exploitation issue is addressed by using language mechanisms and a run time system. Both task affinity and data affinity are specified by users and then are implemented by the run time system. A major limit with this approach is that the quality of locality optimizations totally ....
R. Chandra, A. Gupta, and J. L. Hennessy. Data locality and load balancing in COOL. Proceedings of PPOPP'93, pages 249--259, May 1993.
....the load. This technique effectively increases scheduling granularity, and therefore provides good locality [7] and low scheduling contention. Another approach for obtaining good locality is to allow the user to supply hints to the scheduler regarding the data access patterns of the threads [12, 28, 37, 45]. However, such hints can be cumbersome for the user to provide in complex programs, and are often specific to a certain language or library interface. Therefore, our DFDeques algorithm instead uses the heuristic of scheduling threads close in the dag on the same processor to obtain good ....
R. Chandra, A. Gupta, and J. L. Hennessy. Data locality and load balancing in COOL. In Proc. ACM Symp. Principles & Practice of Parallel Programming, pages 249--259, 1993.
....problem is the communication and computation overhead associated with managing tasks, not communication overhead caused by an inappropriate data decomposition. It is possible for a programmer using an explicitly parallel language to get much better performance for Ocean running on DASH (Chandra et al. 1993). The programmer can develop an application specific synchronization algorithm that has less overhead than the algorithm embedded inside the Jade implementation. It would be possible, however, to develop a Jade implementation that used static analysis to generate optimized parallel code that ....
Chandra, R., Gupta, A. and Hennessy, J. (1993) Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May.
....the load. This technique effectively increases scheduling granularity, and therefore provides good locality [7] and low scheduling contention. Another approach for obtaining good locality is to allow the user to supply hints to the scheduler regarding the data access patterns of the threads [12, 28, 37, 45]. However, such hints can be cumbersome for the user to provide in complex programs, and are often specific to a certain language or library interface. Therefore, our DFDeques algorithm instead uses the heuristic of scheduling threads close in the dag on the same processor to obtain good locality. ....
R. Chandra, A. Gupta, and J. L. Hennessy. Data locality and load balancing in COOL. In Proc. ACM Symp. Principles & Practice of Parallel Programming, pages 249--259, 1993.
.... Work Except for the notion of mapping and unmapping regions, the programming interface CRL presents to the end user is similar to that provided by Shared Regions [35]; the same basic notion of synchronized access ("operations") to regions ("objects") also exists in other DSM programming systems [3, 11]. The Shared Regions work arrived at this interface from a different set of constraints; their goal was to provide software coherence mechanisms on machines that support non-cache-coherent shared memory in hardware. CRL could be provided on such systems using the same implementation techniques and ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data Locality and Load Balancing in COOL. In Proceedings of the Fourth Symposium on Principles and Practices of Parallel Programming, pages 249--259, May 1993.
....in the Jade implementation always agree on the target processor. These results suggest that it should be possible to improve the Jade scheduler by making it less eager to move tasks off their target processors in an attempt to improve the load balance. 6. Related Work. Chandra, Gupta and Hennessy [1] have designed, implemented and measured a scheduling algorithm for the parallel language COOL running on DASH. The goal of the scheduling algorithm is to enhance the locality of the computation while balancing the load. COOL provides an affinity construct that programmers use to provide hints ....
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
....of sequential code has taken higher relevance since multiprocessors have been introduced in the industrial market. Applications can be rewritten in order to make them parallel or can be automatically restructured by a parallelizing compiler. Extensive work is now being developed in this area. [3][4][5] have proposed new languages or extensions (mainly based on C and C++) to help in the parallelization of applications by hand. They are based on asynchronous parallel functions that let the caller thread continue while the function is executed by another thread of control. Programmers have ....
R. Chandra, A. Gupta and J. L. Hennessy, "Data Locality and Load Balancing in COOL", Fourth ACM SIGPLAN Symp. on the Principles and Practice of Parallel Programming (PPoPP), pp. 249-259, May 1993.
....and highly expressive object-oriented primitives. Another concern about the interface design is that all works we know about assume some message-based programming model. Apparently, there is no research on how to express parallel I/O requests on thread-based models, such as Cilk [Blumofe 95], COOL [Chandra 93], or OpenMP [OpenMP 97]. Since some parallel applications are much easier to program by creating threads on the fly, how such applications should express parallel I/O operations is an important (and untouched) question. Yet another key and as yet unanswered question is how to combine performance and ....
R. Chandra, A. Gupta, and J. L. Hennessy. Data Locality and Load Balancing in COOL. Fourth ACM SIGPLAN PPoPP, pp. 249-259, San Diego, CA, May 1993. http://www-flash.stanford.edu/cool/ppopp.ps
....Code Generator Back End gnu C executable code for 88k and PPC Figure 1: The structure of the Implicitly Parallel C compiler. There are also many C-like parallel languages in existence, such as C (and Parallel C) [16], C [18], C with futures [21], CC [6], Split C, Mentat [8], COOL [5], Charm , Enterprise, ESP, Hypertool, Jade, MeldC, PC, pC , C , X3H5, and Cid. Our work differs from all of these in that we have the only compiler that extracts parallelism automatically from C source without any (required) language extensions or explicit parallel constructs or libraries. 4 ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In Symposium on Parallel Algorithms and Architectures. Association for Computing Machinery, 1993.
....6.5.3) (footnote 2). The access pattern of this algorithm follows a producer-consumer relationship, where one processor is responsible for creating new values of a data object that is consumed by another. In the example, there are two globally shared integers, a and b. Two processors, x and y, are randomly chosen from the set of processors, and x and y are assigned the duty of updating the locations a and b, .... Footnote 1: Variations on this philosophy have been expressed by other researchers [16, 20]. Footnote 2: The example is derived from a description in Falsafi et al. [34].
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Principles and Practice of parallel programming, May 1993.
.... directives and language constructs (the effort done by the Parallel Computing Forum PCF, or High Performance Fortran Forum HPF, or the directives provided by most hardware vendors) or by means of a set of services offered by a user-level thread package (Filaments [EAD93], Cilk [BJK+95], COOL [CGH93], Concert [KC93]). [Footnote: This research has been supported by the Ministry of Education of Spain under contracts TIC 429 95 and TIC 439 94, ESPRIT project NANOS (21907), and by the CEPBA (European Center for Parallelism of Barcelona).] Programming models in the first group offer a loosely ....
R. Chandra, A. Gupta, and J.L. Hennessy. Data locality and load balancing in COOL. In Fourth ACM SIGPLAN Symposium on the Principles and Practice of Parallel Programming (PPoPP'93), 1993.
.... communication constructs distributed throughout the source code [15]. Many parallel languages, on the other hand, fail to give the implementation the data usage information that it needs to automatically apply sophisticated communication optimizations on a variety of computational platforms [10, 2, 6]. This paper describes our experience with communication optimizations in the context of Jade [19, 20], a portable, implicitly parallel programming language designed for exploiting task-level concurrency. Jade programmers start with a program written in a standard serial, imperative language, then ....
....set, we believe it is appropriate to include the concurrent fetch optimization in future implementations. It fits naturally into the execution model, should in practice never impair the performance, and may prove useful for some other applications. 6 Related Work Chandra, Gupta and Hennessy [6] have designed, implemented and measured a scheduling algorithm for the parallel language COOL running on DASH. The goal of the scheduling algorithm is to enhance the locality of the computation while balancing the load. COOL provides an affinity construct that programmers use to provide hints ....
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
....the load. This technique effectively increases scheduling granularity, and therefore provides good locality [7] and low scheduling contention. Another approach for obtaining good locality is to allow the user to supply hints to the scheduler regarding the data access patterns of the threads [13, 29, 39, 46]. However, such hints can be cumbersome for the user to provide in complex programs, and are often specific to a certain language or library interface. Therefore, our DFDeques algorithm instead uses the heuristic of scheduling threads close in the dag on the same processor to obtain good locality. ....
R. Chandra, A. Gupta, and J. L. Hennessy. Data locality and load balancing in COOL. In Proc. ACM Symp. Principles & Practice of Parallel Programming, pages 249--259, 1993.
....example, the Gaussian elimination algorithm derives no advantage from co-locating neighboring rows. Thus, the fine grain program is able to approach closely the performance of the hand-tuned program. Fine and coarse grain are relative terms. The fine grain programs illus .... Footnote 8: The COOL [9,8] programming language is based on similar, independently developed ideas. [Figure 3 plot: seconds versus number of processors; legend text: Coarse Grain Templates with OAS Medium Grain with OAS Medium Grain.] Fig. 3. Completion time for Gaussian elimination on a 640 by 640 dense ....
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 249--259, April 1991.
....data layout. Such a strategy works well for applications with a predictable data structure layout and predictable access patterns. However, a static decision between computation and data migration does not work for applications with unpredictable or dynamically changing data access patterns. COOL [4] is a parallel language running on DASH, a large-scale shared-memory multiprocessor machine that supports a multi-level memory hierarchy. The impact of data reference locality in such systems is similar to that of DSM systems. A scheduling algorithm for COOL strives to enhance the locality of the ....
R. Chandra, A. Gupta, and J.L. Hennessy. Data Locality and Load Balancing in COOL. In Proc. of the Fourth ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP), pages 249--259, May 1993.
....and that the kernel may not be the appropriate place to implement affinity scheduling. They were also the first to note that the short lifespan of lightweight threads does not allow for cache reuse. Developed independently at the same time as the initial implementation of Mercury, the COOL [6] programming language, like PRESTO and Mercury, extends C++. In COOL, calling a function prefixed with the keyword parallel forks a thread to execute the function asynchronously. Join synchronization is done explicitly using condition variables. Intended for use on distributed memory machines ....
R. Chandra, A. Gupta, and J. L. Hennessy. "Data Locality and Load Balancing in COOL". In Proceedings of the Fourth ACM Symposium on Principles and Practice of Parallel Programming, pages 249--259, May 1993.
....and systems for performance portability. Libraries and software systems. Recent research has yielded a variety of runtime support like the Concert system [KC93], the Chare kernel [SK91, KK93], and the compiler-controlled threaded abstract machine [CSS+91], and programming languages like COOL [CGH], Concurrent Aggregates [And93], pSather [FLR92], and pC++ [BBG+93]. As language-based approaches, they encapsulate default policies for data and task assignment. When tasks involve dynamic data structures, execution time and data access patterns cannot be predicted by the compiler. Since no ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. Submitted for publication.
.... access and synchronization into a single mechanism is not unlike that provided by monitors, a linguistic mechanism suggested by Hoare [28] and Brinch Hansen [8], or other linguistic mechanisms that integrate synchronization and data access (e.g., mutexes in Argus [53], mutex operations in COOL [15], etc.). Chapter 4 CRL Internals This chapter describes the general structure of the prototype CRL implementation used in this thesis. Platform-specific implementation details are discussed in Chapter 5. 4.1 Overview The prototype CRL implementation supports single-threaded applications in ....
.... and unmapping regions, the programming interface CRL presents to the end user is similar to that provided by Shared Regions [68]; the same basic notion of synchronized access ("operations") to regions ("objects") also exists in other programming systems for hardware-based DSM systems (e.g., COOL [15]). The Shared Regions work arrived at this interface from a different set of constraints, however: their goal was to provide software coherence mechanisms on machines that support non-cache-coherent shared memory in hardware. CRL could be provided on such systems using similar implementation ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data Locality and Load Balancing in COOL. In Proceedings of the Fourth Symposium on Principles and Practices of Parallel Programming, pages 249--259, May 1993.
....times for Panel Cholesky on the iPSC 860, Table 12 presents the running times on DASH. The ANL version is an explicitly parallel version of the computation written in the ANL macro package; the COOL version is an explicitly parallel version written in the concurrent object oriented language COOL[8]. Figures 23 and 24 present the corresponding speedup curves. 1 2 4 8 16 24 32 Jade 53.62 34.11 33.91 35.54 43.84 47.77 49.73 Table 11: Execution Times for Panel Cholesky on the iPSC 860 (seconds) 1 2 4 8 16 24 32 Jade 33.84 17.47 11.03 7.21 6.93 7.29 7.36 COOL 34.97 18.63 10.36 6.82 4.10 3.33 ....
....that is not set, it creates the set of tasks for the next iteration of the solve. Table 13 presents the running times for Ocean on the iPSC 860, Table 14 presents the running times on DASH. The COOL version is an explicitly parallel version written in the concurrent object oriented language COOL[8]. Figures 26 and 27 present the corresponding speedup curves. 1 2 4 8 16 24 32 Jade 77.75 92.49 96.38 60.35 39.15 45.21 56.95 Table 13: Execution Times for Ocean on the iPSC 860 (seconds) 1 2 4 8 16 24 32 Jade 104.13 99.58 37.24 25.00 17.69 14.13 13.21 COOL 104.99 53.56 28.36 14.57 7.54 5.40 4.75 ....
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
....object consistency model and object concurrency guarantees. The consistency model preserves the notion of logically atomic operations by ensuring that method calls do not disrupt each other. Since this model is defined by the language, it does not depend on usage conventions for correctness as in [8, 24, 9, 40, 6]. Concurrency guarantees define which member calls will run concurrently, allowing programmers to reason about progress and deadlock. Object Consistency In ICC , concurrent method invocations on an object are constrained such that intermediate object states created within a member 4 This ....
....can cause major program structure disruptions. The diversity of task parallel extensions of C is much greater and can be loosely categorized based on their treatment of objects and concurrency. First, there are languages (or libraries) that introduce concurrency without changing the object model [6, 8, 40, 24]. These systems require the programmer to build concurrency control by convention, providing no language support for object consistency or for building abstractions from larger collections of objects. Second, many languages (or libraries) use objects to encapsulate concurrency, exploiting objects ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1993.
....object consistency model and object concurrency guarantees. The consistency model preserves the notion of logically atomic operations by ensuring that method calls do not disrupt each other. Since this model is defined by the language, it does not depend on usage conventions for correctness as in [10, 28, 11, 44, 8]. Concurrency guarantees define what member calls will run concurrently allowing programmers to reason about progress and deadlock. 3.1.1 Object Consistency In ICC , concurrent method invocations on an object are constrained such that intermediate object states created within a member function ....
....can cause major program structure disruptions. The diversity of task parallel extensions of C is much greater and can be loosely categorized based on their treatment of objects and concurrency. First, there are languages (or libraries) that introduce concurrency without changing the object model [7, 10, 44, 28]. These languages require the programmer to build concurrency control by convention, providing no language support for object consistency or building abstractions from larger collections of objects. Second, many languages (or libraries) use objects to encapsulate concurrency, exploiting objects to ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1993.
....taken several times, correctly adapting the number of threads created to the resources currently allocated by the operating system. This is the main difference between our nano threads implementation and other user level threads implementations, such as Filaments [Engl93] Cilk [Blum95] and COOL [Chan92], in which the decision of spawning parallelism is taken once and irremediably for an entire section of parallel code. The main goal of our limited creation decision is to maintain the ready queue with enough work to be performed in the future, avoiding whenever possible that it becomes empty, ....
Rohit Chandra, Anoop Gupta and John L. Hennessy, "Data Locality and Load Balancing in COOL", Fourth ACM SIGPLAN Symposium on the Principles and Practice of Parallel Programming (PPoPP), pp. 249-259, May 1993.
....computation structure often does not fit into data parallel models, and message passing requires the programmer to deal explicitly with the complexities of data placement, addressability, and concurrency control. Consequently, programming models based on dynamic thread creation and multithreading [21, 3, 12, 41, 8, 17, 28, 16] are increasingly popular for expressing such problems. Such models involve user defined computation units (hereafter referred to as logical threads) which are dynamically created to reflect the natural concurrency structure of the program; multithreading maps these onto the physical machine ....
....[21, 16, 40] build these mechanisms on top of vendor supported, standardized lightweight thread management [27] and communication [15, 44] interfaces; however, these incur relatively large overheads, requiring coarse granularity threads for efficiency. While systems with specialized runtimes [3, 8, 28] can provide efficient primitives supporting finer grained threads, they still incur large thread management and communication overheads for irregular, dynamic computations: • Such computations exhibit wide variations in thread granularity and are typically not amenable to compile time analyses ....
Chandra, R., Gupta, A., and Hennessy, J. L. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (1993).
....that exploit hardware support for broadcast. We also expect future parallel languages to become increasingly based on a data oriented paradigm in which programmers participate in the parallelization process by providing information about how parts of the program access data. Locality hints in COOL [3] and shared regions [14] are two examples of this trend. The implementations of these languages will be able to exploit their knowledge of how the program accesses data to employ efficient integrated synchronization and consistency protocols similar to the one presented in this paper. ....
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
....threads packages written for shared memory machines. In particular, we are interested in implementing a scheduler that efficiently supports dynamic and irregular parallelism. 2.1 Scheduling lightweight threads A variety of systems have been developed to schedule lightweight, dynamic threads [5, 10, 27, 40, 23, 29, 33, 46, 14, 13, 35]. Although the main goal has been to achieve good load balancing and/or locality, a large body of work has also focused on developing scheduling techniques to conserve memory requirements. Since the programming model allows the expression of a large number of lightweight threads, the scheduler ....
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, pages 249--259, May 1993.
....access control. Programmers must either use the fixed protocol provided by a system or explicitly emulate access control. In SplitC [10] the fixed protocol does not support caching of data, fetches and stores to shared data always entail communication. Cid[17] Compositional C [8] and COOL[6], however, do provide an automatic replication mechanism, but programmers are restricted to the protocol provided by the system. Other languages, such as Jade[19] Olden[5] and CHAOS[13] provide the means of controlling data distribution, but do not allow programmers any control over repli ....
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Principles and Practice of Parallel Programming, May 1993.
No context found.
R. Chandra, A. Gupta and J. Hennessy. Data Locality and Load Balancing in COOL. in Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, 1993.
No context found.
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993. ACM, New York.
No context found.
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
No context found.
R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
No context found.
R. Chandra, A. Gupta and J. Hennessy. Data Locality and Load Balancing in COOL. in Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, 1993.
No context found.
Rohit Chandra, Anoop Gupta, and John L. Hennessy. Data locality and load balancing in COOL. In ACM Sigplan Symp. on Principles and Practice of Parallel Programming, pages 249--259. ACM Press, New York, September 7--8, 1993.