Andrew A. Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient, fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, University of Illinois at Urbana-Champaign, Urbana, IL, June 1993.
....the language report, and the current release of the Concert software can be found at: http: www csag.cs.uiuc.edu ftp: cs.uiuc.edu pub csag Email addresses: group Gamma concert red herring cs.uiuc.edu Andrew A. Chien Gamma achien cs.uiuc.edu References: [65] [66] [67] [68] [69] [70] [71] [129] [177] 2.29 ConcurrentSmalltalk Developer: Description: oo. memory model. parallelism. Asynchronous method call plus futures (CBox) Post processing. Synchronous messages are also available. The caller decides which mode to use. scheduling. mapping. synchronization. There are ....
Andrew A. Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient, fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, University of Illinois at Urbana-Champaign, Urbana, IL, June 1993.
....as concurrent garbage collection. Garbage collection is run whenever certain storage management conditions are detected by the runtime system. The compiler has no control to initiate or terminate garbage collection. Interested readers can get a full description of the Concert runtime design from [8]. 4.2 Runtime Support for SAMR Application We now proceed to detail several runtime services that we anticipate will be key to our application in particular. These are: • Parallel Invokes: Since our code's parallelism takes the form of parallel independent method calls, these must be ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system---compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.
....an interpreter which supports incremental program development and debugging, a source level symbolic debugger, and performance tools to evaluate parallel program performance. These tools are described in greater detail in the next section. More information on the Concert System can be found in [7, 18, 14, 19, 5]. 1 3.3 Concert Tools for Program Development and Tuning The Concert System supports the development of irregular parallel applications. The challenges for program development are much greater for explicitly parallel programs than for models which retain sequential program semantics; ....
....10 30. This is comparable to the maximum floating performance achievable by sequential C programs. The Concert compiler performs aggressive type inference and inlining. However, the effectiveness of these optimizations is limited by locality and synchronization constraints. 3 As posited in [5, 7], locality analysis and grain size tuning are essential to achieving higher levels of efficiency. At present the Concert system does not automatically do any locality optimization. And further, 3 In addition to the optimization done within the Concert compiler, the output code is C compiled by ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.
....on a single node, a naive implementation of the model would result in additional 200 overhead for each message received and a 100 overhead for each message sent. To eliminate this cost, we build on previous work aimed at optimizing concurrent object oriented languages for parallel systems [34, 16, 51]. Specifically, we concentrate on converting communication between a meta actor and its base actor into a series of function calls. Techniques have been proposed that use the results from type inference [47] to replace local message sends with function calls. These techniques utilize ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert System -- Compiler and Runtime Support for Efficient Fine-Grained Concurrent Object-Oriented Programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.
....these programs, while quite significant, is not as impressive as it might be in another environment. For these programs, the current SELF compiler is unable to statically bind many messages because of a lack of static type information. Future compilers for SELF and other object-oriented languages [Chien et al. 93, Hölzle Ungar 93, Chambers et al. 93] are expected to incorporate interprocedural type analysis and extract type information from execution profiles, leading to many more messages being statically bound and thus eligible for inlining. We expect the importance of making good inlining decisions ....
Andrew A. Chien, Vijay Karamcheti, John Plevyak. The Concert System: Compiler and Runtime Support for Efficient, Fine-Grained Concurrent Object-Oriented Programs. Technical report R-93-1815, Department of Computer Science, University of Illinois at Urbana-Champaign, 1993.
....and discuss the implementation. 4.1 Overview Threads are important features in concurrent languages. The first concurrent object-oriented language, Simula, uses coroutines, which are a thread-like construct. Threads are widely used in many concurrent languages to support fine-grained computation [23, 15, 14, 9, 35, 33, 18]. In a fine-grained computation that consists of many independent small tasks created dynamically, a thread is created for each independent unit of work (task). When a thread needs to wait for a result from a slow device like the network, printer, file system, ....
....in the next chapter. 6.2 Related Work Many efficient optimization schemas have been proposed to improve the performance of fine-grained multithreaded systems. Some of them, like Filament [32, 31], Chores [12], Cilk [6], and Leapfrogging [38], use stackless threads. Others, like Concert [30, 9, 22], ABCL [35], and TAM [10], use a compiler to generate the code that supports frame management and context switching. In this section, we will study some of these thread schemas. 6.2.1 Stackless Threads In a stackless thread schema, threads are viewed as tasks without private stacks. Only ....
Andrew A. Chien, Vijay Karamcheti, and John Plevyak. The Concert System -- Compiler and Runtime Support for Efficient, Fine-Grained Concurrent Object-Oriented Programs. June 1993.
....on a single node, a naive implementation of the model would result in additional 200 overhead for each message received and a 100 overhead for each message sent. To eliminate this cost, we build on previous work aimed at optimizing concurrent object-oriented languages for parallel systems [KA95, CKP93, PZC95]. Specifically, we concentrate on converting communication between a meta actor and its base actor into a series of function calls. Many techniques have been proposed for using the information resulting from type inference [PS91] to replace local message sends with function calls. These ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert System -- Compiler and Runtime Support for Efficient Fine-Grained Concurrent Object-Oriented Programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.
....scope for optimizing the Pupa implementation. Fundamentally, a communication system for parallel computing is required to provide reliable in-order delivery of messages from one process on one machine to another process on another machine. The studies performed by Karamcheti, Chien and Plevyak [4] [9] [10] [14] demonstrated that the support for reliable and ordered delivery is best provided at the lowest level of the communication layer. Parallel programs require low latency for small messages and high bandwidth for large messages. Small messages are often generated by time-critical ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system - compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois at Urbana, June 1993.
....the communication necessary for remote method invocation, and schedule method invocations on objects. Parallelism is exploited by both a compiler and an associated runtime system. Examples of this type of system include Mentat [10, 11], CC [5], Charm [13], pC [19], C [17], and Concert [6]. These systems differ in several ways, including the granularity of objects, the way parallelism is expressed, the implementation of synchronization, the policy of scheduling method invocations, and the means through which data is shared. Most of these systems model program execution as ....
Andrew A. Chien, Vijay Karamcheti, and John Plevyak. The Concert system - compiler and runtime support for efficient, fine-grained concurrent object-oriented programs. Technical Report 1815, University of Illinois at Urbana-Champaign, June 1993.
.... For example, Kessler et al. use it to implement fast breakpoints for debuggers [84]. Pike et al. speed up bitblt graphics primitives by dynamically generating optimal code sequences [103]. In operating systems, dynamic compilation has been used to efficiently support fine-grain parallelism [32, 105] and to eliminate the overhead of protocol stacks [1] and dynamic linking [67]. Dynamic compilation has also been used in other areas such as database query optimization [19, 42], microcode generation [107], and fast instruction set emulation [34, 93]. 2.5.2 Customization The idea of customizing ....
Andrew A. Chien, Vijay Karamcheti, and John Plevyak. The Concert System: Compiler and Runtime Support for Efficient, Fine-Grained Concurrent Object-Oriented Programs. University of Illinois at Urbana-Champaign, Technical Report UIUCDCS-R-93-1815, 1993.
....among the different layers of the protocols rather than the traditional ordered processing of the layers - for protocol implementations to improve performance. We have followed the integrated layers approach. We also borrow from the studies conducted by Karamcheti, Chien and Plevyak [20] [51] [52] [67] where they have concluded that the support for reliable and ordered delivery should be provided at the lowest layer of a communication system implementation. Other optimizations include an optimistic flow control policy to improve throughput, a demand-driven acknowledgment scheme ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system - compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois at Urbana, June 1993.
....in Section 6. 2 Background We describe the programming model, execution model, and the compiler framework. The mapping of the programming model to the execution model described here is largely conceptual; further information about our approach and actual implementation of COOP can be found in [9, 30]. 2.1 Programming Model The programming model we assume is the synergistic union of Actors [1, 12, 21] and the object-oriented model [17]. Each object can act concurrently to update its own state, create new objects or invoke methods on other objects. An object provides a set of abstract ....
....opening the door for speculative optimization. They also expose the basic costs in the execution model, enabling many optimizations including some described later in this paper. 2.3 Compiler Framework The optimizations described in this paper have been implemented as part of the Concert compiler [9]. The intermediate form used in our compiler is the Program Dependence Graph (PDG) [16] in Static Single Assignment (SSA) [15] form. Using the intermediate form, the compiler performs concrete type inference, global constant propagation, cloning, inlining and extension of access regions. Next, ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.
....precisely analyze multiply linked 2 structures. 1 . To remedy these limitations, the ASG incorporates extensions to the SSG (not based on heap reference counts). These extensions are described in detail in Section 3. 2.2 Project Context This work has been done as part of the Concert Project [CKP93]. The objective of the Concert system is to achieve efficient, portable implementations of fine-grained concurrent object-oriented languages on parallel machines. Current commercial multicomputers are the primary target and present a variety of difficult problems since the cost of fine-grained ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient fine-grained concurrent object-oriented programs. DCS Technical Report UIUCDCS-R-93-1815, University of Illinois, Department of Computer Science, 1304 W. Springfield Avenue, Urbana, Illinois, June 1993.
....increase in the prefetch queue size to 128 words makes the pull scheme competitive for all message sizes. This study was motivated by the high communication costs that we observed in our CM-5 implementation of a concurrent object-oriented language which generates very irregular communication traffic [5, 16]. We are currently in the process of porting the implementation to the T3D. Future work will examine the implications of architectural support in the context of traffic patterns generated by large application programs. Acknowledgements The authors thank Jae Hoon Kim and John Plevyak for ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.
....only if a variable can be resolved to a single concrete type. Thus, in contrast to the general types of other typing techniques, our goal is to find specific type information. To enable such optimizations, we have implemented a concrete type inference algorithm in the Illinois Concert compiler [6]. For the sample code shown in Figure 1 this algorithm determines that max is called only on integers from the expression (f (1 max: 2) and floats from the expression (f (1.0 max: 3.0) enabling the max and functions to be specialized and inlined. 2 Type systems like those of Pascal and C ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.
....efficient schema for each portion of the program so that C efficiency is achieved where possible. In addition, our implementation is portable; written entirely in C, it is not specific to the stack structure on any particular machine. This execution model is used by the Illinois Concert system [5], a compiler and runtime which achieves efficient execution of concurrent object-oriented programs. This system compiles ICC++, a parallel dialect of C++ [14], and Concurrent Aggregates (CA) [8] for execution on workstations, the TMC CM5 [31], and the Cray T3D [10] simply by recompiling and linking ....
....enabling them to be executed on the stack. The flexible parallel-sequential execution model presented in this paper dynamically adapts for parallel or sequential execution and provides a hierarchy of calling schemas of increasing power and cost. This model is part of the Illinois Concert system [5, 7], which consists of a globally optimizing compiler [28] and runtime [23]. The compiler is capable of resolving interprocedural control and data flow information [27], which enables the use of specialized calling conventions based on the synchronization features required by the called method. The ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.
....and moved outside the loop. With these optimizations, matrix multiply of multi-dimensional arrays in CA is as fast as C (see Section 4) even though the array operations are abstracted and ostensibly require much more work. 3.6 Context This optimization framework is embodied in the Concert [12] retargetable compiler for concurrent object-oriented languages, which currently supports both Concurrent Aggregates (CA) and ICC++ (a parallel dialect of C++). In this paper we are concerned only with the sequential subset of CA. Readers interested in transformations specific to concurrent ....
....analogues, and demonstrated them on a suite of standard benchmarks. We have shown that this framework applied to a dynamically typed pure object-oriented language can improve performance up to 6 times over C . This framework is implemented in, and our experiments were conducted using, the Concert [12] compiler, to which we have recently added a front end for a C++ dialect (ICC++ [11]). We intend to use this platform to continue the evaluation of automatic interprocedural optimization. 7 Acknowledgments We thank Vijay Karamcheti, Julian Dolby, Xingbin Zhang and the other members of the Concert ....
Andrew Chien, Vijay Karamcheti, and John Plevyak. The Concert system -- compiler and runtime support for efficient fine-grained concurrent object-oriented programs. Technical Report UIUCDCS-R-93-1815, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1993.