| W. Horwat, `Concurrent Smalltalk on the message-driven processor', Master's Thesis, Massachusetts Institute of Technology, Computer Science Department, September, 1991. |
....frame can be freed. These structures are shown in Figure 2-1. The following sections describe them in more detail. 2.2 Data Structures 2.2.1 Codeblocks A codeblock consists of one or more scheduling quanta stored contiguously on each processor on which the procedure might be invoked. Unlike [Horwat 1989], code is distributed at load time. The format of a pointer to a codeblock is shown in Figure 2-2. A user defined tag value, CB, is used to indicate a pointer to a codeblock. The low sixteen bits of the descriptor hold a .... [Footnote: In this context, user defined means defined by my dataflow system, as ....]
....a pointer to a frame. The low sixteen bits of the descriptor hold the node number, and the high sixteen bits hold the local address, combining to provide a global address. Storing the node number in the low sixteen bits provides an efficiency bonus on the J Machine, as first described in [Horwat 1989, page 68]. [Figure 2-3: A Non-Loop Procedure Frame. Recoverable field labels: FD tag (bits 35-32), local address (bits 31-16), node number (bits 15-0), argument chain.] A user defined tag, FD, denotes a frame descriptor. It encodes the unique global address of a frame. The first slot of a frame holds a frame descriptor ....
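The address layout described in this excerpt (node number in the low sixteen bits, local address in the high sixteen bits) can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from the cited thesis; the tag field (FD) is deliberately not modeled, so only the 32-bit data portion of the descriptor is shown.

```python
# Illustrative only: packs/unpacks the 32-bit data portion of a frame
# descriptor as described in the excerpt. Names are hypothetical, and the
# FD tag bits are not modeled here.
FIELD_BITS = 16
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0xFFFF


def pack_frame_descriptor(local_address: int, node_number: int) -> int:
    """Combine a local address (high 16 bits) and node number (low 16 bits)."""
    if not 0 <= local_address <= FIELD_MASK:
        raise ValueError("local_address must fit in 16 bits")
    if not 0 <= node_number <= FIELD_MASK:
        raise ValueError("node_number must fit in 16 bits")
    return (local_address << FIELD_BITS) | node_number


def unpack_frame_descriptor(word: int) -> tuple[int, int]:
    """Return (local_address, node_number) from a packed descriptor."""
    local_address = (word >> FIELD_BITS) & FIELD_MASK
    node_number = word & FIELD_MASK
    return local_address, node_number
```

With this layout, extracting the node number is a single mask operation on the low bits, which is one plausible reading of the efficiency remark in the excerpt; the excerpt itself does not spell out the reason.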
[Article contains additional citation context not shown here]
Horwat, Waldemar. Concurrent Smalltalk on the Message-Driven Processor. Master's Thesis, Department of EECS, MIT, 1989.
....LimitLESS Dir1SW Figure 2-3. DSM implementation alternatives. For the purposes of this thesis, programming systems that implement a shared address space by directly mapping operations on the shared address space into message passing constructs (e.g. Split C [81], Concurrent Smalltalk [29]) are considered to be message-passing programming systems, not DSM systems. This section provides a brief overview of traditional DSM implementation schemes (see Figure 2-3). The first part addresses software schemes intended for message passing multicomputers or networks of workstations. The ....
Waldemar Horwat. Concurrent Smalltalk on the Message-Driven Processor. Technical Report 1321, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, September 1991.
....for the parallel domain, our current approach focuses on languages where the expression of fine grained parallelism is much cleaner. To date, two languages have been implemented on the J machine: the actor language Concurrent Smalltalk (CST) and the dataflow language Id. Concurrent Smalltalk: CST [18] is a parallel object oriented programming language (based on the Actor model [1]) with asynchronous message send and distributed objects. Syntax is similar to that of LISP or SCHEME. Method or function invocation is performed by sending a message to the first argument of the method. The message ....
....larger than the machine can handle. We need to study how we can effectively and automatically throttle the parallelism being created when the machine is saturated. These issues, and others related to the efficiency of programming fine grained parallel processors, are discussed in more detail in [18]. Dataflow Implementation: Id [21] is a functional programming language originally designed for dataflow architectures. An Id program can be converted into a dataflow graph, in which operators are represented by nodes and dependencies by arcs. Originally, these dataflow graphs were executed ....
[Article contains additional citation context not shown here]
Waldemar Horwat. Concurrent Smalltalk on the Message-Driven Processor. Master's thesis, MIT, May 1989.
....of object orientation and concurrency control costs in the generated code. Concurrent object oriented languages have been inefficient largely because they provide a uniform view of all program data. Even the best implementations incur tens to hundreds of instructions for each method invocation [26, 47] due to the cost of managing a distributed memory (method invocations are location independent) and managing concurrency (locks). Furthermore, the high procedure call frequency typical of object oriented programs not only magnifies the method invocation overhead, it also reduces the benefits of ....
....operations) may invoke methods on several other objects concurrently, waiting on the responses only when required by data flow or the programmer. In this way, the programmer can safely and conveniently compose larger parallel abstractions and entire programs. A number of languages share this model [10, 26, 33, 46]. The programming model has three features which contribute fundamentally to its programmability: a shared name space, dynamic thread creation, and object level access control. A shared namespace allows programmers to separate data layout and functional correctness. Dynamic thread ....
[Article contains additional citation context not shown here]
Waldemar Horwat. Concurrent Smalltalk on the message-driven processor. Master's thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, June 1989.
....the perception that these features cannot be implemented efficiently on multicomputers built from conventional processors (stock hardware). This claim is reinforced by the use of custom hardware in the highest performance implementations of COOP languages. Implementations of CST on the J machine [6] and ABCL on the EM 4 [7] exploit hardware support for message passing, method dispatch, global name translation, and thread scheduling to achieve high performance. Such custom hardware efforts focus on minimizing the cost of all runtime operations by supporting a few general-purpose primitives ....
W. Horwat, "Concurrent smalltalk on the message-driven processor," Master's thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, June 1989.
....express both parallel recursive computations like SIMPL, and distributed data structures like those needed to perform β reduction in parallel. Furthermore, the functional notation of most actor languages allow the user to easily express lazy and eager evaluation of expressions [Manning 87, Waldemar 91]. 4.3 Towards a parallel execution model of Prolog Discussion in section 3.5 has pointed out a number of difficulties for exploiting parallelism between backchaining and unification. On the other hand, the characterization of independent AND parallelism given in section 3.4 suggests that even ....
H. Waldemar, Concurrent Smalltalk on the Message-Driven Processor, Tech. Rep. 1321, MIT Artificial Intelligence Laboratory, Sept. 1991.
....trade off between good task distribution and runtime overhead. 2 Related Work Several researchers are working on efficient implementations of high level concurrent languages which involves dynamic task creation, some are in concurrent object oriented languages ABCL [11, 9], Concurrent Smalltalk [3], others are in functional languages Id [2, 10] or logic programming languages KL 1 [8, 4]. ABCL and CST concentrate on and achieved efficiencies of basic operations such as message passing, or task creation. Although the above implementations have achieved good performance of simple operations ....
Waldemar Horwat. Concurrent Smalltalk on the message-driven processor. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 1989.
....processing for concurrent objects can be significantly improved over the previous approaches, and as a result, the OOCP languages could serve well in a massively parallel environment. We believe that this claim holds even for the small examples in this paper. For example, by the numbers given in [7], the total number of machine cycles for object creation followed by a message send and its reply is estimated to be well over 200 cycles for the J Machine, whereas it is approximately 130 cycles in our case. And, for the reasons we gave in this section, we further believe that the difference in ....
Waldemar Horwat. Concurrent smalltalk on the message-driven processor. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 1989.
....By contrast, Charm requires programmers take the responsibility of mapping of objects onto processing nodes. pC [87] C [86] and pSather [95] are all C based COOP languages which are designed to support data parallelism. They differ in how to initiate data parallel execution. CST [61, 38] and DistributedConcurrentSmalltalk (DCS) [96] are two of many COOP languages which extended Smalltalk 80 [43]. CST supports concurrency using locks, asynchronous message passing, and distributed objects. Distributed objects are similar to aggregates in CA and are equipped with similar ....
W. Horwat. Concurrent Smalltalk on the Message Driven Processor. Master's thesis, MIT, May 1989.
....MIT Laboratory for Computer Science with additional support from a National Science Foundation Graduate Fellowship. MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, Massachusetts, 02139 (cel mit.edu) 2 R. D. BLUMOFE AND C. E. LEISERSON memory capacity of the machine [15, 22, 24, 39, 43]. To date, the space requirements of multithreaded computations have been managed with heuristics or not at all [14, 15, 22, 24, 26, 32, 39, 43]. In this paper, we use algorithmic techniques to address the problem of managing storage for multithreaded computations. Our goal is to develop ....
....memory capacity of the machine [15, 22, 24, 39, 43]. To date, the space requirements of multithreaded computations have been managed with heuristics or not at all [14, 15, 22, 24, 26, 32, 39, 43]. In this paper, we use algorithmic techniques to address the problem of managing storage for multithreaded computations. Our goal is to develop scheduling algorithms that expose sufficient parallelism to obtain linear speedup, but without exposing so much parallelism that the space requirements ....
W. Horwat, Concurrent Smalltalk on the message-driven processor, Tech. Report MIT/AI/TR-1321, MIT Artificial Intelligence Laboratory, Sept. 1991.
....to specify mappings, its authors recommend that this process be left to the runtime system. processdef factorial { public: atomic int compute(int n) { if( n) return 1; else { factorial f1; return (n f1.compute(n 1) } } } Figure 2.2: Factorial Program in C A Concurrent Smalltalk (CST) [16] system has also been developed for programming the J machine. CST requires a larger and more complex microkernel to handle a broad range of language concepts that applications within the Scalable Concurrent Programming Laboratory do not require. Figure 2.3 shows an example of a CST program. Note ....
Horwat, W., Concurrent Smalltalk on the Message-Driven Processor, Master's thesis, Massachusetts Institute of Technology, Computer Science Department, September, 1991.
....across the computer network increases. This communication latency has not kept pace with decreasing processor cycle times. Even very low latency networks [62,83] have round trip message latencies greater than 100 instruction cycles. Fine grain programs send messages every 75 to 100 instructions [38,15]. If processors .... [Figure 1-2: Parallelism profile of Simple(50), a typical dataflow application. The program consists of phases of very high parallelism, as well as long sequential tails. Panel title: ALU Operations Profile in SIMPLE.] ....
....T run T switch = 20 As illustrated by Figure 1-2, dynamic parallel programs may generate excessive parallelism. Since each active thread consumes memory, many thread scheduling policies must limit the number of concurrent active threads [20,38,48]. The goal is to spawn enough parallelism to keep processors busy without completely swamping the system. A second goal of thread scheduling is to exploit temporal locality among threads. An efficient algorithm [55] for switching threads in response to synchronization delays is to switch between a ....
Waldemar Horwat. "Concurrent Smalltalk on the Message-Driven Processor." Master's thesis, MIT, May 1989.
....in [9]. For example, the performance peaks at 434 MFlops for 1024 by 1024 matrix on 64 node partition of the CM 5. 8 Related Work Many efficient software based and hardware based implementations for fine grained concurrent object oriented programming languages have been reported in literature [14, 36, 29, 18]. Our work relies on software technologies without any special hardware support; it is more closely related to the software based implementations [29, 18]. Our implementation supports fine grain object level concurrency and thus differs from research in coarse grained COOP languages such as [7, ....
W. Horwat. Concurrent Smalltalk on the Message Driven Processor. Master's thesis, MIT, May 1989.
....3 1 Introduction Several experimental parallel architectures have been developed in recent years to demonstrate novel hardware mechanisms that may enhance the performance of programs written in emerging parallel languages. For example, Monsoon [17] focuses on Id90 [15], J Machine [10, 11] on CST [13], Alewife [1] on Mul T, CM 5 [20] on Fortran90, and Dash [14] and KSR 1 on extensions to C and Fortran. All of these architectures provide a family of mechanisms that collectively support the requirements of the parallel language, all of the machines are universal enough to support any of the other ....
Waldemar Horwat. Concurrent Smalltalk on the Message-Driven Processor. Technical Report 1321, MIT Artificial Intelligence Lab, 545 Tech. Square, Cambridge, MA, 1991.
....of synchronization and communication structures including: synchronous (RPC), data (object) parallel, reactive and even custom communication and synchronization structures constructed as convenient for the application. In addition, continuations (the right to determine a future) can be forwarded [21] to another call, passed as arguments and stored in data structures. Graphical representations of these structures, all of which are supported by the ICC and CA languages, appear in Figure 3. The flexibility of this programming model enables the programmer to select the mechanisms most ....
Waldemar Horwat. Concurrent Smalltalk on the message-driven processor. Master's thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, June 1989.
....Languages Multilisp Multilisp [33] is the language that originally embodies the future construct. The central idea of future that a future expression returns something that later becomes the result value is adopted not only in parallel Lisps but also in some concurrent object oriented languages [36, 78]. ABCL f also supports a variant of future. An apparent difference between the future in Multilisp and the one in ABCL f is that in Multilisp, producer consumer synchronization of a future invocation is implicit in value reference, whereas ABCL f requires explicit touch operations. For example, ....
Waldemar Horwat. Concurrent Smalltalk on the message-driven processor. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 1989.
....in a serious overhead, which almost made people to believe that concurrent object orient computing was quite slow. There are few exceptions which are recently developed sharing the same goal to achieve realistic performance on several parallel architectures: Concurrent Smalltalk on J Machine [12, 11, 6]; ABCL onAP1000 [24, 35]. The detailed comparisons with these implementations are described in Section 7.4. Chapter 3 Overview of Concurrent Object Oriented Programming Language System ABCL EM 4 This chapter outlines our concurrent object oriented language system, ABCL EM 4, details of which ....
....communication latency alone does not reduce the overhead involved in message passings amongst concurrent objects in the remote nodes. Another major difference between ABCL EM 4 and previously proposed architecture-supported OO language implementations such as Concurrent Smalltalk on the J Machine [12, 11] and A NET [3] is that we attempt to eliminate the memory read write copy overhead of the outgoing and incoming messages as much as possible. That is to say, whether messages will require buffering or not could have significant impact on performance due to overheads such as copying and management ....
[Article contains additional citation context not shown here]
W. Horwat. Concurrent smalltalk on the message-driven processor. Technical Report MIT-AI-TR 1321, MIT Artificial Intelligence Lab., 1991.
....VFTP to Dormant Mode 3 Polling of Remote Message 5 Adjusting Stack Pointer and Return 3 Total 25 Table 4: Breakdown of intra node message to dormant object.
Table 5: Comparison of send reply latency.
System                    Instruction Counts   Real Time (µs)   Cycles   Clock Rate (MHz)
ABCL onAP1000             160                  17.8             450      25
ABCL onEM 4 [17]          100                  9                110      12.5
CST (on J Machine) [9]    110                  4                220      50
....even more if we adopt the above optimizations) that of the dormant case. Our scheduling mechanism specially optimizes dormant case by stack based scheduling, thereby reducing the total intra node scheduling overhead. Minimum inter node ....
....This shows that stock multicomputers with appropriate software technologies have much higher internode message passing capability than previously believed. In comparison to fine grain architecture (Table 5), send and reply latency is approximately 18 µs, or 450 cycles, which is only about twice of [9] or about 4 times of [17] when normalized to the same clock speed. The above instruction counts includes message setup (in the script of the sender), polling, extracting of the message (in the message handler), system message buffer management and script invocation. The sender node takes about 20 ....
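The "about twice" and "about 4 times" figures in this excerpt follow from comparing cycle counts rather than wall-clock times. A minimal sketch of that arithmetic, using the Table 5 values quoted in the preceding excerpt, is given here. It assumes the time column is in microseconds (the unit symbol was lost in extraction); the dictionary keys and variable names are illustrative only.

```python
# Illustrative arithmetic only. Values are copied from Table 5 of the
# excerpt; the microsecond unit is an assumption (the symbol was lost).
# Each entry: (real time in microseconds, clock rate in MHz, reported cycles)
table5 = {
    "ABCL onAP1000": (17.8, 25.0, 450),
    "ABCL onEM 4": (9.0, 12.5, 110),
    "CST (on J Machine)": (4.0, 50.0, 220),
}

# Cycles implied by time x clock rate (microseconds x MHz = cycles).
implied_cycles = {
    name: time_us * clock_mhz
    for name, (time_us, clock_mhz, _reported) in table5.items()
}

# Ratios of reported cycle counts, i.e. latency normalized to equal clocks.
ap1000_cycles = table5["ABCL onAP1000"][2]
ratio_vs_cst = ap1000_cycles / table5["CST (on J Machine)"][2]
ratio_vs_em4 = ap1000_cycles / table5["ABCL onEM 4"][2]

for name, cycles in implied_cycles.items():
    print(f"{name}: implied {cycles:.1f} cycles, reported {table5[name][2]}")
print(f"AP1000 vs CST: {ratio_vs_cst:.2f}x, AP1000 vs EM 4: {ratio_vs_em4:.2f}x")
```

The implied and reported cycle counts differ slightly, which suggests the table entries are rounded; the ratios are computed from the reported cycle counts.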
[Article contains additional citation context not shown here]
Waldemar Horwat. Concurrent Smalltalk on the Message-Driven Processor. MIT Artificial Intelligence Laboratory Technical Report MIT-AI-TR 1321, September 1991.
....to be written to the original calling frame. With four processors enabled, utilization is 37%, i.e. on average, a processor does useful work a little over a third of the time. The hybrid system does not use the operating system written to support object oriented programming on the J Machine [8], so the library routines and fault handlers constitute the entire operating system. The functions of these routines are: Lookup: Check if a frame slot holds a disabled continuation before writing to it. If a continuation is present, enable it to signify that the data has arrived. ....
Horwat, Waldemar. Concurrent Smalltalk on the Message-Driven Processor. MIT Artificial Intelligence Laboratory Technical Report 1080, Cambridge, MA, 1990. (Master's Thesis, Department of EECS, MIT.)
....of ABCLonAP1000, and then describe how the extension that support our proposal can be seamlessly integrated with it. 8. 1 Overview of ABCLonAP1000 Most work in high performance concurrent OOPLs have focused on combination of elaborate hardware and highly tuned, specially tailored software[49, 121] to drastically improve upon the two key factors in achieving high performance efficient inter node message passing and efficient intra node multithreading allowing message passing speed to approach those of sequential OO languages. We have shown with our work ABCL onAP1000 that, even ....
Waldemar Horwat. Concurrent Smalltalk on the message-driven processor. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 1989.
....LimitLESS Dir1SW Figure 2-3. DSM implementation alternatives. For the purposes of this thesis, programming systems that implement a shared address space by directly mapping operations on the shared address space into message passing constructs (e.g. Split C [81], Concurrent Smalltalk [29]) are considered to be message-passing programming systems, not DSM systems. 2.1 Implementation Techniques This section provides a brief overview of traditional DSM implementation schemes (see Figure 2-3). The first part addresses software schemes intended for message passing multicomputers or ....
Waldemar Horwat. Concurrent Smalltalk on the Message-Driven Processor. Technical Report 1321, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, September 1991.
....is our belief that the programmer is best able to make decisions about process mapping. If a program exhausts local resources, then we consider it to be a poorly formed program and leave it to the programmer to restructure the code to better balance its resource usage. A Concurrent Smalltalk (CST) [13] system has also been developed for programming the J machine. CST requires a larger and more complex microkernel to handle a broad range of language concepts that our applications do not require. Our system is currently the only programming system that has been developed for the J machine that is ....
Horwat, W., Concurrent Smalltalk on the Message-Driven Processor, Master's thesis, Massachusetts Institute of Technology, Computer Science Department, September, 1991.
....proposed, relatively little attention has been paid to achieving high performance on conventional multicomputers such as CM 5, nCUBE 2, and AP1000 [11]. Most work on high performance concurrent OOPLs has focused on combination of elaborate hardware and highly tuned, specially tailored software [5, 14]. These software architectures (the compiler and the runtime system) exploit special features provided by the hardware in order to achieve the two key factors in high performance concurrent OOPL implementation: efficient message passing between objects and efficient intra node multithreading. The ....
....queue and the next runnable thread is scheduled automatically upon termination of the current thread. Some concurrent OOPLs on such fine-grain machines achieve impressive performance where the latency of message passing between objects on different nodes are 9 µs [14] or about 200 machine cycles [5] for a request reply cycle of method invocation. The purpose of this paper is to demonstrate the techniques on conventional multicomputers that achieve comparable performance without the hardware facilities described above. Language implementation effort in this direction has been recently ....
[Article contains additional citation context not shown here]
Waldemar Horwat. Concurrent Smalltalk on the message-driven processor. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 1989.
....Revised Concurrent Smalltalk Manual 1 Chapter 1. Introduction This document describes the Concurrent Smalltalk language as of March 1993. Concurrent Smalltalk evolved significantly since it was originally implemented as part of a Master's thesis in May 1989. Additions and modifications since [3] are marked on the margin; the major differences are summarized later in this chapter. Scott Furman, Todd Dampier, and Shaun Kaneshiro assisted with Concurrent Smalltalk and Cosmos enhancements such as floating-point arithmetic, queue overflow handling, garbage collection, and miscellaneous bug ....
....arithmetic, queue overflow handling, garbage collection, and miscellaneous bug fixes and enhancements. Details on some of these are published in separate documents. Implementation To use the features in this document, be sure that you are using the Optimist compiler version 3.0 or later [3][2], Cosmos version 3.0 or later [5], and MDPSim version 7.2c or later [4]. The Optimist II compiler is written in standard revised Common Lisp as specified in [6]. Unlike previous versions of Optimist II, this one uses the ANSI version of the LOOP macro. Optimist II was developed on a Macintosh ....
[Article contains additional citation context not shown here]
Waldemar Horwat. Concurrent Smalltalk on the Message-Driven Processor. MIT Artificial Intelligence Laboratory Technical Report 1321, September 1991.
No context found.
W. Horwat, `Concurrent Smalltalk on the message-driven processor', Master's Thesis, Massachusetts Institute of Technology, Computer Science Department, September, 1991.