W. J. Dally et al. Architecture of a Message-Driven Processor. In Proceedings of the 14th Annual International Symposium on Computer Architecture, pages 189--196, ACM, May 1987.

This paper is cited in the following contexts:

Executing Multithreaded Programs Efficiently - Blumofe (1995)   (12 citations)

....in some of these other systems, though Cilk's algorithm uses randomness and is provably efficient. Many multithreaded programming languages and runtime systems are based on heuristic scheduling techniques. Though systems such as Charm [91], COOL [27, 28], Id [3, 37, 80], Olden [22], and others [29, 31, 38, 44, 54, 55, 63, 88, 98] are based on sound heuristics that seem to perform well in practice and generally have wider applicability than Cilk, none are able to provide any sort of performance guarantee or accurate machine-independent performance model. These systems require that performance-minded programmers become ....

William J. Dally, Linda Chao, Andrew Chien, Soha Hassoun, Waldemar Horwat, Jon Kaplan, Paul Song, Brian Totty, and Scott Wills. Architecture of a message-driven processor. In Proceedings of the 14th Annual International Symposium on Computer Architecture, pages 189--196, Pittsburgh, Pennsylvania, June 1987. Also: MIT Artificial Intelligence Laboratory Technical Report MIT/AI/TR-1069.


Performance Tradeoffs In Multithreaded Processors - Agarwal (1991)   (38 citations)

....initiated every cycle (or every few cycles), pipeline bubbles due to pipeline dependencies or processor stalls due to memory latency can be prevented. Processors in message passing multicomputers often maintain multiple processes per node and context switch among them to overlap message latencies [11, 12]. MIT VLSI Memo, 1989 No. 89 566; revised 1990. Submitted to IEEE Transactions on Parallel and Distributed Systems, 1991. There are limits to the improvements in processor utilization achievable by multithreading a processor. Most important, multithreading requires applications to display ....

W. J. Dally et al. Architecture of a Message-Driven Processor. In Proceedings of the 14th Annual Symposium on Computer Architecture, pages 189--196, IEEE, New York, June 1987.


Design And Analysis Of Update-Based Cache Coherence Protocols For .. - Glasco (1995)   (1 citation)

....subsequent words. Moreover, no expensive synchronization operation is required; thus, the producer never needs to wait for the writes to be performed. There have been several systems designed with word synchronization. These include but are not limited to Alewife [48], HEP [65], Tera [6], and MDP [17]. This dissertation is the first to examine the performance of word synchronization on both update-based and invalidate-based cache coherent multiprocessors. The work on the Alewife system examines the implementation details of a word synchronization scheme, and their work gives an excellent ....

William J. Dally, et al. Architecture of a message-driven processor. In Proceedings of the 14th Annual Symposium on Computer Architecture, pages 189--196, June 1987.


Performance of Shared Caches on Multithreaded Architectures - Chen, Peir, King (1996)   (3 citations)

....are provided so that saving of register contents is not needed during context switching. As a result, context switching can be done within a few cycles. There are a number of research and commercial projects focusing on the design of multithreaded systems, e.g. Tera [1], Alewife [3], J machine [7], MASA [9], IBM Empire [12], T [16], CPC [21], etc. However, there are relatively fewer studies on the performance impact to the storage hierarchy in this type of systems. Two previous works by Agarwal [2] and Saavedra Barrera et al. [18] used analytical methods in their attempt to estimate ....

....four program counters, each has an associated processor status register and a set of general purpose registers, called a task frame. Context switching will occur when a remote memory access or a synchronization attempt is encountered. The overhead for context switching is 11 cycles. The J Machine [7] implements in each processor three execution levels: background, priority 0, and priority 1. Separate registers are provided to support rapid switching among these three levels. Cache memory is a proven technique to reduce memory latency and has been implemented in all the known high performance ....

W. Dally, L. Chao, A. Chien, S. Hassoun, W. Horwat, J. Kaplan, P. Song, B. Totty and S. Wills, "Architecture of a Message-Driven Processor," in Proc. of the 14th Annual International Symposium on Computer Architecture, pp. 189--196, June 1987.


An Architecture of Highly Parallel Computer AP1000 - Hiroaki Ishihata Takeshi (1991)   (15 citations)

....while keeping high throughput and avoiding deadlock. Dally[4] has proposed message driven processor based on object oriented concurrent programming. He uses a custom CPU to reduce communication latency and avoids deadlock by using a message buffering scheme which he calls the virtual channel[5]. We use general purpose microprocessor and special hardware for communication to reduce the overall overhead. In this paper, we describe the AP1000 in detail. The design concepts are presented in Section 2, the system configuration in Section 3, and detail of the cell architecture in Section 4. ....

....[Figure 1: System configuration] Many kinds of interconnection schemes have been analyzed and implemented. According to recent studies, low-dimensional networks with wide channels provide lower latency, less contention, and higher hot spot throughput than high-dimensional networks with narrow channels[5]. We used a 2-dimensional torus network for the T-net, with each cell connected to its four adjacent cells through the T-net. Each port of the T-net has a 16-bit data bus, two parity bits, and a few control signals. Using a pipelined handshaking control protocol, each port of the T-net has a 25 ....

W. J. Dally, "Architecture of a Message-Driven Processor," Proc. of 14th ACM/IEEE Symposium on Computer Architecture, pp. 189-196, 1987.


VLIW Processors: Efficiently Exploiting Instruction Level.. - Rudd (1999)

....a superscalar-based processor core; this approach utilizes as many of the function units as possible by executing operations from these independent threads concurrently. Other approaches include switching threads at long-latency events such as cache misses or message waits (as in the MIT J Machine [15]) or speculatively executing multiple related threads (as in the Multiscalar [50] project from the University of Wisconsin). In general, the tradeoff in multithreaded processors is to increase complexity to improve processor utilization thus reducing complexity efficiency and possibly ....

William J. Dally, Linda Chao, Andrew Chien, Soha Hassoun, Waldemar Horwat, Jon Kaplan, Paul Song, Brian Totty, and Scott Wills. Architecture of a message-driven processor. In The 22nd Annual International Symposium on Computer Architecture, pages 189--195. Association of Computing Machinery, June 1995.


A Concurrent Abstract Interpreter - Weeks, Jagannathan, Philbin (1994)

....in the abstract state, a fine grained concurrent implementation of the interpreter will typically entail the creation of many more execution contexts than can be efficiently implemented on multiprocessor platforms that do not provide explicit hardware support for fine grained concurrency [1] [11]. We, therefore, choose a more coarse grained implementation strategy. In this version, the abstract state is partitioned, and a thread is assigned to manage each partition. The runtime organization of a program's abstract state is given in terms of shared data structures that are concurrently ....

William J. Dally et al. Architecture of a Message-Driven Processor. In Proceedings of the 14th IEEE Conference on Computer Architecture, pages 189--196, 1987.


Models Of Communication Latency In Shared Memory.. - Gregory Byrd December (1993)   (1 citation)

.... block line word consumer yes yes no [18] deliver line block line word producer yes yes yes [17] reader copy block block consumer no yes no [16, 19] writer copy block block word producer no yes no [16, 19] message block word producer no yes no [5] message eager block word producer no yes yes [6, 4] Table 1: Summary of mechanisms. Weaker models of consistency have been proposed, however, which do allow store concurrency [2, 12] These models recognize that a series of writes may proceed in any order, if they do not affect the global view of the computation, as long as they all complete ....

William J. Dally et al. Architecture of a message-driven processor. In Proceedings of the 14th Annual International Symposium on Computer Architecture, pages 189--196, June 1987.


Local Memory Reference Behavior of Fine-Grain.. - Motomura, Papadopoulos (1993)

....on processors that support an efficient short message network interface. Otherwise network overhead will dominate execution time. Fortunately, tightly integrated network interfaces are appearing in a number of commercial and research machines, e.g. the CM-5 [4], NCUBE, Tera [5], iWARP, J-Machine [6], Alewife [7], and T [8]. Our results should give insight to the local memory reference behavior of codes running any of these machines which have been compiled for a TAM-like execution model. [Figure labels: Tree of Activation Frames; Global Heap of Shared Objects; Active Threads] ....

William J. Dally, L. Chao, A. Chien, S. Hassoun, W. Horwat, J. Kaplan, P. Song, B. Totty and S. Wills. Architecture of a Message-Driven Processor. In Proceedings of 14th Annual International Symposium on Computer Architecture, IEEE, June 1987, pages 189-196.


Space-Efficient Scheduling of Multithreaded Computations - Blumofe, Leiserson (1993)   (30 citations)

....order at compile time. At run time, a scheduler dynamically orders execution of the threads. Other systems employ schedulers that dynamically order threads based on the availability of data in shared memory multiprocessors [1, 10, 23] or message arrivals in message passing multicomputers [2, 17, 29, 44]. Rapid execution of a multithreaded computation on a parallel computer requires exposing and exploiting parallelism in the computation by keeping enough threads concurrently alive to keep the processors of the computer busy. If processors are busy most of the time, the P processor execution ....

W. J. Dally, L. Chao, A. Chien, S. Hassoun, W. Horwat, J. Kaplan, P. Song, B. Totty, and S. Wills, Architecture of a message-driven processor, in Proceedings of the 14th Annual International Symposium on Computer Architecture, Pittsburgh, Pennsylvania, June 1987, pp. 189--196.


System Support for Efficient Network Communication - Thekkath (1994)   (4 citations)

....and control transfer. The proposed structure is applicable in environments other than distributed systems, e.g. in large scale dedicated multiprocessors, where the cost of control transfer is high relative to that of data transfer. Traditionally, these multiprocessor systems, e.g. the J-Machine [22], T [53], etc. have opted for a single primitive that unifies remote transfer of data and control, in contrast to our approach. It is important to make two final points in passing. First, note that the structure we propose here is one innovative alternative to organizing distributed ....

William J. Dally, Linda Chao, Andrew Chien, Soha Hassoun, Waldemar Horwat, Jon Kaplan, Paul Song, Brian Totty, and Scott Wills. Architecture of a message-driven processor. In Proceedings of the 14th International Symposium on Computer Architecture, pages 189--196, June 1987.


Synchronization Constraints with Inheritance: What Is.. - Matsuoka, Wakita.. (1990)   (16 citations)

....activities of professionals requiring immense computational power. We claim that object oriented concurrent programming (OOCP) serves as the basis for professional computing. Massive parallel architectures supporting OOCP languages are being designed and built at Caltech (Mosaic) [3], MIT (J Machine) [8], among other places. Research on computational models for OOCP, such as the Actor model [1], design of languages based on those models, and their implementations are flourishing. In developing large scale programs for professional computing with OOCP languages, it is extremely important that ....

W. Dally et al. Architecture of a message-driven processor. In 14th ACM/IEEE Symposium on Computer Architecture, pages 189--196, Jun. 1987.


Emulation of a Virtual Shared Memory Architecture - Raina (1993)   (3 citations)

....the main computation processor. The two processors would have differing characteristics, e.g. the compute processor would need efficient floating point capability, whereas the emulation processor would need efficient communication and message manipulation [53, 198, 168]. This would result in a practical implementation with little additional development cost. A new generation message passing multiprocessor that has such a capability is the Meiko CS-2 [131]. The CS-2 consists of a SPARC-like communication co-processor that handles message passing and relieves the ....

W. J. Dally et al. Architecture of a Message-Driven Processor. In Proceedings of the 14th Annual International Symposium on Computer Architecture, pages 189--196, ACM, May 1987.


Space-Efficient Scheduling of Multithreaded Computations.. - Blumofe, Leiserson (1993)   (30 citations)

....sequential order at compile time. At run time, a scheduler dynamically orders execution of the threads. Other systems employ schedulers that dynamically order threads based on the availability of data in shared memory multiprocessors [1, 4] or message arrivals in message passing multicomputers [9, 24]. Rapid execution of a multithreaded computation on a parallel computer requires exposing and exploiting parallelism in the computation by keeping enough threads concurrently active to keep the processors of the computer busy. If processors are busy most of the time, the execution schedule X of ....

William J. Dally, Linda Chao, Andrew Chien, Soha Hassoun, Waldemar Horwat, Jon Kaplan, Paul Song, Brian Totty, and Scott Wills. Architecture of a message-driven processor. In Proceedings of the 14th Annual International Symposium on Computer Architecture, pages 189--196, Pittsburgh, Pennsylvania, June 1987.


Object Oriented Load Distribution in DinnerBell - Kono, Tatsukawa, Aoyagi..

....network transmission unit and receiving unit work in parallel with micro message execution unit. The network assumption is also simple. We assumed the network to be flat, that is, with no locality. A multistage network is an example of such a network. This architecture assumption is general, as in [Dally87]. Micro message execution time is based on a micro message interpreter on MC68000 (0.5 Mips) micro processor and a multistage network LSI. [Fig. 18: Time Chart of Simulation.] ....

W.J. Dally, A. Chien, L. Chao, S. Hassoun, W. Horwat, J. Kaplan, P. Song, B. Totty, and S. Wills. Architecture of a message-driven processor. In 14th Comp. Arch. Conf. Procs., pp. 189--196. IEEE, 1987.


The Message-Driven Processor: A Multicomputer Processing.. - William Dally Roy (1992)   (85 citations)  Self-citation (Dally)

....(ISA) with instructions to support parallel processing. Specifically, the MDP provides efficient hardware mechanisms for communication, synchronization and naming. This section describes the MDP ISA with particular emphasis on these mechanisms. Further details of the MDP ISA are described in [11]. Register Set: The MDP provides separate register sets to support rapid switching between three execution levels: background, priority 0 (P0) and priority 1 (P1). The MDP executes at the background level when there are no pending messages. Each arriving message creates a task and initiates ....

William J. Dally et al. Architecture of a Message-Driven Processor. In Proceedings of the 14th International Symposium on Computer Architecture, pages 189--205. IEEE Computer Society Press, June 1987.


Evaluating the Locality Benefits of Active Messages - Ellen Spertus And (1995)   (3 citations)  Self-citation (Dally)

No context found.

William J. Dally et al. Architecture of a message-driven processor. In Proceedings of the 14th International Symposium on Computer Architecture, pages 189--205. IEEE, June 1987.


The J-Machine: A Retrospective - Dally, Chang, Chien, Fiske, Horwat.. (1998)   (2 citations)  Self-citation (Dally)

....of Computer Science, Mills College 8 Digital Equipment Corporation, Western Research Laboratory 9 Department of Electrical and Computer Engineering, Georgia Institute of Technology Eleven years ago, at ISCA 14, we published a paper titled "Architecture of a Message Driven Processor" [Dal87], marking the start of our J Machine project at MIT. The project culminated with the construction of a working prototype in 1991 [Dal92] and the evaluation of this prototype in 1992 [NWD93, Spert93]. The J Machine demonstrated the use of a jellybean part, a commodity part incorporating a ....

W. Dally, et al., "Architecture of a Message-Driven Processor," ISCA-14, pp. 189-196, 1987.


Mechanisms for Efficient, Protected Messaging - Lee   Self-citation (Dally)

....direct exposure of critical, shared messaging resources to untrusted user level processes also compromised protection and created starvation risks. The inadequacy of the message system hardware in these designs is sometimes circumvented with scheduling protocols [28] and programming conventions [53]. Such remedial solutions however seriously defeat the raw performance of the underlying hardware, even when they are reliable. The performance penalty would be especially pronounced within the fine grain, fast context switching environment of a system like the M Machine. In order to benefit ....

....resource. The message size is unbounded in a streaming interface. Message words are written into a channel by the sender at one end, and read out by the receiver at the other end. Until the sender-receiver pair agrees to relinquish access to the channel, it cannot be reassigned. In the J Machine [53], for example, the SEND instruction implicitly establishes such a channel, which is closed only by the SENDE instruction. Therefore, the code fragment between each SEND instruction and its corresponding SENDE instruction must be treated as a critical, exclusive region, within which no other thread ....

[Article contains additional citation context not shown here]

William J. Dally, Linda Chao, Andrew Chien, Soha Hassoun, Waldemar Horwat, Jon Kaplan, Paul Song, Brian Totty, Scott Wills, "Architecture of a Message-Driven Processor", in ISCA, 1987. pp. 189--196.


Mitsubishi Electric Research Laboratories - Cambridge Research Center

No context found.

W. Dally et al. Architecture of a Message-Driven Processor. In Intl Symposium on Computer Architecture, 1987.


Binding Time in Distributed Shared Memories - Kong (1999)

No context found.

William J. Dally, Linda Chao, Andrew Chien, Soha Hassoun, Waldemar Horwat, Jon Kaplan, Paul Song, Brian Totty, and Scott Wills. Architecture of a Message-Driven Processor. In Proceedings of the 14th Annual International Symposium on Computer Architecture, pages 189--196. Computer Architecture News, 15(2), June 1987.


Global Illumination and Monte Carlo - Heirich (1997)

No context found.

Dally, W. J. et al. "Architecture of a Message Driven Processor." Proc. 14th ACM/IEEE Symposium on Computer Architecture (1987), pp. 189-196.


The Persistent Relevance of IPC Performance: New.. - Hsieh, Kaashoek, Weihl (1993)   (5 citations)

No context found.

Dally, W.J., Chao, L., Chien, A., Hassoun, S., Horwat, W., Kaplan, J., Song, P., Totty, B., and Wills, S., "Architecture of a Message-Driven Processor," Proc. 14th Annual International Symposium on Computer Architecture, pp. 189-196, Pittsburgh, PA, Jun. 1987.
