29 citations found.
W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick and G. Fyler, `The J-machine: a fine-grain concurrent computer', in G. X. Ritter (ed.), Information Processing 89, Elsevier Science Publishers B.V., North Holland, IFIP, 1989.


This paper is cited in the following contexts:


Tolerating Latency with Dagger - Attila Gursoy Kale

....Dealing with this latency is therefore a major objective in parallel processing. On the hardware side, this is being addressed by designing architectures that minimize latency. The ALLCACHE architecture of the KSR1 machine and the message-processor architecture of the J-Machine [2] are examples of these attempts, as is the continuous evolution of communication hardware in the traditional architectures of Intel and NCUBE machines. However, physical reality dictates that remote access will always be significantly slower than local access. Software techniques for ....

W. Dally et al., "The J-Machine: A Fine-Grain Concurrent Computer", In IFIP Congress, 1989.


Methods for Performance Evaluation of Wormhole-Switched Networks - Nilsen (1998)

....is largely due to the fact that the switching devices are relieved from buffer management. The price to pay is the implementation cost of flit-level flow control [23]. The WH switching principle originates from research on high-performance interconnection networks for multiprocessor systems [7, 31, 36]. Commercial machines using WH switching include [74, 75, 100, 107, 113, 124, 146]. Today, the same technique is also finding its way into LAN [12, 25, 43] and SAN [71, 154] applications. A particular form of the switching principle is standardized [73] as well, with corresponding components ....

DALLY, W., CHIEN, A., FISKE, S., HORWAT, W., KEEN, J., LARIVEE, M., LETHIN, R., NUTH, P., WILLS, S., CARRICK, P., AND FYLER, G. The J-Machine: A fine-grain concurrent computer. In Proceedings of the IFIP 11th World Computer Congress, Information Processing 89 (1989), pp. 1147--1153.


Acknowledgments - Would Like To

....for interconnection. Some examples of existing or proposed machines that make use of direct networks are: Caltech Cosmic Cube [4], Caltech Mosaic [5], CMU/Intel iWarp [6, 7], Connection Machine [8], HORIZON [9], Intel iPSC and Paragon, MIT Alewife [10], MIT J-Machine [11], MuNet [12], Stanford DASH Multiprocessor [13], Thinking Machines CM-2 [8], and Cray T3E [14]. [Fig. 1.1. Network examples: (a) indirect network; (b) direct network.] We will focus on multidimensional direct networks ....

W. J. Dally et al., "The J-Machine: A fine-grain concurrent computer," in Proc. IFIP Congress, 1989.


A Fault Tolerant Routing Scheme For Hypercubes - Khaled Day

.... ALGORITHM TO THE k-ARY n-CUBE We now show how to adapt the fault-tolerant routing strategy presented in this paper to the k-ary n-cube topology [8, 12, 2], which has been used in significant computers like the Cosmic Cube [14], the Connection Machine [9], the Ametek 2020 [15], the J-Machine [6], the Mosaic [16], the iWarp [1], and the Cray T3D. The goal of this section is to illustrate the generality of the proposed fault-tolerant routing scheme. The k-ary n-cube $Q_n^k$ has $N = k^n$ nodes, each of the form $x = x_{n-1} x_{n-2} \ldots x_0$, where $0 \le x_i < k$ for all $0 \le i < n$. Two nodes $x = $ ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler, "The J-Machine: A Fine-Grain Concurrent Computer," Information Processing 89, pp. 1,147-1,153, Elsevier Science Publishers B. V., 1989.


A Design Principle of Massively Parallel Distributed-Memory.. - Amamiya, Kawano (1994)

....performed at high speed due to a RISC-type execution mechanism with a short cycle, and the inter-thread concurrent execution also achieves high throughput due to the ultra-multiprocessing mechanism inherited from the Datarol-I architecture. Similar architectures are Monsoon [13], *T [12], EM-4 [14], J-Machine [6], and Epsilon-2 [8]. Monsoon is also designed to support multithread execution by introducing a synchronizing join operator. Monsoon uses ordinary random-access memory for join-counter management and performs its join operation as one of its general instructions. The join operation in Datarol-II machines is ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth and S. Wills, "The J-Machine: A Fine-grain Concurrent Computer," Proc. 11th IFIP, pp.1147-1153, (1989).


A Development Methodology for Concurrent Programs - Bryan Chow

....to facilitate coordination between H-Threads. The compiler will use the H-Thread slots to exploit instruction-level parallelism. In addition to these features, the M-Machine provides support for hardware synchronization and message-driven processes similar to that provided in the J-Machine [1]. 3 Support for Sequential Programming. Sequential programming serves to utilize the entire memory of the machine. Execution of a sequential program is achieved by providing an environment which appears, from the programmer's perspective, to be equivalent to a UNIX-style workstation on the ....

Dally, W. J., et al., "The J-Machine: A Fine-grain Concurrent Computer," Information Processing 89, G. X. Ritter (ed.), Elsevier Science Publishers B.V., North Holland, IFIP, 1989.


A Video Controller and Distributed Frame Buffer for the J-Machine - Eric Mcdonald (1995)   (1 citation)

....NE43-610, June 7, 1993. We are currently in the test and assembly phase of a distributed, scalable video system for the J-Machine. The J-Machine is a fine-grain concurrent computer consisting of up to 65,536 36-bit Message-Driven Processors (MDPs), which communicate through a low-latency network [1]. Our goal is to provide high-bandwidth, multi-buffered, high-resolution video output capability for the J-Machine. Furthermore, the video system is designed to be scalable so as to meet the varying demands of different J-Machine configurations and users. The video system consists of two types ....

W. J. Dally et al., "The J-Machine: A fine-grain concurrent computer," in Proceedings of IFIP 89 Conference, 1989. VLSI memo 89-532.


Datarol: A Parallel Machine Architecture for Fine-Grain.. - Makoto Amamiya

....is performed at high speed due to a RISC-type execution mechanism with a short cycle, and the inter-thread concurrent execution also achieves high throughput due to the ultra-multiprocessing mechanism inherited from the Datarol-I architecture. Similar architectures are Monsoon [16], *T [15], EM-4 [17], J-Machine [7], and Epsilon-2 [9]. Monsoon is also designed to support multithread execution by introducing a synchronizing join operator. Monsoon uses ordinary random-access memory for join-counter management and performs its join operation as one of its general instructions. The join operation in Datarol-II machines is ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth and S. Wills, "The J-Machine: A Fine-grain Concurrent Computer," Proc. 11th IFIP, pp.1147-1153 (1989).


Designing Broadcasting Algorithms in the Postal Model for.. - Bar-Noy, Kipnis (1992)   (12 citations)

....graph between the processors. Such systems treat sending a message as a send-and-forget event and assume that passing a message between any pair of processors takes roughly the same time. This assumption is incorporated into several distributed-memory parallel computers, such as the MIT J-Machine [9], TMC CM-5 [12], and IBM Vulcan system [16]. A similar approach was investigated for high-speed communication networks [5, 15] and is incorporated into networks such as PARIS [4] and plaNET [7, 14]. Similar message-passing situations exist in real life. Consider, for example, communication between ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler, "The J-Machine: a fine-grain concurrent computer," Information Processing 89, Elsevier Science Publishers, IFIP, 1989, pp. 1147--1153.


On Optimal Placements of Processors in Fault-Tolerant .. - Blaum, Bruck.. (1997)   (3 citations)

....as partially populated networks, in the sense that only a subset of routing nodes are targets of message injection by processors. On the other hand, networks such as tori, meshes, and hypercubes have been designed and/or built where the number of routing nodes is equal to the number of processors [7]. Hence, these networks have been used as fully populated networks, in the sense that every routing node in the topology is subjected to message injection. Fully populated tori and meshes exhibit a theoretical throughput that degrades as the network size increases. Note that the bisection ....

W. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth and S. Wills, "The J-Machine: A Fine Grain Concurrent Computer," IFIP Congress, 1989.


Coherent Network Interfaces for Fine-Grain Communication - Mukherjee, Falsafi, al. (1996)   (25 citations)

....low-latency networks is rapidly making NIs a bottleneck. Rather than try to explore the entire NI design space here, we focus our efforts in three ways. First, we concentrate on NIs that reside on memory or I/O buses. In contrast, other research has examined placing NIs in processor registers [5, 15, 21], in the level-one cache controller [1], and on the level-two cache bus [10]. Our NIs promise lower cost than the other alternatives, given the economics of current microprocessors and the higher integration levels we expect in the future. Nevertheless, closer integration is desirable if it can be made ....

....hardware device to send and receive network messages to and from many processes. Unfortunately, the operating system's overheads severely limit performance, especially for small messages. Many multicomputers reduce or eliminate this overhead by mapping the NI directly into the user's address space [1, 15, 29]. Thus the operating system normally need not get involved when messages are sent and received. However, user-mapped NIs significantly complicate support for multiprogramming. Possible solutions range from disallowing multiprogramming [15] to taking special actions at context-switch time (to ....

[Article contains additional citation context not shown here]

William J. Dally, Andrew Chien, Stuart Fiske, Waldemar Horwat, John Keen, Michael Larivee, Rich Nuth, Scott Wills, Paul Carrick, and Greg Flyer. The J-Machine: A Fine-Grain Concurrent Computer. In G. X. Ritter, editor, Proc. Information Processing 89. Elsevier North-Holland, Inc., 1989.


Bandwidth-Optimal Complete Exchange on Wormhole-Routed.. - Tseng, Lin, Gupta, Panda (1997)   (6 citations)

....Columbus, OH 43210, U.S.A. [{tlin,panda}@cis.ohio-state.edu] Computer Science Department, Colorado State University, Ft. Collins, CO 80523, U.S.A. [gupta@cs.colostate.edu] A preliminary version of this paper appeared in Int'l Parallel Processing Symp. 1995 [22]. ... Paragon, MIT J-Machine [5], Tera HORIZON [20], Cray T3D [4, 16], Polymorphic Torus [11], Fujitsu AP1000, and iWarp [3]. A torus is a mesh with wrap-around links. Although meshes and tori are generally regarded as close families, there are still some distinctions: (i) as opposed to meshes, all nodes of a torus are ....

W. J. Dally, R. Davison, J. A. S. Fiske, G. Fyler, J. S. Keen, R. A. Lethin, M. Noakes, and P. R. Nuth. "The J-Machine: A Fine-grain Concurrent Computer". In Information Processing 89, IFIP, pages 1147--1153, 1989.


A Message-Driven Programming System for Fine-Grain Multicomputers - Daniel Maskit (1994)   (6 citations)

....Office of Naval Research under contract number N00014-91-J-1986. The first author is partially supported by an NSF Graduate Research Fellowship. Introduction: This paper describes an experimental message-driven programming system and its implementation on a 512-node J-Machine. The J-Machine [1] is an architectural experiment that focuses on the evaluation of hardware mechanisms, such as the integration of messages and processes, to support concurrent programming. The programming system carries the experience gained from our previous experiments [2, 3, 4] into a C-based system, while ....

.... expressly to support efficient fine-grain process execution [8]. The J-Machine is a similar design, developed at MIT by the Concurrent VLSI Architecture Project, that supports fine-grain processes but also provides on-chip associative memory and hardware support for process synchronization [1]. The programming system described here provides a low-level platform that supports both irregular applications and the development of high-level systems on fine-grain multicomputers. The idea of using a heap-based implementation of a stack-oriented language is not novel. It was earlier reported for ....

Dally, W. J., et al., "The J-Machine: A Fine-grain Concurrent Computer," Information Processing 89, G. X. Ritter (ed.), Elsevier Science Publishers B.V., North Holland, IFIP, 1989.


Logging and Recovery in a Highly Concurrent Database - Keen (1994)   Self-citation (Keen)

....is non-volatile. Any information written to disk will remain on disk despite a system failure. Distributed-Memory Multiprocessor System. The data structures and algorithms presented in this thesis are intended for a fine-grain distributed-memory multiprocessor system, such as the MIT J-Machine [11, 12, 48], in which each processor can directly address only its own local memory and all interprocessor communication must occur via explicit message passing. Nevertheless, the techniques presented in this thesis could be adapted to a shared-memory multiprocessor system with little effort. Disks are the ....

....typically only 10 to 20 bytes in length. Low-overhead interprocessor communication is therefore particularly important. For best performance, parallel XEL should be implemented on a fine-grain concurrent computer that provides low-overhead, low-latency communication primitives. The MIT J-Machine [11, 12, 48] is an existing example of such a machine. XEL will perform satisfactorily on other concurrent systems in which the overhead for interprocessor communication and synchronization is higher, as long as the added delays are still relatively short compared to the delays for writing blocks to disk, the ....

William Dally, Andrew Chien, Stuart Fiske, Waldemar Horwat, John Keen, Michael Larivee, Rich Lethin, Peter Nuth, Scott Wills, Paul Carrick, and Greg Fyler. The J-Machine: A Fine-Grain Concurrent Computer. In Proc IFIP 11th World Computer Congress, pages 1147--1153, San Francisco, California, August 1989.


Compressionless Routing: A Framework for Adaptive and.. - Kim, Liu, Chien (1996)   (22 citations)  Self-citation (Chien)

....the limitations of CR-based routing algorithms. Section 8 discusses related work. Finally, Section 9 concludes the paper, summarizing the results. 2 Background: High-performance routing networks, the subject of significant study over the last ten years, are in widespread use in parallel machines [8, 9, 10, 11, 12]. All of these multicomputer systems use direct networks, meaning that the computing nodes are embedded in the network topology and, as a result, some nodes are closer than others. In addition to use in multicomputers, direct networks are gaining acceptance in shared-memory machines such as the ....

....computers, only a few features for fault tolerance have been introduced in commercial multiprocessor routing networks. For example, a number of machines include parity on each physical channel to detect errors, but can do little but kill the process or reboot the machine when an error occurs [8, 12, 11, 14, 13]. More aggressive machines support checksums or error-correcting codes for each packet on each link [25, 26]. In all of these machines, faulty channels require reconfiguration of the network and machine, with loss of some working processors and network channels. Generally, data errors cannot be ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler, "The J-Machine: A fine-grain concurrent computer," in Information Processing 89, Proceedings of the IFIP Congress, pp. 1147--1153, August 1989.


Experiences Implementing Dataflow On A General-Purpose.. - Ellen Spertus And (1991)   (2 citations)  Self-citation (Dally)

....IMPLEMENTING DATAFLOW ON A GENERAL-PURPOSE PARALLEL COMPUTER. Ellen Spertus and William J. Dally, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139. Abstract: The MIT J-Machine [3], a massively parallel computer, is an experiment in providing general-purpose mechanisms for communication, synchronization, and naming that will support a wide variety of parallel models of computation. We have developed two experimental dataflow programming systems for the J-Machine. For the ....

....recently, people have begun developing compilation techniques for executing dataflow programs on general-purpose parallel machines [1]. Our work is on these techniques. The J-Machine: The J-Machine is a massively parallel MIMD computer based on the Message-Driven Processor (MDP), a custom chip [3, 4]. For this research, we used a simulator of a 32-node J-Machine. Each processor has 260K (4K on-chip) 32-bit words augmented with 4-bit tags. Types specified by tags include booleans, integers, symbols, pointers, and cfutures. A cfuture, used for synchronization, represents ....

William J. Dally, et al., "The J-Machine: A Fine-Grain Concurrent Computer," Information Processing 89, Proceedings of the IFIP Congress, 1989.


Concert - Efficient Runtime Support for Concurrent.. - Karamcheti, Chien (1993)   (47 citations)  Self-citation (Chien)

....versions are listed in Figure 4. Since most data transfers are accompanied by a handler invocation, all versions integrate handler invocation with the actual data transfer. This integration reduces the overhead from data reception to dynamic method dispatching [12] and has been used in the J-Machine [13] and the active message facility [14]. The different data transfer versions reflect two basic performance distinctions: local versus remote transfers, and register versus memory access costs. Remote transfers cost more than corresponding local transfers due to routing and network interface ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler, "The J-Machine: A fine-grain concurrent computer," in Information Processing 89, Proceedings of the IFIP Congress, pp. 1147--1153, Aug. 1989.


Logging and Recovery in a Highly Concurrent Database - Keen (1994)   Self-citation (Keen)

....is nonvolatile. Any information written to disk will remain on disk despite a system failure. Distributed-Memory Multiprocessor System. The data structures and algorithms presented in this thesis are intended for a fine-grain distributed-memory multiprocessor system, such as the MIT J-Machine [11, 12, 48], in which each processor can directly address only its own local memory and all interprocessor communication must occur via explicit message passing. Nevertheless, the techniques presented in this thesis could be adapted to a shared-memory multiprocessor system with little effort. Disks are ....

....typically only 10 to 20 bytes in length. Low-overhead interprocessor communication is therefore particularly important. For best performance, parallel XEL should be implemented on a fine-grain concurrent computer that provides low-overhead, low-latency communication primitives. The MIT J-Machine [11, 12, 48] is an existing example of such a machine. XEL will perform satisfactorily on other concurrent systems in which the overhead for interprocessor communication and synchronization is higher, as long as the added delays are still relatively short compared to the delays for writing blocks to disk, the ....

William Dally, Andrew Chien, Stuart Fiske, Waldemar Horwat, John Keen, Michael Larivee, Rich Lethin, Peter Nuth, Scott Wills, Paul Carrick, and Greg Fyler. The J-Machine: A Fine-Grain Concurrent Computer. In Proc IFIP 11th World Computer Congress, pages 1147--1153, San Francisco, California, August 1989.


Compressionless Routing: A Framework for Adaptive and.. - Kim, Liu, Chien (1997)   (22 citations)  Self-citation (Chien)

....the limitations of CR-based routing algorithms. Section 8 discusses related work. Finally, Section 9 concludes the paper, summarizing the results. 2 Background: High-performance routing networks, the subject of significant study over the last ten years, are in widespread use in parallel machines [8, 9, 10, 11, 12]. All of these multicomputer systems use direct networks, meaning that the computing nodes are embedded in the network topology and, as a result, some nodes are closer than others. In addition to use in multicomputers, direct networks are gaining acceptance in shared-memory machines such as the ....

....computers, only a few features for fault tolerance have been introduced in commercial multiprocessor routing networks. For example, a number of machines include parity on each physical channel to detect errors, but can do little but kill the process or reboot the machine when an error occurs [8, 12, 11, 14, 13]. More aggressive machines support checksums or error-correcting codes for each packet on each link [25, 26]. In all of these machines, faulty channels require reconfiguration of the network and machine, with loss of some working processors and network channels. Generally, data errors cannot be ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler, "The J-Machine: A fine-grain concurrent computer," in Information Processing 89, Proceedings of the IFIP Congress, pp. 1147--1153, August 1989.


A Message-driven Programming System for Fine-grain Multicomputers - Maskit, Taylor (1994)   (6 citations)

No context found.

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick and G. Fyler, `The J-machine: a fine-grain concurrent computer', in G. X. Ritter (ed.), Information Processing 89, Elsevier Science Publishers B.V., North Holland, IFIP, 1989.


An Efficient, Protected Message Interface - Lee, al. (1998)   (1 citation)

No context found.

W.J. Dally et al., "The J-Machine: A Fine-Grain Concurrent Computer," Proc. Information Processing 89, Elsevier Science, North Holland, 1989, pp. 1,147-1,153.


Dagger: Combining Benefits of Synchronous and.. - Attila Gursoy Laxmikant (1993)   (1 citation)

No context found.

W. Dally et al., "The J-Machine: A Fine-Grain Concurrent Computer", In IFIP Congress, 1989.


Global Illumination and Monte Carlo - Heirich (1997)

No context found.

Dally, W. J. et al. "The J-Machine: a Fine Grain Concurrent Computer." Proc. IFIP Congress (1989), pp. 1147-1153.


Subtorii Allocation Strategies for Torus Connected Networks - Gupta, Srimani (1997)

No context found.

W. J. Dally, R. Davison, J. A. Stuart Fiske, G. Fyler, J. S. Keen, R. A. Lethin, M. Noakes, and P. R. Nuth. "The J-Machine: A Fine-grain Concurrent Computer". In Information Processing 89, IFIP, pages 1147--1153, 1989.


A Practical Processor Design for Multithreading - Amamiya, Kawano, Tomiyasu.. (1996)   (1 citation)

No context found.

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth and S. Wills, "The J-Machine: A Fine-grain Concurrent Computer," Proc. 11th IFIP, pp. 1147-1153, 1989.



CiteSeer.IST - Copyright Penn State and NEC