75 citations found.
W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler. The J-Machine: a Fine-Grain Concurrent Computer. In Information Processing '89, 1989.

This paper is cited in the following contexts:

Information Hiding in Parallel Programs - Foster (1992)   (5 citations)  (Correct)

....has so far focused on concepts. We now examine how virtual topologies, virtual channels, lightweight processes, and port arrays can be used to develop parallel programs. Although some multicomputers and operating systems incorporate certain of these abstractions as primitive mechanisms [25, 11, 28], it will in general be necessary to provide compile time or run time support. There is much to be gained from standardizing this support so that it can be reused in many applications. It is also desirable to define interfaces that encourage or enforce correct usage. One viable approach is to ....

....cessors. An annotation L on a block location denotes invocation of location function L; it causes the block to execute on the virtual processor with index returned by L. Port Arrays. A port declaration creates a one dimensional distributed array of definitional variables. A declaration port P [11] ; creates a port array P with 11 elements, distributed blockwise across the nodes of the virtual topology in which the port array is declared. For example, a declaration port p [2 nodes ( creates a port array p with 10 2 nodes( elements; p [2.i] and p [2.i 1] are located on the ith node ....

Dally, W. J., et al., The J-Machine: A fine-grain concurrent computer, Information Processing 89, G. X. Ritter (ed.), Elsevier Science Publishers B.V., North Holland, IFIP, 1989.


A Compiler Approach to Scalable Concurrent Program Design - Foster, Taylor (1992)   (11 citations)  (Correct)

....[Figure 1: Compilation Strategy] architecture provides high performance message handling and fine grain process scheduling [36]. The J machine also provides high performance variable and code manipulation hardware [15]. All of these features may be used to replace unique components of the emulator design, providing high performance, native code versions of the system. Implementations of this type are currently under construction. 1.5 Summary The important characteristics of this approach are as follows. We ....

Dally, W. J., et al., The J-Machine: A fine-grain concurrent computer, Information Processing 89, G. X. Ritter (ed.), Elsevier Science Publishers B.V., North Holland, IFIP, 1989.


Fine-Grain Distributed Shared Memory on Clusters of Workstations - Schoinas (1997)   (3 citations)  (Correct)

....by suspending the computation and invoking a user level handler. A typical handler performs the actions dictated by a coherence protocol to allow the access and then resumes the computation. The fine grain access control mechanism is similar to full empty bits of dataflow architectures [DCF 89] but it is tailored to support the implementation of shared memory protocols. For this reason, it extends the two state model of the full empty bits to a three state model that includes a readonly state. More specifically, Tempest's fine grain access control is based on tagged memory blocks. ....

....with low latencies than the former. Among the key proposals that emerged from the multicomputer community have been the Berkeley active messages. The design has been heavily influenced by earlier work in message directed computation in the context of dataflow architectures and the J machine [DCF 89, PC90]. Berkeley active messages sought to reduce latencies by eliminating the software complexity associated with traditional multicomputer messaging interfaces. Tempest's messaging interface is based on the Berkeley active messages [vECGS92]. It differs from Berkeley active messages in ....

[Article contains additional citation context not shown here]

William J. Dally, Andrew Chien, Stuart Fiske, Waldemar Horwat, John Keen, Michael Larivee, Rich Lethin, Peter Nuth, Scott Wills, Paul Carrick, and Greg Fyler. The J-Machine: A fine-grain concurrent computer. In G. X. Ritter, editor, Proc. Information Processing 89. Elsevier North-Holland, Inc., 1989.


Planar-Adaptive Routing (PAR): Low-Cost Adaptive Networks For.. - Jae Kim Eng   (Correct)

....memory is distributed across the processing nodes; only the data in local memory can be accessed directly. Access to data in remote memories is supported by message passing between processors. Direct networks, represented by grid or mesh networks, have been used predominantly in multicomputers [47, 45, 23, 46]. However, as direct networks gain acceptance in shared memory multiprocessor designs, distinguishing these machines by network topology is less appropriate. Though indirect networks provide several advantages such as the topological equidistance property, they suffer from a significant drawback: ....

....and the MIT Alewife [1] The number of memory references to distant memory units can be dramatically reduced by exploiting locality of reference. Another approach is to hide or tolerate the latency by overlapping useful work with communication latency. Multicomputers such as the MIT J machine [23] use context switching to tolerate remote object access. The TERA machine [5] uses fine grain multithreading to hide the latency, and the Stanford DASH and MIT Alewife also use the multithreading to complement remote memory access due to cache misses. However, the techniques we have described ....

[Article contains additional citation context not shown here]

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler. The J-Machine: A Fine-Grain Concurrent Computer. In Information Processing 89, Proceedings of the IFIP Congress, pages 1147--1153, August 1989.


L'insegnamento di Nuove Tecnologie di Programmazione: Alcune.. - Briot   (Correct)

....we can highlight the following advantages: a high level of expression, modularity, dynamicity, and openness. Although OOCP is currently a new and continually expanding field, decisive influences can already be seen on multiprocessor architectures (such as the J Machine [Dally et al. 89]) and in numerous applications in signal analysis systems [Barry 89], process control, office automation systems, and even animation. 2 Teaching In this section we will discuss the advantages that this new programming methodology brings to ....

W.J. Dally et al., "The J-Machine: a Fine-Grain Concurrent Computer", Proceedings of Information Processing Congress (IFIP'89), pages 1147-1153, August 1989.


Training in New Programming Technologies: an Experience - Briot (1992)   (Correct)

....of cooperative modules, and to execute them onto parallel computer architectures. Advantages may be summarized as following: high levelness, modularity, dynamicity, and openness. OOCP is a new growing field, but has already main impact on new multi processor architectures (like the J Machine [Dally et al. 89]) and various applications like signal processing [Barry 89], process control, office information systems, animation. 2 Teaching In this section we will discuss the issue of introducing this new programming methodology to students and conventional programmers. This discussion is based on our ....

W.J. Dally et al., "The J-Machine: a Fine-Grain Concurrent Computer", Proceedings of Information Processing Congress (IFIP'89), pages 1147-1153, August 1989.


FUGU: Implementing Translation and Protection in a .. - Mackenzie.. (1994)   (9 citations)  (Correct)

....Hybrid Deposit [19] proposes hardware to interpret messages as operations on pre negotiated buffer areas. FUGU's approach is to add protection while maintaining existing, well defined user level communication mechanisms and efficient, distributed shared memory. The J machine multicomputer [9] provides two levels of network priorities, user level access to the network hardware and the ability to relaunch incoming messages from memory transparently. The J machine is a single user machine with no support for shared memory or DMA on messages. The CM 5 multicomputer provides multiuser ....

William J. Dally et al. The J-Machine: A Fine-Grain Concurrent Computer. In Proceedings of the IFIP (International Federation for Information Processing), 11th World Congress, pages 1147--1153, New York, 1989. Elsevier Science Publishing.


Adaptive routing on the Recursive Diagonal Torus - Funahashi And Hanawa   (Correct)

....exerted to implement Massively Parallel Computers (MPCs) with tens of thousands nodes. In these systems, the connection topology often dominates the system performance. Instead of hypercube used in first generation multicomputers, most recent machines take the 2 D or 3 D mesh (torus) network [1][2][3]. Although the diameter of a mesh network is large ($O(\sqrt{M})$ or $O(\sqrt[3]{M})$ for M nodes) it only requires four or six links per node unlike the hypercube which requires $\log_2 M$ links per node. However, in an MPC with more than ten thousands nodes, the large diameter of the mesh network is ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, and S. Wills. The J-machine: A Fine-Grain Concurrent Computer. In IFIP 11th Computer Congress, pages 1147--1153, August 1989.


Bandwidth-Optimal Complete Exchange on Wormhole-Routed.. - Tseng, Lin, Gupta, Panda (1997)   (6 citations)  (Correct)

....Science, Duke University, Durham N.C. 27708, U.S.A, sandeep cs.duke.edu] x A preliminary version of this paper appeared in Int'l Parallel Processing Symp. 1995 [19] 2 as hypercubes [1] Examples of machines with such topologies include the MasPar MP 1 [3] Intel Paragon, MIT J Machine [6], Tera HORIZON [17] Cray T3D [4, 13] and Polymorphic Torus [9] A torus is a mesh with wrap around links. Although meshes and tori are generally regarded as close families, there are still some distinctions: i) As opposed to meshes, all nodes of a torus are topologically symmetric, ii) a torus ....

W. J. Dally, et al. The J-Machine: A fine-grain concurrent computer. In Information Processing 89, IFIP, pages 1147--1153, 1989.


Synchronization and Pipeline Design for a Multithreaded Massively.. - Sakai (1992)   (2 citations)  (Correct)

....pipeline. The concept of multithreading is not exclusive to the extension of dataflow architectures. For instance, the Denelcor HEP [1] and the Tera Computing System [6] are multithreaded computers in the sense that they execute and control multiple threads in a single pipeline. Dally's J machine [16] does not interleave multiple threads, but it can switch between threads very quickly; thus, we can say that it actually supports the multithreaded computation. In addition, Dally's new machine, called the M machine, has a mechanism of thread interleaving [17] where many threads can exist inside a ....

Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J., Larivee, M., Lethin, R., Nuth, P. and Wills, S.: The J-Machine: A Fine-Grain Concurrent Computer, Proc. of IFIP 89, pp.1147-1153 (1989).


The M-Machine Multicomputer - Fillo, Keckler, Dally, Carter.. (1995)   (22 citations)  Self-citation (Dally)   (Correct)

....which are copied into the buffer and sent again later. Discussion: The M Machine provides direct register to register communication, avoiding the overhead of memory copying at both the sender and the receiver, and eliminating the dedicated memory for message arrival, as is found on the J Machine [8]. Register mapped network interfaces have been used previously in the Mars Machine [2], J Machine, and iWarp [4] and have been described by T [26] as well as Henry and Joerg [15]. However, none of these systems provide protection for user level messages. Systems, like the J Machine, that provide ....

DALLY, W. J., ET AL. The J-Machine: A fine-grain concurrent computer. In Proceedings of the IFIP Congress (Aug. 1989), G. Ritter, Ed., North-Holland, pp. 1147-1153.


Execution of Dataflow Programs on General-Purpose Hardware - Spertus (1992)   (1 citation)  Self-citation (William)   (Correct)

....1. Each level has its own set of registers, and priorities 0 and 1 have separate message queues. Background execution is interrupted by a priority 0 message, which in turn will be interrupted by any priority 1 messages. Several J Machines have been built, including one with 128 processors. See [9, 11] for a complete description of the MDP and the J Machine. 1.2 Previous Experiments in Executing Dataflow Programs on the J Machine 1.2.1 Dataflow Graphs Dataflow compilers convert programs into dataflow graphs, where the nodes of the graph represent operators, and the arcs represent ....

Dally, William J., et al. The J-Machine: A Fine-Grain Concurrent Computer. Information Processing 89, Proceedings of the IFIP Congress, 1989.


Planar-Adaptive Routing: Low-cost Adaptive Networks for.. - Chien, Kim (1992)   (136 citations)  Self-citation (Chien)   (Correct)

....routing [15] the ideas apply to virtual cut through [21] and store and forward networks as well. Overloaded Channels Figure 1: Four packets and their routing paths under deterministic, dimension order routing. 2 The Problem Most existing multicomputer routing networks use deterministic routing [32, 30, 13, 31]. Although there are numerous paths between any source and destination, in order to avoid deadlock, deterministic routing defines a single path from source to destination. Fixed, single path routing prevents effective use of the network s density of physical interconnection because the physical ....

....significantly reduces the amount of hardware required and should reduce the time to setup and drive data across the switches. Low connectivity requirements also make it possible to use organizations which allow the router performance to be further optimized for high speed, low latency performance [14, 13]. In planar adaptive routers, the routing function prevents deadlock, completely independent of the flow control. No routing decisions depend on the presence or absence of flits in particular network buffers. This allows routing and flow control decisions to be made separately, decoupling the ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler. The J-Machine: A fine-grain concurrent computer. In Information Processing 89, Proceedings of the IFIP Congress, pages 1147--1153, August 1989.


The Cost of Adaptivity and Virtual Lanes in a Wormhole Router - Aoyama, Chien (1995)   (13 citations)  Self-citation (Chien)   (Correct)

....Router latency is approximately 50ns and channel data rates are as high as 90 MB/s, using byte wide links. Derivatives of MRCs are used in several research machines [1, 20, 30]. The J Machine Router The J Machine is a fine grained concurrent computer developed at MIT [16, 29]. The J machine network is a three dimensional mesh, with bidirectional 9 bit channels, and dimension order, wormhole routing. The J Machine network uses two virtual channels to support two logically independent message priorities and a globally synchronous clock. The data throughput is 36 MB/s ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler. The J-Machine: A fine-grain concurrent computer. In Information Processing 89, Proceedings of the IFIP Congress, pages 1147--1153, August 1989.


An Evaluation of Planar-Adaptive Routing (PAR) - Jae Kim Andrew (1992)   (4 citations)  Self-citation (Chien)   (Correct)

....been touted as scalable parallel architectures, in fact their scalability is limited by the performance of their interconnection networks. One reason why networks do not achieve their full potential bandwidth is restrictive routing policies. Most existing multicomputers use deterministic routing [13, 11, 6, 12] due to its simplicity. Any deter 1 The research described in this paper was supported in part by National Science Foundation grant CCR 9209336, Office of Naval Research grant N00014 92 J 1961, and National Aeronautics and Space Administration grant NAG 1 613. Additional support has been ....

William J. Dally, Andrew Chien, Stuart Fiske, Waldemar Horwat, John Keen, Michael Larivee, Rich Lethin, Peter Nuth, Scott Wills, Paul Carrick, and Greg Fyler. The J-Machine: A Fine-Grain Concurrent Computer. In Information Processing 89, Proceedings of the IFIP Congress, pages 1147--1153, August 1989.


Using Attributed Flow Graph Parsing to Recognize Programs - Wills (1994)   (6 citations)  Self-citation (Wills)   (Correct)

....other existing recognition system is a 300 line database program recognized by CPU [12]. All other systems work with toy programs on the order of tens of lines. We empirically and analytically studied the computational cost of GRASPR's parsing algorithm with respect to the simulator programs [4]. Since the algorithm is essentially constrained search, it is exponential in the worst case. However, in the practical application of graph parsing to recognizing complete instances of clichés, constraints are strong enough to prevent exponential behavior in practice. In particular, structural ....

W. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler. The J-Machine: A fine-grain concurrent computer. In Int. Fed. of Info. Processing Societies, 1989.


Do Faster Routers Imply Faster Communication? - Karamcheti, Chien (1994)   (6 citations)  Self-citation (Chien)   (Correct)

....transfers, and is implemented using CMAM xfer function which splits up the transfer into a sequence of hardware packets at the source, and CMAM handle left xfer function which reassembles the packets at the destination. 1 While this is not the most efficient type of network interface [13, 8, 4], it has the significant virtue that no changes to the processor are required. Many researchers believe that this type of interface is basically representative of future network interfaces. 2 The CM 5 NI also supports an interrupt driven interface for reception; however, the cost is very high ....

....exploring what impact advanced network features (adaptive routing, virtual channels) have on network interface complexity and software overhead. Our work addresses some of these issues. Research on network interfaces has focused primarily on reducing message injection (and reception) overhead [13, 8, 19, 4] or offloading the communication onto a coprocessor [14, 16, 3] Such efforts are complementary to our goal of software protocol overhead reduction. Improvements in network interface can reduce the basic communication cost in our studies. While reducing the basic cost is important, as can be seen ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler. The J-Machine: A fine-grain concurrent computer. In Information Processing 89, Proceedings of the IFIP Congress, pages 1147--1153, August 1989.


The Concert System -- Compiler and Runtime Support for.. - Andrew Chien Vijay (1993)   (11 citations)  Self-citation (Chien)   (Correct)

.... drawback of fine grained, concurrent object oriented languages to date has been their inefficiency (compared to their competitors such as parallel FORTRAN dialects) In addition, the most efficient implementations of such languages have relied on specialized hardware to achieve high performance [15, 36, 42]. The primary goal of the Concert project is to develop compiler and runtime techniques to make fine grained concurrent object oriented languages portable and efficient. By portable and efficient, we mean that the programs should run efficiently both on uniprocessors and on parallel computers ....

....basic thread scheduling, etc. is also discussed. Efficient concurrent object oriented language implementations must provide a global object namespace, communication services for remote method invocation, and support for scheduling method invocations. Though implementations on custom hardware [15, 36, 42] focus on providing a few general purpose primitives, runtime systems on stock hardware require a different approach. The hardware structure of such systems necessarily implies a hierarchy of costs for many basic runtime operations. These cost distinctions must be recognized and managed to obtain ....

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler. The J-Machine: A fine-grain concurrent computer. In Information Processing 89, Proceedings of the IFIP Congress, pages 1147--1153, August 1989.


Emulation of a Virtual Shared Memory Architecture - Raina (1993)   (3 citations)  (Correct)

No context found.

W. J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, M. Larivee, R. Lethin, P. Nuth, S. Wills, P. Carrick, and G. Fyler. The J-Machine: a Fine-Grain Concurrent Computer. In Information Processing '89, 1989.


Analyzing NIC Overheads in Network-Intensive Workloads - Binkert, Hsu, Saidi.. (2005)   (Correct)

No context found.

William J. Dally et al. The J-Machine: A fine-grain concurrent computer. In G. X. Ritter, editor, Information Processing 89, pages 1147--1153. Elsevier North-Holland, Inc., 1989.


The Performance Potential of an Integrated Network.. - Binkert, Dreslinski.. (2004)   (Correct)

No context found.

W. J. Dally et al. The J-Machine: A fine-grain concurrent computer. In G. X. Ritter, editor, Information Processing 89, pages 1147--1153. Elsevier North-Holland, Inc., 1989.


Analyzing NIC Overheads in Network-Intensive Workloads - Binkert, Hsu, Saidi.. (2004)   (Correct)

No context found.

William J. Dally et al. The J-Machine: A fine-grain concurrent computer. In G. X. Ritter, editor, Information Processing 89, pages 1147--1153. Elsevier North-Holland, Inc., 1989.


Distributed Paging for General Networks - Awerbuch, Bartal, Fiat (1996)   (36 citations)  (Correct)

No context found.

William J. Dally et al. The J-Machine: A fine-grain concurrent computer. In G.X. Ritter, editor, Proceedings of the IFIP Congress, pages 1147--1153. North-Holland, August 1989.


Issues In Software Support For Parallel I/O - Bordawekar (1993)   (Correct)

No context found.

W.J. Dally, A. Chien, S. Fiske, W. Horwat, J. Keen, and M. Larivee. The J Machine: A Fine Grain Concurrent Computer. Information Processing 89, Proceedings of the IFIP Conference, pages 1147--1153, August 1989.

CiteSeer.IST - Copyright Penn State and NEC