38 citations found.
M. Noakes, D. A. Wallach, and W. J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation." Proc. 20th Ann. Int'l Symp. Computer Architecture, New York: ACM Press, 1993, pp. 224-235.

This paper is cited in the following contexts:

High-Level Prototyping for the HTMT Petaflop Machine - Yerosheva (2001)

.... of machines, like IBM SP2[77]. Widely used programming models such as data parallelism (CM-5[22], Split-C[27]), shared memory (or data passing) (SGI/Cray SHMEM[84], University of Oxford BSP[83], Linda[35]), message passing (MPI[73]), multithreading in a shared memory environment (J-Machine[70], Alewife[76], Tera[26]), remote procedure calls (Solaris[75]), active messages (J-Machine[70], nCube 2[78], CM-5[22], Intel Paragon[28]) and others, provide the basis for design of new classes of execution and programming models on top of new massively parallel computer systems. 1.7 The thesis ....

.... (CM-5[22], Split-C[27]), shared memory (or data passing) (SGI/Cray SHMEM[84], University of Oxford BSP[83], Linda[35]), message passing (MPI[73]), multithreading in a shared memory environment (J-Machine[70], Alewife[76], Tera[26]), remote procedure calls (Solaris[75]), active messages (J-Machine[70], nCube 2[78], CM-5[22], Intel Paragon[28]) and others, provide the basis for design of new classes of execution and programming models on top of new massively parallel computer systems. 1.7 The thesis organization This thesis is organized as follows. Chapter 2 provides an overview of the HTMT ....

[Article contains additional citation context not shown here]

M. Noakes, D. Wallach, W. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," in Proc. of the 20th Intl. Symp. on Computer Architecture, May 1993.


Resource Deadlocks and Performance of Wormhole.. - Boppana, Chalasani.. (1998)   (3 citations)

....network. These are especially suited for wormhole switching in low dimension mesh and torus networks. Wormhole switching is a form of cut-through switching in which blocked messages are not buffered [7]. Many recent multicomputers and multiprocessors use this form of routing [1], [5], [19], [12]. The previous studies on wormhole multicast communication addressed and solved the .... R.V. Boppana is with the Division of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249-0667. E-mail: boppana@cs.utsa.edu. ....

M.D. Noakes et al., "The J-Machine Multicomputer: An Architectural Evaluation," Proc. 20th Ann. Int'l Symp. Computer Architecture, pp. 224--235, May 1993.


Integrating Memory and Network Accesses: A Flexible.. - Yunn-Yen Chen Chung-Ta

....communication coprocessors, and faster networks cannot solve the whole problem. Very often the bottleneck is the processor network interface. Several recent papers address this issue [1, 4, 7]. (This work is supported by National Science Council under grants NSC 83 0408 E 007 001 and NSC 83 0408 E 007 092.) Unfortunately, they are derived primarily from an architecture's point of view. We argue that although the proposed mechanisms are essential for efficient processor network interface, some frequently used high level operations such as gather and scatter, collective communication, etc. ....

....support variable length messages. Thus, we believe the processor network interface should support both fixed and variable length messages efficiently under maximum flexibility. Another important operation usually found in numerically intensive programs is reduction. Since the methods proposed in [1, 4, 7] can support it efficiently, we will not discuss the operation here. In this paper, we propose a processor network interface design, which tightens the processor, network, and memory closely together, meets the requirements listed above, and supports efficient execution of applications. One goal ....

[Article contains additional citation context not shown here]

M. D. Noakes, D. A. Wallach, and W. J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," in The 20th ISCA, pp. 224--235, May 1993.


Efficient Collective Communication on Multidimensional Meshes with .. - Watts (1994)

.... the Intel Paragon and the Cray T3D, which utilize two-dimensional non-torus and three-dimensional torus meshes, respectively [7,12]. They are also appropriate for research machines such as the Intel Touchstone Delta (two-dimensional non-torus) and the J and M Machines (three-dimensional non-torus) [9,13,14,15,17]. Many of these algorithms and techniques also apply to hypercubes (which can be viewed as multidimensional meshes with two nodes in each dimension), although that architecture is not specifically addressed. 4 Target Collective Communications The following collective communications routines are ....

....the channels occupied by a blocked packet to be used by other packets. This improvement decouples channel and flit buffer resources by providing multiple buffers for each channel. Virtual channel flow control is already used in architectures such as the Cray T3D as well as the J and M Machines [7,9,17]. When this modification becomes widely adopted, the direct send method should perform quite well, with a running time of: For very long vectors, the direct send method is at least a factor of two better than the DR all-to-all scatter. 14 Group Communication The methods presented in this paper ....

M. Noakes, D. Wallach and W. Dally. "The J-Machine Multicomputer: An Architectural Evaluation." Proceedings of the 20th International Symposium on Computer Architecture, May 1993.


The MIT Alewife Machine - Agarwal, Bianchini, Chaiken, et al. (1991)   (2 citations)

....at its destination, it typically causes an interrupt. The CMMU overlaps message arrival with interrupt processing by posting the interrupt as soon as it has received the header of a message. Since the operating system reserves one of the four Sparcle hardware contexts for message processing (as in [26] and [29]), no register saves or restores are necessary. The first 16 words of an incoming message are presented in the memory-mapped input packet array. Consequently, an interrupt handler may either load words directly from this array via the instruction, or initiate a DMA sequence to store the ....

....data directly in another processor's cache. The KSR1 and DDM [13] provide a shared address space through cache-only memory. These machines also allow prefetching. The Scalable Coherent Interface [6] also specifies mechanisms for implementing large shared address spaces. Both the J-Machine [26] and the CM-5 export hardware message passing interfaces directly to the user. These interfaces differ from the Alewife interface in several ways. First, in Alewife, messages are normally delivered via an interrupt and dispatched in software, while in the J-Machine, messages are queued and ....

M. Noakes, D. Wallach, and W. Dally, "The J-Machine multicomputer: An architectural evaluation," in Proc. 20th Annu. Int. Symp. Computer Architecture, May 1993, pp. 224--235.


Reducing Cost and Tolerating Defects in Page-based.. - Oskin, Keen..

....was proposed well before the current commodity thrust. The SWIM project [12] combined reconfigurable logic and memory to perform fast protocol computations. The J-Machine integrated processor, memory, and network router in a single chip to form building blocks for a fine-grained multiprocessor [13]. The RAW [14], MORPH [15], and RaPiD [16] projects continue to explore the use of reconfigurable technology to exploit parallelism. The RAW project, in particular, has also examined issues of processor width, dynamically trading off ILP and speculation [17]. The HPAM project [18] takes a ....

M. Noakes, D. Wallach, and W. Dally, "The J-Machine multicomputer: An architectural evaluation," in Proceedings of the 20th Annual International Symposium on Computer Architecture 1993, (San Diego, CA), pp. 224--235, ACM, May 1993.


Resource Placements in 2D Tori - Almohammad, Bose (1998)

....in terms of the average message latency. 1. Introduction The family of torus graphs, such as mesh and k-ary n-cube, is becoming a popular topology for the interconnection networks of high performance parallel computers. Many practical systems, including Ametek 2010 [20], the MIT J-Machine [14] (3D mesh), the Mosaic [19], and the Cray T3D/T3E [18] (3D torus), have been built based on this network topology. In these multicomputer systems, there may be a limited number of resources, such as I/O nodes and software packages, that each processor needs to access. These limited number of ....

M. Noakes, D. A. Wallach, and W. J. Dally. "The J-Machine Multicomputer: An Architectural Evaluation". In 20th International Symposium on Computer Architecture, pages 224--235, 1993.


Wormhole Routing Techniques for Directly Connected Multicomputer .. - Mohapatra (1998)   (9 citations)

....memory bandwidth, and processing capability of the system increases with the number of nodes. Examples of experimental and commercial systems based on direct interconnection network include Intel's iPSC, Touchstone Delta [37], and Paragon [38], Ncube-2/3 [53], Cray T3D [41, 64], MIT J-Machine [56], and Stanford DASH [47]. The nodes of a direct network based multicomputer communicate by passing messages through an interconnection network. Neighboring nodes send messages to one another directly while nodes that are not connected directly communicate with each other by passing messages ....

M. Noakes, D. A. Wallach and W. J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," Intl. Symposium on Computer Architecture, pp. 224-235, 1993.


Designing Large Hierarchical Multiprocessor Systems under.. - Debashis Basak (1994)   (4 citations)

....We treat this area as a square with a perimeter of 4√a and measure all board sizes in terms of a. Given a board of size (ja) we can accommodate up to j processors on it. Over the years board sizes have grown in physical dimensions. We now have contemporary prototypes like the MIT J-Machine [4] which has 64 processors on a board. The technology at a given time determines the largest board size (ma) which can fit at most m processors. For illustrative purposes we consider m to be 64 in this paper. The maximum board size puts an upper limit on the largest cluster size c in the system ....

....of all board areas in the system remains (Na). 3.2 Two pinout technologies Currently two different types of technologies are being employed by the computer industry: Surface pinouts: In this technology the surface of the board is utilized for external connections using elastomeric connectors [4]. Under this technology the pin count from a board of area A is O(A). Let us define parameter p1 to be the pin count that can be supported on a given board area a using surface pinout technology. This implies that a square board area of (ja) would have a surface pin count of (P = jp1). Peripheral ....

[Article contains additional citation context not shown here]

M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. In Proc. of the ISCA, pp. 224--235, 1993.


Compiling C for the EARTH Multithreaded Architecture - Hendren, Tang, Zhu, Gao.. (1996)   (2 citations)

....high performance parallel computing. By supporting many threads of control and fast switching among threads, multithreaded architectures can tolerate inherent communication and synchronization latencies by switching to a new ready thread of control whenever a long latency operation is encountered [1, 2, 3, 4, 5]. As long as there is enough parallelism in an application, a multithreaded architecture can hide these latencies and effectively utilize available communication bandwidth. For a review of the principle of multithreading and some representative multithreaded architecture projects, readers are ....

....is given in Figure 6. In this case, the parameter root in the function incr tree can be declared as a local pointer since it incr tree is always invoked on the processor owning the node pointed to by root. We will return to this example when discussing function invocations in Section 2.4. int a[4], b[4] MAXP; replicated int c[4] int p; int local q; int r; int local s; int local local t; a) global declarations (b) local pointer declarations Figure 5: Examples of declarations 2.4 Remote and Basic Functions In order to support parallel execution of programs, it is ....

[Article contains additional citation context not shown here]

M. D. Noakes, D. A. Wallach, and W. J. Dally, "The J-Machine multicomputer: An architectural evaluation," in Proc. of the 20th Ann. Intl. Symp. on Computer Architecture, (San Diego, Calif.), pp. 224--235, May 1993.


Polling Watchdog: Combining Polling and Interrupts for Efficient .. - Maquelin (1996)   (33 citations)

....found for the system to react to network events quickly and cost effectively. We believe that the multithreaded program execution model has great potential for future parallel systems due to its ability to tolerate the communication and synchronization latencies inherent in parallel architectures [1, 5, 7, 14, 17]. As long as there is enough parallelism in an application, a multithreaded architecture can hide these latencies by switching to another thread. However, this necessitates the smooth integration of asynchronous events into the computation [17]. Since multithreaded systems face the same ....

....on our experiments we believe that this mechanism can provide a single solution that works well for most programs, independently of their communication behavior. 6 Related Work The projects most relevant to our paper are the Remote Queues communication model [2], TAM/CM-5 [17, 20], the J-Machine [14], Alewife [1], StarT-NG [5], EM-X [12], and RWC-1 [15]. We believe that the asynchronous message reception mechanism proposed in this paper can also be applied to non-multithreaded architectures. The Remote Queues communication model [2] displays some striking similarities with the Polling ....

Michael D. Noakes, Deborah A. Wallach, and William J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," in Proc. of the 20th Ann. Intl. Symp. on Computer Architecture, San Diego, Calif., pp. 224--235, May 1993.


A Case for Intelligent RAM: IRAM - Patterson, Anderson, Cardwell.. (1997)   (48 citations)

....only about one-third of the die (Table 3), the upcoming gigabit DRAM has enough capacity that whole programs and data sets can fit on a single chip. In the past, so little memory could fit on-chip with the CPU that IRAMs were mainly considered as building blocks for multiprocessors [Bar78][Kog95][Noa93]. Third, DRAM dies have grown about 50% each generation; DRAMs are being made with more metal layers to accelerate the longer lines of these larger chips. Also, the high speed interface of synchronous DRAM will require fast transistors on the DRAM chip. These two DRAM trends should make logic on ....

Noakes, M.D.; Wallach, D.A.; Dally, W.J. "The J-Machine multicomputer: an architectural evaluation." 20th Annual International Symposium on Computer Architecture, San Diego, CA, USA, 16-19 May 1993, p. 224-35.


Performance Issues in the Design of Hierarchical-ring and Direct .. - Ravindran (1998)

....while path 2 is chosen by an adaptive routing algorithm. It is seen that adaptive routing can route around blocked nodes whenever possible, thus improving the throughput of the network significantly. .... determined by the source and destination node addresses alone. The Intel Paragon [6], MIT J-Machine [23, 64], and Cray T3D [80] all use deterministic routing. Adaptive routing, on the other hand, exploits the fact that there is more than one path between any source and destination node pair (in a multi-dimensional network) and bases its decision on which output link to forward a packet to on such ....

Noakes, Michael D., Deborah A. Wallach, and W. J. Dally, "The J-machine multicomputer: An architectural evaluation," Proc. Intl. Symp. on Computer Architecture, 1993.


Image Processing PCI-based Shared Memory Architecture Design - Houzet, Fatni

....space entirely in hardware. DASH [16] is a cache coherent multiprocessor that uses prefetching. The KSR1 [4] provides a shared address space through cache-only memory. The Scaleable Coherent Interface [17] also specifies mechanisms for implementing large, shared address space. Both the J-Machine [18] and CM-5 [19] export hardware message-passing interface directly to the user. In Alewife [20] messages are delivered via an interrupt and dispatched in software, while in the J-Machine, messages are queued and dispatched in sequence by the hardware. The J-Machine does not provide an atomic message ....

M. Noakes, D. Wallach, and W. Dally (May 1993) The J-Machine Multicomputer: An Architectural Evaluation. In The 20th Int. Symp. on Computer Architecture, pp. 224-235.


A Message-Driven Programming System for Fine-Grain Multicomputers - Maskit (1994)   (6 citations)

....as delivering hardware message passing performance to application programs; providing an inexpensive code distribution mechanism; and generating high quality code. The numbers for the hardware performance are derived from the results reported by the MIT Concurrent VLSI Architecture Project in [21]. The results reported here are based on a simple producer consumer code. This code spawns a producer process on one computer which sends 100,000 messages, each of which creates a consumer process on another computer. This program appears in two forms: one way communication as shown in Figure 5.1 ....

Noakes, M., Wallach, D. and Dally, W., "The J-Machine Multicomputer: An Architectural Evaluation," Proceedings of the 20th International Symposium on Computer Architecture, May 1993.


The Offset Cube: A Three-Dimensional Multicomputer Network.. - Stephen Lacy (1996)   (1 citation)

....This generally increases wire lengths and unnecessarily constrains the density and bandwidth of interboard connections. Backplanes also degrade system reliability by introducing additional levels in the packaging hierarchy, each with its own set of mechanical contacts. 3 The MIT J Machine [30] uses an improved 3D packaging technique in which printed circuit boards are stacked and interconnected with conducting elastomeric spacers. Distributing the vertical connectors over the area of each board provides high wiring flux across the system bisection (1420 electrical connections between ....

M. D. Noakes, D. A. Wallach, and W. J. Dally. "The J-Machine Multicomputer: An Architectural Evaluation," Proceedings of the 20th Annual International Symposium on Computer Architecture, pp. 224-235, 1993.


Hybrid Multithreaded Architecture with Symmetric Multiprocessors - Junghwan Kim

....Multithreaded architecture is characterized by hardware support for fast context switching and efficient synchronization. Multithreaded architectures are categorized into two groups according to their computational models: von Neumann model and hybrid model. HEP[2], Tera[3], MASA[4], J-Machine[5], and Alewife[6] are based on the former model. Iannucci's machine[7], P-RISC[8], *T[9], TAM[10], and DAVRID[11, 12] are based on the latter model. Von Neumann multithreaded architectures keep conventional model with additional hardware support, however, parallelism is restricted due to limited ....

M. D. Noakes, D. A. Wallach, and W. J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," In Proc. 20th Int'l Sympo. on Computer Architecture, 1993.


Techniques for Low-Overhead and No-Overhead Communication.. - Brian Grayson

....the processor from loading the data structure before the lock has been acquired, while allowing nondependent instructions to execute. Delayed response device registers are different from traditional full/empty bits (such as those used in dataflow computers[14], the fut and cfut of the J-Machine[12], or the Cray T3E's E-registers[16]) in several ways. First of all, DRDRs are used to detect conditions, such as whether or not a message has arrived, rather than the validity of a particular word of memory. This separation of the condition from the region of memory allows more flexibility, even ....

....rank 1 updates, with the appropriate row and column vectors being broadcast within each column and row as needed. 4.2 Architectural Evaluation Table 2 compares the communication overhead for the PINGN application under the various libraries. For comparison, the equivalent values for the J-Machine[12] are also included. Library Ts per message (in cycles) Tw per word (in cycles) MVP with polling 65 5 MVP with DRDRs 51 3 MVP with polling 20.5 4 MVP with DRDRs 23.5 1.9 J-Machine 11 2 Table 2: One-way Message Overhead for the PINGN Benchmark. Ts is the sum of the overhead for sending and ....

Michael D. Noakes, Deborah A. Wallach, and William J. Dally. "The J-Machine Multicomputer: An Architectural Evaluation," In Twentieth International Symposium on Computer Architecture (ISCA 1993), 1993.


The J-Machine: A Retrospective - Dally, Chang, Chien, Fiske, Horwat.. (1998)   (2 citations)  Self-citation (Noakes Wallach Dally)

No context found.

M. Noakes, D. Wallach, and W. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," ISCA-20, pp. 224-235, 1993.


Balancing Performance, Area, and Power in an On-Chip Network - Gold

No context found.

M. Noakes, D. A. Wallach, and W. J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation." Proc. 20th Ann. Int'l Symp. Computer Architecture, New York: ACM Press, 1993, pp. 224-235.


A Lightweight Idempotent Messaging Protocol for Faulty - Brown (2002)   (1 citation)

No context found.

Michael D. Noakes, Deborah A. Wallach, William J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation", Proc. ISCA 1993, pp. 224-235.


A Message-driven Programming System for Fine-grain Multicomputers - Maskit, Taylor (1994)   (6 citations)

No context found.

M. Noakes, D. Wallach and W. Dally, "The J-machine multicomputer: an architectural evaluation", Proc. 20th International Symposium on Computer Architecture, May 1993.


A Case for Intelligent RAM - Patterson, et al. (1997)   (39 citations)

No context found.

M.D. Noakes, D.A. Wallach, and W.J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," Proc. 20th Ann. Int'l Symp. Computer Architecture, IEEE CS Press, 1993, pp. 224-235.


New Multilevel Parallelism Management for Multimedia.. - Verians, Legat, Macq..

No context found.

M. D. Noakes, D. A. Wallach, W. J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," Proc. of the 20th Ann. Int'l Symp. on Computer Architecture, ACM, 1993, pp. 224-235.


Issues in the Design of Direct Multiprocessor Networks - Ravindran, Stumm (1997)

No context found.

Noakes, Michael D., Deborah A. Wallach, and W. J. Dally, "The J-machine multicomputer: An architectural evaluation," Proc. Intl. Symp. on Computer Architecture, 1993.

CiteSeer.IST - Copyright Penn State and NEC