80 citations found.
M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-Machine multicomputer: An architectural evaluation. In Computer Architecture News, pp. 224--235, May 1993.

This paper is cited in the following contexts:

Mate: A Tiny Virtual Machine for Sensor Networks - Levis, Culler (2002)   (33 citations)  (Correct)

.... [Figure 1: Mate Component Breakdown] [Figure 2: Mate Architecture and Execution Model: Capsules, Contexts, and Stacks] has strong similarities to Active Messages [33] and the J-Machine [26]. There are, of course, important differences: for example, instead of reliably routing to processors, it routes through an unreliable multihop wireless network. The tiny amount of RAM also forces motes to have a constrained storage model: Mate cannot buffer messages and tasks freely as the ....

Michael Noakes, Deborah Wallach, and William J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. In Proceedings of the 20th International Symposium on Computer Architecture, 1993.


Sparsely Faceted Arrays: A Mechanism Supporting Parallel.. - Brown (2002)   (2 citations)  (Correct)

....the shmem documentation. In fact, shared memory operations may only be carried out on symmetric data objects, i.e. there are no provisions at the library level for allocating referencing scalar objects on independent nodes. One notable departure from the conventional approach is the J machine [51] parallel computer. In the J machine, all references to objects, distributed or otherwise, are indirected through a segment table on each node; this style of addressing is similar to that used by early capability [14] architectures [39]. Using indirection tables allows the J machine to provide ....

....than there are nodes in the machine, the constituents are placed in such a manner as to provide an even distribution over the entire machine. The J machine suffers from the problem, common to early capability systems, that indirecting every memory access through a segment table is inefficient; [51] reports that in practice, an unacceptably large percentage of program time is spent engaged in translation. The M machine [15] multicomputer, a successor to the J machine, provides direct addressing. It supports a coarse grained mechanism for distributing resources over variable regions of the ....

Michael D. Noakes, Deborah A. Wallach, and William J. Dally. The J-machine multicomputer: An architectural evaluation. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 224--235, 1993.


ADAM: A Decentralized Parallel Computer Architecture Featuring.. - Huang (2002)   (Correct)

....within more conventional architectures. TAM [CSS 91], also referred to as Active Threads in [WGQH98], and its follow on, Active Messages [vCGS92], proposes an efficient mechanism for interprocessor communication using continuations. It significantly differentiates itself from the J Machine [NWD93], Monsoon, and T [PBB93], all message driven machines, by the fact that Active Messages is a purely software approach to achieving high performance. [vCGS92] claims that pure message driven hardware implementations are crippled by the limited number of registers available per hardware context, ....

....dispatch. Scheduler: The use of fine grained multithreading to hide latency has been seen before in the Tera MTA 95], HEP [Smi82a], M Machine [FKD 95], and T [PBB93], among others. The scheduling algorithm implemented in the simulator for this work is a derivative of that used in [NWD93] and takes after the general scheduling algorithm described in the introduction to this section. Threads are divided into two pools, a runnable pool and a stalled pool. The runnable pool is executed in a round robin fashion with a thread preemption timeout to guarantee some fairness. Threads ....
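
As a rough illustration of the two-pool scheme described in the excerpt above, the following Python sketch keeps a runnable pool that is served round robin with a preemption timeout, plus a separate stalled pool. The names (`Thread`, `Scheduler`, `quantum`, `wake`) and the use of abstract work units as the timeout are assumptions made for this sketch; they are not taken from [NWD93] or from the ADAM simulator.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    work: int              # remaining work units (hypothetical measure)
    stalled: bool = False  # True while waiting on a long-latency event


class Scheduler:
    def __init__(self, quantum: int):
        self.quantum = quantum   # preemption timeout, in work units
        self.runnable = deque()  # served in round-robin order
        self.stalled = []        # threads that cannot currently run

    def add(self, thread: Thread) -> None:
        # Place the thread in the pool that matches its state.
        if thread.stalled:
            self.stalled.append(thread)
        else:
            self.runnable.append(thread)

    def wake(self, thread: Thread) -> None:
        # Move a stalled thread to the back of the runnable pool.
        self.stalled.remove(thread)
        thread.stalled = False
        self.runnable.append(thread)

    def step(self):
        # Run the head of the runnable pool for at most one quantum.
        if not self.runnable:
            return None
        thread = self.runnable.popleft()
        thread.work -= min(self.quantum, thread.work)
        if thread.work > 0:
            # Timeout expired with work left: requeue at the back.
            self.runnable.append(thread)
        return thread.name


def run_trace(sched: Scheduler) -> list:
    # Step until the runnable pool is empty; record who ran each time.
    trace = []
    while (name := sched.step()) is not None:
        trace.append(name)
    return trace
```

In this sketch, fairness comes only from requeueing a preempted thread at the back of the runnable pool; how threads enter the stalled pool in the real simulator is not stated in the excerpt.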

M.D. Noakes, D.A. Wallach, and W.J. Dally. The J-Machine multicomputer: An architectural evaluation. In Proceedings of the 20th Annual Symposium on Computer Architecture, pages 224--235, 1993.


A Microserver View of HTMT: New Benchmarks and.. - Yerosheva, Kuntz.. (2000)   (Correct)

....the HTMT architecture or execution model. For example, the Tera machine [8] has similar underlying architecture and supports multithreading at each processing node, but it does not put memory and processor on the same chip, which completely changes its data and process organization. The J machine [6, 7] supports active messages and multithreading in a distributed environment and provides support for complex programming and execution models but does not consider multi level memory hierarchies in its architecture. The Beowulf [11] computer system distinguishes among different types of ....

M. Noakes, D. Wallach, J. Dally, "The J-machine Multicomputer: An Architectural Evaluation," MIT, IEEE, 1993.


A Microserver View of HTMT - Yerosheva, Kuntz, Brockman, Kogge (2001)   (Correct)

....and multithreading give additional levels of complexity. Many systems have similar features to the HTMT architecture or execution model. The Tera machine [8] has a similar underlying architecture and supports multithreading, but does not put the memory and processor on the same chip. The J machine [19, 9] has active messages (vs. parcels in HTMT) and multithreading in a distributed environment but does not consider the memory hierarchy. The Beowulf [20] supports multithreading, groups different types of processing nodes into clusters to ease load balancing, and has two networks ....

M. Noakes, D. Wallach, and J. Dally. The J-machine multicomputer: An architectural evaluation. 1993.


Simulation of the Clustered Torus - Wong (1997)   (Correct)

.... [Pea77] and direct binary n cubes [Sei84]. Particularly popular networks are currently the mesh and the torus network topologies, which have been used in a wide variety of experimental and commercial parallel processing systems, such as the Stanford Dash [LLG 92], the MIT J Machine [ND92, NWD93], the MIT M Machine [FKD 95], and the Cray T3D [KS93]. They have the following advantages over some others: a regular structure, which allows simple routing, and an efficient handling of local traffic. These two topologies have two drawbacks, namely: a large diameter and a large average ....

M.D. Noakes, D.A. Wallach, and W.J. Dally. The j-machine multicomputer: An architectural evaluation. In Proceedings of the 20th International Symposium on Computer Architecture, May 1993.


Efficient Multicast on Irregular Switch-based Networks with.. - Kesavan, Panda   (Correct)

....system) we will consider multicast for the remainder of this paper. However, it must be noted that all the developed algorithms and theories in this paper apply to broadcast as well. Current generation parallel systems like IBM SP2 [41], Intel Paragon [16], Cray T3E [35], nCube 3 [12], J Machine [28], and Stanford FLASH use the cut through switching technique due to its inherent advantages like low latency communication and reduced communication hardware overhead [27]. These systems provide very small buffer space at each hop, which results in links getting held up by blocked worms. Also, ....

M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. In International Symposium on Computer Architecture, pages 224--236, 1993.


A Partitioning-Independent Paradigm for Nested Data.. - Engelhardt, Wendelborn (1995)   (5 citations)  (Correct)

....they incorporate) may be crucial to the responsibility p i being permitted to pass the barrier. The simplest (and potentially least expensive) interleaving scheme is non preemptive multi threading, which has been proposed in recent literature in both the parallel hardware community [2, 15] and the parallel software community [6]. Such schemes have the benefit of lightweight context switching as well as providing significant opportunities for latency hiding. In the context of our model this means that data parallel operations whose elemental functionality includes an unbounded ....

M. Noakes, D. Wallach, and W. Dally. The J-Machine multicomputer: An architectural evaluation. Computer Architecture News, 21(2):224--235, May 1993.


An In-Depth Analysis of the Communication Costs of.. - Eric Schwabe Valerie (1995)   (Correct)

....Section 6. The analyses presented in Sections 4 and 5 are in terms of such regular arrays of elements, where N is the total number of elements in the array. Hence for two dimensional domains, we consider p N ....

[Figure 1: Decomposition of a regular two dimensional FEM domain.]

Table 1: Communication parameters for some parallel machines.

Machine                      α (secs)   β (secs/byte)   α/β
MIT J Machine [13]           0.9        0.04            22.50
IBM SP 1 [19]                3          0.0258          116.28
Workstations on Ethernet     1000       4               250.00
Intel Delta                  100        0.2             500.00
Thinking Machines CM 5 [15]  88         0.126           698.41
Intel Paragon                269        0.11            2445.45
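
The last column of Table 1 in the excerpt above is the ratio of the two preceding columns. The short Python check below recomputes it from the listed values, assuming those two columns are a per-message startup cost and a per-byte cost (named `alpha` and `beta` here; the dictionary and function names are hypothetical).

```python
# Values copied from Table 1 of the excerpt: (alpha, beta) per machine.
# The interpretation as startup cost and per-byte cost is an assumption.
PARAMS = {
    "MIT J Machine": (0.9, 0.04),
    "IBM SP 1": (3, 0.0258),
    "Workstations on Ethernet": (1000, 4),
    "Intel Delta": (100, 0.2),
    "Thinking Machines CM 5": (88, 0.126),
    "Intel Paragon": (269, 0.11),
}


def ratio(alpha: float, beta: float) -> float:
    # Ratio of startup cost to per-byte cost, rounded as in the table.
    return round(alpha / beta, 2)


def all_ratios() -> dict:
    # Recompute the last column of the table for every machine.
    return {name: ratio(a, b) for name, (a, b) in PARAMS.items()}


if __name__ == "__main__":
    for name, value in all_ratios().items():
        print(f"{name}: {value:.2f}")
```

Under the common linear cost model (an assumption, not stated in the excerpt), this ratio is the message size in bytes at which the per-byte cost equals the startup cost, so a smaller ratio favors fine grained communication.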

M. D. Noakes, D. Wallach, and W. J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. Proceedings of the IEEE 20th International Symposium on Computer Architecture (1993).


Designing Clustered Multiprocessor Systems under Packaging.. - Debashis Basak And (1996)   (6 citations)  (Correct)

....be arbitrarily large in size. The physical size of a board is restricted by electrical, mechanical, and board fabrication constraints. In terms of physical dimensions, board sizes being used in recent multiprocessor systems vary from 6 × 4 to an aggressive 26 × 21 used in the J machine [17]. The largest board size available to a system designer usually varies with technology and over a period of time. In this study we therefore do not present guidelines restricted to a particular largest board size. The available maximum board size is treated as a parameter in the framework. ....

....of length p b units, can support a total of P b = p p p b pins. b) Surface pinout: This is representative of a more aggressive pinout technology. The surface of the board is utilized for external connections. Representative examples are electronic interconnections using elastomeric connectors [17] and optical interconnections [16, 24] Let p s denote the surface pinout density, the pin count that can be supported from a board of unit capacity. Assuming a linear relation between board area and surface pinout, the pin count supportable from a board of capacity b is P b = bp s . Figure 5 ....

[Article contains additional citation context not shown here]

M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. In Proc. of the Int. Symposium on Computer Architecture, pages 224--235, 1993.


Automatically Partitioning Threads Based on Remote Paths - Tang, Gao (1998)   (Correct)

....performance of high end computing systems. By supporting many threads of control and fast switching among threads, multithreaded architectures can tolerate inherent communication and synchronization latencies by switching to the next ready thread whenever a long latency operation is encountered [3, 23, 11, 2, 8, 18, 16, 26, 37, 24]. As long as there is enough parallelism in an application, a multithreaded architecture can hide these latencies and utilize available communication bandwidth effectively. Since the amount of parallelism exposed to the fine grain multithreaded architectures may be enormous, it is a demanding task ....

M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-Machine multicomputer: An architectural evaluation. In Proc. of ISCA-20, pages 224--235, San Diego, Calif., May 1993.


A Refinement of the HTMT Program Execution Model - Gao, Amaral, al. (1998)   (1 citation)  (Correct)

....in the cryogenic area. In related studies the Delaware group has explored ways to achieve high levels of parallelism at the instruction level without incurring a great penalty in the real estate required for control flow and synchronization mechanisms in the hardware implementation of the machine [14, 15]. The Superstrand Architecture introduces the notion of a strand as a block of instructions grouped together to become a scheduling quantum of execution. The first experiments with this architecture indicate that programs can be efficiently partitioned into strands to be executed under a ....

Michael D. Noakes, Deborah A. Wallach, and William J. Dally. The J-Machine multicomputer: An architectural evaluation. In Proceedings of the 20th Annual International Symposium on Computer Architecture [2], pages 224--235. Computer Architecture News, 21(2), May 1993.


Logging and Recovery in a Highly Concurrent Database - Keen (1994)   (Correct)

....is nonvolatile. Any information written to disk will remain on disk despite a system failure. Distributed Memory Multiprocessor System. The data structures and algorithms presented in this thesis are intended for a fine grain distributed memory multiprocessor system, such as the MIT J Machine [11, 12, 48], in which each processor can directly address only its own local memory and all interprocessor communication must occur via explicit message passing. Nevertheless, the techniques presented in this thesis could be adapted to a shared memory multiprocessor system with little effort. Disks are ....

....typically only 10 to 20 Bytes in length. Low overhead interprocessor communication is therefore particularly important. For best performance, parallel XEL should be implemented on a fine grain concurrent computer that provides low overhead, low latency communication primitives. The MIT J Machine [11, 12, 48] is an existing example of such a machine. XEL will perform satisfactorily on other concurrent systems in which the overhead for interprocessor communication and synchronization is higher as long as the added delays are still relatively short compared to the delays for writing blocks to disk, the ....

Michael D. Noakes, Deborah A. Wallach, and William J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. In Proceedings of the 20th International Symposium on Computer Architecture, pages 224--235, San Diego, California, May 1993. IEEE.


Active Page Architectures for Media Processing - Justin Hensley Mark (1999)   (1 citation)  (Correct)

....was proposed well before the current commodity thrust. The SWIM project [ACK94] combined reconfigurable logic and memory to perform fast protocol computations. The J Machine integrated processor, memory, and network router in a single chip to form building blocks for a fine grained multiprocessor [NWD93]. The RAW [L 98], MORPH [CG96], and RaPiD [E 97] projects continue to explore the use of reconfigurable technology to exploit parallelism. The HPAM [MEFT96] and FlexRAM [KHY 99] projects take a hierarchical approach to intelligent memory, adding a processor on chip to perform ....

M. Noakes, D. Wallach, and W. Dally. The J-Machine multicomputer: An architectural evaluation. In Proceedings of the 20th Annual International Symposium on Computer Architecture 1993, pages 224--235, San Diego, CA, May 1993. ACM.


Randomized, Oblivious, Minimal Routing Algorithms for.. - Nesson (1995)   (Correct)

....a single address space) but the underlying physical architecture is based on message passing. Many academic research projects have developed distributed memory multicomputers. These include systems from Caltech (Cosmic Cube [112], Mosaic [113]), CMU (iWarp [15]), MIT (Alewife [2], J Machine [93]), and Stanford University (DASH [76], FLASH [69]), among others. Commercially, some of the major vendors include Cray Research (T3D [94]), IBM (SP 1 [120], SP 2 [118]), Intel (iPSC 860, Paragon XP S [51]), Kendall Square Research (KSR 1 [108]), nCUBE (nCUBE 1, nCUBE 2 [86]), and Thinking ....

....multiple nodes. Recently, wormhole routing has generated a great deal of interest in both the research and commercial arenas, because it pipelines messages and requires little buffer space at the routing nodes [92]. Wormhole routing has been used in the Caltech Mosaic [113] and the MIT J Machine [93]. It has also been used in the Cray T3D [94] and in Myrinet [10], a new LAN technology. Figure 2.2 illustrates the main difference between store and forward routing and the other two models: pipelining. Table 2.1 summarizes some of the advantages and disadvantages of each model. No model is ....

[Article contains additional citation context not shown here]

M. Noakes, D. Wallach, and W. Dally, The J-Machine Multicomputer: An Architectural Evaluation, in Proc. of the 20th International Symp. on Computer Architecture, IEEE, May 1993. WWW URL is http://cag-www.lcs.mit.edu.


Design Choices in the SHRIMP System: An Empirical Study - Blumrich, Alpert, Chen.. (1998)   (12 citations)  (Correct)

....based on a working 16 node SHRIMP system. In that sense, this paper can be categorized along with previous design evaluations of research machines such as the DASH multiprocessor [32], the Illinois Cedar machine [31], the MIT Alewife multiprocessor [1], and the J machine multicomputer [37]. SHRIMP has leveraged commodity components to a much greater degree than J machine, Cedar, Alewife or even DASH, thus this paper focuses primarily on evaluating its custom hardware support for communication. In terms of networking fabric, the Intel Paragon backplane used in SHRIMP is admittedly ....

.... also supports user level message passing, but places more burden on application programs by requiring them to construct their own message headers [15]. Some previous machines have worked to streamline the hardware software interface by mapping network interface FIFOs into processor registers [14, 24, 37]. Such approaches go against SHRIMP's goal of using commodity CPUs. A slightly less integrated approach, mapping FIFOs to memory rather than registers, was employed in the CM 5 [42]. CM 5 implementation restrictions limited the degree of multiprogramming, however, and applications were still ....

Michael D. Noakes, Deborah A. Wallach, and William J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. In Proceedings of the 20th Annual Symposium on Computer Architecture, pages 224--235, May 1993.


The M-Machine Multicomputer - Fillo, Keckler, Dally, Carter.. (1995)   (22 citations)  Self-citation (Dally)   (Correct)

....prepending the destination and DIP to the message body and injects it into the network. Two message priorities are provided: user messages are sent at priority zero, while priority 1 is reserved for system level message reply, thus avoiding deadlock. Message Address Translation: As described in [25], the explicit management of processor identifiers by application programs is cumbersome and slow. To eliminate this overhead, the MAP implements a Global Translation Lookaside Buffer (GTLB) backed by a software Global Destination Table (GDT) to hold mappings of virtual address regions to node ....

NOAKES, M.D., WALLACH, D. A., AND DALLY, W. J. The J-Machine multicomputer: An architectural evaluation. In Proceedings of the 20th International Symposium on Computer Architecture (San Diego, California, May 1993), IEEE, pp. 224-235.


Lazy Threads: Implementing a Fast Parallel Call - Goldstein (1996)   (8 citations)  (Correct)

No context found.

M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-Machine multicomputer: An architectural evaluation. In Computer Architecture News, pp. 224--235, May 1993.


A Low Cost, Multithreaded Processing-in-Memory System - Brockman, Thoziyoor, Kuntz, .. (2004)   (1 citation)  (Correct)

No context found.

M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-machine multicomputer: An architectural evaluation. In Lubomir Bic, editor, Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 224--236, San Diego, CA, May 1993.


Evaluation of the Raw Microprocessor: An.. - Taylor, Lee.. (2004)   (6 citations)  (Correct)

No context found.

M. Noakes, et al. The J-Machine Multicomputer: An Architectural Evaluation. 1993 ISCA, pp. 224--235.


Can Scatter Communication Benefit from Multidestination.. - Banikazemi, Panda   (Correct)

No context found.

M. D. Noakes, D. A. Wallach, and W. J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. In International Symposium on Computer Architecture, pages 224--236, 1993.


Coping with Very High Latencies in Petaflop Computer.. - Ryan, Amaral, Gao.. (1998)   (Correct)

No context found.

Michael D. Noakes, Deborah A. Wallach, and William J. Dally. The J-Machine multicomputer: An architectural evaluation. In Proceedings of the 20th Annual International Symposium on Computer Architecture [2], pages 224--235. Computer Architecture News, 21(2), May 1993.


Randomized Algorithms on the Mesh - Narayanan (1998)   (1 citation)  (Correct)

No context found.

M. Noakes, D. Wallach, and W. Dally. The J-machine multicomputer: an architectural evaluation. In International Symposium on Computer Architecture, 1993.

CiteSeer.IST - Copyright Penn State and NEC