69 citations found.
Leiserson, C. E. et al. (1992) The network architecture of the connection machine CM-5. In Proc. 4th Ann. ACM Symp. on Parallel Algorithms and Architectures, San Diego, CA, June 29--July 1, pp. 272--285. IEEE Computer Society Press, Los Alamitos, CA.

This paper is cited in the following contexts:

Theory and Practice in Parallel Job Scheduling - Feitelson, Rudolph.. (1994)   (60 citations)

....evidence exists. Theory suggests that preemption be used to ensure good response times for small jobs [64], especially since workloads have a high variability in computational requirements [21]. This comes close on the heels of actual systems that implement gang scheduling for just this reason [46,32,27,20]. Actually two metrics may be used to gauge the responsiveness of a system: the actual response time (or turnaround time, i.e. the time from submittal to termination) or the slowdown (the ratio of the response time on a loaded system to the response time on a dedicated system). Using actual ....

C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong-Chan, S-W. Yang, and R. Zak, "The network architecture of the Connection Machine CM-5". J. Parallel & Distributed Comput. 33(2), pp. 145--158, Mar 1996.
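
The two responsiveness metrics named in the excerpt above can be written out as a minimal sketch. The function names and the sample numbers are illustrative assumptions, not taken from the cited paper.

```python
def response_time(submit_time: float, end_time: float) -> float:
    """Turnaround time: time from submittal to termination."""
    return end_time - submit_time


def slowdown(loaded_response: float, dedicated_response: float) -> float:
    """Ratio of the response time on a loaded system to that on a dedicated system."""
    if dedicated_response <= 0:
        raise ValueError("dedicated response time must be positive")
    return loaded_response / dedicated_response


# Hypothetical job: submitted at t=10 s, finished at t=70 s on a loaded system,
# and known to need 20 s on a dedicated system.
loaded = response_time(10.0, 70.0)
print(loaded, slowdown(loaded, 20.0))
```

A short job and a long job with the same absolute delay have very different slowdowns, which is why the excerpt treats the two metrics separately.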


Compressionless Routing: A Framework for Adaptive and.. - Kim, Liu, Chien (1996)   (22 citations)

....result, some nodes are closer than others. In addition to use in multicomputers, direct networks are gaining acceptance in shared memory machines such as the MIT Alewife [13], Stanford DASH [14], and Tera Computer's TERA machine [10]. Some recent parallel machines such as the Thinking Machines CM5 [15], Meiko CS-2 [16], and Kendall Square Research KSR1 [17] use indirect networks in which computing nodes are separated from networks. In contrast to previous multistage interconnection networks, their hierarchical topologies allow the exploitation of locality. Though these networks are also ....

C. Leiserson, Z. Abuhamdeh, D. Douglas, C. Feynman, M. Ganmukhi, J. Hill, W. Hillis, B. Kuszmaul, M. Pierre, D. Wells, M. Wong, S. Yang, and R. Zak, "The network architecture of the Connection Machine CM-5," in Proceedings of the Symposium on Parallel Algorithms and Architectures, 1992. Available from ftp://cmns.think.com/doc/Papers/net.ps.Z.


Image Feature Extraction on Connection Machine CM-5 - Viktor Prasanna And (1994)   (1 citation)

....and a diagnostic network. The data network provides point-to-point data communication between any two PNs. Communication can be performed concurrently between pairs of PNs and in both directions. The data network is a 4-ary fat tree [8]. The bandwidth continues to scale linearly up to 16,384 PNs [9]. The control network provides cooperative operations, including broadcast, synchronization, and scans (parallel prefix and suffix). The control network is a complete binary tree with all the PNs as leaves. For our analysis, we will model the CM-5 as a set of high performance SISD machines ....

C. Leiserson et.al., "The Network Architecture of the Connection Machine CM-5", Technical Report, Thinking Machines Corporation, 1992.


User-Level Communication in a System with Gang Scheduling - Etsion, Feitelson (2001)   (1 citation)

....stored on the NIC, and discarding the packet if it does not fit. It is assumed that higher level software (e.g. MPI or TCP) will handle the retransmission needed to compensate for such lost packets. Flushing the network as part of a context switch was pioneered by the CM-5 Connection Machine [9]. This implementation has the distinction of flushing messages that are in transit, and storing them on any node in the partition. When the job is rescheduled, these messages are re-injected into the network to complete their trip. Flushing is also used in the SCore-D cluster, which uses the PM ....

C. E. Leiserson et al., "The network architecture of the Connection Machine CM-5". J. Parallel & Distributed Comput. 33(2), pp. 145--158, Mar 1996.


Network Performance under Physical Constraints - Petrini, Vanneschi (1997)

....Introduction. Fat trees and low dimensional cubes are emerging standards in the design of interconnection networks for parallel machines. Fat trees have been adopted by many research prototypes and commercial machines [1]. The data network of the Connection Machine CM-5 uses two distinct fat trees [2] and is composed of routing chips that have either two or four parent connections. The Data Diffusion Machine (DDM) is a virtual shared memory architecture that implements a hierarchical COMA cache coherence protocol in the internal switches of a fat tree [3]. The communication chip Elite is the ....

C. E. Leiserson et al., "The Network Architecture of the Connection Machine CM-5," in Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 272--285, June 1992.


Efficient Personalized Communication on Wormhole Networks - Petrini, Vanneschi (1997)

....a given switch in a full fat tree may have more choices in the routing decision than in a corresponding network with fixed arity switches. Fat trees have been adopted by many research prototypes and commercial machines. The data network of the Connection Machine CM-5 uses two distinct fat trees [26]. The network is composed of routing chips that have either 2 or 4 parent connections. The hierarchical nature of the fat tree is exploited to partition the CM-5 in dedicated subnetworks whose communication traffics do not interfere between them. The Data Diffusion Machine (DDM) is a virtual ....

C. E. Leiserson et al., "The Network Architecture of the Connection Machine CM-5," in Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 272--285, June 1992.


Evaluation of Design Choices for Gang Scheduling using.. - Feitelson, Rudolph (1996)   (16 citations)

.... In the early 80s, John Ousterhout proposed the thesis that related threads should be scheduled to execute together [32]. This idea, now known as gang scheduling, is becoming increasingly popular, and can be found in various forms on commercial machines such as the CM-5 from Thinking Machines [29], the Intel Paragon [25], the SGI multiprocessors running IRIX [1], the Meiko CS-2 [14], the Alliant FX/8 [41], and the MasPar and DAP SIMD arrays. Gang scheduling has also been used in a production system on a BBN Butterfly at LLNL [20], which is now being ported to a new Cray T3D machine, and ....

....gradually as load increases, and fosters support for interactive response times. And the fact that interacting processes are guaranteed to execute simultaneously allows them to access hardware communication devices in user mode, without the overheads associated with operating system protection [29, 39, 23]. A distributed hierarchical control (DHC) scheme for supporting gang scheduling has been proposed previously [16]. DHC defines a control structure over the parallel machine and combines time slicing with a buddy system partitioning scheme. Given the DHC framework, this paper investigates several ....

C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong, S-W. Yang, and R. Zak, "The network architecture of the Connection Machine CM-5". In 4th Symp. Parallel Algorithms & Architectures, pp. 272--285, Jun 1992.


Leaf Communications in Trees and Fat Trees - Vassilios Dimakopoulos Nikitas

....while multinode broadcasting arises naturally in iterative algorithms [1]. Scattering and gathering are considered dual operations. An algorithm for one of the problems can be transformed to an algorithm for the other by simply reversing the data paths. With the advent of Thinking Machines CM-5 [7], fat tree networks have received increased attention. Fat trees are hierarchical networks built around complete trees but have processing nodes only at the leaves. The capacities of the branches in the fat tree may not remain constant in all levels of the tree, but rather increase in an ....

C. E. Leiserson et al, "The network architecture of the Connection Machine CM-5," in Proc. 4th ACM Symp. Parall. Algor. Arch., June 1992, pp. 272--285.


Optimal Software Multicast in Wormhole-Routed Multistage Networks - Xu, Gui, Ni (1997)   (15 citations)

.... one-port communication architecture. This assumption, which is consistent with many existing multistage network systems, implies that the local processor must transmit (receive) message in sequential. Commercial SPCs using multistage cube networks with turnaround routing include the TMC CM-5 [10], Meiko CS-2 [11], and IBM SP-1 [12]. In the CM-5, the first two level stages use 4 × 2 switches, yielding a dual-port communication architecture. Communication latency consists of three component values: start-up latency, network latency, and blocking time [13]. The start-up ....

C. E. Leiserson et al., "The network architecture of the Connection Machine CM-5," in Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, (San Diego, CA.), pp. 272--285, Association for Computing Machinery, 1992.


Parallel Sparse Triangular Solution with Partitioned.. - Chong, Schreiber (1994)   (2 citations)

....processors, but is otherwise identical to a CM5. The sequential performance of the Cypress on our sparse floating point computation is roughly one MFLOP. The sequential performance of the Viking is roughly a factor of two better than the Cypress. Both machines have a fat tree based network [10] that provides fairly uniform communications delays and sustainable bandwidth. It also has a control network for global operations. To keep our study architecturally general, we do not use the vector units, nor do we use the control network. We executed our DAGs in a data driven fashion using ....

C. E. Leiserson et al., "The network architecture of the connection machine CM-5," in Symposium on Parallel Architectures and Algorithms, (San Diego, California), pp. 272--285, ACM, June 1992.


Communication-Efficient and Memory-Bounded External.. - Lee, Ranka, Shankar

....send a message from one processor to another is modeled as m, where m is the size of the message. For our complexity analysis we assume that and are constant, independent of the link congestion and distance between two nodes. With new techniques such as wormhole routing and randomized routing [DaS87, KRG94, Lei92, NiM93], the distance between communicating processors seems to be less of a determining factor on the amount of time needed to complete the communication. Further, the effect of link contention (due to several messages traversing common links along their routes) is limited due to the presence of virtual ....

C. Leiserson, et al., "The Network Architecture of the Connection Machine CM-5," Proc. 4th Annual ACM Symposium on Parallel Algorithms and Architectures, 1992.


Low Level Vision Processing on Connection Machine CM-5 - Viktor Prasanna Ashfaq (1993)   (2 citations)

....network. The data network provides high performance point-to-point data communications between the components of the system. The control network provides cooperative operations, including broadcast, synchronization, and scans (parallel prefix and suffix). Additional details can be found in [10]. For our analysis, we will model the machine as a set of high performance SISD machines interacting through the data and control networks. In one unit of time, a PN can read or send a packet (fixed number of bytes) of information from the data network or perform an arithmetic logic operation on ....

....the number of processing nodes. The details are as follows: Data Network. The data network is a 4-ary fat tree [9]. The network is composed of router chips. Each router chip has 4 child connections and either 2 or 4 parent connections. The bandwidth continues to scale linearly up to 16,384 PNs [10]. To route a message from one processor to another, the message is sent up the tree to the least common ancestor of the two processors, and then down to the destination node, so requiring no bandwidth higher in the tree. We assume that in one unit of time, a node in the tree can send a packet of ....

[Article contains additional citation context not shown here]

C. E. Leiserson et.al., "The Network Architecture of the Connection Machine CM-5", Technical Report, Thinking Machines Corporation, 1992.
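
The routing rule quoted in the excerpt above (up to the least common ancestor, then down) can be sketched as follows. This is a simplified model, not the CM-5 router logic: it assumes leaves are numbered consecutively so that the four children of one router share the same value of `index // 4`, it ignores the choice among multiple parent links, and the function names are illustrative.

```python
def lca_level(src: int, dst: int, arity: int = 4) -> int:
    """Number of levels a message climbs before src and dst share an ancestor."""
    level = 0
    while src != dst:
        # Move both leaves to their parent router in a complete arity-ary tree.
        src //= arity
        dst //= arity
        level += 1
    return level


def route_hops(src: int, dst: int, arity: int = 4) -> int:
    """Total links traversed: up to the least common ancestor, then down."""
    return 2 * lca_level(src, dst, arity)


# Leaves 0 and 3 share a first-level router; leaves 0 and 15 meet one level higher.
print(lca_level(0, 3), lca_level(0, 15), route_hops(0, 15))
```

Because the climb stops at the least common ancestor, traffic between nearby leaves never touches links higher in the tree, which is the locality property the excerpt describes.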


Design and Performance Evaluation of. . . - Oi (2000)

....the design and the choice of interconnection networks has a significant impact on the performance of a DSM multiprocessor. The interconnection networks used for multiprocessors include the hierarchical bus (Data Diffusion Machine [29]), the 2-D mesh (Stanford FLASH [35]), the fat tree (TMC CM-5 [39]), and the multistage interconnection network (MIN) (NYU Ultracomputer [21]). The ring networks have the advantages of (i) fixed node degree (modular expandability), (ii) simple network interface structure (fast operation speed), and (iii) low wiring complexity (fast transmission speed). On the ....

....process than others. Therefore, it is beneficial to place such data objects closer to the processor than other processors for reducing the average access latency, which is impossible for the NYU Ultracomputer. A fat tree interconnection network was used in Thinking Machines Corporation's CM-5 [39]. An advantage of the tree network is to be able to localize the network traffic between nodes within a subtree. However, a link in a higher level in the tree hierarchy is shared by many nodes, and hence its bandwidth can be a performance bottleneck. In a fat tree network, the higher the level in ....

C. Leiserson et al., "Network architecture of the Connection Machine CM-5," in Proceedings of 4th Annual ACM Symposium on Parallel Algorithms and Architectures, 272--285, June--July 1992.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

....Butterfly [503], the NYU Ultracomputer [244, 243], the IBM RP3 [458], PASM [535, 534], TRAC [83, 372, chap. 7], and Cedar (where clusters are connected to the network rather than individual PEs) [222, 319]. In distributed memory machines, all ports are connected to PEs. Examples include the CM-5 [356], the Meiko CS-2, and the IBM SP2 [553]. Multistage networks are generally considered to be a realistic alternative to crossbar switches, which have unit delay but n^2 components. They are arranged as log n stages of 2 × 2 switching elements (hence the logarithmic delay) with n/2 such ....

....machines that use partitionable multistage networks. The Connection Machine CM-5 is based on a network that is logically seen as a fat tree (a tree in which links near the root are thicker and provide more bandwidth so as to prevent congestion) but actually implemented as a multistage network [356]. Unlike PASM or TRAC, the CM-5 is a distributed memory machine. The machine is partitioned among competing jobs by partitioning the network. Each partition includes 2^k PEs for some k ≥ 5 (i.e. the minimal partition size is 32) with the adjoining k/2 stages of the network. Thus there is ....

[Article contains additional citation context not shown here]

C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong-Chan, S-W. Yang, and R. Zak, "The network architecture of the Connection Machine CM-5". J. Parallel & Distributed Comput. 33(2), pp. 145--158, Mar 1996.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

....(BBN Butterfly) | independent PEs with local queues within partitions | 4.1 | [352]
CM-2 from Thinking Machines | partitioning into quadrants, possible gang scheduling | 3.2.1, 5.3 | [582, 299]
CM-5 from Thinking Machines | gang scheduling within partitions created by buddy system | 5.3, 7.2.4 | [355]
Concentrix (Alliant FX/8) | gang scheduling across whole machine | 5.3 | [567]
Convex C2, C4 | hardware self scheduling | 3.5 | [578, sect. 3.4]
Cosmic Cube | local queues | 4.1 | [513]
Cray microtasking | self scheduling from a global queue | 4.2 | [220]
Cray T3D | partitions of power of two PEs | 3.2 | [307]
Cray ....

C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong, S-W. Yang, and R. Zak, "The network architecture of the Connection Machine CM-5". In 4th Symp. Parallel Algorithms & Architectures, pp. 272--285, Jun 1992.


Decoupled Pre-Fetching for Distributed Shared Memory - Watson, Rawsthorne (1995)   (4 citations)

....A parallel machine would comprise a collection of such units interconnected by a high speed network. Although the details of network structure are important to a physical implementation, we do not wish to detail them here. Network structures such as that developed for the Thinking Machines CM5 [16] would certainly be applicable. The techniques described are specifically concerned with the problems associated with pre-fetching. A complete system based on these ideas would, of course, need to address issues of distributed memory coherence and consider how the pre-fetching scheme would ....

C.E. Leiserson et al. "The Network Architecture of the Connection Machine CM5", Proceedings of the Fifth ACM Symposium on Parallel Algorithms and Architectures, July 1992.


Performance Evaluation of Switch-Based Wormhole Networks - Lionel Ni Fellow (1995)   (8 citations)

....networks, which are not shown in the figure. Fig. 6. An eight node bidirectional butterfly MIN. The concept of bidirectional switches was studied in [25]. There are many commercial SPCs using BMINs with wormhole switching and turnaround routing including the TMC CM-5 [26], Meiko CS-2 (k = 4) [27], and IBM SP-1/2 (k = 4) [28], [29]. In the CM-5, the first two level stages use 4 × 2 switches, yielding a dual-port communication architecture. Although BMINs have been used in many commercial machines, to the best of our knowledge, the routing property of BMINs has not been formally described. ....

C.E. Leiserson et al., "The Network Architecture of the Connection Machine CM-5," Proc. ACM Symp. Parallel Algorithms and Architectures, pp. 272-285, San Diego, 1992.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

....machines that use partitionable multistage networks. The Connection Machine CM-5 is based on a network that is logically seen as a fat tree (a tree in which links near the root are thicker and provide more bandwidth so as to prevent congestion) but actually implemented as a multistage network [214]. Unlike PASM or TRAC, the CM-5 is a distributed memory machine. The machine is partitioned among competing jobs by partitioning the network. Each partition includes 2^k PEs for some k ≥ 5 (i.e. the minimal partition size is 32) with the adjoining k/2 stages of the network. Thus there is ....

....eliminates any need for privileged support, and opens the door to efficient user level communication. Security is provided by mapping the communication devices into user space, and using existing hardware protection mechanisms. This approach is used in the K2 [331], the Connection Machine CM-5 [214], and the RWC-1 [168]. 5.3 Gang Scheduling within Predefined Partitions. The simple approach is to first partition the machine into sets of disjoint PEs, and then perform gang scheduling within each partition independently of the others. Actually, partitioning is not strictly necessary, as it ....

[Article contains additional citation context not shown here]

C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong-Chan, S-W. Yang, and R. Zak, "The network architecture of the Connection Machine CM-5". J. Parallel & Distributed Comput. 33(2), pp. 145--158, Mar 1996.


Scheduling Computationally Intensive Data Parallel Programs - Raghu Subramanian Systems

....called the front end PE; all the remaining PEs are called back end PEs. (Footnote: As usual, 1K = 2^10, 1M = 2^20, and 1G = 2^30.) The various PEs are connected by three different networks: the router, the tree and the HiPPI (which stands for High Performance Parallel Interface). Tree network [15] helps executing the prefix, broadcast and reduce statements; for each of these statements, typically it takes roughly 250ns. The Router network [15] helps the execution of communication statements; each takes roughly 5s. Finally, the HiPPI network [10] helps the execution of I/O statements; each takes typically roughly 25ms. The above three networks are responsible for executing all statements except elementwise and indirect memory access ....

[Article contains additional citation context not shown here]

C. Leiserson et al. "The Network Architecture of the Connection Machine CM-5". In Symposium on Parallel Architectures and Algorithms, 1992.


Parallel I/O Systems and Interfaces for Parallel Computers - Feitelson, Corbett, Hsu.. (1995)   (1 citation)

....elect to have dedicated I/O nodes act as an internal shared I/O server. These I/O nodes are used for storage of persistent data, i.e. data that is supposed to outlive any single instance of an application's execution. Examples include the Connection Machine CM-5 from Thinking Machines Corp. [65, 40], the nCUBE hypercube [26], the Intel iPSC hypercubes [56] and Paragon mesh, the Meiko Computing Surface CS-2, and the IBM Scalable POWERparallel system SP2. Even the MasPar SIMD array processor has an internal parallel I/O system. While this is based on a large dedicated memory buffer that ....

....other messages in the network, thus degrading application performance. Whether or not this happens depends on the network design (Fig. 3). For example, the CM-5 data network is designed so that each application executes in a separate partition of compute nodes, with a dedicated part of the network [40]. The I/O nodes also form a separate partition. In addition, interpartition traffic (such as I/O traffic from an application partition to the I/O partition) uses another part of the network, that does not belong to any partition. Therefore I/O traffic does not have any effect on jobs that are not ....

C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong, S-W. Yang, and R. Zak, "The network architecture of the Connection Machine CM-5". In 4th Symp. Parallel Algorithms & Architectures, pp. 272--285, Jun 1992.


Fault-Tolerant Hierarchical Networks for Shared Memory .. - Mahmud, Samaratunga.. (2002)

No context found.

Leiserson, C. E. et al. (1992) The network architecture of the connection machine CM-5. In Proc. 4th Ann. ACM Symp. on Parallel Algorithms and Architectures, San Diego, CA, June 29--July 1, pp. 272--285. IEEE Computer Society Press, Los Alamitos, CA.


An Efficient, Protected Message Interface - Lee, al. (1998)   (1 citation)

No context found.

C. Leiserson et al., "The Network Architecture of the Connection Machine CM-5," Proc. Symp. Parallel Algorithms and Architectures, ACM Press, New York, 1992, pp. 272-285.


Performance Analysis of Wormhole Routed k-ary n-trees - Petrini, Vanneschi (1998)

No context found.

C. E. Leiserson et al, "The Network Architecture of the Connection Machine CM5," In Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 272--285, June 1992.


Multiprocessor Runtime Support for Fine-Grained, Irregular.. - Frederic Chong Shamik (1994)   (11 citations)

No context found.

C. E. Leiserson et al., "The network architecture of the connection machine CM-5," in SPAA, (San Diego, California), pp. 272--285, ACM, June 1992.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)

No context found.

C. Leiserson, Z. Abuhamdeh, D. Douglas, C. Feynman, M. Ganmukhi, J. Hill, D. Hillis, B. Kuszmaul, M. S. Pierre, D. Wells, M. Wong, S. Yang, and R. Zak, "The network architecture of the connection machine CM-5," in Proceedings of 4th ACM Symposium on Parallel Algorithms and Architectures, pp. 272--285, June 1992.

CiteSeer.IST - Copyright Penn State and NEC