Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary (1993).


This paper is cited in the following contexts:


Unknown - Thomasian And Bay   (Correct)

.... precision of the prediction include dynamic application factors, such as data dependent computation, and dynamic system effects, such as the effect of the memory hierarchy (see discussion) we used as the test beds are the KSR 1 [1] which supports shared memory programming model, and the CM 5 [2], which supports both message passing and data parallel programming models. The problems we used as test seeds are Gauss elimination (GE) all pairs shortest path (APSP) and a large electromagnetic simulation (EM) application [7] 3.1. Architectural Characteristics 3.1.1. The Shared Memory ....

....subpage (the basic data transfer unit in the KSR 1) A processor waits for an empty slot to transmit a message. A single bit in the header of the slot identifies it as empty or full as the slots rotate through a ring interface of the processor. 3.1.2. The Connection Machine CM 5. The CM 5 [2] is the newest member of the Thinking Machines Connection Machine family. It is a distributed memory multiprocessor system which can be scaled up to 16K processors and supports both SIMD and MIMD programming models. Each CM 5 node consists of a SPARC processor operating at either 32 or 40 MHz, 32 ....

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary. 1993.


MORPH: A System Architecture for Robust High Performance Using .. - Chien, Gupta (1996)   (5 citations)  (Correct)

.... for low latency communication by adapting the number of memory elements associated with each processing element (optimal PE granularity) configuring the physical I O resource to match the applications needs (local memory hierarchy, global network) and by adding special hardware structures [19, 60] such as fast barrier or broadcast support for machine subsets or the entire machine, to optimize performance. For example, experience over the last ten years demonstrates that intraprocessor communication mechanisms (data shared through the cache) are much more efficient than even the best ....

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary. 245 First Street, Cambridge, MA 02154-1264, October 1991.


Scalable Data Parallel Algorithms for Texture.. - Bader.. (1993)   (5 citations)  (Correct)

....language C . In the SPMD model, each processing node executes a portion of the same program, but local memory and machine state can vary across the processors. The SPMD model efficiently simulates the data parallel SIMD model normally associated with massively parallel programming. References [40] and [34] provide an overview for the CM 5, and both [43] and [45] contain detailed descriptions of the data parallel platform. Note that a CM 5 machine with vector units has four vector units per node, and the analysis given here will remain the same. See Figure 2 for the general organization of ....

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, January 1992.


A Virtual Memory Model for Parallel Supercomputers - Reis, Scherson (1996)   (Correct)

....2. Background To place virtual memory in context, we need a clear definition of the physical machine, the programming model, the execution model and the operating system. A MIMD machine, with distributed physical memory and constant delay interconnection network is assumed in this study [17]. The I O subsystem is of primary importance to virtual memory, and we assume a subsystem similar to the Cray T3D: a few I O gateways that serve all processors [11] The processors may logically view one big common file or many independent smaller files. Although several parallel programming ....

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, October 1991.


Allocation and Scheduling for a Computational Grid - Lewis (2001)   (Correct)

....requirements or the availability of a new allocation mechanism or algorithm. Several examples of allocation tools will be discussed in this Section and the possibility of incorporation into the Mini Grid architecture will be examined. 2.2.1 Connection Machine The Connection Machine, or CM5 [4] and [13] was first released by Thinking Machines in October, 1991. It tried to combine the positive aspects of both the MIMD and SIMD machines. The CM5 supports the full data parallel model by providing high performance for branching and synchronization alike [4] The CM 5 operating system, ....

....The Connection Machine, or CM5 [4] and [13] was first released by Thinking Machines in October, 1991. It tried to combine the positive aspects of both the MIMD and SIMD machines. The CM5 supports the full data parallel model by providing high performance for branching and synchronization alike [4]. The CM 5 operating system, CMOST, is an enhanced version of the UNIX operating system. It supports most of the standards in UNIX and uses the network standards to communicate to all of its processors through three separate network connections. The basic architecture of the CM 5 can be seen in ....

Thinking Machines Corporation. The Connection Machine: CM-5 Technical Summary. Technical report, Thinking Machines Corporation, Cambridge, Massachusetts, October 1991.


A Case Study of Shared Memory and Message Passing: The Triangle.. - Lew   (Correct)

....application) under low contention because shared memory offers low overhead data access. Our implementations run on the MIT Alewife multiprocessor [2] The message passing implementation was ported from a message passing implementation that runs on Thinking Machines CM 5 family of multicomputers [25]. The original CM 5 implementation written by Kirk Johnson won first place in an Internet newsgroup contest [14] the goal of which was to solve the triangle puzzle in the shortest time. Alewife efficiently supports both message passing and cache coherent shared memory programming models in ....

Thinking Machines Corporation. Connection Machine CM-5 Technical Summary. Nov. 1993.


Quantitative Performance Modeling of Scientific Computations and.. - Toledo (1995)   (2 citations)  (Correct)

....methodology, called benchmapping, are demonstrated in Chapters 4 and 5 using two benchmapping systems called PERFSIM and BENCHCVL.PERFSIM is a profiler for data parallel Fortran programs. It runs on a workstation and produces the profile of the execution of a program on the Connection Machine CM 5 [110] quicker than the profile can be produced by running the program on a CM 5. BENCHCVL predicts the running time of data parallel programs written in the NESL language [17] on several computer systems. Applications of benchmapping, including program profiling and tuning, making acquisition and ....

....plain slow. PERFSIM is a benchmapping system that accelerates the profiling process by estimating the running time of most of the expensive operations in a program, while refraining from actually performing them. PERFSIM analyzes CM Fortran [109] programs running on the Connection Machine CM 5 [110]. By combining execution of the control structure and scalar operations in a program with analysis of vector operations, PERFSIM can execute a program on a workstation and in seconds and generate performance data that would take several minutes or more to generate by running the program on an ....

[Article contains additional citation context not shown here]

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, January 1992.


Architecture Implications of High-Speed I/O for.. - Gross, Steenkiste (1994)   (Correct)

....or via a direct, dedicated link, as is sometimes done for a parallel file system. Among the possible network choices, HIPPI is currently the most popular one, and most manufacturers of distributed memory parallel systems either provide or have announced a HIPPI connection (e. g, CM 2 [20] CM 5 [7], iPSC 860 [12] NCube2 [14] iWarp [4] Paragon XP S [11] Maspar [2, 15] As far as an application on the parallel system is concerned, the exact characteristics of the external links do not matter, and the I O node provides an appropriate abstraction. We can think of the I O nodes as ....

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary. Thinking Machines Corporation, 1991.


Integration of Message Passing and Shared Memory.. - Heinlein.. (1994)   (40 citations)  (Correct)

....some of these systems. Many recent systems and proposals advocate provisions for direct user level access to message protocols. The messaging interface is typically either memory mapped or register based. The Connection Machine CM 5 provides access to the network through a memory mapped interface [21]. Register based approaches provide tighter coupling by moving the network interface into the processor and providing direct access to the interface through special registers [5, 9] One of the problems with the above systems is that they are typically optimized for short messages, thus limiting ....

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary, 1991.


Fine-Grain Distributed Shared Memory on Clusters of Workstations - Schoinas (1997)   (3 citations)  (Correct)

....using a custom low latency network. These parallel platforms have been called massively parallel processors (MPPs) Since relatively little extra hardware except the custom network is required, this approach has enjoyed popularity in older machines such as Intel iPSC860 [Int90] and TMC CM 5 [Thi91] It is still present today in commercial systems such as IBM SP 2. At a high level of detail, there is not much difference between MPPs and a collection of workstations with the exception of the custom network. Recently however, the network technology has caught up with the other components of ....

....by the network hardware did not allow the efficient implementation of active messages with large data blocks. Therefore, the cost of breaking a bulk data transfer into small active messages was too high to realize bandwidth comparable to the one achieved by the default CM 5 messaging library [Thi91] In contrast, on the COW, the use of the channel interface has been depreciated mainly because the active message layer can support messages up to four Kbytes, which are big enough to offer the raw hardware throughput. While the channel interface is still supported in Blizzard, it is being ....

[Article contains additional citation context not shown here]

Thinking Machines Corporation. The connection machine CM-5 technical summary, 1991.


Exploiting Superword Level Parallelism with Multimedia.. - Larsen (2000)   (20 citations)  (Correct)

....to get good MIMD performance, extracting SLP should not detract from existing MIMD parallel performance. 2. 4 SIMD Parallelism SIMD parallelism came into prominence with the advent of massively parallel supercomputers such as the Illiac IV [11] and later with the Thinking Machines CM 1 and CM 2 [25, 26] and the Maspar MP 1 [4, 6] The association of the term SIMD with this type of computer is what led us to use Superword Level Parallelism when discussing short SIMD operations. SIMD supercomputers were implemented using thousands of small processors that 14 worked synchronously on a single ....

Thinking Machines Corporation, Cambridge, MA. Connection Machine CM-200 Technical Summary, June 1991.


Exploiting Superword Level Parallelism with Multimedia.. - Larsen (2000)   (20 citations)  (Correct)

....to get good MIMD performance, extracting SLP should not detract from existing MIMD parallel performance. 2. 4 SIMD Parallelism SIMD parallelism came into prominence with the advent of massively parallel supercomputers such as the Illiac IV [11] and later with the Thinking Machines CM 1 and CM 2 [25, 26] and the Maspar MP 1 [4, 6] The association of the term SIMD with this type of computer is what led us to use Superword Level Parallelism when discussing short SIMD operations. SIMD supercomputers were implemented using thousands of small processors that 14 worked synchronously on a single ....

Thinking Machines Corporation, Cambridge, MA. Connection Machine CM-2 Technical Summary, April 1987.


The Provision Of Relocation Transparency Through A Formalised.. - Falkner (2000)   (1 citation)  (Correct)

.... For example, given an object whose purpose is to calculate a surface map from an array of point values, it could be annotated with the attributes fast, Fortran and CM5 to indicate that it was a high performance solution, written in Fortran [40, 126] that can only be run on a Connection Machine CM5 [175]. This form of resolution is more complex than a name matching scheme, as the resolution system has to determine which attributes should be matched, which should be taken as mandatory, and the ordering of attributes with regard to preferences specified by the client. Furthermore, binding of ....

Thinking Machines Corporation. The Connection Machine CM5 Technical Summary, 1991.


Distributed Genetic Algorithms for Partitioning Uniform Grids - Christou (1996)   (3 citations)  (Correct)

....with the uniform case, many aspects are easily extended to the most general, non uniform case as well. 2 Figure 1: 5 point Uniform Grid Computation In order to perform such 5 point computations over a discretized domain on a distributed memory parallel computer (like the Connection Machine CM5 [Thi91] or a network of high performance workstations) the computational load should be balanced across processors in a way that minimizes interprocessor communication. This communication will occur at the common boundaries of the regions that each processor will occupy. It is therefore necessary to ....

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary, October 1991.


A Framework for Parallel Job Scheduling - Subramanian (1995)   (Correct)

....CM 2 [Hil85, 7 This figure and others in this section correspond roughly to the CRAY T3D. Do not take these figures too precisely; we only wish to convey a feel for their relative order of magnitude. 23 Thi91a] and MasPar s MP 2 [Mas91] Examples of MIMD machines are Thinking Machines CM 5 [Thi91b, L 92] and Cray Research s T3D [Oed93, Cra93] From this difference in instruction fetching, several other hardware differences follow as corollaries: Fine Grain vs Coarse Grain: In an SIMD machine, the back end processors do not need to fetch and decode instructions. Therefore, they do not ....

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, October 1991.


Unresponsiveness-Tolerant Collective Communication - Pakin   (Correct)

....for a message to pass through a few layers software and firmware before being injected into network. modifying these lay ers, nonblocking barriers intercept a message additional processing before after passes through network. This property somewhat unique clus ters. Parallel computers such CM 5 [100] [93] contrast, integrate the network high in memory hierarchy, giving applications direct access While this improves latency and possibly, bandwidth, it implies that only way intervene between two communicating processes such system is to modify the custom, vendor specific communication ....

....barrier gion is determined statically, while nonblocking barriers dynamically execute any code violate barrier semantics. 6.6 Evaluating unresponsiveness There have been few studies unresponsiveness relate to work. First, a study that Brewer and Kuszmaul [18] performed Thinking Machines CM 5 [100], they found adding unnecessary barriers parallel programs some times increases performance. They attribute this effect fan in. While VIA drops packets and case reliable delivery reception) introduces channel resets when receiver unresponsive, CM 5 backs network, causing additional delays. other ....

Thinking Machines Corporation, Cambridge, MA. Connection Machine CM-5 Technical Summary, October


Time Space Sharing Scheduling and Architectural Support - Hori, Yokota, Ishikawa.. (1995)   (6 citations)  (Correct)

....Krueger et al. proposed a new job scheduling scheme, called scan , and found that job scheduling order, not mapping, is more important to achieve higher processor utilization [11] All methods, however, are batch scheduling and an interactive programming environment can not be provided. CM 5[18] and Paragon[10] provide time sharing scheduling. In CM 5, partitioning can only be changed at system bootup time, and in Paragon (OSF 1) partitioning and the partition in which a job is executed must be specified by user. If the target parallel machine is dynamically partitionable, one can ....

....the maximum process switching time. On a distributed memory parallel machine, the handling of messages being passed around a network is the major issue faced in guaranteeing the process switching time. In CM 5, time sharing in a partition is implemented with an AFD (All Fall Down) operation [18]. When a scheduler decides to switch a process, the subnetwork in the partition enters AFD mode. In this mode, all messages in the subnetwork go to the nearest processors regardless of the message destinations. After the AFD mode, the kernel switches to a new process. The new process s messages ....

Thinking Machines Corporation. Connection Machine CM-5 Technical Summary, November 1992.


Alternating Directions Methods for the Parallel.. - Spyridon..   (Correct)

....The article by Best et al. 12] presents the parallel I O modes of the machine, while the article by Hillis and Tucker [49] provides a general overview and presents examples of current applications of the CM 5. For details the interested reader can also refer to the CM 5 technical summary [96] and the CMMD library guide [97] 90 7.3 The suite of test problems To assess the relative performance of the three splittings we used MNETGEN [4] a derivative of NETGEN [57] to generate three hundred random multicommodity network problems (MC) one hundred with linear objectives and two ....

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM--5 technical summary, 1991.


A Dataflow-Based Software Integration Model in Parallel and.. - Cheng (1996)   (Correct)

....information about the CM5 and PVM are also given to help explain the approaches we adopted when working in a specific context of machine and programming tools. B. 1 Parallel Programming on Connection Machine CM5 A detailed description of the CM5 architecture and software system can be found in [28] and related CM5 documents from TMC. Here, we briefly outline the existing 130 APPENDIX B. CASE STUDIES IN CHAPTER 3 131 PN PN PN PN CP IO CM5 Data Network and Control Network FDDI, HPPI, CMIO or VME Bus Local Network . Figure B.1: Components of a CM5 system parallel programming ....

....AVS (remote) modules in the network. We use an AVS system module called geometry viewer along with other system modules ( generate colormap, color range, and field to mesh ) for 3D rendering operations. The computationally intensive modules of this application are distributed to the CM5 [28], an MIMD supercomputer that is configured with 32 processing nodes at NPAC. Each processing node (PN) of the CM5 consists of a SPARC processor for control and non vector computation, four vector units for numerical computation, and 32 MB of RAM. It also includes a Network Interface chip that ....

Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary, Technical Report, Cambridge, MA, October 1991.


A Molecular Dynamics Simulation Of The Orientationally.. - Affouard And Ph   (Correct)

....PVM (Parallel Virtual Machine) also seems promising. Our molecular dynamics simulation was done on a CM2 Connection Machine, a massively parallel computer, with distributed memory; to our knowledge, such architecture was little used up to now for MD. The 16384 processors of 2 the CM2 run in SIMD [10] mode, in which all the processors in use execute the same instruction at the same time using their local data. The 2 n processors are layed out on a n dimensional hypercube with implicit periodicity in each direction, each processor thus having n neighbours. As in many parallel computing ....

Thinking Machine Corporation, Connection Machine CM-2 Technical Summary, Cambridge, MA, (1991)
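
The hypercube layout described in the Affouard excerpt above (2^n processors, each with n neighbours) can be illustrated with a short Python sketch. It assumes the usual binary labelling, where two nodes are neighbours when their labels differ in exactly one bit; the function name `hypercube_neighbors` is hypothetical and does not come from the cited paper or from any Thinking Machines software.

```python
def hypercube_neighbors(node: int, n: int) -> list[int]:
    """Return the n neighbours of `node` in an n-dimensional binary hypercube.

    Assumption: nodes are labelled 0 .. 2**n - 1 and two nodes are adjacent
    when their labels differ in exactly one bit.
    """
    if not 0 <= node < 2 ** n:
        raise ValueError("node label out of range for this dimension")
    # Flipping bit d moves one step along dimension d.
    return [node ^ (1 << d) for d in range(n)]


# A 3-dimensional cube: node 0 is adjacent to nodes 1, 2 and 4.
print(hypercube_neighbors(0, 3))

# 16384 = 2**14 processors, as in the excerpt, gives 14 neighbours per node.
print(len(hypercube_neighbors(5, 14)))
```

Flipping one bit per dimension is what gives each processor exactly n neighbours, matching the count stated in the excerpt.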


Network Interface for Message-Passing Parallel Computation on a.. - Hoe (1994)   (5 citations)  (Correct)

....latency. 2.2 Shortcoming of Memory Mapped Network Interface As mentioned previously, a network interface for stock workstations can only communicate with the processor through a bus. A straightforward message passing interface could be implemented as memory mapped registers such as in CM 5 [14], or as a packet sized array of memory mapped registers as suggested by Joerg and Henry [9] Figure 1) These interfaces are passive devices that only respond to the processor s direct manipulation through memory mapped operations. A user program composes an outbound packet by writing the content ....

....parallel processing occurs in frequent and small size messages. The communication overhead must be further minimized by giving the user processes direct control of the network interface. These lowoverhead user level network interface designs can be found in many contemporary MPP architectures [9, 14]. However, these designs typically involve the support of custom system or CPU design. In most contemporary workstation designs, the RISC microprocessors are optimized for cached accesses while the bus architectures are optimized for blocked transfers. The network interface design must take these ....

Thinking Machines Corporation. The Connection Machine: CM-5 Technical Summary, January 1992.


Parallel Programming Languages - Pingali (1998)   (Correct)

....and array processors for performing scientific computations in which dense matrices are the primary data structures. Not surprisingly, most of these languages are extensions of FORTRAN. Programs in these languages contain a combination of scalar and vector operations. On array processors like CM 2 [41], the scalar operations are usually performed by a front end high performance work station, while the vector operations are performed on the array processor. Vector processors like the CRAY [11] can execute both scalar and vector instructions. Therefore, the key problem in designing a SIMD ....

....vector languages are LRLTRAN [45] from Lawrence Livermore Laboratories, BSP FORTRAN [5] from Burroughs, and Cedar FORTRAN [21] from the University of Illinois, Urbana. Cedar FORTRAN permitted the expression of both SIMD and MIMD parallelism. 2. 2 Distributed Memory SIMD Languages The CM 2 machine [41] from Thinking Machines was a distributed memory array processor and its assembly language, called Paris (Parallel Instruction Set) 40] had FORTRAN, C and Lisp interfaces that permit programmers to write high level language programs with Paris commands embedded in them. The resulting languages ....

Thinking Machines Corporation. Connection Machine CM-200 Technical Summary, June 1991.


Critical Performance Path Analysis, and Efficient.. - Bright, Fineberg, ..   (Correct)

....also enhance latency tolerance for data movement between localities. Seamless [FiC92c, FiC92d] is a latency tolerant RISC based multiprocessor architecture based on the data movement programming model [FiC92a] In Seamless, the concept of a multicomputer (e.g. the iPSC 860 [Int91b, HeG90] CM5 [Thi91], and nCUBE 2 [Pal88, Tro89] is extended by adding a second processor, the Locality Manager(LM) to each processing element. In Seamless, a processing element this is referred to as a locality . While the idea of adding a second processor to handle communications is not unique (e.g. the Intel ....

Thinking Machines, The Connection Machine CM-5 Technical Summary, Thinking Machines Corporation, Cambridge, MA, October 1991.


Packing Schemes for Gang Scheduling - Feitelson (1996)   (45 citations)  (Correct)

....all the jobs exceeds the number of PEs in the system, time slicing is used. However, the context switching is coordinated across the PEs, such that all the threads in a job are scheduled and de scheduled at the same time. Gang scheduling is a prominent feature of the Connection Machine CM 5 system [28], and is available on the Intel Paragon [17] the Meiko CS 2, and multiprocessor SGI workstations [2] It has also been used extensively in a home grown system on a BBN Butterfly at Lawrence Livermore Labs [13] which has recently been ported to their new Cray T3D system. The main drawback of ....

Thinking Machines Corp., Connection Machine CM-5 Technical Summary. Nov 1992.
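
The Feitelson excerpt above describes gang scheduling: when the jobs need more PEs than exist, time slicing is used, and all threads of a job are scheduled together. A minimal Python sketch of that idea follows. It assumes a simple first-fit rule for packing jobs into time slots, which is an illustrative choice and not necessarily one of the packing schemes the cited paper evaluates; `first_fit_pack` is a hypothetical name.

```python
def first_fit_pack(job_sizes: list[int], num_pes: int) -> list[list[int]]:
    """Pack jobs into time slots so that each job runs entirely within one slot.

    job_sizes[i] is the number of PEs (threads) job i needs. Each returned
    slot is a list of job indices whose sizes sum to at most num_pes.
    Assumption: first-fit placement, chosen only for illustration.
    """
    slots: list[list[int]] = []   # job indices per time slot
    used: list[int] = []          # PEs already claimed in each slot
    for job, size in enumerate(job_sizes):
        if size > num_pes:
            raise ValueError(f"job {job} needs more PEs than the machine has")
        for s in range(len(slots)):
            if used[s] + size <= num_pes:
                slots[s].append(job)
                used[s] += size
                break
        else:
            # No existing slot has room, so open a new time slot.
            slots.append([job])
            used.append(size)
    return slots


# Four jobs on an 8-PE machine; the scheduler would cycle through the slots.
print(first_fit_pack([4, 3, 2, 5], 8))
```

Because a job never straddles two slots, switching from one slot to the next deschedules all of its threads at the same time, which is the coordination property the excerpt highlights.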


Performance Evaluation Of The Thinking Machines Cm-5 - Kwan (1994)   (Correct)

....to and from the SDA using the data network, this sixteen byte 13 SCSI Channels Disk Buffer SPARC NI Disk Array to interconnection network Figure 2.3: Input Output Node Architecture increment was chosen to match the size of the payload of a data network packet. The Scalable File System (SFS) [31, 23] logically sits atop the SDA and arbitrates input output requests from applications. From a user perspective, SFS looks like a Unix file system, but with parallel input output interface extensions. 2.1.4 Control Processors The control processor can be viewed as a front end (i.e. host) to the ....

Thinking Machines Corporation. Connection Machine CM-5 Technical Summary, Revised Edition, Nov 1992.


Parallel Sphere Rendering - Krogh, Painter, Hansen   (Correct)

....approximately 35,000 spheres per second. This algorithm was also implemented on Pixel Planes 5, achieving more than one million spheres per second [7, 12] 3 Massively Parallel Processors at the ACL The Thinking Machines Corporation CM 5 is a production quality massively parallel supercomputer [22]. The CM 5 can consist of 32 to 16384 Sparc processors. The CM 5 at the Advanced Computing Lab (ACL) at Los Alamos National Laboratory consists of 1024 processors, each with 32MB of local RAM for a total of 32GB. Each Sparc processor also has four 64 bit wide vector units which can assist in ....

Thinking Machines Corporation. The Connection Machine CM-5 technical summary, 1991.


pSather: Layered Extensions to an Object-Oriented.. - Murer, Feldman, Lim.. (1993)   (18 citations)  (Correct)

.... range of parallel systems from shared memory multiprocessors (one single cluster) to distributed memory multiprocessors (each cluster has one processor) pSather has indeed been implemented on both shared memory multiprocessors (e.g. Sequent [50] and distributed memory multiprocessors (e.g. CM 5 [18]) 4.2 Identification of Clusters Clusters are identified by numbers of type INT in the range between 0 and the number of clusters in the system minus one. Consequently, the operator (section 4.3) expects operands of type INT. Remote calls to non existing clusters lead to runtime errors. Often ....

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary. Thinking Machines Corporation, Cambridge Massachusetts, October 1991.


Design Principles of Parallel Operating Systems - Schröder-Preikschat   (Correct)

....[21] PARIX [26] and UBIK [30] Except the former mentioned one, these all are systems developed for Transputer based architectures. The iPSC 2 hypercube with its 128 nodes is controlled by NX 2, a parallel operating system providing virtual shared memory [20] The operating system for the CM 5 [5], called CMost, is a SunOS variant. This variant, however, is only executed on the control processors. Processing nodes are run by a runtime executive and, normally, will be subjected to single tasking mode of operation. Nevertheless, there is also support for multi tasking. But this means that ....

....latency (i.e. message header transmission) and segment latency (i.e. message trailer transmission) It depends on the network architecture whether both parts will accumulate to the same delay. Note that the network may be tuned to specifically support the transfer of small and fixed size packets [5]. Node latency is the sum of sender latency and receiver latency. The former is due to the header and trailer setup procedure, referred to as header latency and trailer latency, respectively. The latter is due to (1) signaling and handling a communication interrupt, 2) receiving a packet from the ....

Thinking Machines Corp. The Connection Machine CM--5 Technical Summary. System Reference Manual, October 1991.


High Performance Messaging on Workstations: - Illinois Fast Messages   (Correct)

....packets, latency drops to 25s, and for larger packets, bandwidth rises to 19.6MB s. This delivered bandwidth is greater than OC 3 ATM s physical link bandwidth of 19.4 MB s. FM s performance exceeds the messaging performance of commercial messaging layers on numerous massively parallel machines [21, 29, 11]. A good characterization of a messaging layer s usable bandwidth (bandwidth for short messages) is n 1 2 , the packet size to achieve half of the peak bandwidth ( r1 2 ) FM achieves an n 1 2 of 54 bytes. In comparison, Myricom s commercial API requires messages of over 3,873 bytes to ....

Thinking Machines Corporation, 245 First Street, Cambridge, MA 02154-1264. The Connection Machine CM-5 Technical Summary, October 1991.
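
The Fast Messages excerpt above characterizes usable bandwidth by the packet size needed to reach half of the peak bandwidth. A small Python sketch shows how such a half-power point arises under a simple linear cost model, T(n) = t0 + n / r_inf. That model is an assumption made here for illustration, the excerpt does not say which model the authors used, and the numbers below are hypothetical rather than measurements from the paper.

```python
import math


def delivered_bandwidth(n_bytes: float, t0: float, r_inf: float) -> float:
    """Bandwidth achieved for an n-byte message under T(n) = t0 + n / r_inf.

    t0 is the fixed per-message overhead in seconds and r_inf is the
    asymptotic (peak) bandwidth in bytes per second. Both are assumed inputs.
    """
    return n_bytes / (t0 + n_bytes / r_inf)


def half_power_point(t0: float, r_inf: float) -> float:
    """Message size at which delivered bandwidth equals r_inf / 2.

    Solving n / (t0 + n / r_inf) = r_inf / 2 for n gives n = t0 * r_inf.
    """
    return t0 * r_inf


# Hypothetical layer: 2 microseconds of overhead, 20 MB/s peak bandwidth.
t0, r_inf = 2e-6, 20e6
n_half = half_power_point(t0, r_inf)
print(n_half, delivered_bandwidth(n_half, t0, r_inf))
assert math.isclose(delivered_bandwidth(n_half, t0, r_inf), r_inf / 2)
```

Under this model a smaller half-power point means a lower fixed overhead relative to peak bandwidth, which is why the excerpt treats it as a measure of short-message performance.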


Discrete Event Driven Simulator Manual - Schark Research Group   (Correct)

.... given in the form of primitives present in the high level language used to describe the algorithm (semaphores, barrier synchronization, etc) In order to efficiently implement such primitives specialized hardware may be available on the PP system, for instance an integer tree or single bit trees [28, 27, 31, 6]. 2.2.3 Communication Model (Software) In section 2.2.1 we focused on how PEs communicate (physical sharing) Here we will focus on how tasks communicate (logical sharing) These two concepts are orthogonal, so we can have four categories, as given by table 2.2. The first category is obvious. If ....

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, October 1991.


Obtaining Sequential Efficiency for Concurrent.. - Plevyak, Zhang, Chien (1995)   (29 citations)  (Correct)

....model is achieved by multiplexing the processing elements in software. Thus, each processing element can be viewed as a sequential machine augmented with runtime primitives supporting naming, locking, location, and concurrency control. This model matches existing massively parallel processors [42, 13], and we believe it is appropriate for the next generation machines as well. context switches processing element processing element threads ObjectE ObjectD ObjectC ObjectA ObjectB local messages remote messages Figure 1: Execution Model Example Each object has a global name, a lock to ....

Thinking Machines Corporation, 245 First Street, Cambridge, MA 02154-1264. The Connection Machine CM-5 Technical Summary, October 1991.


Scoped Behaviour for Optimized Distributed Data Sharing - Lu (2000)   (Correct)

....every physical memory location from any processor is constant, then the machine is said to have a uniform memory access (UMA) architecture. However, some computer architectures use scalable interconnection networks, such as meshes [Lenoski et al. 1992, Agarwal et al. 1995] and fat trees [Thi, 1992], in which independent communication links increase bandwidth, but the distance between individual processors can vary. Computer architects also build large scale multiprocessors using a hierarchy of networks. For example, one could have meshes connecting buses [Lenoski et al. 1992, Kuskin et ....

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, January 1992.


Efficient Techniques for Nested and Disjoint Barrier.. - Ramakrishnan, Scherson, .. (1999)   (4 citations)  (Correct)

....Since data parallel programs involve frequent barrier synchronizations, a computer intended to run data parallel programs must implement them efficiently. For this purpose, current MIMD computers, including the CM 5 and T3D, provide a dedicated barrier tree exclusively for barrier synchronizations [4, 15, 19]. 1.1 Limitations from Control Nesting Current MIMD computers provide just one barrier tree per user application. However, data parallel programs very often require the simultaneous use of more than one barrier synchronization tree, due to data dependent conditionals and loops, that have ....

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, October 1991.


An Evaluation of the CMMD Library and an Efficient.. - Ulf Johansson Gunnar (1994)   (1 citation)  (Correct)

....performed in the same manner, which implies that the communication is synchronous. To communicate in C* a feature called parallel left indexing is used. The left indices are used to choose which elements of some parallel variables are to participate in the operation. For example, the statement [10]dest = [5]source means that the tenth element of the parallel variable dest will be replaced by the fifth element of the parallel variable source. As different elements of the parallel variables are located on different processors, communication among the processors is expressed implicitly in ....

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary, January 1992.


A Parallel Object-Oriented System for Realizing Reusable and.. - Lim (1993)   (7 citations)  (Correct)

....sequential counterparts. Over the decades, the multiprocessors which have been built include NBS PILOT [160] in the late 1950s, Burroughs D825 [11], ILLIAC IV [22] in the 1960s, S-1 [228], Cm* [211] in the 1970s, BBN Butterfly [80], Sequent Symmetry [166] in the 1980s, and KSR-1 [199], CM-5 [76] in the 1990s. Despite the variety of multiprocessors, such machines have never gained a foothold in the general computing community which has historically remained dominated by sequential machines. This is true regardless of whether the latter are in the form of maxicomputers (e.g. System 360 ....

....an object oriented language on a shared memory machine. We understood that for pSather to become more generally useful, the language would have to deal with distributed memory machines because scalable architectures require distributed memory. A preliminary version of pSather was ported to a CM 5 [76] in early 1992. Because of this initial experience of parallel programming on a distributed memory machine, pSather evolved to support a more general machine model (Section 2.4) and the definitions of both old and new parallel constructs were refined to be suitable in a distributed memory, ....

[Article contains additional citation context not shown here]

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary. Thinking Machines Corporation, Cambridge Massachusetts, October 1991.


Performance of the CM-5, ENEE 646 Class Report - Martin, Bader (1994)   (3 citations)  (Correct)

....disk for error detection and single fault error correction. The CM-5 is an MIMD (Multiple Instruction, Multiple Data) machine which can run in true MIMD mode, Single Program Multiple Data (SPMD) mode, or can use lock step synchronization of SPMD to simulate SIMD operation. References [9] and [10] provide an overview for the CM-5, and an in depth look at the network architecture of this machine is described in [7]. See Figure 1 for the general organization of the CM-5 with vector units. There are also Control Processors (CP) attached to the fat tree interconnection network. The control ....

....Appendix B for the AC code used in this experiment, and Appendix C for the C code used to contrast the performance for high level data parallel languages. • Rating: 32 MFLOPS/VU peak performance for 64-bit floating point operations (equivalent to 128 MFLOPS/node or 4 GFLOPS/32-node machine) [10]; • Performance: 64.1 MFLOPS/node (highly scalable) using low level vector code, or 3.15 MFLOPS/node using high level data parallel languages. • In best case performance, 2 GFLOPS/32-node machine, with no node communications and optimized low level vector code. 12 CM-5 Vector Unit ....

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, January 1992.


Exploiting Superword Level Parallelism with Multimedia.. - Larsen, Amarasinghe (2000)   (20 citations)  Self-citation (Ma)   (Correct)

....are required to get good MIMD performance, extracting a small amount of SLP would not detract from existing MIMD parallel performance. 2.2.3 SIMD Parallelism. SIMD parallelism came into prominence with the advent of massively parallel supercomputers such as the Thinking Machines CM-1 and CM-2 [28, 29] and MasPar MP-1 [6, 8]. The association of the term SIMD with these types of computers is what led us to utilize the term Superword Level Parallelism when discussing short SIMD parallelism. These supercomputers were implemented using thousands of small processors which worked synchronously on ....

Thinking Machines Corporation, Cambridge, MA. Connection Machine CM-200 Technical Summary, June 1991.


Exploiting Superword Level Parallelism with Multimedia.. - Larsen, Amarasinghe (2000)   (20 citations)  Self-citation (Ma)   (Correct)

....of parallelism than the vector parallelism associated with traditional vector supercomputers. We denote this parallelism Superword Level Parallelism since parallelism comes in the form of superwords containing packed data. Note that SLP also differs from traditional large scale SIMD parallelism [6, 8, 28]. SIMD supercomputers require large amounts of parallelism in order to achieve speedups, whereas SLP can be profitable when such parallelism is scarce. In some sense, superword level parallelism is actually a restricted type of ILP. ILP techniques have been very successful in the general purpose ....

....are required to get good MIMD performance, extracting a small amount of SLP would not detract from existing MIMD parallel performance. 2.2.3 SIMD Parallelism. SIMD parallelism came into prominence with the advent of massively parallel supercomputers such as the Thinking Machines CM-1 and CM-2 [28, 29] and MasPar MP-1 [6, 8]. The association of the term SIMD with these types of computers is what led us to utilize the term Superword Level Parallelism when discussing short SIMD parallelism. These supercomputers were implemented using thousands of small processors which worked synchronously on ....

Thinking Machines Corporation, Cambridge, MA. Connection Machine CM-2 Technical Summary, April 1987.


Automatic Generation of Parallel Programs with Dynamic Load .. - May Cmu-Cs- School   Self-citation (Corporation)   (Correct)

....7.4 Modeling performance with static allocation of work. 7.5 Work movement costs used in modeling performance. Chapter 1 Introduction. There has been a lot of success in developing parallel languages [48, 51, 52, 66] and parallelizing compilers [25, 62, 67, 79] for MIMD distributed memory machines. These tools have simplified the distribution of applications on tightly coupled machines, such as the Thinking Machines CM-5 [66], the Intel iWarp [7, 62] and the Cray T3D [1, 44]. Workstation clusters, in which ....

....There has been a lot of success in developing parallel languages [48, 51, 52, 66] and parallelizing compilers [25, 62, 67, 79] for MIMD distributed memory machines. These tools have simplified the distribution of applications on tightly coupled machines, such as the Thinking Machines CM-5 [66], the Intel iWarp [7, 62] and the Cray T3D [1, 44]. Workstation clusters, in which independent workstations are connected by a high speed network, are emerging as a new type of loosely coupled multicomputer. However, the tools for managing the distributed resources on these network based ....

Thinking Machines Corporation, Cambridge, MA. Connection Machine CM-5 Technical Summary, November, 1992.


Evaluating the Locality Benefits of Active Messages - Ellen Spertus And (1995)   (3 citations)  Self-citation (Massachusetts)   (Correct)

....buffering messages is the use of on chip SRAM space and bandwidth by arriving messages; however, even under software control, cache space and memory bandwidth is required to buffer most arriving data. If the network interface is attached to a lower level of the memory hierarchy, as with the CM-5 [Thi92], even more scarce memory bandwidth is consumed reading incoming message data out of the memory mapped registers. Thus, the only advantage to not buffering messages is avoiding consuming cache space and bandwidth for any incoming message data that can be handled without being written to memory. ....

Thinking Machines Corporation, Cambridge, Massachusetts. The Connection Machine CM-5 Technical Summary, January 1992.


CoMet: A Synthetic Benchmark for Message-Passing Architectures - Ganapati (1993)   Self-citation (Corporation)   (Correct)

....performance parameters, such as startup times and bandwidths, should be interpreted from these graphs. 3.1.5 Benchmark Topology. DMMP systems can have varied architectures such as those based on a network of workstations or a hypercube or a tree based communication network (CM-5) [Thi91]. It is unrealistic to expect the benchmark kernels to be completely architecture independent. We have instead focussed on capturing the topologies of the hardware into uniform data structures based on important data structures in parallel, scientific programs. Communication properties of some ....

Thinking Machines Corporation. Connection Machine CM-5 Technical Summary, October 1991.


An Efficient Data Parallel Algorithm for 2-D Convolutions - Sandra Dykes Xiaodong   (Correct)

No context found.

Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary (1993).


Comparative Evaluation and Case Studies of Shared-Memory.. - Data-Parallel Execution..   (Correct)

No context found.

Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary, 1993.


Evaluation and Measurement of Multiprocessor Latency Patterns - Xiaodong Zhang Yong   (Correct)

No context found.

Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary, 1993.


Performance Predictions on Implicit Communication Systems - Xiaodong Zhang Zhichen   (Correct)

No context found.

Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary, 1993.


The Ganglia Distributed Monitoring System: Design.. - Massie, Chun, Culler (2003)   (12 citations)  (Correct)

No context found.

Thinking Machines Corporation. Connection Machine CM-5 Technical Summary, 1992.


Unresponsiveness-Tolerant Collective Communication - Pakin (2001)   (Correct)

No context found.

Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Summary, October 1991.


Data Locality Optimization of Shared Memory Programs on NUMA.. - Tao   (Correct)

No context found.

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary, 1991.

CiteSeer.IST - Copyright Penn State and NEC