110 citations found.
J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, R. Stevens, "Portable Programs for Parallel Processors," Holt, Rinehart, and Winston Incorporation, New York, NY, 1987.

This paper is cited in the following contexts:

Supporting the Memory System Evaluation with a Monitor Simulator - Tao

....NUMA characterized PC clusters with a focus on the behavior of remote memory accesses and the goal of aiding the data locality optimization on the distributed shared memory. SIMT runs applications from the SPLASH-2 Benchmark suite [15] and any other application written in C/C++ using m4 macros [2] in a fashion similar to SPLASH and SPLASH-2 applications. It supports a thread-based programming model with a shared address space and a private stack space for each thread. SIMT comprises a front end (a memory reference generator) and a back end modeling the target system. This is shown in ....

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart, and Winston Incorporation, New York, NY, 1987.


A Compiler Approach to Scalable Concurrent Program Design - Foster, Taylor (1992)   (11 citations)

....6] However, we insist upon a clear separation of sequential and concurrent components in order to conveniently apply source-to-source transformation techniques and build programming abstractions. Previous work on reusable abstractions in parallel program design includes the Argonne monitor macros [4] and Schedule package [17] and Cole's algorithmic skeletons [14]. However, in none of these approaches is support for abstractions incorporated into a compiler. An alternative to our compiler techniques is to use run-time techniques such as higher-order functions [28, 31]. However, we prefer to ....

Boyle, J., Butler, R., Disz, T., Glickfeld, B., Lusk, E., Overbeek, R., Patterson, J., and Stevens, R., Portable Programs for Parallel Processors, Holt, Rinehart, and Winston, 1987.


Learning from the Success of MPI - Gropp (2001)   (4 citations)

....were other, equally portable programming models, including many message passing and communication based models. For example, the socket interface was (and remains) widely available and was used as an underlying communication layer by other parallel programming packages, such as PVM [3] and p4 [4]. An obvious second requirement is that of performance: the ability of the programming model to deliver the available performance of the underlying hardware. This clearly distinguishes MPI from interfaces such as sockets. However, even this is not enough. This paper argues that six requirements ....

Boyle, J., Butler, R., Disz, T., Glickfeld, B., Lusk, E., Overbeek, R., Patterson, J., Stevens, R.: Portable Programs for Parallel Processors. Holt, Rinehart, and Winston, New York (1987)


Data Locality and Load Balancing in COOL - Chandra, Gupta, Hennessy (1993)   (51 citations)

....is collocated with the object. As shown in Figure 3, we can simultaneously exploit cache locality through task affinity on the source column, as well as memory locality through object affinity on the destination column. This exactly captures the way the algorithm was hand coded using ANL macros [3] to run on the Stanford DASH multiprocessor; the same scheduling is very simply expressed in CooL. Processor Affinity: Finally, for load balancing reasons it often becomes necessary to directly schedule a task on a particular processor (in practice the corresponding server process) rather than ....

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., New York, NY, 1987.


Fine-Grain Distributed Shared Memory on Clusters of Workstations - Schoinas (1997)   (3 citations)

....maintains sharing status for all the memory blocks on the home nodes. A remote cache serves as a temporary repository for data fetched from remote nodes. A set of fine-grain tags enforce access semantics for shared remote memory blocks. Upon 1. Blizzard supports the PARMACS programming model [BBD 87]. PARMACS offers to each process of a parallel application a private address space with fork-like semantics. Shared memory support is limited to the special shared heap. ....

....segment at the same location in each address space that it is designated as the shared memory segment. Within this segment, the user handles accesses to unmapped pages, and controls the accessibility of mapped memory at a fine granularity. Blizzard supports the PARMACS programming model [BBD 87]. PARMACS offers to each process of a parallel application a private address space with fork-like semantics, while shared memory support is limited to the special shared memory segment. Blizzard preallocates an address range within the application address space for the shared memory segment. ....

James Boyle, Ralph Butler, Terrence Disz, Barnett Glickfeld, Ewing Lusk, Ross Overbeek, James Patterson, and Rick Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston Inc., 1987.


Efficient Runtime Support for Cluster-Based Distributed Shared.. - Speight (1997)   (3 citations)

....performance to TreadMarks under either Solaris or Windows NT. 3.2 Programming for Brazos Users write programs for Brazos utilizing the shared memory parallel programming paradigm and link with the static library brazos.lib at compile time. Brazos provides an implementation of the PARMACS [12] macro suite for ease of porting between shared memory systems. Instead of including a main() function, users specify the function UserMain() as the entry point to a Brazos parallel application. The program's main() function resides in the Brazos library. The Brazos DSM system will spawn as many ....

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., 1987.


Volume Rendering on Scalable Shared-Memory MIMD Architectures - Nieh, Levoy (1992)   (56 citations)

....are connected by a 120 Mbytes/sec 2D mesh network. Programming DASH for Volume Rendering. DASH's architectural support for shared memory makes the implementation of the volume rendering algorithm easy. Our implementation was done in C with Argonne National Laboratory (ANL) parallel macros [2] for shared memory programming primitives. Processors share access to all large data structures, such as the voxel array, shading table, octree, and image array. The data storage requirements of our implementation are roughly four bytes per voxel: one byte for original data samples, one byte for ....

James Boyle, Ralph Butler, Terrence Disz, Barnett Glickfeld, Ewing Lusk, Ross Overbeek, James Patterson, and Rick Stevens. Portable Programs for Parallel Processors. Holt, Rinehart, and Winston, Inc., 1987.


Experiments with "HPJava" - Carpenter, Chang, Fox, Leskiw, Li (1997)   (7 citations)

....an acceptable price for the benefits of using a supercomputer. This attitude was not sustainable as one parallel architecture gave way to another, and the cost of porting software became exorbitant. For several years now, portability across platforms has been a central concern in parallel computing [4, 5, 14, 13, 23]. More fundamentally, the assumption that high performance computing will be done primarily on specialized supercomputers is questioned increasingly. Rapid strides in performance and connectivity of ordinary workstations and PCs make it look equally possible that the future of parallel computing ....

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, 1987.


Evaluation of a Competitive-Update Cache Coherence . . . - Grahn, Stenström (1996)   (2 citations)

....for Ocean, which was provided to us by Stanford University. The main characteristics of the five benchmark programs, together with the size of the data set used, are summarized in Table II. All programs are written in C using the PARMACS macros from Argonne National Laboratory [1] and compiled with gcc version 2.1 with optimization level -O2. Previous studies [10, 20] have shown that MP3D, Cholesky, and Water have a high number of migratory objects, while PTHOR and Ocean have many producer-consumer objects. 5. EXPERIMENTAL RESULTS In this section we present our ....

Boyle, J., Butler, R., Disz, T., Glickfeld, B., Lusk, E., Overbeek, R., Patterson, J., and Stevens, R. Portable Programs for Parallel Processors. Holt, Rinehart & Winston, New York, 1987.


Compiler-Assisted Distributed Shared Memory Schemes.. - Takashi Matsumoto Junpei (1998)

....of the replacement itself is small and the influence of the MBCF operations is also very small after the MBCF interrupt. 5 Optimizing Techniques RCOP deals with a parallel shared memory program based on the lazy release consistency (LRC) model [6]. The input program is written in C extended by PARMACS [7]. RCOP analyzes the shared memory program and translates it into an instrumented C program which explicitly contains consistency management code for the ADSM. The output C code is compiled by the backend compiler, then linked with the ADSM runtime library to generate executable code. We used ....

J. Boyle et al. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., 1987.


Java as a Language for Scientific Parallel Programming - Carpenter, Chang, Fox, Li (1997)   (1 citation)

....that the ambitions of the Java development team go well beyond enhancing the functionality of HTML documents. Many of their concerns, such as portability, execution in a heterogeneous network environment, and efficiency, mirror developments in the High Performance Computing world over a number of years [4, 12, 11, 17, 15]. With Java positioned to become a standard programming language on the Internet, and scientific parallel processing edging towards network-based computation, it is natural to ask how these two technologies will interact. How suitable is Java for scientific computing, and do lessons from research ....

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, 1987.


Supporting Software Distributed Shared Memory with.. - Inagaki, Niwa.. (1998)   (3 citations)

....evaluation with SPLASH-2. Section 5 describes related work about a combination of optimizing compiler and software DSM. Section 6 gives a summary. 2. Compilation Process Figure 1 describes the overall compilation process. The input is a shared memory program written in C extended with PARMACS [4]. PARMACS provides the primitives for task creation, shared memory allocation, and synchronization (barrier, lock, and pause). The consistency of shared memory follows the lazy release consistency (LRC) model [20]. Our compiler inserts consistency management code sequences for software DSM into a given ....

J. Boyle et al. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., 1987.


Performance Evaluation of Link-Based Cache Coherence Schemes - Nilsson, Stenström (1993)   (2 citations)

....filled before caches are inserted at level i 1, as shown in Figure 3. A new cache that reads a memory block is inserted in the tree as a leaf at the lowest level. The tree is optimal even if a replacement is done [11]. [Figure 3: Three caches have read a memory block and all the pointers are set correctly (from Nilsson and Stenström [11]).] In [11] we showed that the overhead is 3 log_2 N bits for one memory block and (3 K) log_2 N bits for one cache line. Figure 3 ....

[Article contains additional citation context not shown here]

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., 1987.


Dome: Parallel Programming in a Distributed.. - Nagib..

....These ideas are not new; parallel computing has long been an active area of research. The fact that networks of computers are commonly being used in this fashion is new. Software tools like PVM [1, 15], P4 [5], Linda [7], Isis [2], Express [14], and MPI [16] allow a programmer to treat a heterogeneous network of computers as a parallel machine. These tools are useful, but for efficient and practical use, load balancing and fault tolerance mechanisms must be developed that will work well in a ....

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., 1987.


Scoped Behaviour for Optimized Distributed Data Sharing - Lu (2000)

....by the underlying hardware. For example, the DASH multiprocessor [Lenoski et al. 1992] has influenced the support for locality optimizations in the Concurrent Object Oriented Language (COOL) [Chandra et al. 1994]. Other programming environments exist for DASH (e.g. the ANL parallel macros [Boyle et al. 1987] and Jade [Rinard et al. 1993]) and other cache coherent shared-memory multiprocessors have also been built (e.g. Alewife [Agarwal et al. 1995] and NUMAchine [Abdelrahman et al. 1994]), but we focus on DASH and COOL as an interesting and representative case study. DASH is a cache coherent ....

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., New York, NY, 1987.


A Scalable Eigenvalue Solver for Symmetric Tridiagonal.. - Trefftz, Huang.. (1994)   (2 citations)

....for scientific computing tasks. The split-merge algorithm was implemented using two different programming environments, PVM and P4. PVM [23], a public domain package from Oak Ridge National Laboratory, provides a software infrastructure for network-based heterogeneous concurrent computing. P4 [24], developed at Argonne National Laboratory, comprises a library of macros and subroutines that support monitors for shared memory programming, message passing primitives, and support for heterogeneous cluster computing. Since there were no significant differences in performance between the PVM and ....

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, and R. Overbeek, Portable Programs for Parallel Processors. Holt, Rinehart and Winston, 1987.


A Taxonomy of Programming Models for Symmetric Multiprocessors.. - Gropp, Lusk (1995)   (20 citations)  Self-citation (Lusk)

....data transfer from one process's private memory to another's. We focus here on what can be accomplished with this basic set of tools, and consider higher level constructs to be built on these. The idea of combining the shared memory and message passing models is not new. It has been described in [1] and implemented in a widely available programming system [2]. However, the current computing environment, with this programming model becoming available in so many different ways, gives the topic renewed importance. In Section 2, we define the terms we will use in our discussion of programming ....

James Boyle, Ralph Butler, Terrence Disz, Barnett Glickfeld, Ewing Lusk, Ross Overbeek, James Patterson, and Rick Stevens. Portable Programs for Parallel Processors. Holt, Rinehart, and Winston, 1987.


User's Guide to the p4 Parallel Programming System - Ralph Butler And (1992)   (58 citations)  Self-citation (Butler Lusk)

....programming a variety of parallel machines. A paper describing its functions and use is [2]. Its predecessor was the m4-based Argonne macros system described in the Holt, Rinehart, and Winston book Portable Programs for Parallel Processors, by Lusk, Overbeek, et al., from which p4 takes its name [1]. The current p4 system maintains the same basic computational models described there (monitors for the shared memory model, message passing for the distributed memory model, and support for combining the two models) while significantly increasing ease and flexibility of use. See 4 [Getting ....

....frees memory obtained with p4shmalloc. Compare with p4free. 11.2 Shared Memory Data Types The abstraction provided by p4 for managing data in shared memory is monitors. Good places to learn about the monitor concept in general are [3] and [5]. The specific approach taken by p4 is described in [1]. P4 provides several useful monitors (p4barriert, p4getsubmonitort, p4askformonitort) as well as a general monitor type to help the user in constructing his own monitors (p4monitort). 11.3 Monitor Building Primitives The following functions can be used to construct monitors. A monitor so ....

[Article contains additional citation context not shown here]

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart, and Winston, 1987.


A Technique for Collecting Simultaneous Multithreaded.. - Vega, Hamkalo.. (2006)

No context found.

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, R. Stevens, "Portable Programs for Parallel Processors," Holt, Rinehart, and Winston Incorporation, New York, NY, 1987.


MERMERA: Non-Coherent Distributed Shared Memory for Parallel.. - Sinha (1993)   (5 citations)

No context found.

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, 1987.


Time-Stamp Generation for Optimistic Parallel Computing - Adam Back And (1995)   (2 citations)

No context found.

J Boyle, R Butler, T Disz, B Glickfeld, E Lusk, R Overbeek, J Patterson, and R Stevens. Portable Programs for Parallel Processors. Holt, Rinehart, and Winston, 1987.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)

No context found.

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart, and Winston, Inc., 1987.


Data Locality Optimization of Shared Memory Programs on NUMA.. - Tao

No context found.

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart, and Winston Incorporation, New York, NY, 1987.


Distributed Cactus Stacks: Runtime Stack-Sharing.. - Sardesai.. (1998)   (2 citations)

No context found.

J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., 1987.

CiteSeer.IST - Copyright Penn State and NEC