16 citations found.
J. R. Larus, Compiling for Shared-Memory and Message-Passing Computers, ACM Letters on Programming Languages and Systems, 1996

This paper is cited in the following contexts:
Automatic Parallelization for Non-cache Coherent Multiprocessors - Paek, Padua (1996)

....private memory is fully software controllable; that is, the data would never be flushed out until explicitly done so by the program. Hence, as long as the local memory space is available, a processor can prefetch data anytime before it is needed and poststore it sometime after the computation. In [29], some other advantages of the compiler-directed communication over cache coherence protocol-driven communication on the non-cache-coherent architecture are discussed. The shared data copying scheme is especially useful when the data distribution requirements of a program are dynamic. The ....

J. R. Larus, Compiling for Shared-Memory and Message-Passing Computers, ACM Letters on Programming Languages and Systems, 1996


Efficient Personalized Communication on Wormhole Networks - Petrini, Vanneschi (1997)

....on the latency, or delay, incurred in communicating a message, o, the message handling overhead, g, the gap, which is the reciprocal of the available per-processor communication bandwidth, and P, the number of processor/memory modules. Examples of utilization of LogP are shown in [4], [5] and [6]. A model that has recently gained wide consideration in the scientific community is the BSP (Bulk Synchronous Parallel) model [7], [8] because it is neither too abstract nor too low level and it seems to meet many of the requirements listed above [9]. It was proposed by Valiant as a bridging model ....

J. R. Larus, "Compiling for Shared-Memory and Message-Passing Computers," ACM Letters on Programming Languages and Systems, vol. 2, pp. 165--180, March--December 1994.


Communication Performance of Wormhole Interconnection Networks - Petrini (1997)

....contention-free steps and are thus limited by the injection overhead and the base network latency. In the absence of contention the two parameters L and o can be properly estimated and the model can lead to important and effective optimizations. Examples of LogP usage are shown in [32], [56] and [97]. 4.3.3 BSP The Bulk Synchronous Parallel or BSP model was proposed by Valiant as a bridging model that provides a standard interface between the domains of parallel architectures and algorithms. In the BSP model, a parallel architecture consists of a set of processors, each with its own private ....

James R. Larus. Compiling for Shared-Memory and Message-Passing Computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1994.


Compiling for Scalable Multiprocessors with Polaris - Paek, Padua (1997)

....the cache controller. However, through explicit data movement operations, the programmer or compiler have simple and direct control over communications in non-cache-coherent machines. There are also several advantages to having data movement controlled by the program rather than by the hardware [12]. Thus, for example, prefetching, which is usually applied under direct program control, can substantially reduce communication costs. In this work, we profit from the fact that the Cray T3D supports fast single-sided communication in the form of PUT/GET primitives. The target language of our ....

J. R. Larus, Compiling for Shared-Memory and Message-Passing Computers, ACM Letters on Programming Languages and Systems, 1996


Shared Control Multiprocessors - A Paradigm for Supporting.. - Abu-Ghazaleh (1998)   (1 citation)

....are significantly more expensive than those to local memory; thus, the model is called NUMA, or Non-Uniform Memory Access time. It is essential for the programmer to realize that the cost of memory access to non-local processors' memory is more expensive than to local memory [Hill et al. 1993, Larus, 1993]. The gain of distributed shared memory over distributed memory is efficient hardware support for remote accesses to non-local shared memory. Since the memory organization is physically a distributed memory, many architectures expose the underlying layer to the programmer, allowing direct ....

....for fine-grained communication (e.g. distributed shared memory architectures) [Agarwal et al. 1995, Cra, 1993, Kuskin et al. 1994]. Unfortunately, these machines are more complex and expensive than the regular distributed memory MIMD configurations, and do not completely solve this problem [Larus, 1993]. 2. High cost and low utilization of silicon area. The reasons for the low utilization of silicon are chiefly (i) having a control unit for every data stream, and (ii) arbitration and conflict resolution hardware. The area required for the control part of a microprocessor was shown to be close ....

J. Larus. Compiling for shared-memory and message-passing computers. ACM Letters on Programming Languages and Systems, 2(1--4), March--Dec. 1993.


SVS: Can The Shared Variable Paradigm Exist in Massively.. - Dimitrelos, Halatsis (1995)

....hardware functions [5, 6, 7, 8]. This method is expected to gain wider acceptance in the near future, as compiler technology is rapidly evolving. In the future, compilers will be able to predict the communicational behaviour of the program better than they can today, and to act accordingly [9]. The second method is the implementation of certain runtime routines that will be the extra abstraction level, or the filter between the SM programming model and the DM architecture [10]. Finally, there are the so-called hybrid approaches, which in search of a way to reduce the gap between SM ....

James R. Larus. Compiling for shared-memory and message-passing computers. Technical report, Univ. of Wisconsin Computer Sciences, November 1993.


Mechanisms for Distributed Shared Memory - Reinhardt (1996)   (4 citations)

....Because programmers communicate with the system only through memory accesses (loads and stores), they must relinquish message passing's direct control over memory and communication to gain shared memory's ease of use. This lack of control forces them to forgo potential optimizations [KJA 93, Lar94]. For example, a programmer may know (or a compiler may deduce) that a value written on node A will be read next on node B. In this case, sending the value directly from A to B is almost certainly more efficient than relying on a DSM system's coherence protocol. Even without exact ....

....performance, typically by sending data directly to its consumer. Performance improvements of an order of magnitude have been observed from custom protocols [FLR 94]. Although this thesis reports on manual optimizations only, automatic optimizations based on static program analysis [Lar94] or programmer annotations [RAK89, CBZ91, HLRW93] are a promising approach to achieving similar efficiency with reduced programmer effort. Other researchers are investigating tools to aid custom protocol development [CRL96] as well as other applications of these mechanisms, for example, ....

James R. Larus. Compiling for shared-memory and message-passing computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1994.


Efficient Machine-Independent Programming of High-Performance.. - Tseng (1995)

....algorithms most similar to ours, but rely on heuristics to eliminate cross-processor dependences. In comparison, we perform full communication analysis. Larus has compared implementing global address spaces in software using a distributed memory compiler compared to hardware-based implementations [50]. He speculates that distributed memory compilers are desirable because they can more closely exploit underlying architectural features in certain key cases; however, shared memory hardware is desirable in the cases where the compiler fails. Mukherjee et al. compared the performance of explicit ....

J. Larus. Compiling for shared-memory and message-passing computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1993.


Unified Compilation Techniques for Shared and.. - Tseng, Anderson.. (1995)   (3 citations)

....we have shown how they may be implemented in a compiler by adapting well-known techniques from distributed memory compilers for shared memory machines. Larus has compared implementing global address spaces in software using a distributed memory compiler compared to hardware-based implementations [19]. He speculates that distributed memory compilers are desirable because they can more closely exploit underlying architectural features in certain key cases; however, shared memory hardware is desirable in the cases where the compiler fails. Compared to his work we examine actual instances of ....

J. Larus. Compiling for shared-memory and message-passing computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1993.


Tempest: A Substrate for Portable Parallel Programs - Hill (1995)   (23 citations)  Self-citation (Larus)

....is a consequence of one-size-fits-all coherence policies, which implement widely applicable semantics that can be unnecessarily general in many situations. Tempest mechanisms enable a compiler or programmer to retain the advantages of shared memory (a shared address space and caching [3,14]) but communicate more efficiently by customizing a coherence protocol to an application's sharing patterns and semantics. To demonstrate these ideas, we developed custom update protocols for three applications: NAS Appbt, Berkeley EM3D, and SPLASH Barnes [6]. The three protocols differ ....

James R. Larus. Compiling for Shared-Memory and Message-Passing Computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1994.


Application-Specific Protocols for User-Level Shared.. - Falsafi, Lebeck.. (1994)   (75 citations)  Self-citation (Larus)

....an address space by detecting remote accesses with static program analysis or run-time tests and transfers data through explicit messages. These systems have gained a reputation for performing poorly in many circumstances and being overly sensitive to programs' spatial locality and false sharing [12]. One reason for shared memory's poor performance is that it typically implements only a single coherence protocol. The policy embodied in a protocol controls a system's response to a remote reference and therefore the message traffic between processors. Most systems provide a single, fixed ....

James R. Larus. Compiling for Shared-Memory and Message-Passing Computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1994.


Tempest and Typhoon: User-Level Shared Memory - Reinhardt, Larus, Wood (1994)   (247 citations)  Self-citation (Larus)

....machines share a common hardware base with message-passing machines (workstation-like nodes and point-to-point message passing), compilers for shared memory machines have been constrained to use memory loads and stores for communication, even when static analysis could identify better approaches [24]. This paper describes Tempest and Typhoon. Tempest is an interface that permits programmers and compilers to use hardware communication facilities directly and to modify the semantics and performance of shared memory operations. It enables an application's user-level code to support shared memory ....

....in the handlers. By eliminating most synchronization and all invalidation traffic, the user-level coherence code attains near-minimum communication. In effect, this approach combines the communication efficiency of message passing with the low overhead and programming simplicity of shared memory [24]. Of course, the simple EM3D application could also be implemented efficiently with pure message passing, by a software inspection step that explicitly allocates space for remote nodes and builds an update list [7]. This approach is feasible because the graph is static and the inspector overhead ....

[Article contains additional citation context not shown here]

James R. Larus. Compiling for Shared-Memory and Message-Passing Computers. ACM Letters on Programming Languages and Systems, 1(4):?, December 1993. To appear.


Parallel Computer Research in the Wisconsin Wind Tunnel Project - Hill, Larus, Wood (1996)   (2 citations)  Self-citation (Larus)

....tolerate even longer interconnection network latencies, and provide solutions at many price-performance points. A particularly frustrating aspect of performance optimization in a shared memory model, such as CSM, is that sometimes a message is exactly the right communication mechanism [25,6]. CSM directives can approximate a message send, but the approximation is not semantically perfect and costs performance. So, we added the question: can messages be integrated with coherent shared memory in a portable way? Our affirmative answer to these questions is the Tempest parallel ....

James R. Larus. Compiling for Shared-Memory and Message-Passing Computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1994.


LCM: Memory System Support for Parallel Language.. - Larus, Richards.. (1994)   (13 citations)  Self-citation (Larus)

....Graduate School. 1 Introduction. Compiling parallel languages for parallel computers is difficult. Most of these languages assume a shared address space in which any part of a computation can reference any data. Parallel machines provide either too little or too much support for many languages [22]. On one hand, message passing machines require a compiler to statically analyze and handle all details of data placement and access, or pay a large cost to defer these decisions to run time. On the other hand, shared memory machines provide more dynamic mechanisms, but generally use them to ....

....(typically, sequential consistency) and provide few performance-enhancing mechanisms beyond prefetches and cache flushes. Compilers can circumvent coherence policies only by sending messages [19] even when language semantics or program analysis shows that much coherence traffic is unnecessary [9, 14, 22]. Relaxed consistency models trade a simple view of memory as a sequentially consistent store for increased hardware performance [1]. Most models adopt the view that memory need only be consistent at program-specified synchronization points. Relaxed consistency, instead of providing mechanisms by ....

James R. Larus. Compiling for Shared-Memory and Message-Passing Computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1994.


HPF on Fine-Grain Distributed Shared Memory: Early Experience - Chandra, Larus (1996)   (5 citations)  Self-citation (Larus)

....benefit from HPF. A compiler targeting a message passing machine converts parallel loops that manipulate data in a global address space (such as those that can be written in HPF or similar languages [19,34]) into SPMD code that, in essence, synthesizes a global name space using explicit messages [22]. Unfortunately, the compiler depends on complete and accurate program analysis [34] to generate good message passing code. Programs that cannot be completely analyzed show poor performance [29]. An alternative approach leaves the onerous task of implementing a program's shared address space to an ....

James R. Larus. Compiling for Shared-Memory and Message-Passing Computers. ACM Letters on Programming Languages and Systems, 2(1--4):165--180, March--December 1994.


Sather Revisited: A High Performance Free Alternative to C++ - Stoutamire, Kennel (1995)   (2 citations)

No context found.

J. Larus, "Compiling for Shared-Memory and Message-Passing Computers", ACM Letters on Prog. Lang. and Systems, Vol. 2 No. 1-4 March-December 1993, pp. 165-180.
