22 citations found.
Steven S. Lumetta and David E. Culler. Managing concurrent access for shared memory active messages. In International Parallel Processing Symposium, Orlando, Florida, April 1998.


This paper is cited in the following contexts:
Ultra-High Performance Communication with MPI and the Sun.. - Sistare, Jackson (2002)   (2 citations)

.... messaging for low latency [1][2]. These interconnects have been primarily utilized in clusters whose compute nodes contain only a few processors each [3][4]. The Berkeley CLUMPS project was one of the first to use larger and more capable SMP systems as the basic building blocks for a cluster [5][6]. SMP systems with larger processor counts make much higher demands of the network interconnect. Indeed, no existing interconnect can supply the bandwidth needed to build a balanced SMP cluster with comparable on node and off node bandwidths. To solve this problem, Sun Microsystems has developed ....

Steven S. Lumetta and David E. Culler. Managing Concurrent Access for Shared Memory Active Messages. IPPS/SPDP '98 , Orlando.


Scintilla: Cluster Computing with SCI - Ibel, Schauser, Scheiman, Schmitt

....[Figure 10: Store times for different segment sizes (store time in usec versus stride in bytes, for 16 segments of size 128 KB per node).] ....examining clusters of SMP nodes [LC98, LMC97]. Besides Dolphin Interconnects [Dol95a], Interconnect Systems Solution [Kib97] and Vitesse are also developing commodity SCI products to connect clusters of workstations. Several other groups even build their own SCI network cards or bridges, for example the SMILE group at the University ....

S. S. Lumetta and D. E. Culler. Managing Concurrent Access for Shared Memory Active Messages. In Proceedings of the International Parallel Processing Symposium, Orlando, Florida, April 1998.


Performance Metrics for Embedded Parallel Pipelines - Fleury, Downton, Clark

....design between NUMA machines and distributed memory machines, for example between the Cray T3x family and the Intel Paragon which has enabled message passing to be used on machines. For example, Active Messages, light weight user messages, have also successfully been applied [43] to ccNUMA multiprocessors. Therefore, it occurs to us that performance models for traditional NUMA machines may equally apply to the distributed memory systems used by PPF. These performance models refer to traditional NUMA machines employed without message passing, whereas PPF systems have ....

S. S. Lumetta and D. E. Culler. Managing concurrent access for shared memory active messages. In IEEE IPPS/SPDP'98, 1998. Available at http://dlib.computer.org/conferen/ipps.


BIP-SMP: High Performance Message Passing over a.. - Geoffray, Prylli.. (1999)   (9 citations)

....the MPICH devices ch_shmem or ch_lfshmem. The device ch_lfshmem is a lock free shared memory device that achieves very good performance (2.4 µs and 100 MB/s on one of our SMP nodes). Concerning the shared memory management, an article about concurrent access for shared memory Active Messages [10] presents very efficient solutions, like a lock free algorithm and high performance lock implementation. 4 Multi Protocol communication layer: BIP-SMP. BIP is an efficient communication layer. It provides almost all of the hardware performance at the user level. The constraints are easy to accept ....

Steven S. Lumetta and David E. Culler. Managing concurrent access for shared memory active messages. In International Parallel Processing Symposium, Orlando, Florida, April 1998.


Program Transformation and Runtime Support for Threaded MPI.. - Tang, Shen, Yang (2000)   (3 citations)

....how multiple threads can be invoked in each MPI node, but not how to execute each MPI node as a thread. Previous work has illustrated the importance of lock free management for reducing synchronization contention and unnecessary delay due to locks [Anderson 1990; Arora et al. 1998; Herlihy 1991; Lumetta and Culler 1998; Massalin and Pu 1991]. Lock free synchronization has also been used in the process based SGI implementation [Gropp et al. 1996]. Theoretically speaking, some concepts of SGI's design could be applied to our case after considerations for thread based execution. However, as a proprietary ....

....code is not available to public. Also, their design uses busy waiting when a process is waiting for events [Salo 1998], which is not desirable for multiprogrammed environments [Kontothanassis et al. 1997; Ousterhout 1982]. Lock free studies in [Anderson 1990; Arora et al. 1998; Herlihy 1991; Lumetta and Culler 1998; Massalin and Pu 1991] restrict their queue models to be either FIFO or FILO. These models are not sufficient for MPI point to point communication, and sometimes too general with unnecessary overhead for MPI. A study that attempts to use lock free data structures for MPICH is conducted in a ....

[Article contains additional citation context not shown here]

LUMETTA, S. S. AND CULLER, D. E. 1998. Managing concurrent access for shared memory active messages. In Proceedings of the International Parallel Processing Symposium. Orlando, Florida, 272--8.


Cluster-Based Parallel Simulations: A Case Study With.. - Geoffray, Pham..

....like the MPICH devices ch_shmem or ch_lfshmem. The device ch_lfshmem is a lock free shared memory device that achieves very good performance (2.4 µs and 100 MB/s on one of our SMP nodes). Regarding the shared memory management, some work about concurrent access for shared memory Active Messages [7] presents very efficient solutions such as a lock free algorithm and a high performance lock implementation. A first solution for exploiting the other available processors could be to use them with BIP threads (and use special compilation techniques to regroup several MPI processes into a unique ....

Steven S. Lumetta and David E. Culler. Managing Concurrent Access for Shared-Memory Active Messages. In International Parallel Processing Symposium, Orlando, Florida, April 1998.


The Paderborn University BSP (PUB) Library on the Cray.. - Bonorden.. (2000)

....Nevertheless, we can gain some performance since we do not need all the overhead which is provided by MPI and so we can implement a small and fast message passing layer for our special purposes. Our implementation of COMM using the SHMEM library equals the lock free algorithm introduced in [CL98]. We will give a short description of the original data structure and algorithm. On each node a data structure into which nodes can write data is generated. Sending (and receiving) messages is implemented on top of the SHMEM capabilities supplied by unicos. The message buffer on each node consists ....

....Besides how to obtain the lock free quality the algorithm is clear up to here. A sender puts a packet and perhaps a single bulk data buffer into the target's data structure. A receive operation is reading the next packet (and bulk data entry). Since we need more advanced functionality than given in [CL98], we made the following enhancements: 1. More than one bulk data block can be used to store a single message. Therefore we made it possible to use more than one bulk data block per message by chaining up the bulks. 2. If a message is even larger than the maximum size that can be stored in a ....

D.E. Culler and S.S. Lumetta. Managing Concurrent Access for Shared Memory Active Messages. In IPPS/SPDP, March 1998.


Exploiting Clusters of Shared Memory Multiprocessors.. - Geoffray, Pham.. (1999)

....of course, the lock needs to be very efficient. With BIP-SMP, two processes can then overlap the filling of a send request and the only operation that is locked is the entry in the send request's queue. This strategy is similar to the mechanism for the concurrent access implemented in Active Messages [5]. Although several Myrinet interfaces per node can be supported by BIP-SMP, the bottleneck is the PCI bus that is usually unique in the regular SMP machines. 3.2 BIP-SMP main features. There are several ways to move data from one process to another process. One solution is to use shared memory ....

Steven S. Lumetta and David E. Culler. Managing concurrent access for shared memory active messages. In IPPS, Orlando, Florida, April 1998.


Fast Synchronization on Scalable Cache-Coherent.. - Nikolopoulos..

.... not appeared in the literature until recently [10, 17] Lock free synchronization has also attracted considerable attention due to its competitive performance compared to lock based synchronization and its robustness as a synchronization discipline in multiprogrammed shared memory multiprocessors [2, 13, 16, 17, 20]. Synchronization primitives on shared memory multiprocessors can be analyzed effectively through time decomposition of synchronization periods [5] A generic synchronization primitive can be decomposed into at most four distinct time intervals, the acquire, the waiting, the compute and the ....

S. Lumetta and D. Culler. Managing Concurrent Access for Shared Memory Active Messages. Proc. of the 12th IEEE Int. Parallel Processing Symp., pp. 272--278, Orlando (USA), Apr. 1998.


An Efficient Global Address Space Model with SCI - Ibel, Schmitt, Schauser.. (1998)

....whereas our paper covers real data gathered on an existing cluster. A similar empirical study for the Cray T3D has been conducted in [1]. Another way to combine shared memory and message passing is to build a higher level abstraction that maps to either paradigm. The CLUMPS project [20] [15] provides an efficient implementation of Active Messages [24] on a cluster of SMP clusters and makes the most efficient use of both message passing between and shared memory within nodes to achieve the maximum performance. The SIMPLE framework proposed in [5] provides a uniform API for both shared ....

S. Lumetta and D. Culler. Managing concurrent access for shared memory active messages. In Proceedings of IPPS/SPDP, Orlando, FL, March 1998.


Sharing the Garden GATE: Towards an Efficient Uniform.. - Butler, Roe

....abstract, gives a good handle on performance and offers an attractive solution. There exist two major design issues for high-performance message passing protocols on SMP architectures: the minimisation of cache coherent transactions and the management of concurrent access of the message queues [2]. The first issue has ramifications for the performance of the protocol. Whereas, the second issue involves ensuring the integrity of the message queues themselves. The remainder of this paper is organised as follows: in the next section, we give an overview of Gardens; Section 3 details the ....

....its own communication end point in a shared memory segment. Every other process then attaches to the segment allowing it to write to the end point and hence send messages. As with all shared variables, these end points must be accessed in a predictable fashion to ensure their integrity. Similar to [2], we are able to construct lock free message queues using the atomic Compare and Swap (CAS) instruction on SPARC processors or the Intel equivalent, Compare and Exchange. Unlike [2], we utilise a single message queue for both request and reply messages, but we increase the size of the queue. This was ....

[Article contains additional citation context not shown here]

S. Lumetta and D. Culler. Managing concurrent access for shared memory active messages. In First Merged Symposium IPPS/SPDP, 1998.


Parallel Structure in an Integrated Speech-Recognition Network - Fleury, Downton, Clark (1999)

....on an SMP We considered whether a widely available type of parallel machine would be sufficient to parallelize the complete system. On a symmetric multiprocessor (SMP) the thread manager would share one processor with the data manager. Efficient message passing is available for SMPs [21] in addition to threads. Triphones, usual for continuous speech, restrict potential parallelism but with node level decomposition, Table 1, an eight processor machine would approach the required fivefold speed up while a four processor machine would reduce turnaround during testing. The estimate ....

S. S. Lumetta and D. E. Culler. Managing concurrent access for shared memory active messages. In IPPS/SPDP'98, 1998. 7 pages from http://now.CS.berkeley.EDU/Papers2.


Push-Pull Messaging: A High-Performance Communication Mechanism .. - Wong, Wang (1999)

....To ensure the correctness of the invocation in the multiprocessor environment, the system has to restrict that only one user or kernel thread invokes the thread at a time. Efficient synchronization between concurrent processes in the COMP node is critical to the communication performance [12][19]. Stage 2: Data Pumping. After the submission of packets, the NIC pumps packets to the physical network through the hardware on the NIC. The time spent in data pumping mainly depends on the hardware performance. For example, it can be affected by the performance of DMA engines in the host ....

S. S. Lumetta and D. E. Culler. "Managing Concurrent Access for Shared Memory Active Messages", Proc. of the 12th International Parallel Processing Symposium (IPPS '98), 1998.


Protocols Aboard Network Interface Cards - Beauduy, Bettati (1999)   (1 citation)

....As the performance provided by networking technologies dramatically increases, solutions for high performance fine-grained distributed computing start to emerge. Computing based on clusters, or on networks of workstations, greatly increases the performance of a variety of applications at low costs [1, 2, 3, 4]. The performance of such clusters relies heavily on low communication latency. For example, applications on clusters frequently rely on reliable multicast protocols to disseminate the state of the computation and to manage the state of the system. These protocols typically involve several rounds ....

Lumetta, Culler, Managing Concurrent Access for shared Memory Active Messages, IPPS/SPDP 98, Orlando, Florida, March 1998


Program Transformation and Runtime Support for Threaded MPI.. - Tang, Shen, Yang (1999)   (3 citations)

....invoked in each MPI node, but not how to execute each MPI node as a thread. These studies are useful for us to relax our assumptions in the future. Previous work has also illustrated the importance of lock free management for reducing synchronization contention and unnecessary delay due to locks [5, 6, 20, 25, 26]. Lock free synchronization has also been used in the process based SGI implementation [19]. Theoretically speaking, some concepts of SGI's design could be applied to our case after considerations for thread based execution. However, as a proprietary implementation, SGI's MPI design is not ....

....low level functions and hardware support specific to the SGI architecture, which may not be general or suitable for other machines. Also, their design uses busy waiting when a process is waiting for events [31] which is not desirable for multiprogrammed environments [23, 28] Lock free studies in [5, 6, 20, 25, 26] either restrict their queue model to be FIFO or FILO, which are not sufficient for MPI point to point communication, or are too general with unnecessary overhead for MPI. A lock free study for MPICH is conducted in a version for the NEC shared memory vector machines and Cray T3D [18, 9, 2] using ....

[Article contains additional citation context not shown here]

S. S. Lumetta and D. E. Culler. Managing Concurrent Access for Shared Memory Active Messages. In Proceedings of the International Parallel Processing Symposium, April 1998.


Compile/Run-time Support for Threaded MPI Execution on.. - Hong Tang (1999)   (3 citations)

....proposed in this extended abstract are focused on efficient point to point communication primitives using lock free queue management techniques. The previous work has illustrated importance of lock free management for reducing synchronization contention and unnecessary delay due to locks [4, 5, 12, 15, 16] and it has also been used in the process based SGI implementation [11]. Theoretically speaking, some concept of their design could be applied to our case after certain considerations for supporting thread based execution. However, as a proprietary implementation, SGI's MPI design is not ....

....low level functions and hardware support specific to the SGI architecture, which may not be general or suitable for other machines. Also their design uses busy waiting when a process is waiting for events [21] which is not desirable for multiprogrammed environments [13, 18] Lock free studies in [4, 5, 12, 15, 16] either restrict their queue model to be FIFO or stack, which are not sufficient for MPI point to point communication, or too general with unnecessary overhead for MPI. Thus our second goal is to design an efficient communication protocol for MPI threads by using a new lock free queue management ....

[Article contains additional citation context not shown here]

S. S. Lumetta and D. E. Culler. Managing Concurrent Access for Shared Memory Active Messages. In Proceedings of the International Parallel Processing Symposium, April 1998.


Compile/Run-time Support for Threaded MPI Execution on.. - Hong Tang (1999)   (3 citations)

....invoked in each MPI node, but not how to execute each MPI node as a thread. These studies are useful for us to relax our assumptions in the future. Previous work has also illustrated the importance of lock free management for reducing synchronization contention and unnecessary delay due to locks [4, 5, 18, 21, 22]. Lock free synchronization has also been used in the process based SGI implementation [17]. Theoretically speaking, some concepts of SGI's design could be applied to our case after considerations for thread based execution. However, as a proprietary implementation, SGI's MPI design is not ....

....low level functions and hardware support specific to the SGI architecture, which may not be general or suitable for other machines. Also, their design uses busy waiting when a process is waiting for events [27] which is not desirable for multiprogrammed environments [19, 24] Lock free studies in [4, 5, 18, 21, 22] either restrict their queue model to be FIFO or FILO, which are not sufficient for MPI point to point communication, or are too general with unnecessary overhead for MPI. A lock free study for MPICH is conducted in a version for the NEC shared memory vector machines and Cray T3D [16, 8, 2] using ....

[Article contains additional citation context not shown here]

S. S. Lumetta and D. E. Culler. Managing Concurrent Access for Shared Memory Active Messages. In Proceedings of the International Parallel Processing Symposium, April 1998.


Compile/Run-time Support for Threaded MPI Execution on.. - Tang, Shen, Yang (1999)   (3 citations)

....invoked in each MPI node, but not how to execute each MPI node as a thread. These studies are useful for us to relax our assumptions in the future. Previous work has also illustrated the importance of lock free management for reducing synchronization contention and unnecessary delay due to locks [4, 5, 18, 21, 22]. Lock free synchronization has also been used in the process based SGI implementation [17]. Theoretically speaking, some concepts of SGI's design could be applied to our case after considerations for thread based execution. However, as a proprietary implementation, SGI's MPI design is not ....

....low level functions and hardware support specific to the SGI architecture, which may not be general or suitable for other machines. Also, their design uses busy waiting when a process is waiting for events [27] which is not desirable for multiprogrammed environments [19, 24] Lock free studies in [4, 5, 18, 21, 22] either restrict their queue model to be FIFO or FILO, which are not sufficient for MPI point to point communication, or are too general with unnecessary overhead for MPI. A lock free study for MPICH is conducted in a version for the NEC shared memory vector machines and Cray T3D [16, 8, 2] using ....

[Article contains additional citation context not shown here]

S. S. Lumetta and D. E. Culler. Managing Concurrent Access for Shared Memory Active Messages. In Proceedings of the International Parallel Processing Symposium, April 1998.


Multi-Protocol Active Messages on a Cluster of SMP's - Lumetta, Mainwaring, Culler (1997)   (38 citations)  Self-citation (Lumetta Culler)

....being swapped out on the progress of other senders results in a level of robustness that proves quite advantageous in multiprogrammed systems. Furthermore, the method outlined above results in superior application performance even for a dedicated system. The interested reader is referred to [20] for further detail. Although the AM II library provides support for protected access to an endpoint using multiple receiver threads, we have assumed the use of a single thread per process in this work. The issues and costs for concurrent access by receivers are similar to those for senders. In ....

S. S. Lumetta, D. E. Culler, "Managing Concurrent Access for Shared Memory Active Messages," U. C. Berkeley Technical Report in preparation.


A Software Suite for High-Performance Communications - On Clusters Of

No context found.

Steven S. Lumetta and David E. Culler. Managing concurrent access for shared memory active messages. In International Parallel Processing Symposium, Orlando, Florida, April 1998.


Evaluating Support for Global Address Space Languages.. - Bell, Chen, Bonachea.. (2004)   (1 citation)

No context found.

S. Lumetta and D. Culler. Managing concurrent access for shared memory active messages. In Proceedings of the International Parallel Processing Symposium, pages 272--279, 1998.


