10 citations found.
E. Speight. "Efficient runtime support for cluster based distributed shared memory multiprocessors", PhD Thesis, Rice University, Houston, TX, 1997.


This paper is cited in the following contexts:
Dynamic Adaptation of Sharing Granularity in DSM Systems - Itzkovitz, Niv, Schuster (1999)

....much larger than the optimal sharing unit, and the result is false sharing. False sharing can cause the DSM to send redundant messages, as well as redundant data in messages which carry truly shared data. The ill effects of false sharing can be reduced using a relaxed memory consistency protocol [4, 5, 8, 16] or prefetching strategies [7, 9]. However, the price for using these methods is a change in the memory semantics which creates a difficulty for the programmer. In addition, relaxed consistency protocols introduce substantial additional overhead [17]. Fine grain access has been proposed [13, 14] ....

....in Barnes. Only when using AdaptableView we were able to achieve speedups, and these are increasing with the number of hosts. Our results here, using Sequential consistency, are comparable to the state of the art results reported for relaxed consistency DSM systems (e.g. LRC implementation in [16]). The breakdown of the runtime into computation time and the time it takes to handle different DSM operations is depicted in Figure 9. It shows the results for AdaptableView, a fixed fine granularity and a fixed coarse granularity. Thanks to the reduction in false sharing and to the employed ....

W. E. Speight. Efficient Runtime Support for Cluster-Based Distributed Shared Memory Multiprocessors. PhD thesis, Department of Electrical and Computer Engineering, Rice University, July 1997.


An Integrated Shared-Memory / Message Passing API for.. - Speight, Abdel-Shafi, .. (1998)   Self-citation (Speight)

....we believe that this coarse-grained interleaving is unnecessarily constraining. There are many instances when it is useful to send bulk data while maintaining full coherence. This section discusses the issues involved in the design of the Brazos Common API. 2.1 An Overview of Brazos Brazos [17] uses page-based memory protection and low-level message passing primitives to implement the abstraction of shared memory on a network of clustered SMP computers under Windows NT. Brazos currently runs on x86 multiprocessors connected by FastEthernet, Gigabit Ethernet, or ServerNet [10] and ....

E. Speight, Efficient Runtime Support for Cluster-Based Distributed Shared Memory Multiprocessors. Ph.D. Thesis, Rice University, 1997.


Realizing the Performance Potential of the Virtual.. - Speight, Abdel-Shafi, .. (1999)   (15 citations)  Self-citation (Speight)

.... results of the low-level latency benchmark described in Section 4 (labeled VI Native) and the results from the same test run using MPI calls instead of VIPL calls (labeled MPI VI Baseline). These results were obtained by simply replacing the existing UDP sends and receives in the Brazos [23] MPI implementation with the corresponding VI calls, without regard to optimizing the resulting MPI implementation in any way to use VI. For comparison purposes, the results obtained using the cLAN UDP MPI implementation are also shown (labeled MPI UDP). The dramatic difference in latency ....

E. Speight, "Efficient Runtime Support for Cluster-Based Distributed Shared Memory Multiprocessors," Ph.D. Thesis, Rice University, 1997.


Reducing Coherence-Related Communication in Software.. - Speight, Bennett (1998)   (2 citations)  Self-citation (Speight)

....systems is therefore challenged by the large size of the unit of sharing (a page) and the high latency associated with accessing remote memory. One of the principal contributing factors to message latency is the high operating system cost associated with initiating an interprocess network message [10]. Although it is productive to reduce this latency by low-level changes to the OS or runtime system [11, 12, 13], the focus of this paper is performance improvement through the reduction in the number of coherence-related messages. 1.2 Summary of Results In this paper, we present the following ....

....when to drop out of the copyset is implemented as follows. We associate a counter with each page of shared data. This counter is initially set to an application-specific threshold value derived from past executions of the program (a discussion of the Brazos history mechanism can be found in [10]). Whenever the process receives indirect diffs for a page that is not accessed before the next time the page is invalidated, the counter value is decremented by 1. If the process uses the indirect diffs, the counter is reset to the initial value. If the counter reaches 0, the process removes ....
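
The counter rule quoted above can be sketched roughly as follows. This is only an illustration of the excerpt, not code from Brazos: the class name `CopysetCounter`, the method `record_interval`, and the `diffs_used` flag are hypothetical, and the sketch assumes that reaching 0 means the process leaves the copyset, as the first sentence of the excerpt suggests.

```python
class CopysetCounter:
    """Hypothetical per-page counter for deciding when to leave a copyset."""

    def __init__(self, threshold):
        # The excerpt says the threshold is application-specific and derived
        # from past executions; here it is simply passed in (an assumption).
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.value = threshold
        self.in_copyset = True

    def record_interval(self, diffs_used):
        """Update the counter at the point where the page is invalidated.

        diffs_used is True if the indirect diffs received since the previous
        invalidation were actually accessed by this process.
        Returns True while the process should stay in the copyset.
        """
        if not self.in_copyset:
            return False
        if diffs_used:
            # Useful diffs: reset the counter to its initial value.
            self.value = self.threshold
        else:
            # Unused diffs: decrement, and drop out when the counter hits 0.
            self.value -= 1
            if self.value == 0:
                self.in_copyset = False
        return self.in_copyset
```

With a threshold of 2, one unused interval leaves the process in the copyset, a used interval resets the counter, and two consecutive unused intervals remove it.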

[Article contains additional citation context not shown here]

W. E. Speight, Efficient Runtime Support for Cluster-Based Distributed Shared Memory Multiprocessors, PhD thesis, Rice University, Aug. 1997.


Using Multicast and Multithreading to Reduce Communication.. - Speight, Bennett (1998)   (9 citations)  Self-citation (Speight)

....multicast. This section addresses these issues. Multicast is not scalable. We have found that software DSM systems relying on multicast communication are no less scalable than software DSM systems that rely on point-to-point communication, in either a broadcast or switched Ethernet environment [19]. In general, for the systems we have studied (PC servers with very fast processors and relatively slow interconnection networks) the maximum number of processors that can be productively employed is on the order of 8 to 16. This number can be increased by a hybrid hardware-software DSM system ....

....be increased by a hybrid hardware-software DSM system such as Brazos that takes advantage of hardware cache coherence available on SMP servers. However, problems such as contention for the network, bus, and memory resources on a single machine may still limit the performance of some applications [19]. Multicast is hardware dependent. The use of multicast has been increasing over the last few years, mostly in multimedia applications, as PC hardware manufacturers have realized that multicast can improve the utilization of limited network bandwidth. This is evidenced by the establishment of ....

[Article contains additional citation context not shown here]

E. Speight. Efficient Runtime Support for Cluster-Based Distributed Shared Memory Multiprocessors. Ph.D. Thesis, Rice University, Houston, 1997.


Brazos: A Third Generation DSM System - Speight (1997)   (30 citations)  Self-citation (Speight)

....if there are more processors located on a given server than the number of user threads assigned to it. Finally, the use of a separate thread to handle incoming replies allows Brazos to maintain multiple simultaneous outstanding network requests, which can significantly improve performance [22]. 3.3. Software Scope Consistency DSM systems must maintain data consistency to ensure that threads do not access stale or out-of-date data that was written by a thread on another machine. Although a detailed discussion of the many consistency models used in shared memory systems is beyond the ....

....is desirable because the mapping between pages and data may change across program execution, but program variables generally do not. The history mechanism and dynamic page management protocol were not used in obtaining the results presented in Section 4. Details on these techniques can be found in [22]. 3.6. Brazos Program Development Users write Brazos programs using familiar shared-memory programming semantics. Any shared data in the system may be transparently accessed by any thread without regard to where in the system the most current value for that data resides. The Brazos runtime system ....

[Article contains additional citation context not shown here]

E. Speight. Efficient Runtime Support for Cluster-Based Distributed Shared Memory Multiprocessors. Ph.D. Thesis, Rice University, Houston, 1997.


An Integrated Shared-Memory / Message Passing API for.. - Speight, Abdel-Shafi, ..   Self-citation (Speight)

....we believe that this coarse-grained interleaving is unnecessarily constraining. There are also instances in which it is useful to send bulk data while maintaining coherence. This section discusses the issues involved in the design of the Brazos Common API. 2.1 AN OVERVIEW OF BRAZOS Brazos [17] uses page-based memory protection and low-level message passing primitives to implement the abstraction of shared memory on a network of clustered SMP computers under Windows NT. Brazos currently runs on x86 multiprocessors connected by FastEthernet, Gigabit Ethernet, or ServerNet [10] and ....

E. Speight, Efficient Runtime Support for Cluster-Based Distributed Shared Memory Multiprocessors. Ph.D. Thesis, Rice University, 1997.


Fault-Tolerance Using Cache-Coherent Distributed Shared.. - Hecht Kavi Gaede

No context found.

E. Speight. "Efficient runtime support for cluster based distributed shared memory multiprocessors", PhD Thesis, Rice University, Houston, TX, 1997.


Fault-Tolerance Using Cache-Coherent Distributed Shared Memory.. - Hecht Kavi

No context found.

E. Speight. "Efficient runtime support for cluster based distributed shared memory multiprocessors", PhD Thesis, Rice University, Houston, TX, 1997.


CiteSeer.IST - Copyright Penn State and NEC