A. Bilas, D. Jiang, Y. Zhou, and J.P. Singh. Limits to the performance of software shared memory: A layered approach. In The 5th IEEE Symposium on High-Performance Computer Architecture, February 1999. Also Princeton University Tech. Report No. TR-576-98.
....set corresponds naturally to a particular protocol decomposition. We can therefore correlate parameter values and decompositions. We also consider a larger range of PPSs and show that some are more sensitive to particular parameters than others. Bilas et al. identify bottlenecks in DSM systems [7]. Their simulation study revolves around the same layers as used in this paper: low level communication software and hardware, PPS, and applications. Bilas et al. use memory mapped communication and analyze the performance of page based and fine grained DSM systems. Our work uses packet based ....
A. Bilas, D. Jiang, Y. Zhou, and J. Singh. Limits to the Performance of Software Shared Memory: A Layered Approach. In Proc. of the 5th Int. Symp. on High-Performance Computer Architecture, pp. 193--202, Orlando, FL, Jan. 1999.
.... The dependency of software shared memory on communication layer parameters was studied in [9], which represents the first part of this paper. The limitations of and synergies between the different layers of shared memory clusters, both for SVM as well as fine grained software DSM, were studied in [7]. The authors in [50] examine the impact of network total order, broadcast, and remote write capability on a family of shared memory protocols. They find that latency is more important than remote writes, broadcast, or total ordering. The difference with our results comes from the significant ....
A. Bilas, D. Jiang, Y. Zhou, and J.P. Singh. Limits to the performance of software shared memory: A layered approach. In The 5th IEEE Symposium on High-Performance Computer Architecture, February 1999. Also Princeton University Tech. Report No. TR-576-98.
....The dependency of software shared memory on communication layer parameters was studied in [9], which represents the first part of this paper. The limitations of and synergies between the different layers of shared memory clusters, both for SVM as well as fine grained software DSM, were studied in [7]. The authors in [50] examine the impact of network total order, broadcast, and remote write capability on a family of shared memory protocols. They find that latency is more important than remote writes, broadcast, or total ordering. The difference with our results comes from the significant ....
A. Bilas, D. Jiang, Y. Zhou, and J.P. Singh. Limits to the performance of software shared memory: A layered approach. In The 5th IEEE Symposium on High-Performance Computer Architecture, February 1999. Also Princeton University Tech. Report No. TR-576-98.
....or protocol layer runs the application itself. All these layers have substantial impact on end application performance. Our research approach has been to examine applications and systems together through all these layers, opening up each and trying to understand where best to make improvements [6]. As we will see in later chapters, understanding performance portability and scalability requires this. It cannot be done based on system level research alone, treating applications as a black box. Furthermore, we should take care to use up to date, aggressive hardware and software systems, and ....
A. Bilas, D. Jiang, Y. Zhou, and J. P. Singh. Limits to the performance of software shared memory: A layered approach. In The 5th International Symposium on High Performance Computer Architecture, Feb 1999.
.... single writer) the good news is that they are mostly performance portable to hardware-coherent systems as well [22]. Simultaneous research in applications and the systems layers, rather than treating either as fixed, is important to understand and exploit the synergies among the layers [40, 5], truly understand the potential of SVM, and develop guidelines for performance-portable shared memory programming across hardware-coherent systems and clusters, the two major emerging multiprocessor platforms. Tools to understand bottlenecks and check violations of consistency are also important. ....
....activity to synchronization points, it can better tolerate finer grained synchronization). Application restructuring aids SVM further. More research is needed to determine which approach is clearly superior to the other given instrumentation costs and future trends in communication performance [40] (generally, higher bandwidth and message handling costs favor SVM while lower latency favors a fine grained approach). Benchmarks FG SC HLRC LU 6.3 8.3 Ocean 6.1 8.3 Water Nsquared 11.4 11.1 Volrend 6 9 Water Spatial 11.6 11.1 Raytrace 12.2 13 Barnes 7 6 Table 4: Fine grain SC and HLRC, ....
Jaswinder Pal Singh, Angelos Bilas, Dongming Jiang, and Yuanyuan Zhou. Limits to the performance of software shared memory: A layered approach. Technical Report TR-576-98, Computer Science Department, Princeton University, Princeton, NJ-08544, November 1997.
....or program layer, the protocol layer that supports the programming model, and the communication layer that implements the machine's communication architecture. Each of the system layers has both performance and functionality characteristics, which can be enhanced to improve overall performance [9]. Since SVM was first proposed [35], much research has been done in improving the protocol layer by relaxing the memory consistency models [5, 32], by improving the communication layer with low latency, high bandwidth, user-level communication [20, 13, 17, 38, 15, 47, 4, 39, 26, 33, 34, 51, 6] ....
A. Bilas, D. Jiang, Y. Zhou, and J. Singh. Limits to the performance of software shared memory: A layered approach. Proceedings of the 5th International Symposium on High Performance Computer Architecture, Orlando, February 1999. Also Princeton University Tech. Report No. TR-576-98.
....into the simulated system to analyze the causes of observed effects. 1 Introduction The performance of an application running on a parallel system is affected by several layers of software and hardware. For page grained shared virtual memory (SVM) on clusters, the layers are as shown in Figure 1 [26]. A key question for such systems is how much and what kind of limited hardware support is most effective in accelerating their performance, thus bringing it closer to that of hardware coherence. The goal is to make the shared address space model attractive for application users on clusters as ....
....infrastructure of SMPs connected by Myrinet. Our analysis confirms several key outstanding problems for home based SVM systems, most of which have been observed earlier (e.g. [15, 16, 3, 5, 22, 1]) and that are not alleviated fully by AU. Let us examine these layer by layer (see Figure 1 and [26]). In the communication layer, the problem of latency critical control and request messages getting stuck behind others in queues can perhaps be alleviated by using separate queues for different message or packet types, or simply different priorities for them. The problem of interrupts being ....
J. P. Singh, A. Bilas, D. Jiang, and Y. Zhou. Limits to the performance of software shared memory: A layered approach. Technical Report TR-576-98, Computer Science Department, Princeton University, Princeton, NJ-08544, Nov. 1997.
....into the simulated system to analyze the causes of observed effects. 1 Introduction The performance of an application running on a parallel system is affected by several layers of software and hardware. For page grained shared virtual memory (SVM) on clusters, the layers are as shown in Figure 1 [26]. A key question for such systems is how much and what kind of limited hardware support is most effective in accelerating their performance, thus bringing it closer to that of hardware coherence. The goal is to make the shared address space model attractive for application users on clusters as ....
....infrastructure of SMPs connected by Myrinet. Our analysis confirms several key outstanding problems for home based SVM systems, most of which have been observed earlier (e.g. [15, 16, 3, 5, 22, 1]) and that are not alleviated fully by AU. Let us examine these layer by layer (see Figure 1 and [26]). In the communication layer, the problem of latency critical control and request messages getting stuck behind others in queues can perhaps be alleviated by using separate queues for different message or packet types, or simply different priorities for them. The problem of interrupts being ....
J. P. Singh, A. Bilas, D. Jiang, and Y. Zhou. Limits to the performance of software shared memory: A layered approach. Technical Report TR-576-98, Computer Science Department, Princeton University, Princeton, NJ-08544, Nov. 1997.