A. Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks. PhD thesis, Department of Computer Science, Princeton University, Nov. 1998.
....have been proposed for DSMs. Donald Yeung et al. [15, 16] started from clusters of SMPs with a relatively simple protocol. Blelloch, Gibbons et al. studied the performance of planar DAG scheduling; their model supports synchronization based on write-once synchronization variables [2]. Bilas [1] analyzed the performance of shared virtual memory on system area networks at the communication, protocol, and application layers. Our performance model is based on Cilk's initial model and accounts for the effect of DSM operations. [Table with columns: Applications, sequential execution, 2 processors, 4 processors, 8 processors.] ....
A. Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks. PhD thesis, Department of Computer Science, Princeton University, Nov. 1998.
....in the firmware. Consequently, we wanted to reimplement the firmware using the ESP language. The VMMC software has been used extensively by a number of research projects. Several standard high-level communication libraries, including Remote Procedure Call (RPC) [17], Shared Virtual Memory (SVM) [16], Sockets [37], and NX Message Passing [4], have been implemented on top of the low-level API provided by VMMC. Several distributed applications [59, 66] that run on a cluster have also used VMMC as their communication mechanism. Significant effort [42, 41, 18, 30] has been spent on implementing, ....
A. Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks (Thesis). Technical Report TR-586-98, Department of Computer Science, Princeton University, 1998.
....messages is not worthwhile with current CPU and network technologies since the main CPU is idle when the message arrives. However, finding an efficient communication mechanism to replace interrupts or polling is worth studying, and this has been a direction for future research on software DSM [21, 15]. 3. The data miss penalty accounts for 11.75% of total execution time on average; in other words, 39.17% of the system overhead is spent on the data miss penalty. Finding an efficient way to reduce the data miss penalty is therefore the most important priority. Although remote page fetching takes a long time, we find the ....
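Reading the two figures in item 3 as percentages (11.75% of execution time, 39.17% of system overhead), they also imply how large the total system overhead is relative to execution time. A short worked check, where the symbols $D$ (data miss penalty), $T$ (total execution time), and $O$ (system overhead) are introduced here for illustration, and assuming both averages are taken over the same runs (an average of ratios need not preserve this exactly):

$$
\frac{O}{T} \;=\; \frac{D/T}{D/O} \;=\; \frac{0.1175}{0.3917} \;\approx\; 0.300
$$

Under that assumption, system overhead would be roughly 30% of total execution time, with data misses being the largest single component of it.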
A. Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks. PhD thesis, Dept. of Computer Science, Princeton University, November 1998.
....portability as well. 3.1.1 Cluster Platform I: Page-Based Shared Virtual Memory Simulator. Although real SVM systems are available, a detailed simulator makes in-depth performance debugging both possible and easier. The simulator has been validated against a real SVM cluster [5]. A performance monitor and tools are also being developed for our real SVM systems [55]. Our simulation environment is built on top of Augmint [70], an execution-driven simulator that uses the x86 instruction set and runs on x86 systems. It models a cluster of 16 uniprocessor nodes connected with a ....
....interconnect [10]. Contention is modeled in great detail at all levels, including the network endpoints, except in the network links and switches themselves. The processor has a P6-like instruction set and is assumed to be a 1-IPC processor. Details about the simulator can be found in [5]. The memory hierarchy within a node is modeled after that of the PentiumPro systems we use in our real implementation. The data cache hierarchy consists of an 8-KByte first-level direct-mapped write-through cache and a 512-KByte second-level two-way set-associative cache, each with 32-byte lines. The L2 ....
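The cache parameters above determine how an address is split into tag, set-index, and byte-offset fields. The sketch below computes that split for the quoted L1 and L2 configurations. The 32-bit address width and the helper names `cache_geometry` and `split_address` are hypothetical, chosen for illustration; they are not taken from the cited simulator.

```python
ADDR_BITS = 32  # assumption: 32-bit addresses, for illustration only


def cache_geometry(size_bytes, line_bytes, ways, addr_bits=ADDR_BITS):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache.

    A direct-mapped cache is the special case ways == 1.
    """
    for v in (size_bytes, line_bytes, ways):
        if v <= 0 or v & (v - 1):
            raise ValueError("cache parameters must be powers of two")
    sets = size_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the number of sets
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, line_bytes, sets):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


# Parameters quoted in the text: 8-KByte direct-mapped L1,
# 512-KByte two-way L2, 32-byte lines in both.
l1 = cache_geometry(8 * 1024, 32, 1)
l2 = cache_geometry(512 * 1024, 32, 2)
print("L1:", l1)  # (256, 5, 8, 19)
print("L2:", l2)  # (8192, 5, 13, 14)
```

Doubling associativity at fixed capacity halves the number of sets, which is why the two-way L2 uses one fewer index bit (13) than a direct-mapped 512-KByte cache would (14).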
A. Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks. PhD thesis, Department of Computer Science, Princeton University, 1998.
....an interest in exploring how to move functionality from the main compute nodes down into the network interface. In some studies, such functionality is implemented as extra handler code run by a programmable network interface processor, such as the LANai processor in a Myrinet network interface [5], [6], [7]. Other approaches have provided even more aggressive levels of hardware support, up to full hardware cache coherence [8], [9]. Our proposal, which implements protocol processing in configurable FPGA chips on the network interface, represents an intermediate position between full hardware or ....
Angelos Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks. Technical Report #TR-586-98, Princeton Computer Science Department, August 1998.
....except in the network links and switches themselves. Thus, when we change protocol or communication layer costs, the impact on contention is included as well. The processor has a P6-like instruction set and is assumed to be a 1-IPC processor. Details about the simulator can be found in [1]. [Figure 2: Simulated node architecture (processor, first-level cache, write buffer, second-level cache, memory bus, memory, I/O bus, snooping device, network interface).] The fine-grained access control needed for FG can be provided ....
....The cost of each protocol handler is computed according to the protocol task it performs. The simulator has been validated against real system implementations for both FG (by setting parameters close to those of the Typhoon-0 system [16] and comparing with it) and SVM (against our real cluster [1]). The results, omitted for space reasons, are surprisingly accurate. Applications: Table 1 shows the applications and the problem sizes we use in this work. These applications are written for hardware DSM and are known to deliver excellent parallel speedups on hardware-cache-coherent ....
[Article contains additional citation context not shown here]
A. Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks. PhD thesis, Dept. of Computer Science, Princeton University, August 1998. Available as technical report TR-586-98.
....except in the network links and switches themselves. Thus, when we change protocol or communication layer costs, the impact on contention is included as well. The processor has a P6-like instruction set and is assumed to be a 1-IPC processor. Details about the simulator can be found in [1]. [Figure 2: Simulated node architecture (processor, first-level cache, write buffer, second-level cache, memory bus, memory, I/O bus, snooping device, network interface).] The fine-grained access control needed for FG can be provided ....
....The cost of each protocol handler is computed according to the protocol task it performs. The simulator has been validated against real system implementations for both FG (by setting parameters close to those of the Typhoon-0 system [17] and comparing with it) and SVM (against our real cluster [1]). The results, omitted for space reasons, are surprisingly accurate. Applications: Table 1 shows the applications and the problem sizes we use in this work. These applications are written for hardware DSM and are known to deliver excellent parallel speedups on hardware-cache-coherent ....
[Article contains additional citation context not shown here]
A. Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks. PhD thesis, Dept. of Computer Science, Princeton University, August 1998. Available as technical report TR-586-98, Princeton University.
[Figure 11: Unidirectional bandwidth (MB/s) versus message size (4 bytes to 4 MB) for Max N2H, AM, BIP, FM, PM, and VMMC.] ....nodes in the network [15] or on demand [3], when a node needs to find a new route. Dynamic network mapping and reliable communication allow for easy maintenance, upgrading, and changes in the network topology and configuration. Connection establishment: Mechanisms for connection establishment can either be provided by a library or left to ....
A. Bilas. Improving the Performance of Shared Virtual Memory on System Area Networks. PhD thesis, Dept. of Computer Science, Princeton University, August 1998. Available as technical report TR-586-98.
CiteSeer.IST - Copyright Penn State and NEC