14 citations found.
A. Bilas, C. Liao, and J. Singh, "Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems," in Proc. of ISCA, 1999.

This paper is cited in the following contexts:
Active I/O Switches in System Area Networks - Hao, Heinrich

....of the large number of disks in servers to form a powerful parallel computing engine. There are network devices in this class as well. The Myrinet NIC has embedded processors that can execute user programs. Many research efforts take advantage of this computing power in different situations [5, 7, 12]. The main differences in our approach are that the intelligence lies in the switch rather than in the end devices, and the switch architecture contains customized hardware to separate data from control and improve switch throughput. The location of our active switches within the system yields ....

A. Bilas, C. Liao, and J.P. Singh. Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In ISCA, pages 282-293, May 1999. [22]


Performance Impact of Using ESP to Implement VMMC Firmware - Kumar, Li (2002)

....Often, applications can achieve better performance if the device supports a richer interface. For instance, a similar set of SPLASH2 applications observed a 37% increase in performance when additional network support was added to VMMC to avoid asynchronous protocol processing in the SVM library [5]. ESP makes it easier to explore and add new features to the device firmware. The rest of the paper is organized as follows. Section II discusses the related work. Section III presents an overview of the ESP language. Section IV describes how the ESP compiler generates efficient code. Section V ....

....head 314.98 4.76 B. Application Performance Applications. Figure 6 shows the experimental setup used to run the applications. The SPLASH2 applications [19] run on a cluster of SMP nodes using the VMMC software to communicate. These applications run on top of the Shared Virtual Memory (SVM) [5] library that, in turn, runs on top of the VMMC library. In the common case, the VMMC library bypasses the operating system and directly interacts with the VMMC firmware running on the Myrinet network card. The VMMC device driver that runs inside the operating system is needed for services like ....

[Article contains additional citation context not shown here]

A. Bilas, C. Liao, and J.P. Singh, "Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems," in International Symposium on Computer Architecture, 1999.


ESP: A Language for Programmable Devices - Kumar (2002)

....Call (RPC) [17], Shared Virtual Memory (SVM) [16], Sockets [37], and NX Message Passing [4] have been implemented on top of the low level API provided by VMMC. Several distributed applications [59, 66] that run on a cluster have also used VMMC as the communication mechanism. Significant effort [42, 41, 18, 30] has been spent on implementing, maintaining, performance tuning, and extending the functionality of VMMC. Our experience with programming VMMC firmware using event driven state machines in C makes it an ideal candidate for a case study in this thesis. 1.4 Event driven State machines ....

....a given instant. 4.4.2 Application Performance Applications. Figure 4.6 shows the experimental setup used to run the applications. The SPLASH2 applications [103] run on a cluster of SMP nodes using the VMMC software to communicate. These applications run on top of the Shared Virtual Memory (SVM) [18] library that, in turn, runs on top of the VMMC library. The VMMC software architecture is discussed in Section 1.3.3. The applications in the SPLASH2 suite [103] are parallel applications that use shared address space to communicate with each other. The versions of these applications used in this ....

[Article contains additional citation context not shown here]

A. Bilas, C. Liao, and J. Singh. Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proceedings of the International Symposium on Computer Architecture, Atlanta, Georgia, May 1999.


Removing the Overhead from Software-Based Shared Memory - Radovic, Hagersten (2001)   (2 citations)

....been reported [37]. In this paper we suggest a new efficient approach for software based coherence protocols. While other work has proposed elaborate schemes for cutting down on the overhead associated with interrupting and/or polling caused by the asynchronous communication between the agents [5], [29], our implementation has completely eliminated the protocol agent interactions. In DSZOOM the entire coherence protocol is implemented in the protocol handler running in the requesting processor. This also makes use of a processor that otherwise would have been idle. Rather than relying on a ....

....is shown in Figure 5. Floating point store snippets are the major slowdown factor for FFT, LU c, and LU nc. LU is one of the most store intensive SPLASH 2 applications [43] and will typically perform much better on software based DSM systems with weaker memory models (for example on GeNIMA [5] with home based LRC protocol).

Program    8 Processors    16 Processors
FFT        1.29 (8.9)      1.08 (29.8)
LU c       1.58 (0.2)      1.50 (8.3)
LU nc      1.60 (9.6)      1.44 (6.2)
Radix      1.15 (2.4)      1.07 (6.0)
Barnes     1.15 (11.5)     1.05 (1.7)
FMM        1.03 (3.2)      1.02 (3.6)
Ocean c    1.25 (8.4)      1.14 ....

[Article contains additional citation context not shown here]

A. Bilas, C. Liao, and J. P. Singh. Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA'99), May 1999.


DSZOOM - Low Latency Software-Based Shared Memory - Radovic, Hagersten (2001)

....[SFH+96]. In this paper we suggest a new efficient approach for software based coherence protocols. While other work has proposed elaborate schemes for cutting down on the overhead associated with interrupting and/or polling caused by the asynchronous communication between the agents [BLS99], [MFHW96], our implementation has completely eliminated the protocol agent interactions. In DSZOOM the entire coherence protocol is implemented in the protocol handler running in the requesting processor. This also makes use of a processor that otherwise would have been idle. Rather than relying ....

....relatively values from Table 2. Network delay is about 3 microseconds for all DSZOOM EMU configurations. 5 Related Work Many different SW DSM implementations have been proposed over the years: Blizzard S [SFL+94], Brazos [SB97], Cashmere 2L [SDH+97], [DGK+99], CRL [JKW95], GeNIMA [BLS99], Ivy [Li88], [LH89], MGS [YKA96], Munin [CBZ91], Shasta [SGT96], [SGA97], [SG97a], [SG97b], [DGK+99], Sirocco S [SFH+98], SoftFLASH [ENCH96], and TreadMarks [KCDZ94]. Most of them suffer from synchronous interrupt protocol processing. We believe that many of these implementations would ....

A. Bilas, C. Liao, and J. P. Singh. Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA'99), May 1999.


ESP: A Language for Programmable Devices - Kumar, Mandelbaum, Yu, Li (2001)

....in the system. These fast paths tend to be fairly brittle and applications often fall off the fast path. While some applications ([16], which repeatedly send very large messages) that have very simple communication patterns benefit from the fast paths, a lot of applications do not. SVM applications [4] experience a lot of contention in the network and the actual latency measured by the different applications varied between 3 times to 10 times slower than the microbenchmarks numbers for small messages. So, for most applications, the vmmcOrigNoFastPaths is a more accurate representative than ....

A. Bilas, C. Liao, and J. Singh. Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems. In International Symposium on Computer Architecture, June 1999.


Providing Hardware DSM Performance at Software DSM Cost - Heinrich, Speight (2000)

....alleviating the amount of expensive communication to some degree, high synchronization rates, frequent sharing, or large amounts of false sharing severely hinder the performance of software DSM systems. As a result, their performance remains poor compared to their hardware DSM counterparts [3,9]. Still, the cost advantages of software DSM clusters make them a viable alternative for certain applications. 3. Differences Between Hardware DSM and Software DSM To the naive eye, a physical comparison of a hardware DSM machine like the FLASH multiprocessor with a modern software DSM system ....

A. Bilas, C. Liao, and J. P. Singh. Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proceedings of the 26th International Symposium on Computer Architecture, May 1999.


Software Distributed Shared Memory over Virtual Interface.. - And

....the time to apply the diff from the handler time. Knowing the total diff size that was transferred and approximating the diff application time with the memory copy time, for all seven applications we studied, we got a gain of no more than 5%. This is consistent with what other people have shown [3]. Remote Read. RDMA Read is a VIA feature that allows fetching of data without interrupting the processor on the remote node. Although present in the VIA specification, the VIA implementation that we used in our experiments does not support RDMA Read. We try to make a rough approximation of the ....

....5%. The elimination of the remote handling time would also reduce the communication latency experienced by the clients, by the same amount. This brings the total contribution of the remote read to no more than 10%, not counting the side effect on synchronization due to critical section dilation [3]. Bilas et al. [3] have shown that the remote read facility can help reduce the page fetch times by about 20% for most applications. Broadcast Support. VIA doesn't specify any primitive or mechanism for broadcast. Broadcast can be really useful in the context of a software DSM system. With support ....

[Article contains additional citation context not shown here]

Angelos Bilas, Cheng Liao, and Jaswinder Pal Singh. Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proceedings of the 26th International Symposium on Computer Architecture, 1999.


The Effect of Network Total Order, Broadcast, and.. - Stets, Dwarkadas, ..

....the page directory among nodes, and relying on network total order and reliability to avoid acknowledging the receipt of metadata information. This paper evaluates the performance implications of each of these design decisions. Our investigation builds on earlier results from the GeNIMA SDSM [4]. GeNIMA's creators examined the performance impact of remote reads, remote writes, and specialized locking support in the network interface. In our investigation, we examine remote writes, inexpensive broadcast, and network total order. In subsequent sections, we will explain how these features ....

....in LU is not as large: 400K out of a total of 1.19M page updates are satisfied by the broadcast buffers, while 106K pages are placed in the broadcast buffers. All other applications, with the exception of SOR, also benefit from the use of CSM ADB by smaller amounts. 4. Related Work Bilas et al. [4] use their GeNIMA SDSM to examine the impact of special network features on SDSM performance. Their network has remote write, remote read, and specialized lock support, but no broadcast or total ordering. GeNIMA disseminates write notices through broadcast and so could benefit from the appropriate ....

A. Bilas, C. Liao, and J. P. Singh. Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proc. of 26th Intl. Symp. on Computer Architecture, Atlanta, GA, May 1999.


The Effect of Network Total Order, Broadcast, and.. - Stets, Dwarkadas, ..

....the page directory among nodes, and relying on network total order and reliability to avoid acknowledging the receipt of metadata information. This paper evaluates the performance implications of each of these design decisions. Our investigation builds on earlier results from the GeNIMA SDSM [4]. The GeNIMA researchers examined the performance impact of remote read, remote write, and specialized locking support in the network interface. In our investigation, we examine remote write, along with features for inexpensive broadcast and network total order. In subsequent sections, we will ....

....buffers, while 30K pages are placed in the broadcast buffers). However, due to the large amount of false sharing in this application, the adaptive broadcast protocol is able to significantly reduce the synchronization wait time by reducing protocol perturbation. 4 Related Work Bilas et al. [4] use their GeNIMA SDSM to examine the impact of special network features on SDSM performance. Their network has remote write, remote read, and specialized lock support, but no broadcast or total ordering. GeNIMA disseminates write notices through broadcast and so could benefit from efficient ....

A. Bilas, C. Liao, and J. P. Singh. Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proceedings of 26th International Symposium on Computer Architecture, Atlanta, GA, May 1999.


Jupiter/SVM: A JVM-based Single System - Image For Clusters   Self-citation (Bilas)

No context found.

A. Bilas, C. Liao, and J. Singh, "Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems," in Proc. of ISCA, 1999.


CableS: Thread Control and Memory System Extensions for.. - Jamieson, Bilas (2001)   Self-citation (Bilas)

....a low latency, high bandwidth Myrinet SAN [5]. The software infrastructure in the system includes a custom communication layer and a highly optimized SVM system. The communication layer we use on top of Myrinet is a user level communication layer, Virtual Memory Mapped Communication (VMMC) [2,9]. VMMC provides both explicit, direct remote memory operations (reads and writes) and notification based send primitives. The SVM protocol used is GeNIMA [16], which is a home based, page level SVM protocol. The consistency model in the protocol is Release Consistency [11]. GeNIMA provides an ....

A. Bilas, C. Liao, and J. Singh. Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems. In Proceedings of the 26th International Symposium on Computer Architecture, Atlanta, Georgia, May 1999.


Cables: Thread Control and Memory Management Extensions for.. - Jamieson, Bilas (2002)   Self-citation (Bilas)

No context found.

A. Bilas, C. Liao, and J. Singh. Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems. In Proceedings of the 26th International Symposium on Computer Architecture, Atlanta, Georgia, May 1999.


Active Memory Clusters: Efficient Multiprocessing on.. - Heinrich, Speight.. (2002)   (1 citation)

No context found.

Bilas, A., Liao, C., Singh, J.P.: Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proceedings of the 26th International Symposium on Computer Architecture, May 1999.
