K. L. Johnson, "High-performance all-software distributed shared memory," Massachusetts Institute of Technology, Cambridge, Mass., Tech. Rep. MIT/LCS/TR-674, Dec. 1995.
....Cohesion [54], Millipede [55], [25], MigThread-outfitted Strings [56] and JIAJIA [57], and DSM-included distributed operating systems, like Gobelins, that use process thread migration mainly to improve load balancing. There also exist DSM systems, e.g. D-CVM, and those, e.g. Prelude [58] and MCRL [59], that make use of thread migration and computation migration respectively to exploit data locality. Prelude was one of the pioneers in using computation migration, which can be viewed as a generalization of thread migration, on a DSM system, to exploit data locality. Computation migration is ....
K. L. Johnson, "High-performance all-software distributed shared memory," Massachusetts Institute of Technology, Cambridge, Mass., Tech. Rep. MIT/LCS/TR-674, Dec. 1995.
....typically are short-lived in Orca, this is not likely to happen often. Both problems discussed above (polling or interrupt overhead and context switching overhead) are certainly not unique to Myrinet. They are equally important on other platforms that have fast communication, such as the CM-5 [15]. On slower networks, the problems are less serious, because the relative overhead of an interrupt or context switch is much lower. 4.2 Exploiting the Programmability of the Interface Boards. Another performance problem in our initial design is the high cost of the Panda multicast protocol. ....
....100 raises the polling overhead to a factor of two, because of the additional network polls. Decreasing the rate to N = 10000 also has a negative impact on performance because the network is not polled fast enough. Setting the polling frequency right is a difficult problem, as recognized by others [15]. The optimized system handles incoming messages through interrupts, which is much more efficient for Water. We have determined that the overhead of such interrupts on execution time is less than 0.5 (using eight processors). Comparing the speedups of Water on Ethernet and the initial Myrinet ....
K.L. Johnson. High-Performance All-Software Distributed Shared Memory. Technical Report MIT/LCS/TR-674 (Ph.D. thesis), Massachusetts Institute of Technology, December 1995.
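The polling-rate trade-off in the context above (poll too often and the polls themselves add overhead; poll too rarely and messages wait) can be illustrated with a minimal sketch. It assumes an in-memory queue standing in for the network; the names `run_with_polling`, `inbox`, and `handle` are hypothetical and are not taken from Orca, Panda, or the cited report.

```python
from collections import deque


def run_with_polling(work_items, poll_interval, inbox, handle):
    """Perform `work_items` units of work, polling `inbox` once every
    `poll_interval` units. Returns the number of polls performed.

    Illustrative only: `inbox` is a plain deque, not a real network.
    """
    polls = 0
    for i in range(1, work_items + 1):
        # One unit of application work would run here.
        if i % poll_interval == 0:
            polls += 1
            # Drain every message that is waiting at this poll.
            while inbox:
                handle(inbox.popleft())
    return polls
```

With 1000 units of work, a poll interval of 100 yields ten polls, whereas an interval of 10000 never polls at all, so a waiting message stays unserviced until the work finishes.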
....messages are processed by the local node. In contrast, polling requires that the programmer be aware of when messages might need to be processed and ensure that the network is polled frequently enough to allow the messages to be serviced promptly, or else suffer poor performance or even deadlock [14]. Placing this additional burden on programmers has a negative impact on the ease of use of the message passing programming model. 2) Comparing Communication Volumes: Having validated shared memory and message passing independently, the next step is to compare their relative merits. Although ....
K. Johnson, "High-performance all software distributed shared memory," Ph.D. dissertation, Dep. Elect. Eng. Comput. Sci., Massachusetts Inst. Technol., Cambridge, Nov. 1995.
....locally after a start operation. We use the same DSM protocol as CRL and similar access checks. The DSM protocol is a home-node, directory-based invalidation protocol. We, like CRL, employ a sequential memory consistency model. Exact information about the coherency protocol can be found in [6]. The communication substrate used is LFC [3] on Myrinet [4] running on a cluster of workstations (Pentium Pros at 200 MHz). The region's state information is stored in front of the object (see Figure 2) so it can be found quickly. Next, pointers to all region pointers are stored in a list. As ....
K.L. Johnson. High-Performance All-Software Distributed Shared Memory. PhD thesis, Laboratory for Computer Science, MIT, Cambridge, MA, December 1995. Technical Report MIT/LCS/TR-674.
.... potentials in a system of water molecules in the liquid state, as reported in the SPLASH parallel application suite description [49]. The version of the application that we used is adapted from the n-squared version from the SPLASH-2 benchmark suite [66] and is identical to that reported on in [29]. As described in [29], there is a region for each molecule and three small regions used to calculate running sums updated every iteration by each processor. The problem size that we use is 512 molecules. As suggested in the benchmark notes, the application was run for three iterations, and the ....
.... of water molecules in the liquid state, as reported in the SPLASH parallel application suite description [49]. The version of the application that we used is adapted from the n-squared version from the SPLASH-2 benchmark suite [66] and is identical to that reported on in [29]. As described in [29], there is a region for each molecule and three small regions used to calculate running sums updated every iteration by each processor. The problem size that we use is 512 molecules. As suggested in the benchmark notes, the application was run for three iterations, and the times for the second and ....
K.L. Johnson. High-Performance All-Software Distributed Shared Memory. PhD thesis, Massachusetts Institute of Technology, December 1995.
....a message. The application is required to poll the device status regularly; when a message is detected, a user-space message handler is invoked to process the message. Controlling the polling rate, however, is sensitive and the incorporation of polls into a multi-threaded application is difficult [7]. With the current interest in high-performance user-level communication, interrupt-driven versus polling-based message handling has become an important issue. In this paper, we study the tradeoff between both mechanisms from two viewpoints: performance and ease of programming. We argue that ....
....respect to varying message injection and polling rates. We expect these rates to influence the performance of polling-based and interrupt-driven mechanisms in different ways; the goal of this experiment is to validate our expectations. The program we use is based on the benchmark used by Johnson [7]. In this benchmark, which we ran on eight processors, each processor repeatedly sends a request to a random partner and waits for a reply from that partner. After receiving the reply, the processor performs a fixed, but controllable, amount of synthetic work before it sends the next request. The ....
K.L. Johnson. High-Performance All-Software Distributed Shared Memory. PhD thesis, Laboratory for Computer Science, MIT, Cambridge, U.S.A., December 1995. Technical Report MIT/LCS/TR-674.
....messages are processed by the local node. In contrast, polling requires that the programmer be aware of when messages might need to be processed and ensure that the network is polled frequently enough to allow the messages to be serviced promptly, or else suffer poor performance or even deadlock [14]. Placing this additional burden on programmers has a negative impact on the ease of use of the message passing programming model. Comparing communication volumes Having validated shared memory and message passing independently, the next step is to compare their relative merits. Although ....
Kirk Johnson. High-Performance All Software Distributed Shared Memory. PhD thesis, M.I.T., Department of Electrical Engineering and Computer Science, November 1995.
....from the underlying network and supports a simple communication model based on shared objects. Orca provides a high-level programming model, which can be characterized as object-based distributed shared memory. Many more recent programming languages and libraries support a similar model [12], [16], [22], so Orca is representative of a larger body of work. We have used Orca for dozens of applications, and our experience indeed confirms that the language is easy to use, but that performance debugging is hard. To address this problem, we have developed a visualization tool for Orca called ....
Kirk L. Johnson, M. Frans Kaashoek, and Deborah A. Wallach. High-performance all-software distributed shared memory. 15th ACM Symp. on Operating Systems Principles, pages 213--228, Dec 1995.
....has received much attention. Several systems use a mixture of polling and interrupts. The Remote Queueing model of [12] uses interrupts only for specific messages (e.g. operating system messages) or under specific circumstances (e.g. network overflow). The CRL distributed shared memory system [13] uses interrupts to deliver protocol request messages, and switches to polling for receiving reply messages. Langendoen et al. [6] describe a generalization of this idea: polling is used if the receiving processor is idle (e.g. when it is waiting for a reply); interrupts are used if the receiving ....
K.L. Johnson. High-Performance All-Software Distributed Shared Memory. PhD thesis, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, U.S.A., December 1995. Available as Technical Report MIT/LCS/TR-674.
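The mixed polling/interrupt policies mentioned in the context above can be written down as two small decision functions. This is a hedged sketch only: the `Delivery` enum and both function names are hypothetical, and the non-idle branch of the second function is an assumption, since the quoted context is cut off before it states when interrupts are used.

```python
from enum import Enum


class Delivery(Enum):
    POLL = "poll"
    INTERRUPT = "interrupt"


def crl_style_delivery(message_kind: str) -> Delivery:
    """Per the quoted context: protocol requests arrive by interrupt,
    replies are received by polling."""
    if message_kind == "request":
        return Delivery.INTERRUPT
    if message_kind == "reply":
        return Delivery.POLL
    raise ValueError(f"unknown message kind: {message_kind!r}")


def idle_based_delivery(receiver_idle: bool) -> Delivery:
    """Poll while the receiver is idle (e.g. waiting for a reply).

    Assumption: any non-idle receiver is served by interrupts; the
    source text is truncated before it spells this case out.
    """
    return Delivery.POLL if receiver_idle else Delivery.INTERRUPT
```

A processor blocked on a reply already has nothing else to do, so polling costs it little, which is why both policies agree on polling for that case.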