J. Heinlein. Optimized Multiprocessor Communication and Synchronization Using a Programmable Protocol Engine. Ph.D. Dissertation, Stanford University, Stanford, CA, March 1998.
....active memory proposals have advocated the technique of remapping the address space of a process in an application-specific manner. Accesses to this space are then used as a signal to the memory controller to perform active operations rather than satisfying this access from physical memory [5,13,30]. For example, when performing matrix operations that require row and column traversals, one traversal uses the cache effectively whereas the other does not. We can provide multiple memory viewpoints of the same matrix using shadow address spaces. Row traversals are unchanged, whereas column ....
J. Heinlein. Optimized Multiprocessor Communication and Synchronization Using a Programmable Protocol Engine. Ph.D. Dissertation, Stanford University, Stanford, CA, March 1998.
....synchronization primitives studied for a range of benchmarks. These gains were much more impressive than others had predicted, and underscored the magnitude of delay in acquiring data in a critical section, not just the lock. Over the years, various synchronization mechanisms have been proposed [29, 16, 11, 3, 13, 27, 25, 14]. Most synchronization operations use an atomic read-modify-write primitive. While not all architectures provide a synchronization primitive, they offer some form of instruction that atomically swaps a value (either predefined or contained in a register) with one in memory. Conditional versions of ....
John Heinlein. Optimized Multiprocessor Communication and Synchronization Using a Programmable Protocol Engine. PhD thesis, Stanford University, Stanford, CA, March 1998.
....solution, and an adaptive solution that reverts to the centralized solution when there is little contention. Blizzard, an implementation of the Tempest interface that runs on a cluster of Sun SPARCstation 20s, supports a centralized message-based lock [SFL 94]. In his dissertation [Hei98], Heinlein describes a distributed message-based lock implementation for the Stanford FLASH multiprocessor. Lim and Agarwal observe that different locking algorithms are better suited for different levels of contention. When there is no contention, test-and-set can quickly acquire and release a lock, ....
....part with those published by Mellor-Crummey and Scott, with one exception: the performance of test-and-set with exponential backoff on the GP1000 is, for the most part, worse than test-and-set. They attribute this behavior to higher network contention in the GP1000 than in the Butterfly 1. In his thesis [Hei98], Heinlein compares the performance of his distributed message-based locking primitive with the performance of MCS and a lock implemented with the load-reserve/store-conditional instructions. To compare these primitives, he simulates the execution of a microbenchmark that is similar to the one ....
John Heinlein. Optimized Multiprocessor Communication and Synchronization Using a Programmable Protocol Engine. PhD thesis, Stanford University, Stanford, CA, March 1998.
.... Recent active memory proposals have advocated the technique of remapping the address space of a process in an application-specific manner and using accesses to this space as a signal to the memory controller to perform active operations rather than satisfying this access from physical memory [25,26]. For example, when performing matrix operations that require row and column traversals, one traversal uses the cache effectively whereas the other does not. We can provide multiple memory viewpoints of the same matrix using shadow address spaces much like that proposed in the Impulse memory ....
....and column traversals, one traversal uses the cache effectively whereas the other does not. We can provide multiple memory viewpoints of the same matrix using shadow address spaces much like that proposed in the Impulse memory controller [25] and for user-level messaging in the FLASH multiprocessor [26]. Row traversals would be unchanged, whereas column traversals would be treated as row traversals of a matrix at a different (shadow) address. The memory controller would issue scatter-gather commands to the active memory elements which in turn would fetch individual double words from a column and ....
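The shadow-address idea in the excerpt above can be illustrated in software. The following is only a minimal sketch, not the mechanism of the cited systems: it assumes a row-major matrix held in a flat Python list, and the names gather_column and shadow_view are hypothetical. It shows how a strided column traversal becomes a contiguous traversal once the elements have been gathered into a dense "shadow" layout.

```python
def gather_column(flat, n_rows, n_cols, col):
    """Gather one column of a row-major matrix into a dense list.

    Stands in for the scatter-gather step: elements that are n_cols
    apart in the original layout end up adjacent in the result.
    """
    return [flat[r * n_cols + col] for r in range(n_rows)]


def shadow_view(flat, n_rows, n_cols):
    """Build a hypothetical 'shadow' layout (the transpose, stored row-major).

    A column traversal of the original matrix is then a contiguous
    row traversal of this shadow copy.
    """
    shadow = []
    for c in range(n_cols):
        shadow.extend(gather_column(flat, n_rows, n_cols, c))
    return shadow


# A 2 x 3 matrix stored row-major (illustrative data only).
matrix = [1, 2, 3,
          4, 5, 6]
shadow = shadow_view(matrix, 2, 3)

# Column 1 of the original occupies a contiguous slice of the shadow layout.
column_1 = shadow[1 * 2:(1 + 1) * 2]
```

In the hardware proposals the excerpt describes, the gather would be performed by the memory controller on access to the shadow address range, so no explicit copy would be made by the application; the copy here only makes the access pattern visible.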
J. Heinlein. Optimized Multiprocessor Communication and Synchronization Using a Programmable Protocol Engine. Ph.D. Dissertation, Stanford University, Stanford, CA, March 1998.
....is close to what a direct message passing implementation would have done. Unfortunately, due to the lack of direct control over communication, even these clever lock and barrier implementations incur more network traffic than a message passing implementation of similar algorithms (see Heinlein [40]). Footnote 1: Heinlein's implementation is best viewed as a hybrid that blurs the line between message passing and shared memory. A large part of the message passing code implementing the lock and barrier runs on an embedded processor in the NIU, with application code interacting with this code ....
J. Heinlein. Optimized Multiprocessor Communication and Synchronization Using a Programmable Protocol Engine. PhD thesis, Stanford University, Stanford, CA, Mar. 1998.