29 citations found.
M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott, "Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems", Proceedings of the 9th International Parallel Processing Symposium, pp. 480-485, Santa Barbara, CA, April 1995.

This paper is cited in the following contexts:

Clustered Objects: Initial Design, Implementation and Evaluation - Appavoo

....we present later, we show that these techniques can lead to an improvement in performance of two orders of magnitude in some cases. Other, more coarse-grain approaches for improving locality in general SMP software include automated support for memory page placement, replication and migration [18, 23, 40] and cache-affinity-aware process scheduling [39, 24, 13, 33, 9]. 1.2 SMP Operating Systems. Poor performance of the operating system can have considerable impact on application performance. For example, for parallel workloads studied by Torrellas et al. the operating system accounted for as much ....

Michael Marchetti, Leonidas Kontothanassis, Ricardo Bianchini, and Michael Scott. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems. In Proceedings of the 9th International Parallel Processing Symposium (IPPS'95), pages 480--485, Los Alamitos, CA, USA, April 1995. IEEE Computer Society Press.


Is Data Distribution Necessary in OpenMP? - Nikolopoulos, Papatheodorou.. (2000)   (1 citation)

....upon a miss in the L2 cache. Page placement in ccNUMA systems is considered as a task of the operating system and previous research came up with simple solutions for achieving satisfactory data locality at the page level with page placement schemes implemented entirely in the operating system [15, 16]. However, the memory access traces of parallel programs do not and can not always conform to the memory management strategy of the operating system. The problem is pronounced in OpenMP because the programming model is oblivious to the distribution of data in the system. This section investigates ....

.... well tuned by the providers to exploit the characteristics of the memory system of the SGI Origin2000 and exhibit very good scalability up to 32 processors [8]. The OpenMP implementations of the NAS benchmarks are optimized to achieve good data locality with a first-touch page placement strategy [16]. This strategy places each virtual memory page in the same node with the processor that reads or writes it first during the execution of the program. First touch is the default page placement scheme used by cellular IRIX, the Origin2000 operating system. The NAS benchmarks are customized to ....

[Article contains additional citation context not shown here]

M. Marchetti, L. Kontothanassis, R. Bianchini and M. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. Proceedings of the 9th International Parallel Processing Symposium, pp. 480--485. Santa Barbara, CA, April 1995.
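The first-touch strategy described in the Nikolopoulos and Papatheodorou excerpt above places each page on the node whose processor reads or writes it first. A minimal Python sketch, assuming the program's memory references are available as an ordered trace of `(node, page)` pairs; `first_touch_placement` and `count_remote` are hypothetical names for illustration, not functions from cellular IRIX or from any of the cited papers:

```python
from typing import Dict, List, Tuple

# Hypothetical trace format: (node, page) pairs in program order.
Trace = List[Tuple[int, int]]


def first_touch_placement(trace: Trace) -> Dict[int, int]:
    """Map each page to the node that references it first."""
    home: Dict[int, int] = {}
    for node, page in trace:
        # Later references never change an existing placement.
        home.setdefault(page, node)
    return home


def count_remote(trace: Trace, home: Dict[int, int]) -> int:
    """Count references issued by a node other than the page's home."""
    return sum(1 for node, page in trace if home[page] != node)


trace = [(0, 10), (1, 11), (0, 11), (1, 10), (1, 11)]
home = first_touch_placement(trace)
print(home)                       # {10: 0, 11: 1}
print(count_remote(trace, home))  # 2
```

If one node initializes all data sequentially, this policy places every page on that node, which is the problem the Reinhardt excerpt on Stache describes.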


Comparing the Effectiveness of Fine-Grain Memory Caching.. - Lai, Falsafi (2000)

....research indicates that CC-NUMA's performance may be very sensitive to the initial data allocation and placement [9]. As such, in this paper we use a first-touch placement policy in all the systems we study. This policy is simple and has been shown to substantially eliminate unnecessary traffic [13]. In this policy, a user-invoked directive on every node initiates page migration and placement at the start of the parallel phase of the program. Upon the first request for each page, the home node migrates the page to the requester, assuming the first requester is likely to prove a frequent ....

Michael Marchetti, Leonidas Kontothanassis, Ricardo Bianchini, and Michael L. Scott. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems. In Proceedings of the Ninth International Parallel Processing Symposium, April 1995.


Excel-NUMA: Toward Programmability, Simplicity, and High.. - Zhang, Cintra, Torrellas

....we show that EX-NUMA gets close to the upper bound possible, namely elimination of all conflict misses. Other related work is operating-system-induced migration and replication of pages based on run-time feedback. Several authors have shown that this technique boosts the performance of CC-NUMA [2, 9, 12]. This technique may have some of the effects of EX-NUMA, although migration is only supported at page granularity and, in addition, it has operating system overhead. We are currently comparing its effectiveness to that of EX-NUMA. 8 Discussion and Concluding Remarks. This paper has presented ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proceedings of the 9th International Parallel Processing Symposium, April 1995.


User-Level Dynamic Page Migration for.. - Nikolopoulos.. (2000)   (5 citations)

....affinity is important in order to avoid forcing threads accessing remote memory upon L2 cache misses. In practical situations, the memory affinity set of each thread is allocated on the node on which the thread is executed for the first time, via a first-touch page placement strategy [6], or in an application-specific manner which is hardwired in the program. In both circumstances, the memory allocation scheme assumes that each thread will be bound to a specific node of the system throughout the lifetime of the program. If a thread is migrated to a node other than the node on ....

....orphaned, in the sense that they no longer belong to the memory affinity set of any thread. 2.2. The Role of Dynamic Page Migration. Dynamic page migration based on information from hardware reference counters is a technique that can potentially alleviate the problem described in Section 2.1 [1, 6, 14]. The idea behind dynamic page migration is to collect per-node reference information for each page in memory and migrate a page to a remote node if: 1) the reference counters indicate that the remote node accesses the page significantly more frequently compared to the home node; 2) the benefits ....

M. Marchetti, L. Kontothanassis, R. Bianchini and M. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. Proc. of the 9th International Parallel Processing Symposium, pp. 480--485, Santa Barbara, CA, April 1995.
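The first migration criterion in the excerpt above (move a page when a remote node references it significantly more often than its home node) can be sketched as follows. This is an illustration under stated assumptions: `counters` stands in for the per-node hardware reference counts, and `ratio` is a hypothetical threshold, since the excerpt does not quantify "significantly". The excerpt is cut off before the second criterion, so this sketch does not model it.

```python
from typing import Optional, Sequence


def pick_migration_target(counters: Sequence[int], home: int,
                          ratio: float = 2.0) -> Optional[int]:
    """Return the node to migrate a page to, or None to leave it in place.

    counters[n] is the number of references node n made to the page.
    Only the first criterion from the excerpt is modeled: the busiest
    remote node must exceed the home node's count by a factor of `ratio`
    (a hypothetical threshold).
    """
    busiest = max(range(len(counters)), key=lambda n: counters[n])
    if busiest == home:
        return None
    if counters[busiest] > ratio * counters[home]:
        return busiest
    return None


print(pick_migration_target([10, 50, 5, 0], home=0))  # 1
print(pick_migration_target([30, 50, 0, 0], home=0))  # None
```

Using a ratio rather than a plain comparison is a design choice of this sketch, intended to keep pages with near-equal access counts from bouncing between nodes.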


Exploring the Value of Supporting Multiple DSM Protocols in.. - Kuramkote, Carter (1999)

....was modified to provide the page translation, allocation, and replacement support needed by the various distributed shared memory models. However, we assume that enough memory is available at each node for home pages and for replicating any number of S-COMA pages. We used the first-touch algorithm [20] to distribute home pages to nodes. The DSM controller, the system bus, and the network are all clocked at 120MHz. All cycle counts reported herein are with respect to this clock. The model of the processor and bus interconnect, the system bus and memory controller are similar to the one found in ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M.L. Scott. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems. In Proceedings of the Ninth ACM/IEEE International Parallel Processing Symposium (IPPS), April 1995.


Global Bus Design of a Bus-Based COMA Multiprocessor DICE - Gyungho Lee Bland

....in the system does not guarantee available space in a specific node, nor does it guarantee that the available space is contiguous. Since the concept of locality suggests that it would be highly beneficial if the node originating the page fault should also be the recipient of the incoming page [15], clearing specific locations for the incoming page may become necessary. Clearing a page of data in the local memory may only require reserving space if no EXL or SHO attributes currently exist. In the case where all the blocks are not in the INV or SHN state, a relocation transaction becomes ....

MARCHETTI, M., KONTOTHANASSIS, L., BIANCHINI, R., AND SCOTT, M.L. "Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems," in Proceedings of the 9th International Parallel Processing Symposium, April 1995.


Design and Performance of the Software-controlled COMA - Moga (1998)

....opportunity can be exploited at several levels: application, compiler, operating system and hardware, being essentially a problem of data placement in the first three levels and data replication and migration in the last two. Clever initial placement controlled by the application or compiler [52], and page migration and replication by the operating system [79] can improve the situation, but there are limitations. Irregular sharing patterns and dynamic work scheduling defy static placement by applications and compilers, and unpadded or mixed read-write data can reduce the efficiency of ....

....network interface can interrupt the main processor whenever there is a message in the incoming buffers. 3.5 Data Placement in the Attraction Memory. Techniques for data placement in the main memory of COMA architectures have received less attention than techniques for CC-NUMA [52]. Part of the reason is the lesser performance impact of data placement in COMA, due to automatic data migration and replication at the main memory level. This, however, does not mean there are no benefits from careful, yet application-independent, data placement in COMA. There are two aspects to ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M.L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proc. of the 9th Int'l Parallel Processing Symposium. April 1995.


AS-COMA: An Adaptive Hybrid Shared Memory Architecture - Kuo, Carter, Kuramkote.. (1998)   (4 citations)

....access cache (RAC) on the DSM controller. Applications that suffer a large number of conflict misses to remote data, e.g. due to the limited amount of caching of remote data, perform poorly on CC-NUMAs [5]. Unfortunately, these applications are fairly common [5, 14, 16]. Careful page allocation [2, 9], migration [21] or replication [21] can alleviate this problem by carefully selecting or modifying the choice of home node for a given page of data, but these techniques have to date only been successful for read-only or non-shared pages. The conflict miss cost in the CC-NUMA model is ....

....shared memory models. All three hybrid architectures we study adopt BSD4.4's page allocation mechanism and paging policy [10] with minor modifications. Free min and free target (see Section 3) were set to 5% and 7% of total memory, respectively. We extended the first-touch allocation algorithm [9] to distribute home pages equally to nodes by limiting the number of home pages that are allocated at each node to a proportional share of the total number of pages. Once this limit is reached, remaining pages are allocated in a round-robin fashion to nodes that have not reached the limit. The ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M.L. Scott. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems. In Proceedings of the Ninth ACM/IEEE International Parallel Processing Symposium (IPPS), April 1995.
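The AS-COMA excerpt above extends first touch by capping each node's home pages at a proportional share of all pages and assigning overflow pages round-robin to nodes still under the cap. A minimal Python sketch, assuming the total page count is known in advance and that the proportional share is the ceiling of pages divided by nodes (the excerpt does not say how it rounds); `capped_first_touch` is a hypothetical name:

```python
import math
from typing import Dict, List, Tuple


def capped_first_touch(trace: List[Tuple[int, int]],
                       num_nodes: int) -> Dict[int, int]:
    """First-touch home assignment with a per-node cap and round-robin overflow.

    trace holds (node, page) references in program order; only the first
    reference to each page matters.
    """
    num_pages = len({page for _, page in trace})
    cap = math.ceil(num_pages / num_nodes)  # assumed rounding of the "proportional share"
    load = [0] * num_nodes
    home: Dict[int, int] = {}
    rr = 0  # round-robin cursor for overflow pages
    for node, page in trace:
        if page in home:
            continue  # already placed by an earlier touch
        if load[node] < cap:
            target = node
        else:
            # Some node must be under the cap, since cap * num_nodes >= num_pages.
            while load[rr] >= cap:
                rr = (rr + 1) % num_nodes
            target = rr
            rr = (rr + 1) % num_nodes
        home[page] = target
        load[target] += 1
    return home


# Node 1 touches all six pages first, as in a sequential initialization.
trace = [(1, p) for p in range(6)]
print(capped_first_touch(trace, num_nodes=3))
# {0: 1, 1: 1, 2: 0, 3: 2, 4: 0, 5: 2}
```

Without the cap, plain first touch would place all six pages on node 1; the cap spreads them two per node.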


The Effectiveness of SRAM Network Caches in Clustered DSMs - Moga, Dubois (1998)   (17 citations)

....4.2. Evaluation approach. We derive our results using execution-driven simulation. Eight SPLASH-2 [25] benchmarks using a single-address-space programming model have been compiled for a SPARC V7 architecture. They are listed in Table 2. Data is placed in main memory using a first-touch policy [17]. The SPLASH-2 benchmarks are optimized, so that this policy is close to being optimal in minimizing the number of remote accesses. We have done a small modification to LU. With the original LU code, the first-touch placement resulted in all pages being local for cluster 0, because the master ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M.L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proceedings of the 9th International Parallel Processing Symposium. April 1995


Mechanisms for Distributed Shared Memory - Reinhardt (1996)   (4 citations)

....possible software protocol for Tempest systems. In Stache, each shared page has a unique home node. Currently, Stache provides two home node placement algorithms. The first assigns pages to nodes round-robin as they are allocated. The second algorithm, a simple first-touch-migrate-once scheme [MKBS95], attempts to reduce communication by placing each page on a node that references it. In this algorithm, the first node to access a page is the initial home. Unfortunately, all shared data written during the sequential initialization phase ends up on one node. To redistribute this data, the ....

Michael Marchetti, Leonidas Kontothanassis, Ricardo Bianchini, and Michael L. Scott. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems. In Ninth International Parallel Processing Symposium, April 1995.


AS-COMA: An Adaptive Hybrid Shared Memory Architecture - Kuo, Carter, Kuramkote.. (1998)   (4 citations)

....access cache (RAC) on the DSM controller. Applications that suffer a large number of conflict misses to remote data, e.g. due to the limited amount of caching of remote data, perform poorly on CC-NUMAs [3]. Unfortunately, these applications are fairly common [3, 10]. Careful page allocation [1, 7], migration [13] or replication [13] can alleviate this problem by selecting or modifying the choice of home node for a given page of data, but these techniques have to date only been successful for read-only or non-shared pages. The conflict miss cost in the CC-NUMA model is represented by (N ....

....page size is 4 kilobytes. All three hybrid architectures we study adopt BSD4.4's page allocation mechanism and paging policy [8] with minor modifications. Free min and free target (see Section 3) were set to 5% and 7% of total memory, respectively. We extended the first-touch allocation algorithm [7] to distribute home pages equally. The modeled processor and DSM engine are clocked at 120MHz. The system bus modeled is HP's Runway bus, which is also clocked at 120MHz. All cycle counts reported herein are with respect to this clock. The characteristics of the L1 cache, RACs, and network that we ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. Scott. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems. In Proceedings of the Ninth ACM/IEEE International Parallel Processing Symposium (IPPS), Apr. 1995.


Eliminating Useless Messages in Write-Update Protocols .. - Bianchini, LeBlanc.. (1994)   (3 citations)  Self-citation (Bianchini)

....of proliferation updates is to assure that one of the processors sharing a block is the home node for that block. In fact, the processor that writes to the block the most should be made the home of the block. Page placement and migration techniques, such as presented in [Chandra et al. 1994; Marchetti et al. 1994] can be used to implement this strategy successfully in many cases. If applied to SOR, for instance, this strategy would have eliminated about half of the useless updates in the program. Another way in which proliferation updates can be eliminated is by combining writes to the same words in ....

M. Marchetti, L. I. Kontothanassis, R. Bianchini, and M. L. Scott, "Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems," Technical Report 535, Department of Computer Science, University of Rochester, September 1994.


The Effect of Network Total Order, Broadcast, and.. - Stets, Dwarkadas, ..   Self-citation (Kontothanassis Scott)

....use of remote write, since efficient polling can be implemented on other network interfaces [10, 26] that lack the ability to write to arbitrary, user-defined locations. 2.3.5 CSM-MS-Mg and CSM-None-Mg: Home Node Migration. All of the above protocol variants use first-touch home node assignment [18]. Home assignment is extremely important because processors on the home node write directly to the master copy and so do not incur costly twin and diff overheads. If a page has multiple writers during the course of execution, protocol overhead can potentially be reduced by migrating the home node to ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proceedings of the Ninth International Parallel Processing Symposium, Santa Barbara, CA, April 1995.


The Effect of Network Total Order, Broadcast, and.. - Stets, Dwarkadas, ..   Self-citation (Kontothanassis Scott)

....message polling mechanism, described above, should be considered independent of remote write; similarly efficient polling can be implemented on other networks [10, 30]. 2.3.5. CSM-MS-Mg and CSM-None-Mg: Home Node Migration. All of the above protocol variants use first-touch home node assignment [20]. Home assignment is extremely important because processors on the home node write directly to the master copy and so do not incur the overhead of twins and diffs. If a page has multiple writers during the course of execution, protocol overhead can potentially be reduced by migrating the home node ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proc. of the 9th Intl. Parallel Processing Symp., Santa Barbara, CA, Apr. 1995.


The Effect of Network Total Order, Broadcast, and.. - Stets, Dwarkadas, ..   Self-citation (Kontothanassis Scott)

....Our message polling mechanism, described above, should be considered independent of remote write; similarly efficient polling can be implemented on other networks [10, 30]. 2.3.5 CSM-MS-Mg and CSM-None-Mg: Home Node Migration. All of the above protocol variants use first-touch home node assignment [20]. Home assignment is extremely important because processors on the home node write directly to the master copy and so do not incur the overhead of twins and diffs. If a page has multiple writers during the course of execution, protocol overhead can potentially be reduced by migrating the home node ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proceedings of the Ninth International Parallel Processing Symposium, Santa Barbara, CA, April 1995.


Using Memory-Mapped Network Interfaces to Improve the.. - Kontothanassis, Scott (1996)   (24 citations)  Self-citation (Kontothanassis Scott)

....with the directory information. Pages are initially assigned to home nodes in round-robin order, but are moved by the operating system to the first processor to access the page after the program has completed its initialization phase. This simple placement policy has been shown to work quite well [20]; it reduces the expected cost of a cache miss by guaranteeing that no page is assigned to a node whose processor does not use it. The directory information for a page includes a list of the current readers and writers, and an indication of the page's global state, which may be one of the ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. 9th Intl. Parallel Processing Symp., Santa Barbara, CA, Apr. 1995.
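The Cashmere placement described in the Kontothanassis and Scott excerpt above starts with round-robin homes and then moves each page, once, to the first processor that touches it after initialization. A minimal sketch, assuming pages are numbered `0..num_pages-1` and that the trace passed in contains only post-initialization references; `place_pages` is a hypothetical name, and the dictionary update merely stands in for the operating system's page move:

```python
from typing import Dict, Iterable, Set, Tuple


def place_pages(num_pages: int, num_nodes: int,
                post_init_trace: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Round-robin initial homes, then a one-time move to the first
    node that references each page after initialization."""
    # Initial assignment: page i lives on node i mod num_nodes.
    home = {page: page % num_nodes for page in range(num_pages)}
    moved: Set[int] = set()
    for node, page in post_init_trace:
        if page not in moved:
            home[page] = node  # move (or confirm) the page once, then freeze it
            moved.add(page)
    return home


print(place_pages(4, 2, [(1, 0), (0, 0), (0, 3)]))  # {0: 1, 1: 1, 2: 0, 3: 0}
```

Pages 1 and 2 are never referenced after initialization in this trace, so they keep their round-robin homes.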


High Performance Software Coherence for Current and Future .. - Kontothanassis, Scott (1994)   (6 citations)  Self-citation (Kontothanassis Scott)

....without caches, these systems migrate and replicate pages in the manner of distributed shared memory systems, but also make online decisions between page movement and uncached reference. We have experimented with dynamic page movement in conjunction with software coherence on NCC-NUMA machines [24], and have found that while appropriate placement of a unique page copy reduces the average cache fill cost appreciably, replication of pages provides no significant benefit in the presence of hardware caches. Moreover, we have found that relaxed consistency greatly reduces the opportunities for ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proceedings of the Ninth International Parallel Processing Symposium, Santa Barbara, CA, April 1995.


Software Shared Memory Support on Clusters of Symmetric.. - Robert Stets   Self-citation (Kontothanassis Scott)

....Their coherence protocol was derived from TreadMarks [3]. Their findings stated that for clustering to provide significant benefits, reduction in inter-node messages and bandwidth requirements must be proportional to the degree of clustering. In simulating a similar system, Karlsson and Stenström [12] found that the limiting factor in performance was the latency rather than the bandwidth of the message-level interconnect. The MGS system from MIT was the first implementation of a layered coherence protocol [20]. The system uses a multi-writer protocol, similar to Munin, to manage coherence ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proceedings of the Ninth International Parallel Processing Symposium, Santa Barbara, CA, April 1995.


High Performance Software Coherence for Current and Future .. - Kontothanassis, Scott (1994)   (6 citations)  Self-citation (Kontothanassis Scott)

....without caches, these systems migrate and replicate pages in the manner of distributed shared memory systems, but also make online decisions between page movement and remote reference. We have experimented with dynamic page movement in conjunction with software coherence on NCC-NUMA machines [33], and have found that while appropriate placement of a unique page copy reduces the average cache fill cost appreciably, replication of pages provides no significant benefit in the presence of hardware caches. Moreover, we have found that relaxed consistency greatly reduces the opportunities for ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. TR 535, Computer Science Department, University of Rochester, September 1994. Submitted for publication.


VM-Based Shared Memory on Low-Latency.. - Kontothanassis.. (1997)   (4 citations)  Self-citation (Kontothanassis Scott)

....impact on performance. The home node itself can access the page directly, while the remaining processors have to use the slower Memory Channel interface. We assign home nodes at run time, based on which processor first touches a page after the program has completed any initialization phase [27]. More detailed information on the Cashmere protocol (and its network-interface-specific variants) can be found in other papers [21, 20]. 2.2 TreadMarks. TreadMarks is a distributed shared memory system based on lazy release consistency (LRC) [19]. Lazy release consistency is a variant of release ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proceedings of the Ninth International Parallel Processing Symposium, Santa Barbara, CA, April 1995.


Efficient Use of Memory-Mapped Network Interfaces.. - Hardavellas.. (1997)   (1 citation)  Self-citation (Kontothanassis Scott)

....choice of home node is important. Ideally, a page should be placed at the node that accesses it the most. At the very least, it should be placed at a node that accesses it some. Cashmere currently uses a first-touch-after-initialization policy, resulting in reasonably good home node placement [15]. A fixed choice of home node may lead to poor performance for migratory data in Cashmere and AURC, because of high write-through traffic. Similarly, a home node for directory entries in Shasta implies a 3-way request for invalid data, which is directed through the home node to the current dirty ....

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proc. of the 9th Intl. Parallel Processing Symp., Apr. 1995.


Evaluation of the Memory Page Migration Influence in.. - Corbalan, Martorell.. (2002)

No context found.

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. L. Scott, "Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems", Proceedings of the 9th International Parallel Processing Symposium, pp. 480-485, Santa Barbara, CA, April 1995.


A Case for User-Level Dynamic Page Migration - Nikolopoulos, Papatheodorou.. (2000)   (1 citation)

No context found.

M. Marchetti, L. Kontothanassis, R. Bianchini, and M. Scott. Using Simple Page Placement Policies to Reduce the Cost of Cache Fills in Coherent Shared-Memory Systems. In Proc. of the 9th International Parallel Processing Symposium, April 1995.


Stable Performance For CC-NUMA Using First Touch Page.. - Sarah Talbot   (1 citation)

No context found.

Michael Marchetti, Leonidas Kontothanassis, Ricardo Bianchini, and Michael L. Scott. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems. In 9th International Parallel Processing Symposium (IPPS), Santa Barbara, CA, pages 480--485, April 1995.

CiteSeer.IST - Copyright Penn State and NEC