8 citations found.
B. Gamsa. Region-oriented main memory management in shared-memory NUMA multiprocessors. Master's thesis, University of Toronto, 1992.

This paper is cited in the following contexts:
Evaluating Memory System Performance of a Large Scale NUMA.. - Karim Harzallah (1993)

....data according to how it is used. Each class is then associated with specific consistency maintenance requirements. They implemented ADAPT (Automatic Data Allocation and Partitioning Tool) to analyse parallel loops for the BBN TC200, a scalable shared memory multiprocessor. In related work, Gamsa [Gam92] suggested involving the programmer in the process of coherency management, taking advantage of the availability of knowledge about the data structures and the sharing patterns among the threads. It is the user's task to define the domain of sharing and to call the proper synchronization routines ....

....time (a key parameter that controls the backoff mechanism protocol). [Figure 2: The Hector Prototype. Labels: ring, bus 1, bus 2; T1: local access; T2: on-station access; T3: off-station access.] 2.2 The Workload Model. It is often possible to partition an application into roughly steady state phases [Gam92] where each phase involves a different memory access pattern. To estimate the total response time, we sum the response time of each phase. Such partitioning helps pinpoint memory system bottlenecks, since the performance attained by an application is largely defined by its memory reference ....

B. Gamsa. Region-oriented main memory management in shared-memory NUMA multiprocessors. Master's thesis, University of Toronto, 1992.


Algorithms for Dynamic Software Cache Coherence - Harjinder Sandhu (1995)   (1 citation)

....a set of annotations that also mark all series of references to those regions. Similar annotation models have been proposed for managing shared data in distributed shared memory (DSM) environments [4][10], for exploiting locality through main memory in scalable shared memory multiprocessors [11][8], and for improving the performance of hardware cache coherence protocols [14]. In this work, we focus exclusively on the use of this model for managing caches entirely through software in shared memory architectures. Relative to other forms of cache management, the Shared Regions approach ....

....the Midway system used these same annotations to allow the system to exploit multiple consistency models including entry consistency [4]. Other work has focussed on exploiting locality among the distributed memory units of hardware distributed shared memory multiprocessors. For instance, in Romm [11], replication of shared regions among memory units was used to reduce memory access latency. The Hybrid Hardware Software protocol presented by Chandra et al. [8] also uses shared regions annotations and bulk data transfers for placing regions in the memory unit of a processor that is to access it ....

B. Gamsa. Region-oriented main memory management in shared-memory NUMA multiprocessors. Master's Thesis, University of Toronto, 1992.


Multiprogrammed Parallel Application Scheduling in NUMA.. - Brecht (1994)   (3 citations)

....cache line reads consist of one request packet and two reply packets (in order to return the entire 16 byte cache line). Note that the delay switches are not in effect during local or on station requests. Detailed descriptions of the Hector memory system are available elsewhere [Vranesic1991][Gamsa1992][Stumm1993]. [Table columns: Delay | 32bit load | 32bit store | cache load | cache writeback. Row "local": 10 | 10 | 19 | 19 ....]

B. Gamsa, Region-Oriented Main Memory Management in Shared-Memory NUMA Multiprocessors, M.Sc. Thesis, University of Toronto, Toronto, Ontario, September, 1992.


On the Importance of Parallel Application Placement in NUMA.. - Brecht (1993)   (7 citations)

....system is symmetric, asymmetry is introduced, since cache line reads consist of one request packet but two reply packets (in order to return the entire 16 byte cache line). Note that the delay switches have no effect on local or on station requests. For more detailed descriptions of the Hector see [4][17][12]. To provide insight into the importance of localization on a slightly larger system and in other shared memory multiprocessors we set the delay switches to 16 and conduct the same localized versus non localized placement experiment. The results of this experiment are shown in ....

B. Gamsa, Region-Oriented Main Memory Management in Shared-Memory NUMA Multiprocessors, M.Sc. Thesis, University of Toronto, Toronto, Ontario, September, 1992.


Performance, Safety and Idioms in Parallel Programming Systems - Lu (1995)

....In terms of performance, cache coherent shared memory in hardware is fast, but it is not a panacea. Hardware cache coherence does not address the issue of main memory locality when the working set exceeds the cache size, or the interaction between an application and main memory management policies [Gam92]. Furthermore, cache coherent shared memory is not generally supported on NOWs, which will continue to be ubiquitous due to their cost performance advantages. Good parallel algorithms and fast hardware must be supplemented with the flexibility to address the limitations (or absence) of the ....

B. Gamsa. Region-Oriented Main Memory Management in Shared-Memory NUMA Multiprocessors. Master's Thesis, University of Toronto, 1992.


Computation and Data Partitioning on Scalable Shared Memory .. - Tandri, Abdelrahman (1995)   (1 citation)

....one cluster. Each processor memory pair consists of a Motorola MC88100 CPU, a 16 KB instruction cache, a 16 KB data cache and 4 MB of the globally addressable memory. The hardware provides no support for cache coherence. The coherence of data is maintained by software at cache line granularity [10]. Data distributions are implemented using the array allocation techniques described in [21, 3]. 5.1 Contention and Synchronization Conscious Distribution. The ADI program has two phases with parallelism along orthogonal dimensions in each phase. It operates on three 2-dimensional arrays A, B and ....

B. Gamsa. Region-oriented main memory management in shared-memory NUMA multiprocessors. Master's thesis, Department of Computer Science, University of Toronto, Toronto, CANADA, 1992.


Hierarchical Clustering: A Structure for Scalable.. - Unrau, Krieger.. (1993)   (22 citations)  Self-citation (Gamsa)

....resources. We have since experimented with decoupling the cluster size and the policies. For example, it is now possible to request that pages of a memory region be replicated or migrated to the local processor, even if there is already a copy of the page on another processor in the local cluster [Gamsa 1992]. For some applications, we have found that this approach results in substantial performance improvements. Fixed cluster sizes: When Hurricane is booted, a fixed-sized cluster is established for all resources. We believe that more flexibility could result in performance improvements; in particular, ....

Gamsa, B. 1992. Region-oriented main memory management in shared-memory NUMA multiprocessors.


Hierarchical Clustering: A Structure for Scalable.. - Unrau, Krieger.. (1995)   (22 citations)  Self-citation (Gamsa)

....We have since experimented with decoupling the cluster size and the policies. For example, it is now possible to request that pages of a memory region be replicated or migrated to the local processor, even if there is already a copy of the page on another processor in the local cluster [21]. For some applications, we have found that this approach results in substantial performance improvements. Fixed cluster sizes: When HURRICANE is booted, a fixed-sized cluster is established for all resources. We believe that more flexibility could result in performance improvements; in ....

Benjamin Gamsa. Region-oriented main memory management in shared-memory NUMA multiprocessors. Master's thesis, Department of Computer Science, University of Toronto, Toronto, September 1992.
