Z. Vranesic et al. The NUMAchine Multiprocessor. Technical Report CSRI-324, Computer Systems Research Institute, University of Toronto, 1995.
.... of this type have been introduced, such as the Convex SPP1000 [9], the Kendall Square Research KSR1/2 [15], and the Cray T3D [10]. Numerous research systems also exist, including the Stanford DASH [19] and FLASH [14], the MIT Alewife [1], and the University of Toronto Hector [29] and NUMAchine [28]. In such systems, the memory is physically distributed to provide scalability, but all processors share a common global address space across the entire memory, as shown in Figure 1. High-speed caches lower the effective access latency for both local and remote memory and reduce contention. ....
Z. Vranesic et al. The NUMAchine multiprocessor. Tech. Rep. CSRI-324, Computer Systems Research Institute, University of Toronto, Canada, April 1995.
....cards. Figure B.4: 2nd resistor to be added to the first batch of the Proc R3 cards. Figure B.5: 3rd resistor to be added to the first batch of the Proc R3 cards. Figure B.6: Schematic of 2.1 V supply. Appendix C: Using the Powerup Controller. C.1 Overview: This document describes the use of the powerup controller designed for ....
Zvonko G. Vranesic et al. The NUMAchine Multiprocessor. Tech. Rep. CSRI-324, Computer Systems Research Institute, University of Toronto, Toronto, Ontario, Canada, April 1995.
.... (SSMMs) have become increasingly viable as platforms for high-performance computing by efficiently supporting coherent shared memory in hardware for large numbers of processors [2]. Examples include the Convex SPP1000 [3], the Stanford FLASH [5], and the University of Toronto NUMAchine [12]. Although the memory in SSMMs is logically shared by all processors, it is physically distributed to provide scalability, as shown in Figure 1. As a result, memory accesses are non-uniform: access latency for remote memory is considerably higher than for local memory. SSMMs heavily rely on data ....
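The non-uniformity described in this context can be made concrete with a simple weighted-average latency model. This is only an illustrative sketch: the function name `effective_latency` and the latency values are hypothetical placeholders, not NUMAchine measurements. The model also ignores caches, contention, and multi-level remote distances.

```python
def effective_latency(local_fraction: float, t_local: float, t_remote: float) -> float:
    """Average access latency when `local_fraction` of accesses go to local memory.

    A simple weighted-average model (an assumption for illustration only);
    it ignores caching, contention, and differing remote distances.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote


if __name__ == "__main__":
    # Hypothetical latencies in processor cycles, chosen only for illustration.
    T_LOCAL, T_REMOTE = 50.0, 200.0
    for f in (0.25, 0.5, 0.9):
        print(f"local fraction {f:.2f}: {effective_latency(f, T_LOCAL, T_REMOTE):.1f} cycles")
```

With these placeholder numbers, raising the local fraction from 0.25 to 0.9 lowers the average latency from 162.5 to 65.0 cycles. This illustrates why data placement matters on such machines.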
Z. Vranesic et al. The NUMAchine multiprocessor. Tech. Rep. CSRI-324, Computer Systems Research Institute, University of Toronto, Canada, April 1995.
....portion of shared memory is less than the latency for accessing non-local or remote portions. Hence, careful placement of data in shared memory is essential for scaling performance. Examples of SSMMs include the Stanford Dash [1] and Flash [2], the University of Toronto Hector [3] and NUMAchine [4], the KSR1 [5], and the Convex Exemplar [6]. Automatic parallelization of scientific applications on bus-based shared memory multiprocessors has been mainly concerned with the detection of parallelism and the scheduling of parallel loop iterations [7, 8]. Hence, it is not surprising that on SSMMs ....
....surprising that on SSMMs, issues related to data placement have been ignored by compilers and delegated to the operating system as part of page management. Policies such as first-hit and round-robin place pages in the physically distributed shared memory as these pages are initially accessed [3, 4, 9]. However, such policies too often fail to enhance memory locality and instead cause contention and hot spots, leading to poor performance [9]. Operating system policies are oblivious to application data access patterns and manage data at too coarse a granularity. Data ....
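To illustrate how these two page placement policies can go wrong, here is a minimal sketch of first-hit and round-robin placement as the snippet describes them, with pages placed as they are first accessed. Several details are hypothetical assumptions, not taken from the cited papers:

- the function names
- the 16-page, 4-node configuration
- the block-partitioned access pattern
- the serial initialization by node 0

```python
from collections import Counter


def first_hit(first_toucher: list[int]) -> list[int]:
    """First-hit: page p is placed on node first_toucher[p], the node that touched it first."""
    return list(first_toucher)


def round_robin(first_access_order: list[int], num_nodes: int) -> list[int]:
    """Round-robin: the k-th page to be touched is placed on node k mod num_nodes."""
    placement = [0] * len(first_access_order)
    for k, page in enumerate(first_access_order):
        placement[page] = k % num_nodes
    return placement


def locality(placement: list[int], main_user: list[int]) -> float:
    """Fraction of pages that live on the node that later accesses them most."""
    matches = sum(1 for home, user in zip(placement, main_user) if home == user)
    return matches / len(placement)


if __name__ == "__main__":
    PAGES, NODES = 16, 4
    # Hypothetical access pattern: node n mainly uses the block of pages [4n, 4n + 4).
    main_user = [p * NODES // PAGES for p in range(PAGES)]
    # Hypothetical serial initialization: node 0 touches every page first, in page order.
    first_toucher = [0] * PAGES
    first_access_order = list(range(PAGES))

    fh = first_hit(first_toucher)
    rr = round_robin(first_access_order, NODES)
    print("first-hit pages per node:", dict(Counter(fh)))    # {0: 16}
    print("round-robin pages per node:", dict(Counter(rr)))  # {0: 4, 1: 4, 2: 4, 3: 4}
    print("first-hit locality:", locality(fh, main_user))    # 0.25
    print("round-robin locality:", locality(rr, main_user))  # 0.25
```

Under these assumptions, first-hit places all 16 pages on node 0, creating a hot spot. Round-robin spreads the pages evenly. Yet both leave only 25% of pages on the node that later uses them, because neither policy looks at the application's access pattern.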
Z. Vranesic et al. The NUMAchine Multiprocessor. Technical Report CSRI-324, Computer Systems Research Institute, University of Toronto, 1995.
....surprising that on SSMMs, issues related to data placement have been ignored by compilers and delegated to the operating system as part of page management. Policies such as first-hit and round-robin place pages in the physically distributed shared memory as these pages are initially accessed [3, 4, 9]. However, operating system policies are oblivious to application data access patterns and manage data at too coarse a granularity. Too often, such policies fail to enhance memory locality, cause contention and hot spots, and lead to poor performance [9]. Data partitioning ....
Z. Vranesic et al. The NUMAchine Multiprocessor. Technical Report CSRI-324, Computer Systems Research Institute, University of Toronto, 1995.
.... have emerged as viable platforms for a variety of supercomputing applications; example systems include the Stanford Dash [10] and Flash [6], the Cray T3D [4], the Convex Exemplar SPP [3], the Kendall Square Research KSR1/2 [7], and the University of Toronto Hector [16] and NUMAchine [15]. These multiprocessors provide hardware support for shared memory parallel programming, even though the memory is physically distributed throughout the system, as shown in Figure 1. The access latency for remote memory is greater than the latency for local or nearby memory. The effects of ....
Z. Vranesic et al. The NUMAchine multiprocessor. Tech. Rep. CSRI-324, Computer Systems Research Institute, University of Toronto, Canada, April 1995.
No context found.
Z. Vranesic, S. Brown, M. Stumm, S. Caranci, A. Grbic, R. Grindley, M. Gusat, O. Krieger, G. Lemieux, K. Loveless, N. Manjikian, Z. Zilic, T. Abdelrahman, B. Gamsa, P. Pereira, K. Sevcik, A. Elkateeb, and S. Srbljic. The NUMAchine Multiprocessor. Technical Report CSRI-324, Computer Systems Research Institute, University of Toronto, April 1995.
....to consider memory allocation issues in Section 4. Although we focus primarily on the Tornado kernel, it is important to note that these components are also used in the implementation of the Tornado system servers. Tornado is fully implemented (in C++) and runs on our 16-processor NUMAchine [14, 31] and on the SimOS simulator [27]; it supports most of the facilities (e.g., shells, compilers, editors) and services (pipes, TCP/IP, NFS, file system) one expects. Experimental results that demonstrate the performance benefits of our design are presented in Section 7, followed by an examination ....
.... 1300 for two remote interrupt exchanges) plus the cost of four cache transfers (a pair for each of the remote PPC call and return exchanges). 7 Experimental Results. The results presented in this paper are based on both hardware tests on a locally developed 16-processor NUMAchine prototype [14, 31] and the SimOS simulator from Stanford [27]. The NUMAchine architecture consists of a set of stations (essentially small, bus-based multiprocessors) connected by a hierarchy of rings. It uses a novel selective broadcast mechanism to efficiently handle invalidations, as well as broadcast data. The ....
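The snippet does not say how the selective broadcast encodes its destinations. As a purely hypothetical illustration, assume two things:

- Each station is identified by a (ring, station) pair in a two-level ring hierarchy.
- A message's target set is encoded as one bitmask per level.

ORing the destinations' bits keeps the mask small and fixed-size, at the cost of reaching a superset of the intended stations. The names `routing_mask` and `reached` are made up for this sketch and do not describe NUMAchine's actual encoding.

```python
from itertools import product


def routing_mask(destinations: list[tuple[int, int]]) -> tuple[int, int]:
    """OR each destination's ring bit and station bit into two per-level masks."""
    ring_bits = 0
    station_bits = 0
    for ring, station in destinations:
        ring_bits |= 1 << ring
        station_bits |= 1 << station
    return ring_bits, station_bits


def reached(mask: tuple[int, int], num_rings: int, stations_per_ring: int) -> set[tuple[int, int]]:
    """Stations whose ring bit and station bit are both set receive the message."""
    ring_bits, station_bits = mask
    return {
        (r, s)
        for r, s in product(range(num_rings), range(stations_per_ring))
        if (ring_bits >> r) & 1 and (station_bits >> s) & 1
    }


if __name__ == "__main__":
    # Hypothetical sharers of a cache line that must be invalidated.
    sharers = [(0, 1), (2, 3)]
    targets = reached(routing_mask(sharers), num_rings=4, stations_per_ring=4)
    print(sorted(targets))  # [(0, 1), (0, 3), (2, 1), (2, 3)]
```

Here two sharers produce a mask that reaches four stations: the two intended ones plus two extra stations that would simply discard the message. Under this assumed encoding, that is the trade-off: some unnecessary traffic in exchange for a compact destination field.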
Z. Vranesic et al. The NUMAchine multiprocessor. Technical Report CSRI-324, Computer Systems Research Institute, University of Toronto, 1995.
....logic devices. Since this machine will serve as a research vehicle for parallel software development, a number of hardware features to enhance experimentation have been included in the design. 1 Introduction. This paper describes the hardware implementation of a multiprocessor called NUMAchine [7]. In NUMAchine, processors, caches, and memory are physically distributed throughout the system. The memory is shared by all processors, but the access latency depends on location. Hardware automatically maintains coherent copies of data throughout the system. This type of multiprocessor is known ....
Z. Vranesic et al. The NUMAchine Multiprocessor. Technical Report CSRI-324, Computer Systems Research Institute, University of Toronto, 1995.
....to consider memory allocation issues in Section 4. Although we focus primarily on the Tornado kernel, it is important to note that these components are also used in the implementation of the Tornado system servers. Tornado is fully implemented (in C++) and runs on our 16-processor NUMAchine [13, 29] and on the SimOS simulator [25]; it supports most of the facilities (e.g., shells, compilers, editors) and services (pipes, TCP/IP, NFS, file system) one expects. Experimental results that demonstrate the performance benefits of our design are presented in Section 7, followed by an examination ....
.... 1300 for two remote interrupt exchanges) plus the cost of four cache transfers (a pair for each of the remote PPC call and return exchanges). 7 Experimental Results. The results presented in this paper are based on both hardware tests on a locally developed 16-processor NUMAchine prototype [13, 29] and the SimOS simulator from Stanford [25]. The NUMAchine architecture consists of a set of stations (essentially small, bus-based multiprocessors) connected by a hierarchy of rings. It uses a novel selective broadcast mechanism to efficiently handle invalidations, as well as broadcast data. The ....
Z. Vranesic et al. The NUMAchine multiprocessor. Technical Report CSRI-324, Computer Systems Research Institute, University of Toronto, 1995.
No context found.
Z. Vranesic et al. The NUMAchine Multiprocessor. Technical Report CSRI-324, Computer Systems Research Institute, University of Toronto, 1995.
No context found.
Z. Vranesic et al. The NUMAchine multiprocessor. Tech. Rep. CSRI-324, Computer Systems Research Institute, University of Toronto, Canada, April 1995.
No context found.
Zvonko G. Vranesic et al. The NUMAchine Multiprocessor. Tech. Rep. CSRI-324, Computer Systems Research Institute, University of Toronto, Toronto, Ontario, Canada, April 1995.
CiteSeer.IST - Copyright Penn State and NEC