Michael Stumm, Zvonko Vranesic, Ron White, Ronald Unrau, and Keith Farkas. Experience with the Hector Multiprocessor. In Proceedings of the International Parallel Processing Symposium, Parallel Processing Fair, pages 9--16, Newport Beach, California, April 1993.
....a sophisticated tradeoff of technology factors (i.e. best possible pin counts, clock speeds) and the links should be as fast as possible, allowing only simple networks like tori or hierarchical rings. Representatives of this line of research are the Cosmic cube project [5] or the Hector project [19]. After many research papers on this topic, it appears that the essential question is still open and needs to be readdressed in the light of commodity clusters incorporating the technology factors of commodity cluster interconnects. In related studies, [7] compares different networking ....
Michael Stumm, Zvonko Vranesic, Ron White, Ron Unrau, and Keith I. Farkas. Experiences with the Hector Multiprocessor. In Proceedings of the Seventh International Parallel Processing Symposium, April 1994.
....a sophisticated tradeoff of technology factors (i.e. best possible pin counts, clock speeds) and the links should be as fast as possible, allowing only simple networks like tori or hierarchical rings. Representatives of this line of research are the Cosmic cube project [5] or the Hector project [19]. Still, after many research papers dedicated to this topic, it appears to us that the essential question is still open and needs to be readdressed in the light of commodity clusters incorporating the technology factors of commodity cluster interconnects. In related studies, [8] compares ....
Michael Stumm, Zvonko Vranesic, Ron White, Ron Unrau, and Keith I. Farkas. Experiences with the Hector Multiprocessor. In Proceedings of the Seventh International Parallel Processing Symposium, April 1994.
....[Fig. 2.1: Hector station organization. Figure labels: Ring Inter..., To next station, From previous station.] 2.1.1 Observations about Hector. Upon completion of Hector, the architecture and implementation decisions were evaluated in a paper by Stumm et al. entitled Experiences with the Hector Multiprocessor [1]. The paper concluded that there were both good and bad decisions made in the design and implementation of Hector. Choosing a hierarchical ring interconnect structure, and the simplicity of design were both considered good decisions. Tolerating unreliable communication and underestimating memory ....
.... set wb[3, 0] GND; clr wb[3, 0] GND; sma add oe n =VCC; data valid = GND; cmd valid out = GND; with states ( read data = b 001 , read cmd= b 010 , write data= b l 11 , write cmd= b l 10 ) reset n = lGLOBAL(lreset in) sram sm) clk =clk; sram sm) reset = Ireset n; req stn comp[1 ] = req stn[1 ] # req stn[3] ring. fifo AF = ring fifo AF n ring fifo HF; SRAM dat read ff. clk,clrn,d,ena) clk, reset n (SRAM dat read ff clk sram dat en) VCC, SRAM dat req) clk sram dat en. clk, clrn) clk, reset n) clk Sram dat en.d = SRAM dat read ff READ DATA) ....
[Article contains additional citation context not shown here]
M. Stumm, Z. Vranesic, D. Lewis and R. White, "Experiences with the Hector multiprocessor," in Proc. Intl. Parallel Processing Symposium - Parallel Systems Fair, 1993, pp. 9-16.
....scheduling in SSS CORE. There are other systems that implement 2 level scheduling. Mach [3, 2] uses partitioning 3 processors and a global shared queue within partitions. The operating system provides 2 level scheduling policies by partitioning and partition internal thread queues. Hector [32] connects clusters of processing modules by hierarchical ring network. Hurricane on Hector provides processor pool based scheduling [40, 4]. Its Hierarchical Clustering [35] scheme enables to schedule threads of an application close to each other with granularity of each level of operating system ....
M. Stumm, Z. G. Vranesic, R. White, R. Unrau, and K. Farkas. Experiences with the Hector Multiprocessor. In Proc. Intl. Parallel Processing Symp. Parallel Systems Fair, pp. 9--16, January 1993.
.... of Hector from these is the degree of sophistication of its instrumentation [61] permitting adaptive and intelligent allocation [65]. It should be noted that the Hector distributed run time environment is unrelated to both the Hector multiprocessor project at the University of Toronto [74] and the Hector distributed object project at the University of Queensland [3]. D. Predicting Workstation Idle Periods. An important aspect of predictive scheduling is to find idle workstations in a distributed system. Theimer and Lantz [75] have proposed ways of finding idle hosts in ....
Michael Stumm, Zvonko Vranesic, Ron White, Ronald Unrau, and Keith Farkas, "Experiences with the Hector Multiprocessor", Proceedings of the 7th International Parallel Processing Symposium, Newport Beach, CA, April, 1994.
....is a research project. It should also be mentioned at this point that there are (at least) two other research projects using the name Hector doing work in distributed computing and multiprocessing. The first is the well known Hector multiprocessor project at the University of Toronto [8] [9]. The second is a system for supporting distributed objects in Python at the CRC for Distributed Systems Technology at the University of Queensland [10]. The Hector environment described in this paper is unrelated to either. Three other research systems can allocate tasks across NOWs and ....
M. Stumm, Z. Vranesic, R. White, R. Unrau, and K. Farkas, "Experiences with the Hector Multiprocessor", Proceedings of the 7th International Parallel Processing Symposium, IEEE Computer Society Press, Los Alamitos, CA, 1994.
....Hector is a research project. It should also be mentioned at this point that there are (at least) two other research projects using the name Hector doing work in distributed computing and multiprocessing. The first is the well known Hector multiprocessor project at the University of Toronto [8], [9]. The second is a system for supporting distributed objects in Python at the CRC for Distributed Systems Technology at the University of Queensland [10]. The Hector environment described in this paper is unrelated to either. Three other research systems can allocate tasks across NOWs and ....
M. Stumm, Z. Vranesic, R. White, R. Unrau, and K. Farkas, "Experiences with the Hector Multiprocessor", CSRI Technical Report CSRI--276, CSRI, University of Toronto, Toronto Canada, October 1992.
....y Sun Microsystems, Inc. Mt. View, CA, bad eng.sun.com z Dept. of Electrical Engineering, Stanford University, Stanford, CA, flynn mimd.stanford.edu support for a single global address space, which is preferred by many programmers. Multiprocessors such as KSR1 [8], DASH [16], and Hector [23, 24] have shown the feasibility of scalable shared memory systems with communication latencies orders of magnitude smaller than shared virtual memory emulations on distributed memory machines. Shared memory systems, however, generally support only consumer initiated communication; when a process needs ....
M. Stumm, Z. Vranesic, R. White, R. Unrau, and K. Farkas. Experiences with the Hector multiprocessor. In Proceedings of the Parallel Systems Fair, pages 10--17, Apr. 1993. Held in conjunction with the 7th International Parallel Processing Symposium.
....systems. The times for the KSR1 are in 50 nanosecond cycles and are the times required to read one 128 byte cache line [Dunigan1992]. DASH and Hector have 30 and 60 nanosecond cycle times respectively and the latencies shown in Table 2.1 are for loading one 16 byte cache line [Lenoski1992] [Stumm1993]. This table is not shown to compare these systems but to illustrate two of the key issues related to the use of shared memory NUMA multiprocessors: 1) The time to access remote memory can be significant. 2) The time to access remote memory depends on the distance to the location being accessed ....
....reads consist of one request packet and two reply packets (in order to return the entire 16 byte cache line). Note that the delay switches are not in effect during local or on station requests. Detailed descriptions of the Hector memory system are available elsewhere [Vranesic1991] [Gamsa1992] [Stumm1993].
Delay    32bit load    32bit store    cache load    cache writeback
local    10            10             19            19
....
M. Stumm, Z. Vranesic, R. White, R. Unrau, and K. Farkas, "Experiences with the Hector Multiprocessor", Proceedings of the International Parallel Processing Symposium Parallel Processing Fair, pp. 9-16, April, 1993.
....cycles for each of these systems. The times for the KSR1 are in 50 nanosecond cycles and are the times required to read one 128 byte cache line [3]. DASH and Hector have 30 and 60 nanosecond cycle times respectively and the latencies shown in Table 1.1 are for loading one 16 byte cache line [6] [12]. This table illustrates two of the key issues related to the use of shared memory NUMA multiprocessors: 1) The time to access remote memory can be significant. 2) The time to access remote memory depends on the distance to the location being accessed (the number of levels of the hierarchy that ....
....symmetric, asymmetry is introduced, since cache line reads consist of one request packet but two reply packets (in order to return the entire 16 byte cache line). Note that the delay switches have no effect on local or on station requests. For more detailed descriptions of the Hector see [4] [17] [12]. To provide insight into the importance of localization on a slightly larger system and in other shared memory multiprocessors we set the delay switches to 16 and conduct the same localized versus non localized placement experiment. The results of this experiment are shown in Figure 1.4. ....
M. Stumm, Z. Vranesic, R. White, R. Unrau, and K. Farkas, "Experiences with the Hector Multiprocessor", Proceedings of the International Parallel Processing Symposium Parallel Processing Fair, pp. 9-16, April, 1993.
....remote memory. Table 2 shows the difference between remote memory access costs and local memory access costs for different architectures configured with 64 processors. On these systems having most of the accessed data local to the accessing processor can be a major factor in improving performance [2, 3, 11]. In parallelizing a loop, it is important to consider the partitioning of both the data space and the loop iteration space, and how both are mapped onto the processors. For good performance, it is essential that the loop partitions and scheduling match the data partitions. Best performance is ....
....be executed, and hence the computation is dynamic. The input values determine the variation of iteration execution time. High variance can cause load imbalance. A matrix size of 800 × 800 integers is processed. The experiments were performed on Hector, a scalable shared memory multiprocessor [14, 11]. Hector consists of sets of processor memory pairs connected together by buses, several buses connected together by local rings, and several local rings connected together by a global ring (see Figure 4). Hector provides a single global physical address space; each memory module contains one ....
Michael Stumm, Zvonko G. Vranesic, Ron White, Ron Unrau, and Keith Farkas. Experiences with the Hector multiprocessor. In International Parallel Processing Symposium, April 1993.
No context found.
Michael Stumm, Zvonko Vranesic, Ron White, Ronald Unrau, and Keith Farkas. Experience with the Hector Multiprocessor. In Proceedings of the International Parallel Processing Symposium, Parallel Processing Fair, pages 9--16, Newport Beach, California, April 1993.
No context found.
Michael Stumm, Zvonko Vranesic, Ron White, Ronald Unrau, and Keith Farkas. Experiences with the Hector Multiprocessor. Technical report, Computer Science Research Institute, University of Toronto, 1992.
No context found.
M. Stumm, Z. Vranesic, R. White, R. Unrau, and K. Farkas. Experiences with the hector multiprocessor. Technical Report 276, University of Toronto, CSRI, available for anonymous ftp from ftp.csri.toronto.edu, October 1992.
No context found.
Michael Stumm, Zvonko Vranesic, Ron White, Ronald Unrau, and Keith Farkas. Experiences with the Hector multiprocessor. In Proceedings of the Parallel Systems Fair at the International Parallel Processing Symposium, pages 10--17, 1993.