G. Gao, K. Likharev, P. Messina, and T. Sterling, "Hybrid technology multithreaded architecture", in Proc. Frontiers '96, (Annapolis, MD), pp. 98-105, (1996).
....system has opened new questions in computation and organization. For existing architectures some problems are well studied subjects, for other problems it is a challenge to understand and make decisions. Moving memory closer to processors (PIMs) [3], creating multi level memory hierarchy (HTMT) [1], and having multithreading at each level of processors in software and hardware gives additional levels of complexity to support the data and task parallelism. There are several well known systems that have similar features to the HTMT architecture or execution model. For example, the Tera ....
....while parallelism focuses on the number of parallel nodes and levels involved in the computation. In addition to these broad issues are the more concrete issues of load balancing, task migration [17], memory distribution and control distribution. 3 HTMT system architecture The HTMT architecture [1] is a new design for a petaflop system that attempts to solve the memory latency problem on a deep memory hierarchy, provide multithreading at the processor level, and instrument distributed processing using a PIM approach. It consists of the following components: 4096 multithreaded CPUs, called ....
Gao, G., K. Likharev, P. Messina, T. Sterling, "Hybrid Technology Multithreaded Architecture," 6th Symp. on Frontiers of Massively Parallel Computation, pp.98-105, Annapolis, MD, 1996.
....pseudo code for application execution and show memory layouts and execution flows for each benchmark. 2. Related work Development of the massively parallel HTMT system has opened new questions in computation and organization. Moving memory closer to processors [18], creating a memory hierarchy [12], and multithreading give additional levels of complexity. Many systems have similar features to the HTMT architecture or execution model. The Tera machine [8] has a similar underlying architecture and supports multithreading, but does not put the memory and processor on the same chip. The ....
....(MPI) or a Linda system [7]. This implies enormous work in designing, testing, and optimizing the system, including concurrency and parallelism [6], load balancing, task migration [5], memory distribution and control distribution. 3. HTMT system architecture The HTMT architecture [12] is a new design for a petaflop system that attempts to solve the memory latency problem on a deep memory hierarchy, provide multithreading, and instrument distributed processing using a PIM approach. It consists of 4096 multithreaded CPUs, called SPELLs (Superconductive Processor ELLements) ....
G. Gao, K. Likharev, P. Messina, and T. Sterling. Hybrid Technology Multithreaded Architecture. In 6th Symp. on Frontiers of Massively Parallel Computation, pages 98--105, 1996.
....some statistics for the early HTMT prototype. Chapter 9 concludes and shows extension to our work for the future. Finally, the thesis also includes references to the literature. 5 CHAPTER 2 THE HTMT SYSTEM STRUCTURE Design and implementation of the HTMT prototype is a part of the HTMT project [6] which attempts to explore and characterize a synthesis of technologies, innovative architectures, and aggressive latency management techniques in a way that could accelerate availability of near petaflops scale computing systems; to develop the architecture, to collect statistics to minimize ....
G. Gao, K. Likharev, P. Messina, T. Sterling, "Hybrid Technology Multithreaded Architecture," 6th Symp. on Frontiers of Massively Parallel Computation, MD, pp. 98-105, Oct. 1996.
.... applications [16, 17]. In the longer term, RSFQ may also provide the speed and power characteristics required by general purpose petaflop scale computing (petaflop = 10^15 floating point operations per second) which is likely to remain beyond the reach of the fastest semiconductor technologies [18, 19]. The primary immediate application of RSFQ logic is digital signal processing. The current state of RSFQ technology favors the design of circuits with a regular topology, limited control circuitry, a small number of distinct cells, and limited interconnections. The analysis of timing in RSFQ ....
G. Gao, K.K. Likharev, P.C. Messina, and T.L. Sterling, "Hybrid Technology Multithreaded Architecture," in: Proc of PetaFlops Architecture Workshop, to be published; see also the Web site http://www.cesdis.gsfc.nasa.gov/petaflops/peta.html.
....are not influenced by clock skew. This advantage is of key importance for the emerging very high speed digital circuits, including those belonging to the superconductor rapid single flux quantum (RSFQ) logic memory family [2]. Recent studies of the possibility of petaflops scale computations ([3], [4]) raise the issue of reliable distributed asynchronous on-chip and chip-to-chip communication media, and micropipelines may be good candidates to occupy that niche. II. Traditional Micropipelines and Their Drawbacks The operation of a traditional micropipeline (Fig. 1) is based on a simple ....
G. Gao, K. K. Likharev, P. C. Messina, and T. L. Sterling, "Hybrid technology multithreaded architecture," in Proc. Frontiers `96, (Annapolis, MD), pp. 98--105, Feb. 1996.
....its uniquely high speed and low power consumption. The petaflops computer presently being designed in the HTMT project combines several innovative technologies: RSFQ processors and networks, semiconductor SRAM and DRAM based processors in memory (PIMs), optical networks, and a holographic memory [2]. According to the preliminary design [3], the RSFQ subsystem of the HTMT machine will consist of 4,096 superconductor processing elements (SPELL) and a self routing multistage packet switching network (CNET). The network will enable any SPELL to access remote memory buffers belonging to other ....
G. Gao, K. K. Likharev, P. C. Messina, and T. L. Sterling, "Hybrid technology multithreaded architecture," in Proc. Frontiers `96, (Annapolis, MD), pp. 98--105, 1996.
....Thruput Nets for Petaflops Computing L. Wittie G. Sazaklis Y. Zhou D. Zinoviev May 1, 1998 1 Motivation This work was undertaken to estimate the complexity of networks that can be used to connect several thousand processing elements and memory interfaces in a Petaflops cryocomputer [3]. Two different network architectures have been studied in detail: multistage banyan networks and truncated (pruned) multidimensional meshes. For each architecture, we simulated many network shapes to determine the maximal aggregate throughput T and average latency avg . We have learned many ....
G. Gao, K. K. Likharev, P. C. Messina, and T. L. Sterling. Hybrid Technology Multithreaded Architecture. In Proc. Frontiers`96, pages 98--105, Annapolis, MD, 1996. Available electronically via anonymous FTP from ftp://rsfq1.physics.sunysb.edu/pub/ieee htm.ps.
.... that may be used as a component in commercial telecommunication switches; 2) to demonstrate that RSFQ technology is capable of conducting large scale projects, and 3) to identify parts and design strategies that may be used in similar projects, e.g. in multiprocessor interconnecting networks [7]. Each architecture has been considered under the same workload of 5.76 Tbps in two different environments (96 × 60 Gbps bit serial channels and 96 × 1.875 Gbps 32 bit parallel channels with parallel to serial and serial to parallel converters) with self routing and without contention ....
G. Gao, K. Likharev, P. Messina, and T. Sterling, Hybrid technology multithreaded architecture, in Proc. Frontiers`96, Annapolis, MD, 1996, pp. 98--105.
....machines, even with the best of CMOS technologies, will require the programmer to deal with multi million way parallelism. Many real applications may not permit such huge levels of parallelism. To attack such problems, a large scale collaboration among several research groups, the HTMT project [1, 5, 19], is focusing on a mixed technology solution where extremely long latencies are possible, and where preemptive activity in the PIM based memory is essential to reduce or eliminate the latency penalties. This paper takes one such proposed PIM architecture, Shamrock, and discusses a possible ....
....on the outside to view the chip as memory in the conventional sense. Again, both of these connections stem naturally from the topology of the individual nodes, and do not require any expensive additional wiring on the chip. The HTMT System The Hybrid Technology Multi Threaded (HTMT) project [1, 5, 19] is a collaborative project among about half a dozen research groups (Cal Tech JPL, U. Delaware, SUNY Stonybrook, Notre Dame, Princeton, plus an association with many other government and industrial labs) to define a system that can reach a petaflops level of performance in significantly less time ....
Gao, G., K. Likharev, P. Messina, T. Sterling, "Hybrid Technology Multithreaded Architecture," 6th Symp. on Frontiers of Massively Parallel Computation, Annapolis, MD, Oct. 25-31, 1996, pp. 98-105.
....in a system of such a physical size makes the system prone to stalling for communication intensive programs. Manuscript received May 1, 1999. The HTMT project and the work described in this paper are supported by DARPA, NSA, and NASA, and in part by NSF grant No. ECS 9700313. The HTMT concept [2] assumes a hierarchical organization of the petaflops computing system (Fig. 1) with multiple levels of distributed memory: holographic data storage (HRAM), semiconductor SRAM and DRAM, and cryomemory (CRAM), as well as three types of processors: SRAM and DRAM based processors in memory (PIMs) ....
....high as 1000 processor cycles) is multithreading. This technique reduces the processor idle time by overlapping the execution of separate tasks called threads. Multithreading and context prefetching have been accepted as the key techniques of latency tolerance in the HTMT program execution model [2] and COOL I instruction set architecture [3]. PIMs find ready threads, allocate the context of a ready thread in CRAM, and initiate its execution in a SPELL. When a SPELL finishes the execution of a thread, an SRAM PIM fetches the results from CRAM into SRAM. All of these multilevel activities can ....
G. Gao, K. K. Likharev, P. C. Messina, and T. L. Sterling, "Hybrid technology multithreaded architecture," in Proc. Frontiers `96, (Annapolis, MD), pp. 98--105, Feb. 1996.
....[Figure 1. HTMT computer concept. Labels: liquid helium (4 K), Optical Interconnect, SRAM PIM, DRAM PIM, Holographic Memory, CNet, RSFQ Processors.] The HTMT concept [9] assumes a hierarchical organization of the petaflops computing system (Figure 1) with multiple levels of distributed memory: holographic data storage (HRAM), semiconductor SRAM and DRAM, and cryomemory (CRAM), as well as three types of processors: SRAM and DRAM based processors in memory (PIMs) ....
G. Gao, K. K. Likharev, P. C. Messina, and T. L. Sterling. Hybrid technology multithreaded architecture. In Proc. Frontiers`96, pages 98--105, Annapolis, MD, Feb. 1996.
....technology, petaflops, asynchronous pipelines. Contact person: Prof. Mikhail Dorojevets, EE Dept. SUNY, Stony Brook, NY 11794 2350 Phone: (516) 632 8611 Fax: (516) 632 8494 E-mail: midor@eegw.ee.sunysb.edu 1 Introduction The goal of the Hybrid Technology MultiThreaded Architecture (HTMT) project [9] is to develop and study a new computer architecture that exploits multiple technologies to achieve petaflops level performance. Fig. 1 shows the top level concept of the HTMT petaflops system. It has a hierarchical organization with multiple levels of memory and three types of processors: SRAM ....
Gao, G., Likharev, K. K., Messina, P. C., and Sterling, T. L. Hybrid technology multithreaded architecture. In Proc. Frontiers`96 (Annapolis, MD, 1996), pp. 98--105. Available via anonymous ftp from ftp://rsfq1.physics.sunysb.edu/pub.
....(a strand) will lead to unacceptably high latencies, hence poor performance. We propose alternative processor designs which use fine grain synchronizations between individual instructions in order to avoid these bottlenecks. 1. Introduction The Hybrid Technology Multi Threading (HTMT) project [2, 4] is an ambitious, long term study of the feasibility of combining several emerging technologies to produce, within ten years, a computer with a sustained speed of 1 petaFLOPS (10^15 floating point operations per second) and 1 petabyte of memory. HTMT will combine high-speed superconductor ....
G. Gao, K. K. Likharev, P. C. Messina, and T. L. Sterling. Hybrid technology multi-threaded architecture. In Proceedings of Frontiers '96: The Sixth Symposium on the Frontiers of Massively Parallel Computation, pages 98--105, Annapolis, Maryland, October 1996.
....computing module per network clock cycle (in the present design, 32 ps) I. Introduction RSFQ digital technology [1] has created an exciting opportunity for computing on the petaflops scale (1 petaflops = 10^15 floating point operations per second) with acceptable aggregate physical parameters [2]. The major advantage of RSFQ technology for this particular application is not so much its unparalleled speed, but rather its uniquely low power consumption, since multi megawatt power dissipation makes fully semiconductor petaflops scale computers hardly feasible. The system presently designed ....
....is not so much its unparalleled speed, but rather its uniquely low power consumption, since multi megawatt power dissipation makes fully semiconductor petaflops scale computers hardly feasible. The system presently designed in the HTMT (Hybrid Technology MultiThreaded architecture) project [2], [3] combines RSFQ with other innovative technologies: semiconductor SRAM and DRAM based processors in memory (PIMs), an optical switching network, and holographic memory. According to the preliminary design named COOL 0 [3], the RSFQ subsystem for a petaflops scale computer should consist of ....
G. Gao, K. K. Likharev, P. C. Messina, and T. L. Sterling, "Hybrid technology multithreaded architecture," in Proc. Frontiers`96, (Annapolis, MD), pp. 98--105, Feb. 1996.
.... that may be used as a component in commercial telecommunication switches; 2) to demonstrate that RSFQ technology is capable of conducting large scale projects, and 3) to identify parts and design strategies that may be used in similar projects, e.g. in multiprocessor interconnecting networks [10]. Each architecture has been considered under the same workload of 5.76 Tbps in two different environments (96 × 60 Gbps bit serial channels and 96 × 1.875 Gbps 32 bit parallel channels with parallel to serial and serial to parallel converters) with self routing and without contention ....
G. Gao, K. Likharev, P. Messina, and T. Sterling, "Hybrid technology multithreaded architecture," in Proc. Frontiers`96, (Annapolis, MD), pp. 98--105, 1996.
No context found.
G. Gao, K. Likharev, P. Messina, and T. Sterling, "Hybrid technology multithreaded architecture", in Proc. Frontiers '96, (Annapolis, MD), pp. 98-105, (1996).
No context found.
G. Gao, K. Likharev, P. Messina, T. Sterling, "Hybrid Technology Multi-Threaded Architecture: Project Summary. Project Description," HTMT Reports, 1998.