H. T. Kung. Systolic communication. In Proceedings of the International Conference on Systolic Arrays, San Diego, California, May 1988.
....pin counts. 1.3 Virtual Wires. To overcome pin limitations in FPGA-based logic emulators,† we propose the use of virtual wires. A virtual wire represents a simple connection between a logical output on one FPGA and a logical input on another FPGA. Established via a pipelined, statically routed [12] communication network, these virtual wires increase available off-chip communication bandwidth by multiplexing the use of FPGA pin resources (physical wires) among multiple emulation signals (logical wires). † Although this paper focuses on logic emulators, virtual wire technology can be ....
....thousand gates. Statically routed networks can be used whenever communication can be predetermined. Static refers to the fact that all data movement can be determined and optimized at compile time. This mechanism has been used in scheduling real-time communication in a multiprocessor environment [12]. Other related uses of static routing include FPGA-based systolic arrays, such as Splash [7], and the very large simulation subsystem (VLSS) [15], a massively parallel simulation engine which uses time-division multiplexing to stagger logic evaluation. Virtual wires are similar to virtual ....
H. T. Kung. Systolic communication. In Proceedings of the International Conference on Systolic Arrays, San Diego, California, May 1988.
....the value is written both to the leftmost SSR of a chip and off-chip. The rightmost SSR of the adjacent chip then stores the value entering the chip. Inter-chip communication does not involve register reading. Although similar in form to inter-PE queues, SSRs have several important differences [1, 19]. In queues, data is accessed at the ends of the queue. Thus, data must be scheduled so that values appear at the head of the queue at the correct time. This is useful in multiple-instruction-stream, multiple-data-stream (MIMD) computers where deadlock and contention are key issues. These issues are ....
H. T. Kung. Systolic communication. In K. Bromley, S. Y. Kung, and E. Swartzlander, editors, First Systolic Arrays, pages 695--703. IEEE CS, May 1988.
....d(i − 1, j − 1). Communications are therefore frequent and fine-grained. At the hardware level, this translates, on the simulation machine, into the use of dedicated hardware devices to reduce synchronization and transfer times between processors [19]. The way communications are expressed in the programming language must allow a compilation that takes advantage of these devices; otherwise the simulation will be totally inefficient. Input/output between the systolic array and its interface is also an integral part of ....
H.T. Kung. Systolic Communication. In ISCA, pages 695--703, May 1988.
....implementation on a multiprocessor machine. Our experiments have been conducted on an iWarp [32]. This machine supports the register-to-register communication model that is required to efficiently execute systolic programs [33]. The machine we use has 8 processors connected in a ring. To support efficient systolic experiments, we use the parallel language ReLaCS [34], which embodies both the computation and the communication aspects of systolic algorithms in a terse programming model. The systolic network is viewed as a ....
H. Kung, "Systolic Communication," in ISCA, pp. 695--703, May 1988.
....the von Neumann bottleneck [1]. Systolic architectures feature the capability of exploiting a regular data flow through a network of identical cells with local memory. This characteristic is supported by an inter-processor communication mechanism which avoids the overflow of the local memory [7]. There exists one commercially available programmable machine based on this concept, the iWarp [2]. However, this machine is much more expensive than a Transputer machine and its processor is not available as a component chip. One could expect to use the Transputer links and the services of a ....
H.T. Kung. Systolic Communication. In International Symposium on Computer Architecture, pages 695--703, May 1988.
....parallel applications, we must use optimal communication patterns to offset the hardware communication cost. In this paper, we investigate the various communication algorithms for pipeline multicomputers. The pipeline architecture is a common system structure for solving many scientific problems [4, 3, 8, 9]. Even though pipeline multicomputers may not be as widely used as other multicomputers, the notion of data pipelining is an important one. Almost all data communication suffers from data propagation delay regardless of the form of parallel architecture. Because data flow in pipeline multicomputers ....
....of receive(x, m) by node y, blocks until a message matching the type and structure of m is received from x. Incorrect message type and structure from x causes an error. 3 Communication algorithms for pipeline multicomputer. Pipelining is a common method for parallelizing many scientific problems [4, 3, 8, 9]. A virtual pipeline multicomputer consists of a master and a string of p interconnected nodes as illustrated in Figure 5. The nodes in the pipeline are labeled 0, 1, ..., p − 1 starting from the leftmost node. This unique number is the node's pid. In addition, message flow in a pipeline ....
Kung, H. T., "Systolic communication", IEEE International Conference on Systolic Arrays, San Diego, CA, pp. 695-703, May 1988.
....of the three communication functions of pipeline-based algorithms. In the following two subsections, we demonstrate how we can improve the performance of pipelined matrix multiplication. 5.4 Pipelined Matrix Multiplication (PMM). Pipelined matrix multiplication was first studied by Kung [13] and then was reimplemented by Brinch Hansen [4]. Two matrices A and B of size n × n are multiplied to produce matrix C; C = A × B. The number of processors in the pipeline is p. Without loss of generality, we assume that n/p produces no remainder. In the general approach, each ....
Kung, H. T., "Systolic communication", IEEE International Conference on Systolic Arrays, San Diego, CA, pp. 695-703, May 1988.
....the network at constant speed, and interact where they meet. To exploit this concept on programmable machines, cells are replaced by processors and an inter-processor communication mechanism which avoids the overflow of the local memory is introduced [19]. Among others, such machines include the iWarp [6], MicMacs [23], SPLASH [10], B-SYS [14], and Blitzen [5]. We first present in this section the ReLaCS programming model, the view of systolic machines held by a programmer writing in the ReLaCS language; second, we present the ReLaCS execution ....
H.T. Kung. Systolic Communication. In ISCA, pages 695--703, May 1988.
....methods and local data. Moreover, communication protocols can be developed independently from the program to handle communication-specific issues such as deadlock avoidance and recovery from transmission failures. This makes memory communication the method of choice for applications which do not assume detailed knowledge about intercell com.... ....uses iWarp communication methods as part of the examples. Further discussions on systolic communication can be found in [13]. The organization of the paper is as follows. We first describe the fundamental differences between systolic and memory ....
Kung, H. T. Systolic Communication. Proceedings of the International Conference on Systolic Arrays, San Diego,
....centric processing. iWarp supports both tightly and loosely coupled parallel processing, and both systolic [12] and message-passing models of communication. iWarp can implement a variety of processor interconnection topologies including 1-dimensional (1D) arrays, .... ....Systems Command under Contract N00039 87 C 0251. Authors' affiliations: S. Borkar, G. Cox, S. Gleason, B. Moore, C. Peterson, L. Rankin, J. Sutton, J. Urbanski: Intel Corporation; R. Cohn, T. Gross, H. T. Kung, M. Lam, J. Pieper, P. S. Tseng, J. Webb: Carnegie Mellon University. 2. iWarp overview. An iWarp system ....
....that multiple communication paths be multiplexed on a physical bus. Multiplexing can also be used to keep a long message from monopolizing the physical bandwidth for an indefinitely long time. Door-to-door message passing. When a message arrives at the destination cell, it is generally first buffered in a system memory .... ....only when the full message is available in the local memory of the destination cell is it ready to be operated upon. Conversely, in systolic mode [12], the unit of communication and processing can be as fine-grained as a single word in a message. 4.1.1. Message passing ....
Kung, H. T. Systolic Communication. Proceedings of the International Conference on Systolic Arrays, May, 1988, pp. 695-703.