S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba, "An architecture of a dataflow single chip processor," in International Symposium on Computer Architecture, 1989.
....The problem then became how to build a fully decentralized dataflow machine. WaveScalar is the creative extension of this line of reasoning. Dataflow has a long history. The first designs appeared in the early 70s [6, 19, 20] and there was a significant revival in the 80s and early 90s [21, 22, 23, 24, 25, 26, 27]. Dataflow machines execute programs according to the dataflow firing rule (DFR), which stipulates that an instruction may execute at any time, as long as its operands are available. When dataflow instructions complete, they trigger the execution of dependent instructions. Values in a dataflow ....
....projects and stands in contrast to them. 5.1 Dataflow Dataflow computing is perhaps the best studied alternative to the von Neumann model of computation. The first dataflow architectures [6, 20] appeared in the mid to late 70s, and in the late 80s and early 90s there was a notable revival [21, 22, 23, 24, 25, 26]. The dataflow work of the late 80s and early 90s made it clear that high performance dataflow machines were difficult to build. Culler et al. [58] articulated this difficulty as a cost-benefit problem and argued that dataflow suffers from two fundamental problems, both of which have to do with ....
S. Sakai, y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba, "An architecture of a dataflow single chip processor," in Proceedings of the 16th annual international symposium on Computer architecture, pp. 46--53, ACM Press, 1989.
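The dataflow firing rule quoted in the context above (an instruction may execute once its operands are available, and its completion triggers dependent instructions) can be illustrated with a minimal sketch. This is a toy interpreter written for illustration only; the names `Node`, `connect`, `run`, and `demo` are hypothetical and do not come from any of the cited papers, and real machines add tagging, matching hardware, and parallel execution that are omitted here.

```python
from collections import deque


class Node:
    """One dataflow instruction with a fixed number of input ports."""

    def __init__(self, name, op, arity):
        self.name = name
        self.op = op              # function applied when the node fires
        self.arity = arity        # number of operands required
        self.operands = {}        # port index -> value received so far
        self.consumers = []       # (destination node, destination port)

    def ready(self):
        # Firing rule: every operand port holds a value.
        return len(self.operands) == self.arity


def connect(src, dst, port):
    """Route the result of src to input `port` of dst."""
    src.consumers.append((dst, port))


def run(initial_tokens):
    """Process tokens (node, port, value) until none remain."""
    tokens = deque(initial_tokens)
    results, fired = {}, []
    while tokens:
        node, port, value = tokens.popleft()
        node.operands[port] = value
        if node.ready():
            args = [node.operands[p] for p in range(node.arity)]
            node.operands = {}            # consume the operands
            out = node.op(*args)
            results[node.name] = out
            fired.append(node.name)
            # Completion triggers the dependent instructions.
            for dst, dport in node.consumers:
                tokens.append((dst, dport, out))
    return results, fired


def demo():
    """Evaluate (2 + 3) * (7 - 4) as a three-node dataflow graph."""
    add = Node("add", lambda x, y: x + y, 2)
    sub = Node("sub", lambda x, y: x - y, 2)
    mul = Node("mul", lambda x, y: x * y, 2)
    connect(add, mul, 0)
    connect(sub, mul, 1)
    return run([(add, 0, 2), (add, 1, 3), (sub, 0, 7), (sub, 1, 4)])
```

In this sketch no program counter orders the instructions: `mul` fires only because both `add` and `sub` have delivered their results, whatever order the initial tokens arrive in.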
....1.1.2 Iannucci's Hybrid Architecture Development of hybrid architectures is an active area of research. See [Gaudiot and Bic 1989] for a summary of recent research in the area. One of the best known hybrid architectures is the EM-4 being developed at the Electrotechnical Laboratory in Japan [Sakai et al. 1989]. I chose to base my work on Iannucci's system because of the ease with which I could access his compiler, developed at MIT, as well as its quality. Iannucci's extensions to the Id compiler make use of information available at compile time to create scheduling quanta (SQs), sequences of code ....
Sakai, Shuichi; Yamaguchi, Yoshinori; Hiraki, Kei; Kodama, Yuetsu; and Yuba, Toshitsugu. An Architecture of a Dataflow Single Chip Processor. Proceedings of the 16th Annual International Symposium on Computer Architecture, Jerusalem, Israel, 1989, pages 46-53.
....past, this has been done on special purpose dataflow machines such as DDM1 [5] and the Manchester Dataflow Machine [6] that directly execute dataflow graphs. In addition to research in pure dataflow architectures, there is a growing interest in developing hybrid architectures, such as the EM-4 [12], that take advantage of the parallelism found by dataflow methods without sacrificing the straight line efficiency of von Neumann machines. Most recently, people have begun developing compilation techniques for executing dataflow programs on general purpose parallel machines [1]. Our work is on ....
Shuichi Sakai, Yoshinori Yamaguchi, Kei Hiraki, Yuetsu Kodama, and Toshitsugu Yuba, "An Architecture of a Dataflow Single Chip Processor," Proceedings of the 16th Annual International Symposium on Computer Architecture, Jerusalem, Israel, 1989, pages 46--53.
....building practical machines. The Dutch company DTN developed a simple dataflow machine based on the PD7281 [20]. In Japan two follow-up dataflow projects are under way. Building on experience with the SIGMA-1, the EM-4 with target structure of more than 1000 processing elements is being produced [12]. The work on the Q-p project has led to the fabrication of a set of five VLSI chips, to be combined next year into a single chip stand-alone data driven processor [9]. The seminal dataflow research by Arvind at MIT has recently resulted in an arrangement with Motorola to build a machine based on ....
S. Sakai et al. An Architecture of a Dataflow Single Chip Processor, Proc. ISCA '89, Jerusalem, Israel, 1989.
....within and across processors. Moreover, given sufficient parallelism, dynamic instruction scheduling has the added pragmatic benefit of being resilient to long and unpredictable communication latency. The most recent generation of dataflow machines (e.g. MIT's Monsoon [11, 12], ETL's EM-4 [13], and Sandia's Epsilon-2 [6]) have shown how operand matching can be accomplished with simple hardware structures in two machine cycles. There does seem to be an unavoidable price of purely dynamic instruction scheduling, however. Each dyadic (two input) instruction requires the dynamic matching ....
....for instructions within that expression in order to eliminate any intraexpression synchronization and associated data copying. In this case, intermediate values could be communicated among instructions in the sequence via temporary registers, rather than asynchronously propagating values on tokens [9, 13]. Another objection to pure dataflow graphs concerns the coding of low level operating system and resource management functions, such as trap handlers and storage allocation routines. These operations require frequent, guaranteed exclusive access to processor state and need critical sections ....
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An Architecture of a Dataflow Single Chip Processor. In Proceedings of the 16th Annual International Symposium on Computer Architecture, pages 46--53, Jerusalem, Israel, June 1989.
....possibility of deadlock requires atomic primitives and more extensive analysis. Our register allocation scheme is based on that of Chow and Hennessy [11] adapted for lazy state saving. The problem of register allocation in the presence of synchronization points has been studied in dataflow models [14, 41, 43], but the model is slightly different. For instance, TAM has many threads per context, whereas our execution model has only a single thread per context, making local analysis around the touches sufficient. The non-aliasing property of an access region's objects inside the region achieves runtime ....
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An architecture of a dataflow single chip processor. In International Symposium on Computer Architecture, 1989.
....Features of EM-4 Our base hardware architecture is EM-4, which was designed and built by Electrotechnical Laboratory (ETL) in Tsukuba, Japan. This section summarizes EM-4's features essential to our software/hardware architecture for a concurrent OO language. More details of EM-4 can be found in [13, 11]. EM-4 was originally designed to support the strongly connected arc dataflow model, which extends the dataflow model by introducing a new type of arcs called strongly connected arcs into dataflow graphs. A dataflow subgraph whose nodes are connected by strongly connected arcs is called the ....
....clocks, or 160ns. EM-4 also features zero overhead context switching. The context switching occurs when the CPU exits from the SCB and receives a packet. There is no context switching overhead involved owing to pipeline fusioning of the normal dataflow pipeline and the strongly connected pipeline [13]. 6.1.2 Flexibility in Amount of the Context to be Preserved Some machines have (1) provided support for automatic preservation of register contexts, and/or (2) decreased the number of general registers in order to reduce the overhead of context switching. For example, in J-Machine, there are ....
Shuichi Sakai, Yoshinori Yamaguchi, Kei Hiraki, Yuetsu Kodama, and Toshitsugu Yuba. An architecture of a dataflow single chip processor. In Proc. of the 16th Annual International Symposium on Computer Architecture, pages 46--53, June 1989.
.... there is a tradeoff for the excessive use of eager data transfer; it increases the number of messages and matchings of the messages, but we demonstrate the effectiveness when employed with compiled pipelined sends, via performance measurements on a fine grained hybrid parallel architecture EM-4 [4, 3]. There are many different levels of abstraction, even for the same general notion of directly communicating threads. For example, Concurrent Object Oriented Programming (COOP) languages, such as ABCL [10], provide high level abstraction, while distributed memory machines themselves provide the ....
....based on flow analysis) and (6) assembly language. The plan do style is mainly used for the high level code to provide dataflow information without loss of machine independence of the high level code. The compiler generates assembly code for a fine grained hybrid parallel architecture EM-4 [4, 3], which was developed and built at Electrotechnical Laboratories. EM-4 consists of 80 processing elements and runs at 12.5 MHz clock speed, and facilitates a fine grained communication mechanism; for example, it provides a two words size packet output instruction which directly sends data from ....
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba, "An Architecture of a Dataflow Single Chip Processor," Proc. of ISCA'89, pp. 46--53, June 1989.
....Figure 3.13 The macro data flow model used by EM-4. Researchers at Electrotechnical Laboratory, who produced what probably is the only practical data flow computer available (the SIGMA-1 [59, 60, 61]), have now developed a new generation of (hybrid) data flow, the EM-4 [62]. The processing element of the machine, called EMC-R, is essentially a RISC processor augmented with hardware for data flow operation (a fetch and matching unit). Their programming model is represented by the strongly connected arc model, in which a data flow graph can have two types of arcs: ....
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba, "An architecture of a dataflow single chip processor," in Proceedings of the 16th International Symposium on Computer Architecture, Jerusalem, Israel, pp. 46--53, May 1989.
....where synchronization is implicit, and strict coarse grain execution, where locality is highly exploited. Such advantages make multithreading a candidate for general purpose parallel computing, and many researchers have been interested in multithreaded models, architectures, and abstract machines[2, 3, 4, 8, 10]. Even with such new parallel computational models and improvements in processor speeds, the memory system in parallel computing is still a stumbling block to achieve comparable performance of computer systems. Especially in the parallel computations with centralized global data structures, ....
S. Sakai, Y. Yamaguchi, and K. Hiraki. "An Architecture of a Dataflow Single Chip Processor". In Proc. 16th Annual Int'l Sympo. on Computer Architecture, pp. 46-53, May 1989.
....and therefore, they can be executed without context switching overhead. Circular pipeline stages will be set full if sufficient parallelism resides in the execution of concurrent processes. This is quite a different point from other multithreading architectures such as Monsoon [13] and EM-4 [14]. 2.2 Optimization of Datarol Architecture The motivation for designing an optimized Datarol processor, which we call Datarol II, is to improve the efficiency in the execution of sequential program code while preserving high efficiency in the execution of parallel threads and/or processes. Previous ....
....execution is performed in high speed due to RISC type execution mechanism in short cycle, and the inter-thread concurrent execution also achieves high throughput due to the ultra multiprocessing mechanism inherited from Datarol I architecture. Similar architectures are Monsoon [13], *T [12], EM-4 [14], J-Machine [6], and Epsilon-2 [8]. Monsoon is also designed to support multithread execution by introducing synchronizing join operator. Monsoon uses ordinary random access memory for join counter management and performs its join operation as one of general instructions. Join operation in ....
S. Sakai, Y. Yamaguchi, K. Hiraki, and T. Yuba, "An Architecture of a Dataflow Single Chip Processor," Proc. 16th ISCA, pp.46-53, (1989).
....the step following P-RISC in dynamic data flow; it is implemented with (almost) stock microprocessors, and it is even more compatible with conventional parallel machines based on von Neumann processors. Examples of machines specifically designed for macro data flow operation include the EM-4 [34], Epsilon-2 [14], OSCAR [17], and Harray [41, 42]. The theory for construction and partitioning of a program dependence graph (PDG) and its use in the exploitation of loop and functional parallelism is presented in [35]. The goal, as in autoscheduling, is to execute a parallel program at the ....
Shuichi Sakai, Yoshinori Yamaguchi, Kei Hiraki, Yuetsu Kodama, and Toshitsugu Yuba. An Architecture of a Dataflow Single Chip Processor. In Proceedings of the 16th International Symposium on Computer Architecture. May, 1989, Jerusalem, Israel, pages 46--53, 1989.
....the Dataflow machines; these machines support only small messages sent and (implicitly) received with individual instructions. Examples of processors with integrated network interfaces include transputers [45], the iWarp systolic processor [6, 7], dataflow processors like Monsoon [15] and the EMC-R [38], and hybrid processors such as the MDP [16], M-machine [17], and the MIT Motorola 88110MP [36]. The Alewife [1] is an example of a network interface on the L1 cache interface. Alewife supports multi-part message specification and its Sparcle processor provides hardware multi-threading. Race ....
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An Architecture of a Dataflow Single Chip Processor. 16th Annual International Symposium on Computer Architecture, pages 46--53, 1989.
....unnecessarily costly in dealing with regular data structures such as vectors and matrices. Solutions to several of the above problems have been proposed in the design of second generation dataflow and multithreaded machines, such as Monsoon [8, and *T [9], P-RISC [10], Epsilon-2 [ EM-4 [11, 12], the hybrid model proposed by Iannucci [13], and the multithreaded execution model TAM [14, 15, 16]. The major alterations to the basic fine grain dataflow model are: • Increased granularity has been introduced to reduce communication and matching overhead and simplify resource management. ....
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An architecture of a data-flow single chip processor. In Proc. 16th Int. Symp. on Computer Architecture, pages 46--53, May 1989.
....which we will consider in Chapter 5, but is independent of the issues presented in this section. R. Nikhil and Arvind expressed clearly the need for a tree of activation frames [NA89]. Many systems have been designed to support a tree of activation frames, including dataflow machines such as EM-4 [SYH 89] and Monsoon [PC90], combinator reduction interpreters [Tur79, DR81], multiprocessor LISP systems [Hal84], and threaded interpreters [Cla87, CSS 91, NPA92, Hal94]. ....
....be to use on-processor multithreading to allow up to K frames per processor per level in the call tree. The worst case bound does not get much better, but the typical case might show some improvement. Several systems to support dynamic MIMD style programming have been proposed and implemented [DR81, Hal84, Bir89, SYH 89, PC90, NPA92]. Those systems do not provide predictable, high performance nor do they provide any space bounds on running programs. D. Culler provides some ad hoc techniques for managing the space bounds of a dataflow program [Cul89]. By limiting the number of iterations of a parallel loop that can run at the ....
Shuichi Sakai, Yoshinori Yamaguchi, Kei Hiraki, Yuetsu Kodama, and Toshitsugu Yuba. An Architecture of a Dataflow Single Chip Processor. In Proc. 16th Annual International Symposium on Computer Architecture, Jerusalem, Israel, pages 46--53, May 28-June 1, 1989.
....processor: (i) in the processor core; (ii) on a cache interface; (iii) on the cache coherent memory bus; or (iv) on the I/O bus. Examples of processors with integrated network interfaces include transputers [45], the iWarp systolic processor [6, 7], dataflow processors like Monsoon [16] and the EMC-R [38], and hybrid processors such as the MIT MDP [17], M-machine [18], and the MIT Motorola 88110MP [36]. Currently, market forces and engineering effort dictate microprocessor design, making it extremely difficult to take this approach within a commercial microprocessor. The MIT Alewife [1] is an example ....
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An Architecture of a Dataflow Single Chip Processor. 16th Annual International Symposium on Computer Architecture, pages 46--53, 1989.
....whenever a token arrives the processor must check the availability of its partner. If the partner has not arrived, the token must be saved until the arrival of the partner [6]. In modern dataflow machines this type of synchronization is performed using an explicitly addressed token store [15, 32]. Tokens synchronize at a compiler-ordained offset into an activation frame. Thus, when a token is processed, only the presence bits of the specified memory location are examined to see if the token's partner has already arrived. When the second token of an add instruction arrives, the value of ....
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An Architecture of a Dataflow Single Chip Processor. Proceedings of the 16th Annual International Symposium on Computer Architecture, Jerusalem, Israel, pages 46--53, 1989.
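The explicitly addressed token store described in the context above (tokens meet at a compiler-chosen offset in an activation frame, and only the presence bit of that slot is examined) can be sketched as follows. This is a simplified illustration under stated assumptions, not the mechanism of any specific machine: the names `Frame` and `match` are hypothetical, each slot has a single presence bit, and the first token to arrive is treated as the left operand, whereas real hardware carries a port indicator in the token.

```python
class Frame:
    """Activation frame: one value slot and one presence bit per offset."""

    def __init__(self, size):
        self.present = [False] * size
        self.value = [None] * size


def match(frame, offset, value):
    """Process one token aimed at `offset` of `frame`.

    Returns None if the token had to wait for its partner, or the pair
    (first_arrival, second_arrival) when the instruction can fire.
    """
    if frame.present[offset]:
        # Partner already arrived: fetch it, clear the slot, and fire.
        partner = frame.value[offset]
        frame.present[offset] = False
        frame.value[offset] = None
        return (partner, value)
    # Partner missing: park this token's value and set the presence bit.
    frame.present[offset] = True
    frame.value[offset] = value
    return None


def add_tokens(frame, offset, first, second):
    """Deliver both operands of a hypothetical add instruction."""
    assert match(frame, offset, first) is None   # first token waits
    left, right = match(frame, offset, second)   # second token fires
    return left + right
```

The point of the sketch is that matching needs no associative search: the slot address comes directly from the token, so one read of the presence bit decides between storing the operand and firing the instruction.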
No context found.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba, "An architecture of a dataflow single chip processor," in International Symposium on Computer Architecture, 1989.
No context found.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An architecture of a dataflow single-chip processor. In 16th Annual International Symposium on Computer Architecture, pages 46--53, June 1989.
No context found.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An Architecture of a Dataflow Single Chip Processor. In Proc. of the 16th Annual Int. Symp. on Comp. Arch., pages 46--53, Jerusalem, Israel, June 1989.
No context found.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An Architecture of a Dataflow Single Chip Processor. In Proc. of the 16th Annual Int. Symp. on Comp. Arch., pages 46--53, Jerusalem, Israel, June 1989.
No context found.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An architecture of a data-flow single chip processor. In Proc. 16th Ann. Int. Symp. on Computer Architecture, pages 46--53, May 1989.
No context found.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba, "An architecture of a dataflow single chip processor," Proc. 16th Ann. Int'l Symp. on Computer Architecture, pp. 46--53, June 1989.
No context found.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An architecture of a dataflow single chip processor. In Proc. 16
No context found.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An architecture of a dataflow single chip processor. In Proc. 16th Ann. Int. Symp. on Computer Architecture, pages 46--53, 1989.