Bertin, P., Roncin, D., Vuillemin, J., "Programmable Active Memories: A Performance Assessment", Proc. Symp. Research on Integrated Systems, Cambridge (Mass.) 1993
....programs, but blew up on big programs, for which state space explosion turned out to be the rule. The next progress came from interaction with Jean Vuillemin's group at Digital Equipment Paris Research Laboratory. This group was developing the PeRLe FPGA-based programmable hardware machine [16]. Many of the hardware designs involved controllers that are a pain in the neck to write with gates and registers, and the group thought that Esterel was very well adapted for direct controller specification. The author learned about logic and hardware and developed a structural translation of ....
P. Bertin, D. Roncin, and J. Vuillemin. Programmable active memories: a performance assessment. In G. Borriello and C. Ebeling, editors, Research on Integrated Systems: Proceedings of the 1993.
....declaring interfaces, interconnecting gates, etc. The embedded approach does this without making any modifications to existing language syntax, an important point because modifying the GPL syntax negates most of the advantages of the approach. Examples of past embedded languages include PAMDC [1] and Spyder [5]. Embedded languages have some very significant advantages because they are based upon existing object-oriented, general purpose languages. Embedded languages based on GPLs can access a large, readily available and well-tested software toolbase to ease development, including: ....
P. Bertin, D. Roncin, and J. Vuillemin. Programmable active memories: a performance assessment. In G. Borriello and
....1 ff (1 Gamma Delta) 4) For instance, if Delta = one can evaluate p(x) for x and max jp i j . Those bounds may seem quite restrictive, but in practice there exist scaling techniques [8] that allow one to compute p(x) for any x and p. 1.2 The DEC PeRLe-1 Board: The DEC PeRLe-1 board [3, 4] is a reconfigurable coprocessor designed by the Paris Research Laboratory of DEC. The board is based on a 4 × 4 computational matrix of XC3090 Xilinx FPGAs [16] surrounded by seven other FPGAs and four 1 MB memory banks, connected to the matrix. The Xilinx XC3090 FPGAs are programmable logic ....
P. Bertin, D. Roncin, and J. Vuillemin. Programmable active memories: A performance assessment. Technical Report 24, DEC Paris Research Lab., 1993.
....computing machine consisting of 23 Xilinx XC3090 FPGAs, a 4 MB local RAM, and a 100 MB/s host bus. The PAM project has reported some of the best performance for configurable machines. A single PAM P1 board can perform 2D DCT at a rate of 1.4 GOPS (an OP is a multiply, add, subtract or shift) [2]. This section showed that RaPiD achieves 1.6 GOPS. [From Advanced Research in VLSI (ARVLSI '99), pp. 23-40, 1999.] 7: Conclusion. RaPiD represents an efficient configurable computing solution for regular computationally intensive applications. By combining the appropriate amount of static and ....
P. Bertin, D. Roncin, and J. Vuillemin. Programmable active memories: a performance assessment. In Parallel Architectures and Their Efficient Use: First Heinz Nixdorf Symposium Proceedings, pages 119--30. Springer-Verlag, 1993.
....of the data path and high level compilation of the control circuit. FPGAs do not offer the high flexibility of silicon area. For data paths it is therefore sufficient to specify the logic, map the logic to lookup tables and specify their location using Xilinx Netlist (XNF) directives. PamDC[1] maps a structural description on the gate level to a Xilinx netlist. Experience with PamDC has shown that a low level, structural representation of FPGA circuits in C is very well suited for high performance FPGA design. The major drawback of PamDC is the low level of design. In order to ....
....traversing the sequencing graph. Global latency is determined by the sum of latencies in the longest path of the design, which is equal to the maximal time stamp of global StreaModule outputs. [Figure 4 graphic omitted; its node labels include start time, data rate, latency, time stamp, and fifo depth for inputs in[0]..in[5] and outputs out[0], out[1].] Figure 4: The figure shows the scheduled data flow graph implementing the equations: out[0] in[0] in[1] in[1] in[2] in[3] ....
[Article contains additional citation context not shown here]
P. Bertin, D. Roncin, J. Vuillemin, Programmable Active Memories: A Performance Assessment, ACM FPGA, February 1992.
....server 1 . The chip we designed houses 4 processors, each performing 100 million 12-bit operations per second. It was designed in 1994 using a 1 µm CMOS technology. The 128-processor array is thus made of 32 chips. The reconfigurable interface is the PeRLe-1 board developed by Vuillemin et al. [3]. The Samba prototype could be largely improved by using up-to-date technology. As far as we can imagine it, the evolution of chip density would now allow between 16 and 20 processors to fit into a single chip, running at a higher frequency. In the same way, the design of the reconfigurable ....
P. Bertin, D. Roncin, and J. Vuillemin, Programmable active memories: a performance assessment, in Parallel architectures and their efficient use, F. Meyer, B. Monien, and A.L. Rosenberg, editors, pp. 119--130, LNCS, Springer-Verlag, October 1992.
....of the data path and high level compilation of the control circuit. FPGAs do not support this highly flexible use of silicon area. For data paths it is therefore sufficient to specify the logic, map the logic to lookup tables and specify their location on the FPGA device. Experience with PamDC[1], a gate level design environment from the PAM project, has shown that a low level, structural representation of FPGA circuits in C is very well suited for high performance FPGA design. The major drawback of PamDC is the enormous design effort required at the gate level. In order to simplify the ....
....with a datapath width of 4 bits. The following code shows one round of IDEA encryption: const int NUMBLOCKINPUTS=4; const int NUMBLOCKOUTPUTS=4; const int BITS=16, COMPMODE=DIGITSERIAL; const int key[10] 9277,98,237,4,978,122,723,3654,24,1536; HWint BITS t[9] temp; void IDEA: build( t[1] = ideaKCM16(in[0] key[0] t[2] in[1] key[1] t[3] in[2] key[2] t[4] ideaKCM16(in[3] key[3] tmp = t[1] t[3] tmp = ideaKCM16(tmp , key[4] t[7] tmp (t[2] t[4] t[8] ideaKCM16(t[7] key[5] tmp = t[8] tmp) out[0] t[1] t[8] out[3] t[4] tmp; tmp = ....
[Article contains additional citation context not shown here]
P. Bertin, D. Roncin, J. Vuillemin, Programmable Active Memories: A Performance Assessment, ACM FPGA Conference, Monterey, Feb. 1992.
....of the data path and high-level compilation of the control circuit. FPGAs do not support this highly flexible use of silicon area. For data paths it is therefore sufficient to specify the logic, map the logic to lookup tables and specify their location on the FPGA device. Experience with PamDC[1], a gate level design environment from the PAM project, has shown that a low level, structural representation of FPGA circuits in C is very well suited for high performance FPGA design. The major drawback of PamDC is the enormous design effort required at the gate level. In order to simplify the ....
....a datapath width of 4 bits. The following code shows one round of IDEA encryption: const int NUM BLOCK INPUTS=4; const int NUM BLOCK OUTPUTS=4; const int BITS=16, COMP MODE=DIGIT SERIAL; const int key[10] 9277,98,237,4,978,122,723,3654,24,1536 ; HWint BITS t[9] temp; void IDEA: build( t[1] = ideaKCM16(in[0] key[0] t[2] in[1] key[1] t[3] in[2] key[2] t[4] ideaKCM16(in[3] key[3] tmp = t[1] t[3] tmp = ideaKCM16(tmp , key[4] t[7] tmp (t[2] t[4] t[8] ideaKCM16(t[7] key[5] tmp = t[8] tmp) out[0] t[1] t[8] out[3] t[4] ....
[Article contains additional citation context not shown here]
P. Bertin, D. Roncin, J. Vuillemin, Programmable Active Memories: A Performance Assessment, ACM FPGA Conference, Monterey, Feb. 1992.
....about the Bioccelerator machine has been taken from the following Web server address: http://sgbcd.weizmann.ac.il/BicMosaic.html. HScan [7] is a 128-processor filter dedicated to scanning DNA databases. It has been developed at IRISA and validated on an FPGA platform, the PeRLe-1 prototype board [4]. It finds similar segments of identical length, as BioScan does. The main difference from the other two systems is that it does not make an exact calculation, but only detects the potentially interesting areas where similarities may appear. It is not yet commercially available. 3.3 VLSI ....
P. Bertin, D. Roncin, and J. Vuillemin. Programmable active memories: a performance assessment. In F. Meyer auf der Heide, B. Monien, and A.L. Rosenberg, editors, Parallel Architectures and their efficient use, pages 119--130, Lecture Notes in Computer Science, Springer-Verlag, Oct. 1992.
....into a general-purpose framework. These efforts include the attachment of special hardware substructures, proposed by Estrin [1], enhancing processor instruction sets with specialized complex instructions [2], [3], dynamic microprogramming [4], [5], utilizing reconfigurable computing elements [6], [7], [8], and the use of coprocessors [9]. Unfortunately, all of the above efforts suffer from one or both of the following fundamental limitations: i) The integration effort is determined at design time, is permanent throughout its life, and, therefore, incapable of addressing new application programs. ....
P. Bertin, D. Roncin, and J. Vuillemin. Programmable active memories: A performance assessment. Research Report 24, Digital Paris Research Laboratory, March 1993.
....of research in the development of FPGA-based reconfigurable computing engines. PRISM [2], developed at Brown University, demonstrates substantial speedup in the case of large binary operations. PAM, a universal reconfigurable hardware co-processor developed by researchers at DEC Paris Research Labs [3, 4, 29], has been used to demonstrate superior performance/cost ratio compared to every other existing technology of its time on a dozen applications ranging from computer arithmetic, cryptography, image analysis, neural networks, video compression, high energy physics, biology and astronomy. Another ....
P. Bertin, D. Roncin, and J. Vuillemin. Programmable active memories: a performance assessment. In G. Borriello and C. Ebeling, editors, Research on Integrated Systems: Proceedings of the 1993 Symposium, pages 88--102, 1993.
....ADPCM decoder cycle counts. Let us take a closer look at how the total cycles are distributed across the code. The basic block level execution profile is shown in Table 1.4. The BB# column in the table gives the basic block number, the DynCyc column .... Program 1: ADPCM Decoder: static int indexTable[16] = { -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8 }; /* step variation table */ static int stepsizeTable[89] = { 7,8,9,10,11,12,13,14,16,17,19,21,23,25,28,31,34,37,41,45,50,55,60,66,73,80, 88,97,107,118,130,143,157,173,190,209,230,253,279,307,337,371,408,449,494,544,598,658,724,796,876,963, ....
....as a field, reconfigurable computing is rather new, but it is gaining momentum. 3.2 Application Studies: Wireless communications, spread spectrum communications, IQ demodulation [102, 142, 82]; Genetic Algorithms [54, 68, 151, 97, 56, 55]; SAR, ATR [127, 130, 126, 167]; Image coding, compression [147, 49, 142, 1, 16, 170, 134, 37, 41, 18]; DCT, FFT, filters [148, 35, 176, 115, 81, 116, 146, 89]; Viterbi decoder [180]; Parallel object recognition, geometric hashing [ Digit recurrence division, square root [100, 99]; Various (big num, algebra, 23 etc) 16] Polynomial evaluations [44]; On-line arithmetic [160]; Floating point ....
[Article contains additional citation context not shown here]
P. Bertin, D. Roncin, and J. Vuillemin. Programmable active memories: a performance assessment. In G. Borriello and C. Ebeling, editors, Research on Integrated Systems: Proceedings of the 1993 Symposium, pages 88--102, 1993.
....1 1 fiG t F t FeTVc (2) In the remainder of this section we refine the results obtained by taking into account the different kinds of data parallelism specific to each configuration. For a static FPGA-based architecture, as in the Splash project and DEC's Perle project [11] [10], we define the useful power required by a real-time application: $Pu_s = G_s F_e$ (3). It is expressed as the product of the number of equivalent gates $G_s$ (the complexity of the application) and the data sampling frequency $F_e$. We define in the same way the ....
P. Bertin, D. Roncin and J. Vuillemin, Programmable active memories: a performance assessment, In F. Meyer auf der Heide, B. Monien, and A. L. Rosenberg, editors, Parallel Architectures and their efficient use, pp. 119-130, Lecture Notes in Computer Science, Springer-Verlag, October 1992.
....diversity for a task and must be reprogrammed in order to complete the task, the device goes partially or entirely unused during the reprogramming cycle. As one example of pipelining, i/o, and functionality limitations, DEC's Programmable Active Memories ran from 15-33 MHz for several applications [BRV92]. At these rates, the peak functional density extracted from the XC3090s employed was 13-26 gate evaluations/λ²s, only about 10-20% of the potential functional density. Example: Average Calculation. Consider, again, our windowed average calculation: avg_i = 1/8 · (x_{i-3} x ....
Patrice Bertin, Didier Roncin, and Jean Vuillemin. Programmable Active Memories: A Performance Assessment. PRL report, DEC Paris Research Laboratory, 85, Av. Victor Hugo, 92563 Rueil-Malmaison Cedex, France, June 1992.
....as exemplified by Field Programmable Gate Arrays. An array of programmable logic can be used to configure application-specific hardware and thereby obtain excellent performance. For example, researchers at DEC Paris have implemented algorithms ranging from Laplace filters to binary convolutions (Bertin, Roncin & Vuillemin 1993). If the per-chip performance is computed as aggregate performance divided by the number of FPGA chips in the system, each Xilinx chip delivers approximately 500 million 16-bit operations per second. [Table header: Machine; Performance (16-bit MOPS); Clock (MHz); I/O BW (MB/sec); Mem BW (Internal, External); Tech.] TMC CM-2 ....
Bertin, P., Roncin, D. & Vuillemin, J. (1993), Programmable Active Memories: A Performance Assessment, in `Research on Integrated Systems Symposium'.
....far beyond what was expected. Scientists have exploited the reprogrammability of these devices to build custom hardware to solve computationally intensive problems such as simulation of spin systems in statistical physics [21], calculation of evolutionary distance between gene sequences [22, 23, 24], and real-time image segmentation on an assembly line [19]. Reprogrammable FPGAs also form the basis of both hardware emulation [25, 26, 27] and new approaches to system-level reconfigurability [25, 28, 29]. In particular, GANGLION [19] is a connectionist classifier (a.k.a. artificial neural ....
P. Bertin, D. Roncin, and J. Vuillemin. Programmable Active Memories: a Performance Assessment. In 1st International ACM/SIGDA Workshop on Field Programmable Gate Arrays, pages 57--59, Berkeley, California, February 1992.
....board contained 23 XC3090s, roughly 15,000 4-LUTs. Using this component as an accelerator, DEC PRL was able to speed up many applications by an order of magnitude and, in some cases, provide performance in excess of conventional supercomputers or custom VLSI implementations. Highlights from [BRV92]:
- Large number multiply: 16× faster than Cray II
- 600 kbit/s, 512-bit RSA decoding: fastest implementation in existence at time of development, 10× best software implementation on DEC Alpha
- String matching: within a factor of two of custom implementation requiring 28 VLSI ....
....diversity for a task and must be reprogrammed in order to complete the task, the device goes partially or entirely unused during the reprogramming cycle. As one example of pipelining, i/o, and functionality limitations, DEC's Programmable Active Memories ran from 15-33 MHz for several applications [BRV92]. At these rates, the peak functional density extracted from the XC3090s employed was 13-26 gate evaluations/λ²s, only about 10-20% of the potential functional density. Example: Average Calculation. Consider, again, our windowed average calculation: avg_i = 1/8 · ( x ....
Patrice Bertin, Didier Roncin, and Jean Vuillemin. Programmable Active Memories: A Performance Assessment. PRL report, DEC Paris Research Laboratory, 85, Av. Victor Hugo, 92563 Rueil-Malmaison Cedex, France, June 1992.
.... or emulation of complex systems such as SIMD architectures [Gokhale et al. 1991; Arnold et al. 1992], MIMD parallel processors [Barroso et al. 1994], neural networks [Cox and Blanz 1992], accelerators for scientific computation [Monaghan and Noakes 1992], and general purpose coprocessors [Bertin et al. 1992; Thomas et al. 1991]. In addition, advances in high-level hardware description languages and synthesis tools have significantly reduced the time for hardware system prototyping [Micheli 1994]. We are currently using VHDL as the hardware description language for development of the simulation ....
Bertin, P., Roncin, D., and Vuillemin, J. 1992. Programmable active memories: a performance assessment. In Proceedings of the International ACM/SIGDA Workshop on Field Programmable Gate Arrays (February), 57--59.
P. Bertin, D. Roncin, and J. Vuillemin, "Programmable Active Memories: A Performance Assessment," Proceedings of the 1993 Symposium: Research on Integrated Systems, MIT Press, 1993.