PROCESSOR GENERATOR FOR TRANSPORT TRIGGERED ARCHITECTURES (2003)
Citations
1653 |
Error bounds for convolutional codes and an asymptotically optimal decoding algorithm
- Viterbi
- 1967
Citation Context ... used in DSP realizations but it is difficult to exploit in C compilers because ANSI C does not contain a predefined data type for fractional representation. The third benchmark is Viterbi decoding =-=[26]-=-, an algorithm widely used in many decoding and estimation applications in the communications and signal processing domain. The algorithm decodes 256-state 1/2-rate convolutional codes and contains p... |
619 |
Digital Integrated Circuits: A Design Perspective, Pearson Education, Upper Saddle River
- Rabaey, Chandrakasan, et al.
- 2003
Citation Context ...pressions of the boolean results of the compare units (Rx and Ry) can be specified. MOVE32INT was realized and fabricated in a 2 µm (minimal gate length 1.6 µm, 2 metal layers) CMOS Sea of Gates (SoG) =-=[11]-=- technology. The SoG image contains 88 rows of 1088 transistor pairs per row resulting in 191k transistors. The total die size is 1×1 cm². MOVE32INT can achieve a relatively high clock rate, 80 MHz,... |
142 |
Getting to the Bottom of Deep Submicron
- Sylvester, Keutzer
- 1998
Citation Context ...gate length is decreased to 90 nm and below, the effect of wiring has to be considered even on designs with a gate count under 100 kgates, such as the MOVE processor cores discussed in this chapter. [28] =-=[29]-=- During the synthesis the capacitances and thus the delays associated with final wiring are unknown. Models of interconnect, known as wire-load models [29], attempt to predict the amount of capacitanc... |
133 |
Reuse Methodology Manual for System-on-a-Chip Designs
- Keating, Bricaud
- 1998
Citation Context ... source VHDL source code file where the filename, version history and information about functional unit is given. Moreover, generics and constants were used extensively instead of hardcoded literals. =-=[21]-=- discusses the coding practices for reusable RTL code in more depth. The designed library contains functional units that cover all integer operations that are supported by the front-end of the softwar... |
126 | Register Organization for Media Processing.
- Rixner, Dally, et al.
- 2000
Citation Context ...the performance tends to degrade when the number of ports is very high. Clustered VLIW architectures have been introduced to lower the requirements for the register file and thus improve scalability. =-=[3]-=- [4] Transport triggered architecture was developed to reduce the complexity of VLIW by placing the register traffic under program control. In other words, the data transports become visible at the ar... |
125 |
Microprocessor Architectures: From VLIW to TTA.
- Corporaal
- 1997
Citation Context ... connectivity, which results in complexity that grows quadratically with the number of functional units. However, the full bandwidth of this network is seldom utilized, not even when all the units are busy. =-=[2]-=- The register file complexity may also become a bottleneck in VLIW processors. For each functional unit two read ports and one write port are required. This is only for the worst case situation when e... |
110 |
Application-Specific Integrated Circuits
- Smith
- 1997
Citation Context ... the processor generator are compared in Section 5.3. In principle, the implementation flow is similar to a conventional standard cell ASIC design flow, which is thoroughly explained, for example, in =-=[23]-=-. The VHDL code obtained from the processor generator and predesigned libraries were used as design entry. Functional verification and register transfer level simulation of the VHDL code was performed... |
46 | Partitioned Register File for TTAs.
- Janssen, Corporaal
- 1996
Citation Context ...ificantly in comparison to VLIWs. Moreover, the GPRs and the register file ports can be efficiently partitioned into multiple register file partitions without notable degradation in performance. [7] =-=[8]-=- 2.3 Software Aspects For traditional operation triggered architectures (OTA), such as RISCs and VLIWs, the executable program consists of an ordered set of operations which are performed by the proce... |
45 |
1364-2005) for Verilog Hardware Description Language
- Standard
- 2005
Citation Context ...y independent but it can still contain the exact definitions of registers, buses, and off-chip ports that the physical implementation requires. Both VHDL and Verilog, being standardized languages [16]=-=[17]-=-, are accepted as design entry format by the majority of design automation tools. A special class of tools, logic synthesis software, can automatically transform a behavioral description of the proces... |
38 |
Code generation for transport triggered architectures.
- Hoogerbrugge
- 1996
Citation Context ...roceeds only when the functional unit is accessed by a result move. For this reason the FU pipeline has to be flushed if speculative operations have been triggered. In =-=[6]-=-, the two pipelining alternatives were compared in terms of clock cycle count. Even though the hybrid pipelining offers greater scheduling freedom, compared to virtual-time-latching pipelines, the requ... |
24 |
Introduction to Digital Systems.
- Ercegovac, Moreno, et al.
- 1999
Citation Context ...ability, however, is not typically a severe problem in modern process technologies which have more than five metal layers available for wiring. Demultiplexing can also be realized with an AND-OR network =-=[20]-=-. Fig. 22 depicts a datapath that is architecturally identical to the one illustrated in Fig. 21. For each bus connection an AND gate of bus bitwidth is required in the demultiplexers which are presente... |
21 | Using Transport Triggered Architectures for Embedded Processor Design.
- Corporaal, Arnold
- 1998
Citation Context ...nvironments specifically for generating application-specific processors and their language tools makes the actual CPU creation considerably more attractive. One such design tool is the MOVE framework =-=[1]-=-, a set of non-commercial software tools aimed at computer aided design of application-specific processors. The MOVE framework utilizes a subset of transport triggered architecture (TTA), a class of very... |
19 |
Closing the Gap Between ASICs and Custom
- Chinnery, Keutzer
- 2000
Citation Context ...cted together, as long as only one driver is enabled at a time. Tristate buses are commonly used in full custom designs, but in standard cell based ASICs the use of tristate buses is not recommended. =-=[19]-=- Tristate buses require specific production test structures in order to get stuck-at-1 and stuck-at-0 faults detected. In addition, a standard cell library usually contains only a limited selection of... |
16 |
Compiler strategies for transport triggered architectures.
- Janssen
- 2001
Citation Context ...s between hardware resources, and an instruction scheduler optimizes the code trying to minimize the execution time and code size. Details of code generation and optimization are discussed in [6] and =-=[9]-=-. A software toolset for code generation is described in chapter 3. 2.4 Realizations A few prototype implementations have been designed and manufactured. Two such experimental processors are presen... |
15 |
Python Reference Manual. Centrum voor Wiskunde en Informatica
- Rossum
- 1995
Citation Context ...ures of the processor generator, especially when the processor generator is run on a modern computer workstation. Due to the low performance requirements, the processor generator was written in Python =-=[18]-=-. Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. Python combines pow... |
13 | MOVE32INT, a sea of gates realization of a high performance transport triggered architecture, Microprocessor and Microprogramming
- Corporaal, Arend
- 1993
Citation Context ...e Processor In order to evaluate the specific design and implementation tradeoffs, an instance of a transport triggered architecture, called MOVE32INT, was designed at Delft University of Technology. =-=[10]-=- The architecture mainly consists of a transport network, controlled by the network controller, and several functional units. The network contains 4 busses. Each bus contains a data bus, which is capa... |
13 |
Constant geometry algorithm for discrete cosine transform
- Takala, Akopian, et al.
- 2000
Citation Context ...] has been used. Constant geometry algorithms, being regular and modular, allow better exploitation of the inherent parallelism. The second benchmark is a 32-point DCT, where the DCT algorithm described in =-=[25]-=- is used. The created C-code contains five functions, one for each processing column of the signal flow graph of the algorithm. Each processing column is written totally unrolled, i.e., no iterations ... |
13 |
Evaluation on power reduction applying gated clock approaches
- Palumbo, Pappalardo, et al.
- 2002
Citation Context ... clock tree and its leaf registers. In order to reduce the activity of the clock node of a register bank, the clock node is enabled only when the register bank has to sample new input. =-=[27]-=- Without clock gating, register banks are implemented by using a feedback loop and a multiplexer. When such registers maintain the same value through multiple cycles, they use power unnecessarily. Fig.... |
10 |
Instruction Set Extensions for Embedded Processors
- Arnold
- 2001
Citation Context ...ruction word plus the number of data pins on the load/store units whereas the number of power/ground pads is related to the total area. =-=[13]-=- [Figure 13. MOVE processor generator design flow] The minimum cycle time is the m... |
8 |
Application-specific clustered VLIW datapaths: Early exploration on a parameterized design space
- Lapinskii, Jacome, de Veciana
- 2002
Citation Context ...performance tends to degrade when the number of ports is very high. Clustered VLIW architectures have been introduced to lower the requirements for the register file and thus improve scalability. [3] =-=[4]-=- Transport triggered architecture was developed to reduce the complexity of VLIW by placing the register traffic under program control. In other words, the data transports become visible at the archit... |
8 |
One- and two-dimensional constant geometry fast cosine transform algorithms and architectures
- Kwak, You
- 1999
Citation Context ... cosine transform (DCT) realized with the row-column approach, i.e., the entire two-dimensional (2-D) transform is computed with the aid of 1-D transforms. Here the constant geometry algorithm proposed in =-=[24]-=- has been used. Constant geometry algorithms, being regular and modular, allow better exploitation of the inherent parallelism. The second benchmark is a 32-point DCT, where the DCT algorithm described in [... |
4 | Register file port requirements of transport triggered architectures - Hoogerbrugge, Corporaal - 1994 |
4 |
An Analysis of the Wire-Load Model Uncertainty Problem
- Gopalakrishnan, Odabasioglu, et al.
- 2002
Citation Context ...tive gate length is decreased to 90 nm and below, the effect of wiring has to be considered even on designs with a gate count under 100 kgates, such as the MOVE processor cores discussed in this chapter. =-=[28]-=- [29] During the synthesis the capacitances and thus the delays associated with final wiring are unknown. Models of interconnect, known as wire-load models [29], attempt to predict the amount of capac... |
2 |
Code Generation and Optimization for Embedded Processors
- Cilio
- 2002
Citation Context ...ext of ASIP design because it allows custom operations to be created with any number of input and output operands, without having to modify the general organization of the instruction decoder. =-=[5]-=- When execution of an operation takes more than one machine cycle, it may be subject to pipelining. The execution stages are local to each FU pipeline and independent from other FU pipelines. In [2], ... |
2 |
The MPG manual
- Smit
- 2000
Citation Context ...omputer aided design tools that accept VHDL as entry language, e.g., logic synthesis software, can be used to process the design further. The complete design flow using the MPG is illustrated in Fig. 13. =-=[14]-=- Before being able to generate the VHDL description, the MPG needs information on the target processor configuration. This information is divided between two textual files, “processor_parameters.h” an... |
2 | Design and implementation of an advanced instruction fetch unit for the MOVE framework
- Roos
- 1997
Citation Context ...hat the specification in “fu_def.h” actually matches the definitions in the machine description file. The instruction fetch unit template applied by the MOVE processor generator is presented in =-=[15]-=-. It contains support for features such as exceptions, virtual memory, and instruction cache. Even though these functions could be useful in some implementations, organization of the instruction unit ... |
1 |
An application specific processor for a multi-system navigation receiver
- Aardoom, Stravers
- 1992
Citation Context ...rate, 80 MHz, despite the modest technology used. 2.4.2 Application-Specific Processor for Navigation Receiver An application-specific processor for a multi-system navigation receiver is presented in =-=[12]-=-. Since various computation tasks, such as real-time digital filtering, Fourier transforms, and tracking loop algorithms, have to be performed in the receiver, an ASIP was chosen instead of custom hard... |