Digital Equipment Corporation. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet, 1992.
....on the right timeline. The processor starts with all four contexts being interleaved. As we can see, this interleaving is enough to separate the dependent instructions from context B, completely hiding the pipeline dependency. (The instruction set architecture best suited for the interleaved processor has no compiler-filled branch or load delay slots. This is the trend for modern architectures [6].) The lower switch cost of the interleaved scheme is also illustrated in Figure 3; the switch cost associated with a cache miss is reduced from the seven cycles of the blocked scheme to two cycles ....
....more aggressive. As Figure 5 shows, the integer pipeline modeled is seven stages deep, one less than the R4000. The R4000 has a separate Tag Check stage between DF2 and WB, which has been folded into the DF2 stage for our processor. The floating point pipeline is based on the DEC Alpha 21064 [6], and is nine stages deep. Both pipelines forward results whenever possible to reduce operation latency. The arrows above the pipelines in Figure 5 denote possible result forwarding paths. [Figure 5. Integer Pipeline: IF1 IF2 RF EX DF1 DF2 WB. Floating Point Pipeline: IF1 IF2 RF EX1 EX2 EX3 EX4 EX5 WB.] ....
[Article contains additional citation context not shown here]
Digital Equipment Corporation. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet, 1992.
....or general purpose computing has not been so clear. Initial attempts have been made at constructing elementary computers from optical computational elements [14, 35, 36, 44, 45, 66] but these efforts cannot compete with the power of current electronic computing elements such as the DEC Alpha chip [21]. In contrast to the primitive state of optical computational elements, the state of optical communications is much more mature. The use of optical communications has become quite commonplace and practical in large scale transmissions systems (e.g. long haul and intracity voice and data ....
Digital Equipment Corporation, Maynard, Massachusetts. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet, April 1992.
....observations. First, optical fiber communications technology continues to replace electrical communications technology because of favorable qualities such as high bandwidth and immunity to noise [4]. Second, electronic computing capabilities continue to increase immensely, e.g. the DEC Alpha chip [2] and Intel's Project 2000. These two observations have led us to consider how optical fiber communications can be incorporated into general purpose parallel computers. Prior work has considered the optical fiber transmission system to be a black box. The optical fiber transmission system accepts ....
Digital Equipment Corp., Maynard, Massachusetts. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet, Apr. 1992.
....interrupt masking has been efficient, requiring only a few cycles. Unfortunately, the time required to modify the hardware interrupt level has not scaled with processor speed improvements. In pipelined processors, writing the processor interrupt mask typically requires a pipeline flush [13, 14]. In superscalar systems, interrupt level manipulations require scalar instruction issue, further limiting performance [15]. Many recent RISC CPU implementations provide only a part of the interrupt mask logic on the processor package, with the remainder of interrupt masking implemented by ....
.... systems, interrupt level manipulations require scalar instruction issue, further limiting performance [15]. Many recent RISC CPU implementations provide only a part of the interrupt mask logic on the processor package, with the remainder of interrupt masking implemented by off-processor hardware [13, 14]. For these systems, interrupt masking is a three-step process: 1) disable processor interrupts, 2) write the off-chip mask register(s), and 3) reenable processor interrupts. The first step requires a pipeline flush, and the second step requires a potentially expensive off-chip access. ....
Digital. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet. Digital Equipment Corporation, Maynard, MA, 1992.
....The primary cache is actually on the processor chip, and is typically a fairly small (4-16 kbytes) direct-mapped write-through cache. The secondary cache is a much larger external cache and is frequently copy-back. Recent CPUs, such as the MIPS R4000 [MIPS91] and the DEC Alpha AXP 21064 [DEC92c], provide on-chip support for the secondary cache, improving the possibilities for pipelining tag checking. Studies have reported that the primary cache has a miss ratio of 10-20%, with the secondary cache missing 1-2% of the time. 2.5 Implementation Technology. Much of the work described in ....
Digital Equipment Corporation. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet, April 1992. (p 16)
....electrons to travel the same distance. So the advances in VLSI technology actually aggravated clock skew problems. In order to minimize these effects, increasingly larger portions of the synchronous VLSI chips are devoted to clock distribution. It was reported that DEC's new RISC chip Alpha 21064 [19, 21] uses one third of its chip area for clock distribution. Furthermore, as more and more transistors were packed into a single chip, designers began to face real power dissipation problems. In synchronous chips with global clocking, even the inactive parts of the chip, including the clocks to those ....
Digital Equipment Corporation, Maynard, MA. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet, 1992.
....calculation is straightforward. Although the performance monitoring hardware on the KSR is rather unique, something comparable may be required for other cache coherent architectures. On simpler architectures, such as a message-passing system, the performance monitoring capabilities of the DEC Alpha [12] should be sufficient to gather the same information. pp is currently installed for use by the user community at the Cornell Theory Center on their 128-node KSR1. Example output from the current version of pp is shown in Figure 1. Lost cycles for each category are presented in seconds, aggregated ....
Digital Equipment Corporation. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet. Digital Equipment Corporation, Maynard, MA, 1992.
....the change is always less than a factor of two, even though the load latency varies over two orders of magnitude. Interestingly, ATG tends to increase with higher load latency in .... [Footnote 1: The longer latencies are representative of contemporary RISC processors such as the DEC 21064 Alpha chip [33], which has single-cycle integer operations, 6-cycle latency for floating point instructions, and 3-cycle latency for memory loads that hit in the on-chip cache.] [Table 6.1: Average ATG Parallelism Measurements. Column groups: INT=1 FLOAT=1 with load latency 1, 10, 100; INT=1 FLOAT=6 with load latency 3, 30, 300. Rows by benchmark: ADM ....]
.... ( 1) then any instruction-level concurrency that is exploited increases the utilization beyond 1 since proc = To achieve higher clock rates, modern microprocessors use increasing degrees of pipelining, resulting in higher (3 to 6 cycle floating point latencies are not uncommon [75, 54, 33]). Memory accesses also incur long latencies, especially in multiprocessors where cache miss rates tend to be higher due to sharing and main memory accesses must traverse multistage networks. To maintain good utilization, the exploited concurrency proc must equal for the SI processor, and ....
[Article contains additional citation context not shown here]
Digital Equipment Corporation. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet, April 1992.