25 citations found.
T. Asprey, G.S. Averill, E. Delano, R. Mason, B. Weiner, and J. Yetter. Performance features of the PA7100 microprocessor. IEEE Micro, 13:22--35, June 1993.


This paper is cited in the following contexts:
Image Processing on High Performance RISC Systems - Baglietto, Maresca.. (1996)   (12 citations)

....Business Machines RISC System 6000 model 43P [17]. In the rest of the paper we will refer to these systems by means of the name of their manufacturers, namely SUN, HP, DEC, SGI and IBM. The reference systems are based on five different CPUs, namely SuperSPARC [37] [39], HPPA 7150 [15] [2], DECchip 21064 (Alpha) [6] [26], MIPS R4400 [12] and PowerPC 604 [36], which cover the two directions of instruction level parallelism, i.e. pipelining and scalarity, and feature different memory hierarchy organization. Table 1 summarizes the characteristics of the reference systems. The System ....

....LU: ILU 25 DTO: float Operating conditions: image size: 512×512; mask size: 5×5; compilers: c = native compiler, g = GNU compiler. Table 5: Performance of CONV. [Code listing: numbered assignments t0 ... t17 over in[0] ... in[7]; operators not recoverable] ....

[Article contains additional citation context not shown here]

Asprey T., Averill G. S., DeLano E., Mason R., Weiner B. and Yetter J., Performance Features of the PA7100 Microprocessor, IEEE Micro, pp. 22-35, June 1993.


An Analysis Of Division Algorithms And Implementations - Oberman, Flynn (1995)

....division latency is to clock the divider at a faster frequency than the system clock. For example, in the HP PA7100, the very low cycle time of the radix 4 divider compared with the system clock allows it to retire 4 bits of quotient every machine cycle, effectively becoming a radix 16 divider [16]. The only additional hardware cost in this implementation is a few gates to generate the 2X clock. Figure 5: Higher radix using hardware replication ....

T. Asprey et al. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3):22--35, June 1993.


Complexity/Performance Tradeoffs with Non-Blocking Loads - Farkas, Jouppi (1994)   (6 citations)

....processor to continue to access the data cache during the processing of a non-blocking load miss, a lockup-free cache [7] is required. Non-blocking loads have only recently appeared in microprocessors [3, 9], and often these implementations have been fairly restrictive. For example, the HP PA7100 [1] allows a maximum of only one miss outstanding in the cache (i.e. hit under miss). The only recent appearance of mostly restrictive implementations is in part due to the more significant hardware complexity required to implement non-blocking loads. Yet studies of non-blocking loads have often ....

Tom Asprey, et. al. Performance Features of the PA7100 Microprocessor. IEEE Micro 13(3):22-35, June, 1993.


MOB Forms: A Class of Multilevel Block Algorithms for Dense.. - Juan Navarro Toni (1994)   (11 citations)

.... operations) and CPF (overh) corresponding to the rest of the code in the inner loop (containing the floating point operations). Several of today's high performance processors can initiate in each cycle one floating point multiplication, one floating point addition, and one integer instruction [Aspr93], [OeGr90]. In these cases, the minimum CPF (inner) achievable is 0.5. Other processors have a minimum of 1 (Alpha) or 0.25 (Power2). Although in general it is not possible to achieve these values, with blocking at the register level it is possible to get close to them as we show in Section 3.1. ....

T. Asprey et Al., Performance Features of the PA7100 Microprocessor. IEEE Micro, June 1993, pp 22--35


Improving Balanced Scheduling with Compiler Optimizations that.. - Lo, Eggers (1995)   (16 citations)

....misses. Blocking processors simplify the design of the code scheduler, by enabling all load instructions to be handled identically, whether they hit or miss in the cache. The traditional blocking processor model has recently been challenged by processors that do not block on loads [ER94] [McL93] [Asp93] [DA92] [Gwe94a] [Gwe94b] [Mip94] [CS95]. Rather than stalling until a cache miss is satisfied, they use lockup-free caches [Kro81] [FJ94] to continue executing instructions to hide the latency of outstanding memory requests. On these non-blocking architectures, instruction latency is exposed to ....

T. Asprey. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3):22--35, June 1993.


Survey of High Performance RISC Microprocessors - Murphy, MacDonald (1994)

....instruction caches, and no secondary cache. Each node has 16 Mbyte or 64 Mbyte of memory. 3 Hewlett-Packard's Precision Architecture RISC. The first implementation of the PA-RISC architecture specification was delivered by Hewlett-Packard (HP) in 1986. The most recent implementation, the PA7100 [Asprey et al. 1993], is the seventh implementation by HP, and was first introduced in 1992. 3.1 Microprocessor Operation. The pipeline in the PA7100 is clocked at up to 125MHz. In order to allow the use of industry standard SRAMs, writes to the cache chips may take up to two cycles, while reads must complete in a ....

....memory hierarchy. For a thorough discussion of the interplay between architectural features and their effects on performance, the reader is referred to [White et al. 1993a; White et al. 1993b]. Pipeline interlock penalties for some of the processors surveyed are shown in Table 5, taken from [Asprey et al. 1993] (pipeline frequencies have been updated). The following points about each of the processors summarise the advantages and disadvantages of each: • The DEC Alpha's major advantage over the other processors covered is its high clock rate. This is due to the relatively long pipeline it employs. ....

Tom Asprey, Gregory S. Averill, Eric DeLano, Russ Mason, Bill Weiner, and Jeff Yetter. Performance Features of the PA7100 Microprocessor. IEEE Micro, 13(3):22-- 35, June 1993.


The Benefits of Clustering in Shared Address Space.. - Erlichson, Nayfeh, .. (1994)   (10 citations)

....they are to clustering. In addition, we also approximate the specific implications for clustering at the first level cache in the memory hierarchy. Clustering at the first level cache is possible in processors which have external first level caches such as those produced by Hewlett Packard [1]. However, other forms of clustering are possible, such as a shared second level or a shared memory bus. The specific results for these clustering configurations will depend on latencies, associativities, and contention. In the future we will study realistic clustering configurations at the ....

Tom Asprey, et. al. Performance Features of the PA7100 Microprocessor. IEEE Micro, June 1993, pp. 22--35.


Real-Time Computing: Implications for General Microprocessors - Weems, Dropsho

....integer units can be used in parallel with the floating point unit to fetch operands for the floating point calculations, resulting in vector processing of floating point operations. Graphics support. None. Communication support. Basic multiprocessor protocol support and interrupts. HP PA7100 [9, 10]. Noteworthy instructions. The integer multiply and divide execution times are data dependent, since only minimal hardware support is provided. Lee states that most of these operations as they appear in applications are variables paired with constants; in this case the execution time can be ....

Asprey, T., et al., Performance Features of the PA7100 Microprocessor. IEEE Micro, 1993. 13(June): p. 22-35.


Compiling for the Multiscalar Architecture - Vijaykumar (1998)   (24 citations)

....an implicit form of parallelism present in sequential programs identified by a combination of the compiler and the hardware, whereas explicit parallelism of parallel programs is identified and specified by the programmer. To exploit ILP, modern microarchitectures called superscalar architectures [11] [21] [29] [44] [46] [47] [53] [58] [77] [97] [109] [113] identify independent instructions and execute multiple independent instructions simultaneously. Modern microprocessors extract ILP by establishing a window of instructions in the dynamic instruction stream. The first instruction in the ....

T. Asprey et al. Performance features of the pa7100 microprocessor. IEEE Micro, pages 22--35, June 1993.


An Area/Performance Comparison of Subtractive and.. - Soderquist, Leeser (1995)   (3 citations)

....pipelined, but their latencies are matched and only two cycles long each. [Figure 4: Independent add-multiply configuration. Operation (latency, throughput): addition (2, 1); multiplication (2, 1).] The particular chip which this configuration is based on, the HP PA7200 [1, 7], also has a very short cycle time. Other designs with similar topologies include the Sun UltraSPARC, the Mips 10000, Intel P6, and DEC 21164. [Latency by implementation (divide, square root): 8-bit seed Goldschmidt (9, 13); 16-bit seed Goldschmidt (7, 10); radix 4 SRT (15, 15); radix 16 SRT (8, 8).] Table 4: ....

Tom Asprey et al. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3):22--35, June 1993.


Real-Time RISC Processing - Weems, Dropsho (1995)

....versions of every processor in the PowerPC family are planned. Currently the only low power processor in the family is the 603, which is a much simpler design than either the 604 or the 620, a fact that may make it attractive as a first target for real time enhancement. HP PA71C. The PA71C [12] has a two instruction issue pipeline with a 100 MHz frequency and achieves a SPECint92 of 109 and a SPECfp92 of 168. The interesting features of the PA71C are the capabilities to manually manipulate cache and TLB entries, the low latency floating point division and square root (8 and 15 cycles ....

Asprey, T., et al., Performance Features of the PA7100 Microprocessor. IEEE Micro, 1993. 13(June): p. 22-35.


An Empirical and Analytic Study of Stack vs. Heap Cost for.. - Appel, Shao (1994)

....with a non-frame allocation (see Figure 3), the average net cost per frame is only 0.687 instructions. Cache control hint: On the HP PA7100, a store instruction can have a cache control hint specifying that the block will be overwritten before being read; this avoids the read if the write misses [9]. But these machines have very large primary caches anyway, so locality can be handled by generational collection. Smart write buffer: Instead of sub-block placement (which complicates the cache), one might add a feature to the write buffer: write misses normally bypass the cache, but if the write ....

Tom Asprey, Gregory S. Averill, Eric DeLano, Russ Mason, Bill Weiner, and Jeff Yetter. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3), June 1993.


VLSI Datapath Choices: Cell-Based Versus Full-Custom - Chang (1998)   (4 citations)

....to the continued scaling of VLSI technology have not yet been reached. Several vendors are predicting the realization of 100 Million and 1 Billion transistor logic chips in the near future. How best to employ these transistors .... [Footnote 1: The data for this graph comes from a wide range of sources [2] [4] [8] [11] [13] [16] [18] [19] [29] [30] [33] [32] [39] [41] [42] [45] [49] [46] [47] [52] [51] [55] [60] [73] [74] [76] [75].] [Figure: Processor Trends 1979-1997] ....

Asprey, T., Averill, G. S., DeLano, E., Mason, R., Weiner, B., and Yetter, J. Performance Features of the PA7100 Microprocessor. IEEE Micro (June 1993), 22--35.


Memory Organization and Management for Linear.. - Navarro, Lang.. (1994)

....The PA Risc 7100 processor can also issue one integer and one floating point instruction per cycle. The floating point instruction can perform two independent operations per cycle, one multiplication and one addition. Table 1.3 shows the latencies between the producer and consumer instructions [AsAl93]. We work with a configuration with a clock frequency of 99 MHz and an off-chip direct-mapped data cache of 32 Kword and a line size of 4 words. The TLB has 120 entries, is fully associative, and the page size is 0.5 Kword. Chapter 2: 1-level blocking at the cache level. In this chapter we ....

T. Asprey et Al., Performance Features of the PA7100 Microprocessor. IEEE Micro, June 1993, pp 22--35.


Cache Performance of Fast-Allocating Programs - Gonçalves, Appel (1995)   (4 citations)

....CPU. For prefetch, 4 concurrent L2 cache accesses. [22] SuperSPARC 1993 20K I (L1) 16K D (L1) 4 8 4 5 way 4 way yes no WV On chip first level cache. LRU replacement. [17] HPPA RISC 1992 4K 1M I 4K 4M D 32 d m yes no WV On chip support to dual off chip caches. Cache size is system dependent. [18] I = instruction cache; D = data cache; d m = direct mapped cache; L1 L2 = first second level cache; unknown. WV = Explicit alloc not needed with write validate policy or with 1 word cache line. Table 2: General information about the benchmark programs. Program Lines Description Barnes Hut ....

Tom Asprey et el. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3):22--35, June 1993.


U-cache: A Cost-effective Solution to the Synonym Problem - Kim, Min, Jeon, Ahn.. (1995)

....with only a few entries, performs almost as well as (in some cases outperforms) a fully configured hardware based solution when more than 95% of the pages are aligned. 1 Introduction. Recently, virtual caches are becoming increasingly important due to the emergence of high speed processors [1, 2, 3]. In virtual caches, cache access and address translation are performed in parallel, thus reducing cache access time. The physical caches, in contrast, require that address translation be performed before accessing the cache. This, in many cases, slows down the cache access. This research was ....

T. Asprey and et al. Performance features of the PA7100 microprocessor. Micro, 13(3):22--35, June 1993.


Latency Tolerant Architectures - Bennett (1998)   (2 citations)

No context found.

T. Asprey, G.S. Averill, E. Delano, R. Mason, B. Weiner, and J. Yetter. Performance features of the PA7100 microprocessor. IEEE Micro, 13:22--35, June 1993.


Reducing The Impact Of Register Pressure On Software Pipelined Loops - Llosa (1996)   (8 citations)

No context found.

T. Asprey, G.S. Averill, E. Delano, R. Mason, B. Weiner, and J. Yetter. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3):22--35, June 1993.


Nearest Neighbor Classification on a High Performance.. - Juan Navarro José (1995)

No context found.

T. Asprey et Al., Performance Features of the PA7100 Microprocessor. IEEE Micro, June 1993, pp 22--35.


A Mean Value Analysis Multiprocessor Model Incorporating.. - Albonesi, Koren (1996)   (4 citations)

No context found.

T. Asprey, et al, Performance Features of the PA7100 Microprocessor, IEEE Micro,


A New Page Table for 64-bit Address Spaces - Talluri, Hill, Khalidi. (1995)   (20 citations)

No context found.

Tom Asprey, Gregory S. Averill, Eric DeLano, Russ Mason, Bill Weiner, and Jeff Yetter. Performance Features of the PA7100 Microprocessor. IEEE Micro, 13(3):22--35, June 1993.


Advanced Vector Architectures - Espasa (1997)

No context found.

Tom Asprey, Gregory S. Averill, Eric DeLano, Russ Mason, Bill Weiner, and Jeff Yetter. Performance Features of the PA7100 Microprocessor. IEEE Micro, pages 22--35, June 1993.


Area And Performance Tradeoffs In Floating-Point Divide And .. - Soderquist, Leeser (1994)   (6 citations)

No context found.

Tom Asprey et al. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3):22--35, June 1993.


Compiling Standard ML For Efficient Execution On Modern Machines - Shao (1994)   (14 citations)

No context found.

Tom Asprey, Gregory S. Averill, Eric DeLano, Russ Mason, Bill Weiner, and Jeff Yetter. Performance Features of the PA7100 Microprocessor. IEEE Micro, 13(3):22--35, June 1993.


Effective Utilization of the Reorder Buffer for Short-Lived.. - Lozano, Guang, Gao (1994)

No context found.

Tom Asprey, Gregory S. Averill, Eric Delan, Russ Mason, Bill Weiner, and Jeff Yetter. Performance Features of the PA7100 Microprocessor. IEEE Micro, 13(3):22--35, June 1993.


CiteSeer.IST - Copyright Penn State and NEC