| Cvetanovic, Z., and D. Bhandarkar, "Characterization of Alpha AXP Performance Using TP and SPEC Workloads", Proc. Int. Symposium on Computer Architecture, April 1994. |
....to be due to a shell sort algorithm that ignores the principles of locality. Overall, the TLB miss rate of the other Java programs in this suite is about 2%, which is much higher than the 0.1% reported for SPEC95 benchmarks and desktop workloads (written in C and C++) [25]. Cvetanovic and Bhandarkar [14] observe that commercial workloads tend to have high TLB miss rates, but point out that a number of SPECfp applications exhibit similar behavior. Although they report the percentage of time spent in the PAL (Privileged Architecture Library) code that performs other functions besides handling of TLB ....
....taken literally. ⁶ Instruction cache miss rates for _228_jack and _213_javac were slightly higher: 0.5% and 0.7%, respectively. ⁷ Because our metrics and memory subsystem are different from those used by Cvetanovic and Bhandarkar, it is not meaningful to compare our data to the data reported in [14]. ⁸ _209_db spends most of its time sorting a very large array of pointers to records, which in turn point to vectors and then arrays. [Figure 11: D-TLB misses, as a percentage of references that miss in the D-TLB, for _201_compress, _209_db, _228_jack, _213_javac, _202_jess, _227_mtrt, and pBOB.] ....
[Article contains additional citation context not shown here]
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP performance using TP and SPEC workloads. In Proc. of ISCA-21, pages 60--70, Apr. 1994.
....These techniques reduce data cache misses and are orthogonal to the goal of CGP, which tries to reduce I-cache misses. CGP may be implemented on top of these cache-conscious algorithms. It is only recently that researchers have examined the performance impact of architectural features on DBMSs [1, 12, 25, 10, 19, 9, 11, 14]. Their results show that database applications have large instruction and data footprints and exhibit more unpredictable branch behavior than benchmarks that are commonly used in architectural studies (e.g., SPEC). Database applications have fewer loops and suffer from frequent context switches, ....
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP Performance using TP and SPEC Workloads. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 60--70, April 1994.
....challenging as numerically intensive applications, yet share few of their processing characteristics [2], [3], [4], [5], [6]. While numerically intensive applications tend to iteratively apply a fixed number of operations to large sets of regularly structured data, the more general nonnumeric applications tend to apply a range of different operations to each of several widely dispersed data elements. .... (Author affiliation: Department of Electrical and Computer Engineering, and Minnesota Supercomputing Institute, University of Minnesota, 200 Union St. SE, Minneapolis, MN 55455; e-mail: lilja@ece.umn.edu.)
Z. Cvetanovic and D. Bhandarkar, "Characterization of Alpha AXP performance using TP and SPEC workloads," in International Symposium on Computer Architecture, 1994, pp. 60--70.
....and concluded that the latter is often worse. Eickemeyer et al. [4] showed that a significant performance improvement can be obtained for OLTP workloads when a multithreaded processor is used. Finally, other studies that have involved database workloads include the work by Cvetanovic and Bhandarkar [1] on a DEC Alpha AXP system, Torrellas et al. [12] on an SGI multiprocessor, and Rosenblum et al. [9] on a simulated SGI multiprocessor. In general, these studies agree on the relatively worse memory performance of commercial workloads. Overall, however, they do not give us the insight of what the ....
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP Performance Using TP and SPEC Workloads. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 60--70, April 1994.
....3.8.1.1. Maynard et al.: IBM RS/6000. Maynard et al. used a proprietary software instrumentation tool to examine the performance of various technical and commercial workloads, including TPC-A and TPC-C, on the IBM RS/6000, which employs a Pow ....

| Characteristics | Maynard94 [64] | Cvetanovic94 [27] | Cvetanovic96a [28] | Rosenblum94 [87] | Perl96 [80] |
|---|---|---|---|---|---|
| System evaluated | RS/6000 | DEC 7000 AXP | AlphaServer 8200 | SGI-based UP/SMP | Alpha PCs, 4-proc. AlphaServer 2100 |
| Processor | PowerPC (in-order) | Alpha 21064 (in-order) | Alpha 21164 (in-order) | MIPS-like (UP: out-of-order, MP: ....) | .... |
....well tuned as those used in the more recent studies. 3.8.1.2. Cvetanovic and Bhandarkar: Alpha Workstation Characterization. Cvetanovic and Bhandarkar studied the impact of several architectural characteristics on commercial and technical workload performance for the Alpha 21064-based Alpha AXP [27] and the Alpha 21164-based AlphaServer 8200 [28]. In particular, they used built-in hardware counters to examine the breakdown of user/kernel time and privileged architecture library (PAL) time, a decomposition of memory latency, L1 and L2 cache miss rates, the percentage of dual-issued instructions, and ....
[Article contains additional citation context not shown here]
Z. Cvetanovic and D. Bhandarkar. "Characterization of Alpha AXP performance using TP and SPEC workloads," Proc. of the 21st ISCA, pages 60--70, April 1994.
....accesses for that workload. Using realistic times for early Alpha computer systems, the SQL i-stream and d-stream hits beyond 96 KB each total about 1.8 CPI. Combined with a 100%-hit execution rate of about 1 CPI, these total about 4.6 CPI. This is roughly consistent with the measurements in [CB94] of 4.3 CPI, 20/25% operating system time, and 39/36% i-stream plus d-stream stall time on the TP1 database workload on an older AlphaSystem 7000 computer. 3.5 Discussion. We conclude from these experiments that processor pin bandwidth is a bottleneck for some applications. SQL Server ....
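The 4.6 CPI estimate in this context is additive: a base of about 1 CPI when every access hits, plus about 1.8 CPI for each of the two miss streams (i-stream and d-stream). A minimal sketch of that arithmetic, assuming the stall components simply add (overlapped stalls would make a real total smaller); the function name `total_cpi` is hypothetical:

```python
def total_cpi(base_cpi: float, stall_cpi_components: list[float]) -> float:
    """Additive CPI model (assumed): all-hit base CPI plus each stall component."""
    return base_cpi + sum(stall_cpi_components)


# Figures quoted in the context above: ~1 CPI with 100% hits, plus
# ~1.8 CPI each for i-stream and d-stream hits beyond 96 KB.
estimate = total_cpi(1.0, [1.8, 1.8])
print(f"{estimate:.1f}")  # 4.6
```

The estimate of 4.6 CPI sits slightly above the 4.3 CPI measured in [CB94], consistent with the "roughly consistent" wording.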
Zarka Cvetanovic and Dileep Bhandarkar. Characterization of Alpha AXP performance using TP and SPEC workloads. In Proc. of the 21st Annual Symposium on Computer Architecture, pages 60--70, April 1994.
....surpassed technical workloads to become the largest and fastest-growing market segment for high-performance servers. A number of recent studies have underscored the radically different behavior of commercial workloads such as online transaction processing (OLTP) relative to technical workloads [4,7,8,21,28,34,36]. First, commercial workloads often lead to inefficient executions dominated by a large memory stall component. This behavior arises from large instruction and data footprints and high communication miss rates, which are characteristic of such workloads [4]. Second, multiple instruction issue and ....
....has already been referenced in earlier sections. We further discuss some of the previous work pertinent to database workloads and CMP in this section. There have been a large number of recent studies of database applications (both OLTP and DSS) due to the increasing importance of these workloads [4,7,8,12,21,27,28,34,35,36,42,46]. To the best of our knowledge, this is the first paper that provides a detailed evaluation of database workloads in the context of chip multiprocessing. Ranganathan et al. [35] study user-level traces of database workloads in the context of wide-issue out-of-order processors, and show that the ....
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP Performance using TP and SPEC Workloads. In 21st Annual International Symposium on Computer Architecture, pages 60--70, April 1994.
....database systems will assume an even more crucial role in computer systems of the future, from the desktop to highly scalable multiprocessors or clusters. Despite their increasing prominence, however, database management systems (DBMS) have been the subject of only limited architectural study [3,6,12,16,22]. Not surprisingly, these studies have shown that database systems can exhibit strikingly high cache miss rates. In the past, these miss rates were less significant, because I/O latency was the limiting factor for database performance. However, with the latest generation of commercial database ....
....results later.) Overall, this data confirms the aggregate cache behavior of transaction processing workloads found by others; namely, that they suffer from higher miss rates than scientific codes (at least as exhibited by the SPEC and SPLASH benchmarks), with instruction misses a particular problem [3,6,12,16]. For example, columns 2 and 3 of Table 3 show that on-chip caches are relatively ineffective both at current cache sizes (32 KB) and at the larger sizes (128 KB) expected in next-generation processors. In addition, instruction cache behavior is worse than data cache behavior, having miss rates of 23.3 ....
[Article contains additional citation context not shown here]
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP performance using TP and SPEC workloads. In 21st Ann. Int'l Symp. on Computer Arch., p. 60--70, April 1994.
....Over the last few years the performance of processors has increased at a much higher pace than memory latency has decreased. The common method of reducing this gap is to add a cache. However, many programs still spend a significant amount of execution time stalling on data cache misses [2] [5]. The effectiveness of a cache depends on the amount of reference locality in the executed software. Therefore, by analyzing the amount and characteristics of memory reference locality, we will not only be able to explain the reason for the large number of cache misses, but also potentially find ....
....the experimental methodology, while Section 5 presents our experimental results. Discussion and related work are presented in Section 6, and the paper is concluded in Section 7. 2 Background. A significant part of the execution time in a program is spent on data stalls, in some cases over 70% [5]. This problem is present in both uni- and multiprocessor systems, but the amount of data stall time tends to increase with the number of processors in a multiprocessor system [2]. The data stall time is related to the miss rate in the data cache, and a simple solution to reduce the miss rate is to ....
Z. Cvetanovic and D. Bhandarkar, "Characterization of Alpha AXP performance using TP and SPEC workloads," In Proceedings of the 21st International Symposium on Computer Architecture, pp. 60-70, 1994.
....codes taken from the SPECfp92 suite.¹ Table 2.2 lists the performance data from the study and Figure 2.5 plots the speedup of the C90 over the 21164. On the vectorized codes, the C90 was up to four times faster, even though the SPECfp92 benchmarks fit into the caches of the Alpha 21164 [GHPS93, CB94b]. There are many factors that contribute to the Alpha's superior scalar performance. The 21164 has both a higher clock rate and shorter latencies in clock cycles through almost all functional units.² The 21164 can issue up to four instructions in one cycle, whereas the C90 cannot issue more ....
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP performance using TP and SPEC workloads. In Proc. 21st ISCA, pages 60--70, April 1994.
....exploited instruction-level parallelism lags far behind the hardware capacity. Cvetanovic and Bhandarkar found that programs running on a 2-way superscalar Digital Alpha 21064 could dual-issue only 20-50% of their instructions, which means that in 67-90% of cycles, only one instruction executed [3]. Similarly, Diep et al. found that on a 4-way superscalar PowerPC 620, four integer SPEC benchmarks completed an average of 1.05-1.25 instructions per cycle and three floating-point SPEC benchmarks completed an average of 1.0-1.9 instructions per cycle [4]. This large disparity between a ....
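The step from "20-50% of instructions dual-issued" to "67-90% of cycles issue one instruction" can be checked with a short calculation. The sketch below assumes that only issuing cycles are counted (stall cycles are ignored) and that each dual-issued instruction shares its cycle with exactly one partner; the function name is hypothetical. With a fraction d of instructions dual-issued, each instruction contributes (1 - d) single-issue cycles and d/2 dual-issue cycles on average.

```python
def single_issue_cycle_fraction(dual_issued_instr_fraction: float) -> float:
    """Fraction of issuing cycles that issue exactly one instruction,
    given the fraction of instructions issued as part of a dual-issue pair."""
    d = dual_issued_instr_fraction
    single_cycles = 1.0 - d  # each unpaired instruction occupies its own cycle
    dual_cycles = d / 2.0    # two paired instructions share one cycle
    return single_cycles / (single_cycles + dual_cycles)


for d in (0.20, 0.50):
    print(f"{d:.0%} dual-issued -> {single_issue_cycle_fraction(d):.0%} single-issue cycles")
# 20% dual-issued -> 89% single-issue cycles
# 50% dual-issued -> 67% single-issue cycles
```

The 20% case gives about 89%, which matches the upper end of the quoted 67-90% range after rounding.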
Zarka Cvetanovic and Dileep Bhandarkar. Characterization of the Alpha AXP Performance Using TP and SPEC Workloads. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 60--70, April 1994.
....of process scheduling and the I O capability of the machine. Maynard et al. [5] contrasted the cache performance of technical and commercial workloads and concluded that the latter is often worse. Other studies that have involved database workloads include the work by Cvetanovic and Bhandarkar [2] on a DEC Alpha AXP system, Torrellas et al. [13] on an SGI multiprocessor, and Rosenblum et al. [6] on a simulated SGI multiprocessor. In general, these studies concur on the relatively worse memory performance of commercial workloads. In this paper, we analyze the memory system performance of the ....
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP Performance Using TP and SPEC Workloads. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 60--70, April 1994.
....to be packaged and that the packaging of a large single-chip die with high numbers of pins is a significant portion of the total cost of the packaged chip. In volume production, the cost of placing a chip on an MCM-D substrate is expected to be lower than the cost of individually packaged chips [3][4]. 3 The Hydra Multiprocessor Design. 3.1 Architecture. Hydra is composed of four high-performance off-the-shelf microprocessors with on-chip .... [Figure 2: The relative cost and performance of recent microprocessors.]
Z. Cvetanovic and D. Bhandarkar, "Characterization of Alpha AXP performance using TP and SPEC workloads", in 21st Annual Int. Symp. Computer Architecture, Chicago, pp. 60--70, 1994.
....and concluded that the latter is often worse. Eickemeyer et al. [4] showed that a significant performance improvement can be obtained for OLTP workloads when a multithreaded processor is used. Finally, other studies that have involved database workloads include the work by Cvetanovic and Bhandarkar [1] on a DEC Alpha AXP system, Torrellas et al. [10] on an SGI multiprocessor, and Rosenblum et al. [7] on a simulated SGI multiprocessor. In general, these studies agree on the relatively worse memory performance of commercial workloads. However, they do not give us the insight of what the actual ....
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP Performance Using TP and SPEC Workloads. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 60--70, April 1994.
....kinds of low-level studies have been used for guiding machine implementations for some time, and this approach became popular in the early 1980s as part of the RISC philosophy of microprocessor design [HP90]. For example, detailed knowledge of the relative frequency of execution of instructions [CB94b, ML93] has led to the inclusion of new compound operations (like the fused multiply-add instruction) in the instruction set so that the common case is optimized and executes in fewer cycles. While this detailed approach has been routinely applied by the ....
....memory category, we have included vector loads and stores as well as the indirect memory-accessing functions, gather and scatter. With respect to loads and stores, we see that overall there are 52% more loads than stores (using a weighted average). It has already been reported for scalar programs [CB94b] that load instructions are more common than store instructions. A similar relation was also found in [VH92] for code compiled on a Cray Y-MP vector multiprocessor. It is interesting to note that for all programs the percentage of store operations remains between 10% and 20%. ....
Zarka Cvetanovic and Dileep Bhandarkar. Characterization of Alpha AXP performance using TP and SPEC workloads. International Symposium on Computer Architecture, pages 60--70, 1994.
No context found.
Cvetanovic, Z., and D. Bhandarkar, "Characterization of Alpha AXP Performance Using TP and SPEC Workloads", Proc. Int. Symposium on Computer Architecture, April 1994.
....International Symposium on High Performance Computer Architecture, San Jose, CA, February 1996. 4. CPI. Figure 1 compares the cycles per instruction (CPI) for the AlphaServer 8200 and the DEC 7000 systems. The DEC 7000 measurements were made in 1994 with code scheduling optimized for the 21064 [3]. The AlphaServer 8200 results were obtained with code scheduled for the 21164 and also benefited from other generic compiler enhancements not included in the DEC 7000 code. The Alpha 21164 achieves consistently lower CPI than the 21064 in spite of running at a 50% faster clock rate. The quad ....
....(Doduc, Wave5, Ora). Spice has very few (6%) floating-point instructions. 13. CONCLUSIONS. Cache and memory system design, as well as compiler techniques that can manage the memory access patterns, were recognized as major performance factors in the first implementation of the Alpha architecture [3]. The new implementation of the Alpha architecture addressed these issues and provided 2 to 3 times the performance of the previous generation. Since the design addressed stalls caused by cache misses, quad issuing provided additional benefit in the floating-point SPEC92 benchmarks. Quad issuing ....
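The relationship between the 50% faster clock, the lower CPI, and the 2 to 3 times performance gain quoted in these contexts follows the standard "iron law" (execution time = instruction count × CPI / clock rate). A sketch under the assumption of equal instruction counts on both machines (not exactly true here, since the two systems ran differently scheduled code); the function names and the CPI values in the example are hypothetical, not measurements from either paper:

```python
def speedup(clock_ratio: float, old_cpi: float, new_cpi: float,
            instr_count_ratio: float = 1.0) -> float:
    """Iron-law speedup of a new machine over an old one.

    time = instruction_count * CPI / clock_rate, so
    speedup = (f_new / f_old) * (CPI_old / CPI_new) / (IC_new / IC_old).
    """
    return clock_ratio * old_cpi / (instr_count_ratio * new_cpi)


def required_cpi_reduction(target_speedup: float, clock_ratio: float) -> float:
    """CPI_old / CPI_new needed to reach target_speedup at equal instruction count."""
    return target_speedup / clock_ratio


# Hypothetical CPI values, for illustration only (not measured data).
print(f"{speedup(1.5, old_cpi=2.0, new_cpi=1.2):.2f}")  # 2.50
# A 1.5x clock alone gives 1.5x; a 2x-3x overall gain implies the CPI
# must drop by a factor of about 1.33x-2x (equal instruction count assumed).
print(f"{required_cpi_reduction(2.0, 1.5):.2f}", f"{required_cpi_reduction(3.0, 1.5):.2f}")
# 1.33 2.00
```

Under this model, the clock increase alone cannot explain a 2 to 3 times gain; the remainder has to come from the lower CPI that the context attributes to the 21164.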
Z. Cvetanovic and D. Bhandarkar, "Characterization of Alpha AXP Performance Using TP and SPEC Workloads", The 21st Annual International Symposium on Computer Architecture, April 1994, pp. 60--70.
No context found.
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP Performance using TP and SPEC Workloads. In Proceedings of the 21st International Symposium on Computer Architecture, pages 60--70, April 1994.
No context found.
Zarka Cvetanovic and Dileep Bhandarkar. Characterization of Alpha AXP Performance Using TP and SPEC Workloads. In Proceedings of the 21st International Symposium on Computer Architecture, pages 60--70, 1994.
No context found.
Zarka Cvetanovic and Dileep Bhandarkar. Characterization of Alpha AXP Performance Using TP and SPEC Workloads. In Proceedings of the 21st International Symposium on Computer Architecture, pages 60--70, 1994.
No context found.
Z. Cvetanovic and D. Bhandarkar. Characterization of Alpha AXP performance using TP and SPEC workloads. In 21st Annual International Symposium on Computer Architecture, pages 60--70, April 1994.
No context found.
Cvetanovic, Z. and Bhandarkar, D. Characterization of Alpha AXP performance using TP and SPEC Workloads. In Proceedings of the 21st Annual International Symposium on Computer Architecture, Chicago, Ill., IEEE, 1994.
No context found.
Z. Cvetanovic and D. Bhandarkar, "Characterization of Alpha AXP Performance Using TP and SPEC Workloads," The 21st Annual International Symposium on Computer Architecture (April 1994): 60--70.
No context found.
Zarka Cvetanovic and Dileep Bhandarkar. Characterization of the Alpha AXP Performance Using TP and SPEC Workloads. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 60--70, April 1994.
No context found.
Z. Cvetanovic and D. Bhandarkar, "Characterization of Alpha AXP Performance using TP and SPEC Workloads", Proceedings of the 21st International Symposium on Computer Architecture, 1994, pp. 60--70.
CiteSeer.IST - Copyright Penn State and NEC