K. Dixit. "New CPU benchmark suites from SPEC." COMPCON Spring 1992.
....applications use. Therefore, we use memory traces from several different applications. We selected application traces based on the following criteria: scientific nature, programming language, locality, and size. The applications are taken from well-known benchmark suites (SPEC [15, 16] and SPLASH [31]). They are written in different languages (C and Fortran). They exhibit a range of localities. And they each make enough memory references to provide meaningful data, but not so many as to take an inordinate amount of time to simulate. We use the same trace files for both the ....
Kaivalya M. Dixit. New CPU benchmark suites from SPEC. In Thirty-Seventh IEEE Computer Society International Conference, pages 305--310, Spring 1992.
.... of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, steffan@cs.cmu.edu, May 7, 1998. Abstract: The incorporation of a custom computing machine (CCM) into the pipeline of a superscalar processor has been shown to allow significant gains in performance for general-purpose codes [1,2,3]. This paper describes the architectural features necessary for such an in-pipeline CCM to achieve maximum potential performance gains when all mapping is done automatically by the compiler. We find that a CCM with two inputs and two outputs that computes a result in one cycle captures the ....
....the ability to use four unique configurations within a small number of cycles can be very beneficial. 1 Introduction Previous studies have shown that the incorporation of a custom computing machine (CCM) into the pipeline of a superscalar processor can result in significant performance gains [1,2,3]. However, these studies have assumed an architecture for the CCM and then measured the performance of mapping to it whenever beneficial. This study takes a different approach: while making aggressive assumptions about the capabilities of a CCM, we measure the performance benefits of CCMs of ....
[Article contains additional citation context not shown here]
K. M. Dixit. New CPU Benchmark Suites from SPEC. In COMPCON, Spring 1992.
....by finding better solutions. This can only be done when researchers agree on common experimental setups. This type of positive competition is an essential motivation for improving scientific work: benchmarking inspires progress. Examples include the well-known computer benchmarks (e.g., SPECmarks [10], [55]) and the high-level synthesis benchmarks [11]. The benchmarks help to guide research work and they motivate the search for new, better solutions. Their basic disadvantage is the static nature of the benchmarks. They represent a significant research challenge when they are first introduced. ....
K.M. Dixit. New CPU benchmark suites from SPEC. In COMPCON '92, pages 305--310, 1992.
....SPLASH2 is a set of parallel applications for shared-memory architectures written at Stanford University using PARMACS. It covers a wide range of currently encountered scientific applications. For multiprogramming purposes, we used applications from SPLASH2 and from the SPEC92 benchmarks [Dix92]. This last set of applications was built by SPEC (Standard Performance Evaluation Corporation) as a standard performance evaluation platform for microprocessors. It includes a mix of integer and floating-point programs. Table 1 gives a list of the benchmarks used in the simulations. Irisa ....
K.M. Dixit. New CPU benchmark suites from SPEC. COMPCON, pages 305--310, Spring 1992.
....[SWG92] SPLASH2 is a set of parallel applications for shared-memory architectures written at Stanford University using PARMACS. It covers a wide range of currently encountered scientific applications. For multiprogramming purposes, we used applications from SPLASH2 and from the SPEC92 benchmarks [Dix92]. This last set of applications was built by SPEC (Standard Performance Evaluation Corporation) as a standard performance evaluation platform for microprocessors. It includes a mix of integer and floating-point programs. Table 1 gives a list of the benchmarks used in the simulations. SPEC codes ....
K.M. Dixit. New CPU benchmark suites from SPEC. COMPCON, pages 305--310, Spring 1992.
....a large variety of register irregularities and because of its widespread use. The integer program generated by the IP allocator is sent to a CPLEX 6.0 integer program solver [4]. The solver runs on an HP 9000 780 workstation with a 160MHz PA-8000 processor and 256MB of main memory. The SPEC92 [2] integer benchmarks are used as test inputs. The benchmarks consist of six programs: compress, eqntott, xlisp, sc, espresso, and cc1. For each function in a benchmark, a maximum solver time limit of 1024 seconds is allowed. The experiment assumes a simplified version of the cost model described ....
K. Dixit. New CPU benchmark suites from SPEC. In Digest of Papers Compcon, Spring 1992, pages 305--310. IEEE, 1992.
....in cache size. We assumed a hot-word-first demand-fetch read policy, a write-back write policy, and an LRU replacement policy. Cache statistics were taken every 100,000 references. We obtain workload traces from Brigham Young University [6] containing portions of the SPEC benchmark suites [2], [3], [4], [7], [11]. These traces included both single-user (SPECint92) and multiuser (KENBUS) environments with operating system references. They were gathered on a system having an i486 processor running UNIX System V R4. We chose to run 60 million references from each workload and we primed the caches ....
Kaivalya M. Dixit. New CPU Benchmark Suites From SPEC. In IEEE Computer Society International Conference (COMPCON), pages 305--310, San Francisco, California, February 1992.
....for most people programming the computer, the execution model and the performance observed are not those of the hardware, but rather those of the entire system, both hardware and software. Classic benchmark sets such as NAS [BBB+91], Perfect [CFN90], Linpack [DBMS79, Don92], and SPEC [Dix92] all measure the integrated performance of both the hardware and software. As a variety of software tools simplify hardware development, other tools reduce the amount of time and effort required to produce a good compiler for a given machine. DRAFT: Do Not Distribute 10:45 8 September 1994 ....
K. M. Dixit. New CPU benchmark suites from SPEC. In Proceedings of COMPCON Spring '92: The Thirty-Seventh IEEE Computer Society International Conference, pages 305--310, San Francisco, California, February 1992.
....memory hierarchy additions in configurable hardware. The three applications studied include two kernel applications known for their poor cache performance (Matrix Multiplication and Fast Fourier Transform) as well as a more substantial program: the Tomcatv benchmark from the SPEC92 suite [9]. 3.3.1 Matrix Multiplication In the first example, we study a matrix multiplication program multiplying two 100-by-100 matrices. The elements are double-precision floating point, so each matrix needs a total of 80,000 bytes (100 x 100 elements x 8 bytes). This greatly exceeds the size of the L1 cache. ....
K. M. Dixit. New CPU Benchmark Suites from SPEC. In Proc. COMPCON, Spring 1992.
....running. Thus, using shared first-level caches is better, except for a high number of threads, where private instruction caches and a shared data cache should be preferred. Tullsen et al. do not study the impact of all cache parameters on performance. Moreover, this study considered only the SPEC92 [17] benchmark suite. This suite is known to be inappropriate for cache evaluation, as the number of requests missing a small L1 instruction cache (and in fact the data cache as well) is particularly low and not realistic [5]. Moreover, contention on the second-level cache is not evaluated. We ....
....To take this higher complexity into account, we assumed a 6-stage static pipeline, illustrated in Figure 5. [Figure 5: static instruction pipeline] 2.3 Workloads SPEC benchmarks [17] have been widely used for microprocessor simulations. However, different studies have shown that this benchmark set exhibits very low cache miss rates, and so is poorly suited for cache simulation. Limiting the study to user-level instructions usually leads to very optimistic results. The IBS ....
K.M. Dixit. New CPU Benchmark Suites from SPEC. COMPCON, pages 305--310, Spring 1992.
.... Multiflow compiler (which already does loop unrolling and trace scheduling). Using code generated for the DEC Alpha and a simulator that models the AXP 21164, we simulated the effects of all three optimizations on both balanced and traditional scheduling, using Perfect Club [BCK+89] and SPEC92 [Dix92] benchmarks. Our results confirm the hypothesis. Balanced scheduling interacts well with all three optimizations, producing average speedups that range from 1.15 to 1.40. More importantly, its performance advantage relative to traditional scheduling increases when combined with the optimizations. ....
....particles in the framework of the quark-gluon theory; swm256 (Fortran): solves shallow water equations using finite difference equations; tomcatv (Fortran): vectorized mesh generation program. 4.1 Workload Our experiments use benchmarks (Table 1) taken from the Perfect Club [BCK+89] and SPEC92 [Dix92] suites (the Multiflow compiler has both Fortran and C front ends). We chose numeric programs because their high percentage of loops makes them appropriate testbeds for optimizations targeted at loops. Arrays in the C program are laid out in row-major order; the Fortran benchmarks have a ....
K.M. Dixit. New CPU benchmark suites from SPEC. In COMPCON, pages 305--310, February 1992.
....in related loop restructuring literature and with readily available source code, since we are particularly interested in how array restructuring compares and interacts with existing loop restructuring techniques. These loops include, among others, the NASA7 kernels of the SPEC 92 benchmarks [Dixit 1992], which have been used in several key studies of compiler-directed cache optimizations [Carr, McKinley, and Tseng 1994; Li 1993; Wolf 1992]. They are described in Table 7.1. For each loop, we experimented with a range of problem sizes. We chose the problem sizes so that the major arrays are at ....
Kaivalya M. Dixit. "New CPU Benchmark Suites from SPEC." In Digest of Papers, COMPCON Spring '92, Thirty-Seventh IEEE Computer Society International Conference, (San Francisco, CA, February), 305-10. Los Alamitos, CA: IEEE Computer Society Press, 1992.
.... present performance results for eight programs: matmult is a matrix multiplication routine with two versions (with and without loop unrolling); syr2k is a banded matrix update routine from BLAS [10]; adi is from the Livermore kernels [22]; btrix, vpenta, and cholesky are from the SPEC92 NASA benchmark suite [9]; and transpose is a matrix transpose program from NWChem [23], a software package for computational chemistry. matmult, syr2k, and transpose are programs consisting of a single loop nest, whereas the others contain multiple loop nests. For each program in the suite, we experiment with four ....
....code is an example of a class of programs where data layout optimizations generate better code than loop transformations, although loop transformations are applicable and improve the performance of the original code somewhat. 1.3.4 vpenta The vpenta benchmark is from the SPEC92 NASA benchmark suite [9]. Table 1 shows the miss rates for four versions of vpenta for different cache parameters. The problem size is 720. The spatial locality in the col version is poor for all references. Our optimization framework stores the two-dimensional arrays in row-major order, and the three-dimensional arrays in such a ....
[Article contains additional citation context not shown here]
K. M. Dixit. New CPU benchmark suites from SPEC. In Proc. COMPCON '92: 37th IEEE Computer Society International Conference, San Francisco, CA, Feb 1992.
....tailoring of the system structure to the requirements imposed by the algorithms [40]. For complex applications this makes an analysis of the programs necessary to determine the requirements. In general-purpose computing this is typically done by an analysis of frequently used programs [33], [51], [20], [68], [77], [100], [87]. In ES design, this analysis must be carried out for a specific spectrum of programs. Therefore, analysis tools are required that support this in a convenient and efficient way [22], [23]. An example of an estimation based on a VLIW compiler is included in the SpecSyn ....
K.M. Dixit. New CPU benchmark suites from SPEC. In COMPCON '92, pages 305--310, 1992.
....points for multiple arrays is an open research problem.
Table 1: Programs in our experiment set.
Program | Source | Arrays | Array Sizes | Nodes S | Nodes D | Edges S | Edges D | Var S | Var D | Constr S | Constr D | Time S | Time D
MxM | | 3 | 34.5 | 52 | 52 | 85 | 193 | 36 | 144 | 42 | 54 | 0.71 | 0.84
MxMxM | [8] | 3 | 57.6 | 90 | 90 | 104 | 224 | 52 | 172 | 100 | 120 | 2.41 | 2.69
Vpenta | [10] | 9 | 86.7 | 116 | 116 | 175 | 206 | 116 | 197 | 145 | 181 | 3.90 | 4.52
ADI | [37] | 6 | 201.3 | 70 | 70 | 157 | 319 | 108 | 210 | 69 | 87 | 3.94 | 4.15
Transpose | [19] | 2 | 64.0 | 28 | 28 | 41 | 73 | 16 | 48 | 24 | 32 | 0.52 | 0.68
Amhmtm | [42] | 10 | 34.6 | 66 | 66 | 98 | 122 | 48 | 72 | 88 | 104 | 2.01 | 2.29
Bmcm | [42] | 11 | 24.1 | 46 | 46 | 82 | 98 | 36 | 52 | 48 | 60 | 0.80 | 0.93
Aps | [42] | 17 | 192.3 | 123 | 123 | 133 | 133 | 127 | 127 | 140 | 140 | 0.61 | 0.79
Tsf | [42] | 2 | 45.0 | 36 | 36 | 69 | 95 | 36 | 72 | 46 | 58 | 0.81 | 0.90
LU | | 3 | 8.7 | 54 | 54 | 86 | 158 | 48 | 120 | 36 | 48 | 0.70 | 0.86
Tomcatv | [10] | 9 | 14.7 | 66 | 66 | 116 | 146 | 74 | 104 | 68 | 84 | 0.95 | 2.10
Htribk | [11] | 5 | 10.5 | 52 | 52 | 89 | 112 | 64 | 96 | 60 | 80 | 0.83 | 1.87
Table 2: Different versions and the associated compiler flags. Version | Brief Explanation | Optimization Flags. No Opt: Input program (This can be either the original code or the ....
K. M. Dixit. New CPU benchmark suites from SPEC. In Proc. COMPCON '92: 37th IEEE Computer Society International Conference, San Francisco, CA, February 1992.
....that exploit loop-level parallelism found in a variety of scientific programs. The SUIF compiler will also serve as part of a framework for investigating other compilation issues critical to simultaneous multithreading. For our analysis of medium-grained parallelism, we are using the SPEC92 [Dix92] and Perfect Club [BCK+89] benchmark suites. These programs are sequential programs, but exhibit medium-grained loop-level parallelism. By using the SUIF compiler, we can create threads to execute the iterations of a parallel loop to take full advantage of both this parallelism and the ....
K.M. Dixit. New CPU benchmark suites from SPEC. In COMPCON, pages 305--310, Spring 1992.
....the reorder buffer (for the R10000), branch prediction, instruction fetching, branching penalties, the memory hierarchy (including contention), etc. The parameters of our two machine models are shown in Table 1. We simulated fourteen SPEC92 benchmarks (five integer and nine floating-point) [Dix92], all of which were compiled with -O2 using the standard MIPS compilers under IRIX 5.3. We simulate 1-, 10-, and 100-instruction generic miss handlers, and we pessimistically assume that all instructions within the handlers are data-dependent on each other (hence a 10-instruction handler requires 10 ....
Kaivalya M. Dixit. New CPU Benchmark Suites from SPEC. Proc. COMPCON, Spring 1992.
....and Simulating Speculative Regions To quantify the potential for TLDS in non-numeric codes, we examine a set of real non-numeric applications in which potential speculative regions are identified by hand. Table 1 summarizes the ten non-numeric applications studied, which are taken from the SPEC92 [4], SPEC95 [3], and NAS Parallel [1] benchmark suites. These applications were compiled with -O2 optimization using the standard MIPS compilers under IRIX 5.3, and the source code and resulting object files were not modified in any way. Table 2 lists the speculative regions analyzed in this study. ....
K. M. Dixit. New CPU benchmark suites from SPEC. In COMPCON, Spring 1992.
....to select these parameters in such a way that the resulting system has a high performance/cost ratio. The foundation for selecting the parameters correctly is formed by measured data obtained from an analysis of general-purpose applications. This data is available in the literature, e.g., [14], [21], [24], [27], [29], [33]. In the same way, an ES can be designed from a set of generic components where again the parameters must be selected so that the system is well balanced. The difficulty is the recording of the analysis data. At that point ES design differs from general-purpose ....
K.M. Dixit: "New CPU Benchmark Suites from SPEC", COMPCON '92, pp. 305-310, 1992.
....performance; SMT can, by taking better advantage of ILP, even in sequential sections. 3 Five of our benchmarks are explicitly parallel programs from the SPLASH-2 suite [38], which are built on the Argonne National Laboratories parallel macro library [3]. Tomcatv and hydro2d from SPEC92 [7], as well as shallow and linpack, are implicitly parallel programs for which we use the SUIF compiler [37] to extract loop-level parallelism. SUIF generates transformed C output files that contain calls to a parallel runtime library to create threads and execute loop iterations in parallel. In ....
K. Dixit. New CPU benchmark suites from SPEC. In COMPCON '92 digest of papers, pages 305--310, 1992.
....a one-cycle increase in the misprediction penalty would have less than a 1% impact on instruction throughput in single-threaded mode. With 8 threads, where throughput is more tolerant of misprediction delays, the impact was less than 0.5%. 2.2 Workload Our workload is the SPEC92 benchmark suite [10]. To gauge the raw instruction throughput achievable by multithreaded superscalar processors, we chose uniprocessor applications, assigning a distinct program to each thread. This models a parallel workload achieved by multiprogramming rather than parallel processing. In this way, throughput ....
K.M. Dixit. New CPU benchmark suites from SPEC. In COMPCON, Spring 1992, pages 305--310, 1992.
....instruction queues, track their dependences, and issue them. We eventually squash all wrong-path instructions a cycle after a branch misprediction is discovered in the exec stage. Our throughput results only count useful instructions. Our workload comes primarily from the SPEC92 benchmark suite [7]. We use five floating-point programs (alvinn, doduc, fpppp, ora, and tomcatv) and two integer programs (espresso and xlisp) from that suite, and the document typesetting program TeX. We assign a distinct program to each thread in the processor: the multiprogrammed workload stresses our ....
K.M. Dixit. New CPU benchmark suites from SPEC. In COMPCON, Spring 1992, pages 305--310, 1992.
....chunks of memory references must be simulated in each sample. Because informing loads are both low-overhead and low-perturbation, we have the flexibility to choose the sampling frequency from a much broader range of values. 3.2 Results We have run our tool on a collection of SPEC and other applications [Dix92, Smi92]. For each, we collected the per-static-reference data as described above. In the sections below, we present information on the execution time overhead of the informing-load-based tool. We also quantify data cache perturbation induced by running monitoring code interspersed with the ....
....with early commit CTI instructions. .... than one gets from simply counting the fill rate of load delay slots, since adding an instruction after a load may cause the former load-slot instruction to replace a NOP in a branch delay slot. For a collection of SPEC benchmarks and UNIX utilities [Dix92, Smi92], we found that the overhead averaged around 0.6 instructions per informing load. On a superscalar machine, however, the informing load overhead is dramatically lower on average because data dependences typically limit the average execution rate of instructions to a value below the peak rate ....
Kaivalya M. Dixit. New CPU Benchmark Suites from SPEC. In Proc. COMPCON, Spring 1992.
No context found.
K. Dixit. "New CPU benchmark suites from SPEC." COMPCON Spring 1992.
No context found.
K. Dixit. New CPU benchmark suites from SPEC. In COMPCON '92 digest of papers, pages 305--310, February 1992.
CiteSeer.IST - Copyright Penn State and NEC