MCCALPIN, J. D. 1995. Memory bandwidth and machine balance in high performance computers. IEEE Tech. Comm. Comput. Arch. Newslett. See also http://www.cs.virginia.edu/stream.
....of instruction cache miss samples that occurred in each procedure. 3.2 Instruction-Level Bottlenecks. Dcpicalc provides a detailed view of the time spent on each instruction in a procedure. Figure 2 illustrates the output of dcpicalc for the key basic block in a McCalpin-like copy benchmark [McCalpin 1995], running on an AlphaStation 500 5/333. The copy benchmark runs the following loop, where n = 2,000,000 and the array elements are 64-bit integers: for (i = 0; i < n; i++) c[i] = a[i]; The compiler has unrolled the loop four times, resulting in four loads and stores per iteration. The ....
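The loop quoted in the excerpt above, and the four-way unrolling it mentions, can be sketched in C as follows. This is a minimal illustration only: the function names `copy_simple` and `copy_unrolled4` are hypothetical, and the hand-unrolled version merely imitates the shape of what the excerpt says the compiler produced (four loads and four stores per iteration); it is not the generated Alpha code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The loop as quoted in the excerpt: c[i] = a[i] over 64-bit integers. */
void copy_simple(int64_t *c, const int64_t *a, size_t n)
{
    assert(n == 0 || (a != NULL && c != NULL));
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

/* Hand-unrolled by four (hypothetical illustration): each trip through the
 * main loop issues four loads followed by four stores. */
void copy_unrolled4(int64_t *c, const int64_t *a, size_t n)
{
    assert(n == 0 || (a != NULL && c != NULL));
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        int64_t t0 = a[i];
        int64_t t1 = a[i + 1];
        int64_t t2 = a[i + 2];
        int64_t t3 = a[i + 3];
        c[i]     = t0;
        c[i + 1] = t1;
        c[i + 2] = t2;
        c[i + 3] = t3;
    }

    /* Remainder loop for lengths that are not a multiple of four. */
    for (; i < n; i++)
        c[i] = a[i];
}
```

Both functions produce the same result for any `n`; with the excerpt's n = 2,000,000 the remainder loop never runs because n is a multiple of four.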
.... A 333MHz AlphaStation 500. Several tests from the x11perf X server performance-testing program. The tests chosen are representative of CPU-bound tests. McCalpin: N/A; 333MHz AlphaStation 500. The McCalpin STREAMS benchmark, consisting of four loops that measure memory system bandwidth [McCalpin 1995]. http://www.specbench.org/osg/spec95 http://www.specbench.org/gpc/xpc.static/index.html Table III. Description of Multiprocessor Workloads. Columns: Workload; Mean base runtime (secs.); Platform; Description. AltaVista: 319 ± 2; 300MHz 4-CPU ALPHASERVER 4100. A trace of 28,622 queries made to the ....
MCCALPIN, J. D. 1995. Memory bandwidth and machine balance in high performance computers. IEEE Tech. Comm. Comput. Arch. Newslett. See also http://www.cs.virginia.edu/stream.
....of instruction cache miss samples that occurred in each procedure. 3.2 Instruction-Level Bottlenecks. Dcpicalc provides a detailed view of the time spent on each instruction in a procedure. Figure 2 illustrates the output of dcpicalc for the key basic block in a McCalpin-like copy benchmark [15], running on an AlphaStation 500 5/333. The copy benchmark runs the following loop, where n = 2000000 and the array elements are 64-bit integers: for (i = 0; i < n; i++) c[i] = a[i]; The compiler has unrolled the loop four times, resulting in four loads and stores per iteration. The generated ....
....N/A; 333 MHz ALPHASTATION 500. Several tests from the x11perf X server performance-testing program. The tests chosen are representative of CPU-bound tests [16]. McCalpin: N/A; 333 MHz ALPHASTATION 500. The McCalpin STREAMS benchmark, consisting of four loops that measure memory system bandwidth [15]. Multiprocessor workloads. AltaVista: 319 ± 2; 300 MHz 4-CPU ALPHASERVER 4100. A trace of 28622 queries made to the 3.5 GB AltaVista news index. The system was driven so as to maintain 8 outstanding queries. DSS: 2786 ± 35; 300 MHz 8-CPU ALPHASERVER 8400. A decision support system (DSS) ....
J. D. McCalpin. Memory bandwidth and machine balance in high performance computers. IEEE Technical Committee on Computer Architecture Newsletter, December 1995. http://www.cs.virginia.edu/stream.
....code in the kernel (vmunix) as well as code in shared libraries. 3.2 Instruction-Level Bottlenecks. Dcpicalc provides a detailed view of the time spent on each instruction in a procedure. Figure 2 illustrates the output of dcpicalc for the key basic block in a McCalpin-like copy benchmark [15], running on an AlphaStation 500 5/333. The copy benchmark runs the following loop, where n = 2000000 and the array elements are 64-bit integers: for (i = 0; i < n; i++) c[i] = a[i]; Total samples for event type cycles = 6095201, imiss = 1117002. The counts given below are the number of samples ....
....x11perf: N/A; 333 MHz ALPHASTATION 500. Several tests from the x11perf X server performance-testing program. The tests chosen are representative of CPU-bound tests [16]. McCalpin: N/A; 333 MHz ALPHASTATION 500. The McCalpin STREAMS benchmark, consisting of four loops that measure memory system bandwidth [15]. Multiprocessor workloads. AltaVista: 319 ± 2; 300 MHz 4-CPU ALPHASERVER 4100. A trace of 28622 queries made to the 3.5 GB AltaVista news index. The system was driven so as to maintain 8 outstanding queries. DSS: 2786 ± 35; 300 MHz 8-CPU ALPHASERVER 8400. A decision support system (DSS) ....
J. D. McCalpin. Memory bandwidth and machine balance in high performance computers. IEEE Technical Committee on Computer Architecture Newsletter, December 1995. http://www.cs.virginia.edu/stream.
....Uses Perl to perform text and data manipulation. Table 3: SPEC 95 Benchmark Programs. 2.4.3 Memory Bandwidth and Latency Benchmarks. STREAM is a synthetic benchmark that measures the sustainable memory bandwidth of a processor system using four long vector operations, described in Table 4 [21]. The array sizes used by the benchmark are defined to be larger than the cache of the machine being tested, and data is not reused. We chose STREAM because Ousterhout, Burger, and others [7, 27, 24] argue that high bandwidth is essential to the performance of operating systems, commercial ....
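The excerpt's Table 4 is not reproduced here, so the following C sketch fills in the four vector operations from the publicly distributed STREAM benchmark (Copy, Scale, Add, Triad) rather than from the citing paper. The function names and the helper `stream_mb_per_sec` are hypothetical; timing, repetition, and array sizing (which the excerpt says must exceed the cache) are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Copy:  c[i] = a[i]          (reads one array, writes one) */
void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

/* Scale: b[i] = q * c[i]      (reads one array, writes one) */
void stream_scale(double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = q * c[i];
}

/* Add:   c[i] = a[i] + b[i]   (reads two arrays, writes one) */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Triad: a[i] = b[i] + q * c[i]   (reads two arrays, writes one) */
void stream_triad(double *a, const double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Hypothetical helper: bandwidth in MB/s (1 MB = 10^6 bytes), given how many
 * arrays a kernel touches per element (2 for Copy and Scale, 3 for Add and
 * Triad), the array length, and the measured elapsed time in seconds. */
double stream_mb_per_sec(size_t arrays_touched, size_t n, double seconds)
{
    assert(seconds > 0.0);
    return (double)arrays_touched * sizeof(double) * (double)n / seconds / 1.0e6;
}
```

As a usage note, a caller would time each kernel over arrays much larger than the last-level cache and pass the elapsed time to `stream_mb_per_sec`, using 2 for Copy and Scale and 3 for Add and Triad.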
MCCALPIN, J. Memory bandwidth and machine balance in high performance computers. In IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter (Dec. 1995).