Results 1 - 10 of 14,797
TABLE 2. Microarchitecture configuration. single processor core (PE) slipstream memory hierarchy
2002
Cited by 7
Table 5-2. Microarchitecture configuration. single processor core (PE) slipstream memory hierarchy private L1 instr. cache (see memory hier. column) size = 64 KB caches
"... In PAGE 10: ...able 4-4. IR-misprediction rate, recovery latency, slack, and delay buffer length............................................ 45 Table 5-1.... In PAGE 10: ...able 5-1. Qualitative comparisons of duplication and recovery methods. ........................................................ 64 Table 5-2.... In PAGE 10: ...able 5-2. Microarchitecture configuration....................................................................................................... 67 Table 5-3.... In PAGE 73: ...63 5.4 Qualitative Comparisons of Duplication and Recovery Methods Table 5-1 summarizes the advantages, disadvantages, and required hardware support of the two memory duplication methods (top-half) and three memory recovery methods (bottom-half). Notice the four useful measurements introduced in Sections 5.... In PAGE 73: ... Results in Section 5.6 quantify much of the information summarized in Table 5-1. Note that the cache-based value prediction technique is not listed in Table 5-1, but is used in conjunction with either invalidation-based recovery model to reduce the performance impact of recovery-induced misses.... In PAGE 73: ...kipped-write relate to recovery. Results in Section 5.6 quantify much of the information summarized in Table 5-1. Note that the cache-based value prediction technique is not listed in Table 5-1, but is used in conjunction with either invalidation-based recovery model to reduce the performance impact of recovery-induced misses. Figure 5-4 shows the original slipstream microarchitecture with software-based memory duplication.... In PAGE 74: ...64 Table 5-1. Qualitative comparisons of duplication and recovery methods.... In PAGE 76: ... The functional simulator checks retired R-stream control flow and data flow outcomes. Microarchitecture parameters are listed in Table 5-2. The top-left portion of the table lists parameters for individual processors within a CMP....
In PAGE 77: ...nv.-dirty, or inv./inv.-dirty with value prediction The Simplescalar [5] compiler and ISA are used. We use eight SPEC2000 integer benchmarks compiled with -O3 optimization and run with ref input datasets (Table 5-3). The first billion instructions are skipped and then 100 million instructions are simulated.... In PAGE 78: ...68 have to maintain several full memory images to measure the number of stale, self-repair, persistent-stale, and persistent-skipped-write references (this is a statistics-gathering issue). Table 5-3. Benchmarks.... ..."
Table 1. Configuration of the front and back processors.
2005
"... In PAGE 6: ... The cache module (including the run-ahead cache) in our simulator models both data and tag stores. The front and back processors have the same configurations (but a shared L2 cache), shown in Table 1. The correctness assertions are disabled in the front processor model but enforced in the back processor model.... In PAGE 7: ... In our experiments, we also modeled run-ahead execution [13], [25] and slipstream processors [27], [37] to compare with DCE. Run-ahead execution is implemented according to [25] but with the processor model described in Table 1 and a 4 kB run-ahead cache. Oracle memory disambiguation is also modeled for both run-ahead execution and slipstreaming processors.... In PAGE 11: ... Figure 9 shows the execution time of a baseline processor and DCE with both pessimistic and oracle memory disambiguation. The default configuration shown in Table 1 is used for the baseline processor, the front processor, and the back processor in DCE. Two important observations can be made from Figure 9.... ..."
Cited by 20
Table 6-1. SPEC2K benchmarks. Benchmark Input Dataset
"... In PAGE 10: ...able 4-2. Counter status of root non-checker instruction (X). . . . . . . . . . . . . . . 32 Table 6-1.... In PAGE 10: ...able 6-1. SPEC2K benchmarks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Table 6-2.... In PAGE 10: ...able 6-2. SPEC95 benchmarks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Table 6-3. Single processor core configuration and shared L2 cache.... In PAGE 10: ... . . . . . . . . . 48 Table 6-4.... In PAGE 58: ... The SPEC95 benchmarks are run to completion. Table 6-2 shows the SPEC95 benchmarks, their inputs, and dynamic instruction count.... In PAGE 59: ... Table 6-2. SPEC95 benchmarks.... In PAGE 59: ...c < ctl.in (dcrand.big) 120 million 6.2 Microarchitecture Parameters Parameters for a single processor core and the shared L2 cache within a CMP are shown in Table 6-3 [5]. Parameters for the slipstream components are shown in Table 6-4 [5].... In PAGE 59: ... Parameters for the slipstream components are shown in Table 6-4 [5]. Table 6-3. Single processor core configuration and shared L2 cache.... In PAGE 60: ...Table 6-4. Slipstream component configuration.... ..."
Table I. Specification of Xtensa processor core for profiling.
Table 2: Speed-ups over the sequential version on a 19 Processor Sequent
"... In PAGE 7: ... The speed-ups of these optimized versions are under reported. Table 2 contains the speed-ups over the sequential version of the parallel versions. The execution times in seconds of all the program and program subpart versions appear in Table 3.... In PAGE 7: ... The execution times in seconds of all the program and program subpart versions appear in Table 3. In Table 2 and 3, a blank entry means that no program or program subpart fell in that category. For example, the automatically parallelized version and the user parallelized version did not differ for Control, Direct, and ODE and therefore we did not measure any subparts.... ..."
Table 4: Human Genome Application
"... In PAGE 8: ... Similar to the Prolog-OBDC Interface, it is difficult to imagine a method for producing shorter more readable code than that above. Table 4 provides the results for a combined DCS and DS method using a query of the form a(X),b(Y),c(X,Y). The speed-ups are normalized by the absolute performance of one processing node.... ..."