17 citations found.
V. Krishnan and J. Torrellas. A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In International Conference on Parallel Architectures and Compilation Techniques, pages 286--293, October 1998.

This paper is cited in the following contexts:
Branch Prediction On Demand: an Energy-Efficient Solution - Chaver, Pinuel, Prieto, .. (2003)

....consisting of two 4K entry global history tables that use 10 bits of global history, and a pskew component consisting of a 1K entry local history table (8 bit wide) with two 2K entry PHTs. Table 1 shows some parameters of the simulated system. We use a MINT based execution driven simulator [9] that models the contention and occupancy of all resources as the evaluation tool. The simulator incorporates Wattch [4] to evaluate energy consumption. Our simulator fully accounts for all mis speculation induced overhead. Processor Core: 1 GHz Out of order Branch units: 1 Issue width: 6 ....

V. Krishnan and J. Torrellas. A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In International Conference on Parallel Architectures and Compilation Techniques, pages 286--293, Paris, France, October 1998.


Variability in Architectural Simulations of Multi-threaded.. - Alameldeen, Wood (2003)   (8 citations)

....commercial workloads [2]. Other studies have evaluated the fidelity of simulation results for uniprocessors [8] and multiprocessors [13] compared to the actual hardware being modeled. Krishnan and Torrellas examined experimental errors in multiprocessor simulations due to simple processor models [18]. Cain et al. [5] discussed issues related to simulation precision and accuracy. In our infrastructure, we use TFsim to increase the simulation precision, and commercial workloads to increase accuracy. Our work distinguishes itself from other studies by focussing on the variability phenomenon in ....

Venkata Krishnan and Josep Torrellas. A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 286--293, October 1998.


Time Varying Behavior of Programs - Sherwood, Calder (1999)   (15 citations)

....already done in the past, these past executions are cached, and then replayed when they are seen again later. Direct execution, the translation of simulated hardware to native machine code, was also shown to be applicable to out of order execution with some effort by Krishnan and Torrellas [6]. However, even with these techniques, full simulation is still often many times too slow to simulate a large number of real programs to completion. The resources to execute hundred of billions of instructions for each of the eighteen SPEC95 benchmarks is far outside the average researcher, ....

V. Krishnan and J. Torrellas. A direct-execution framework for fast and accurate simulation of superscalar processors. In International Conference on Parallel Architectures and Compilation Techniques (PACT), October 1998.


Exploiting Instruction-Level Parallelism for Memory System.. - Pai (2000)

....[RBDH97] and parallelization [MRF 97] are used to speed up shared memory simulation. Both techniques are orthogonal to ours and can be used in conjunction with DirectRSIM. Concurrently, Krishnan and Torrellas have proposed a method similar to ours for direct execution for ILP multiprocessors [KT98]. They do not discuss the potential for error (or solutions) when using values of non blocking reads in direct execution. They also do not assess the accuracy of their simulator or compare performance with detailed simulation. Their performance comparison with a previous simple processor ....

Venkata Krishnan and Josep Torrellas. A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT '98, October 1998.


Improving the Accuracy vs. Speed Tradeoff for Simulating.. - Durbhakula, Pai, Adve (1999)   (3 citations)

....path is affected by the dynamic ordering of synchronization accesses and contention. In the uniprocessor case, however, DirectRSIM effectively becomes a trace driven simulator. Concurrently, Krishnan and Torrellas have proposed a method similar to ours for direct execution for ILP multiprocessors [6]. They do not discuss the potential for error (or solutions) when using values of non blocking loads in direct execution. They also do not assess the accuracy of their simulator or compare performance with detailed simulation. Their performance comparison with a previous simple processor ....

V. Krishnan and J. Torrellas. A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In Proc. Parallel Architectures and Compilation Techniques, October 1998.


Using a User-Level Memory Thread for Correlation Prefetching - Yan Solihin Jaejin (2002)   (11 citations)  Self-citation (Torrellas)

No context found.

V. Krishnan and J. Torrellas. A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In International Conference on Parallel Architectures and Compilation Techniques, pages 286--293, October 1998.


Correlation Prefetching with a User-Level Memory Thread - Solihin, Lee, Torrellas (2003)   Self-citation (Torrellas)

....exception is CG, which is a regular application. Table 2 describes the applications. The last four columns of the table will be explained later. 4.2 Simulation Environment The evaluation is done using an execution driven simulation environment that supports a dynamic superscalar processor model [20]. We model a PC architecture with a simple memory processor that is integrated in either the North Bridge chip or in a DRAM chip, following the microarchitecture of Fig. 3. Table 3 shows the parameters used for each component of the architecture. All cycles are 1.6 GHz cycles. The architecture is ....

V. Krishnan and J. Torrellas, "A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors," Proc. Int'l Conf. Parallel Architectures and Compilation Techniques, pp. 286-293, Oct. 1998.


Energy-Efficient Hybrid Wakeup Logic - Huang, Renau, Torrellas (2002)   (3 citations)  Self-citation (Torrellas)

....4. EVALUATION ENVIRONMENT We evaluate our proposal on a simulated generic out of order processor with a centralized instruction window structure. Table 1 shows some parameters of the simulated system. Our execution driven simulator system models the contention and occupancy of all the resources [8]. Processor Frequency: 1 GHz Branch units: 1 Issue width: 6 Branch penalty: 8 cycles Dynamic issue: yes RAS entries: 32 I window size: 96 BTB entries: 2048 Load store units: 2 BTB assoc: 4 Int,FP units: 5,4 Branch predictor: GAp(10,8) Pending loads,stores: 16,16 Caches Bus Memory L1 ....

V. Krishnan and J. Torrellas. A Direct Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In International Conference on Parallel Architectures and Compilation Techniques, pages 286--293, Oct. 1998.


Speculative Synchronization: Applying Thread-Level.. - Martinez, Torrellas (2002)   (4 citations)  Self-citation (Torrellas)

....5 EVALUATION ENVIRONMENT To evaluate Speculative Synchronization, we use simulations driven by several parallel applications. In this section, we describe the machine architecture modeled and the applications executed. 5.1 Architecture Modeled We use an execution driven simulation framework [18] to model in detail a CC NUMA multiprocessor with 16 or 64 nodes. The system uses the release memory consistency model and a cache coherence protocol along the lines of DASH [21]. Each node has one processor and a two level hierarchy of write back caches. The processor is a 4 issue out of order ....

V. Krishnan and J. Torrellas. A direct-execution framework for fast and accurate simulation of superscalar processors. In International Conference on Parallel Architectures and Compilation Techniques, pages 286--293, Paris, France, Oct. 1998.


Eliminating Squashes Through Learning Cross-Thread.. - Cintra, Torrellas (2002)   (5 citations)  Self-citation (Torrellas)

....speedups because they are dependent on the efficiency of the parallel execution of the rest of the code. 5.2 Architecture Simulated The evaluation is based on execution driven simulations. Our simulation environment uses an extension to MINT [27] that includes a superscalar processor model [9], and supports dynamic spawn, squash, restart, and retire of light weight threads. The processor model is a 4 issue dynamic superscalar with register renaming, branch prediction, and non blocking memory operations. Some of its parameters are shown in the left portion of Table 2. The memory system ....

V. Krishnan and J. Torrellas. "A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors." Intl. Conf. on Parallel Architectures and Compilation Techniques, pages 286-293, October 1998.


Improving the Performance of Bristled CC-NUMA Systems.. - Martinez, Torrellas.. (1999)   Self-citation (Torrellas)

....above. We have integrated this tool in a detailed simulator of a CC NUMA system with dynamic superscalar processors. We perform simulations under a system based on MINT [19] that has been modified to support accurate and efficient execution driven simulations of dynamic superscalar processors [9]. The simulated CC NUMA system has 32 single processor PNs. Each processor is modelled as a 4 issue dynamic superscalar with three pipelined integer units, two pipelined add multiply floating point units, one non pipelined division unit and two load store units. The processor has a 128 entry ....

V. Krishnan and J. Torrellas. "A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors." Intl. Conf. on Parallel Architectures and Compilation Techniques, October 1998.


Improving the Performance of Bristled CC-NUMA Systems.. - Martínez, Torrellas.. (1999)   Self-citation (Torrellas)

....described above. We have integrated this tool in a detailed simulator of a CC NUMA system with dynamic superscalar processors. We perform simulations under a system based on MINT [19] that has been modified to support accurate and efficient execution driven simulations of dynamic superscalar processors [9]. The simulated CC NUMA system has 32 single processor PNs. Each processor is modelled as a 4 issue dynamic superscalar with three pipelined integer units, two pipelined add multiply floating point units, one non pipelined division unit and two load store units. The processor has a 128 entry ....

V. Krishnan and J. Torrellas. "A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors." Intl. Conf. on Parallel Architectures and Compilation Techniques, October 1998.


Architectural Support for Scalable Speculative.. - Cintra, Martinez.. (2000)   (26 citations)  Self-citation (Torrellas)

....execution driven simulations. In our base experiments, each thread is composed of a single loop iteration. 4.2 Simulation Environment Our execution driven simulation environment is based on an extension to MINT [21] that includes a superscalar processor model with non blocking memory operations [10], and supports dynamic spawn, squash, restart, and retire of light weight threads. We use these threads to attempt speculative parallelization on the loops in Table 3. The processor model is that of a 4 issue dynamic superscalar with register renaming, branch prediction, and non blocking memory ....

.... including thread commit time, thread squash time, and stalls caused by L1, L2, VC, and LMDT overflows (Overhead) idle time waiting for other threads to complete (Imbalance) and conventional pipeline hazards (Other) The contribution of each category is measured at the grain size of issue slots [10]. From the figure, we see that our scheme delivers speedups that range from 1.5 to 5.6. Over the five applications, the average speedup is 2.8. Interestingly, the results show that, in all applications but one (BDNA ) the main obstacle to better speedups is not related to the speculative aspect ....

V. Krishnan and J. Torrellas. "A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors." Intl. Conf. on Parallel Architectures and Compilation Techniques, pages 286-293, October 1998.


A Chip-Multiprocessor Architecture with Speculative.. - Krishnan, Torrellas (1999)   (22 citations)  Self-citation (Krishnan Torrellas)

....of the memory hierarchies are shown in Table 7. In the table, latencies correspond to round trips from the processor without contention. In the simulations, we use the release memory consistency model. 5.1 SIMULATION APPROACH We use a MINT based execution driven simulation environment [13]. MINT [29] generates events by instrumenting binaries and captures both application and library code execution. We have modified 26 Parameter CMP 4 Issue 12 Issue Superscalar Superscalar [L1 , L2] size (Kbytes) 4x16 , 1024] 64 , 1024] 64 , 1024] L1 , L2] line size (Bytes) 32 , 64] 32 , ....

....trace before feeding it to the simulator. The rescheduler performs a resource constrained scheduling of instructions. Since it gathers 27 several loop iterations, it enables aggressive loop unrolling. It is this optimized window of instructions that is finally passed to the processor simulator [13]. 6 EVALUATION To evaluate the proposed architecture, we compare the performance of our 4x4 issue CMP to that of an aggressive superscalar (Section 6.1) Later, we assess the requirements imposed by our speculation hardware (Sections 6.2 and 6.3) 6.1 CMP VERSUS SUPERSCALAR To put the ....

V. Krishnan and J. Torrellas. A Direct Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In PACT '98, pages 286-293, October 1998.


The Need for Fast Communication in Hardware-Based.. - Krishnan, Torrellas (1999)   (6 citations)  Self-citation (Krishnan Torrellas)

....Each processor in the CMP has a relatively small private L1 cache of 16 Kbytes. All processors share a larger on chip L2 cache. The characteristics of the memory hierarchy are shown in Table 2. 4.2 Simulation Approach We evaluate the architectures using an execution driven simulation environment [11]. Our environment includes MINT as a front end [26]. The environment captures both application and library code and 10 Parameter Value [L1,L2] Cache Size (Kbytes) 16x4,1024] L1,L2] Cache Line Size (Bytes) 32,64] L1,L2] Cache Associativity [2,4] L1 Banks 3 L1 Latency (Cycles) 1 L2 ....

V. Krishnan and J. Torrellas. A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors. In PACT '98, pages 286-293, October 1998.


Quantifying the Benefits of SPECint Distant.. - Ortega, Martel..   Self-citation (Krishnan)

....L2 cache and main memory take respectively 1, 6 and 26 cycles of time. When simulating a simultaneous multithreading processor we have supposed one single L1 cache shared among all the threads. Simulation Approach Our simulation environment is built on a MINT based execution driven simulator [8]. MINT [22] captures both application and library code execution and generates events by instrumenting binaries. Our back end simulator is extremely detailed and performs a cycle accurate simulation of the architectures and hardware support for speculation described. The synchronisation ....

V. Krishnan and J. Torrellas. A direct execution framework for fast and accurate simulation of superscalar processors. International Conference on Parallel Architectures and Compilation Techniques, October 1998.
