82 citations found.
M. D. Smith, M. Johnson, et al. Limits on multiple instruction issue. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 290--302. ACM Press, 1989.


This paper is cited in the following contexts:


Available Parallelism in Video Applications - Liao, Wolfe (1997)   (5 citations)

....of up to 50,000 operations were measured and an average speedup of around 90 was observed. Smith used trace-driven simulations to measure the effective limits of multiple instruction issue in superscalar architectures and observed an instruction issue rate of about two instructions per cycle [1]. Wall also tested ILP limits under various assumptions using a wide variety of hardware and software techniques, including branch prediction, register renaming, and alias analysis, and concluded that average parallelism rarely exceeds 7 [2]. Audio and video applications use different enough ....

Michael D. Smith, Mike Johnson, and Mark A. Horowitz, "Limits on Multiple Instruction Issue", Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 290-302, April 1989.


Computing Along the Critical Path - Tullsen, Calder (1998)   (1 citation)

....with studies that attempt to measure the inherent instruction-level parallelism (ILP) limits in various programs. They profile or simulate code, following the dependence paths, and measure the amount of parallelism given various architectural constraints. Among those have been Smith, et al. [20], Butler, et al. [4], Wall [26], Theobald, et al. [22], and Lam and Wilson [11]. The difference in this paper is that we are not searching just for the length of the critical path (another way of thinking about the ILP they measured) but the composition of the critical path. We want to know ....

M.D. Smith, M. Johnson, and M.A. Horowitz. Limits on multiple instruction issue. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 290--302, 1989.


On the Performance of A Multi-Threaded RISC Architecture - Lindsay, Preiss (1993)

....or superpipelined organization in order to increase execution speed by exploiting instruction parallelism, i.e. by attempting to execute more than one instruction in parallel. Unfortunately, the amount of available instruction parallelism in typical programs is low; on the order of two [1] to eight [2] instructions. In addition, both techniques tend to increase the penalty for branches and, hence, require expensive hardware for branch prediction. Ultimately, processor performance is limited by the presence of data and control hazards in programs as well as the need to fetch data ....

Michael D. Smith, Mike Johnson, and Mark A. Horowitz. Limits on multiple instruction issue. In ASPLOS 3, 1989.


Loop Optimization Techniques On Multi-Issue Architectures - Kaiser

....comment: Cache misses impose a larger penalty for multi-issue and other parallel machines. This is because the number of instructions lost is magnified by the width of the instruction window. Smith, Johnson and Horowitz study the available parallelism for a superscalar MIPS architecture in [161]. In this study, trace-driven simulations were used to find the parallelism for variations of the MIPS architecture, including superscalar versions. The benchmarks used were non-scientific code, i.e. avoiding the Livermore Loops. They start with code optimized for the R2000 in this study. Pixie is ....

M. D. Smith, M. Johnson, M. A. Horowitz, Limits on Multiple Instruction Issue, Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, 1989, vol. 17, pp. 290-302.


Adaptive Explicitly Parallel Instruction Computing - Talla (2000)   (4 citations)

....assumptions about the compilation and processor technology which may never be attained or (better yet) may be bettered in future. Early studies [161, 128] concluded that there is very limited parallelism in general purpose applications. Since then several ILP limit studies have been conducted [108, 83, 157, 155, 172, 173, 90]. Most of these studies concluded that, in general, the available ILP is limited, claiming that often the number of maximum instructions that can be issued on each cycle is typically less than 10. All these studies explain the general perception that conventional approaches to ILP may not improve ....

M. Smith, M. Johnson, and M. Horowitz. Limits on multiple instruction issue, 1989.


Compiler and Architectural Techniques for Improving the.. - Grossman

....in order to achieve the maximum possible performance. In particular, in order to make effective use of a large number of functional units it is necessary to perform optimizations across basic block boundaries, as the amount of parallelism available within basic blocks tends to be quite limited ([Smith89], [Wall91]). In this paper we survey some common compiler and architectural techniques for increasing program ILP and making more effective use of the available hardware resources. We begin by discussing Trace Scheduling ([Fisher81], [Fisher83]) in section 2. In trace scheduling, compilation ....

Michael D. Smith, Mike Johnson, Mark A. Horowitz, Limits on Multiple Instruction Issue, Proc. ASPLOS 89, pp. 290-302.


Stop twisting thumbs: Speculative Execution - Isaac Siu Ta-Yan

....we do not have any instruction to dispatch since we don't know what to execute next. Previous superscalar implementations stall the execution of one or more functional units. However, the intrinsic parallelism of most non-numerical applications is proved to be insufficient to make this effective [6]. The solution is very similar to the single-issue case. We use branch prediction to assume either the taken or non-taken path will be reached, and execute the instruction located there. By doing this, the processor avoids twisting its thumbs. Now instead of sitting idle waiting for branch ....

M.D. Smith, M. Johnson, and M.A. Horowitz. "Limits on Multiple Instruction Issue". Proc. Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, April 1989, pp. 290-302.


Reducing The Impact Of Register Pressure On Software Pipelined Loops - Llosa (1996)   (8 citations)

....and a factor of 2-3 for the numeric benchmarks. They also investigated the increase in parallelism when unrolling loops for the numeric benchmarks. They found a degree of parallelism of up to 6 with unrolling degrees of 10. Another study for superscalar processors was carried out by M.D. Smith et al. [SJH89]. This study was based on traces executed in the MIPS R2000 processor. The experiment was done for non-numeric programs and with several configurations of functional units. In fact it was not a study on the limits of available parallelism, but a study of attainable parallelism with a limited ....

M.D. Smith, M. Johnson, and M.A. Horowitz. Limits on multiple instruction issue. In Proc., Third Internat. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 290--302, April 1989.


Superscalar Branch Instruction Processor - Jeremiah, Vassiliadis, Blaner

....stream or continue executing the sequential (fall-through) stream. This performance-diminishing effect of branch instructions is further amplified in computers with multiple pipelined functional units. This effect has been known for some time [3] and has recently received further attention [4] with the advent of superscalar and scalable compound instruction set machines (SCISM) [5]. These machines, which attempt to execute multiple instructions in parallel from a single instruction stream, are particularly susceptible to the adverse effects of branches, because not only will branch ....

M. D. Smith, M. Johnson, and M. Horowitz, "Limits on multiple instruction issue," Proceedings of ASPLOS III, ACM, pp. 290--302, 1989.


Reorder Buffer Structure with Shelter Buffer for.. - Chang, Park, Choi (2000)

....the relative performance improvements of superscalar and superpipelined processors against the traditional scalar processor [2]. However, Smith [...] that the program without [...] mathematical operations does not have enough parallelism to execute two instructions per cycle [3]. In the out-of-order issue, instructions may also complete out of order because they are not issued in sequential order and have different latency in their operations. To take full advantage of the out-of-order issue, the processor should employ the out-of-order completion. However, the out-of-order completion leads to a ....

M.D. Smith, M. Johnson, and M.A. Horowitz, "Limits on multiple-instruction issue," Proc. Third Int. Conf., Architectural Support Programming Language and Oper. Sys., pp.290--302, April 1989.


Load Latency Tolerance In Dynamically Scheduled Processors - Srinivasan, Lebeck (1999)   (32 citations)

....on the cache organization. The latency tolerance of references that hit in the cache is not accounted for, even though it may be quite large. We evaluate the latency tolerance of individual load instructions without being tied down to a specific memory system. Graph-based analyses have been used [2, 9, 15, 21, 25, 28] to study the amount of parallelism available in programs. They work on a static dynamic execution trace under idealistic assumptions such as unconstrained resources, single-cycle operational latencies for functional units, perfect branch prediction and alias analysis. They use the dependence ....

M. D. Smith, M. Johnson, and M. A. Horowitz. Limits on Multiple Instruction Issue. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 290--302, May 1989.
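The graph-based limit analysis described in this context (unconstrained resources, single-cycle latencies, perfect branch prediction) can be illustrated with a minimal Python sketch. It is not the method of Smith, Johnson, and Horowitz or of any other cited paper. It assumes a hypothetical trace format of (destination register, source registers) tuples, models perfect register renaming by tracking only true (read-after-write) register dependences, and ignores memory dependences entirely. Available parallelism is then the trace length divided by the length of the critical path through the dependence graph.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical trace entry: (destination register or None, source registers).
Instr = Tuple[Optional[str], Sequence[str]]


def dataflow_ilp(trace: Sequence[Instr]) -> float:
    """Return instructions per cycle under idealized dataflow-limit assumptions.

    Assumptions (illustrative only): unlimited functional units, single-cycle
    latency for every instruction, perfect branch prediction (the trace is the
    committed path), and perfect register renaming, so only read-after-write
    register dependences constrain scheduling. Memory dependences are ignored.
    """
    ready = {}  # register name -> cycle in which its most recent value is produced
    critical_path = 0
    for dest, srcs in trace:
        # An instruction executes one cycle after its latest-ready source;
        # sources with no producer in the trace count as ready at cycle 0.
        cycle = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        if dest is not None:
            # Renaming: later readers see this producer, regardless of earlier writers.
            ready[dest] = cycle
        critical_path = max(critical_path, cycle)
    return len(trace) / critical_path if critical_path else 0.0


if __name__ == "__main__":
    trace = [
        ("r1", []),            # cycle 1
        ("r2", []),            # cycle 1
        ("r3", ["r1", "r2"]),  # cycle 2
        ("r4", ["r3"]),        # cycle 3
        ("r5", []),            # cycle 1
    ]
    print(dataflow_ilp(trace))  # 5 instructions over a 3-cycle critical path
```

Adding realistic constraints to this model (a finite issue width, multi-cycle latencies, branch mispredictions) can only keep the result the same or lower it, which is the gap between idealized limit studies and measured issue rates.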


The Limits of Instruction Level Parallelism in SPEC95.. - Posti, Greene, Tyson..

.... the hardware needed to detect data dependencies between instructions, the complexities of fetching noncontiguous instructions from memory, and gathering multiple data items from memory in single cycles, arrived at much more pessimistic results that suggested 2-3 as a realistic limit for IPC [JW89, SJH89]. Nevertheless, the limit studies showed that the limitations were ones of physical implementation, not logical limitations, and thus provided a realistic goal for implementers. In fact, before some of these studies there was already a significant body of research that proposed to eliminate the ....

.... group at the University of Illinois showed that 16 or more processors could be kept busy on workloads characterized by FORTRAN DO loops [KBC+74]. Furthermore, not all researchers who considered the attenuation that results from implementation complexities were as pessimistic as [JW89] and [SJH89]. In [BYP+91], as the title suggests, the authors argue a strong case for implementation complexities being less limiting. Indeed, in the past few years manufacturers have started to produce machines with the ability to issue six instructions per cycle, with more on the horizon [Gwe97] ....

M. D. Smith, M. Johnson, and M. A. Horowitz. Limits on multiple instruction issue. In Proc. ASPLOS-3, number 5, pages 290-302, May 1989.


Instruction-Processing Optimization Techniques For VLSI.. - Bunda (1993)   (1 citation)

....to execution resource limits such as inter-instruction dependencies, load and store latencies, floating-point latencies, and so on. Instruction fetch bandwidth and branch latency are identified by Smith, Johnson, and Horowitz as principal barriers to increased performance in superscalar processors [71]. 2.3 Normalizing Performance Measurements The instruction count, or path length, IC, is the total number of instructions in an execution trace for a given program. Given the total number of cycles to complete program P for two different machines, if clock rates are identical, it is easy to decide ....

.... demands are high, and cache misses can be expensive (a stall cycle can represent several lost issue opportunities). Smith, Johnson, and Horowitz identify instruction fetch performance as more critical than instruction parallelism in limiting superscalar performance on non-scientific programs [71]. The traffic reduction and increased cache performance provided by 16-bit instructions could help. Like VLIWs, the increase in antidependencies arising from the small register name space could limit parallelism, but register renaming can help mitigate this problem. Machines with deep pipelines ....

Michael D. Smith, Mike Johnson, and Mark A. Horowitz. Limits on multiple instruction issue. ACM SIGPLAN Notices, Proceedings ASPLOS-III, 24:290--302, May 1989.


Boosting Beyond Static Scheduling in a Superscalar Processor - Smith, Lam, Horowitz (1990)   (63 citations)  Self-citation (Smith Horowitz)

....Another effect of using hardware to detect instruction-level parallelism is that the hardware can only analyze a small window of dynamic instructions during each cycle, thus limiting the possible candidates for parallel issue. Finally, instruction fetch efficiency, defined in Smith et al. [21] as the average number of useful instructions fetched per cycle, is reduced when executing from scalar object code. As a result of the large number of branches during the execution of nonnumerical code, dynamic schedulers suffer a significant performance penalty due to branch point misalignment in ....

....reorder buffer space in MATCH and enough general-purpose registers in TORCH. Finally, we assumed a small number of functional units since the fetch efficiency and not the number of functional units is the limiting factor when exploiting instruction-level parallelism in non-numerical applications [21]. As a result, we maximize the functional unit cost/performance tradeoff by making the load/store pipe, our most expensive functional unit to duplicate, the most frequently used resource. With one of each functional unit, the integer ALU is the most frequently used functional unit. By adding an ....

M.D. Smith, M. Johnson, M.A. Horowitz, "Limits on Multiple Instruction Issue." Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (April 1989), pp. 290-302.


Efficient Superscalar Performance Through Boosting - Smith, Horowitz, Lam (1992)   (48 citations)  Self-citation (Smith Horowitz)

....to exploit ILP. In non-numerical applications, the amount of ILP is limited. Limit studies, studies which try to bound the amount of exploitable ILP in applications, show that superscalar processors must look beyond branch boundaries to exploit the available ILP in non-numerical applications [22][29]. These studies show that good performance requires both a good instruction schedule and speculative execution, the execution of instructions before it is known for certain whether those instructions will be executed. What is not known is how to best schedule instructions for a superscalar ....

M.D. Smith, M. Johnson, and M.A. Horowitz. Limits on Multiple Instruction Issue. In Proc. Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 290--302, April 1989.


Energy Dissipation In General Purpose Microprocessors - Gonzalez, Horowitz (1996)   (81 citations)  Self-citation (Horowitz)

No context found.

M. D. Smith, M. Johnson, and M. Horowitz, "Limits on multiple instruction issue," in Proc. 3rd Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS III), Boston, MA, Apr. 1989, pp. 290--302.


Dataflow: A Complement to Superscalar - Budiu, Artigas, Goldstein (2005)

No context found.

M. D. Smith, M. Johnson, et al. Limits on multiple instruction issue. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 290--302. ACM Press, 1989.


Characterizing a New Class of Threads in Scientific .. - Rodrigues, Murphy, ..

No context found.

M. D. Smith, M. Johnson, and M. A. Horowitz. Limits on multiple instruction issue. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), volume 24, pages 290--302, New York, NY, 1989. ACM Press.


Journal of Instruction-Level Parallelism 6 (2004) 1-23.. - AI Access Foundation (2004)

No context found.

M. D. Smith, M. Johnson, and M. A. Horowitz. "Limits on multiple instruction issue." In Proc. of the 3rd Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 290--302, 1989.


Adaptive Explicitly Parallel Instruction Computing - Surendranath Talla (2000)   (4 citations)

No context found.

M. Smith, M. Johnson, and M. Horowitz. Limits on multiple instruction issue, 1989.


Spatial Computation - Budiu (2003)

No context found.

M. D. Smith, M. Johnson, and M. A. Horowitz. Limits on multiple instruction issue. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 290--302. ACM Press, 1989.


Constraint Graph Analysis of Multithreaded Programs - Cain, et al. (2004)

No context found.

M. D. Smith, M. Johnson, and M. A. Horowitz. "Limits on multiple instruction issue." In Proc. of the 3rd Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 290--302, 1989.


Trap-driven Memory Simulation - Uhlig (1995)   (2 citations)

No context found.

Smith, M. D., Johnson, M. and Horowitz, M. A. Limits on multiple instruction issue. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, Massachusetts, ACM, 290-302, 1989.


Limits and Graph Structure of Available Instruction-Level.. - Stefanovic, Martonosi (2001)   (1 citation)

No context found.

M. D. Smith, M. Johnson, and M. A. Horowitz. Limits on multiple instruction issue. In ASPLOS III, pages 290--302, Boston, Massachusetts, 1989.


Speculative Execution based on Value Prediction - Gabbay (1996)   (98 citations)

No context found.

M. D. Smith, M. Johnson and M. A. Horowitz. Limits on Multiple Instruction Issue. Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, April 1989, pp. 290-302.



CiteSeer.IST - Copyright Penn State and NEC