CiteSeerX

Results 1 - 10 of 1,168

Dynamo: A Transparent Dynamic Optimization System

by Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia - ACM SIGPLAN Notices, 2000
"... We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. The input native instruction stream to Dynamo can be dynamically generated (by a JIT ..."
Cited by 479 (2 self)
... native binaries can be accelerated by Dynamo, and often by a significant degree. For example, the average performance of --O optimized SpecInt95 benchmark binaries created by the HP product C compiler is improved to a level comparable to their --O4 optimized version running without Dynamo. Dynamo achieves ...

Slipstream processors: improving both performance and fault tolerance

by Karthik Sundaramoorthy, Zach Purser, Eric Rotenberg - In Proceedings of the Ninth International Conference on Architectural ...
"... Processors execute the full dynamic instruction stream to arrive at the final output of a program, yet there exist shorter instruction streams that produce the same overall effect. We propose creating a shorter but otherwise equivalent version of the original program by removing ineffectual computat ..."
Cited by 187 (6 self)
... Detailed simulations of an example implementation show an average improvement of 7% for the SPEC95 integer benchmarks. 2) Fault tolerance. The shorter program is a subset of the full program, and this partial redundancy is transparently leveraged for detecting and recovering from transient hardware faults ...

A New Efficient Algorithm for Computing Gröbner Bases (F4)

by Jean-Charles Faugère - In: ISSAC '02: Proceedings of the 2002 International Symposium on Symbolic and Algebraic Computation, 2002
"... This paper introduces a new efficient algorithm for computing Gröbner bases. To avoid as much as possible intermediate computation, the algorithm computes successive truncated Gröbner bases and it replaces the classical polynomial reduction found in the Buchberger algorithm by the simultaneous reduc ..."
Cited by 365 (57 self)
... Some previously intractable problems (Cyclic 9) are presented, as well as an empirical comparison of a first implementation of this algorithm with other well-known programs. This comparison pays careful attention to methodology issues. All the benchmarks and CPU times used in this paper are frequently ...

Efficient Path Profiling

by Thomas Ball, James R. Larus - In Proceedings of the 29th Annual International Symposium on Microarchitecture, 1996
"... A path profile determines how many times each acyclic path in a routine executes. This type of profiling subsumes the more common basic block and edge profiling, which only approximate path frequencies. Path profiles have many potential uses in program performance tuning, profile-directed compilatio ..."
Cited by 287 (8 self)
... benchmarks, path profiling overhead averaged 31%, as compared to 16% for efficient edge profiling. Path profiling also identifies longer paths than a previous technique, which predicted paths from edge profiles (average of 88 versus 34 instructions). Moreover, profiling shows that the SPEC95 train input ...
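The abstract above notes that edge profiles only approximate path frequencies. A minimal Python sketch makes this concrete, assuming a hypothetical toy routine made of two consecutive if/else diamonds with made-up block names A to G; it is an illustration of the limitation, not the paper's path-numbering instrumentation. Two runs with different path profiles can produce exactly the same edge profile.

```python
from collections import Counter

def edge_profile(path_counts):
    """Derive per-edge execution counts from a path profile.

    Each path is a string of single-letter basic-block names (hypothetical).
    """
    edges = Counter()
    for path, count in path_counts.items():
        for src, dst in zip(path, path[1:]):  # consecutive blocks form an edge
            edges[(src, dst)] += count
    return edges

# Hypothetical routine: A -> (B | C) -> D -> (E | F) -> G
run1 = Counter({"ABDEG": 50, "ACDFG": 50})  # B is always followed by E
run2 = Counter({"ABDFG": 50, "ACDEG": 50})  # B is always followed by F

print(edge_profile(run1) == edge_profile(run2))  # True: every edge ran 50 times in both
print(run1 == run2)                              # False: the path profiles differ
```

Because the edge counts are identical, no analysis of the edge profile alone can tell which combination of branch outcomes actually executed together; a path profiler counts each acyclic path directly instead.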

A Trace Cache Microarchitecture and Evaluation

by Eric Rotenberg, Steve Bennett, James E. Smith - IEEE Transactions on Computers , 1999
"... As the instruction issue width of superscalar processors increases, instruction fetch bandwidth requirements will also increase. It will eventually become necessary to fetch multiple basic blocks per clock cycle. Conventional instruction caches hinder this effort because long instruction sequences ..."
Cited by 55 (3 self)
... The microarchitecture provides high instruction fetch bandwidth with low latency by explicitly sequencing through the program at the higher level of traces, both in terms of (1) control flow prediction and (2) instruction supply. For the SPEC95 integer benchmarks, trace-level sequencing improves performance from 15 ...

Vector Microprocessors for Desktop Computing

by Mark G. Stoodley, Corinna G. Lee, 1999
"... Desktop workloads are expected to shift over the next few years to become increasingly media-centric. These multimedia applications require much larger computational demands than current desktop processors can provide. In this paper, we describe four major requirements that we believe any effective ..."
Cited by 3 (0 self)
... applications and the SPEC95 integer benchmarks are either highly interactive or contain little ...

Exploiting hardware performance counters with flow and context sensitive profiling

by Glenn Ammons, Thomas Ball, James R. Larus - ACM SIGPLAN Notices, 1997
"... A program profile attributes run-time costs to portions of a program's execution. Most profiling systems suffer from two major deficiencies: first, they only apportion simple metrics, such as execution frequency or elapsed time, to static, syntactic units, such as procedures or statements; second, the ..."
Cited by 254 (9 self)
... efficiently captures calling contexts for procedure-level measurements. Our measurements show that the SPEC95 benchmarks execute a small number (3–28) of hot paths that account for 9–98% of their L1 data cache misses. Moreover, these hot paths are concentrated in a few routines, which have complex dynamic ...
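The entry above contrasts charging costs to static units such as procedures with capturing calling contexts. A minimal Python sketch of a calling context tree shows the difference, assuming a hypothetical program in which `main` calls `A` and `B` and both call `helper`; the class and function names are illustrative only, and the sketch omits the handling of recursion that a real profiler needs.

```python
from collections import defaultdict

class CCTNode:
    """One node per distinct call chain (calling context), not per procedure."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # callee name -> CCTNode
        self.cost = 0       # metric charged while running in this context

class CCTProfiler:
    """Hypothetical profiler: enter/exit mirror procedure calls and returns."""
    def __init__(self):
        self.root = CCTNode("<root>")
        self.current = self.root

    def enter(self, name):
        # Reuse the child node if this callee was already seen from this context.
        node = self.current.children.get(name)
        if node is None:
            node = self.current.children[name] = CCTNode(name, self.current)
        self.current = node

    def exit(self):
        self.current = self.current.parent

    def charge(self, amount):
        self.current.cost += amount

    def context_cost(self, *chain):
        node = self.root
        for name in chain:
            node = node.children[name]
        return node.cost

def flat_profile(node, totals=None):
    """Collapse the tree into per-procedure totals, as a context-insensitive profile would."""
    if totals is None:
        totals = defaultdict(int)
    if node.parent is not None:  # skip the synthetic root
        totals[node.name] += node.cost
    for child in node.children.values():
        flat_profile(child, totals)
    return totals

# Hypothetical run: main calls A and B, and both call helper.
prof = CCTProfiler()
prof.enter("main")
prof.enter("A"); prof.enter("helper"); prof.charge(10); prof.exit(); prof.exit()
prof.enter("B"); prof.enter("helper"); prof.charge(90); prof.exit(); prof.exit()
prof.exit()

print(flat_profile(prof.root)["helper"])          # 100: one number for helper
print(prof.context_cost("main", "A", "helper"))   # 10: helper's cost when called via A
print(prof.context_cost("main", "B", "helper"))   # 90: helper's cost when called via B
```

The flat profile only says that `helper` is expensive; the tree also says which caller is responsible, which is the kind of context-sensitive attribution the abstract describes.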

Control Independence in Trace Processors

by Eric Rotenberg, James E. Smith - In Proc. 32nd International Symposium on Microarchitecture, 1999
"... Branch mispredictions are a major obstacle to exploiting instruction-level parallelism, at least in part because all instructions after a mispredicted branch are squashed. However, instructions that are control independent of the branch must be fetched regardless of the branch outcome, and do not ne ..."
Cited by 26 (0 self)
... speculation support is easily leveraged to selectively re-execute incorrect-data dependent, control independent instructions. Control independence improves trace processor performance by 2% to 25%, and by 13% on average, for the SPEC95 integer benchmarks.

Dynamic Branch Decoupled Architecture

by Akhilesh Tyagi, Hon-Chi Ng, Prasant Mohapatra - 1999 IEEE International Conference on Computer Design , 1999
"... We propose an alternative approach to branch resolution based on the earlier work on decoupled memory architectures. Branch decoupling is a technique to decouple a single instruction stream program into two streams. One stream is solely dedicated to resolving branches as early as possible (both the ..."
Cited by 8 (1 self)
... speedup of 25.6% for SPEC95 integer benchmarks and 6.1% for SPEC95 FP benchmarks over a 2-level adaptive branch predictor. The average number of branch penalty cycles per instruction for DBD drops to 0.0475, compared to 0.0835 for the 2-level branch predictor. ...

Performance of Natural I/O Applications

by Stevan Vlaovic, Rich Uhlig - In Proceedings of the 2nd Annual Workshop on Workload Characterization, 1999
"... In this paper we investigate the properties of Natural I/O applications and compare them to five SPEC95 integer benchmarks. Natural I/O is a type of input into the system; unlike the standard keyboard or some digitized form, the input can be composed of a user's handwriting, speech, gestures, e ..."
Cited by 2 (0 self)

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University