This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher level categories.
359 Intelligence Without Reason - Brooks (1991)(Correct)
Computers and Thought are the two categories that together define Artificial Intelligence as a discipline. It is generally accepted that work in Artificial Intelligence over the last thirty years has ... / influence on aspects of computer architectures. In this paper we also
255 Foundations for the Study of Software Architecture - Perry, Wolf (1992)(Correct)
The purpose of this paper is to build the foundation for software architecture. We first develop an intuition for software architecture by appealing to several well-established architectural discipline... / . . Computing Hardware Architecture There are several
232 Tempest and Typhoon: User-Level Shared Memory - Reinhardt, Larus, Wood (1994)(Correct)
Future parallel computers must efficiently execute not
only hand-coded applications but also programs written in
high-level, parallel programming languages. Today's
machines limit these programs to a ... / International Symposium on Computer Architecture April . dynamic
219 The NAS Parallel Benchmarks - Bailey, Barszcz, Barton, Browning.. (1994)(Correct)
A new set of benchmarks has been developed for the performance
evaluation of highly parallel supercomputers. These benchmarks consist
of five "parallel kernel" benchmarks and three "simulated applicat... / program to a new parallel computer architecture requires a major effort
184 Weak Ordering - A New Definition - Adve (1990)(Correct)
A memory model for a shared memory, multiprocessor
commonly and often implicitly assumed by programmers
is that of sequential consistency. This model
guarantees that all memory accesses will appear to... / International Symposium on Computer Architecture Honolulu Hawaii June
184 Compiler Transformations for High-Performance Computing - Bacon (1993)(Correct)
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by t... / to be familiar with modern computer architecture and basic program
165 Multiscalar Processors - Sohi (1995)(Correct)
Multiscalar processors use a new, aggressive implementation
paradigm for extracting large quantities of instruction
level parallelism from ordinary high level language programs.
A single program is di... / International Symposium on Computer Architecture pp. - May . / programs and the hardware architecture of a multiscalar
142 List Processing in Real Time on a Serial Computer - Baker, Jr. (1978)(Correct)
ing with credit is permitted. To copy otherwise, to republish, to
post on servers, to redistribute to lists, or to use any component of this
work in other works whether directly or by incorporation vi... / classical Von Neumann serial computer architecture with real memory in this
131 Type Inclusion Constraints and Type Inference - Aiken, Wimmers (1993)(Correct)
We present a general algorithm for solving systems of inclusion constraints over type expressions.
The constraint language includes function types, constructor types, and liberal intersection and unio... / Programming Languages and Computer Architecture pages - August
130 An Evaluation of Directory Schemes for Cache Coherence - Agarwal, al. (1988)(Correct)
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less attent... / International Symposium on Computer Architecture ISCA Honolulu
129 Limits of Control Flow on Parallelism - Lam, Wilson (1992)(Correct)
This paper discusses three techniques useful in relaxing the
constraints imposed by control flow on parallelism: control
dependence analysis, executing multiple flows of control simultaneously,
and sp... / International Symposium on Computer Architecture Gold Coast Australia
121 The Turn Model for Adaptive Routing - Glass, Ni (1992)(Correct)
We present a model for designing wormhole routing algorithms that are deadlock
free, livelock free, minimal or nonminimal, and maximally adaptive. A unique feature
of this model is that it is not base... / International Symposium on Computer Architecture May . The Turn
112 A Short Cut to Deforestation - Gill, Launchbury, Jones (1993)(Correct)
Lists are often used as "glue" to connect separate parts of a program together. We propose an automatic technique for improving the efficiency of such programs, by removing many of these intermediate ... / Functional Programming and Computer Architecture London MacQueen ed.
104 A Unified Formalization of Four Shared-Memory Models - Adve (1993)(Correct)
This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering,
release consistency (with sequentially consistent special operations), the VAX memory mode... / International Symposium on Computer Architecture June The
92 Direct Bulk-Synchronous Parallel Algorithms - Gerbessiotis, Valiant (1992)(Correct)
We describe a methodology for constructing parallel algorithms that are
transportable among parallel computers having different numbers of processors,
different bandwidths of interprocessor communicat... / motivation one relating to computer architecture. A currently unresolved
91 A Fold for All Seasons - Sheard, Fegaras (1993)(Correct)
Generic control operators, such as fold, can be generated from algebraic type definitions. The class of types to which these techniques are applicable is generalized to all algebraic types definable i... / Programming Languages and Computer Architecture Cambridge
90 Maintaining Strong Cache Consistency in the World-Wide Web - Pei Cao (1998)(Correct)
As the Web continues to explode in size, caching becomes
increasingly important. With caching comes
the problem of cache consistency. Conventional wisdom
holds that strong cache consistency is too exp... / been studied extensively in computer architecture distributed shared
87 Complexity-Effective Superscalar Processors - Palacharla (1998)(Correct)
The performance trade-off between hardware complexity and clock speed in the design of superscalar microarchitectures is first investigated. Using the results of this trade-off analysis, the thesis pr... / about circuits and computer architecture with him. His advice and
85 PixelFlow: High-Speed Rendering Using Image Composition - Molnar, Eyles, Poulton (1992)(Correct)
We describe PixelFlow, an architecture for high-speed image
generation that overcomes the transformation- and frame-buffer--
access bottlenecks of conventional hardware rendering architectures.
PixelF... / th ACM-IEEE Symposium on Computer Architecture April pp. - . / I. . Computer Graphics Hardware Architecture I. . Computer
82 Supporting Systolic and Memory Communication in iWarp - Borkar (1990)(Correct)
TM general computing; whereas systolic communication is iWarp is a parallel architecture developed jointly by efficient and well suited for speed critical applications. Carnegie Mellon University and ... / International Symposium on Computer Architecture Seattle Washington May
78 Projections for Strictness Analysis - Wadler, Hughes (1987)(Correct)
Contexts have been proposed as a means of performing strictness analysis on non-flat domains. Roughly speaking, a context describes how much a sub-expression will be evaluated by the surrounding progr... / Programming Languages and Computer Architecture Portland Oregon
71 Memory Bandwidth Limitations of Future Microprocessors - Burger (1996)(Correct)
This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the exp... / International Symposium on Computer Architecture May . Reprinted by
69 Lambda Lifting: Transforming Programs to Recursive Equations - Johnsson (1985)(Correct)
Lambda lifting is a technique for transforming a functional program with local function definitions, possibly with free variables in the function definitions, into a program consisting only of global ... / Programming Languages and Computer Architecture Lecture Notes in
68 Supercomputer Performance Evaluation and the Perfect Benchmarks - Cybenko (1990)(Correct)
In the past three years, the Perfect Benchmark™ Suite has evolved
from a supercomputer performance evaluation plan, presented by Kuck
and Sameh at the 1987 International Conference on Supercomputi... / al. Benchmarking advanced architecture computers Tech. Rep. C P
68 Software Versus Hardware Shared-Memory Implementation: A Case Study - Cox (1994)(Correct)
We compare the performance of software-supported shared memory on a general-purpose network to hardware-supported shared memory on a dedicated interconnect. Up to eight processors, our results are bas... / International Symposium on Computer Architecture pages - June
67 STiNG: A CC-NUMA Computer System for the Commercial Marketplace - Lovett (1996)(Correct)
STiNG is a Cache Coherent Non-Uniform Memory Access
(CC-NUMA) Multiprocessor designed and built by Sequent Computer
Systems, Inc. It combines four processor Symmetric Multiprocessor
(SMP) nodes (cal... / International Symposium on Computer Architecture May . Page
65 Baring it all to Software: The Raw Machine - Waingold, Taylor, Sarkar, Lee, Lee.. (1997)(Correct)
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware scheduled s... / force a quest for computer architectures that exploit new / the low-level details of the hardware architecture to the compiler so that
63 Dynamic Instruction Reuse - Sodani, Sohi (1997)(Correct)
This paper introduces the concept of dynamic instruction reuse.
Empirical observations suggest that many instructions, and
groups of instructions, having the same inputs, are executed
dynamically. Suc... / International Symposium on Computer Architecture ISCA June
63 The Gamma Database Machine Project - DeWitt, Ghandeharizadeh, Schneider.. (1990)(Correct)
This paper describes the design of the Gamma database machine and the techniques employed in its implementation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube... / and can thus like the hardware architecture be scaled almost
59 Real Time Compression of Triangle Mesh Connectivity - Gumhold, Straßer(Correct)
In this paper we introduce a new compressed representation for the connectivity of a triangle mesh. We present local compression and decompression algorithms which are fast enough for real time applic... / I. . Computer Graphics Hardware Architecture I. . Computer
56 Efficient Type Inference for Higher-Order Binding-Time Analysis - Henglein (1991)(Correct)
Binding-time analysis determines when variables and expressions in a program can be
bound to their values, distinguishing between early (compile-time) and late (run-time) binding.
Binding-time informa... / Programming Languages and Computer Architecture FPCA Cambridge
54 Deadlock-Free Multicast Wormhole Routing in Multicomputer Networks - Lin (1991)(Correct)
Efficient routing of messages is the key to the performance of multicomputers. Multicast communication refers to the delivery of the same message from a source node to an arbitrary number of destinati... / International Symposium on Computer Architecture Toronto Canada May
54 Geometric Compression Through Topological Surgery - Taubin, Rossignac (1996)(Correct)
In this paper we introduce a new compressed representation for
polyhedral models and associated compression and decompression
algorithms. Such a compressed representation significantly reduces
the tim... / I. . Computer Graphics Hardware Architecture I. . Computer
53 Microkernels Meet Recursive Virtual Machines - Ford (1996)(Correct)
This paper describes a novel approach to providing modular and extensible operating system functionality and encapsulated environments based on a synthesis of microkernel and virtual machine concepts. ... / on and export existing hardware architectures so they can support
53 Design Tradeoffs for Software-Managed TLBs - Richard Uhlig (1993)(Correct)
this paper appeared in the Proceedings of the 20th Annual International Symposium on Computer
Architecture, San Diego, May 1993.
Authors' address: Department of Electrical Engineering and Computer Sci... / an increasing number of computer architectures including the AMD
53 Replacement policies for a proxy cache - Rizzo, Vicisano (1998)(Correct)
In this paper we analyse access traces to a Web proxy, looking at statistical parameters to be used
in the design of a replacement policy for documents held in the cache. In the first part of the pape... / studied examples from computer architecture. Other popular examples
51 Attribute Grammars as a Functional Programming Paradigm - Johnsson (1987)(Correct)
The purpose of this paper is twofold. Firstly we show how attributes in an
attribute grammar can be simply and efficiently evaluated using a lazy functional
language. The class of attribute grammars w... / Programming Languages and Computer Architecture Nancy France .
49 Once Upon a Type - Turner, Wadler, Mossin (1995)(Correct)
A number of useful optimisations are enabled if we can determine when a value is accessed at most once. We extend the Hindley-Milner type system with uses, yielding a type-inference based program anal... / Functional Programming and Computer Architecture San Diego California
49 Efficient Exact Arithmetic for Computational Geometry - Fortune, Van Wyk (1993)(Correct)
We experiment with exact integer arithmetic to implement
primitives for geometric algorithms. Naive use
of exact arithmetic---either modular or multiprecision
integer---increases execution time dramat... / length is defined by the computer architecture typical -bit integer
48 The Design of Nectar: A Network Backplane for Heterogeneous.. - Arnould (1989)(Correct)
Nectar is a "network backplane" for use in heterogeneous
multicomputers. The initial system consists of a star-shaped
fiber-optic network with an aggregate bandwidth
of 1.6 gigabits/second and a switch... / The Nectar network computer architecture project attacks the
46 A Standard ML Compiler - Appel, MacQueen (1987)(Correct)
Standard ML is a major revision of earlier dialects of the functional language
ML. We describe the first compiler written for Standard ML in Standard ML.
The compiler incorporates a number of novel fe... / Programming Languages and Computer Architecture LNCS Vol ed. J. P.
46 Cache Miss Equations: An Analytical Representation of Cache Misses - Ghosh (1997)(Correct)
With the widening performance gap between processors and main memory, efficient memory referencing behavior is necessary for good program performance. Both hand-tuning and compiler optimization techni... / between Compilers and Computer Architectures Third International
45 An Argument for Simple COMA - Ashley Saulsbury (1995)(Correct)
We present design details and some initial performance
results of a novel scalable shared memory
multiprocessor architecture. This architecture features
the automatic data migration and replication ca... / International Symposium on Computer Architecture pages - June / that allow a simpler hardware architecture than COMA or CC-NUMA but
45 Transactional Memory: Architectural Support for Lock-Free Data.. - Herlihy, Moss (1993)(Correct)
A shared data structure is lock-free if its operations do not
require mutual exclusion. If one process is interrupted in
the middle of an operation, other processes will not be
prevented from operatin... / International Symposium on Computer Architecture pages - June / Sohi who propose a hardware architecture that optimistically
45 A Note on Distributed Computing - Waldo (1994)(Correct)
We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are... / object does not know the hardware architecture on which the recipient of
45 Can Logic Programming Execute as Fast as Imperative Programming?.. - Van Roy (1990)(Correct)
Bibliographic references of "Can Logic Programming Execute as Fast as Imperative Programming?", Van Roy unknown
79. P. Voda, Trilogy version 1.0, Complete Logic Systems, Inc, September 1987.
80. ... / International Symposium on Computer Architecture June . . T. P. / Nakashima and K. Nakajima Hardware Architecture of the Sequential
44 For a Better Support of Static Data Flow - Consel, Danvy(Correct)
This paper identifies and solves a class of problems that arise
in binding time analysis and more generally in partial evaluation of programs:
the approximation and loss of static information due to... / Programming Languages and Computer Architecture volume of Lecture
43 Quantifying Behavioral Differences Between C and C++ Programs - Calder (1994)(Correct)
Improving the performance of C programs has been a topic of great interest for many years. Both hardware technology and compiler optimization research has been applied in an effort to make C programs ... / The design of computer architecture is typically driven by the
41 MGS: A Multigrain Shared Memory System - Yeung (1996)(Correct)
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multipro... / International Symposium on Computer Architecture May . MGS A
41 An Object-Oriented Concurrent Reflective Language for Dynamic.. - Masuhara (1994)(Correct)
this paper proposes an object-oriented concurrent reflective language (in IPSJ SIG Notes, 94-PRG-18, pp. 57--64, 1994) ... / to the application and or hardware architecture for efficient execution.
40 Minimizing Register Requirements under Resource-Constrained.. - Govindarajan, Altman, Gao (1994)(Correct)
In this paper we address the following software pipelining problem: given a loop and a machine architecture with a fixed number of processor resources (e.g. function units), how can one construct a so... / Today rapid advances in computer architecture -hardware and software / advances in computer architecture -hardware and software technology
40 DOCTOR: An IntegrateD SOftware Fault InjeCTiOn EnviRonment - Han, Rosenberg, Shin (1995)(Correct)
This paper presents an integrateD sOftware fault injeCTiOn enviRonment (DOCTOR) which is capable of injecting various types of faults with different options, automatically collecting performance ... / complexity of contemporary computer architectures and the high-degree of / on the underlying hardware architecture and operating system a
39 Dynamic Dependency Analysis of Ordinary Programs - Austin (1992)(Correct)
A quantitative analysis of program execution is essential
to the computer architecture design process. With the current
trend in architecture of enhancing the performance of
uniprocessors by exploitin... / is essential to the computer architecture design process. With the
38 Embra: Fast and Flexible Machine Simulation - Witchel, Rosenblum (1996)(Correct)
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Em... / as well as for studies of computer architecture. In this capacity it has
38 A Comparison of Adaptive Wormhole Routing Algorithms - Boppana, Chalasani (1993)(Correct)
Improvement of message latency and network utilization in torus interconnection networks by increasing adaptivity in wormhole routing algorithms is studied. A recently proposed partially adaptive al... / th Annual Int'l Symp. on Computer Architecture A Comparison of
38 Speculative Versioning Cache - Gopal (1998)(Correct)
Dependences among loads and stores whose addresses
are unknown hinder the extraction of instruction level parallelism
during the execution of a sequential program. Such
ambiguous memory dependences ca... / on High-Performance Computer Architecture. control units
37 Detection and Exploitation of File Working Sets - Tait (1991)(Correct)
The work habits of most individuals yield file access patterns that are quite
pronounced and can be regarded as defining working sets of files used for particular
applications. This paper describes a ... / is an old idea in computer architecture. Prepaging has not had
37 Data Access Microarchitectures for Superscalar Processors with.. - Chen (1991)(Correct)
The performance of superscalar processors is more sensitive to the memory system delay than their single-issue predecessors. This paper examines alternative data access microarchitectures that effecti... / th Ann. Int'l Symp. Computer Architecture Toronto Canada pp. / when dealing with the hardware architecture. First addresses for the
34 An Empirical Comparison of the Kendall Square Research KSR-1 and.. - Singh, Joe, Gupta, Hennessy (1993)(Correct)
Two interesting variants of large-scale shared-address-space
parallel architectures are cache-coherent non-uniform-memory-access
machines (CC-NUMA) and cache-only
memory architectures (COMA). Both have... / International Symposium on Computer Architecture pages - May / to refer to its COMA architecture. hardware at the relatively fine
33 The J-Machine Multicomputer: An Architectural Evaluation - Noakes (1993)(Correct)
The MIT J-Machine multicomputer has been constructed to study the role of a set of primitive mechanisms in providing efficient support for parallel computing. Each J-Machine node consists of an integr... / International Symposium on Computer Architecture May . The J-Machine
33 FFTW: An Adaptive Software Architecture For The FFT - Frigo, Johnson (1998)(Correct)
FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less impor... / must know the details of a computer architecture in order to design a fast
33 Distributed Loop Computer Networks: A Survey - Bermond, Comellas, Hsu (1995)(Correct)
Distributed loop computer networks are extensions of the ring networks and are widely
used in the design and implementation of local area networks and parallel processing
architectures. We give a surv... / For general reference on computer architecture and parallel processing
33 Tradeoffs in Supporting Two Page Sizes - Talluri, Kong, Hill, Patterson (1992)(Correct)
As computer system main memories get larger and processor
cycles-per-instruction (CPIs) get smaller, the time spent
in handling translation lookaside buffer (TLB) misses
could become a performance bot... / International Symposium on Computer Architecture June . Tradeoffs
31 Extending SUIF for Machine-dependent Optimizations - Smith (1996)(Correct)
This paper describes a set of modifications and extensions to the base SUIF library that provide the abstractions necessary for machine-dependent optimizations such as global instruction scheduling. W... / of a research program in computer architecture and machine-dependent
31 Selective Value Prediction - Calder, Reinman, Tullsen (1998)(Correct)
Value Prediction is a relatively new technique to increase instruction-level parallelism by breaking true data dependence chains. A value prediction architecture produces values, which may be later co... / International Symposium on Computer Architecture May . Selective
31 Dynamic Self-Invalidation: Reducing Coherence Overhead in.. - Lebeck, Wood (1995)(Correct)
This paper introduces dynamic self-invalidation (DSI),
a new technique for reducing cache coherence overhead in
shared-memory multiprocessors. DSI eliminates invalidation
messages by having a processo... / International Symposium on Computer Architecture Reprinted by permission
30 A Tractable Scheme Implementation - Kelsey, Rees (1993)(Correct)
Scheme 48 is an implementation of the Scheme programming language constructed
with tractability and reliability as its primary design goals. It has the structural
properties of large, compiler-based... / or with other details of hardware architecture. The virtual machine
29 GRASP - A New Search Algorithm for Satisfiability - Silva, Sakallah (1996)(Correct)
This report introduces GRASP (Generic seaRch Algorithm for the Satisfiability Problem), an integrated algorithmic framework
for SAT that unifies several previously proposed search-pruning techniques a... / Karem A. Sakallah Advanced Computer Architecture Laboratory Department of
29 Multi-Protocol Active Messages on a Cluster of SMP's - Lumetta, Mainwaring, Culler (1997)(Correct)
Clusters of multiprocessors, or Clumps, promise to be
the supercomputers of the future, but obtaining high
performance on these architectures requires an understanding
of interactions between the mult... / International Symposium on Computer Architecture Gold Coast Qld. / operates. . Hardware Architecture The experimental
29 Control Flow Speculation in Multiscalar Processors - Jacobson, al. (1997)(Correct)
The Multiscalar architecture executes a single
sequential program following multiple flows of control. In
the Multiscalar hardware, a global sequencer, with help
from the compiler, takes large steps t... / on High Performance Computer Architecture February - in San
28 Architecture Validation for Processors - Ho (1995)(Correct)
Modern, high performance microprocessors are extremely
complex machines which require substantial validation effort to
ensure functional correctness prior to tapeout. Generating the
corner cases to te... / International Symposium on Computer Architecture June . Abstract
27 A Performance Study of Memory Consistency Models - Zucker, Baer (1992)(Correct)
Recent advances in technology are such that the speed of processors is increasing faster than memory
latency is decreasing. Therefore the relative cost of a cache miss is becoming more important. Howe... / International Symposium on Computer Architecture. y Supported by a
27 Implementing Regular Tree Expressions - Aiken, Murphy (1991)(Correct)
Regular tree expressions are a natural formalism for describing the
sets of tree-structured values that commonly arise in programs; thus,
they are well-suited to applications in program analysis. We d... /
26 Concurrent Object-Oriented Programming in Act 1 - Lieberman (1987)(Correct)
this paper will try to accomplish several goals (in parallel):
We will argue that the actor model is an appropriate way to think about parallel computation. Since
many actors may be actively sending o... / versions. As advances in computer architecture and changing economics
26 A Novel Approach Towards Automatic Data Distribution - Jordi Garcia (1995)(Correct)
Data distribution is one of the key aspects that a
parallelizing compiler for a distributed memory architecture should
consider, in order to get efficiency from the system. The cost of
accessing loc... / Ayguadé and Jesús Labarta Computer Architecture Department Universitat
25 Register Relocation: Flexible Contexts for Multithreading - Waldspurger, Weihl (1993)(Correct)
Multithreading is an important technique that improves
processor utilization by allowing computation
to be overlapped with the long latency operations that
commonly occur in multiprocessor systems. Th... / Patterson and J. Hennessy. Computer Architecture A Quantitative / A. For the conventional hardware architecture with fixed contexts
25 Implementing Haskell overloading - Augustsson (1993)(Correct)
Haskell overloading poses new challenges for compiler writers.
Until recently there have been no implementations of
it which have had acceptable performance; users have been
advised to avoid it by usi... / Programming Languages and Computer Architecture Copenhagen Denmark
25 Memory-System Design Considerations For Dynamically-Scheduled.. - Farkas (1997)(Correct)
Memory-System Design Considerations for Dynamically-Scheduled Microprocessors Keith Istvan Farkas Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto ... / of my work in many areas of computer architecture and I thank Dr. Chow and
24 The Energy Efficiency of IRAM Architectures - Fromm, Perissakis, Cardwell.. (1996)(Correct)
Portable systems demand energy efficiency in order to maximize
battery life. IRAM architectures, which combine DRAM and
a processor on the same chip in a DRAM process, are more energy
efficient than c... / of general interest to the computer architecture community Hence
24 The Impact of Synchronization and Granularity on Parallel Systems - Chen (1990)(Correct)
In this paper, we study the impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique. We find that even though there can be a lot... / and the ELI- . Int. Sym. Computer Architecture - June .
24 Multiple-Block Ahead Branch Predictors - Seznec, Jourdan, Sainrat, Michaud (1996)(Correct)
A basic rule in computer architecture is that a processor
cannot execute an application faster than it fetches
its instructions. This paper presents a novel cost-effective
mechanism called the two-bloc... / Abstract A basic rule in computer architecture is that a processor
24 Quickly Generating Billion-Record Synthetic Databases - Gray (1994)(Correct)
Evaluating database system performance often requires generating synthetic databases -- ones having certain statistical properties but filled with dummy information. When evaluating different databa... / First consider parallel computer architecture and the associated / Horst R. Chou T. The Hardware Architecture and Linear Expansion of
23 Counting Networks and Multi-Processor Coordination (Extended Abstract) - Aspnes, al. (1991)(Correct)
James Aspnes, Maurice Herlihy, Nir Shavit. Digital Equipment Corporation, Cambridge Research Lab, CRL 90/11, September 18, 1991.
Abstract: Many fundamental multi-processor coordination problems c... / th Symposium on Computer Architecture June . M. / has been the focus of hardware architecture design
23 Threaded Multiple Path Execution - Wallace, Calder, Tullsen (1998)(Correct)
This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simultaneous Multithreading (SMT) processor to speculatively execute multiple paths of execution. When th... / International Symposium on Computer Architecture June . Threaded
23 Memory Latency Effects in Decoupled Architectures with a Single Data.. - Lizyamma Kurian (1994)(Correct)
Decoupled computer architectures partition the
memory access and execute functions in a computer
program and achieve high performance by exploiting
the fine--grain parallelism between the two. These a... / Abstract Decoupled computer architectures partition the memory / Structured Memory Access Architecture Computer Systems Group report
23 Precise Compile-Time Performance Prediction for Superscalar-Based.. - Wang (1994)(Correct)
Optimizing compilers (particularly parallel compilers)
are constrained by their ability to predict performance
consequences of the transformations they apply. Many
factors, such as unknowns in control... / A recent trend in parallel computer architecture is to use superscalar
23 On the Cost-Effectiveness of PRAMs - Abolhassan, Keller, Paul (1991)(Correct)
We introduce a formalism which allows to treat
computer architecture as a formal optimization problem.
We apply this to the design of shared memory
parallel machines. Present computers of this type
su... / J. Paul Institute for Computer Architecture and Parallelism Computer / differences between the hardware architectures of the two classes but
22 Polytypic Pattern Matching - Jeuring (1995)(Correct)
The (exact) pattern matching problem can be informally
specified as follows: given a pattern and a text, find all
occurrences of the pattern in the text. The pattern and
the text may both be lists, or... / Programming Languages and Computer Architecture Cambridge Massachusetts
22 Designing Memory Consistency Models For Shared-Memory Multiprocessors - Adve (1993)(Correct)
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences
both the performance and the programmability of the system. The simplest and most intuitive model for... / models in particular and computer architecture in general. I am / Symposium on Computer Architecture Computer Architecture News
22 Random Access Protocols for High Speed Interprocessor Communication.. - Dowd (1991)(Correct)
This paper examines optical star coupled systems as a means of providing interprocessor communication. In particular, a MIMD (multiple instruction, multiple data) distributed memory parallel computer ... / communication parallel computer architecture computer communication.
22 The Impact of Instruction-Level Parallelism on Multiprocessor.. - Pai (1997)(Correct)
Current microprocessors exploit high levels of
instruction-level parallelism (ILP) through techniques
such as multiple issue, dynamic scheduling, and nonblocking
reads. This paper presents the first d... / on High Performance Computer Architecture February - in
21 Storageless Value Prediction Using Prior Register Values - Tullsen (1999)(Correct)
This paper presents a technique called register value
prediction (RVP) which uses a type of locality called
register-value reuse. By predicting that an instruction will
produce the value that is alrea... / International Symposium on Computer Architecture May Storageless
21 Mobile Computation - Cardelli (1996)(Correct)
s to recompile source code.
Techniques have emerged to get some of the advantages of both off-line and on-line
portability, such as just-in-time compilation and run-time linking. But the emphasis is
... / possibly within a different computer architecture. Some RPC systems also
21 Early Experience with Message-Passing on the SHRIMP Multicomputer - Felten, Alpert, Bilas, Blumrich.. (1996)(Correct)
The SHRIMP multicomputer provides virtual
memory-mapped communication (VMMC), which
supports protected, user-level message passing, allows
user programs to perform their own buffer management,
and sep... / the rd Intl. Symposium on Computer Architecture. Abstract The SHRIMP / systems is limited by the hardware architecture. Active Messages is one
21 Hierarchical Z-Buffer Visibility - Greene, Kass, Miller (1993)(Correct)
An ideal visibility algorithm should a) quickly reject most of the
hidden geometry in a model and b) exploit the spatial and perhaps
temporal coherence of the images being generated. Ray casting
with ... / I. . Computer Graphics Hardware Architecture Graphics Processors
20 The nofib Benchmark Suite of Haskell Programs - Partain (1993)(Correct)
This position paper describes the need for, make-up of, and "rules of the
game" for a benchmark suite of Haskell programs. (It does not include
results from running the suite.) Those of us working on ... / standard text on computer architecture is an admirable exposé of
20 Parametric Feature Detection - Nayar, Baker, Murase (1995)(Correct)
A large number of visual features are parametric in nature, including edges, lines,
corners, and junctions. We present a general framework for the design and implementation
of detectors for parametri... / detectors and sketch a hardware architecture for a general feature
19 Parallelizing Applications into Silicon - Babb, Rinard, Moritz, Lee, Frank.. (1999)(Correct)
The next decade of computing will be dominated by embedded
systems, information appliances and application-specific
computers. In order to build these systems, designers will
need high-level compilati... / because trends in computer architecture are moving towards more / generation h resulting hardware architecture. is a list of abstract