Top: Architecture. Subtopics: Clusters, Distributed Architecture, Parallel

Ordered by the number of citations

This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.

534   Active Messages: a Mechanism for Integrated Communication and.. - von Eicken, Culler, Goldstein.. (1992)
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) allow communication to overlap computation, and (3) coordinate the two without sacrificing processor... / International Symposium on Computer Architecture ACM Press May

359   Intelligence Without Reason - Brooks (1991)
Computers and Thought are the two categories that together define Artificial Intelligence as a discipline. It is generally accepted that work in Artificial Intelligence over the last thirty years has ... / influence on aspects of computer architectures. In this paper we also

255   Foundations for the Study of Software Architecture - Perry, Wolf (1992)
The purpose of this paper is to build the foundation for software architecture. We first develop an intuition for software architecture by appealing to several well-established architectural discipline... / . . Computing Hardware Architecture There are several

232   Tempest and Typhoon: User-Level Shared Memory - Reinhardt, Larus, Wood (1994)
Future parallel computers must efficiently execute not only hand-coded applications but also programs written in high-level, parallel programming languages. Today's machines limit these programs to a ... / International Symposium on Computer Architecture April . dynamic

220   Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer - Blumrich, Li, Alpert, Dubnicki.. (1994)
The network interfaces of existing multicomputers require a significant amount of software overhead to provide protection and to implement message passing protocols. This paper describes the design of... / Symposium on Computer Architecture April pp.

219   The NAS Parallel Benchmarks - Bailey, Barszcz, Barton, Browning.. (1994)
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of five "parallel kernel" benchmarks and three "simulated applicat... / program to a new parallel computer architecture requires a major effort

217   Lazy Release Consistency for Software Distributed Shared Memory - Keleher, Cox, Zwaenepoel (1992)
Relaxed memory consistency models, such as release consistency, were introduced in order to reduce the impact of remote memory access latency in both software and hardware distributed shared memory (D... / International Symposium on Computer Architecture pages - May .

184   Weak Ordering - A New Definition - Adve (1990)
A memory model for a shared memory, multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to... / International Symposium on Computer Architecture Honolulu Hawaii June

184   Compiler Transformations for High-Performance Computing - Bacon (1993)
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by t... / to be familiar with modern computer architecture and basic program

165   Multiscalar Processors - Sohi (1995)
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is di... / International Symposium on Computer Architecture pp. - May . / programs and the hardware architecture of a multiscalar

157   The MIT Alewife Machine: Architecture and Performance - Agarwal, Bianchini, Chaiken, et al. (1995)
Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a prot... / International Symposium on Computer Architecture pages - June

142   List Processing in Real Time on a Serial Computer - Baker, Jr. (1978)
Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works whether directly or by incorporation vi... / classical Von Neumann serial computer architecture with real memory in this

135   Simultaneous Multithreading: Maximizing On-Chip Parallelism - Tullsen, Eggers, Levy (1995)
This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present se... / International Symposium on Computer Architecture Santa Margherita Ligure

131   Type Inclusion Constraints and Type Inference - Aiken, Wimmers (1993)
We present a general algorithm for solving systems of inclusion constraints over type expressions. The constraint language includes function types, constructor types, and liberal intersection and unio... / Programming Languages and Computer Architecture pages - August

130   An Evaluation of Directory Schemes for Cache Coherence - Agarwal et al. (1988)
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less attent... / International Symposium on Computer Architecture ISCA Honolulu

129   Limits of Control Flow on Parallelism - Lam, Wilson (1992)
This paper discusses three techniques useful in relaxing the constraints imposed by control flow on parallelism: control dependence analysis, executing multiple flows of control simultaneously, and sp... / International Symposium on Computer Architecture Gold Coast Australia

121   The Turn Model for Adaptive Routing - Glass, Ni (1992)
We present a model for designing wormhole routing algorithms that are deadlock free, livelock free, minimal or nonminimal, and maximally adaptive. A unique feature of this model is that it is not base... / International Symposium on Computer Architecture May . The Turn

112   A Short Cut to Deforestation - Gill, Launchbury, Jones (1993)
Lists are often used as "glue" to connect separate parts of a program together. We propose an automatic technique for improving the efficiency of such programs, by removing many of these intermediate ... / Functional Programming and Computer Architecture London MacQueen ed.

110   A Comparison of Dynamic Branch Predictors that use Two Levels of.. - Yeh, Patt (1993)
Recent attention to speculative execution as a mechanism for increasing performance of single instruction streams has demanded substantially better branch prediction than what has been previously avai... / International Symposium on Computer Architecture pp. - May -

110   Parity-Based Loss Recovery for Reliable Multicast Transmission - Nonnenmacher, Biersack, Towsley (1997)
We investigate how FEC (Forward Error Correction) can be combined with ARQ (Automatic Repeat Request) to achieve scalable reliable multicast transmission. We consider the two scenarios where FEC is in... / difficult to implement. The hardware architecture for the RSE coder

109   Exploiting Choice: Instruction Fetch and Issue on an Implementable.. - Tullsen, Eggers, Emer, Levy, Lo.. (1996)
Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneo... / International Symposium on Computer Architecture Philadelphia PA May / simultaneous multithreading hardware architecture. A Simultaneous

104   A Unified Formalization of Four Shared-Memory Models - Adve (1993)
This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering, release consistency (with sequentially consistent special operations), the VAX memory mode... / International Symposium on Computer Architecture June The

102   The Interaction of Architecture and Operating System Design - Anderson, Levy, Bershad, Lazowska (1991)
Today's high-performance RISC microprocessors have been highly tuned for integer and floating point application performance. These architectures have paid less attention to operating system requiremen... / recent directions in computer architecture and operating systems

98   Alternative Implementations of Two-Level Adaptive Branch Prediction - Yeh, Patt (1992)
As the issue rate and depth of pipelining of high performance Superscalar processors increase, the importance of an excellent branch predictor becomes more vital to delivering the potential performanc... / International Symposium on Computer Architecture pp. - May -

97   Architectural Support for Quality of Service for CORBA Objects - Zinky, Bakken, Schantz (1997)
In this paper we discuss four major problems we have observed in our developing and deploying wide-area distributed object applications and middleware. First, most programs are developed ignoring the var... / operating system or computer architecture on which they will

95   The Glasgow Haskell compiler: a technical overview - Jones, Hall, Hammond, Partain, Wadler (1992)
We give an overview of the Glasgow Haskell compiler, focusing especially on the way in which we have been able to exploit the rich theory of functional languages to give very practical improvements in the... / The RISC revolution in computer architecture was based partly on the

92   Direct Bulk-Synchronous Parallel Algorithms - Gerbessiotis, Valiant (1992)
We describe a methodology for constructing parallel algorithms that are transportable among parallel computers having different numbers of processors, different bandwidths of interprocessor communicat... / motivation one relating to computer architecture. A currently unresolved

91   IMPACT: An Architectural Framework for Multiple-Instruction-Issue.. - Chang, Mahlke, Chen, Warter, Hwu (1991)
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate efficient code for concurrent hardware. In the IMPACT project, we have developed ... / International Symposium on Computer Architecture pp. -

91   A Fold for All Seasons - Sheard, Fegaras (1993)
Generic control operators, such as fold, can be generated from algebraic type definitions. The class of types to which these techniques are applicable is generalized to all algebraic types definable i... / Programming Languages and Computer Architecture Cambridge

90   Maintaining Strong Cache Consistency in the World-Wide Web - Pei Cao (1998)
As the Web continues to explode in size, caching becomes increasingly important. With caching comes the problem of cache consistency. Conventional wisdom holds that strong cache consistency is too exp... / been studied extensively in computer architecture distributed shared

88   The DASH Prototype: Logic Overhead and Performance - Lenoski, Laudon, Joe, Nakahira.. (1993)
The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. While paper studies and software simulators ... / International Symposium on Computer Architecture Gold Coast Australia

87   Complexity-Effective Superscalar Processors - Palacharla (1998)
The performance trade-off between hardware complexity and clock speed in the design of superscalar microarchitectures is first investigated. Using the results of this trade-off analysis, the thesis pr... / about circuits and computer architecture with him. His advice and

85   PixelFlow: High-Speed Rendering Using Image Composition - Molnar, Eyles, Poulton (1992)
We describe PixelFlow, an architecture for high-speed image generation that overcomes the transformation- and frame-buffer-access bottlenecks of conventional hardware rendering architectures. PixelF... / th ACM-IEEE Symposium on Computer Architecture April pp. - . / I. . Computer Graphics Hardware Architecture I. . Computer

82   Supporting Systolic and Memory Communication in iWarp - Borkar (1990)
iWarp is a parallel architecture developed jointly by Carnegie Mellon University and ... . ... general computing, whereas systolic communication is efficient and well suited for speed-critical applications. / International Symposium on Computer Architecture Seattle Washington May

79   The SPLASH-2 Programs: Characterization and Methodological.. - Steven Cameron (1995)
The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two ... / International Symposium on Computer Architecture pages - June

78   Projections for Strictness Analysis - Wadler, Hughes (1987)
Contexts have been proposed as a means of performing strictness analysis on non-flat domains. Roughly speaking, a context describes how much a sub-expression will be evaluated by the surrounding progr... / Programming Languages and Computer Architecture Portland Oregon

71   Memory Bandwidth Limitations of Future Microprocessors - Burger (1996)
This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the exp... / International Symposium on Computer Architecture May . Reprinted by

71   Optimization of Instruction Fetch Mechanisms for High Issue Rates - Conte, Menezes, Mills, Patel (1995)
Recent superscalar processors issue four instructions per cycle. These processors are also powered by highly-parallel superscalar cores. The potential performance can only be exploited when fed by hig... / International Symposium on Computer Architecture Santa Margherita

70   Amoeba - A Distributed Operating System for the 1990s - Mullender, van Rossum, Tanenbaum.. (1990)
Amoeba is the distributed system developed at the Free University (VU) and Centre for Mathematics and Computer Science (CWI), both in Amsterdam. Throughout the project's ten-year history, a major conc... / - . . The Amoeba Hardware Architecture The Amoeba hardware

69   Adaptive Software Cache Management for Distributed Shared Memory.. - Bennett, Carter, Zwaenepoel (1990)
An adaptive cache coherence mechanism exploits semantic information about the expected or observed access behavior of particular data objects. We contend that, in distributed shared memory systems, ad... / International Symposium on Computer Architecture pages - June

69   Lambda Lifting: Transforming Programs to Recursive Equations - Johnsson (1985)
Lambda lifting is a technique for transforming a functional program with local function definitions, possibly with free variables in the function definitions, into a program consisting only of global ... / Programming Languages and Computer Architecture Lecture Notes in

69   Maximizing Performance in a Striped Disk Array - Chen, Patterson (1990)
Improvements in disk speeds have not kept up with improvements in processor and memory speeds. One way to correct the resulting speed mismatch is to stripe data across many disks. In this paper, we ...

68   Supercomputer Performance Evaluation and the Perfect Benchmarks - Cybenko (1990)
In the past three years, the Perfect Benchmark™ Suite has evolved from a supercomputer performance evaluation plan, presented by Kuck and Sameh at the 1987 International Conference on Supercomputi... / al. Benchmarking advanced architecture computers Tech. Rep. C P

68   Software Versus Hardware Shared-Memory Implementation: A Case Study - Cox (1994)
We compare the performance of software-supported shared memory on a general-purpose network to hardware-supported shared memory on a dedicated interconnect. Up to eight processors, our results are bas... / International Symposium on Computer Architecture pages - June

68   Compiler-directed Data Prefetching in Multiprocessors with Memory.. - Edward Gornish (1990)
Memory hierarchies are used by multiprocessor systems to reduce large memory access times. It is necessary to automatically manage such a hierarchy, to obtain effective memory utilization. In this pap... / Tenth Annual Symposium on Computer Architecture . GJG Dennis

67   STiNG: A CC-NUMA Computer System for the Commercial Marketplace - Lovett (1996)
"STiNG" is a Cache Coherent Non-Uniform Memory Access (CC-NUMA) Multiprocessor designed and built by Sequent Computer Systems, Inc. It combines four-processor Symmetric Multiprocessor (SMP) nodes (cal... / International Symposium on Computer Architecture May . Page

65   Baring it all to Software: The Raw Machine - Waingold, Taylor, Sarkar, Lee, Lee.. (1997)
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware scheduled s... / force a quest for computer architectures that exploit new / the low-level details of the hardware architecture to the compiler so that

63   Dynamic Instruction Reuse - Sodani, Sohi (1997)
This paper introduces the concept of dynamic instruction reuse. Empirical observations suggest that many instructions, and groups of instructions, having the same inputs, are executed dynamically. Suc... / International Symposium on Computer Architecture ISCA June

63   The Gamma Database Machine Project - DeWitt, Ghandeharizadeh, Schneider.. (1990)
This paper describes the design of the Gamma database machine and the techniques employed in its implementation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube... / and can thus like the hardware architecture be scaled almost

59   Unboxed values as first class citizens in a non-strict functional.. - Jones, Launchbury (1991)
The code compiled from a non-strict functional program usually manipulates heap-allocated boxed numbers. Compilers for such languages often go to considerable trouble to optimise operations on boxed nu... / Programming Languages and Computer Architecture Cambridge Sept .

59   Dynamic Speculation and Synchronization of Data Dependencies - Moshovos et al. (1997)
Data dependence speculation is used in instruction-level parallel (ILP) processors to allow early execution of an instruction before a logically preceding instruction on which it may be data dependent... / International Symposium on Computer Architecture Abstract Data

59   Real Time Compression of Triangle Mesh Connectivity - Gumhold, Straßer
In this paper we introduce a new compressed representation for the connectivity of a triangle mesh. We present local compression and decompression algorithms which are fast enough for real time applic... / I. . Computer Graphics Hardware Architecture I. . Computer

56   Evaluation of Release Consistent Software Distributed Shared Memory.. - Sandhya Dwarkadas (1993)
We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protoco... / International Symposium on Computer Architecture pages - May .

56   Efficient Type Inference for Higher-Order Binding-Time Analysis - Henglein (1991)
Binding-time analysis determines when variables and expressions in a program can be bound to their values, distinguishing between early (compile-time) and late (run-time) binding. Binding-time informa... / Programming Languages and Computer Architecture FPCA Cambridge

55   Adaptive Cache Coherency for Detecting Migratory Shared Data - Cox, Fowler (1993)
Parallel programs exhibit a small number of distinct data-sharing patterns. A common data-sharing pattern, migratory access, is characterized by exclusive read and write access by one processor at a t... / International Symposium on Computer Architecture pages - May

54   Deadlock-Free Multicast Wormhole Routing in Multicomputer Networks - Lin (1991)
Efficient routing of messages is the key to the performance of multicomputers. Multicast communication refers to the delivery of the same message from a source node to an arbitrary number of destinati... / International Symposium on Computer Architecture Toronto Canada May

54   Geometric Compression Through Topological Surgery - Taubin, Rossignac (1996)
In this paper we introduce a new compressed representation for polyhedral models and associated compression and decompression algorithms. Such a compressed representation significantly reduces the tim... / I. . Computer Graphics Hardware Architecture I. . Computer

53   Microkernels Meet Recursive Virtual Machines - Ford (1996)
This paper describes a novel approach to providing modular and extensible operating system functionality and encapsulated environments based on a synthesis of microkernel and virtual machine concepts. ... / on and export existing hardware architectures so they can support

53   Design Tradeoffs for Software-Managed TLBs - Richard Uhlig (1993)
this paper appeared in the Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, May 1993. Authors' address: Department of Electrical Engineering and Computer Sci... / an increasing number of computer architectures including the AMD

53   Replacement policies for a proxy cache - Rizzo, Vicisano (1998)
In this paper we analyse access traces to a Web proxy, looking at statistical parameters to be used in the design of a replacement policy for documents held in the cache. In the first part of the pape... / studied examples from computer architecture. Other popular examples

52   Memory System Characterization of Commercial Workloads - Barroso, Gharachorloo, Bugnion (1998)
Commercial applications such as databases and Web servers constitute the largest and fastest-growing segment of the market for multiprocessor servers. Ongoing innovations in disk subsystems, along wit... / International Symposium on Computer Architecture June . Memory

51   Improving Release-Consistent Shared Virtual Memory using Automatic.. - Iftode (1996)
Shared virtual memory is a software technique to provide shared memory on a network of computers without special hardware support. Although several relaxed consistency models and implementations are q... / th Annual Symposium on Computer Architecture pages - May

51   Attribute Grammars as a Functional Programming Paradigm - Johnsson (1987)
The purpose of this paper is twofold. Firstly we show how attributes in an attribute grammar can be simply and efficiently evaluated using a lazy functional language. The class of attribute grammars w... / Programming Languages and Computer Architecture Nancy France .

50   Mariposa: a wide-area distributed database system - Stonebraker, Aoki, Litwin, Pfeffer.. (1996)
The requirements of wide-area distributed database systems differ dramatically from those of local-area network systems. In a wide-area network (WAN) configuration, individual sites usually report t... / features of a particular hardware architecture. As a result traditional

49   Efficient Support for Irregular Applications on Distributed-Memory.. - Mukherjee, Sharma, Hill, Larus.. (1995)
Irregular computation problems underlie many important scientific applications. Although these problems are computationally expensive, and so would seem appropriate for parallel machines, their irregu... / CM- . Recent research in computer architecture has led to another

49   The Detection And Elimination Of Useless Misses In Multiprocessors - Dubois, Skeppstedt, Ricciulli.. (1993)
In this paper we introduce a classification of misses in shared-memory multiprocessors based on inter processor communication. We identify the set of essential misses, i.e., the smallest set of misses... / In The Intl. Symp. On Computer Architecture The Detection And

49   Once Upon a Type - Turner, Wadler, Mossin (1995)
A number of useful optimisations are enabled if we can determine when a value is accessed at most once. We extend the Hindley-Milner type system with uses, yielding a type-inference based program anal... / Functional Programming and Computer Architecture San Diego California

49   Increasing the Instruction Fetch Rate via Multiple Branch Prediction.. - Yeh (1993)
High performance computer implementation today is increasingly directed toward parallelism in the hardware. Superscalar machines, where the hardware can issue more than one instruction each cycle, are... / Recent advances in computer architecture have focused on

49   Efficient Exact Arithmetic for Computational Geometry - Fortune, Van Wyk (1993)
We experiment with exact integer arithmetic to implement primitives for geometric algorithms. Naive use of exact arithmetic---either modular or multiprecision integer---increases execution time dramat... / length is defined by the computer architecture typical -bit integer

48   The Design of Nectar: A Network Backplane for Heterogeneous.. - Arnould (1989)
Nectar is a "network backplane" for use in heterogeneous multicomputers. The initial system consists of a star-shaped fiber-optic network with an aggregate bandwidth of 1.6 gigabits/second and a switch... / The Nectar network computer architecture project attacks the

48   Using the SimOS Machine Simulator to Study Complex Computer Systems - Rosenblum, Bugnion, Devine, Herrod (1997)
This paper identifies two challenges that machine simulators such as SimOS must overcome in order to effectively analyze large complex workloads: handling long workload execution times and collecting ... / Key Words and Phrases computer architecture computer simulation computer

47   A Comparative Analysis of Schemes for Correlated Branch Prediction - Young, Gloy, Smith (1995)
Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch predi... / International Symposium on Computer Architecture June . A

46   A Standard ML Compiler - Appel, MacQueen (1987)
Standard ML is a major revision of earlier dialects of the functional language ML. We describe the first compiler written for Standard ML in Standard ML. The compiler incorporates a number of novel fe... / Programming Languages and Computer Architecture LNCS Vol ed. J. P.

46   Cache Miss Equations: An Analytical Representation of Cache Misses - Ghosh (1997)
With the widening performance gap between processors and main memory, efficient memory referencing behavior is necessary for good program performance. Both hand-tuning and compiler optimization techni... / between Compilers and Computer Architectures Third International

45   An Argument for Simple COMA - Ashley Saulsbury (1995)
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication ca... / International Symposium on Computer Architecture pages - June / that allow a simpler hardware architecture than COMA or CC-NUMA but

45   Transactional Memory: Architectural Support for Lock-Free Data.. - Herlihy, Moss (1993)
A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operatin... / International Symposium on Computer Architecture pages - June / Sohi who propose a hardware architecture that optimistically

45   Generic Virtual Memory Management for Operating System Kernels - Abrossimov, Rozier (1989)
We discuss the rationale and design of a Generic Memory management Interface, for a family of scalable operating systems. It consists of a general interface for managing virtual memory, independently ... / of the underlying hardware architecture e.g. paged versus

45   A Note on Distributed Computing - Waldo (1994)
We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are... / object does not know the hardware architecture on which the recipient of

45   Can Logic Programming Execute as Fast as Imperative Programming?.. - Van Roy (1990)
Bibliographic references of "Can Logic Programming Execute as Fast as Imperative Programming?", Van Roy unknown 170 79. P. Voda, Trilogy version 1.0, Complete Logic Systems, Inc, September 1987. 80. ... / International Symposium on Computer Architecture June . . T. P. / Nakashima and K. Nakajima Hardware Architecture of the Sequential

45   Bananas in Space: Extending Fold and Unfold to Exponential Types - Meijer, Hutton (1995)
Fold and unfold are general purpose functionals for processing and constructing lists. By using the categorical approach of modelling recursive datatypes as fixed points of functors, these functionals...

44   For a Better Support of Static Data Flow - Consel, Danvy
This paper identifies and solves a class of problems that arise in binding time analysis and more generally in partial evaluation of programs: the approximation and loss of static information due to... / Programming Languages and Computer Architecture volume of Lecture

43   Quantifying Behavioral Differences Between C and C++ Programs - Calder (1994)
Improving the performance of C programs has been a topic of great interest for many years. Both hardware technology and compiler optimization research have been applied in an effort to make C programs ... / The design of computer architecture is typically driven by the

43   FUDGETS - A Graphical User Interface in a Lazy Functional Language - Carlsson, Hallgren (1993)
This paper describes an implementation of a small window-based graphical user interface toolkit for X Windows written in the lazy functional language LML. By using this toolkit, a Haskell or LML progra...

42   Performance Characterization of a Quad Pentium Pro SMP Using OLTP.. - Keeton, Patterson, He, Raphael, Baker (1998)
Commercial applications are an important, yet often overlooked, workload with significantly different characteristics from technical workloads. The potential impact of these differences is that comput... / such as SPEC or LINPACK in computer architecture performance studies. This / approaches. . SMP Hardware Architecture Table shows the

41   MGS: A Multigrain Shared Memory System - Yeung (1996)
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multipro... / International Symposium on Computer Architecture May . MGS A

41   An Object-Oriented Concurrent Reflective Language for Dynamic.. - Masuhara (1994)   (Correct)
this paper proposes an object-oriented concurrent reflective language (in IPSJ SIG Notes, 94-PRG-18, pp.57--64, 1994) ... / to the application and/or hardware architecture for efficient execution.

41   The Agree Predictor: A Mechanism for Reducing Negative Branch History .. - Sprangle, Chappell, Alsup, Patt (1997)   (Correct)
Deeply pipelined, superscalar processors require accurate branch prediction to achieve high performance. Two-level branch predictors have been shown to achieve high prediction accuracy. It has also be... / Yale N. Patt Advanced Computer Architecture Laboratory

41   Improving Superscalar Instruction Dispatch and Issue by Exploiting.. - Vajapeyam, Mitra (1997)   (Correct)
Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch b... / International Symposium on Computer Architecture Denver June .

40   Minimizing Register Requirements under Resource-Constrained.. - Govindarajan, Altman, Gao (1994)   (Correct)
In this paper we address the following software pipelining problem: given a loop and a machine architecture with a fixed number of processor resources (e.g. function units), how can one construct a so... / Today rapid advances in computer architecture -hardware and software ... advances in computer architecture -hardware and software technology

40   Run-time Adaptive Cache Hierarchy Management via Reference Analysis - Johnson, Hwu (1997)   (Correct)
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and ... / International Symposium on Computer Architecture June -

40   DOCTOR: An IntegrateD SOftware Fault InjeCTiOn EnviRonment - Han, Rosenberg, Shin (1995)   (Correct)
This paper presents an integrateD sOftware fault injeCTiOn enviRonment (DOCTOR), which is capable of injecting various types of faults with different options, automatically collecting performance ... / complexity of contemporary computer architectures and the high-degree of ... on the underlying hardware architecture and operating system a

39   Dynamic Dependency Analysis of Ordinary Programs - Austin (1992)   (Correct)
A quantitative analysis of program execution is essential to the computer architecture design process. With the current trend in architecture of enhancing the performance of uniprocessors by exploitin... / is essential to the computer architecture design process. With the

39   Mobile robot miniaturisation: A tool for investigation in control.. - Mondada, Franzi, Ienne (1994)   (Correct)
The interaction of an autonomous mobile robot with the real world critically depends on the robot's morphology and on its environment. Building a model of these aspects is extremely complex, making sim... / bus Figure . Khepera hardware architecture.

39   Fine-Grain Software Distributed Shared Memory on SMP Clusters - Scales, Gharachorloo (1997)   (Correct)
Commercial SMP nodes are an attractive building block for software distributed shared memory systems. The advantages of using SMP nodes include fast communication among processors within the same node... / International Symposium on Computer Architecture pages - April

38   Embra: Fast and Flexible Machine Simulation - Witchel, Rosenblum (1996)   (Correct)
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Em... / as well as for studies of computer architecture. In this capacity it has

38   A Simple Algorithm for Nearest Neighbor Search in High Dimensions - Nene, Nayar (1995)   (Correct)
The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, gro... / visual correspondence hardware architecture. i Introduction

38   Processor Coupling: Integrating Compile Time and Runtime Scheduling.. - Keckler (1992)   (Correct)
The technology to implement a single-chip node composed of 4 high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple... / International Symposium on Computer Architecture dominate. The amount of

38   A Comparison of Adaptive Wormhole Routing Algorithms - Boppana, Chalasani (1993)   (Correct)
Improvement of message latency and network utilization in torus interconnection networks by increasing adaptivity in wormhole routing algorithms is studied. A recently proposed partially adaptive al... / th Annual Int'l Symp. on Computer Architecture A Comparison of

38   Speculative Versioning Cache - Gopal (1998)   (Correct)
Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences ca... / on High-Performance Computer Architecture. control units

37   Detection and Exploitation of File Working Sets - Tait (1991)   (Correct)
The work habits of most individuals yield file access patterns that are quite pronounced and can be regarded as defining working sets of files used for particular applications. This paper describes a ... / is an old idea in computer architecture. Prepaging has not had

37   Data Access Microarchitectures for Superscalar Processors with.. - Chen (1991)   (Correct)
The performance of superscalar processors is more sensitive to the memory system delay than their single-issue predecessors. This paper examines alternative data access microarchitectures that effecti... / th Ann. Int'l Symp. Computer Architecture Toronto Canada pp. ... when dealing with the hardware architecture. First addresses for the

37   Computer-Aided Design of a Generic Robot Controller Handling.. - Simon, Espiau, Castillo, Kapellos (1995)   (Correct)
This paper describes an original system, Orccad, for the computer-aided design of robot controllers. Accessed by three different user levels (system, control and application), it proposes a coherent a... / of functional tasks on a hardware architecture fall in the gap

35   Trace-driven Memory Simulation: A Survey - Uhlig, Mudge   (Correct)
ion and Trace Collection Methods Operating System Compiler Assembler Linker Loader Emulation Microcode Circuits and Gates Hardware Software Single-stepping Code Annotation Instruction Emulation Microc... / TREVOR N. MUDGE Advanced Computer Architecture Lab ACAL Electrical

35   Warm Fusion: Deriving Build-Catas from Recursive Definitions - Launchbury, Sheard (1995)   (Correct)
Program fusion is the process whereby separate pieces of code are fused into a single piece, typically transforming a multi-pass algorithm into a single pass. Recent work has made it clear that the pr... /

34   Software DSM Protocols that Adapt between Single Writer and Multiple.. - Cristiana Amza (1997)   (Correct)
We present two software DSM protocols that dynamically adapt between a single writer (SW) and a multiple writer (MW) protocol based on the application's sharing patterns. The first protocol (WFS) ad... / the Second High Performance Computer Architecture Symposium pages -

34   Compiler-Controlled Multithreading for Lenient Parallel Languages - Schauser, Culler, von Eicken (1991)   (Correct)
Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict pa... / Programming Languages and Computer Architecture Aug. Springer

34   An Empirical Comparison of the Kendall Square Research KSR-1 and.. - Singh, Joe, Gupta, Hennessy (1993)   (Correct)
Two interesting variants of large-scale shared-address-space parallel architectures are cache-coherent non-uniform-memory-access machines (CC-NUMA) and cache-only memory architectures (COMA). Both have... / International Symposium on Computer Architecture pages - May ... to refer to its COMA architecture. hardware at the relatively fine

33   The Impact of Communication Locality on Large-Scale Multiprocessor.. - Johnson (1992)   (Correct)
As multiprocessor sizes scale and computer architects turn to interconnection networks with non-uniform communication latencies, the lure of exploiting communication locality to increase performance b... / International Symposium on Computer Architecture May . application

33   Comparison of Hardware and Software Cache Coherence Schemes - Adve, Adve, Hill, Vernon (1991)   (Correct)
We use mean value analysis models to compare representative hardware and software cache coherence schemes for a large-scale shared-memory system. Our goal is to identify the workloads for which either... / International Symposium on Computer Architecture May .

33   The J-Machine Multicomputer: An Architectural Evaluation - Noakes (1993)   (Correct)
The MIT J-Machine multicomputer has been constructed to study the role of a set of primitive mechanisms in providing efficient support for parallel computing. Each J-Machine node consists of an integr... / International Symposium on Computer Architecture May . The J-Machine

33   Petri Net Analysis Using Boolean Manipulation - Pastor, Roig, Cortadella, Badia (1994)   (Correct)
This paper presents a novel analysis approach for bounded Petri nets. The net behavior is modeled by boolean functions, thus reducing reasoning about Petri nets to boolean calculation. The state exp... / Badia Department of Computer Architecture Universitat Politècnica

33   FFTW: An Adaptive Software Architecture For The FFT - Frigo, Johnson (1998)   (Correct)
FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less impor... / must know the details of a computer architecture in order to design a fast

33   Distributed Loop Computer Networks: A Survey - Bermond, Comellas, Hsu (1995)   (Correct)
Distributed loop computer networks are extensions of the ring networks and are widely used in the design and implementation of local area networks and parallel processing architectures. We give a surv... / For general reference on computer architecture and parallel processing

33   Tradeoffs in Supporting Two Page Sizes - Talluri, Kong, Hill, Patterson (1992)   (Correct)
As computer system main memories get larger and processor cycles-per-instruction (CPIs) get smaller, the time spent in handling translation lookaside buffer (TLB) misses could become a performance bot... / International Symposium on Computer Architecture June . Tradeoffs

32   Home-based SVM protocols for SMP clusters: Design and Performance - Samanta (1998)   (Correct)
As small-scale shared memory multiprocessors proliferate in the market, it is very attractive to construct large-scale systems by connecting smaller multiprocessors together in software using efficient... / st Annual Symposium on Computer Architecture pages - Apr.

31   Extending SUIF for Machine-dependent Optimizations - Smith (1996)   (Correct)
This paper describes a set of modifications and extensions to the base SUIF library that provide the abstractions necessary for machine-dependent optimizations such as global instruction scheduling. W... / of a research program in computer architecture and machine-dependent

31   Selective Value Prediction - Calder, Reinman, Tullsen (1998)   (Correct)
Value Prediction is a relatively new technique to increase instruction-level parallelism by breaking true data dependence chains. A value prediction architecture produces values, which may be later co... / International Symposium on Computer Architecture May . Selective

31   Evolving Electronic Robot Controllers that Exploit Hardware Resources - Thompson (1995)   (Correct)
Artificial evolution can operate upon reconfigurable electronic circuits to produce efficient and powerful control systems for autonomous mobile robots. Evolving physical hardware instead of control... / many other evolvable hardware architectures both analogue and

31   Dynamic Self-Invalidation: Reducing Coherence Overhead in.. - Lebeck, Wood (1995)   (Correct)
This paper introduces dynamic self-invalidation (DSI), a new technique for reducing cache coherence overhead in shared-memory multiprocessors. DSI eliminates invalidation messages by having a processo... / International Symposium on Computer Architecture Reprinted by permission

30   A Tractable Scheme Implementation - Kelsey, Rees (1993)   (Correct)
Scheme 48 is an implementation of the Scheme programming language constructed with tractability and reliability as its primary design goals. It has the structural properties of large, compiler-based... / or with other details of hardware architecture. The virtual machine

30   PRISM-II Compiler and Architecture - Wazlowski, Agarwal, Lee, Smith, Lam, .. (1993)   (Correct)
This paper discusses the architecture and compiler for a general-purpose metamorphic computing platform called PRISM-II. PRISM-II improves the performance of many computationally-intensive tasks by a... / an application-specific computer architecture In

29   Parallel Programming using Functional Languages - Roe (1991)   (Correct)
simulation; 8.6 Debugging; 8.7 Summary... / machines. Parallel computer architecture

29   GRASP - A New Search Algorithm for Satisfiability - Silva, Sakallah (1996)   (Correct)
This report introduces GRASP (Generic seaRch Algorithm for the Satisfiability Problem), an integrated algorithmic framework for SAT that unifies several previously proposed search-pruning techniques a... / Karem A. Sakallah Advanced Computer Architecture Laboratory Department of

29   Multi-Protocol Active Messages on a Cluster of SMP's - Lumetta, Mainwaring, Culler (1997)   (Correct)
Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of interactions between the mult... / International Symposium on Computer Architecture Gold Coast Qld. ... operates. . Hardware Architecture The experimental

29   Control Flow Speculation in Multiscalar Processors - Jacobson, al. (1997)   (Correct)
The Multiscalar architecture executes a single sequential program following multiple flows of control. In the Multiscalar hardware, a global sequencer, with help from the compiler, takes large steps t... / on High Performance Computer Architecture February - in San

28   Architecture Validation for Processors - Ho (1995)   (Correct)
Modern, high performance microprocessors are extremely complex machines which require substantial validation effort to ensure functional correctness prior to tapeout. Generating the corner cases to te... / International Symposium on Computer Architecture June . Abstract

28   Polling Watchdog: Combining Polling and Interrupts for Efficient.. - Maquelin (1996)   (Correct)
Parallel systems supporting multithreading, or message passing in general, have typically used either polling or interrupts to handle incoming messages. Neither approach is ideal; either may lead to e... / International Symposium on Computer Architecture Philadelphia

28   Exact and Approximate Methods for Calculating Signal and Transition.. - Chi-Ying Tsui (1994)   (Correct)
In this paper, we consider the problem of calculating the signal and transition probabilities of the internal nodes of the combinational logic part of a finite state machine (FSM). Given the state tra... / Alvin M. Despain Advanced Computer Architecture Laboratory ACAL-TR- -

27   Optimizing Instruction Cache Performance for Operating System.. - Torrellas, Xia, Daigle (1995)   (Correct)
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layo... / on High-Performance Computer Architecture. This work was supported

27   An Analysis of Dynamic Branch Prediction Schemes on System Workloads - Gloy, Young, Chen, Smith (1996)   (Correct)
Recent studies of dynamic branch prediction schemes rely almost exclusively on user-only simulations to evaluate performance. We find that an evaluation of these schemes with user and kernel reference... / International Symposium on Computer Architecture May .

27   Stage Scheduling: A Technique to Reduce the Register Requirements of.. - Eichenberger, Davidson (1995)   (Correct)
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a set... / Edward S. Davidson Advanced Computer Architecture Laboratory EECS

27   A Performance Study of Memory Consistency Models - Zucker, Baer (1992)   (Correct)
Recent advances in technology are such that the speed of processors is increasing faster than memory latency is decreasing. Therefore the relative cost of a cache miss is becoming more important. Howe... / International Symposium on Computer Architecture. Supported by a

27   *T: A Multithreaded Massively Parallel Architecture - Nikhil, Papadopoulos, Arvind (1992)   (Correct)
What should the architecture of each node in a general purpose, massively parallel architecture (MPA) be? We frame the question in concrete terms by describing two fundamental problems that must be so... / Proc. th. Intl. Symp. on Computer Architecture May -

27   Implementing Regular Tree Expressions - Aiken, Murphy (1991)   (Correct)
Regular tree expressions are a natural formalism for describing the sets of tree-structured values that commonly arise in programs; thus, they are well-suited to applications in program analysis. We d... /

27   Optimizing the Instruction Cache Performance of the Operating System - Torrellas, Xia, Daigle (1995)   (Correct)
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to minimize cache interference by improving the layout of the basic blocks of t... / on High-Performance Computer Architecture in January of . This

26   A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches - Kessler, Hill, Wood (1994)   (Correct)
This paper compares the trace-sampling techniques of set sampling and time sampling. Using the multi-billion-reference traces of Borg et al., we apply both techniques to multi-megabyte caches, where s... / performance cold start computer architecture memory systems

26   Concurrent Object-Oriented Programming in Act 1 - Lieberman (1987)   (Correct)
this paper will try to accomplish several goals (in parallel): We will argue that the actor model is an appropriate way to think about parallel computation. Since many actors may be actively sending o... / versions. As advances in computer architecture and changing economics

26   A Novel Approach Towards Automatic Data Distribution - Jordi Garcia (1995)   (Correct)
Data distribution is one of the key aspects that a parallelizing compiler for a distributed memory architecture should consider, in order to get efficiency from the system. The cost of accessing loc... / Ayguadé and Jesús Labarta Computer Architecture Department Universitat

26   Provably Efficient Scheduling for Languages with Fine-Grained.. - Blelloch, Gibbons, Matias (1995)   (Correct)
this paper has been extended to generate the pdf-schedule for such languages by using a 2-3 tree data structure that maintains the ready set in the appropriate priority order [BGMN97]. [Hag91]... / Programming Languages and Computer Architecture Lecture Notes in Computer

26   Active Pages: A Computation Model for Intelligent Memory - Oskin, Chong, Sherwood (1998)   (Correct)
Microprocessors and memory systems suffer from a growing gap in performance. We introduce Active Pages, a computation model which addresses this gap by shifting data-intensive computations to the memo... / International Symposium on Computer Architecture Barcelona. Active

26   RAID-II: A High-Bandwidth Network File Server - Drapeau, Shirriff, Hartman, Miller.. (1994)   (Correct)
In 1989, the RAID (Redundant Arrays of Inexpensive Disks) group at U. C. Berkeley built a prototype disk array called RAID-I. The bandwidth delivered to clients by RAID-I was severely limited by the m... /

25   Register Relocation: Flexible Contexts for Multithreading - Waldspurger, Weihl (1993)   (Correct)
Multithreading is an important technique that improves processor utilization by allowing computation to be overlapped with the long latency operations that commonly occur in multiprocessor systems. Th... / Patterson and J. Hennessy. Computer Architecture A Quantitative ... A. For the conventional hardware architecture with fixed contexts

25   Implementing Haskell overloading - Augustsson (1993)   (Correct)
Haskell overloading poses new challenges for compiler writers. Until recently there have been no implementations of it which have had acceptable performance; users have been advised to avoid it by usi... / Programming Languages and Computer Architecture Copenhagen Denmark

25   A Comparison of Full and Partial Predicated Execution Support for ILP .. - Mahlke (1995)   (Correct)
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs in... / International Symposium on Computer Architecture pp. - May .

25   Coherent Network Interfaces for Fine-Grain Communication - Mukherjee, Falsafi, al. (1996)   (Correct)
Historically, processor accesses to memory-mapped device registers have been marked uncachable to ensure their visibility to the device. The ubiquity of snooping cache coherence, however, makes it pos... / International Symposium on Computer Architecture ISCA Abstract

25   Memory-System Design Considerations for Dynamically-Scheduled.. - Farkas (1997)   (Correct)
In this paper, we identify performance trends and design relationships between the following components of the data memory hierarchy in a dynamically-scheduled processor: the register file, the lockup... / International Symposium on Computer Architecture pages - May .

25   Memory-System Design Considerations For Dynamically-Scheduled.. - Farkas (1997)   (Correct)
Memory-System Design Considerations for Dynamically-Scheduled Microprocessors Keith Istvan Farkas Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto ... / of my work in many areas of computer architecture and I thank Dr. Chow and

24   The Energy Efficiency of IRAM Architectures - Fromm, Perissakis, Cardwell.. (1996)   (Correct)
Portable systems demand energy efficiency in order to maximize battery life. IRAM architectures, which combine DRAM and a processor on the same chip in a DRAM process, are more energy efficient than c... / of general interest to the computer architecture community Hence

24   The Impact of Synchronization and Granularity on Parallel Systems - Chen (1990)   (Correct)
In this paper, we study the impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique. We find that even though there can be a lot... / and the ELI- . Int. Sym. Computer Architecture - June .

24   Critical Issues Regarding the Trace Cache Fetch Mechanism - Patel, Friendly, Patt (1997)   (Correct)
In order to meet the demands of wider issue processors, fetch mechanisms will need to fetch multiple basic blocks per cycle. The trace cache supplies several basic blocks each cycle by storing logical... / Technical Report Advanced Computer Architecture Laboratory Department of

24   Multiple-Block Ahead Branch Predictors - Seznec, Jourdan, Sainrat, Michaud (1996)   (Correct)
A basic rule in computer architecture is that a processor cannot execute an application faster than it fetches its instructions. This paper presents a novel cost-effective mechanism called the two-bloc... / Abstract A basic rule in computer architecture is that a processor

24   The Memory Performance of DSS Commercial Workloads in Shared-Memory.. - Trancoso (1997)   (Correct)
Although cache-coherent shared-memory multiprocessors are often used to run commercial workloads, little work has been done to characterize how well these machines support such workloads. In particula... / Computer Architecture Department Universitat

24   Low Power Architecture Design and Compilation Techniques for.. - Su, Tsui, Despain (1994)   (Correct)
Reducing switching activity would significantly reduce power consumption of a processor chip. In this paper, we present two novel techniques, Gray code addressing and Cold scheduling, for reducing swi... / Alvin M. Despain Advanced Computer Architecture Laboratory ACAL-TR- -

24   Quickly Generating Billion-Record Synthetic Databases - Gray (1994)   (Correct)
Evaluating database system performance often requires generating synthetic databases -- ones having certain statistical properties but filled with dummy information. When evaluating different databa... / First consider parallel computer architecture and the associated ... Horst R. Chou T. The Hardware Architecture and Linear Expansion of

23   Counting Networks and Multi-Processor Coordination (Extended Abstract) - Aspnes, al. (1991)   (Correct)
James Aspnes, Maurice Herlihy, Nir Shavit, Digital Equipment Corporation Cambridge Research Lab CRL 90/11 September 18, 1991 Abstract Many fundamental multi-processor coordination problems c... / th Symposium on Computer Architecture June . M. ... has been the focus of hardware architecture design

23   A Comparison of Entry Consistency and Lazy Release Consistency.. - Sarita Adve (1996)   (Correct)
This paper compares several implementations of entry consistency (EC) and lazy release consistency (LRC), two relaxed memory models in use with software distributed shared memory (DSM) systems. We u... /

23   Threaded Multiple Path Execution - Wallace, Calder, Tullsen (1998)   (Correct)
This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simultaneous Multithreading (SMT) processor to speculatively execute multiple paths of execution. When th... / International Symposium on Computer Architecture June . Threaded

23   Informing Memory Operations: Providing Memory Performance Feedback in .. - Horowitz (1996)   (Correct)
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem succes... / International Symposium on Computer Architecture. May . cache

23   Memory Latency Effects in Decoupled Architectures with a Single Data.. - Lizyamma Kurian (1994)   (Correct)
Decoupled computer architectures partition the memory access and execute functions in a computer program and achieve high performance by exploiting the fine-grain parallelism between the two. These a... / Abstract Decoupled computer architectures partition the memory ... Structured Memory Access Architecture Computer Systems Group report

23   Precise Compile-Time Performance Prediction for Superscalar-Based.. - Wang (1994)   (Correct)
Optimizing compilers (particularly parallel compilers) are constrained by their ability to predict performance consequences of the transformations they apply. Many factors, such as unknowns in control... / A recent trend in parallel computer architecture is to use superscalar

23   Reducing TLB and Memory Overhead Using Online Superpage Promotion - Romer, Ohlrich, Karlin, Bershad (1995)   (Correct)
Modern microprocessors contain small TLBs that maintain a cache of recently used translations. A TLB's coverage is the sum of the number of bytes mapped by each entry. Applications with working sets l... / International Symposium on Computer Architecture . Reducing TLB and

23   On the Cost-Effectiveness of PRAMs - Abolhassan, Keller, Paul (1991)   (Correct)
We introduce a formalism which allows one to treat computer architecture as a formal optimization problem. We apply this to the design of shared memory parallel machines. Present computers of this type su... / J. Paul Institute for Computer Architecture and Parallelism Computer ... differences between the hardware architectures of the two classes but

23   Worst Case Execution Time Analysis for Modern Hardware Architectures - Ottosson (1997)   (Correct)
Knowing the worst case execution times (WCETs) for programs is crucial for the design and verification of real-time systems. Modern hardware architectures utilize pipelined execution and cache memory... / Time Analysis for Modern Hardware Architectures Greger Ottosson Mikael

23   Worst-Case Execution Time Analysis for Modern Hardware Architectures - Ottosson (1997)   (Correct)
We present a method for determining the worst case execution time (WCET) for programs running on systems with modern hardware architectures, e.g. pipelined processors and cache memory. The method is b... / Time Analysis for Modern Hardware Architectures Greger Ottosson Mikael

23   Efficient Simulation of Caches under Optimal Replacement with.. - Sugumar, Abraham (1993)   (Correct)
Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulati... / Santosh G. Abraham Advanced Computer Architecture Laboratory Department of

23   The Declining Effectiveness of Dynamic Caching for General-Purpose.. - Burger, Goodman, Kägi (1995)   (Correct)
The computational power of commodity general-purpose microprocessors is racing to truly amazing levels. As peak levels of performance rise, the building of memory systems that can keep pace becomes in... / times. Much research in computer architecture today is focused on

22   Polytypic Pattern Matching - Jeuring (1995)   (Correct)
The (exact) pattern matching problem can be informally specified as follows: given a pattern and a text, find all occurrences of the pattern in the text. The pattern and the text may both be lists, or... / Programming Languages and Computer Architecture Cambridge Massachusetts

22   Designing Memory Consistency Models For Shared-Memory Multiprocessors - Adve (1993)   (Correct)
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences both the performance and the programmability of the system. The simplest and most intuitive model for... / models in particular and computer architecture in general. I am ... Symposium on Computer Architecture Computer Architecture News

22   Random Access Protocols for High Speed Interprocessor Communication.. - Dowd (1991)   (Correct)
This paper examines optical star coupled systems as a means of providing interprocessor communication. In particular, a MIMD (multiple instruction, multiple data) distributed memory parallel computer ... / communication parallel computer architecture computer communication.

22   Performance Measurement and Trace Driven Simulation of Parallel CAD.. - Jiun-Ming Hsu (1990)
This paper presents the performance evaluation, workload characterization and trace driven simulation of a hypercube multicomputer running realistic workloads. Six representative parallel applications... / th Int'l Symp. on Computer Architecture May . Performance

22   The Impact of Instruction-Level Parallelism on Multiprocessor.. - Pai (1997)
Current microprocessors exploit high levels of instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and nonblocking reads. This paper presents the first d... / on High Performance Computer Architecture February - in

22   A Comprehensive Set of Tools for Solving Partial Differential.. - Bruaset, Langtangen (1996)
This chapter presents an overview of the functionality in Diffpack, which is a software environment for the numerical solution of partial differential equations. Examples on how object-oriented prog... / visualization and computer architecture. Moreover the numerical

21   Storageless Value Prediction Using Prior Register Values - Tullsen (1999)
This paper presents a technique called register value prediction (RVP) which uses a type of locality called register-value reuse. By predicting that an instruction will produce the value that is alrea... / International Symposium on Computer Architecture May Storageless

21   Mobile Computation - Cardelli (1996)
s to recompile source code. Techniques have emerged to get some of the advantages of both off-line and on-line portability, such as just-in-time compilation and run-time linking. But the emphasis is ... / possibly within a different computer architecture. Some RPC systems also

21   Early Experience with Message-Passing on the SHRIMP Multicomputer - Felten, Alpert, Bilas, Blumrich.. (1996)
The SHRIMP multicomputer provides virtual memory-mapped communication (VMMC), which supports protected, user-level message passing, allows user programs to perform their own buffer management, and sep... / the rd Intl. Symposium on Computer Architecture. Abstract The SHRIMP / systems is limited by the hardware architecture. Active Messages is one

21   Hierarchical Z-Buffer Visibility - Greene, Kass, Miller (1993)
An ideal visibility algorithm should a) quickly reject most of the hidden geometry in a model and b) exploit the spatial and perhaps temporal coherence of the images being generated. Ray casting with ... / I. . Computer Graphics Hardware Architecture Graphics Processors

20   Generating Efficient Code for Lazy Functional Languages - Smetsers, Nöcker, van Groningen.. (1991)
In this paper we will discuss how a good code generator can be built for (lazy) functional languages. Starting from Concurrent Clean, an experimental lazy functional programming language, code is gene... / Programming Languages and Computer Architecture FPCA Portland

20   The nofib Benchmark Suite of Haskell Programs - Partain (1993)
This position paper describes the need for, make-up of, and "rules of the game" for a benchmark suite of Haskell programs. (It does not include results from running the suite.) Those of us working on ... / standard text on computer architecture is an admirable exposé of

20   Integrated Predicated and Speculative Execution in the IMPACT EPIC.. - August, Connors, Mahlke, Sias.. (1998)
Explicitly Parallel Instruction Computing (EPIC) architectures require the compiler to express program instruction level parallelism directly to the hardware. EPIC techniques which enable the compiler... / important aspect of computer architecture is to provide the

20   Parametric Feature Detection - Nayar, Baker, Murase (1995)
A large number of visual features are parametric in nature, including edges, lines, corners, and junctions. We present a general framework for the design and implementation of detectors for parametri... / detectors and sketch a hardware architecture for a general feature

19   Streamlining Data Cache Access with Fast Address Calculation - Austin, Pnevmatikatos, Sohi (1995)
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the design and evaluation of a ... / International Symposium on Computer Architecture Instruction Clock

19   A Performance Comparison of Contemporary DRAM Architectures - Cuppu, Jacob, Davis, Mudge (1999)
In response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures. This paper presents a simulation-based performance study ... / International Symposium on Computer Architecture May - in Atlanta

19   Parallelizing Applications into Silicon - Babb, Rinard, Moritz, Lee, Frank.. (1999)
The next decade of computing will be dominated by embedded systems, information appliances and application-specific computers. In order to build these systems, designers will need high-level compilati... / because trends in computer architecture are moving towards more / generation h resulting hardware architecture. is a list of abstract

19   Hierarchical Scalable Photonic Architectures for High-Performance.. - Dowd (1993)
This paper introduces two hierarchical optical structures for processor interconnection and compares their performance through analytic models and discrete-event simulation. Both architectures are bas... / Key Words parallel computer architecture processor

19   Computational RAM: A Memory-SIMD Hybrid and its Application to DSP - Elliott (1992)
this paper we describe (1) the C•RAM architecture, (2) a working 8Kbit prototype, (3) a full scale C•RAM designed in a 4Mbit DRAM process, and (4) C•RAM applications... / Figure C•RAM Computer Architecture C•RAM can also be

CiteSeer - citeseer.org - Copyright © 1997-2002 NEC Research Institute