This directory is generated automatically, so some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database, not to be authoritative. Papers may not appear in all relevant categories; for example, papers in a sub-category may not appear in higher-level categories.
496 PVM: A Framework for Parallel Distributed Computing - Sunderam (1990)
The PVM system is a programming environment for the development and
execution of large concurrent or parallel applications that consist of many
interacting, but relatively independent, components. It ... / high-bandwidth external I O or high performance graphics thereby br environments already possess the hardware diversity required to solve such
259 Efficient Software-Based Fault Isolation - Wahbe, Lucco, Anderson, Graham (1993)
One way to provide fault isolation among cooperating software modules is to place each in its own address space. However, for tightly-coupled modules, this solution incurs prohibitive context switch o... / Unfortunately there is a high performance cost to providing fault br poses a tradeoff relative to hardware fault isolation substantially
223 High Performance Messaging on Workstations: Illinois Fast Messages.. - Pakin, Lauria, Chien (1995)
High Performance Messaging on Workstations br layers are needed to deliver the hardware performance to the application
219 The NAS Parallel Benchmarks - Bailey, Barszcz, Barton, Browning.. (1994)
A new set of benchmarks has been developed for the performance
evaluation of highly parallel supercomputers. These benchmarks consist
of five "parallel kernel" benchmarks and three "simulated applicat... / community by the year a high-performance operational computing system br not kept pace with advances in hardware software and algorithms. In
184 Weak Ordering - A New Definition - Adve (1990)
A memory model for a shared memory, multiprocessor
commonly and often implicitly assumed by programmers
is that of sequential consistency. This model
guarantees that all memory accesses will appear to... / that weak ordering facilitates high performance implementations but that br in terms of a set of rules for hardware that have to be made visible to
184 Compiler Transformations for High-Performance Computing - Bacon (1993)
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by t... / Compiler Transformations for High-Performance Computing DAVID F. BACON br organizations. Simultaneously hardware designers are able to employ
183 The Existence of Refinement Mappings - Abadi, Lamport (1988)
Refinement mappings are used to prove that a lower-level specification correctly
implements a higher-level one. We consider specifications consisting
of a state machine (which may be infinite-state) t... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to
181 The Amber System: Parallel Programming on a Network of Multiprocessors - Chase (1989)
Microprocessor-based shared-memory multiprocessors are becoming widely available and
promise to provide cost-effective high-performance computing.
This paper describes a programming system called Ambe... / to provide cost-effective high-performance computing. This paper br in which coherence is provided by hardware means for locally-executing
165 Multiscalar Processors - Sohi (1995)
Multiscalar processors use a new, aggressive implementation
paradigm for extracting large quantities of instruction
level parallelism from ordinary high level language programs.
A single program is di... / in the program. To achieve high performance however modern processors br by a combination of software and hardware. The tasks are distributed to a
162 Performance of Various Computers Using Standard Linear Equations.. - Dongarra (1995)
This report compares the performance of different computer systems in solving dense systems of linear equations. The comparison involves approximately a hundred computers, ranging from a Cray Y-MP to ... / special features. Thus many high-performance machines may not have br as new machines are added and as hardware and software systems improve.
161 The MIT Alewife Machine: A Large-Scale Distributed-Memory.. - Agarwal, Chaiken, Johnson, Kranz.. (1991)
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low dimension direct interconnection network to provide scalable communi... / processor. Introduction High-performance computer design is driven by br and concentrates on the novel hardware features of the machine including
161 The Sprite Network Operating System - Ousterhout, Cherenson, Douglis.. (1988)
Sprite is a new operating system for networked uniprocessor and multiprocessor workstations with large physical memories. It implements a set of kernel calls much like those of 4.3 BSD UNIX, with exte... / machines which provide high performance even for diskless br workstation with special hardware support for Lisp applications
148 Why Aren't Operating Systems Getting Faster As Fast As Hardware? - Ousterhout (1989)
This note evaluates several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file syste... / the design and application of high performance scientific computers. We br Getting Faster As Fast As Hardware John Ousterhout
145 The Network Architecture of the Connection Machine CM-5 - Leiserson, Abuhamdeh, Douglas.. (1994)
The Connection Machine Model CM-5 Supercomputer is a massively parallel computer system designed to offer performance in
the range of 1 teraflops (10^12 floating-point operations per second). The CM... / second The CM- obtains its high performance while offering ease of br back-door access to all system hardware to test system integrity and to
133 Totem: A Fault-Tolerant Multicast Group Communication System - Moser, Melliar-Smith, Agarwal.. (1996)
When Totem delivers multicast messages, it invokes operations in the same total
order throughout the distributed system. The result: consistency of replicated
data and simplified programming of applic... / systems use inexpensive high-performance computers and can be br networks LANs and exploits the hardware broadcasts of such networks to
131 Shoring Up Persistent Applications - Carey, DeWitt, Franklin, Hall.. (1994)
SHORE (Scalable Heterogeneous Object REpository) is a persistent object system under development at the University of Wisconsin. SHORE represents a merger of object-oriented database and file system te... / systems or on the kinds of high-performance multicomputer hardware br of high-performance multicomputer hardware needed for certain large scale
130 An Evaluation of Directory Schemes for Cache Coherence - Agarwal, al. (1988)
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less attent... / cycle time especially in a high performance machine. Attempts to reduce br cache coherency support in hardware. These snoopy cache schemes also
123 The Nexus Approach to Integrating Multithreading and Communication - Foster (1996)
Lightweight threads have an important role to play in parallel systems: they can be used to exploit shared-memory parallelism, to mask communication and I/O latencies, to implement remote memory acces... / threads and communication in high-performance distributed-memory systems. br handlers At the lower-performance higher-functionality end of the
118 The Zebra Striped Network File System - Hartman, Ousterhout (1993)
Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a ... / file system. This provides high performance for writes of small files as br Sprite file system on the same hardware. For small files the Zebra
118 The Paradyn Parallel Performance Measurement Tools - Miller, Callaghan (1995)
Paradyn is a performance measurement tool for parallel and distributed programs. Paradyn uses several novel technologies so that it scales to long running programs (hours or days) and large (thousand ... / an ARPA Graduate Fellowship in High Performance Computing. br to accept new operating system hardware and application specific
118 Zebra: A Striped Network File System - Hartman, Ousterhout (1993)
This paper presents the design of Zebra, a striped network file system. Zebra applies ideas from log-structured file system (LFS) and RAID research to network file systems, resulting in a network file... / designed to provide both high performance and high availability. This br to provide both high performance and high availability. This is
118 A Metaobject Protocol for C++ - Chiba (1995)
This paper presents a metaobject protocol (MOP)
for C++. This MOP was designed to bring the
power of meta-programming to C++ programmers.
It avoids penalties on runtime performance
by adopting a new m... / criteria of such a MOP are high performance and arbitrary br runtime. If this is not done in hardware the software will need to be
115 PVM: Parallel Virtual Machine - Geist, Beguelin, Dongarra, Jiang.. (1994)
this reporting is to be turned on (1) or turned off (0) for subsequent
calls. A value of (2) will cause the program to exit after printing the error message (not
implemented in 3.2). The default is re... / J. Petrie Jr. The High Performance Fortran Handbook by br fast pace of change in computer hardware software and algorithms often
115 NetSolve: A Network Server for Solving Computational Science Problems - Casanova (1995)
This paper presents a new system, called NetSolve, that allows users to access computational resources,
such as hardware and software, distributed across the network. The development of NetSolve
was m... / based Information Library for high performance computing Ninf project br computational resources such as hardware and software distributed across
113 High Speed Switch Scheduling for Local Area Networks - Anderson, Owicki, Saxe, Thacker (1993)
Current technology trends make it possible to build communication networks that can support high performance distributed computing. This paper describes issues in the design of a prototype switch for ... / networks that can support high performance distributed computing. This br switch architectures use the same hardware for both scheduling and data
109 Shared Memory Consistency Models: A Tutorial - Adve, Gharachorloo (1995)
Parallel systems that support the shared memory abstraction are becoming
widely accepted in many areas of computing. Writing correct and efficient
programs for such systems requires a formal specifica... / the design and application of high performance scientific computers. We br advance. Technologies both hardware and software do not all advance
104 A Unified Formalization of Four Shared-Memory Models - Adve (1993)
This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering,
release consistency (with sequentially consistent special operations), the VAX memory mode... / can be guaranteed with high performance. However each model br caches common uniprocessor hardware optimizations such as write
102 Zeus: A System for Algorithm Animation and Multi-View Editing - Brown (1992)
Algorithm animation is a form of program visualization that is concerned with
dynamic and interactive graphical displays of a program's fundamental operations.
This paper describes the Zeus algorithm ... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to
101 Fine-grain Access Control for Distributed Shared Memory - Schoinas (1994)
This paper discusses implementations of fine-grain memory
access control, which selectively restricts reads and
writes to cache-block-sized memory regions. Fine-grain
access control forms the basis of... / shared-memory machines achieve high performance by using hardware-intensive br require little or no additional hardware. These techniques permit
98 Alternative Implementations of Two-Level Adaptive Branch Prediction - Yeh, Patt (1992)
As the issue rate and depth of pipelining of high performance
Superscalar processors increase, the importance
of an excellent branch predictor becomes more vital to
delivering the potential performanc... / and depth of pipelining of high performance Superscalar processors br gathered. We compute the hardware costs of implementing each of the
95 Beowulf: A Parallel Workstation For Scientific Computation - Sterling, Becker, al. (1995)
Network-of-Workstations technology is applied
to the challenge of implementing very high performance
workstations for Earth and space science applications.
The Beowulf parallel workstation employs 16 ... / challenge of implementing very high performance workstations for Earth and br tracks the evolution of commodity hardware as well as new ports of Linux to
87 Complexity-Effective Superscalar Processors - Palacharla (1997)
The performance tradeoff between hardware complexity and
clock speed is studied. First, a generic superscalar pipeline is defined.
Then the specific areas of register renaming, instruction window
wake... / microarchitecture achieves high performance as measured by instructions br The performance tradeoff between hardware complexity and clock speed is
84 Application Performance and Flexibility on Exokernel Systems - Kaashoek, Engler, Ganger.. (1997)
The exokernel operating system architecture safely gives untrusted
software efficient control over hardware and software resources by
separating management from protection. This paper describes an
exo... / applications to achieve high performance without sacrificing the br software efficient control over hardware and software resources by
78 Software Write Detection for a Distributed Shared Memory - Zekauskas (1994)
Most software-based distributed shared memory (DSM) systems rely on the operating system's virtual memory interface to detect writes to shared data. Strategies based on virtual memory page protection ... / Software System Support for High Performance Multicomputing contract br shared memory but do not rely on hardware page protection such as Orca
74 Principles of Metareasoning - Russell, Wefald (1991)
In this paper we outline a general approach to the study of metareasoning, not in
the sense of explicating the semantics of explicitly specified meta-level control policies,
but in the sense of provid... / in applications demanding high performance and negligible response br Agents with Limited Performance Hardware project at Berkeley. We see
73 Improving IPC by Kernel Design - Liedtke (1993)
Inter-process communication (ipc) has to be fast and effective, otherwise programmers will not use remote procedure calls (RPC), multithreading and multitasking adequately. Thus ipc performance is vit... / trick to obtaining this high performance rather a synergetic br to -Kbyte messages Although hardware specific details influence both
71 Unifying Data and Control Transformations for Distributed Shared.. - Cierniak (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / to be a serious obstacle to high performance on distributed shared memory br Most shared-memory machines both hardware and software based rely on data
71 Unifying Data and Control Transformations for Distributed.. - Cierniak, Li (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / to be a serious obstacle to high performance on distributed shared memory br Most shared-memory machines both hardware and software based rely on data
70 Amoeba - A Distributed Operating System for the 1990s - Mullender, van Rossum, Tanenbaum.. (1990)
Amoeba is the distributed system developed at the Free University (VU) and
Centre for Mathematics and Computer Science (CWI), both in Amsterdam.
Throughout the project's ten-year history, a major conc... / with simplicity and high performance. Distributed systems are br systems on its class of hardware reported so far in the
68 Supercomputer Performance Evaluation and the Perfect Benchmarks - Cybenko (1990)
In the past three years, the Perfect Benchmark(TM) Suite has evolved
from a supercomputer performance evaluation plan, presented by Kuck
and Sameh at the 1987 International Conference on Supercomputi... / benchmarking to high performance workstations An br in large part to increases in hardware speed averaging an order of
68 Compiler-directed Data Prefetching in Multiprocessors with Memory.. - Edward Gornish (1990)
Memory hierarchies are used by multiprocessor systems
to reduce large memory access times. It is necessary to
automatically manage such a hierarchy, to obtain effective
memory utilization. In this pap... / networks MINs To achieve high performance in a hierarchical memory br of caches. In addition without hardware prefetching of cache lines no
67 Graphical Fisheye Views of Graphs - Sarkar, Brown (1992)
A fisheye camera lens is a very wide angle lens that magnifies nearby objects while
shrinking distant objects. It is a valuable tool for seeing both "local detail" and
"global context" simultaneously.... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to
65 A Practical System for Intermodule Code Optimization at Link-Time - Srivastava, Wall (1992)
We have developed a system called OM to explore the problem of code optimization at link-time. OM takes a collection of object modules constituting the entire program, and converts the object code int... / the design and application of high performance scientific computers. We br advance. Technologies both hardware and software do not all advance
65 Plan 9 from Bell Labs - Pike (1990)
Plan 9 is a distributed computing environment. It is assembled from separate machines acting as CPU servers, file servers, and terminals. The pieces are connected by a single file-oriented protocol an... / high-speed networks and in high-performance microprocessors. A common br adapt well to changes in computing hardware. In particular we wanted to
62 High Time-Resolution Measurement and Analysis of LAN Traffic.. - Leland, Wilson (1991)
The interconnection of local area networks is increasingly important, but little data are available on the
characteristics of the aggregate traffic that LANs will be submitting to the interconnection ... / SBC we are able to dedicate a high performance processor to servicing the br We present a high time-resolution hardware monitor for Ethernet LANs that
59 Threads and Input/Output in the Synthesis Kernel - Massalin, Pu (1995)
The Synthesis operating system kernel combines several techniques
to provide high performance, including kernel code synthesis, fine-grain
scheduling, and optimistic synchronization. Kernel code synth... / several techniques to provide high performance including kernel code br system implementations. Using hardware and software emulating a SUN
58 A Sorting Classification of Parallel Rendering - Molnar (1994)
We describe three broad classes of parallel rendering methods, based on where the sort from object-space to screen space occurs. These classes encompass most feedforward parallel software and hardware... / designers and implementers of high-performance parallel rendering systems. br feedforward parallel software and hardware rendering architectures that
58 Implementing Multiple Protection Domains in Java - Hawblitzel, Chang, Czajkowski, Hu.. (1998)
Safe language technology can be used for protection
within a single address space. This protection is
enforced by the language's type system, which ensures
that references to objects cannot be forged... / language technology to offer high performance as well as protection in a br components without relying on hardware support. In a safe language
58 An Overview of the Pablo Performance Analysis Environment - Reed, Aydt, Madhyastha, Noe.. (1992)
As massively parallel, distributed memory systems replace traditional vector supercomputers, effective application program optimization and system resource management become more than research curiosi... / based on the emerging High Performance Fortran HPF standard. br peak performance of the largest hardware configuration approaches
57 The Relative Importance of Concurrent Writers and Weak Consistency.. - Peter Keleher (1996)
This paper presents a detailed comparison of the relative
importance of allowing concurrent writers versus the
choice of the underlying consistency model. Our comparison
is based on single- and multip... / memory DSM systems achieve high performance through a combination of br to overall performance. Hardware shared memory systems typically
57 A Case for NOW (Networks of Workstations) - Anderson, Culler, Patterson, team (1994)
In this paper, we argue that because of recent technology advances, networks of workstations (NOWs) are poised to become the primary computing infrastructure for science and engineering, from low en... / micro would take over high-performance computing Brooks Today br The xFS goal is high performance highly available network file
55 Measured Capacity of an Ethernet: Myths and Reality - Boggs, Mogul, Kent (1988)
Ethernet, a 10 Mbit/sec CSMA/CD network, is one of the most successful
LAN technologies. Considerable confusion exists as to the actual capacity of
an Ethernet, especially since some theoretical studi... / the design and application of high performance scientific computers. We br advance. Technologies both hardware and software do not all advance
55 Compiler-Based Prefetching for Recursive Data Structures - Luk (1996)
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed... / memory subsystem and today's high-performance processors. While br can be controlled either by hardware or software. Hardware-based
53 Using Profile Information to Assist Classic Code Optimizations - Chang (1991)
This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new compo... / Hwu Center for Reliable and High-performance Computing University of br time performs as well as the best hardware schemes Trace scheduling
53 Automatic Creation of an Autonomous Agent: Genetic Evolution of a.. - Floreano, Mondada (1994)
The paper describes the results of the evolutionary development of a real, neural-network driven mobile robot. The evolutionary approach to the development of neural controllers for autonomous agents ... / simulations are fast. High performance serial machines and massively br links or malfunctioning of some hardware components do not strongly
52 Assigning Confidence to Conditional Branch Predictions - Jacobsen, Rotenberg, Smith (1996)
Many high performance processors predict conditional
branches and consume processor resources based
on the prediction. In some situations, resource allocation
can be better optimized if a confidence l... / Abstract Many high performance processors predict br such optimizations we consider hardware mechanisms that partition
51 Improving Release-Consistent Shared Virtual Memory using Automatic.. - Iftode (1996)
Shared virtual memory is a software technique to
provide shared memory on a network of computers
without special hardware support. Although several
relaxed consistency models and implementations are
q... / nd International Symposium on High-Performance Computer Architecture br of computers without special hardware support. Although several
50 MPI: A Message Passing Interface - Forum (1993)
This paper presents an overview of mpi, a proposed
standard message passing interface for MIMD distributed
memory concurrent computers. The design
of mpi has been a collective effort involving researc... / and organization of the High Performance Fortran Forum. Subcommittees br or in some cases provide hardware or low-level system support for
50 The Galley Parallel File System - Nieuwejaar, Kotz (1996)
As the I/O needs of parallel scientific applications increase,
file systems for multiprocessors are being designed to provide
applications with parallel access to multiple disks. Many
parallel file sy... / that is intended to deliver high performance to a variety of applications br has not been keeping pace. Hardware limitations are one reason for
49 Applications-Driven Parallel I/O - Galbreath, Gropp, Levine
We investigate the needs of some massively parallel
applications running on distributed-memory parallel
computers at Argonne National Laboratory and
identify some common parallel I/O operations. For
t... / a file be accessed either as a high-performance parallel file or as a br the algorithms software and hardware involved must be efficient and
49 Increasing the Instruction Fetch Rate via Multiple Branch Prediction.. - Yeh (1993)
High performance computer implementation today is
increasingly directed toward parallelism in the hardware.
Superscalar machines, where the hardware can
issue more than one instruction each cycle, are... / Michigan Abstract High performance computer implementation today br directed toward parallelism in the hardware. Superscalar machines where the
49 High-Performance Parallel Programming in Java: Exploiting Native.. - Getov (1998)
With most of today's fast scientific software written in Fortran and C, Java has
a lot of catching up to do. In this paper we discuss how new Java programs can
capitalize on high-performance libraries... / High-Performance Parallel Programming in Java br implementations on a range of hardware architectures. ScaLAPACK is
48 The Design of Nectar: A Network Backplane for Heterogeneous.. - Arnould (1989)
Nectar is a "network backplane" for use in heterogeneous
multicomputers. The initial system consists of a star-shaped
fiber-optic network with an aggregate bandwidth
of 1.6 gigabits/second and a switch... / and specialized high-performance machines. It is often not br for Nectar and describes its hardware and software. The presentation
48 Processor Allocation Policies for Message-Passing Parallel Computers - Mccann (1994)
When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. This... / the potential for achieving high performance scalability and br . The Hardware and Software Environment
47 Legion: The Next Logical Step Toward a Nationwide Virtual Computer - Grimshaw, Wulf, French, Weaver.. (1994)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer
comprised of a variety of geographically distributed high-performance machines and workstations. ... / of geographically distributed high-performance machines and workstations. To br These are software problems the hardware challenges are being addressed
47 A Lock-Free Multiprocessor OS Kernel - Massalin, Pu (1991)
Typical shared-memory multiprocessor OS kernels use interlocking, implemented as spinlocks
or waiting semaphores. We have implemented a complete multiprocessor OS kernel (including
threads, virtual me... / an OS kernel achieves very high performance. The remaining of this paper br A. Hardware Measurement Tools
47 Optimizing Triangle Strips for Fast Rendering - Evans, Skiena, Varshney (1996)
Almost all scientific visualization involving surfaces is currently done via triangles.
The speed at which such triangulated surfaces can be displayed is crucial to interactive
visualization and is bo... / virtual reality. The speed of high-performance rendering engines on br vertex. Special-purpose rendering hardware is needed to fully exploit the
47 A Comparative Analysis of Schemes for Correlated Branch Prediction - Young, Gloy, Smith (1995)
Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch predi... / Abstract Modern high-performance architectures require br led to the development of both hardware and software schemes that achieve
47 PPFS: A High Performance Portable Parallel File System - Huber, Jr., Elford, Reed, Chien.. (1995)
Rapid increases in processor performance over the past decade have outstripped performance improvements in input/output devices, increasing the importance of input/output performance to overall syste... / PPFS A High Performance Portable Parallel File System br on a variety of Intel Paragon XP S hardware configurations using the Intel
45 An Argument for Simple COMA - Ashley Saulsbury (1995)
We present design details and some initial performance
results of a novel scalable shared memory
multiprocessor architecture. This architecture features
the automatic data migration and replication ca... / without the accompanying hardware complexity. A software layer br DVSM systems leaving simpler hardware to maintain shared memory
45 On the Design of Chant: A Talking Threads Package - Matthew Haines (1994)
Lightweight threads are becoming increasingly useful in supporting parallelism and
asynchronous control structures in applications and language implementations. However,
lightweight thread packages tr... / support our extensions to the High Performance Fortran standard for br of a Unix process includes the hardware register kernel stack
45 Lazy Release Consistency for Distributed Shared Memory - Keleher (1995)
A software distributed shared memory (DSM) system allows shared memory parallel
programs to execute on networks of workstations. This thesis presents a new class
of protocols that has lower communicat... / a viable alternative for high-performance parallel processing. br opportunities to bring high performance and high usability to a wide
45 Can Logic Programming Execute as Fast as Imperative Programming?.. - Van Roy (1990)
Bibliographic references of "Can Logic Programming Execute as Fast as Imperative Programming?", Van Roy unknown
79. P. Voda, Trilogy version 1.0, Complete Logic Systems, Inc, September 1987.
80. ... / . . T. P. Dobry A High Performance Architecture for Prolog br . H. Nakashima and K. Nakajima Hardware Architecture of the Sequential
44 A High-Performance Microarchitecture with Hardware-Programmable.. - Razdan, Smith (1994)
This paper explores a novel way to incorporate hardware-programmable
resources into a processor microarchitecture to improve the
performance of general-purpose applications. Through a coupling
of comp... / November A High-Performance Microarchitecture with br Microarchitecture with Hardware-Programmable Functional Units
44 Scalable Performance Environments for Parallel Systems - Reed (1991)(Correct)
As parallel systems expand in size and complexity, the absence of performance tools for these parallel systems exacerbates the already difficult problems of application program and system software per... / As the class of high-performance computer systems extends br analysis levels including hardware system software and
44 Automatic Blocking of Nested Loops - Schreiber, Dongarra (1990)(Correct)
Blocked algorithms have much better properties of data locality and therefore
can be much more efficient than ordinary algorithms when a memory hierarchy is
Supported by the NAS Systems Division an... / The concomitant fact in highperformance computing especially br The memory may be managed by hardware on a demand basis cache or
44 Scalable Computing - McColl (1996)(Correct)
Scalable computing will, over the next few years, become
the normal form of computing. In this paper we present a unified framework,
based on the BSP model, which aims to serve as a foundation for
t... / universal offering high performance in a predictable way on any br The two parts of that industry hardware and software are quite
43 Real-Time Occlusion Culling for Models with Large Occluders - Coorg (1997)(Correct)
Efficiently identifying polygons that are visible from a
dynamic synthetic viewpoint is an important problem
in computer graphics. Typically, visibility determination
is performed using the z-buffer a... / Despite the availability of high performance z-buffer hardware a br of high performance z-buffer hardware a significant fraction of
43 Pipeline Gating: Speculation Control For Energy Reduction - Manne (1998)(Correct)
Branch prediction has enabled microprocessors to increase instruction
level parallelism (ILP) by allowing programs to speculatively
execute beyond control boundaries. Although speculative
execution is... / performance reduces power in high-performance microprocessors without br In particular we introduce a hardware mechanism called pipeline gating
43 Scheduling From the Perspective of the Application - Berman, Wolski (1996)(Correct)
Metacomputing is the aggregation of distributed and
high-performance resources on coordinated networks. With
careful scheduling, resource-intensive applications can be
implemented efficiently on metac... / of distributed and high-performance resources on coordinated br taking advantage of multiprocessor hardware features to execute multiple
43 ATM Internetworking - Alles (1995)(Correct)
this paper was presented at Engineering InterOp, Las Vegas, March 1995.
1.0 Introduction
ATM Internetworking
Anthony Alle... / of hardware intensive high performance ATM switches the deployment br do facilitate the development of hardware intensive high performance ATM
43 Software Support for Speculative Loads - Rogers (1992)(Correct)
This paper describes a simple hardware mechanism and
related compiler support for software-controlled speculative
loads. The compiler issues speculative load instructions
based on anticipated data re... / to hide memory latency in high-performance processors. The architectural br This paper describes a simple hardware mechanism and related compiler
43 Maximizing Parallelism and Minimizing Synchronization with Affine.. - Lim, Lam (1998)(Correct)
This paper presents an algorithm to find the optimal affine partitions that maximize the degree of parallelism and minimize the degree of synchronization in programs with arbitrary loop nestings and a... / suggests that achieving high performance on such machines is br needed to exploit a particular hardware configuration. The algorithm
42 MPI-FM: High Performance MPI on Workstation Clusters - Lauria, Chien (1997)(Correct)
Despite the emergence of high speed LANs, the communication performance available
to applications on workstation clusters still falls short of that available on MPPs.
A new generation of efficient mes... / MPI-FM High Performance MPI on Workstation Clusters br is needed to take advantage of the hardware performance and to deliver it to
42 Dynamic Memory Disambiguation Using the Memory Conflict Buffer - Gallagher (1994)(Correct)
To exploit instruction level parallelism, compilers for
VLIW and superscalar processors often employ static
code scheduling. However, the available code reordering
may be severely restricted due to am... / Hwu Center for Reliable and High-Performance Computing University of br This paper introduces a simple hardware mechanism referred to as the
41 MGS: A Multigrain Shared Memory System - Yeung (1996)(Correct)
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multipro... / communication interfaces high performance VLSI networks and br enables the collaboration of hardware and software shared memory and
41 Metasystems: An Approach Combining Parallel Processing and.. - Grimshaw (1994)(Correct)
A metasystem is a single computing resource composed of a heterogeneous group of autonomous computers
linked together by a network. The interconnection network needed to construct large metasystems
wi... / are not a serious obstacle to high performance but that load imbalance br and coercion and schedules all hardware resources across the different
41 An Object-Oriented Concurrent Reflective Language for Dynamic.. - Masuhara (1994)(Correct)
This paper proposes an object-oriented concurrent reflective language (in IPSJ SIG Notes, 94-PRG-18, pp. 57--64, 1994)
... / DRM. One example is that HPF High Performance Fortran has directives for br to the application and or hardware architecture for efficient
41 Beowulf: Harnessing the Power of Parallelism in a Pile-of-PCs - Ridge (1997)(Correct)
The rapid increase in performance
of mass market commodity microprocessors and
significant disparity in pricing between PCs and
scientific workstations has provided an opportunity
for substantial gain... / Thomas Sterling High Performance Computing Systems Group Jet br using standard commodity hardware and software components. This
41 Maximizing Parallelism and Minimizing Synchronization with Affine.. - Lim, Lam (1997)(Correct)
This paper presents the first algorithm to find the optimal
affine transform that maximizes the degree of parallelism
while minimizing the degree of synchronization in a program
with arbitrary loop ne... / parallel code. Getting high performance on a multiprocessor requires br to exploit a particular parallel hardware configuration. From these affine
41 The Design of the TAO Real-Time Object Request Broker - Schmidt, Levine, Mungee (1999)(Correct)
Many real-time application domains can benefit from flexible
and open distributed architectures, such as those defined
by the CORBA specification. CORBA is an architecture
for distributed object compu... / design of TAO which is our high-performance real-time CORBAcompliant br backplanes and shared memory. Hardware CORBA shields applications from
40 A Survey of QoS Architectures - Aurrecoechea, Campbell, Hauw (1996)(Correct)
Over the past several years there has been a considerable amount of research within the field of quality of service (QoS) support for distributed multimedia systems. To date, most of the work has been... / communication protocols for high performance in accordance with systems br protocols and the use of hardware assists for efficient protocol
40 Synchronization and Communication in the T3E Multiprocessor - Scott (1996)(Correct)
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. We discuss what we have learned from the T3D p... / programming model e.g.High Performance Fortran HPF or the br memories. Load store performance highlights the memory pipelining
40 Performance Analysis of Embedded Software Using Implicit Path.. - Li, Malik (1995)(Correct)
Embedded computer systems are characterized by the presence of a processor running application specific dedicated software. A large number of these systems must satisfy realtime constraints. This pape... / software. For example in a high-performance engine controller design br selection of the partition between hardware and software as well as
40 Run-time Adaptive Cache Hierarchy Management via Reference Analysis - Johnson, Hwu (1997)(Correct)
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and ... / Hwu Center for Reliable and High-Performance Computing University of br scheme where the hardware determines data placement based
40 Minimizing Register Requirements under Resource-Constrained.. - Govindarajan, Altman, Gao (1995)(Correct)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedule... / The rapid advances in high-performance computer architecture and br in computer architecture -hardware and software technology -
40 A Language-Based Approach To Protocol Implementation - Abbott (1993)(Correct)
CHAPTER 1: INTRODUCTION
1.1 Introduction to Network Software... / protocol layering entails a high performance cost developers are br as data programs and specialized hardware. Communicating data between
39 Color and Sound in Algorithm Animation - Brown (1991)(Correct)
Although systems for animating algorithms are becoming more powerful and easier
for programmers to use, not enough attention has been given to the techniques
that an algorithm animator needs to create... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to
38 Embra: Fast and Flexible Machine Simulation - Witchel, Rosenblum (1996)(Correct)
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Em... / used by Embra to achieve high performance focusing on the br multiprocessors. Embra models the hardware of these machines in enough
38 The Desk Area Network - Hayter (1991)(Correct)
A novel architecture for use within an end computing system is described.
This attempts to extend the concepts used in modern high speed networks
into computer system design. A multimedia workstation ... / and memory systems in high performance multiprocessor machines br gain further understanding of the hardware and software architecture of such
38 Public International Benchmarks for Parallel Computers - Hockney, Berry (1994)(Correct)
this report: David Bailey (NASA Ames Research Center),
Michael Berry (University of Tennessee), Jack Dongarra (University of Tennessee/Oak
Ridge National Laboratory), Vladimir Getov (University of So... / problems Chapter- and High Performance Fortran kernels to test the br . . Hardware Performance
38 Architectural Support for Single Address Space Operating Systems - Koldinger, Chase, Eggers (1992)(Correct)
Recent microprocessor announcements show a trend toward
wide-address computers: architectures that support
64 bits of virtual address space. Such architectures
facilitate fundamentally new operating s... / This simplifies the use of high-performance virtually indexed data br protection lookaside buffer a hardware structure that implements this
38 Processor Coupling: Integrating Compile Time and Runtime Scheduling.. - Keckler (1992)(Correct)
The technology to implement a single-chip node composed of 4 high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple... / node composed of high-performance floating-point ALUs will be br Trace system At runtime the hardware scheduling mechanism interleaves
38 A Novel Framework of Register Allocation for Software Pipelining - Ning, et al. (1993)(Correct)
... / pipelining can be applied to high-performance pipelined processor br schemes with or without special hardware support are discussed. We have
38 Speculative Versioning Cache - Gopal (1998)(Correct)
Dependences among loads and stores whose addresses
are unknown hinder the extraction of instruction level parallelism
during the execution of a sequential program. Such
ambiguous memory dependences ca... / International Symposium on High-Performance Computer Architecture. br instructions from a common set of hardware buffers e.g. reservation
37 Parallel Performance Prediction Using Lost Cycles Analysis - Crovella, LeBlanc (1994)(Correct)
Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is... / Research Assistantship in High Performance Computing administered by the br e.g.load imbalance and hardware e.g.resource contention A
37 The Microarchitecture of Superscalar Processors - Smith, Sohi (1995)(Correct)
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of ... / method for implementing high performance microprocessors. . . The br Processing Model Because hardware and software evolve it is rare
37 Horus: A Flexible Group Communications System - van Renesse, Birman, Glade, Guo.. (1996)(Correct)
The Horus system offers flexible group communication support for distributed applications.
It is extensively layered and highly reconfigurable, allowing applications to only pay
for services they use,... / novel mechanisms in support of high performance reliable group br has become popular it wraps a hardware group abstraction with a simple
37 Data Access Microarchitectures for Superscalar Processors with.. - Chen (1991)(Correct)
The performance of superscalar processors is more sensitive to the memory system delay than their single-issue predecessors. This paper examines alternative data access microarchitectures that effecti... / Hwu Center for Reliable and High-Performance Computing University of br of a separate prefetch buffer. Hardware issues concerning both
37 The Virtual Windtunnel: An Environment for the Exploration of.. - Bryson, Levit (1991)(Correct)
We describe a recently completed implementation of a virtual environment for exploring numerically-generated three-dimensional unsteady flowfields. A boom-mounted six degree of freedom head-position-s... / are all feasible using modern high performance graphics workstations and or br the implementation first the hardware and then the software. We review
36 On Multicast Wormhole Routing in Multicomputer Networks - Boppana, Chalasani, Raghavendra (1994)(Correct)
We show that deadlocks due to dependencies on consumption channels are a fundamental problem in multicast wormhole routing. This issue of deadlocks has not been addressed in many previously proposed ... / is important for achieving high performance in parallel computers. The br multicomputers with minimal hardware support. We present a simulation
36 A Hardware / Software Codesign Methodology for DSP Applications - Kalavade, Lee (1993)(Correct)
Embedded systems typically require a mix of hardware and software components. To design these
systems, tools should support simultaneous specification, synthesis, and simulation of the software
and ha... / simple yet they demand high performance and throughput. Furthermore br A Hardware Software Codesign Methodology
36 Sentinel Scheduling for VLIW and Superscalar Processors - Mahlke (1992)(Correct)
Speculative execution is an important source of parallelism
for VLIW and superscalar processors. A serious challenge
with compiler-controlled speculative execution is to accurately
detect and report a... / Hwu Center for Reliable and High-Performance Computing University of br overcome by providing sufficient hardware storage to buffer results until
35 Analyzing Stability in Wide-Area Network Performance - Balakrishnan, Seshan, Stemm, Katz (1997)(Correct)
The Internet is a very large scale, complex, dynamical system that
is hard to model and analyze. In this paper, we develop and analyze
statistical models for the observed end-to-end network performanc... / and software used at this high-performance server are available from br of the Web site's network and the hardware used at the site. During the
35 The Synergy Between Non-blocking Synchronization and Operating System .. - Greenwald, Cheriton (1996)(Correct)
Non-blocking synchronization has significant advantages over
blocking synchronization: however, it has not been used to a
significant degree in practice. We designed and implemented
a multiprocessor o... / and run-time library for high-performance reliability and modularity. br for our approach and a potential hardware implementation. Section
34 A High-performance Endsystem Architecture for Real-time CORBA - Douglas Schmidt (1997)(Correct)
Many application domains (such as avionics, telecommunications,
and multimedia) require real-time guarantees from
the underlying networks, operating systems, and middleware
components to achieve their... / A High-performance Endsystem Architecture for br ATM and Fast Ethernet Hardware such as RISC vs. CISC. The
34 Software DSM Protocols that Adapt between Single Writer and Multiple.. - Cristiana Amza (1997)(Correct)
We present two software DSM protocols that dynamically
adapt between a single writer (SW) and a
multiple writer (MW) protocol based on the application's
sharing patterns. The first protocol (WFS)
ad... / In Proceedings of the Second High Performance Computer Architecture br memory DSM on commodity hardware. Both single writer SW and
34 Falcon: On-line Monitoring and Steering of Large-Scale Parallel.. - Gu (1995)(Correct)
Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such interactive steering is to improve its performance or to affect its execution behavior. The... / Introduction The high performance of current parallel br basis. Falcon runs on several hardware platforms including the Kendall
34 Separating Data and Control Transfer in Distributed Operating Systems - Thekkath, Levy, Lazowska (1994)(Correct)
Advances in processor architecture and technology have resulted in workstations in the 100+ MIPS
range. As well, newer local-area networks such as ATM promise a ten- to hundred-fold increase in
throug... / transfer of that byte. Even in high-performance RPC systems control transfer br of distributed systems at the hardware level and that distributed
34 High-Performance Local Area Communication With Fast Sockets - Rodrigues, Anderson, Culler (1997)(Correct)
Modern switched networks such as ATM and Myrinet
enable low-latency, high-bandwidth communication.
This performance has not been realized by current
applications, because of the high processing overhe... / High-Performance Local Area Communication With br to the ability of modern network hardware however. While TCP is capable
34 PASSION: Parallel And Scalable Software for Input-Output - Choudhary, Bordawekar, Harry.. (1994)(Correct)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output
which provides software support for high performance parallel I/O. PASSION provides support
at the la... / provides software support for high performance parallel I O. PASSION br nCUBE etc. provide some kind of hardware and software support for parallel
34 Parallel Simulation Today - Nicol, Fujimoto (1994)(Correct)
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the
tutorial are discussions on new protocols, mathematical performance analysis, time parallel... / and ready availability of high-performance multiprocessors. The number br analysis time parallelism hardware support for parallel simulation
34 Branch Classification: a New Mechanism for Improving Branch Predictor .. - Chang (1994)(Correct)
There is wide agreement that one of the most important
impediments to the performance of current and future
pipelined superscalar processors is the presence of conditional
branches in the instruction ... / algorithm is important to a high-performance microprocessor. If we br hard-topredict branches or the hardware can special case the handling of
33 Message-Passing Performance of Various Computers - Dongarra, Dunigan (1995)(Correct)
This report compares the performance of different computer systems for basic message-passing. Latency and bandwidth are measured on Convex, Cray, IBM, Intel, KSR, Meiko, nCUBE, NEC, SGI, and TMC multi... / Laboratory The vendors of high-performance computing have turned to RISC br processors are interconnected by hardware and software to attack various
33 Data Transformations for Eliminating Conflict Misses - Rivera, Tseng (1998)(Correct)
Many cache misses in scientific programs are due to conflicts
caused by limited set associativity. We examine two
compile-time data-layout transformations for eliminating conflict
misses, concentratin... / speeds programs can achieve high performance only if they use caches br use caches effectively. Due to hardware constraints caches have limited
33 Filters: QoS Support Mechanisms for Multipeer Communications - Yeadon, García, Hutchison, Shepherd (1996)(Correct)
The nature of distributed multimedia applications
is such that they require multipeer communication support
mechanisms. The multimedia traffic needs to be delivered to
end-systems, networks and end-us... / full quality media playout at high-performance workstations while at the br of the expense in the specialised hardware required to implement them. As
32 Home-based SVM protocols for SMP clusters: Design and Performance - Samanta (1998)(Correct)
As small-scale shared memory multiprocessors proliferate in the market, it is very attractive to construct largescale systems by connecting smaller multiprocessors together in software using efficient... / In The nd IEEE Symposium on High-Performance Computer Architecture Feb. br advantage of the intra-node hardware cache coherence and
32 Server Operating Systems - Kaashoek, Engler, Ganger, Wallach (1996)(Correct)
We introduce server operating systems, which are sets of abstractions and runtime support for specialized, high-performance
server applications. We have designed and are implementing a prototype server... / support for specialized highperformance server applications. We have br and that can safely timeshare the hardware platform with other applications.
32 ASHs: Application-Specific Handlers for High-Performance Messaging - Wallach (1996)(Correct)
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute... / Handlers for High-Performance Messaging Deborah A. br to provide applications with hardware-level network performance. ASHs
32 High-Performance Schedulers - Berman(Correct)
Scheduling -- the assignment of work to resources within a specified
timeframe.
0.1 Introduction
The computational grid will provide a platform for a new generation of applications.
Gr... / High-Performance Schedulers Francine Berman br Both the software and hardware resources of the underlying
32 A Wireless Broadband Ad-Hoc ATM Local-Area Network - Eng, Karol, Veeraraghavan, Ayanoglu, .. (1995)(Correct)
this paper, the exact method by which the look-up table is generated is not important.
In this section we are interested in the updates to the routing tables at each PBS in the
K. Y. Eng et al. / A ... / are designed for simplicity high performance and modular implementations. br connections in the network. PBS hardware and software architectures are
32 Paging Tradeoffs in Distributed-Shared-Memory Multiprocessors - Douglas Burger (1994)(Correct)
Massively parallel processors have begun using commodity operating systems that
support demand-paged virtual memory. To evaluate the utility of virtual memory, we measured
the behavior of seven shar... / is a ubiquitous feature of high-performance workstations but has been br A DSM machine model Our target hardware system contains processing
32 Bridge: A High-Performance File System for Parallel Processors - Dibble (1988)(Correct)
Faster storage devices cannot solve the I/O bottleneck problem
for large multiprocessor systems if data passes through a file system
on a single processor. Implementing the file system as a parallel
p... / Bridge A High-Performance File System for Parallel br the art in parallel storage device hardware can deliver effectively
31 An Architecture for Optimal All-to-All Personalized Communication - Hinrichs, Kosak, O'Hallaron.. (1994)(Correct)
In all-to-all personalized communication (AAPC), every node of a parallel system sends a potentially
unique packet to every other node. AAPC is an important primitive operation for modern parallel
com... / of data parallel compilers for High Performance Fortran Hig include br utilizing all links. A simple hardware addition for synchronized
31 DPGA-Coupled Microprocessors: Commodity ICs for the Early 21st Century - André DeHon (1994)(Correct)
During the past decade the microprocessor has become
a key commodity component for building all kinds of computational
systems. During this time frame large, reconfigurable
logic arrays have exploited... / microprocessors. Today's high-performance microprocessors sport - br to specialize the processing hardware to match the application
31 Virtual Network Transport Protocols for Myrinet - Chun (1998)(Correct)
This paper describes a protocol for a general-purpose cluster communication system that supports multiprogramming with virtual networks, direct and protected network access, reliable message delivery ... / these systems achieved high-performance oftentimes on par with br processor and interconnection hardware. This section presents a brief
31 Evolving the UNIX System Interface to Support Multithreaded Programs - McJones (1987)(Correct)
Multiple threads (program counters executing in the same address space) make it easier to write programs that deal with related asynchronous activities and that execute faster on shared-memory multipr... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to
30 Should Scalable Parallel Computers Support Efficient Hardware.. - Ni (1995)(Correct)
Multicast communication is a frequently invoked
communication pattern in many parallel algorithms.
Although some parallel computer vendors have tried
to directly support multicast in hardware, most ve... / multicast. HPF High Performance Fortran In a highlevel br Computers Support Efficient Hardware Multicast Lionel M. Ni
30 GLUnix: a Global Layer Unix for a Network of Workstations - Ghormley, Petrou, Rodrigues, Vahdat, .. (1997)(Correct)
Recent improvements in network and workstation performance have made clusters an attractive architecture for diverse
workloads, including sequential and parallel interactive applications. However, alt... / the availability of commodity high-performance workstations and networks br However although viable hardware solutions are available today
30 The Peregrine High-Performance RPC System - Johnson (1993)(Correct)
This paper
identifies some of the key performance optimizations used in Peregrine, and quantitatively assesses
their benefits.
Keywords: Peregrine, remote procedure call, interprocess communication, p... / The Peregrine High-Performance RPC System David B. br to the optimum allowed by the hardware limits while still supporting
29 Reduced Overhead Logging for Rollback Recovery in Distributed Shared.. - Suri (1995)(Correct)
Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems ca... / Center for Reliable and High-Performance Computing Mountain br paging mechanism or in hardware using directory-based cache
29 Replication Using Group Communication Over a Partitioned Network - Amir (1995)(Correct)
In systems based on the client-server model, a single server may serve many clients and
the heavy load on the server may cause the response time to be adversely affected. In such
circumstances, replic... / necessarily consistent reply. High performance of the architecture is br the available non-reliable hardware multicast for efficient
29 The Bird-Meertens Formalism as a Parallel Model - Skillicorn (1993)(Correct)
The expense of developing and maintaining software is the major obstacle to the routine use of parallel computation. Architecture independent programming offers a way of avoiding the problem, but the ... / typically much lower than a high performance uniprocessor. The difficulty br surely does not lie with parallel hardware whose performance follows a
29 Multi-Protocol Active Messages on a Cluster of SMP's - Lumetta, Mainwaring, Culler (1997)(Correct)
Clusters of multiprocessors, or Clumps, promise to be
the supercomputers of the future, but obtaining high
performance on these architectures requires an understanding
of interactions between the mult... / of the future but obtaining high performance on these architectures br and analyzes the effects of the hardware and software architectures on
29 Control Flow Speculation in Multiscalar Processors - Jacobson, et al. (1997)(Correct)
The Multiscalar architecture executes a single
sequential program following multiple flows of control. In
the Multiscalar hardware, a global sequencer, with help
from the compiler, takes large steps t... / International Symposium on High Performance Computer Architecture br of control. In the Multiscalar hardware a global sequencer with help
28 Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance.. - Frens, Wise (1997)(Correct)
An elementary, machine-independent, recursive algorithm for matrix multiplication C+=A*B provides implicit blocking at every level of the memory hierarchy and tests out faster than classically optimal... / to run well on extant high-performance systems. . Outline of br the compiler's knowledge of the hardware parameters to fit the target
28 Texture Mapping as a Fundamental Drawing Primitive - Haeberli, Segal (1993)(Correct)
Texture mapping has traditionally been used to add
realism to computer graphics images. In recent years,
this technique has moved from the domain of software
rendering systems to that of high performa... / rendering systems to that of high performance graphics hardware. But br that of high performance graphics hardware. But texture mapping hardware
28 ADAPTIVE: A Dynamically Assembled Protocol Transformation.. - Schmidt, Box, Suda (1993)(Correct)
Computer communication systems must undergo significant
changes to keep pace with the increasingly demanding and
diverse multimedia applications that will run on the next generation
of high-performance... / run on the next generation of high-performance networks. To facilitate these br and process management and hardware de- tures that support
28 Fast Interrupt Priority Management in Operating System Kernels - Stodolsky (1993)(Correct)
In this paper we describe a new, low-overhead technique for manipulating processor interrupt state in an operating
system kernel. Both uniprocessor and multiprocessor operating systems protect against... / and motivate the need for a high performance mechanism. In Section we br maps well onto a diverse array of hardware from systems with a single
28 Relaxing Consistency in Recoverable Distributed Shared Memory - Janssens (1993)(Correct)
Relaxed memory consistency models tolerate increased
memory access latency in both hardware and software distributed
shared memory systems. In recoverable systems,
relaxing consistency has the added b... / Center for Reliable and High-Performance Computing Coordinated br memory access latency in both hardware and software distributed shared
28 Software Strategies for Portable Computer Energy Management - Lorch, Smith (1998)(Correct)
Limiting the energy consumption of computers, especially portables, is becoming increasingly important. Thus, new energy-saving computer components and architectures have been and continue to be devel... / features have both high performance and low power modes with br created by existing and suggested hardware innovations. Introduction
28 A More Efficient RMI for Java - Nester, Philippsen, Haumacher (1999)(Correct)
In current Java implementations, Remote Method Invocation (RMI) is too slow, especially for high performance computing. RMI is designed for wide-area and high-latency networks, it is based on a slow o... / is too slow especially for high performance computing. RMI is designed br used over non-TCP/IP networking hardware. Section discusses the
28 Architecture Validation for Processors - Ho (1995)(Correct)
Modern, high performance microprocessors are extremely
complex machines which require substantial validation effort to
ensure functional correctness prior to tapeout. Generating the
corner cases to te... / . Abstract Modern high performance microprocessors are extremely br through simulation often using hardware-assist Gat to reduce
27 Optimizing Instruction Cache Performance for Operating System.. - Josep Torrellas (1995)(Correct)
High instruction cache hit rates are key to high performance.
One known technique to improve the hit rate of caches is to
use an optimizing compiler to minimize cache interference
via an improved layo... / cache hit rates are key to high performance. One known technique to br Firstly with the help of a hardware performance monitor we
27 Exploiting In-Kernel Data Paths to Improve I/O Throughput and CPU.. - Fall (1993)(Correct)
We present the motivation, design, implementation, and performance evaluation of a UNIX kernel mechanism
capable of establishing fast in-kernel data pathways between I/O objects. A new system call, sp... / for most scientific and high performance computing platforms today br Introduction Improved computer hardware has enabled the development of
27 BSPlib - The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1997)(Correct)
This memory area is regarded as unregistered. 6. While registration is designed for "full duplex" communication, a process can do half duplex communication by, appropriately, registering an area of si... / be able to run unchanged with high performance on any general purpose br provide a clear focus for future hardware developments. For a model to
27 Stage Scheduling: A Technique to Reduce the Register Requirements of.. - Eichenberger, Davidson (1995)(Correct)
Modulo scheduling is an efficient technique for
exploiting instruction level parallelism in a variety
of loops, resulting in high performance code but
increased register requirements. We present a set... / of loops resulting in high performance code but increased register br be eliminated by using special hardware such as rotating register files
27 A High-Performance, Portable Implementation of the MPI Message.. - Gropp (1996)(Correct)
MPI (Message Passing Interface) is a specification for a standard library for message
passing that was defined by the MPI Forum, a broadly based group of parallel computer
vendors, library writers, an... / A High-Performance Portable Implementation of br being followed the current hardware and software environment for
27 Reverse If-Conversion - Warter, Mahlke, Hwu, Rau (1993)(Correct)
In this paper we present a set of isomorphic control transformations
that allow the compiler to apply local scheduling
techniques to acyclic subgraphs of the control flow graph.
Thus, the code motion ... / Hwu Center for Reliable and High-Performance Computing University of br instruction level parallelism hardware need a large pool of operations
27 BSPlib: The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1998)(Correct)
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates t... / be able to run unchanged with high performance on any general purpose br provide a clear focus for future hardware developments. For a model to
26 Beating the I/O Bottleneck: A Case for Log-Structured File Systems - Ousterhout (1988)(Correct)
CPU speeds are improving at a dramatic rate, while disk speeds are not. This technology shift suggests that many engineering and office applications may become so I/O-limited that they cannot benefit ... / these and other approaches to high-performance I/O. With luck an br file system yet and the hardware for which it is most suitable
26 A Hardware Implementation of Pure Esterel - Berry (1991)(Correct)
Esterel is a synchronous concurrent programming language dedicated to reactive systems (controllers, protocols, man-machine interfaces, etc.). Esterel has an efficient standard software implementation... / programs in hardware to match high performance constraints. For example we br Gérard Berry A Hardware Implementation of Pure Esterel