This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories; for example, papers in a sub-category may not appear in higher-level categories.
646.3 High Performance Messaging on Workstations: Illinois Fast Messages.. - Pakin, Lauria, Chien (1995)
High Performance Messaging on Workstations ... layers are needed to deliver the hardware performance to the application
594.0 PVM: A Framework for Parallel Distributed Computing - Sunderam (1990)
The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components. It ... / high-bandwidth external I/O or high performance graphics thereby ... environments already possess the hardware diversity required to solve such
540.7 The NAS Parallel Benchmarks - Bailey, Barszcz, Barton, Browning.. (1994)
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of five "parallel kernel" benchmarks and three "simulated applicat... / community by the year a high-performance operational computing system ... not kept pace with advances in hardware software and algorithms. In
534.0 Efficient Software-Based Fault Isolation - Wahbe, Lucco, Anderson, Graham (1993)
One way to provide fault isolation among cooperating software modules is to place each in its own address space. However, for tightly-coupled modules, this solution incurs prohibitive context switch o... / Unfortunately there is a high performance cost to providing fault ... poses a tradeoff relative to hardware fault isolation substantially
497.1 Complexity-Effective Superscalar Processors - Palacharla (1998)
The performance trade-off between hardware complexity and clock speed in the design of superscalar microarchitectures is first investigated. Using the results of this trade-off analysis, the thesis pr... / with the goal of achieving high performance by reducing complexity. This ... The performance trade-off between hardware complexity and clock speed in the
478.2 Multiscalar Processors - Sohi (1995)
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is di... / in the program. To achieve high performance however modern processors ... by a combination of software and hardware. The tasks are distributed to a
469.5 Performance of Various Computers Using Standard Linear Equations.. - Jack Dongarra (1995)
This report compares the performance of different computer systems in solving dense systems of linear equations. The comparison involves approximately a hundred computers, ranging from a Cray Y-MP to ... / special features. Thus many high-performance machines may not have ... as new machines are added and as hardware and software systems improve.
385.5 Totem: A Fault-Tolerant Multicast Group Communication System - Moser, Melliar-Smith, Agarwal.. (1996)
When Totem delivers multicast messages, it invokes operations in the same total order throughout the distributed system. The result: consistency of replicated data and simplified programming of applic... / systems use inexpensive high-performance computers and can be ... networks LANs and exploits the hardware broadcasts of such networks to
379.3 Compiler Transformations for High-Performance Computing - Bacon (1993)
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by t... / Compiler Transformations for High-Performance Computing DAVID F. BACON ... organizations. Simultaneously hardware designers are able to employ
372.7 The Design of the TAO Real-Time Object Request Broker - Schmidt, Levine, Mungee (1999)
Many real-time application domains can benefit from flexible and open distributed architectures, such as those defined by the CORBA specification. CORBA is an architecture for distributed object compu... / design of TAO which is our high-performance real-time CORBA-compliant ... backplanes and shared memory. Hardware CORBA shields applications from
358.0 The Network Architecture of the Connection Machine CM-5 - Leiserson, Abuhamdeh, Douglas.. (1994)
The Connection Machine Model CM-5 Supercomputer is a massively parallel computer system designed to offer performance in the range of 1 teraflops (10^12 floating-point operations per second). The CM... / second The CM-5 obtains its high performance while offering ease of ... back-door access to all system hardware to test system integrity and to
357.4 Application Performance and Flexibility on Exokernel Systems - Kaashoek, Engler, Ganger.. (1997)
The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exo... / applications to achieve high performance without sacrificing the ... software efficient control over hardware and software resources by
356.5 The Nexus Approach to Integrating Multithreading and Communication - Foster (1996)
Lightweight threads have an important role to play in parallel systems: they can be used to exploit shared-memory parallelism, to mask communication and I/O latencies, to implement remote memory acces... / threads and communication in high-performance distributed-memory systems. ... handlers At the lower-performance higher-functionality end of the
342.0 A Metaobject Protocol for C++ - Chiba (1995)
This paper presents a metaobject protocol (MOP) for C++. This MOP was designed to bring the power of meta-programming to C++ programmers. It avoids penalties on runtime performance by adopting a new m... / criteria of such a MOP are high performance and arbitrary ... runtime. If this is not done in hardware the software will need to be
342.0 The Paradyn Parallel Performance Measurement Tools - Miller, Callaghan (1995)
Paradyn is a performance measurement tool for parallel and distributed programs. Paradyn uses several novel technologies so that it scales to long running programs (hours or days) and large (thousand ... / an ARPA Graduate Fellowship in High Performance Computing. ... to accept new operating system hardware and application specific
333.3 NetSolve: A Network Server for Solving Computational Science Problems - Casanova (1995)
This paper presents a new system, called NetSolve, that allows users to access computational resources, such as hardware and software, distributed across the network. The development of NetSolve was m... / based Information Library for high performance computing Ninf project ... computational resources such as hardware and software distributed across
331.4 Implementing Multiple Protection Domains in Java - Hawblitzel, Chang, Czajkowski, Hu.. (1998)
Safe language technology can be used for protection within a single address space. This protection is enforced by the language's type system, which ensures that references to objects cannot be forged... / language technology to offer high performance as well as protection in a ... components without relying on hardware support. In a safe language
323.4 Shoring Up Persistent Applications - Carey, DeWitt, Franklin, Hall.. (1994)
SHORE (Scalable Heterogeneous Object REpository) is a persistent object system under development at the University of Wisconsin. SHORE represents a merger of object-oriented database and file system te... / systems or on the kinds of high-performance multicomputer hardware ... of high-performance multicomputer hardware needed for certain large scale
315.9 Exploiting Choice: Instruction Fetch and Issue on an Implementable.. - Tullsen, Eggers, Emer, Levy, Lo.. (1996)
Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneo... / architecture is derived from a high-performance out-of-order superscalar ... wide-issue superscalar either in hardware structures or sizes. We present
315.9 Shared Memory Consistency Models: A Tutorial - Adve, Gharachorloo (1995)
Parallel systems that support the shared memory abstraction are becoming widely accepted in many areas of computing. Writing correct and efficient programs for such systems requires a formal specifica... / the design and application of high performance scientific computers. We ... advance. Technologies both hardware and software do not all advance
283.9 PVM: Parallel Virtual Machine - Geist, Beguelin, Dongarra, Jiang.. (1994)
this reporting is to be turned on (1) or turned off (0) for subsequent calls. A value of (2) will cause the program to exit after printing the error message (not implemented in 3.2). The default is re... / J. Petrie Jr. The High Performance Fortran Handbook by ... fast pace of change in computer hardware software and algorithms often
279.9 High-Performance Parallel Programming in Java: Exploiting Native.. - Getov (1998)
With most of today's fast scientific software written in Fortran and C, Java has a lot of catching up to do. In this paper we discuss how new Java programs can capitalize on high-performance libraries... / High-Performance Parallel Programming in Java ... implementations on a range of hardware architectures. ScaLAPACK is
275.3 Beowulf: A Parallel Workstation For Scientific Computation - Sterling, Becker, et al. (1995)
Network-of-Workstations technology is applied to the challenge of implementing very high performance workstations for Earth and space science applications. The Beowulf parallel workstation employs 16 ... / challenge of implementing very high performance workstations for Earth and ... tracks the evolution of commodity hardware as well as new ports of Linux to
254.5 A More Efficient RMI for Java - Nester, Philippsen, Haumacher (1999)
In current Java implementations, Remote Method Invocation (RMI) is too slow, especially for high performance computing. RMI is designed for wide-area and high-latency networks, it is based on a slow o... / is too slow especially for high performance computing. RMI is designed ... used over non-TCP/IP networking hardware. Section discusses the
249.3 Fine-grain Access Control for Distributed Shared Memory - Schoinas (1994)
This paper discusses implementations of fine-grain memory access control, which selectively restricts reads and writes to cache-block-sized memory regions. Fine-grain access control forms the basis of... / shared-memory machines achieve high performance by using hardware-intensive ... require little or no additional hardware. These techniques permit
245.7 Pipeline Gating: Speculation Control For Energy Reduction - Manne (1998)
Branch prediction has enabled microprocessors to increase instruction level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries. Although speculative execution is... / performance reduces power in high-performance microprocessors without ... In particular we introduce a hardware mechanism called pipeline gating
245.7 Maximizing Parallelism and Minimizing Synchronization with Affine.. - Lim, Lam (1998)
This paper presents an algorithm to find the optimal affine partitions that maximize the degree of parallelism and minimize the degree of synchronization in programs with arbitrary loop nestings and a... / suggests that achieving high performance on such machines is ... needed to exploit a particular hardware configuration. The algorithm
243.2 The Zebra Striped Network File System - Hartman, Ousterhout (1993)
Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a ... / file system. This provides high performance for writes of small files as ... Sprite file system on the same hardware. For small files the Zebra
243.2 Zebra: A Striped Network File System - Hartman, Ousterhout (1993)
This paper presents the design of Zebra, a striped network file system. Zebra applies ideas from log-structured file system (LFS) and RAID research to network file systems, resulting in a network file... / designed to provide both high performance and high availability. This ... to provide both high performance and high availability. This is
236.7 The MIT Alewife Machine: A Large-Scale Distributed-Memory.. - Agarwal, Chaiken, Johnson, Kranz.. (1991)
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low dimension direct interconnection network to provide scalable communi... / processor. Introduction High-performance computer design is driven by ... and concentrates on the novel hardware features of the machine including
232.9 High Speed Switch Scheduling for Local Area Networks - Anderson, Owicki, Saxe, Thacker (1993)
Current technology trends make it possible to build communication networks that can support high performance distributed computing. This paper describes issues in the design of a prototype switch for ... / networks that can support high performance distributed computing. This ... switch architectures use the same hardware for both scheduling and data
228.9 Extensibility, Safety and Performance in the SPIN Operating System - Bershad, Savage, Pardyak, Sirer.. (1995)
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure together with a core set of extensible services th... / by the need to support high performance applications which present ... rather than runtime using either hardware or software mechanisms. Strict
220.3 Weak Ordering - A New Definition - Adve (1990)
A memory model for a shared memory, multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to... / that weak ordering facilitates high performance implementations but that ... in terms of a set of rules for hardware that have to be made visible to
218.1 The Jalapeño Dynamic Optimizing Compiler for Java - Burke, Choi, Fink, Grove, Hind.. (1999)
The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applicati... / Jalapeño JVM is to deliver high performance and scalability of Java ... Compiler to a variety of hardware platforms. Building a dynamic
217.1 Speculative Versioning Cache - Gopal (1998)
Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences ca... / International Symposium on High-Performance Computer Architecture. ... instructions from a common set of hardware buffers e.g. reservation
214.4 A Unified Formalization of Four Shared-Memory Models - Adve (1993)
This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering, release consistency (with sequentially consistent special operations), the VAX memory mode... / can be guaranteed with high performance. However each model ... caches common uniprocessor hardware optimizations such as write
205.7 Programmable Active Memories: Reconfigurable Systems Come of Age - Vuillemin, Bertin, Roncin, Shand.. (1996)
Programmable Active Memories (PAM) are a novel form of universal reconfigurable hardware co-processor. Based on Field-Programmable Gate Array (FPGA) technology, a PAM is a virtual machine, controlled ... / The proposal is a standard high-performance microprocessor enhanced by a ... form of universal reconfigurable hardware co-processor. Based on
192.5 Software Write Detection for a Distributed Shared Memory - Zekauskas (1994)
Most software-based distributed shared memory (DSM) systems rely on the operating system's virtual memory interface to detect writes to shared data. Strategies based on virtual memory page protection ... / Software System Support for High Performance Multicomputing contract ... shared memory but do not rely on hardware page protection such as Orca
188.5 Data Transformations for Eliminating Conflict Misses - Rivera, Tseng (1998)
Many cache misses in scientific programs are due to conflicts caused by limited set associativity. We examine two compile-time data-layout transformations for eliminating conflict misses, concentratin... / speeds programs can achieve high performance only if they use caches ... use caches effectively. Due to hardware constraints caches have limited
183.0 The Existence of Refinement Mappings - Abadi, Lamport (1988)
Refinement mappings are used to prove that a lower-level specification correctly implements a higher-level one. We consider specifications consisting of a state machine (which may be infinite-state) t... / work includes exploring high-performance personal computing ... Our approach to both hardware and software research is to
182.9 Real-Time Occlusion Culling for Models with Large Occluders - Coorg (1997)
Efficiently identifying polygons that are visible from a dynamic synthetic viewpoint is an important problem in computer graphics. Typically, visibility determination is performed using the z-buffer a... / Despite the availability of high performance z-buffer hardware a ... of high performance z-buffer hardware a significant fraction of
182.8 Home-based SVM protocols for SMP clusters: Design and Performance - Samanta (1998)
As small-scale shared memory multiprocessors proliferate in the market, it is very attractive to construct large-scale systems by connecting smaller multiprocessors together in software using efficient... / In The nd IEEE Symposium on High-Performance Computer Architecture Feb. ... advantage of the intra-node hardware cache coherence and
181.0 The Amber System: Parallel Programming on a Network of Multiprocessors - Chase (1989)
Microprocessor-based shared-memory multiprocessors are becoming widely available and promise to provide cost-effective high-performance computing. This paper describes a programming system called Ambe... / to provide cost-effective high-performance computing. This paper ... in which coherence is provided by hardware means for locally-executing
178.7 MPI-FM: High Performance MPI on Workstation Clusters - Lauria, Chien (1997)
Despite the emergence of high speed LANs, the communication performance available to applications on workstation clusters still falls short of that available on MPPs. A new generation of efficient mes... / MPI-FM High Performance MPI on Workstation Clusters ... is needed to take advantage of the hardware performance and to deliver it to
177.1 Virtual Network Transport Protocols for Myrinet - Chun (1998)
This paper describes a protocol for a general-purpose cluster communication system that supports multiprogramming with virtual networks, direct and protected network access, reliable message delivery ... / these systems achieved high-performance oftentimes on par with ... processor and interconnection hardware. This sections presents a brief
175.8 Zeus: A System for Algorithm Animation and Multi-View Editing - Brown (1992)
Algorithm animation is a form of program visualization that is concerned with dynamic and interactive graphical displays of a program's fundamental operations. This paper describes the Zeus algorithm ... / work includes exploring high-performance personal computing ... Our approach to both hardware and software research is to
175.3 Unifying Data and Control Transformations for Distributed.. - Cierniak, Li (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / to be a serious obstacle to high performance on distributed shared memory ... Most shared-memory machines both hardware and software based rely on data
175.3 Unifying Data and Control Transformations for Distributed Shared.. - Cierniak (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / to be a serious obstacle to high performance on distributed shared memory ... Most shared-memory machines both hardware and software based rely on data
174.4 Maximizing Parallelism and Minimizing Synchronization with Affine.. - Lim, Lam (1997)
This paper presents the first algorithm to find the optimal affine transform that maximizes the degree of parallelism while minimizing the degree of synchronization in a program with arbitrary loop ne... / parallel code. Getting high performance on a multiprocessor requires ... to exploit a particular parallel hardware configuration. From these affine
174.4 Beowulf: Harnessing the Power of Parallelism in a Pile-of-PCs - Ridge (1997)
The rapid increase in performance of mass market commodity microprocessors and significant disparity in pricing between PCs and scientific workstations has provided an opportunity for substantial gain... / Thomas Sterling High Performance Computing Systems Group Jet ... using standard commodity hardware and software components. This
171.0 Threads and Input/Output in the Synthesis Kernel - Massalin, Pu (1995)
The Synthesis operating system kernel combines several techniques to provide high performance, including kernel code synthesis, fine-grain scheduling, and optimistic synchronization. Kernel code synth... / several techniques to provide high performance including kernel code ... system implementations. Using hardware and software emulating a SUN
170.2 Run-time Adaptive Cache Hierarchy Management via Reference Analysis - Johnson, Hwu (1997)
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and ... / Hwu Center for Reliable and High-Performance Computing University of ... scheme where the hardware determines data placement based
168.9 Alternative Implementations of Two-Level Adaptive Branch Prediction - Yeh, Patt (1992)
As the issue rate and depth of pipelining of high performance Superscalar processors increase, the importance of an excellent branch predictor becomes more vital to delivering the potential performanc... / and depth of pipelining of high performance Superscalar processors ... gathered. We compute the hardware costs of implementing each of the
165.2 The Relative Importance of Concurrent Writers and Weak Consistency.. - Keleher (1996)
This paper presents a detailed comparison of the relative importance of allowing concurrent writers versus the choice of the underlying consistency model. Our comparison is based on single- and multip... / memory DSM systems achieve high performance through a combination of ... to overall performance. Hardware shared memory systems typically
161.0 The Sprite Network Operating System - Ousterhout, Cherenson, Douglis.. (1988)
Sprite is a new operating system for networked uniprocessor and multiprocessor workstations with large physical memories. It implements a set of kernel calls much like those of 4.3 BSD UNIX, with exte... / machines which provide high performance even for diskless ... workstation with special hardware support for Lisp applications
159.9 Software Strategies for Portable Computer Energy Management - Lorch, Smith (1998)
Limiting the energy consumption of computers, especially portables, is becoming increasingly important. Thus, new energy-saving computer components and architectures have been and continue to be devel... / features have both high performance and low power modes with ... created by existing and suggested hardware innovations. Introduction
159.4 Compiler-Based Prefetching for Recursive Data Structures - Luk (1996)
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed... / memory subsystem and today's high-performance processors. While ... can be controlled either by hardware or software. Hardware-based
154.2 BSPlib: The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1998)
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates t... / be able to run unchanged with high performance on any general purpose ... provide a clear focus for future hardware developments. For a model to
150.7 Assigning Confidence to Conditional Branch Predictions - Jacobsen, Rotenberg, Smith (1996)
Many high performance processors predict conditional branches and consume processor resources based on the prediction. In some situations, resource allocation can be better optimized if a confidence l... / Abstract Many high performance processors predict ... such optimizations we consider hardware mechanisms that partition
150.5 Improving IPC by Kernel Design - Liedtke (1993)
Inter-process communication (ipc) has to be fast and effective, otherwise programmers will not use remote procedure calls (RPC), multithreading and multitasking adequately. Thus ipc performance is vit... / trick to obtaining this high performance rather a synergetic ... to -Kbyte messages Although hardware specific details influence both
148.9 Analyzing Stability in Wide-Area Network Performance - Balakrishnan, Seshan, Stemm, Katz (1997)
The Internet is a very large scale, complex, dynamical system that is hard to model and analyze. In this paper, we develop and analyze statistical models for the observed end-to-end network performanc... / and software used at this high-performance server are available from ... of the Web site's network and the hardware used at the site. During the
148.0 Why Aren't Operating Systems Getting Faster As Fast As Hardware? - Ousterhout (1989)
This note evaluates several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file syste... / the design and application of high performance scientific computers. We ... Getting Faster As Fast As Hardware John Ousterhout
147.8 Improving Release-Consistent Shared Virtual Memory using Automatic.. - Iftode (1996)
Shared virtual memory is a software technique to provide shared memory on a network of computers without special hardware support. Although several relaxed consistency models and implementations are q... / nd International Symposium on High-Performance Computer Architecture ... of computers without special hardware support. Although several
145.4 Instruction Pre-Processing in Trace Processors - Jacobson, Smith (1999)
In trace processors, a sequential program is partitioned at run time into "traces." A trace is an encapsulation of a dynamic sequence of instructions. A processor that uses traces as the unit of sequ... / International Symposium on High Performance Computer Architecture ... We propose a new class of hardware optimizations that transform the
144.9 The Galley Parallel File System - Nieuwejaar, Kotz (1996)
As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file sy... / that is intended to deliver high performance to a variety of applications ... has not been keeping pace. Hardware limitations are one reason for
144.6 High-Performance Local Area Communication With Fast Sockets - Rodrigues, Anderson, Culler (1997)
Modern switched networks such as ATM and Myrinet enable low-latency, high-bandwidth communication. This performance has not been realized by current applications, because of the high processing overhe... / High-Performance Local Area Communication With ... to the ability of modern network hardware however. While TCP is capable
144.6 Software DSM Protocols that Adapt between Single Writer and Multiple.. - Cristiana Amza (1997)
We present two software DSM protocols that dynamically adapt between a single writer (SW) and a multiple writer (MW) protocol based on the application's sharing patterns. The first protocol (WFS) ad... / In Proceedings of the Second High Performance Computer Architecture ... memory DSM on commodity hardware. Both single writer SW and
143.2 A Sorting Classification of Parallel Rendering - Molnar (1994)
We describe three broad classes of parallel rendering methods, based on where the sort from object-space to screen space occurs. These classes encompass most feedforward parallel software and hardware... / designers and implementers of high-performance parallel rendering systems. ... feedforward parallel software and hardware rendering architectures that
140.7 A Case for NOW (Networks of Workstations) - Anderson, Culler, Patterson, team (1994)
In this paper, we argue that because of recent technology advances, networks of workstations (NOWs) are poised to become the primary computing infrastructure for science and engineering, from low en... / micro would take over high-performance computing Brooks Today ... The xFS goal is high performance highly available network file
136.2 Optimizing Triangle Strips for Fast Rendering - Evans, Skiena, Varshney (1996)
Almost all scientific visualization involving surfaces is currently done via triangles. The speed at which such triangulated surfaces can be displayed is crucial to interactive visualization and is bo... / virtual reality. The speed of high-performance rendering engines on ... vertex. Special-purpose rendering hardware is needed to fully exploit the
136.2 PPFS: A High Performance Portable Parallel File System - Huber, Jr., Elford, Reed, Chien.. (1995)
Rapid increases in processor performance over the past decade have outstripped performance improvements in input/output devices, increasing the importance of input/output performance to overall syste... / PPFS A High Performance Portable Parallel File System ... on a variety of Intel Paragon XP/S hardware configurations using the Intel
136.2 A Comparative Analysis of Schemes for Correlated Branch Prediction - Young, Gloy, Smith (1995)
Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch predi... / Abstract Modern high-performance architectures require ... led to the development of both hardware and software schemes that achieve
133.8 IMPACT: An Architectural Framework for Multiple-Instruction-Issue.. - Chang, Mahlke, Chen, Warter, Hwu (1991)
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate efficient code for concurrent hardware. In the IMPACT project, we have developed ... / Hwu Center for Reliable and High-Performance Computing University of ... efficient code for concurrent hardware. In the IMPACT project we have
130.8 Automatic Creation of an Autonomous Agent: Genetic Evolution of a.. - Floreano, Mondada (1994)(Correct)
The paper describes the results of the evolutionary development of a real, neural-network driven mobile robot. The evolutionary approach to the development of neural controllers for autonomous agents ... / simulations are fast. High performance serial machines and massively br links or malfunctioning of some hardware components do not strongly
130.4 Lazy Release Consistency for Distributed Shared Memory - Keleher (1995)(Correct)
A software distributed shared memory (DSM) system allows shared memory parallel
programs to execute on networks of workstations. This thesis presents a new class
of protocols that has lower communicat... / a viable alternative for high-performance parallel processing. br opportunities to bring high performance and high usability to a wide
130.4 An Argument for Simple COMA - Ashley Saulsbury (1995)(Correct)
We present design details and some initial performance
results of a novel scalable shared memory
multiprocessor architecture. This architecture features
the automatic data migration and replication ca... / without the accompanying hardware complexity. A software layer br DVSM systems leaving simpler hardware to maintain shared memory
130.0 An Evaluation of Directory Schemes for Cache Coherence - Agarwal et al. (1988)(Correct)
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less attent... / cycle time especially in a high performance machine. Attempts to reduce br cache coherency support in hardware. These snoopy cache schemes also
127.6 GLUnix: a Global Layer Unix for a Network of Workstations - Ghormley, Petrou, Rodrigues, Vahdat, .. (1997)(Correct)
Recent improvements in network and workstation performance have made clusters an attractive architecture for diverse
workloads, including sequential and parallel interactive applications. However, alt... / the availability of commodity high-performance workstations and networks br However although viable hardware solutions are available today
127.5 Scalable Computing - McColl (1996)(Correct)
Scalable computing will, over the next few years, become
the normal form of computing. In this paper we present a unified framework,
based on the BSP model, which aims to serve as a foundation for
t... / universal offering high performance in a predictable way on any br The two parts of that industry hardware and software are quite
124.6 Scheduling From the Perspective of the Application - Berman, Wolski (1996)(Correct)
Metacomputing is the aggregation of distributed and
high-performance resources on coordinated networks. With
careful scheduling, resource-intensive applications can be
implemented efficiently on metac... / of distributed and high-performance resources on coordinated br taking advantage of multiprocessor hardware features to execute multiple
124.6 ATM Internetworking - Alles (1995)(Correct)
this paper was presented at Engineering InterOp, Las Vegas, March 1995.
Anthony Alle... / of hardware intensive high performance ATM switches the deployment br do facilitate the development of hardware intensive high performance ATM
123.4 Multi-Protocol Active Messages on a Cluster of SMP's - Lumetta, Mainwaring, Culler (1997)(Correct)
Clusters of multiprocessors, or Clumps, promise to be
the supercomputers of the future, but obtaining high
performance on these architectures requires an understanding
of interactions between the mult... / of the future but obtaining high performance on these architectures br and analyzes the effects of the hardware and software architectures on
123.4 Control Flow Speculation in Multiscalar Processors - Jacobson et al. (1997)(Correct)
The Multiscalar architecture executes a single
sequential program following multiple flows of control. In
the Multiscalar hardware, a global sequencer, with help
from the compiler, takes large steps t... / International Symposium on High Performance Computer Architecture br of control. In the Multiscalar hardware a global sequencer with help
119.1 Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance.. - Frens, Wise (1997)(Correct)
An elementary, machine-independent, recursive algorithm for matrix multiplication C+=A*B provides implicit blocking at every level of the memory hierarchy and tests out faster than classically optimal... / to run well on extant high-performance systems. . Outline of br the compiler's knowledge of the hardware parameters to fit the target
118.8 MGS: A Multigrain Shared Memory System - Yeung (1996)(Correct)
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multipro... / communication interfaces high performance VLSI networks and br enables the collaboration of hardware and software shared memory and
118.5 Processor Allocation Policies for Message-Passing Parallel Computers - Mccann (1994)(Correct)
When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. This... / the potential for achieving high performance scalability and br . The Hardware and Software Environment
116.0 Legion: The Next Logical Step Toward a Nationwide Virtual Computer - Grimshaw, Wulf, French, Weaver.. (1994)(Correct)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer
comprised of a variety of geographically distributed high-performance machines and workstations. ... / of geographically distributed high-performance machines and workstations. To br These are software problems the hardware challenges are being addressed
115.9 Synchronization and Communication in the T3E Multiprocessor - Scott (1996)(Correct)
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. We discuss what we have learned from the T3D p... / programming model e.g., High Performance Fortran HPF or the br memories. Load store performance highlights the memory pipelining
115.9 A Survey of QoS Architectures - Aurrecoechea, Campbell, Hauw (1996)(Correct)
Over the past several years there has been a considerable amount of research within the field of quality of service (QoS) support for distributed multimedia systems. To date, most of the work has been... / communication protocols for high performance in accordance with systems br protocols and the use of hardware assists for efficient protocol
115.9 Minimizing Register Requirements under Resource-Constrained.. - Govindarajan, Altman, Gao (1995)(Correct)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedule... / The rapid advances in high-performance computer architecture and br in computer architecture -hardware and software technology -
115.9 Performance Analysis of Embedded Software Using Implicit Path.. - Li, Malik (1995)(Correct)
Embedded computer systems are characterized by the presence of a processor running application specific dedicated software. A large number of these systems must satisfy realtime constraints. This pape... / software. For example in a high-performance engine controller design br selection of the partition between hardware and software as well as
115.5 Graphical Fisheye Views of Graphs - Sarkar, Brown (1992)(Correct)
A fisheye camera lens is a very wide angle lens that magnifies nearby objects while
shrinking distant objects. It is a valuable tool for seeing both "local detail" and
"global context" simultaneously.... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to
114.8 BSPlib - The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1997)(Correct)
This memory area is regarded as unregistered. 6. While registration is designed for "full duplex" communication, a process can do half duplex communication by, appropriately, registering an area of si... / be able to run unchanged with high performance on any general purpose br provide a clear focus for future hardware developments. For a model to
114.2 Executing Java Threads in Parallel in a Distributed-Memory Environment - MacBeth, McGuigan, Hatcher (1998)(Correct)
We present the design and initial implementation
of Hyperion, an environment for the high-performance
execution of Java programs. Hyperion
supports high performance by utilizing
a Java-bytecode-to-C tr... / an environment for the high-performance execution of Java programs. br want the details of the parallel hardware to be hidden as much as possible.
114.2 The MOSIX Multicomputer Operating System for High Performance Cluster .. - Barak (1998)(Correct)
The scalable computing cluster at Hebrew University consists of 64 Pentium and PentiumPro
servers that are connected by fast Ethernet and the Myrinet LANs. It is running the MOSIX
operating system, an... / Operating System for High Performance Cluster Computing br of affordable low-cost commodity hardware e.g. Pentium based Personal
112.0 A Practical System for Intermodule Code Optimization at Link-Time - Srivastava, Wall (1992)(Correct)
We have developed a system called OM to explore the problem of code optimization at link-time. OM takes a collection of object modules constituting the entire program, and converts the object code int... / the design and application of high performance scientific computers. We br advance. Technologies both hardware and software do not all advance
111.1 On the Design of Chant: A Talking Threads Package - Matthew Haines (1994)(Correct)
Lightweight threads are becoming increasingly useful in supporting parallelism and
asynchronous control structures in applications and language implementations. However,
lightweight thread packages tr... / support our extensions to the High Performance Fortran standard for br of a Unix process includes the hardware register kernel stack
110.1 Embra: Fast and Flexible Machine Simulation - Witchel, Rosenblum (1996)(Correct)
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Em... / used by Embra to achieve high performance focusing on the br multiprocessors. Embra models the hardware of these machines in enough
108.8 Principles of Metareasoning - Russell, Wefald (1991)(Correct)
In this paper we outline a general approach to the study of metareasoning, not in
the sense of explicating the semantics of explicitly specified meta-level control policies,
but in the sense of provid... / in applications demanding high performance and negligible response br Agents with Limited Performance Hardware project at Berkeley. We see
108.6 A High-Performance Microarchitecture with Hardware-Programmable.. - Razdan, Smith (1994)(Correct)
This paper explores a novel way to incorporate hardware-programmable
resources into a processor microarchitecture to improve the
performance of general-purpose applications. Through a coupling
of comp... / November A High-Performance Microarchitecture with br Microarchitecture with Hardware-Programmable Functional Units
107.2 Horus: A Flexible Group Communications System - van Renesse, Birman, Glade, Guo.. (1996)(Correct)
The Horus system offers flexible group communication support for distributed applications.
It is extensively layered and highly reconfigurable, allowing applications to only pay
for services they use,... / novel mechanisms in support of high performance reliable group br has become popular it wraps a hardware group abstraction with a simple
107.2 The Microarchitecture of Superscalar Processors - Smith, Sohi (1995)(Correct)
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of ... / method for implementing high performance microprocessors. . . The br Processing Model Because hardware and software evolve it is rare
106.3 IP Switching and Gigabit Routers - Newman, Minshall, Lyon, Huston (1997)(Correct)
This paper examines two approaches to
the design of a high-performance router, the gigabit router and the IP switch, and then provides some
detail on the implementation of an IP switch and the protoco... / approaches to the design of a high-performance router the gigabit router br of an ATM switch is that the hardware is standardized and is available
106.3 Ninf: A Network based Information Library for Global World-Wide.. - Sato (1997)(Correct)
Ninf is an ongoing global network-wide computing infrastructure
project which allows users to access computational resources including hardware,
software and scientific data distributed across a wid... / intended not only to exploit high performance in network parallel br computational resources including hardware software and scientific data
103.7 Dynamic Memory Disambiguation Using the Memory Conflict Buffer - Gallagher (1994)(Correct)
To exploit instruction level parallelism, compilers for
VLIW and superscalar processors often employ static
code scheduling. However, the available code reordering
may be severely restricted due to am... / Hwu Center for Reliable and High-Performance Computing University of br This paper introduces a simple hardware mechanism referred to as the
103.0 MPI: A Message Passing Interface - Forum (1993)(Correct)
This paper presents an overview of MPI, a proposed
standard message passing interface for MIMD distributed
memory concurrent computers. The design
of MPI has been a collective effort involving researc... / and organization of the High Performance Fortran Forum. Subcommittees br or in some cases provide hardware or low-level system support for
102.8 Faster IP Lookups using Controlled Prefix Expansion - Srinivasan, Varghese (1998)(Correct)
Internet (IP) address lookup is a major bottleneck in high performance routers. IP address lookup is challenging because it requires a longest matching prefix lookup. It is compounded by increasing ro... / is a major bottleneck in high performance routers. IP address lookup br Model We will consider both hardware and software platforms for
102.1 The Chimaera Reconfigurable Functional Unit - Scott Hauck (1997)(Correct)
By strictly separating reconfigurable logic from
their host processor, current custom computing
systems suffer from a significant communication
bottleneck. In this paper we describe Chimaera, a
system... / execution model key to high performance general-purpose br benefits to justify the hardware costs and extra complexities.
101.4 The Synergy Between Non-blocking Synchronization and Operating System .. - Greenwald, Cheriton (1996)(Correct)
Non-blocking synchronization has significant advantages over
blocking synchronization: however, it has not been used to a
significant degree in practice. We designed and implemented
a multiprocessor o... / and run-time library for high-performance reliability and modularity. br for our approach and a potential hardware implementation. Section
101.2 Metasystems: An Approach Combining Parallel Processing and.. - Grimshaw (1994)(Correct)
A metasystem is a single computing resource composed of a heterogeneous group of autonomous computers
linked together by a network. The interconnection network needed to construct large metasystems
wi... / are not a serious obstacle to high performance but that load imbalance br and coercion and schedules all hardware resources across the different
101.2 An Object-Oriented Concurrent Reflective Language for Dynamic.. - Masuhara (1994)(Correct)
this paper proposes an object-oriented concurrent reflective language (in IPSJ SIG Notes, 94-PRG-18, pp.57--64, 1994)... / DRM. One example is that HPF High Performance Fortran has directives for br to the application and or hardware architecture for efficient
101.0 Increasing the Instruction Fetch Rate via Multiple Branch Prediction.. - Yeh (1993)(Correct)
High performance computer implementation today is
increasingly directed toward parallelism in the hardware.
Superscalar machines, where the hardware can
issue more than one instruction each cycle, are... / Michigan Abstract High performance computer implementation today br directed toward parallelism in the hardware. Superscalar machines where the
100.0 An Overview of the Pablo Performance Analysis Environment - Reed, Aydt, Madhyastha, Noe.. (1992)(Correct)
As massively parallel, distributed memory systems replace traditional vector supercomputers, effective application program optimization and system resource management become more than research curiosi... / based on the emerging High Performance Fortran HPF standard. br peak performance of the largest hardware configuration approaches
98.5 Falcon: On-line Monitoring and Steering of Large-Scale Parallel.. - Gu (1995)(Correct)
Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such interactive steering is to improve its performance or to affect its execution behavior. The... / Introduction The high performance of current parallel br basis. Falcon runs on several hardware platforms including the Kendall
97.8 The Design and Performance of a Real-time CORBA Event Service - Timothy Harrison (1997)(Correct)
The CORBA Event Service provides a flexible model for
asynchronous communication among objects. However,
the standard CORBA Event Service specification lacks
important features required by real-time a... / Endsystem Architecture for High-Performance Real-time CORBA. using a br the object is written in the OS hardware platform or the type of
97.1 Efficiently Adapting to Sharing Patterns in Software DSMs - Monnerat, Bianchini (1998)(Correct)
In this paper we introduce a page-based Lazy Release Consistency protocol called ADSM that constantly and efficiently adapts to the applications' sharing patterns. Adaptation in ADSM is based on our d... / machine communicating via its high-performance network. For each br memory on top of message-passing hardware. These systems provide a
97.1 Variable Length Path Branch Prediction - Stark et al. (1998)(Correct)
... / is required to achieve high performance in deeply pipelined br rate of . given a K byte hardware budget. For comparison the
95.6 Filters: QoS Support Mechanisms for Multipeer Communications - Yeadon, García, Hutchison, Shepherd (1996)(Correct)
The nature of distributed multimedia applications
is such that they require multipeer communication support
mechanisms. The multimedia traffic needs to be delivered to
end-systems, networks and end-us... / full quality media playout at high-performance workstations while at the br of the expense in the specialised hardware required to implement them. As
95.6 Message-Passing Performance Of Various Computers - Dongarra, Dunigan (1996)(Correct)
This report compares the performance of different computer systems
for basic message passing. Latency and bandwidth are measured on Convex,
Cray, IBM, Intel, KSR, Meiko, nCUBE, NEC, SGI, and TMC multi... / result is that the vendors of high-performance computing have turned to br processors are interconnected by hardware and software to attack various
93.8 Public International Benchmarks for Parallel Computers - Hockney, Berry (1994)(Correct)
this report: David Bailey (NASA Ames Research Center)
, Michael Berry (University of Tennessee), Jack Dongarra (University of Tennessee/Oak
Ridge National Laboratory), Vladimir Getov (University of So... / problems Chapter- and High Performance Fortran kernels to test the br . . Hardware Performance
93.6 SIMPLE: A Methodology for Programming High Performance Algorithms on.. - Bader, JaJa (1997)(Correct)
We describe a methodology for developing high performance programs running on clusters of SMP nodes. Our methodology is based on a small kernel (SIMPLE) of collective communication primitives that ma... / A Methodology for Programming High Performance Algorithms on Clusters of br smps Technology for Example Hardware The Support By Nsf Cise
93.6 Design and Implementation of Virtual Memory-Mapped Communication on.. - Dubnicki (1997)(Correct)
This paper describes the design and implementation of
the virtual memory-mapped communication model (VMMC)
on a Myrinet network of PCI-based PCs. VMMC has been
designed and implemented for the SHRIMP m... / Introduction Low cost and high performance are the potential advantages br limits imposed by the underlying hardware. The goal of this work is to
92.7 ASHs: Application-Specific Handlers for High-Performance Messaging - Wallach (1996)(Correct)
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute... / Handlers for High-Performance Messaging Deborah A. br to provide applications with hardware-level network performance. ASHs
92.7 Server Operating Systems - Kaashoek, Engler, Ganger, Wallach (1996)(Correct)
We introduce server operating systems, which are sets of abstractions and runtime support for specialized, high-performance
server applications. We have designed and are implementing a prototype server... / support for specialized high-performance server applications. We have br and that can safely timeshare the hardware platform with other applications.
92.7 A Wireless Broadband Ad-Hoc ATM Local-Area Network - Eng, Karol, Veeraraghavan, Ayanoglu, .. (1995)(Correct)
this paper, the exact method by which the look-up table is generated is not important.
In this section we are interested in the updates to the routing tables at each PBS in the
K. Y. Eng et al. / A ... / are designed for simplicity high performance and modular implementations. br connections in the network. PBS hardware and software architectures are
91.4 Compiling for the Multiscalar Architecture - Vijaykumar (1998)(Correct)
High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world... / i Abstract High-performance general-purpose br by speculation and verification in hardware. Since this thesis is the
91.4 SvPablo: A Multi-Language Performance Analysis System - De Rose, Zhang, Reed (1998)(Correct)
In this paper we present the design of SvPablo, a language independent performance analysis
and visualization system that can be easily extended to new contexts with minimal changes to
the software in... / applications that achieve high performance on a current parallel and br SvPablo also exploits hardware performance counters to capture
91.3 Parallel Performance Prediction Using Lost Cycles Analysis - Crovella, LeBlanc (1994)(Correct)
Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is... / Research Assistantship in High Performance Computing administered by the br e.g., load imbalance and hardware e.g., resource contention A
91.1 High Time-Resolution Measurement and Analysis of LAN Traffic.. - Leland, Wilson (1991)(Correct)
The interconnection of local area networks is increasingly important, but little data are available on the
characteristics of the aggregate traffic that LANs will be submitting to the interconnection ... / SBC we are able to dedicate a high performance processor to servicing the br We present a high time-resolution hardware monitor for Ethernet LANs that
89.3 Compressionless Routing: A Framework for Adaptive and Fault-tolerant.. - Kim, Liu, Chien (1997)(Correct)
Compressionless Routing (CR) is a new adaptive routing framework which provides a unified framework for efficient deadlock-free adaptive routing and fault-tolerance. CR exploits the tight coupling betw... / the results. Background High performance routing networks the subject br These results show that the hardware for CR and FCR networks is
89.3 PM: An Operating System Coordinated High Performance Communication.. - Hiroshi Tezuka (1997)(Correct)
We have developed a new communication library, called PM,
for the Myrinet gigabit LAN card, that has a dedicated processor and
on-board memory to handle a communication protocol. In order to obtain
... / Operating System Coordinated High Performance Communication Library br directly accesses the network hardware to eliminate kernel traps and
88.8 On Multicast Wormhole Routing in Multicomputer Networks - Boppana, Chalasani, Raghavendra (1994)(Correct)
We show that deadlocks due to dependencies on consumption channels is a fundamental problem in multicast wormhole routing. This issue of deadlocks has not been addressed in many previously proposed ... / is important for achieving high performance in parallel computers. The br multicomputers with minimal hardware support. We present a simulation
86.9 Should Scalable Parallel Computers Support Efficient Hardware.. - Ni (1995)(Correct)
Multicast communication is a frequently invoked
communication pattern in many parallel algorithms.
Although some parallel computer vendors have tried
to directly support multicast in hardware, most ve... / multicast. HPF High Performance Fortran In a high-level br Computers Support Efficient Hardware Multicast Lionel M. Ni
85.7 Design Techniques for Low Power Systems - Havinga, Smit (2000)(Correct)
Portable products are being used increasingly. Because these systems are battery powered,
reducing power consumption is vital. In this report we give the properties of low power
design and techniques ... / current trend is to focus on high performance processors as this is the br concentrate on dedicated low-power hardware and software architectures. A
85.7 Eliminating Conflict Misses for High Performance Architectures - Rivera, Tseng (1998)(Correct)
Many cache misses in scientific programs are due to conflicts
caused by limited set associativity. Two data-layout transformations,
inter- and intra-variable padding, can eliminate
many conflict misse... / Conflict Misses for High Performance Architectures Gabriel br on modern microprocessors. Due to hardware constraints caches have limited
85.7 Power and Performance Tradeoffs using Various Caching Strategies - Bahar, Albera, Manne (1998)(Correct)
In this paper, we propose several different data and instruction
cache configurations and analyze their power as well as performance
implications on the processor. Unlike most existing work in
low pow... / design we explore a high performance processor with the latest br which make use of these aggressive hardware-based techniques. In particular
85.1 Java for Parallel Computing and as a General Language for Scientific.. - Fox (1997)(Correct)
We discuss the role of Java and Web technologies for general simulation. We classify the
classes of concurrency typical in problems and analyze separately the role of Java in user
interfaces, coarse g... / of the pyramid and the few high-performance systems as the top Figure br The distributed computing hardware of the Web has remarkable
84.0 Replication Using Group Communication Over a Partitioned Network - Amir (1995)(Correct)
In systems based on the client-server model, a single server may serve many clients and
the heavy load on the server may cause the response time to be adversely affected. In such
circumstances, replic... / necessarily consistent reply. High performance of the architecture is br the available non-reliable hardware multicast for efficient
84.0 Reduced Overhead Logging for Rollback Recovery in Distributed Shared.. - Suri (1995)(Correct)
Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems ca... / Center for Reliable and High-Performance Computing Mountain br paging mechanism or in hardware using directory-based cache
83.9 Parallel Simulation Today - Nicol, Fujimoto (1994)(Correct)
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the
tutorial are discussions on new protocols, mathematical performance analysis, time parallel... / and ready availability of high-performance multiprocessors. The number br analysis time parallelism hardware support for parallel simulation
83.9 PASSION: Parallel And Scalable Software for Input-Output - Choudhary, Bordawekar, Harry.. (1994)(Correct)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output
which provides software support for high performance parallel I/O. PASSION provides support
at the la... / provides software support for high performance parallel I O. PASSION br nCUBE etc. provide some kind of hardware and software support for parallel
83.9 Separating Data and Control Transfer in Distributed Operating Systems - Thekkath, Levy, Lazowska (1994)(Correct)
Advances in processor architecture and technology have resulted in workstations in the 100+ MIPS
range. As well, newer local-area networks such as ATM promise a ten- to hundred-fold increase in
throug... / transfer of that byte. Even in high-performance RPC systems control transfer br of distributed systems at the hardware level and that distributed
83.9 Branch Classification: a New Mechanism for Improving Branch Predictor .. - Chang (1994)(Correct)
There is wide agreement that one of the most important
impediments to the performance of current and future
pipelined superscalar processors is the presence of conditional
branches in the instruction ... / algorithm is important to a high-performance microprocessor. If we br hard-to-predict branches or the hardware can special case the handling of
83.8 Amoeba - A Distributed Operating System for the 1990s - Mullender, van Rossum, Tanenbaum.. (1990)(Correct)
Amoeba is the distributed system developed at the Free University (VU) and
Centre for Mathematics and Computer Science (CWI), both in Amsterdam.
Throughout the project's ten-year history, a major conc... / with simplicity and high performance. Distributed systems are br systems on its class of hardware reported so far in the
82.4 A Language-Based Approach To Protocol Implementation - Abbott (1993)(Correct)
CHAPTER 1: INTRODUCTION; 1.1 Introduction to Network Software... / protocol layering entails a high performance cost developers are br as data programs and specialized hardware. Communicating data between
81.8 Efficient Support for P-HTTP in Cluster-Based Web Servers - Aron, Druschel, Zwaenepoel (1999)(Correct)
This paper studies mechanisms and policies
for supporting HTTP/1.1 persistent connections
in cluster-based Web servers that employ content-based
request distribution. We present two mechanisms
for the ... / platform for cost-effective high performance network servers. Achieving br becoming an increasingly popular hardware platform for cost-effective high
81.8 Supporting Fine-Grained Synchronization on a Simultaneous.. - Dean Tullsen (1999)
This paper proposes and evaluates new synchronization
schemes for a simultaneous multithreaded processor. We
present a scalable mechanism that permits threads to cheaply
synchronize within the process... / th International Symposium on High Performance Computer Architecture ... should be High Performance. High performance implies both
81.4 Compiler-directed Data Prefetching in Multiprocessors with Memory.. - Edward Gornish (1990)
Memory hierarchies are used by multiprocessor systems
to reduce large memory access times. It is necessary to
automatically manage such a hierarchy, to obtain effective
memory utilization. In this pap... / networks MINs To achieve high performance in a hierarchical memory ... of caches. In addition without hardware prefetching of cache lines no
81.4 Supercomputer Performance Evaluation and the Perfect Benchmarks - Cybenko (1990)
In the past three years, the Perfect Benchmark™ Suite has evolved
from a supercomputer performance evaluation plan, presented by Kuck
and Sameh at the 1987 International Conference on Supercomputi... / benchmarking to high performance workstations An ... in large part to increases in hardware speed averaging an order of
81.1 Architecture Validation for Processors - Ho (1995)
Modern, high performance microprocessors are extremely
complex machines which require substantial validation effort to
ensure functional correctness prior to tapeout. Generating the
corner cases to te... / . Abstract Modern high performance microprocessors are extremely ... through simulation often using hardware-assist Gat to reduce
80.8 Applying Compiler Techniques to Cache Behavior Prediction - Ferdinand, Martin, Wilhelm (1997)
In previous work [1], we have developed the theoretical
basis for the prediction of the cache behavior of programs
by abstract interpretation. Abstract interpretation
is a technique for the static ana... / more relevant and used for high performance microcontrollers and DSPs. ... tailored for the analysis of hardware with states is presented. This
79.0 Paging Tradeoffs in Distributed-Shared-Memory Multiprocessors - Douglas Burger (1994)
Massively parallel processors have begun using commodity operating systems that
support demand-paged virtual memory. To evaluate the utility of virtual memory, we measured
the behavior of seven shar... / is a ubiquitous feature of high-performance workstations but has been ... A DSM machine model Our target hardware system contains processing
78.3 A Novel Framework of Register Allocation for Software Pipelining - Ning et al. (1993)
... pipelining can be applied to high-performance pipelined processor ... schemes with or without special hardware support are discussed. We have
78.2 A High-Performance, Portable Implementation of the MPI Message.. - Gropp (1996)
MPI (Message Passing Interface) is a specification for a standard library for message
passing that was defined by the MPI Forum, a broadly based group of parallel computer
vendors, library writers, an... / A High-Performance Portable Implementation of ... being followed the current hardware and software environment for
78.2 Stage Scheduling: A Technique to Reduce the Register Requirements of.. - Eichenberger, Davidson (1995)
Modulo scheduling is an efficient technique for
exploiting instruction level parallelism in a variety
of loops, resulting in high performance code but
increased register requirements. We present a set... / of loops resulting in high performance code but increased register ... be eliminated by using special hardware such as rotating register files
78.2 Optimizing Instruction Cache Performance for Operating System.. - Torrellas, Xia, Daigle (1995)
High instruction cache hit rates are key to high performance. One known technique to
improve the hit rate of caches is to use an optimizing compiler to minimize cache interference
via an improved layo... / cache hit rates are key to high performance. One known technique to ... Firstly with the help of a hardware performance monitor we
77.9 Using Profile Information to Assist Classic Code Optimizations - Chang (1991)
This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new compo... / Hwu Center for Reliable and High-performance Computing University of ... time performs as well as the best hardware schemes Trace scheduling
77.8 Plan 9 from Bell Labs - Pike (1990)
Plan 9 is a distributed computing environment. It is assembled from separate machines acting as CPU servers, file servers, and terminals. The pieces are connected by a single file-oriented protocol an... / high-speed networks and in high-performance microprocessors. A common ... adapt well to changes in computing hardware. In particular we wanted to
76.5 Quantitative Analysis and Model Checking - Huth (1997)
Many notions of models in computer science provide quantitative information, or uncertainties, which necessitate a quantitative model checking paradigm. We present such a framework for reactive and ge... / are utilized to achieve high performance at a cost of obtaining ... verification especially in hardware design. Model checking as such
76.5 An Architecture for Optimal All-to-All Personalized Communication - Hinrichs, Kosak, O'Hallaron.. (1994)
In all-to-all personalized communication (AAPC), every node of a parallel system sends a potentially
unique packet to every other node. AAPC is an important primitive operation for modern parallel
com... / of data parallel compilers for High Performance Fortran Hig include ... utilizing all links. A simple hardware addition for synchronized
76.5 DPGA-Coupled Microprocessors: Commodity ICs for the Early 21st Century - André DeHon (1994)
During the past decade the microprocessor has become
a key commodity component for building all kinds of computational
systems. During this time frame large, reconfigurable
logic arrays have exploited... / microprocessors. Today's high-performance microprocessors sport ... to specialize the processing hardware to match the application