Hardware: High Performance

Ordered by the expected number of citations based on the year of publication

This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.

920.6   Active Messages: a Mechanism for Integrated Communication and.. - von Eicken, Culler, Goldstein.. (1992)
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) allow communication to overlap computation, and (3) coordinate the two without sacrificing processor...

802.4   TreadMarks: Distributed Shared Memory on Standard Workstations and.. - Keleher, Cox, Dwarkadas, Zwaenepoel (1994)
TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultrix using DECstation...

646.3   High Performance Messaging on Workstations: Illinois Fast Messages.. - Pakin, Lauria, Chien (1995)

594.0   PVM: A Framework for Parallel Distributed Computing - Sunderam (1990)
The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components. It ...

540.7   The NAS Parallel Benchmarks - Bailey, Barszcz, Barton, Browning.. (1994)
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of five "parallel kernel" benchmarks and three "simulated applicat...

534.0   Efficient Software-Based Fault Isolation - Wahbe, Lucco, Anderson, Graham (1993)
One way to provide fault isolation among cooperating software modules is to place each in its own address space. However, for tightly-coupled modules, this solution incurs prohibitive context switch o...

497.1   Complexity-Effective Superscalar Processors - Palacharla (1998)
The performance trade-off between hardware complexity and clock speed in the design of superscalar microarchitectures is first investigated. Using the results of this trade-off analysis, the thesis pr...

478.2   Multiscalar Processors - Sohi (1995)
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is di...

469.5   Performance of Various Computers Using Standard Linear Equations.. - Jack Dongarra (1995)
This report compares the performance of different computer systems in solving dense systems of linear equations. The comparison involves approximately a hundred computers, ranging from a Cray Y-MP to ...

453.6   Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer - Blumrich, Alpert, Dubnicki, Felten (1993)
The network interfaces of existing multicomputers require a significant amount of software overhead at the operating system and user levels to provide protection and to implement message passing proto...

385.5   Totem: A Fault-Tolerant Multicast Group Communication System - Moser, Melliar-Smith, Agarwal.. (1996)
When Totem delivers multicast messages, it invokes operations in the same total order throughout the distributed system. The result: consistency of replicated data and simplified programming of applic...

379.3   Compiler Transformations for High-Performance Computing - Bacon (1993)
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by t...

372.7   The Design of the TAO Real-Time Object Request Broker - Schmidt, Levine, Mungee (1999)
Many real-time application domains can benefit from flexible and open distributed architectures, such as those defined by the CORBA specification. CORBA is an architecture for distributed object compu...

363.6   The MultiSpace: an Evolutionary Platform for Infrastructural Services - Gribble, Welsh, Brewer, Culler (1999)
This paper presents the architecture for a Base, a clustered environment for building and executing highly available, scalable, but flexible and adaptable infrastructure services. Our architecture has t...

358.0   The Network Architecture of the Connection Machine CM-5 - Leiserson, Abuhamdeh, Douglas.. (1994)
The Connection Machine Model CM-5 Supercomputer is a massively parallel computer system designed to offer performance in the range of 1 teraflops (10^12 floating-point operations per second). The CM...

357.4   Application Performance and Flexibility on Exokernel Systems - Kaashoek, Engler, Ganger.. (1997)
The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exo...

356.5   The Nexus Approach to Integrating Multithreading and Communication - Foster (1996)
Lightweight threads have an important role to play in parallel systems: they can be used to exploit shared-memory parallelism, to mask communication and I/O latencies, to implement remote memory acces...

342.0   A Metaobject Protocol for C++ - Chiba (1995)
This paper presents a metaobject protocol (MOP) for C++. This MOP was designed to bring the power of meta-programming to C++ programmers. It avoids penalties on runtime performance by adopting a new m...

342.0   The Paradyn Parallel Performance Measurement Tools - Miller, Callaghan (1995)
Paradyn is a performance measurement tool for parallel and distributed programs. Paradyn uses several novel technologies so that it scales to long running programs (hours or days) and large (thousand ...

333.3   NetSolve: A Network Server for Solving Computational Science Problems - Casanova (1995)
This paper presents a new system, called NetSolve, that allows users to access computational resources, such as hardware and software, distributed across the network. The development of NetSolve was m...

331.4   Implementing Multiple Protection Domains in Java - Hawblitzel, Chang, Czajkowski, Hu.. (1998)
Safe language technology can be used for protection within a single address space. This protection is enforced by the language's type system, which ensures that references to objects cannot be forged...

323.4   Shoring Up Persistent Applications - Carey, DeWitt, Franklin, Hall.. (1994)
SHORE (Scalable Heterogeneous Object REpository) is a persistent object system under development at the University of Wisconsin. SHORE represents a merger of object-oriented database and file system te...

315.9   Exploiting Choice: Instruction Fetch and Issue on an Implementable.. - Tullsen, Eggers, Emer, Levy, Lo.. (1996)
Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneo...

315.9   Shared Memory Consistency Models: A Tutorial - Adve, Gharachorloo (1995)
Parallel systems that support the shared memory abstraction are becoming widely accepted in many areas of computing. Writing correct and efficient programs for such systems requires a formal specifica...

283.9   PVM: Parallel Virtual Machine - Geist, Beguelin, Dongarra, Jiang.. (1994)
this reporting is to be turned on (1) or turned off (0) for subsequent calls. A value of (2) will cause the program to exit after printing the error message (not implemented in 3.2). The default is re...

279.9   High-Performance Parallel Programming in Java: Exploiting Native.. - Getov (1998)
With most of today's fast scientific software written in Fortran and C, Java has a lot of catching up to do. In this paper we discuss how new Java programs can capitalize on high-performance libraries...

275.3   Beowulf: A Parallel Workstation For Scientific Computation - Sterling, Becker, et al. (1995)
Network-of-Workstations technology is applied to the challenge of implementing very high performance workstations for Earth and space science applications. The Beowulf parallel workstation employs 16 ...

254.5   A More Efficient RMI for Java - Nester, Philippsen, Haumacher (1999)
In current Java implementations, Remote Method Invocation (RMI) is too slow, especially for high performance computing. RMI is designed for wide-area and high-latency networks, it is based on a slow o...

249.3   Fine-grain Access Control for Distributed Shared Memory - Schoinas (1994)
This paper discusses implementations of fine-grain memory access control, which selectively restricts reads and writes to cache-block-sized memory regions. Fine-grain access control forms the basis of...

245.7   Pipeline Gating: Speculation Control For Energy Reduction - Manne (1998)
Branch prediction has enabled microprocessors to increase instruction level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries. Although speculative execution is...

245.7   Maximizing Parallelism and Minimizing Synchronization with Affine.. - Lim, Lam (1998)
This paper presents an algorithm to find the optimal affine partitions that maximize the degree of parallelism and minimize the degree of synchronization in programs with arbitrary loop nestings and a...

243.2   The Zebra Striped Network File System - Hartman, Ousterhout (1993)
Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a ...

243.2   Zebra: A Striped Network File System - Hartman, Ousterhout (1993)
This paper presents the design of Zebra, a striped network file system. Zebra applies ideas from log-structured file system (LFS) and RAID research to network file systems, resulting in a network file...

236.7   The MIT Alewife Machine: A Large-Scale Distributed-Memory.. - Agarwal, Chaiken, Johnson, Kranz.. (1991)
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low dimension direct interconnection network to provide scalable communi...

232.9   High Speed Switch Scheduling for Local Area Networks - Anderson, Owicki, Saxe, Thacker (1993)
Current technology trends make it possible to build communication networks that can support high performance distributed computing. This paper describes issues in the design of a prototype switch for ...

228.9   Extensibility, Safety and Performance in the SPIN Operating System - Bershad, Savage, Pardyak, Sirer.. (1995)
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure together with a core set of extensible services th...

220.3   Weak Ordering - A New Definition - Adve (1990)
A memory model for a shared memory, multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to...

218.1   The Jalapeño Dynamic Optimizing Compiler for Java - Burke, Choi, Fink, Grove, Hind.. (1999)
The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applicati...

217.1   Speculative Versioning Cache - Gopal (1998)
Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences ca...

214.4   A Unified Formalization of Four Shared-Memory Models - Adve (1993)
This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering, release consistency (with sequentially consistent special operations), the VAX memory mode...

205.7   Programmable Active Memories: Reconfigurable Systems Come of Age - Vuillemin, Bertin, Roncin, Shand.. (1996)
Programmable Active Memories (PAM) are a novel form of universal reconfigurable hardware co-processor. Based on Field-Programmable Gate Array (FPGA) technology, a PAM is a virtual machine, controlled ...

192.5   Software Write Detection for a Distributed Shared Memory - Zekauskas (1994)
Most software-based distributed shared memory (DSM) systems rely on the operating system's virtual memory interface to detect writes to shared data. Strategies based on virtual memory page protection ...

191.1   PROTEUS: A High-Performance Parallel-Architecture Simulator - Brewer, Dellarocas, Colbrook, Weihl (1991)
Proteus is a high-performance simulator for MIMD multiprocessors. It is fast, accurate, and flexible: it is one to two orders of magnitude faster than comparable simulators, it can reproduce results f...

188.5   Data Transformations for Eliminating Conflict Misses - Rivera, Tseng (1998)
Many cache misses in scientific programs are due to conflicts caused by limited set associativity. We examine two compile-time data-layout transformations for eliminating conflict misses, concentratin...

183.0   The Existence of Refinement Mappings - Abadi, Lamport (1988)
Refinement mappings are used to prove that a lower-level specification correctly implements a higher-level one. We consider specifications consisting of a state machine (which may be infinite-state) t...

182.9   Real-Time Occlusion Culling for Models with Large Occluders - Coorg (1997)
Efficiently identifying polygons that are visible from a dynamic synthetic viewpoint is an important problem in computer graphics. Typically, visibility determination is performed using the z-buffer a...

182.8   Home-based SVM protocols for SMP clusters: Design and Performance - Samanta (1998)
As small-scale shared memory multiprocessors proliferate in the market, it is very attractive to construct large-scale systems by connecting smaller multiprocessors together in software using efficient...

181.0   The Amber System: Parallel Programming on a Network of Multiprocessors - Chase (1989)
Microprocessor-based shared-memory multiprocessors are becoming widely available and promise to provide cost-effective high-performance computing. This paper describes a programming system called Ambe...

178.7   MPI-FM: High Performance MPI on Workstation Clusters - Lauria, Chien (1997)
Despite the emergence of high speed LANs, the communication performance available to applications on workstation clusters still falls short of that available on MPPs. A new generation of efficient mes...

177.1   Virtual Network Transport Protocols for Myrinet - Chun (1998)
This paper describes a protocol for a general-purpose cluster communication system that supports multiprogramming with virtual networks, direct and protected network access, reliable message delivery ...

175.8   Zeus: A System for Algorithm Animation and Multi-View Editing - Brown (1992)
Algorithm animation is a form of program visualization that is concerned with dynamic and interactive graphical displays of a program's fundamental operations. This paper describes the Zeus algorithm ...

175.3   Unifying Data and Control Transformations for Distributed.. - Cierniak, Li (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv...

175.3   Unifying Data and Control Transformations for Distributed Shared.. - Cierniak (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv...

174.4   Maximizing Parallelism and Minimizing Synchronization with Affine.. - Lim, Lam (1997)
This paper presents the first algorithm to find the optimal affine transform that maximizes the degree of parallelism while minimizing the degree of synchronization in a program with arbitrary loop ne...

174.4   Beowulf: Harnessing the Power of Parallelism in a Pile-of-PCs - Ridge (1997)
The rapid increase in performance of mass market commodity microprocessors and significant disparity in pricing between PCs and scientific workstations has provided an opportunity for substantial gain...

171.0   Threads and Input/Output in the Synthesis Kernel - Massalin, Pu (1995)
The Synthesis operating system kernel combines several techniques to provide high performance, including kernel code synthesis, fine-grain scheduling, and optimistic synchronization. Kernel code synth...

170.2   Run-time Adaptive Cache Hierarchy Management via Reference Analysis - Johnson, Hwu (1997)
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and ...

168.9   Alternative Implementations of Two-Level Adaptive Branch Prediction - Yeh, Patt (1992)
As the issue rate and depth of pipelining of high performance Superscalar processors increase, the importance of an excellent branch predictor becomes more vital to delivering the potential performanc...

165.4   NPSNET: A Network Software Architecture For Large Scale Virtual.. - Macedonia, Zyda, Pratt, Barham.. (1994)
This paper explores the issues involved in designing and developing network software architectures for large scale virtual environments. We present our ideas in the context of NPSNET-IV, the first 3D ...

165.2   The Relative Importance of Concurrent Writers and Weak Consistency.. - Keleher (1996)
This paper presents a detailed comparison of the relative importance of allowing concurrent writers versus the choice of the underlying consistency model. Our comparison is based on single- and multip...

161.0   The Sprite Network Operating System - Ousterhout, Cherenson, Douglis.. (1988)
Sprite is a new operating system for networked uniprocessor and multiprocessor workstations with large physical memories. It implements a set of kernel calls much like those of 4.3 BSD UNIX, with exte...

159.9   Software Strategies for Portable Computer Energy Management - Lorch, Smith (1998)
Limiting the energy consumption of computers, especially portables, is becoming increasingly important. Thus, new energy-saving computer components and architectures have been and continue to be devel...

159.4   Compiler-Based Prefetching for Recursive Data Structures - Luk (1996)
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed...

154.2   BSPlib: The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1998)
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates t...

150.7   Assigning Confidence to Conditional Branch Predictions - Jacobsen, Rotenberg, Smith (1996)
Many high performance processors predict conditional branches and consume processor resources based on the prediction. In some situations, resource allocation can be better optimized if a confidence l...

150.5   Improving IPC by Kernel Design - Liedtke (1993)
Inter-process communication (ipc) has to be fast and effective, otherwise programmers will not use remote procedure calls (RPC), multithreading and multitasking adequately. Thus ipc performance is vit...

148.9   Analyzing Stability in Wide-Area Network Performance - Balakrishnan, Seshan, Stemm, Katz (1997)
The Internet is a very large scale, complex, dynamical system that is hard to model and analyze. In this paper, we develop and analyze statistical models for the observed end-to-end network performanc...

148.0   Why Aren't Operating Systems Getting Faster As Fast As Hardware? - Ousterhout (1989)
This note evaluates several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file syste...

147.8   Improving Release-Consistent Shared Virtual Memory using Automatic.. - Iftode (1996)
Shared virtual memory is a software technique to provide shared memory on a network of computers without special hardware support. Although several relaxed consistency models and implementations are q...

145.4   An Efficient Implementation of Java's Remote Method Invocation - Maassen, van Nieuwpoort, Veldema.. (1999)
Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation provides an unusually flexible kind of Remote Procedure Call. Unlike RPC, RMI supports polymo...

145.4   Instruction Pre-Processing in Trace Processors - Jacobson, Smith (1999)
In trace processors, a sequential program is partitioned at run time into "traces." A trace is an encapsulation of a dynamic sequence of instructions. A processor that uses traces as the unit of sequ...

144.9   The Galley Parallel File System - Nieuwejaar, Kotz (1996)   (Correct)
As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file sy... / that is intended to deliver high performance to a variety of applications br has not been keeping pace. Hardware limitations are one reason for

144.6   High-Performance Local Area Communication With Fast Sockets - Rodrigues, Anderson, Culler (1997)   (Correct)
Modern switched networks such as ATM and Myrinet enable low-latency, high-bandwidth communication. This performance has not been realized by current applications, because of the high processing overhe... / High-Performance Local Area Communication With br to the ability of modern network hardware however. While TCP is capable

144.6   Software DSM Protocols that Adapt between Single Writer and Multiple.. - Cristiana Amza (1997)   (Correct)
We present two software DSM protocols that dynamically adapt between a single writer (SW) and a multiple writer (MW) protocol based on the application's sharing patterns. The first protocol (WFS) ad... / In Proceedings of the Second High Performance Computer Architecture br memory DSM on commodity hardware. Both single writer SW and

144.6   A High-performance Endsystem Architecture for Real-time CORBA - Schmidt, Gokhale, Harrison, Parulkar (1997)   (Correct)
Many application domains (such as avionics, telecommunications, and multimedia) require real-time guarantees from the underlying networks, operating systems, and middleware components to achieve their... / A High-performance Endsystem Architecture for br ATM and Fast Ethernet ffl Hardware such as RISC vs. CISC. The

143.2   SPIN - An Extensible Microkernel for Application-specific Operating.. - Bershad, Chambers, Eggers, Maeda.. (1994)   (Correct)
Application domains, such as multimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed int... / system services with high performance and high functionality. br system services with high performance and high functionality. Existing

143.2   A Sorting Classification of Parallel Rendering - Molnar (1994)   (Correct)
We describe three broad classes of parallel rendering methods, based on where the sort from object-space to screen space occurs. These classes encompass most feedforward parallel software and hardware... / designers and implementers of high-performance parallel rendering systems. br feedforward parallel software and hardware rendering architectures that

142.0   Efficient Support for Irregular Applications on Distributed-Memory.. - Mukherjee, Sharma, Hill, Larus.. (1995)   (Correct)
Irregular computation problems underlie many important scientific applications. Although these problems are computationally expensive, and so would seem appropriate for parallel machines, their irregu... / crucial issues for achieving high performance on distributed memory br alternative systems on the same hardware base a Thinking Machines CM-

140.7   A Case for NOW (Networks of Workstations) - Anderson, Culler, Patterson, team (1994)   (Correct)
In this paper, we argue that because of recent technology advances, networks of workstations (NOWs) are poised to become the primary computing infrastructure for science and engineering, from low en... / micro would take over high-performance computing Brooks Today br The xFS goal is high performance highly available network file

137.1   Distributed Packet Rewriting - and its Application to Scalable Server .. - Bestavros, Crovella, Liu, Martin (1998)   (Correct)
To construct high performance Web servers, system builders are increasingly turning to distributed designs. An important challenge that arises in such designs is the need to direct incoming connection... / Abstract To construct high performance Web servers system builders br requires using special-purpose hardware to distribute incoming HTTP

136.2   Optimizing Triangle Strips for Fast Rendering - Evans, Skiena, Varshney (1996)   (Correct)
Almost all scientific visualization involving surfaces is currently done via triangles. The speed at which such triangulated surfaces can be displayed is crucial to interactive visualization and is bo... / virtual reality. The speed of high-performance rendering engines on br vertex. Special-purpose rendering hardware is needed to fully exploit the

136.2   PPFS: A High Performance Portable Parallel File System - Huber, Jr., Elford, Reed, Chien.. (1995)   (Correct)
Rapid increases in processor performance over the past decade have outstripped performance improvements in input/output devices, increasing the importance of input/output performance to overall syste... / PPFS A High Performance Portable Parallel File System br on a variety of Intel Paragon XP S hardware configurations using the Intel

136.2   A Comparative Analysis of Schemes for Correlated Branch Prediction - Young, Gloy, Smith (1995)   (Correct)
Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch predi... / Abstract Modern high-performance architectures require br led to the development of both hardware and software schemes that achieve

133.8   IMPACT: An Architectural Framework for Multiple-Instruction-Issue.. - Chang, Mahlke, Chen, Warter, Hwu (1991)   (Correct)
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate efficient code for concurrent hardware. In the IMPACT project, we have developed ... / Hwu Center for Reliable and High-Performance Computing University of br efficient code for concurrent hardware. In the IMPACT project we have

130.8   Automatic Creation of an Autonomous Agent: Genetic Evolution of a.. - Floreano, Mondada (1994)   (Correct)
The paper describes the results of the evolutionary development of a real, neural-network driven mobile robot. The evolutionary approach to the development of neural controllers for autonomous agents ... / simulations are fast. High performance serial machines and massively br links or malfunctioning of some hardware components do not strongly

130.4   Lazy Release Consistency for Distributed Shared Memory - Keleher (1995)   (Correct)
A software distributed shared memory (DSM) system allows shared memory parallel programs to execute on networks of workstations. This thesis presents a new class of protocols that has lower communicat... / a viable alternative for high-performance parallel processing. br opportunities to bring high performance and high usability to a wide

130.4   An Argument for Simple COMA - Ashley Saulsbury (1995)   (Correct)
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication ca... / without the accompanying hardware complexity. A software layer br DVSM systems leaving simpler hardware to maintain shared memory

130.0   An Evaluation of Directory Schemes for Cache Coherence - Agarwal, et al. (1988)   (Correct)
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less attent... / cycle time especially in a high performance machine. Attempts to reduce br cache coherency support in hardware. These snoopy cache schemes also

127.8   Integrated PVM Framework Supports Heterogeneous Network Computing - Dongarra, Geist, Manchek, Sunderam (1993)   (Correct)
The Parallel Virtual Machine (PVM), an integrated framework for heterogeneous network computing, lets scientists exploit collections of networked machines when carrying out complex scientific computat... / components provides a coherent high-performance computing environment. In br has not kept pace with hardware advances. In order to fully

127.6   GLUnix: a Global Layer Unix for a Network of Workstations - Ghormley, Petrou, Rodrigues, Vahdat, .. (1997)   (Correct)
Recent improvements in network and workstation performance have made clusters an attractive architecture for diverse workloads, including sequential and parallel interactive applications. However, alt... / the availability of commodity high-performance workstations and networks br However although viable hardware solutions are available today

127.6   Converting Thread-Level Parallelism to Instruction-Level Parallelism.. - Lo, Eggers, Emer, Levy, Stamm.. (1997)   (Correct)
To achieve high performance, co... / To achieve high performance contemporary computer br insufficient ILP multiple-issue hardware on a superscalar is wasted. This

127.5   Scalable Computing - McColl (1996)   (Correct)
Scalable computing will, over the next few years, become the normal form of computing. In this paper we present a unified framework, based on the BSP model, which aims to serve as a foundation for t... / universal offering high performance in a predictable way on any br The two parts of that industry hardware and software are quite

124.6   Scheduling From the Perspective of the Application - Berman, Wolski (1996)   (Correct)
Metacomputing is the aggregation of distributed and high-performance resources on coordinated networks. With careful scheduling, resource-intensive applications can be implemented efficiently on metac... / of distributed and high-performance resources on coordinated br taking advantage of multiprocessor hardware features to execute multiple

124.6   ATM Internetworking - Alles (1995)   (Correct)
This paper was presented at Engineering InterOp, Las Vegas, March 1995. ... / of hardware intensive high performance ATM switches the deployment br do facilitate the development of hardware intensive high performance ATM

123.4   Multi-Protocol Active Messages on a Cluster of SMP's - Lumetta, Mainwaring, Culler (1997)   (Correct)
Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of interactions between the mult... / of the future but obtaining high performance on these architectures br and analyzes the effects of the hardware and software architectures on

123.4   Control Flow Speculation in Multiscalar Processors - Jacobson, et al. (1997)   (Correct)
The Multiscalar architecture executes a single sequential program following multiple flows of control. In the Multiscalar hardware, a global sequencer, with help from the compiler, takes large steps t... / International Symposium on High Performance Computer Architecture br of control. In the Multiscalar hardware a global sequencer with help

119.1   Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance.. - Frens, Wise (1997)   (Correct)
An elementary, machine-independent, recursive algorithm for matrix multiplication C+=A*B provides implicit blocking at every level of the memory hierarchy and tests out faster than classically optimal... / to run well on extant high-performance systems. . Outline of br the compiler's knowledge of the hardware parameters to fit the target

118.8   MGS: A Multigrain Shared Memory System - Yeung (1996)   (Correct)
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multipro... / communication interfaces high performance VLSI networks and br enables the collaboration of hardware and software shared memory and

118.5   Processor Allocation Policies for Message-Passing Parallel Computers - Mccann (1994)   (Correct)
When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. This... / the potential for achieving high performance scalability and br . The Hardware and Software Environment

116.0   Legion: The Next Logical Step Toward a Nationwide Virtual Computer - Grimshaw, Wulf, French, Weaver.. (1994)   (Correct)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer comprised of a variety of geographically distributed high-performance machines and workstations. ... / of geographically distributed high-performance machines and workstations. To br These are software problems the hardware challenges are being addressed

115.9   Synchronization and Communication in the T3E Multiprocessor - Scott (1996)   (Correct)
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. We discuss what we have learned from the T3D p... / programming model e.g. High Performance Fortran HPF or the br memories. Load store performance highlights the memory pipelining

115.9   A Survey of QoS Architectures - Aurrecoechea, Campbell, Hauw (1996)   (Correct)
Over the past several years there has been a considerable amount of research within the field of quality of service (QoS) support for distributed multimedia systems. To date, most of the work has been... / communication protocols for high performance in accordance with systems br protocols and the use of hardware assists for efficient protocol

115.9   Minimizing Register Requirements under Resource-Constrained.. - Govindarajan, Altman, Gao (1995)   (Correct)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedule... / The rapid advances in high-performance computer architecture and br in computer architecture -hardware and software technology -

115.9   Performance Analysis of Embedded Software Using Implicit Path.. - Li, Malik (1995)   (Correct)
Embedded computer systems are characterized by the presence of a processor running application specific dedicated software. A large number of these systems must satisfy real-time constraints. This pape... / software. For example in a high-performance engine controller design br selection of the partition between hardware and software as well as

115.5   Graphical Fisheye Views of Graphs - Sarkar, Brown (1992)   (Correct)
A fisheye camera lens is a very wide angle lens that magnifies nearby objects while shrinking distant objects. It is a valuable tool for seeing both "local detail" and "global context" simultaneously.... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to

114.8   BSPlib - The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1997)   (Correct)
This memory area is regarded as unregistered. 6. While registration is designed for "full duplex" communication, a process can do half duplex communication by, appropriately, registering an area of si... / be able to run unchanged with high performance on any general purpose br provide a clear focus for future hardware developments. For a model to

114.2   Executing Java Threads in Parallel in a Distributed-Memory Environment - MacBeth, McGuigan, Hatcher (1998)   (Correct)
We present the design and initial implementation of Hyperion, an environment for the high-performance execution of Java programs. Hyperion supports high performance by utilizing a Java-bytecode-to-C tr... / an environment for the high-performance execution of Java programs. br want the details of the parallel hardware to be hidden as much as possible.

114.2   Integrated Predicated and Speculative Execution in the IMPACT EPIC.. - August, Connors, Mahlke, Sias.. (1998)   (Correct)
Explicitly Parallel Instruction Computing (EPIC) architectures require the compiler to express program instruction level parallelism directly to the hardware. EPIC techniques which enable the compiler... / Hwu Center for Reliable and High-Performance Computing y br level parallelism directly to the hardware. EPIC techniques which enable the

114.2   The MOSIX Multicomputer Operating System for High Performance Cluster .. - Barak (1998)   (Correct)
The scalable computing cluster at Hebrew University consists of 64 Pentium and PentiumPro servers that are connected by fast Ethernet and the Myrinet LANs. It is running the MOSIX operating system, an... / Operating System for High Performance Cluster Computing br of affordable low-cost commodity hardware e.g. Pentium based Personal

112.0   A Practical System for Intermodule Code Optimization at Link-Time - Srivastava, Wall (1992)   (Correct)
We have developed a system called OM to explore the problem of code optimization at link-time. OM takes a collection of object modules constituting the entire program, and converts the object code int... / the design and application of high performance scientific computers. We br advance. Technologies both hardware and software do not all advance

111.1   On the Design of Chant: A Talking Threads Package - Matthew Haines (1994)   (Correct)
Lightweight threads are becoming increasingly useful in supporting parallelism and asynchronous control structures in applications and language implementations. However, lightweight thread packages tr... / support our extensions to the High Performance Fortran standard for br of a Unix process includes the hardware register kernel stack

110.1   Embra: Fast and Flexible Machine Simulation - Witchel, Rosenblum (1996)   (Correct)
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Em... / used by Embra to achieve high performance focusing on the br multiprocessors. Embra models the hardware of these machines in enough

108.8   Principles of Metareasoning - Russell, Wefald (1991)   (Correct)
In this paper we outline a general approach to the study of metareasoning, not in the sense of explicating the semantics of explicitly specified meta-level control policies, but in the sense of provid... / in applications demanding high performance and negligible response br Agents with Limited Performance Hardware project at Berkeley. We see

108.6   A High-Performance Microarchitecture with Hardware-Programmable.. - Razdan, Smith (1994)   (Correct)
This paper explores a novel way to incorporate hardware-programmable resources into a processor microarchitecture to improve the performance of general-purpose applications. Through a coupling of comp... / November A High-Performance Microarchitecture with br Microarchitecture with Hardware-Programmable Functional Units

107.2   Horus: A Flexible Group Communications System - van Renesse, Birman, Glade, Guo.. (1996)   (Correct)
The Horus system offers flexible group communication support for distributed applications. It is extensively layered and highly reconfigurable, allowing applications to only pay for services they use,... / novel mechanisms in support of high performance reliable group br has become popular it wraps a hardware group abstraction with a simple

107.2   The Microarchitecture of Superscalar Processors - Smith, Sohi (1995)   (Correct)
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of ... / method for implementing high performance microprocessors. . . The br Processing Model Because hardware and software evolve it is rare

106.3   Disco: Running Commodity Operating Systems on Scalable Multiprocessors - Bugnion, Devine, Govil, Rosenblum (1997)   (Correct)
... / multiprocessors to form a high performance system software base. Disco br these machines has often trailed hardware in reaching the functionality and

106.3   IP Switching and Gigabit Routers - Newman, Minshall, Lyon, Huston (1997)   (Correct)
This paper examines two approaches to the design of a high-performance router, the gigabit router and the IP switch, and then provides some detail on the implementation of an IP switch and the protoco... / approaches to the design of a high-performance router the gigabit router br of an ATM switch is that the hardware is standardized and is available

106.3   Ninf: A Network based Information Library for Global World-Wide.. - Sato (1997)   (Correct)
Ninf is an ongoing global network-wide computing infrastructure project which allows users to access computational resources including hardware, software and scientific data distributed across a wid... / intended not only to exploit high performance in network parallel br computational resources including hardware software and scientific data

103.7   Dynamic Memory Disambiguation Using the Memory Conflict Buffer - Gallagher (1994)   (Correct)
To exploit instruction level parallelism, compilers for VLIW and superscalar processors often employ static code scheduling. However, the available code reordering may be severely restricted due to am... / Hwu Center for Reliable and High-Performance Computing University of br This paper introduces a simple hardware mechanism referred to as the

103.0   MPI: A Message Passing Interface - Forum (1993)   (Correct)
This paper presents an overview of mpi, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of mpi has been a collective effort involving researc... / and organization of the High Performance Fortran Forum. Subcommittees br or in some cases provide hardware or low-level system support for

102.8   Faster IP Lookups using Controlled Prefix Expansion - Srinivasan, Varghese (1998)   (Correct)
Internet (IP) address lookup is a major bottleneck in high performance routers. IP address lookup is challenging because it requires a longest matching prefix lookup. It is compounded by increasing ro... / is a major bottleneck in high performance routers. IP address lookup br Model We will consider both hardware and software platforms for

102.1   The Chimaera Reconfigurable Functional Unit - Scott Hauck (1997)   (Correct)
By strictly separating reconfigurable logic from their host processor, current custom computing systems suffer from a significant communication bottleneck. In this paper we describe Chimaera, a system... / execution model key to high performance general-purpose br benefits to justify the hardware costs and extra complexities.

101.4   The Synergy Between Non-blocking Synchronization and Operating System .. - Greenwald, Cheriton (1996)   (Correct)
Non-blocking synchronization has significant advantages over blocking synchronization: however, it has not been used to a significant degree in practice. We designed and implemented a multiprocessor o... / and run-time library for high-performance reliability and modularity. br for our approach and a potential hardware implementation. Section

101.2   Metasystems: An Approach Combining Parallel Processing and.. - Grimshaw (1994)   (Correct)
A metasystem is a single computing resource composed of a heterogeneous group of autonomous computers linked together by a network. The interconnection network needed to construct large metasystems wi... / are not a serious obstacle to high performance but that load imbalance br and coercion and schedules all hardware resources across the different

101.2   An Object-Oriented Concurrent Reflective Language for Dynamic.. - Masuhara (1994)   (Correct)
This paper proposes an object-oriented concurrent reflective language (in IPSJ SIG Notes, 94-PRG-18, pp. 57-64, 1994) ... / DRM. One example is that HPF High Performance Fortran has directives for br to the application and or hardware architecture for efficient

101.0   Increasing the Instruction Fetch Rate via Multiple Branch Prediction.. - Yeh (1993)   (Correct)
High performance computer implementation today is increasingly directed toward parallelism in the hardware. Superscalar machines, where the hardware can issue more than one instruction each cycle, are... / Michigan Abstract High performance computer implementation today br directed toward parallelism in the hardware. Superscalar machines where the

100.0   An Overview of the Pablo Performance Analysis Environment - Reed, Aydt, Madhyastha, Noe.. (1992)   (Correct)
As massively parallel, distributed memory systems replace traditional vector supercomputers, effective application program optimization and system resource management become more than research curiosi... / based on the emerging High Performance Fortran HPF standard. br peak performance of the largest hardware configuration approaches

98.5   Hardware-Efficient Fair Queueing Architectures for High-Speed Networks - Rexford, Greenberg, Bonomi (1996)   (Correct)
In emerging communication networks, a single link may carry traffic for thousands of connections with different traffic parameters and quality-of-service requirements. High-speed links, coupled with s... / connection admissibility and performance high-speed links require simple br Hardware-Efficient Fair Queueing

98.5   Falcon: On-line Monitoring and Steering of Large-Scale Parallel.. - Gu (1995)   (Correct)
Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such interactive steering is to improve its performance or to affect its execution behavior. The... / Introduction The high performance of current parallel br basis. Falcon runs on several hardware platforms including the Kendall

97.8   The Design and Performance of a Real-time CORBA Event Service - Timothy Harrison (1997)   (Correct)
The CORBA Event Service provides a flexible model for asynchronous communication among objects. However, the standard CORBA Event Service specification lacks important features required by real-time a... / Endsystem Architecture for High-Performance Real-time CORBA. using a br the object is written in the OS hardware platform or the type of

97.1   Efficiently Adapting to Sharing Patterns in Software DSMs - Monnerat, Bianchini (1998)   (Correct)
In this paper we introduce a page-based Lazy Release Consistency protocol called ADSM that constantly and efficiently adapts to the applications' sharing patterns. Adaptation in ADSM is based on our d... / machine communicating via its high-performance network. For each br memory on top of message-passing hardware. These systems provide a

97.1   Variable Length Path Branch Prediction - Stark, et al. (1998)   (Correct)
... / is required to achieve high performance in deeply pipelined br rate of . given a K byte hardware budget. For comparison the

95.6   Filters: QoS Support Mechanisms for Multipeer Communications - Yeadon, García, Hutchison, Shepherd (1996)   (Correct)
The nature of distributed multimedia applications is such that they require multipeer communication support mechanisms. The multimedia traffic needs to be delivered to end-systems, networks and end-us... / full quality media playout at high-performance workstations while at the br of the expense in the specialised hardware required to implement them. As

95.6   Message-Passing Performance Of Various Computers - Dongarra, Dunigan (1996)   (Correct)
This report compares the performance of different computer systems for basic message passing. Latency and bandwidth are measured on Convex, Cray, IBM, Intel, KSR, Meiko, nCUBE, NEC, SGI, and TMC multi... / result is that the vendors of high-performance computing have turned to br processors are interconnected by hardware and software to attack various

93.8   Public International Benchmarks for Parallel Computers - Hockney, Berry (1994)   (Correct)
this report: David Bailey (NASA Ames Research Center), Michael Berry (University of Tennessee), Jack Dongarra (University of Tennessee/Oak Ridge National Laboratory), Vladimir Getov (University of So... / problems Chapter- and High Performance Fortran kernels to test the br . . Hardware Performance

93.6   SIMPLE: A Methodology for Programming High Performance Algorithms on.. - Bader, JaJa (1997)   (Correct)
We describe a methodology for developing high performance programs running on clusters of SMP nodes. Our methodology is based on a small kernel (SIMPLE ) of collective communication primitives that ma... / A Methodology for Programming High Performance Algorithms on Clusters of br smps Technology for Example Hardware The Support By Nsf Cise

93.6   Design and Implementation of Virtual Memory-Mapped Communication on.. - Dubnicki (1997)   (Correct)
This paper describes the design and implementation of the virtual memory-mapped communication model (VMMC) on a Myrinet network of PCI-based PCs. VMMC has been designed and implemented for the SHRIMP m... / Introduction Low cost and high performance are the potential advantages br limits imposed by the underlying hardware. The goal of this work is to

92.7   ASHs: Application-Specific Handlers for High-Performance Messaging - Wallach (1996)   (Correct)
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute... / Handlers for High-Performance Messaging Deborah A. br to provide applications with hardware-level network performance. ASHs

92.7   Server Operating Systems - Kaashoek, Engler, Ganger, Wallach (1996)   (Correct)
We introduce server operating systems, which are sets of abstractions and runtime support for specialized, high-performance server applications. We have designed and are implementing a prototype server... / support for specialized high-performance server applications. We have br and that can safely timeshare the hardware platform with other applications.

92.7   A Wireless Broadband Ad-Hoc ATM Local-Area Network - Eng, Karol, Veeraraghavan, Ayanoglu, .. (1995)   (Correct)
this paper, the exact method by which the look-up table is generated is not important. In this section we are interested in the updates to the routing tables at each PBS in the ... / are designed for simplicity high performance and modular implementations. br connections in the network. PBS hardware and software architectures are

91.4   Compiling for the Multiscalar Architecture - Vijaykumar (1998)   (Correct)
High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world... / i Abstract High-performance general-purpose br by speculation and verification in hardware. Since this thesis is the

91.4   SvPablo: A Multi-Language Performance Analysis System - De Rose, Zhang, Reed (1998)   (Correct)
In this paper we present the design of SvPablo, a language independent performance analysis and visualization system that can be easily extended to new contexts with minimal changes to the software in... / applications that achieve high performance on a current parallel and br SvPablo also exploits hardware performance counters to capture

91.3   Parallel Performance Prediction Using Lost Cycles Analysis - Crovella, LeBlanc (1994)   (Correct)
Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is... / Research Assistantship in High Performance Computing administered by the br e.g.load imbalance and hardware e.g.resource contention A

91.1   High Time-Resolution Measurement and Analysis of LAN Traffic.. - Leland, Wilson (1991)   (Correct)
The interconnection of local area networks is increasingly important, but little data are available on the characteristics of the aggregate traffic that LANs will be submitting to the interconnection ... / SBC we are able to dedicate a high performance processor to servicing the br We present a high time-resolution hardware monitor for Ethernet LANs that

90.9   The Design and Performance of a Pluggable Protocols Framework for.. - Kuhns, O'Ryan, Schmidt, Parsons (1999)   (Correct)
To be an effective platform for performance-sensitive real-time and embedded applications, off-the-shelf OO middleware like CORBA, DCOM, and Java RMI must preserve communication-layer quality of servic... / we describe how TAO our high-performance real-time CORBA-compliant br protocols and interconnects or hardware However the general lack of

89.3   Compressionless Routing: A Framework for Adaptive and Fault-tolerant.. - Kim, Liu, Chien (1997)   (Correct)
Compressionless Routing (CR) is a new adaptive routing framework which provides a unified framework for efficient deadlock-free adaptive routing and fault-tolerance. CR exploits the tight coupling betw... / the results. Background High performance routing networks the subject br These results show that the hardware for CR and FCR networks is

89.3   PM: An Operating System Coordinated High Performance Communication.. - Hiroshi Tezuka (1997)   (Correct)
We have developed a new communication library, called PM, for the Myrinet gigabit LAN card, that has a dedicated processor and on-board memory to handle a communication protocol. In order to obtain ... / Operating System Coordinated High Performance Communication Library br directly accesses the network hardware to eliminate kernel traps and

89.0   The Duality of Memory and Communication in the Implementation of a.. - Young, Tevanian, Rashid, Golub.. (1987)   (Correct)
Mach is a multiprocessor operating system being implemented at Carnegie-Mellon University. An important component of the Mach design is the use of memory objects which can be managed either by the ker... / in Accent with extremely high performance through its use of br surviving the introduction of new hardware architectures and was never able

88.8   On Multicast Wormhole Routing in Multicomputer Networks - Boppana, Chalasani, Raghavendra (1994)   (Correct)
We show that deadlocks due to dependencies on consumption channels is a fundamental problem in multicast wormhole routing. This issue of deadlocks has not been addressed in many previously proposed ... / is important for achieving high performance in parallel computers. The br multicomputers with minimal hardware support. We present a simulation

86.9   Should Scalable Parallel Computers Support Efficient Hardware.. - Ni (1995)   (Correct)
Multicast communication is a frequently invoked communication pattern in many parallel algorithms. Although some parallel computer vendors have tried to directly support multicast in hardware, most ve... / multicast. ffl HPF High Performance Fortran In a highlevel br Computers Support Efficient Hardware Multicast Lionel M. Ni

85.7   Design Techniques for Low Power Systems - Havinga, Smit (2000)   (Correct)
Portable products are being used increasingly. Because these systems are battery powered, reducing power consumption is vital. In this report we give the properties of low power design and techniques ... / current trend is to focus on high performance processors as this is the br concentrate on dedicated low-power hardware and software architectures. A

85.7   Eliminating Conflict Misses for High Performance Architectures - Rivera, Tseng (1998)   (Correct)
Many cache misses in scientific programs are due to conflicts caused by limited set associativity. Two data-layout transformations, inter- and intra-variable padding, can eliminate many conflict misse... / Conflict Misses for High Performance Architectures Gabriel br on modern microprocessors. Due to hardware constraints caches have limited

85.7   Power and Performance Tradeoffs using Various Caching Strategies - Bahar, Albera, Manne (1998)   (Correct)
In this paper, we propose several different data and instruction cache configurations and analyze their power as well as performance implications on the processor. Unlike most existing work in low pow... / design we explore a high performance processor with the latest br which make use of these aggressive hardware-based techniques. In particular

85.1   Java for Parallel Computing and as a General Language for Scientific.. - Fox (1997)   (Correct)
We discuss the role of Java and Web technologies for general simulation. We classify the classes of concurrency typical in problems and analyze separately the role of Java in user interfaces, coarse g... / of the pyramid and the few high-performance systems as the top Figure br The distributed computing hardware of the Web has remarkable

84.0   Replication Using Group Communication Over a Partitioned Network - Amir (1995)   (Correct)
In systems based on the client-server model, a single server may serve many clients and the heavy load on the server may cause the response time to be adversely affected. In such circumstances, replic... / necessarily consistent reply. High performance of the architecture is br the available non-reliable hardware multicast for efficient

84.0   Reduced Overhead Logging for Rollback Recovery in Distributed Shared.. - Suri (1995)   (Correct)
Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems ca... / Center for Reliable and High-Performance Computing Mountain br paging mechanism or in hardware using directory-based cache

83.9   Parallel Simulation Today - Nicol, Fujimoto (1994)   (Correct)
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallel... / and ready availability of high-performance multiprocessors. The number br analysis time parallelism hardware support for parallel simulation

83.9   PASSION: Parallel And Scalable Software for Input-Output - Choudhary, Bordawekar, Harry.. (1994)   (Correct)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output which provides software support for high performance parallel I/O. PASSION provides support at the la... / provides software support for high performance parallel I/O. PASSION br nCUBE etc. provide some kind of hardware and software support for parallel

83.9   Separating Data and Control Transfer in Distributed Operating Systems - Thekkath, Levy, Lazowska (1994)   (Correct)
Advances in processor architecture and technology have resulted in workstations in the 100+ MIPS range. As well, newer local-area networks such as ATM promise a ten- to hundred-fold increase in throug... / transfer of that byte. Even in high-performance RPC systems control transfer br of distributed systems at the hardware level and that distributed

83.9   Branch Classification: a New Mechanism for Improving Branch Predictor .. - Chang (1994)   (Correct)
There is wide agreement that one of the most important impediments to the performance of current and future pipelined superscalar processors is the presence of conditional branches in the instruction ... / algorithm is important to a high-performance microprocessor. If we br hard-to-predict branches or the hardware can special case the handling of

83.8   Amoeba - A Distributed Operating System for the 1990s - Mullender, van Rossum, Tanenbaum.. (1990)   (Correct)
Amoeba is the distributed system developed at the Free University (VU) and Centre for Mathematics and Computer Science (CWI), both in Amsterdam. Throughout the project's ten-year history, a major conc... / with simplicity and high performance. Distributed systems are br systems on its class of hardware reported so far in the

82.4   A Language-Based Approach To Protocol Implementation - Abbott (1993)   (Correct)
Chapter 1: Introduction. 1.1 Introduction to Network Software... / protocol layering entails a high performance cost developers are br as data programs and specialized hardware. Communicating data between

81.8   Efficient Support for P-HTTP in Cluster-Based Web Servers - Aron, Druschel, Zwaenepoel (1999)   (Correct)
This paper studies mechanisms and policies for supporting HTTP/1.1 persistent connections in cluster-based Web servers that employ content-based request distribution. We present two mechanisms for the ... / platform for cost-effective high performance network servers. Achieving br becoming an increasingly popular hardware platform for cost-effective high

81.8   Supporting Fine-Grained Synchronization on a Simultaneous.. - Dean Tullsen (1999)   (Correct)
This paper proposes and evaluates new synchronization schemes for a simultaneous multithreaded processor. We present a scalable mechanism that permits threads to cheaply synchronize within the process... / th International Symposium on High Performance Computer Architecture br should be High Performance. High performance implies both

81.4   Compiler-directed Data Prefetching in Multiprocessors with Memory.. - Edward Gornish (1990)   (Correct)
Memory hierarchies are used by multiprocessor systems to reduce large memory access times. It is necessary to automatically manage such a hierarchy, to obtain effective memory utilization. In this pap... / networks MINs To achieve high performance in a hierarchical memory br of caches. In addition without hardware prefetching of cache lines no

81.4   Supercomputer Performance Evaluation and the Perfect Benchmarks - Cybenko (1990)   (Correct)
In the past three years, the Perfect Benchmark™ Suite has evolved from a supercomputer performance evaluation plan, presented by Kuck and Sameh at the 1987 International Conference on Supercomputi... / benchmarking to high performance workstations An br in large part to increases in hardware speed averaging an order of

81.1   Architecture Validation for Processors - Ho (1995)   (Correct)
Modern, high performance microprocessors are extremely complex machines which require substantial validation effort to ensure functional correctness prior to tapeout. Generating the corner cases to te... / . Abstract Modern high performance microprocessors are extremely br through simulation often using hardware-assist Gat to reduce

80.8   Applying Compiler Techniques to Cache Behavior Prediction - Ferdinand, Martin, Wilhelm (1997)   (Correct)
In previous work [1], we have developed the theoretical basis for the prediction of the cache behavior of programs by abstract interpretation. Abstract interpretation is a technique for the static ana... / more relevant and used for high performance microcontrollers and DSPs. br tailored for the analysis of hardware with states is presented. This

79.0   Paging Tradeoffs in Distributed-Shared-Memory Multiprocessors - Douglas Burger (1994)   (Correct)
Massively parallel processors have begun using commodity operating systems that support demand-paged virtual memory. To evaluate the utility of virtual memory, we measured the behavior of seven shar... / is a ubiquitous feature of high-performance workstations but has been br A DSM machine model Our target hardware system contains processing

78.3   A Novel Framework of Register Allocation for Software Pipelining - Ning, Gao (1993)   (Correct)
ing with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications D... / pipelining can be applied to high-performance pipelined processor br schemes with or without special hardware support are discussed. We have

78.2   A High-Performance, Portable Implementation of the MPI Message.. - Gropp (1996)   (Correct)
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, an... / A High-Performance Portable Implementation of br being followed the current hardware and software environment for

78.2   Stage Scheduling: A Technique to Reduce the Register Requirements of.. - Eichenberger, Davidson (1995)   (Correct)
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a set... / of loops resulting in high performance code but increased register br be eliminated by using special hardware such as rotating register files

78.2   Optimizing Instruction Cache Performance for Operating System.. - Torrellas, Xia, Daigle (1995)   (Correct)
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layo... / cache hit rates are key to high performance. One known technique to br Firstly with the help of a hardware performance monitor we

77.9   Using Profile Information to Assist Classic Code Optimizations - Chang (1991)   (Correct)
This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new compo... / Hwu Center for Reliable and High-performance Computing University of br time performs as well as the best hardware schemes Trace scheduling

77.8   Plan 9 from Bell Labs - Pike (1990)   (Correct)
Plan 9 is a distributed computing environment. It is assembled from separate machines acting as CPU servers, file servers, and terminals. The pieces are connected by a single file-oriented protocol an... / high-speed networks and in high-performance microprocessors. A common br adapt well to changes in computing hardware. In particular we wanted to

76.5   Quantitative Analysis and Model Checking - Huth (1997)   (Correct)
Many notions of models in computer science provide quantitative information, or uncertainties, which necessitate a quantitative model checking paradigm. We present such a framework for reactive and ge... / are utilized to achieve high performance at a cost of obtaining br verification especially in hardware design. Model checking as such

76.5   An Architecture for Optimal All-to-All Personalized Communication - Hinrichs, Kosak, O'Hallaron.. (1994)   (Correct)
In all-to-all personalized communication (AAPC), every node of a parallel system sends a potentially unique packet to every other node. AAPC is an important primitive operation for modern parallel com... / of data parallel compilers for High Performance Fortran Hig include br utilizing all links. A simple hardware addition for synchronized

76.5   DPGA-Coupled Microprocessors: Commodity ICs for the Early 21st Century - André DeHon (1994)   (Correct)
During the past decade the microprocessor has become a key commodity component for building all kinds of computational systems. During this time frame large, reconfigurable logic arrays have exploited... / microprocessors. Today's high-performance microprocessors sport - br to specialize the processing hardware to match the application

CiteSeer - citeseer.org - Copyright © 1997-2002 NEC Research Institute