Top: Hardware: High Performance

Ordered by the number of citations

This directory is created automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.

534   Active Messages: a Mechanism for Integrated Communication and.. - von Eicken, Culler, Goldstein.. (1992)
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) allow communication to overlap computation, and (3) coordinate the two without sacrificing processor... / T communicate A high-performance network is required to br allows cost effective use of the hardware and offers tremendous

496   PVM: A Framework for Parallel Distributed Computing - Sunderam (1990)
The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components. It ... / high-bandwidth external I O or high performance graphics thereby br environments already possess the hardware diversity required to solve such

325   TreadMarks: Distributed Shared Memory on Standard Workstations and.. - Keleher, Cox, Dwarkadas, Zwaenepoel (1994)
TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultrix using DECstation... / is the bottleneck in achieving high performance for finer grained br workstation base no special hardware is required to use this facility

259   Efficient Software-Based Fault Isolation - Wahbe, Lucco, Anderson, Graham (1993)
One way to provide fault isolation among cooperating software modules is to place each in its own address space. However, for tightly-coupled modules, this solution incurs prohibitive context switch o... / Unfortunately there is a high performance cost to providing fault br poses a tradeoff relative to hardware fault isolation substantially

223   High Performance Messaging on Workstations: Illinois Fast Messages.. - Pakin, Lauria, Chien (1995)
... / High Performance Messaging on Workstations br layers are needed to deliver the hardware performance to the application

220   Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer - Blumrich, Alpert, Dubnicki, Felten (1993)
The network interfaces of existing multicomputers require a significant amount of software overhead at the operating system and user levels to provide protection and to implement message passing proto... / to construct scalable high-performance multicomputers. Our focus is br to these software overheads hardware communication latencies are

219   The NAS Parallel Benchmarks - Bailey, Barszcz, Barton, Browning.. (1994)
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of five "parallel kernel" benchmarks and three "simulated applicat... / community by the year a high-performance operational computing system br not kept pace with advances in hardware software and algorithms. In

184   Weak Ordering - A New Definition - Adve (1990)
A memory model for a shared memory, multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to... / that weak ordering facilitates high performance implementations but that br in terms of a set of rules for hardware that have to be made visible to

184   Compiler Transformations for High-Performance Computing - Bacon (1993)
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by t... / Compiler Transformations for High-Performance Computing DAVID F. BACON br organizations. Simultaneously hardware designers are able to employ

183   The Existence of Refinement Mappings - Abadi, Lamport (1988)
Refinement mappings are used to prove that a lower-level specification correctly implements a higher-level one. We consider specifications consisting of a state machine (which may be infinite-state) t... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to

181   The Amber System: Parallel Programming on a Network of Multiprocessors - Chase (1989)
Microprocessor-based shared-memory multiprocessors are becoming widely available and promise to provide cost-effective high-performance computing. This paper describes a programming system called Ambe... / to provide cost-effective high-performance computing. This paper br in which coherence is provided by hardware means for locally-executing

165   Multiscalar Processors - Sohi (1995)
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is di... / in the program. To achieve high performance however modern processors br by a combination of software and hardware. The tasks are distributed to a

162   Performance of Various Computers Using Standard Linear Equations.. - Dongarra (1995)
This report compares the performance of different computer systems in solving dense systems of linear equations. The comparison involves approximately a hundred computers, ranging from a Cray Y-MP to ... / special features. Thus many high-performance machines may not have br as new machines are added and as hardware and software systems improve.

161   The MIT Alewife Machine: A Large-Scale Distributed-Memory.. - Agarwal, Chaiken, Johnson, Kranz.. (1991)
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low dimension direct interconnection network to provide scalable communi... / processor. Introduction High-performance computer design is driven by br and concentrates on the novel hardware features of the machine including

161   The Sprite Network Operating System - Ousterhout, Cherenson, Douglis.. (1988)
Sprite is a new operating system for networked uniprocessor and multiprocessor workstations with large physical memories. It implements a set of kernel calls much like those of 4.3 BSD UNIX, with exte... / machines which provide high performance even for diskless br workstation with special hardware support for Lisp applications

148   Why Aren't Operating Systems Getting Faster As Fast As Hardware? - Ousterhout (1989)
This note evaluates several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file syste... / the design and application of high performance scientific computers. We br Getting Faster As Fast As Hardware John Ousterhout

145   The Network Architecture of the Connection Machine CM-5 - Leiserson, Abuhamdeh, Douglas.. (1994)
The Connection Machine Model CM-5 Supercomputer is a massively parallel computer system designed to offer performance in the range of 1 teraflops (10^12 floating-point operations per second). The CM... / second The CM- obtains its high performance while offering ease of br back-door access to all system hardware to test system integrity and to

133   Totem: A Fault-Tolerant Multicast Group Communication System - Moser, Melliar-Smith, Agarwal.. (1996)
When Totem delivers multicast messages, it invokes operations in the same total order throughout the distributed system. The result: consistency of replicated data and simplified programming of applic... / systems use inexpensive high-performance computers and can be br networks LANs and exploits the hardware broadcasts of such networks to

131   Shoring Up Persistent Applications - Carey, DeWitt, Franklin, Hall.. (1994)
SHORE (Scalable Heterogeneous Object REpository) is a persistent object system under development at the University of Wisconsin. SHORE represents a merger of object-oriented database and file system te... / systems or on the kinds of high-performance multicomputer hardware br of high-performance multicomputer hardware needed for certain large scale

130   An Evaluation of Directory Schemes for Cache Coherence - Agarwal, al. (1988)
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less attent... / cycle time especially in a high performance machine. Attempts to reduce br cache coherency support in hardware. These snoopy cache schemes also

130   PROTEUS: A High-Performance Parallel-Architecture Simulator - Brewer, Dellarocas, Colbrook, Weihl (1991)
Proteus is a high-performance simulator for MIMD multiprocessors. It is fast, accurate, and flexible: it is one to two orders of magnitude faster than comparable simulators, it can reproduce results f... / PROTEUS A High-Performance Parallel-Architecture br is zero. Proteus can simulate hardware cache coherence for global

123   The Nexus Approach to Integrating Multithreading and Communication - Foster (1996)
Lightweight threads have an important role to play in parallel systems: they can be used to exploit shared-memory parallelism, to mask communication and I/O latencies, to implement remote memory acces... / threads and communication in high-performance distributed-memory systems. br handlers At the lower-performance higher-functionality end of the

118   The Zebra Striped Network File System - Hartman, Ousterhout (1993)
Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a ... / file system. This provides high performance for writes of small files as br Sprite file system on the same hardware. For small files the Zebra

118   The Paradyn Parallel Performance Measurement Tools - Miller, Callaghan (1995)
Paradyn is a performance measurement tool for parallel and distributed programs. Paradyn uses several novel technologies so that it scales to long running programs (hours or days) and large (thousand ... / an ARPA Graduate Fellowship in High Performance Computing. br to accept new operating system hardware and application specific

118   Zebra: A Striped Network File System - Hartman, Ousterhout (1993)
This paper presents the design of Zebra, a striped network file system. Zebra applies ideas from log-structured file system (LFS) and RAID research to network file systems, resulting in a network file... / designed to provide both high performance and high availability. This br to provide both high performance and high availability. This is

118   A Metaobject Protocol for C++ - Chiba (1995)
This paper presents a metaobject protocol (MOP) for C++. This MOP was designed to bring the power of meta-programming to C++ programmers. It avoids penalties on runtime performance by adopting a new m... / criteria of such a MOP are high performance and arbitrary br runtime. If this is not done in hardware the software will need to be

115   PVM: Parallel Virtual Machine - Geist, Beguelin, Dongarra, Jiang.. (1994)
this reporting is to be turned on (1) or turned off (0) for subsequent calls. A value of (2) will cause the program to exit after printing the error message (not implemented in 3.2). The default is re... / J. Petrie Jr. The High Performance Fortran Handbook by br fast pace of change in computer hardware software and algorithms often

115   NetSolve: A Network Server for Solving Computational Science Problems - Casanova (1995)
This paper presents a new system, called NetSolve, that allows users to access computational resources, such as hardware and software, distributed across the network. The development of NetSolve was m... / based Information Library for high performance computing Ninf project br computational resources such as hardware and software distributed across

113   High Speed Switch Scheduling for Local Area Networks - Anderson, Owicki, Saxe, Thacker (1993)
Current technology trends make it possible to build communication networks that can support high performance distributed computing. This paper describes issues in the design of a prototype switch for ... / networks that can support high performance distributed computing. This br switch architectures use the same hardware for both scheduling and data

109   Exploiting Choice: Instruction Fetch and Issue on an Implementable.. - Tullsen, Eggers, Emer, Levy, Lo.. (1996)
Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneo... / architecture is derived from a high-performance out-of-order superscalar br wide-issue superscalar either in hardware structures or sizes. We present

109   Shared Memory Consistency Models: A Tutorial - Adve, Gharachorloo (1995)
Parallel systems that support the shared memory abstraction are becoming widely accepted in many areas of computing. Writing correct and efficient programs for such systems requires a formal specifica... / the design and application of high performance scientific computers. We br advance. Technologies both hardware and software do not all advance

104   A Unified Formalization of Four Shared-Memory Models - Adve (1993)
This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering, release consistency (with sequentially consistent special operations), the VAX memory mode... / can be guaranteed with high performance. However each model br caches common uniprocessor hardware optimizations such as write

102   Zeus: A System for Algorithm Animation and Multi-View Editing - Brown (1992)
Algorithm animation is a form of program visualization that is concerned with dynamic and interactive graphical displays of a program's fundamental operations. This paper describes the Zeus algorithm ... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to

101   Fine-grain Access Control for Distributed Shared Memory - Schoinas (1994)
This paper discusses implementations of fine-grain memory access control, which selectively restricts reads and writes to cache-block-sized memory regions. Fine-grain access control forms the basis of... / shared-memory machines achieve high performance by using hardware-intensive br require little or no additional hardware. These techniques permit

98   Alternative Implementations of Two-Level Adaptive Branch Prediction - Yeh, Patt (1992)
As the issue rate and depth of pipelining of high performance Superscalar processors increase, the importance of an excellent branch predictor becomes more vital to delivering the potential performanc... / and depth of pipelining of high performance Superscalar processors br gathered. We compute the hardware costs of implementing each of the

95   Beowulf: A Parallel Workstation For Scientific Computation - Sterling, Becker, al. (1995)
Network-of-Workstations technology is applied to the challenge of implementing very high performance workstations for Earth and space science applications. The Beowulf parallel workstation employs 16 ... / challenge of implementing very high performance workstations for Earth and br tracks the evolution of commodity hardware as well as new ports of Linux to

91   IMPACT: An Architectural Framework for Multiple-Instruction-Issue.. - Chang, Mahlke, Chen, Warter, Hwu (1991)
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate efficient code for concurrent hardware. In the IMPACT project, we have developed ... / Hwu Center for Reliable and High-Performance Computing University of br efficient code for concurrent hardware. In the IMPACT project we have

89   The Duality of Memory and Communication in the Implementation of a.. - Young, Tevanian, Rashid, Golub.. (1987)
Mach is a multiprocessor operating system being implemented at Carnegie-Mellon University. An important component of the Mach design is the use of memory objects which can be managed either by the ker... / in Accent with extremely high performance through its use of br surviving the introduction of new hardware architectures and was never able

87   Complexity-Effective Superscalar Processors - Palacharla (1997)
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wake... / microarchitecture achieves high performance as measured by instructions br The performance tradeoff between hardware complexity and clock speed is

84   Application Performance and Flexibility on Exokernel Systems - Kaashoek, Engler, Ganger.. (1997)
The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exo... / applications to achieve high performance without sacrificing the br software efficient control over hardware and software resources by

79   Extensibility, Safety and Performance in the SPIN Operating System - Bershad, Savage, Pardyak, Sirer.. (1995)
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure together with a core set of extensible services th... / by the need to support high performance applications which present br rather than runtime using either hardware or software mechanisms. Strict

78   Software Write Detection for a Distributed Shared Memory - Zekauskas (1994)
Most software-based distributed shared memory (DSM) systems rely on the operating system's virtual memory interface to detect writes to shared data. Strategies based on virtual memory page protection ... / Software System Support for High Performance Multicomputing contract br shared memory but do not rely on hardware page protection such as Orca

74   Principles of Metareasoning - Russell, Wefald (1991)
In this paper we outline a general approach to the study of metareasoning, not in the sense of explicating the semantics of explicitly specified meta-level control policies, but in the sense of provid... / in applications demanding high performance and negligible response br Agents with Limited Performance Hardware project at Berkeley. We see

73   Improving IPC by Kernel Design - Liedtke (1993)
Inter-process communication (ipc) has to be fast and effective, otherwise programmers will not use remote procedure calls (RPC), multithreading and multitasking adequately. Thus ipc performance is vit... / trick to obtaining this high performance rather a synergetic br to -Kbyte messages Although hardware specific details influence both

71   Unifying Data and Control Transformations for Distributed Shared.. - Cierniak (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / to be a serious obstacle to high performance on distributed shared memory br Most shared-memory machines both hardware and software based rely on data

71   Unifying Data and Control Transformations for Distributed.. - Cierniak, Li (1994)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / to be a serious obstacle to high performance on distributed shared memory br Most shared-memory machines both hardware and software based rely on data

71   Programmable Active Memories: Reconfigurable Systems Come of Age - Vuillemin, Bertin, Roncin, Shand.. (1996)
Programmable Active Memories (PAM) are a novel form of universal reconfigurable hardware co-processor. Based on Field-Programmable Gate Array (FPGA) technology, a PAM is a virtual machine, controlled ... / The proposal is a standard high-performance microprocessor enhanced by a br form of universal reconfigurable hardware co-processor. Based on

70   Amoeba - A Distributed Operating System for the 1990s - Mullender, van Rossum, Tanenbaum.. (1990)
Amoeba is the distributed system developed at the Free University (VU) and Centre for Mathematics and Computer Science (CWI), both in Amsterdam. Throughout the project's ten-year history, a major conc... / with simplicity and high performance. Distributed systems are br systems on its class of hardware reported so far in the

68   Supercomputer Performance Evaluation and the Perfect Benchmarks - Cybenko (1990)
In the past three years, the Perfect Benchmark™ Suite has evolved from a supercomputer performance evaluation plan, presented by Kuck and Sameh at the 1987 International Conference on Supercomputi... / benchmarking to high performance workstations An br in large part to increases in hardware speed averaging an order of

68   Compiler-directed Data Prefetching in Multiprocessors with Memory.. - Edward Gornish (1990)
Memory hierarchies are used by multiprocessor systems to reduce large memory access times. It is necessary to automatically manage such a hierarchy, to obtain effective memory utilization. In this pap... / networks MINs To achieve high performance in a hierarchical memory br of caches. In addition without hardware prefetching of cache lines no

67   Npsnet: A Network Software Architecture For Large Scale Virtual.. - Macedonia, Zyda, Pratt, Barham.. (1994)
This paper explores the issues involved in designing and developing network software architectures for large scale virtual environments. We present our ideas in the context of NPSNET-IV, the first 3D ... / and for the development of a high performance network software interface. br environment construction. hardware and operating system

67   Graphical Fisheye Views of Graphs - Sarkar, Brown (1992)
A fisheye camera lens is a very wide angle lens that magnifies nearby objects while shrinking distant objects. It is a valuable tool for seeing both "local detail" and "global context" simultaneously.... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to

65   A Practical System for Intermodule Code Optimization at Link-Time - Srivastava, Wall (1992)
We have developed a system called OM to explore the problem of code optimization at link-time. OM takes a collection of object modules constituting the entire program, and converts the object code int... / the design and application of high performance scientific computers. We br advance. Technologies both hardware and software do not all advance

65   Plan 9 from Bell Labs - Pike (1990)
Plan 9 is a distributed computing environment. It is assembled from separate machines acting as CPU servers, file servers, and terminals. The pieces are connected by a single file-oriented protocol an... / high-speed networks and in high-performance microprocessors. A common br adapt well to changes in computing hardware. In particular we wanted to

62   Integrated PVM Framework Supports Heterogeneous Network Computing - Dongarra, Geist, Manchek, Sunderam (1993)
The Parallel Virtual Machine (PVM), an integrated framework for heterogeneous network computing, lets scientists exploit collections of networked machines when carrying out complex scientific computat... / components provides a coherent high-performance computing environment. In br has not kept pace with hardware advances. In order to fully

62   High Time-Resolution Measurement and Analysis of LAN Traffic.. - Leland, Wilson (1991)
The interconnection of local area networks is increasingly important, but little data are available on the characteristics of the aggregate traffic that LANs will be submitting to the interconnection ... / SBC we are able to dedicate a high performance processor to servicing the br We present a high time-resolution hardware monitor for Ethernet LANs that

59   Threads and Input/Output in the Synthesis Kernel - Massalin, Pu (1995)
The Synthesis operating system kernel combines several techniques to provide high performance, including kernel code synthesis, fine-grain scheduling, and optimistic synchronization. Kernel code synth... / several techniques to provide high performance including kernel code br system implementations. Using hardware and software emulating a SUN

58   SPIN - An Extensible Microkernel for Application-specific Operating.. - Bershad, Chambers, Eggers, Maeda.. (1994)
Application domains such as multimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed inte... / system services with high performance and high functionality. br system services with high performance and high functionality. Existing

58   A Sorting Classification of Parallel Rendering - Molnar (1994)
We describe three broad classes of parallel rendering methods, based on where the sort from object-space to screen space occurs. These classes encompass most feedforward parallel software and hardware... / designers and implementers of high-performance parallel rendering systems. br feedforward parallel software and hardware rendering architectures that

58   Implementing Multiple Protection Domains in Java - Hawblitzel, Chang, Czajkowski, Hu.. (1998)
Safe language technology can be used for protection within a single address space. This protection is enforced by the language's type system, which ensures that references to objects cannot be forged... / language technology to offer high performance as well as protection in a br components without relying on hardware support. In a safe language

58   An Overview of the Pablo Performance Analysis Environment - Reed, Aydt, Madhyastha, Noe.. (1992)
As massively parallel, distributed memory systems replace traditional vector supercomputers, effective application program optimization and system resource management become more than research curiosi... / based on the emerging High Performance Fortran HPF standard. br peak performance of the largest hardware configuration approaches

57   The Relative Importance of Concurrent Writers and Weak Consistency.. - Peter Keleher (1996)
This paper presents a detailed comparison of the relative importance of allowing concurrent writers versus the choice of the underlying consistency model. Our comparison is based on single- and multip... / memory DSM systems achieve high performance through a combination of br to overall performance. Hardware shared memory systems typically

57   A Case for NOW (Networks of Workstations) - Anderson, Culler, Patterson, team (1994)
In this paper, we argue that because of recent technology advances, networks of workstations (NOWs) are poised to become the primary computing infrastructure for science and engineering, from low en... / micro would take over high-performance computing Brooks Today br The xFS goal is high performance highly available network file

55   Measured Capacity of an Ethernet: Myths and Reality - Boggs, Mogul, Kent (1988)
Ethernet, a 10 Mbit/sec CSMA/CD network, is one of the most successful LAN technologies. Considerable confusion exists as to the actual capacity of an Ethernet, especially since some theoretical studi... / the design and application of high performance scientific computers. We br advance. Technologies both hardware and software do not all advance

55   Compiler-Based Prefetching for Recursive Data Structures - Luk (1996)
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed... / memory subsystem and today's high-performance processors. While br can be controlled either by hardware or software. Hardware-based

53   Using Profile Information to Assist Classic Code Optimizations - Chang (1991)
This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new compo... / Hwu Center for Reliable and High-performance Computing University of br time performs as well as the best hardware schemes Trace scheduling

53   Automatic Creation of an Autonomous Agent: Genetic Evolution of a.. - Floreano, Mondada (1994)
The paper describes the results of the evolutionary development of a real, neural-network driven mobile robot. The evolutionary approach to the development of neural controllers for autonomous agents ... / simulations are fast. High performance serial machines and massively br links or malfunctioning of some hardware components do not strongly

52   Assigning Confidence to Conditional Branch Predictions - Jacobsen, Rotenberg, Smith (1996)
Many high performance processors predict conditional branches and consume processor resources based on the prediction. In some situations, resource allocation can be better optimized if a confidence l... / Abstract Many high performance processors predict br such optimizations we consider hardware mechanisms that partition

51   Improving Release-Consistent Shared Virtual Memory using Automatic.. - Iftode (1996)
Shared virtual memory is a software technique to provide shared memory on a network of computers without special hardware support. Although several relaxed consistency models and implementations are q... / nd International Symposium on High-Performance Computer Architecture br of computers without special hardware support. Although several

50   MPI: A Message Passing Interface - Forum (1993)
This paper presents an overview of mpi, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of mpi has been a collective effort involving researc... / and organization of the High Performance Fortran Forum. Subcommittees br or in some cases provide hardware or low-level system support for

50   The Galley Parallel File System - Nieuwejaar, Kotz (1996)
As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file sy... / that is intended to deliver high performance to a variety of applications br has not been keeping pace. Hardware limitations are one reason for

49   Efficient Support for Irregular Applications on Distributed-Memory.. - Mukherjee, Sharma, Hill, Larus.. (1995)   (Correct)
Irregular computation problems underlie many important scientific applications. Although these problems are computationally expensive, and so would seem appropriate for parallel machines, their irregu... / crucial issues for achieving high performance on distributed memory br alternative systems on the same hardware base a Thinking Machines CM-

49   Applications-Driven Parallel I/O - Galbreath, Gropp, Levine   (Correct)
We investigate the needs of some massively parallel applications running on distributed-memory parallel computers at Argonne National Laboratory and identify some common parallel I/O operations. For t... / a file be accessed either as a high-performance parallel file or as a br the algorithms software and hardware involved must be efficient and

49   Increasing the Instruction Fetch Rate via Multiple Branch Prediction.. - Yeh (1993)   (Correct)
High performance computer implementation today is increasingly directed toward parallelism in the hardware. Superscalar machines, where the hardware can issue more than one instruction each cycle, are... / Michigan Abstract High performance computer implementation today br directed toward parallelism in the hardware. Superscalar machines where the

49   High-Performance Parallel Programming in Java: Exploiting Native.. - Getov (1998)   (Correct)
With most of today's fast scientific software written in Fortran and C, Java has a lot of catching up to do. In this paper we discuss how new Java programs can capitalize on high-performance libraries... / High-Performance Parallel Programming in Java br implementations on a range of hardware architectures. ScaLAPACK is

48   The Design of Nectar: A Network Backplane for Heterogeneous.. - Arnould (1989)   (Correct)
Nectar is a "network backplane" for use in heterogeneous multicomputers. The initial system consists of a star-shaped fiber-optic network with an aggregate bandwidth of 1.6 gigabits/second and a switch... / and specialized high-performance machines. It is often not br for Nectar and describes its hardware and software. The presentation

48   Processor Allocation Policies for Message-Passing Parallel Computers - McCann (1994)   (Correct)
When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. This... / the potential for achieving high performance scalability and br . The Hardware and Software Environment

47   Legion: The Next Logical Step Toward a Nationwide Virtual Computer - Grimshaw, Wulf, French, Weaver.. (1994)   (Correct)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer comprised of a variety of geographically distributed high-performance machines and workstations. ... / of geographically distributed high-performance machines and workstations. To br These are software problems the hardware challenges are being addressed

47   A Lock-Free Multiprocessor OS Kernel - Massalin, Pu (1991)   (Correct)
Typical shared-memory multiprocessor OS kernels use interlocking, implemented as spinlocks or waiting semaphores. We have implemented a complete multiprocessor OS kernel (including threads, virtual me... / an OS kernel achieves very high performance. The remaining of this paper br A. Hardware Measurement Tools

47   Optimizing Triangle Strips for Fast Rendering - Evans, Skiena, Varshney (1996)   (Correct)
Almost all scientific visualization involving surfaces is currently done via triangles. The speed at which such triangulated surfaces can be displayed is crucial to interactive visualization and is bo... / virtual reality. The speed of high-performance rendering engines on br vertex. Special-purpose rendering hardware is needed to fully exploit the

47   A Comparative Analysis of Schemes for Correlated Branch Prediction - Young, Gloy, Smith (1995)   (Correct)
Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch predi... / Abstract Modern high-performance architectures require br led to the development of both hardware and software schemes that achieve

47   PPFS: A High Performance Portable Parallel File System - Huber, Jr., Elford, Reed, Chien.. (1995)   (Correct)
Rapid increases in processor performance over the past decade have outstripped performance improvements in input/output devices, increasing the importance of input/output performance to overall syste... / PPFS A High Performance Portable Parallel File System br on a variety of Intel Paragon XP S hardware configurations using the Intel

45   An Argument for Simple COMA - Ashley Saulsbury (1995)   (Correct)
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication ca... / without the accompanying hardware complexity. A software layer br DVSM systems leaving simpler hardware to maintain shared memory

45   On the Design of Chant: A Talking Threads Package - Matthew Haines (1994)   (Correct)
Lightweight threads are becoming increasingly useful in supporting parallelism and asynchronous control structures in applications and language implementations. However, lightweight thread packages tr... / support our extensions to the High Performance Fortran standard for br of a Unix process includes the hardware register kernel stack

45   Lazy Release Consistency for Distributed Shared Memory - Keleher (1995)   (Correct)
A software distributed shared memory (DSM) system allows shared memory parallel programs to execute on networks of workstations. This thesis presents a new class of protocols that has lower communicat... / a viable alternative for high-performance parallel processing. br opportunities to bring high performance and high usability to a wide

45   Can Logic Programming Execute as Fast as Imperative Programming?.. - Van Roy (1990)   (Correct)
Bibliographic references of "Can Logic Programming Execute as Fast as Imperative Programming?", Van Roy unknown 170 79. P. Voda, Trilogy version 1.0, Complete Logic Systems, Inc, September 1987. 80. ... / . . T. P. Dobry A High Performance Architecture for Prolog br . H. Nakashima and K. Nakajima Hardware Architecture of the Sequential

44   A High-Performance Microarchitecture with Hardware-Programmable.. - Razdan, Smith (1994)   (Correct)
This paper explores a novel way to incorporate hardware-programmable resources into a processor microarchitecture to improve the performance of general-purpose applications. Through a coupling of comp... / November A High-Performance Microarchitecture with br Microarchitecture with Hardware-Programmable Functional Units

44   Scalable Performance Environments for Parallel Systems - Reed (1991)   (Correct)
As parallel systems expand in size and complexity, the absence of performance tools for these parallel systems exacerbates the already difficult problems of application program and system software per... / As the class of high-performance computer systems extends br analysis levels including hardware system software and

44   Automatic Blocking of Nested Loops - Schreiber, Dongarra (1990)   (Correct)
Blocked algorithms have much better properties of data locality and therefore can be much more efficient than ordinary algorithms when a memory hierarchy is Supported by the NAS Systems Division an... / The concomitant fact in high-performance computing especially br The memory may be managed by hardware on a demand basis cache or

44   Scalable Computing - McColl (1996)   (Correct)
Scalable computing will, over the next few years, become the normal form of computing. In this paper we present a unified framework, based on the BSP model, which aims to serve as a foundation for t... / universal offering high performance in a predictable way on any br The two parts of that industry hardware and software are quite

43   Real-Time Occlusion Culling for Models with Large Occluders - Coorg (1997)   (Correct)
Efficiently identifying polygons that are visible from a dynamic synthetic viewpoint is an important problem in computer graphics. Typically, visibility determination is performed using the z-buffer a... / Despite the availability of high performance z-buffer hardware a br of high performance z-buffer hardware a significant fraction of

43   Pipeline Gating: Speculation Control For Energy Reduction - Manne (1998)   (Correct)
Branch prediction has enabled microprocessors to increase instruction level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries. Although speculative execution is... / performance reduces power in high-performance microprocessors without br In particular we introduce a hardware mechanism called pipeline gating

43   Scheduling From the Perspective of the Application - Berman, Wolski (1996)   (Correct)
Metacomputing is the aggregation of distributed and high-performance resources on coordinated networks. With careful scheduling, resource-intensive applications can be implemented efficiently on metac... / of distributed and high-performance resources on coordinated br taking advantage of multiprocessor hardware features to execute multiple

43   ATM Internetworking - Alles (1995)   (Correct)
This paper was presented at Engineering InterOp, Las Vegas, March 1995... / of hardware intensive high performance ATM switches the deployment br do facilitate the development of hardware intensive high performance ATM

43   Software Support for Speculative Loads - Rogers (1992)   (Correct)
This paper describes a simple hardware mechanism and related compiler support for software-controlled speculative loads. The compiler issues speculative load instructions based on anticipated data re... / to hide memory latency in high-performance processors. The architectural br This paper describes a simple hardware mechanism and related compiler

43   Maximizing Parallelism and Minimizing Synchronization with Affine.. - Lim, Lam (1998)   (Correct)
This paper presents an algorithm to find the optimal affine partitions that maximize the degree of parallelism and minimize the degree of synchronization in programs with arbitrary loop nestings and a... / suggests that achieving high performance on such machines is br needed to exploit a particular hardware configuration. The algorithm

42   MPI-FM: High Performance MPI on Workstation Clusters - Lauria, Chien (1997)   (Correct)
Despite the emergence of high speed LANs, the communication performance available to applications on workstation clusters still falls short of that available on MPPs. A new generation of efficient mes... / MPI-FM High Performance MPI on Workstation Clusters br is needed to take advantage of the hardware performance and to deliver it to

42   Dynamic Memory Disambiguation Using the Memory Conflict Buffer - Gallagher (1994)   (Correct)
To exploit instruction level parallelism, compilers for VLIW and superscalar processors often employ static code scheduling. However, the available code reordering may be severely restricted due to am... / Hwu Center for Reliable and High-Performance Computing University of br This paper introduces a simple hardware mechanism referred to as the

41   MGS: A Multigrain Shared Memory System - Yeung (1996)   (Correct)
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multipro... / communication interfaces high performance VLSI networks and br enables the collaboration of hardware and software shared memory and

41   Metasystems: An Approach Combining Parallel Processing and.. - Grimshaw (1994)   (Correct)
A metasystem is a single computing resource composed of a heterogeneous group of autonomous computers linked together by a network. The interconnection network needed to construct large metasystems wi... / are not a serious obstacle to high performance but that load imbalance br and coercion and schedules all hardware resources across the different

41   An Object-Oriented Concurrent Reflective Language for Dynamic.. - Masuhara (1994)   (Correct)
This paper proposes an object-oriented concurrent reflective language (in IPSJ SIG Notes, 94-PRG-18, pp. 57-64, 1994)... / DRM. One example is that HPF High Performance Fortran has directives for br to the application and or hardware architecture for efficient

41   Beowulf: Harnessing the Power of Parallelism in a Pile-of-PCs - Ridge (1997)   (Correct)
The rapid increase in performance of mass market commodity microprocessors and significant disparity in pricing between PCs and scientific workstations has provided an opportunity for substantial gain... / Thomas Sterling High Performance Computing Systems Group Jet br using standard commodity hardware and software components. This

41   Maximizing Parallelism and Minimizing Synchronization with Affine.. - Lim, Lam (1997)   (Correct)
This paper presents the first algorithm to find the optimal affine transform that maximizes the degree of parallelism while minimizing the degree of synchronization in a program with arbitrary loop ne... / parallel code. Getting high performance on a multiprocessor requires br to exploit a particular parallel hardware configuration. From these affine

41   The Design of the TAO Real-Time Object Request Broker - Schmidt, Levine, Mungee (1999)   (Correct)
Many real-time application domains can benefit from flexible and open distributed architectures, such as those defined by the CORBA specification. CORBA is an architecture for distributed object compu... / design of TAO which is our high-performance real-time CORBA-compliant br backplanes and shared memory. Hardware CORBA shields applications from

40   A Survey of QoS Architectures - Aurrecoechea, Campbell, Hauw (1996)   (Correct)
Over the past several years there has been a considerable amount of research within the field of quality of service (QoS) support for distributed multimedia systems. To date, most of the work has been... / communication protocols for high performance in accordance with systems br protocols and the use of hardware assists for efficient protocol

40   Synchronization and Communication in the T3E Multiprocessor - Scott (1996)   (Correct)
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. We discuss what we have learned from the T3D p... / programming model e.g. High Performance Fortran HPF or the br memories. Load store performance highlights the memory pipelining

40   The MultiSpace: an Evolutionary Platform for Infrastructural Services - Gribble, Welsh, Brewer, Culler (1999)   (Correct)
This paper presents the architecture for a Base, a clustered environment for building and executing highly available, scalable, but flexible and adaptable infrastructure services. Our architecture has t... / sound it leads to robust and high-performance services. However the br set operating system and hardware platform. Examples of such

40   Performance Analysis of Embedded Software Using Implicit Path.. - Li, Malik (1995)   (Correct)
Embedded computer systems are characterized by the presence of a processor running application-specific dedicated software. A large number of these systems must satisfy real-time constraints. This pape... / software. For example in a high-performance engine controller design br selection of the partition between hardware and software as well as

40   Run-time Adaptive Cache Hierarchy Management via Reference Analysis - Johnson, Hwu (1997)   (Correct)
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and ... / Hwu Center for Reliable and High-Performance Computing University of br scheme where the hardware determines data placement based

40   Minimizing Register Requirements under Resource-Constrained.. - Govindarajan, Altman, Gao (1995)   (Correct)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedule... / The rapid advances in high-performance computer architecture and br in computer architecture -hardware and software technology -

40   A Language-Based Approach To Protocol Implementation - Abbott (1993)   (Correct)
Chapter 1: Introduction. 1.1 Introduction to Network Software... / protocol layering entails a high performance cost developers are br as data programs and specialized hardware. Communicating data between

39   Color and Sound in Algorithm Animation - Brown (1991)   (Correct)
Although systems for animating algorithms are becoming more powerful and easier for programmers to use, not enough attention has been given to the techniques that an algorithm animator needs to create... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to

38   Embra: Fast and Flexible Machine Simulation - Witchel, Rosenblum (1996)   (Correct)
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Em... / used by Embra to achieve high performance focusing on the br multiprocessors. Embra models the hardware of these machines in enough

38   The Desk Area Network - Hayter (1991)   (Correct)
A novel architecture for use within an end computing system is described. This attempts to extend the concepts used in modern high speed networks into computer system design. A multimedia workstation ... / and memory systems in high performance multiprocessor machines br gain further understanding of the hardware and software architecture of such

38   Public International Benchmarks for Parallel Computers - Hockney, Berry (1994)   (Correct)
this report: David Bailey (NASA Ames Research Center), Michael Berry (University of Tennessee), Jack Dongarra (University of Tennessee/Oak Ridge National Laboratory), Vladimir Getov (University of So... / problems Chapter- and High Performance Fortran kernels to test the br . . Hardware Performance

38   Architectural Support for Single Address Space Operating Systems - Koldinger, Chase, Eggers (1992)   (Correct)
Recent microprocessor announcements show a trend toward wide-address computers: architectures that support 64 bits of virtual address space. Such architectures facilitate fundamentally new operating s... / This simplifies the use of high-performance virtually indexed data br protection lookaside buffer a hardware structure that implements this

38   Processor Coupling: Integrating Compile Time and Runtime Scheduling.. - Keckler (1992)   (Correct)
The technology to implement a single-chip node composed of 4 high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple... / node composed of high-performance floating-point ALUs will be br Trace system At runtime the hardware scheduling mechanism interleaves

38   A Novel Framework of Register Allocation for Software Pipelining - Ning, et al. (1993)   (Correct)
/ pipelining can be applied to high-performance pipelined processor br schemes with or without special hardware support are discussed. We have

38   Speculative Versioning Cache - Gopal (1998)   (Correct)
Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences ca... / International Symposium on High-Performance Computer Architecture. br instructions from a common set of hardware buffers e.g. reservation

37   Parallel Performance Prediction Using Lost Cycles Analysis - Crovella, LeBlanc (1994)   (Correct)
Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is... / Research Assistantship in High Performance Computing administered by the br e.g. load imbalance and hardware e.g. resource contention A

37   The Microarchitecture of Superscalar Processors - Smith, Sohi (1995)   (Correct)
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of ... / method for implementing high performance microprocessors. . . The br Processing Model Because hardware and software evolve it is rare

37   Horus: A Flexible Group Communications System - van Renesse, Birman, Glade, Guo.. (1996)   (Correct)
The Horus system offers flexible group communication support for distributed applications. It is extensively layered and highly reconfigurable, allowing applications to only pay for services they use,... / novel mechanisms in support of high performance reliable group br has become popular it wraps a hardware group abstraction with a simple

37   Data Access Microarchitectures for Superscalar Processors with.. - Chen (1991)   (Correct)
The performance of superscalar processors is more sensitive to the memory system delay than their single-issue predecessors. This paper examines alternative data access microarchitectures that effecti... / Hwu Center for Reliable and High-Performance Computing University of br of a separate prefetch buffer. Hardware issues concerning both

37   The Virtual Windtunnel: An Environment for the Exploration of.. - Bryson, Levit (1991)   (Correct)
We describe a recently completed implementation of a virtual environment for exploring numerically-generated three-dimensional unsteady flowfields. A boom-mounted six degree of freedom head-position-s... / are all feasible using modern high performance graphics workstations and or br the implementation first the hardware and then the software. We review

36   On Multicast Wormhole Routing in Multicomputer Networks - Boppana, Chalasani, Raghavendra (1994)   (Correct)
We show that deadlocks due to dependencies on consumption channels are a fundamental problem in multicast wormhole routing. This issue of deadlocks has not been addressed in many previously proposed ... / is important for achieving high performance in parallel computers. The br multicomputers with minimal hardware support. We present a simulation

36   A Hardware / Software Codesign Methodology for DSP Applications - Kalavade, Lee (1993)   (Correct)
Embedded systems typically require a mix of hardware and software components. To design these systems, tools should support simultaneous specification, synthesis, and simulation of the software and ha... / simple yet they demand high performance and throughput. Furthermore br A Hardware Software Codesign Methodology

36   Sentinel Scheduling for VLIW and Superscalar Processors - Mahlke (1992)   (Correct)
Speculative execution is an important source of parallelism for VLIW and superscalar processors. A serious challenge with compiler-controlled speculative execution is to accurately detect and report a... / Hwu Center for Reliable and High-Performance Computing University of br overcome by providing sufficient hardware storage to buffer results until

35   Analyzing Stability in Wide-Area Network Performance - Balakrishnan, Seshan, Stemm, Katz (1997)   (Correct)
The Internet is a very large scale, complex, dynamical system that is hard to model and analyze. In this paper, we develop and analyze statistical models for the observed end-to-end network performanc... / and software used at this high-performance server are available from br of the Web site's network and the hardware used at the site. During the

35   The Synergy Between Non-blocking Synchronization and Operating System .. - Greenwald, Cheriton (1996)   (Correct)
Non-blocking synchronization has significant advantages over blocking synchronization: however, it has not been used to a significant degree in practice. We designed and implemented a multiprocessor o... / and run-time library for high-performance reliability and modularity. br for our approach and a potential hardware implementation. Section

34   A High-performance Endsystem Architecture for Real-time CORBA - Douglas Schmidt (1997)   (Correct)
Many application domains (such as avionics, telecommunications, and multimedia) require real-time guarantees from the underlying networks, operating systems, and middleware components to achieve their... / A High-performance Endsystem Architecture for br ATM and Fast Ethernet Hardware such as RISC vs. CISC. The

34   Software DSM Protocols that Adapt between Single Writer and Multiple.. - Cristiana Amza (1997)   (Correct)
We present two software DSM protocols that dynamically adapt between a single writer (SW) and a multiple writer (MW) protocol based on the application's sharing patterns. The first protocol (WFS) ad... / In Proceedings of the Second High Performance Computer Architecture br memory DSM on commodity hardware. Both single writer SW and

34   Falcon: On-line Monitoring and Steering of Large-Scale Parallel.. - Gu (1995)   (Correct)
Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such interactive steering is to improve its performance or to affect its execution behavior. The... / Introduction The high performance of current parallel br basis. Falcon runs on several hardware platforms including the Kendall

34   Separating Data and Control Transfer in Distributed Operating Systems - Thekkath, Levy, Lazowska (1994)   (Correct)
Advances in processor architecture and technology have resulted in workstations in the 100+ MIPS range. As well, newer local-area networks such as ATM promise a ten- to hundred-fold increase in throug... / transfer of that byte. Even in high-performance RPC systems control transfer br of distributed systems at the hardware level and that distributed

34   High-Performance Local Area Communication With Fast Sockets - Rodrigues, Anderson, Culler (1997)   (Correct)
Modern switched networks such as ATM and Myrinet enable low-latency, high-bandwidth communication. This performance has not been realized by current applications, because of the high processing overhe... / High-Performance Local Area Communication With br to the ability of modern network hardware however. While TCP is capable

34   PASSION: Parallel And Scalable Software for Input-Output - Choudhary, Bordawekar, Harry.. (1994)   (Correct)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output which provides software support for high performance parallel I/O. PASSION provides support at the la... / provides software support for high performance parallel I O. PASSION br nCUBE etc. provide some kind of hardware and software support for parallel

34   Parallel Simulation Today - Nicol, Fujimoto (1994)   (Correct)
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallel... / and ready availability of high-performance multiprocessors. The number br analysis time parallelism hardware support for parallel simulation

34   Hardware-Efficient Fair Queueing Architectures for High-Speed Networks - Rexford, Greenberg, Bonomi (1996)   (Correct)
In emerging communication networks, a single link may carry traffic for thousands of connections with different traffic parameters and quality-of-service requirements. High-speed links, coupled with s... / connection admissibility and performance high-speed links require simple br Hardware-Efficient Fair Queueing

34   Branch Classification: a New Mechanism for Improving Branch Predictor .. - Chang (1994)   (Correct)
There is wide agreement that one of the most important impediments to the performance of current and future pipelined superscalar processors is the presence of conditional branches in the instruction ... / algorithm is important to a high-performance microprocessor. If we br hard-to-predict branches or the hardware can special case the handling of

33   Message-Passing Performance of Various Computers - Dongarra, Dunigan (1995)   (Correct)
This report compares the performance of different computer systems for basic message-passing. Latency and bandwidth are measured on Convex, Cray, IBM, Intel, KSR, Meiko, nCUBE, NEC, SGI, and TMC multi... / Laboratory The vendors of high-performance computing have turned to RISC br processors are interconnected by hardware and software to attack various

33   Data Transformations for Eliminating Conflict Misses - Rivera, Tseng (1998)   (Correct)
Many cache misses in scientific programs are due to conflicts caused by limited set associativity. We examine two compile-time data-layout transformations for eliminating conflict misses, concentratin... / speeds programs can achieve high performance only if they use caches br use caches effectively. Due to hardware constraints caches have limited

33   Filters: QoS Support Mechanisms for Multipeer Communications - Yeadon, García, Hutchison, Shepherd (1996)   (Correct)
The nature of distributed multimedia applications is such that they require multipeer communication support mechanisms. The multimedia traffic needs to be delivered to end-systems, networks and end-us... / full quality media playout at high-performance workstations while at the br of the expense in the specialised hardware required to implement them. As

32   Home-based SVM protocols for SMP clusters: Design and Performance - Samanta (1998)   (Correct)
As small-scale shared memory multiprocessors proliferate in the market, it is very attractive to construct large-scale systems by connecting smaller multiprocessors together in software using efficient... / In The nd IEEE Symposium on High-Performance Computer Architecture Feb. br advantage of the intra-node hardware cache coherence and

32   Server Operating Systems - Kaashoek, Engler, Ganger, Wallach (1996)   (Correct)
We introduce server operating systems, which are sets of abstractions and runtime support for specialized, high-performance server applications. We have designed and are implementing a prototype server... / support for specialized high-performance server applications. We have br and that can safely timeshare the hardware platform with other applications.

32   ASHs: Application-Specific Handlers for High-Performance Messaging - Wallach (1996)   (Correct)
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute... / Handlers for High-Performance Messaging Deborah A. br to provide applications with hardware-level network performance. ASHs

32   High-Performance Schedulers - Berman   (Correct)
Introduction 1 Scheduling -- the assignment of work to resources within a specified timeframe. 0.1 Introduction The computational grid will provide a platform for a new generation of applications. Gr... / High-Performance Schedulers Francine Berman br Both the software and hardware resources of the underlying

32   A Wireless Broadband Ad-Hoc ATM Local-Area Network - Eng, Karol, Veeraraghavan, Ayanoglu, .. (1995)   (Correct)
this paper, the exact method by which the look-up table is generated is not important. In this section we are interested in the updates to the routing tables at each PBS in the K. Y. Eng et al. / A ... / are designed for simplicity high performance and modular implementations. br connections in the network. PBS hardware and software architectures are

32   Paging Tradeoffs in Distributed-Shared-Memory Multiprocessors - Douglas Burger (1994)   (Correct)
Massively parallel processors have begun using commodity operating systems that support demand-paged virtual memory. To evaluate the utility of virtual memory, we measured the behavior of seven shar... / is a ubiquitous feature of high-performance workstations but has been br A DSM machine model Our target hardware system contains processing

32   Bridge: A High-Performance File System for Parallel Processors - Dibble (1988)   (Correct)
Faster storage devices cannot solve the I/O bottleneck problem for large multiprocessor systems if data passes through a file system on a single processor. Implementing the file system as a parallel p... / Bridge A High-Performance File System for Parallel br the art in parallel storage device hardware can deliver effectively

31   An Architecture for Optimal All-to-All Personalized Communication - Hinrichs, Kosak, O'Hallaron.. (1994)   (Correct)
In all-to-all personalized communication (AAPC), every node of a parallel system sends a potentially unique packet to every other node. AAPC is an important primitive operation for modern parallel com... / of data parallel compilers for High Performance Fortran Hig include br utilizing all links. A simple hardware addition for synchronized

31   DPGA-Coupled Microprocessors: Commodity ICs for the Early 21st Century - André DeHon (1994)   (Correct)
During the past decade the microprocessor has become a key commodity component for building all kinds of computational systems. During this time frame large, reconfigurable logic arrays have exploited... / microprocessors. Today's high-performance microprocessors sport - br to specialize the processing hardware to match the application

31   Virtual Network Transport Protocols for Myrinet - Chun (1998)   (Correct)
This paper describes a protocol for a general-purpose cluster communication system that supports multiprogramming with virtual networks, direct and protected network access, reliable message delivery ... / these systems achieved high-performance oftentimes on par with br processor and interconnection hardware. This sections presents a brief

31   Evolving the UNIX System Interface to Support Multithreaded Programs - McJones (1987)   (Correct)
Multiple threads (program counters executing in the same address space) make it easier to write programs that deal with related asynchronous activities and that execute faster on shared-memory multipr... / work includes exploring high-performance personal computing br Our approach to both hardware and software research is to

30   Converting Thread-Level Parallelism to Instruction-Level Parallelism.. - Lo, Eggers, Emer, Levy, Stamm.. (1997)   (Correct)
To achieve high performance, co... / To achieve high performance contemporary computer br insufficient ILP multiple-issue hardware on a superscalar is wasted. This

30   Should Scalable Parallel Computers Support Efficient Hardware.. - Ni (1995)   (Correct)
Multicast communication is a frequently invoked communication pattern in many parallel algorithms. Although some parallel computer vendors have tried to directly support multicast in hardware, most ve... / multicast. HPF High Performance Fortran In a high-level br Computers Support Efficient Hardware Multicast Lionel M. Ni

30   GLUnix: a Global Layer Unix for a Network of Workstations - Ghormley, Petrou, Rodrigues, Vahdat, .. (1997)   (Correct)
Recent improvements in network and workstation performance have made clusters an attractive architecture for diverse workloads, including sequential and parallel interactive applications. However, alt... / the availability of commodity high-performance workstations and networks br However although viable hardware solutions are available today

30   The Peregrine High-Performance RPC System - Johnson (1993)   (Correct)
This paper identifies some of the key performance optimizations used in Peregrine, and quantitatively assesses their benefits. Keywords: Peregrine, remote procedure call, interprocess communication, p... / The Peregrine High-Performance RPC System David B. br to the optimum allowed by the hardware limits while still supporting

29   Reduced Overhead Logging for Rollback Recovery in Distributed Shared.. - Suri (1995)   (Correct)
Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems ca... / Center for Reliable and High-Performance Computing Mountain br paging mechanism or in hardware using directory-based cache

29   Replication Using Group Communication Over a Partitioned Network - Amir (1995)   (Correct)
In systems based on the client-server model, a single server may serve many clients and the heavy load on the server may cause the response time to be adversely affected. In such circumstances, replic... / necessarily consistent reply. High performance of the architecture is br the available non-reliable hardware multicast for efficient

29   The Bird-Meertens Formalism as a Parallel Model - Skillicorn (1993)   (Correct)
The expense of developing and maintaining software is the major obstacle to the routine use of parallel computation. Architecture independent programming offers a way of avoiding the problem, but the ... / typically much lower than a high performance uniprocessor. The difficulty br surely does not lie with parallel hardware whose performance follows a

29   Multi-Protocol Active Messages on a Cluster of SMP's - Lumetta, Mainwaring, Culler (1997)   (Correct)
Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of interactions between the mult... / of the future but obtaining high performance on these architectures br and analyzes the effects of the hardware and software architectures on

29   Control Flow Speculation in Multiscalar Processors - Jacobson, al. (1997)   (Correct)
The Multiscalar architecture executes a single sequential program following multiple flows of control. In the Multiscalar hardware, a global sequencer, with help from the compiler, takes large steps t... / International Symposium on High Performance Computer Architecture br of control. In the Multiscalar hardware a global sequencer with help

28   Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance.. - Frens, Wise (1997)   (Correct)
An elementary, machine-independent, recursive algorithm for matrix multiplication C+=A*B provides implicit blocking at every level of the memory hierarchy and tests out faster than classically optimal... / to run well on extant high-performance systems. . Outline of br the compiler's knowledge of the hardware parameters to fit the target

28   Texture Mapping as a Fundamental Drawing Primitive - Haeberli, Segal (1993)   (Correct)
Texture mapping has traditionally been used to add realism to computer graphics images. In recent years, this technique has moved from the domain of software rendering systems to that of high performa... / rendering systems to that of high performance graphics hardware. But br that of high performance graphics hardware. But texture mapping hardware

28   ADAPTIVE: A Dynamically Assembled Protocol Transformation.. - Schmidt, Box, Suda (1993)   (Correct)
Computer communication systems must undergo significant changes to keep pace with the increasingly demanding and diverse multimedia applications that will run on the next generation of high-performance... / run on the next generation of high-performance networks. To facilitate these br and process management and hardware de- tures that support

28   Fast Interrupt Priority Management in Operating System Kernels - Stodolsky (1993)   (Correct)
In this paper we describe a new, low-overhead technique for manipulating processor interrupt state in an operating system kernel. Both uniprocessor and multiprocessor operating systems protect against... / and motivate the need for a high performance mechanism. In Section we br maps well onto a diverse array of hardware from systems with a single

28   Relaxing Consistency in Recoverable Distributed Shared Memory - Janssens (1993)   (Correct)
Relaxed memory consistency models tolerate increased memory access latency in both hardware and software distributed shared memory systems. In recoverable systems, relaxing consistency has the added b... / Center for Reliable and High-Performance Computing Coordinated br memory access latency in both hardware and software distributed shared

28   Software Strategies for Portable Computer Energy Management - Lorch, Smith (1998)   (Correct)
Limiting the energy consumption of computers, especially portables, is becoming increasingly important. Thus, new energy-saving computer components and architectures have been and continue to be devel... / features have both high performance and low power modes with br created by existing and suggested hardware innovations. Introduction

28   A More Efficient RMI for Java - Nester, Philippsen, Haumacher (1999)   (Correct)
In current Java implementations, Remote Method Invocation (RMI) is too slow, especially for high performance computing. RMI is designed for wide-area and high-latency networks, it is based on a slow o... / is too slow especially for high performance computing. RMI is designed br used over non-TCP IP networking hardware. Section discusses the

28   Architecture Validation for Processors - Ho (1995)   (Correct)
Modern, high performance microprocessors are extremely complex machines which require substantial validation effort to ensure functional correctness prior to tapeout. Generating the corner cases to te... / . Abstract Modern high performance microprocessors are extremely br through simulation often using hardware-assist Gat to reduce

27   Optimizing Instruction Cache Performance for Operating System.. - Josep Torrellas (1995)   (Correct)
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layo... / cache hit rates are key to high performance. One known technique to br Firstly with the help of a hardware performance monitor we

27   Exploiting In-Kernel Data Paths to Improve I/O Throughput and CPU.. - Fall (1993)   (Correct)
We present the motivation, design, implementation, and performance evaluation of a UNIX kernel mechanism capable of establishing fast in-kernel data pathways between I/O objects. A new system call, sp... / for most scientific and high performance computing platforms today br Introduction Improved computer hardware has enabled the development of

27   BSPlib - The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1997)   (Correct)
This memory area is regarded as unregistered. 6. While registration is designed for "full duplex" communication, a process can do half duplex communication by, appropriately, registering an area of si... / be able to run unchanged with high performance on any general purpose br provide a clear focus for future hardware developments. For a model to

27   Stage Scheduling: A Technique to Reduce the Register Requirements of.. - Eichenberger, Davidson (1995)   (Correct)
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a set... / of loops resulting in high performance code but increased register br be eliminated by using special hardware such as rotating register files

27   A High-Performance, Portable Implementation of the MPI Message.. - Gropp (1996)   (Correct)
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, an... / A High-Performance Portable Implementation of br being followed the current hardware and software environment for

27   Reverse If-Conversion - Warter, Mahlke, Hwu, Rau (1993)   (Correct)
In this paper we present a set of isomorphic control transformations that allow the compiler to apply local scheduling techniques to acyclic subgraphs of the control flow graph. Thus, the code motion ... / Hwu Center for Reliable and High-Performance Computing University of br instruction level parallelism hardware need a large pool of operations

27   BSPlib: The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1998)   (Correct)
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates t... / be able to run unchanged with high performance on any general purpose br provide a clear focus for future hardware developments. For a model to

26   Beating the I/O Bottleneck: A Case for Log-Structured File Systems - Ousterhout (1988)   (Correct)
CPU speeds are improving at a dramatic rate, while disk speeds are not. This technology shift suggests that many engineering and office applications may become so I/O-limited that they cannot benefit ... / these and other approaches to high-performance I O. With luck an br file system yet and the hardware for which it is most suitable

26   A Hardware Implementation of Pure Esterel - Berry (1991)   (Correct)
Esterel is a synchronous concurrent programming language dedicated to reactive systems (controllers, protocols, man-machine interfaces, etc.). Esterel has an efficient standard software implementation... / programs in hardware to match high performance constraints. For example we br Gérard Berry A Hardware Implementation of Pure Esterel

CiteSeer - citeseer.org - Terms of Service - Privacy Policy - Copyright © 1997-2002 NEC Research Institute