This directory is created automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.
14758.6 A Survey of Multiprocessor Operating System Kernels - Mukherjee, Schwan, Gopinath (1993)
Multiprocessors have been accepted as vehicles for improved computing speeds,
cost/performance, and enhanced reliability or availability. However, the added
performance requirements of user programs a... / machines linked by high performance networks especially local br capabilities of parallel hardware introduce new challenges to
8455.8 Advanced Vector Architectures - Espasa (1997)
Vector architectures have long been the architecture of choice for numerical high performance
computing. Their large memory bandwidth and the ability to tolerate relatively
long memory latencies have ... / of choice for numerical high performance computing. Their large br processors rely on high-performance highly-interleaved memory systems
7911.2 Designing Memory Consistency Models For Shared-Memory Multiprocessors - Adve (1993)
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences
both the performance and the programmability of the system. The simplest and most intuitive model for... / the system guarantees both high performance and sequential consistency br However many of these are hardware-centric in nature and difficult
7792.0 Hardware Learning in Analogue VLSI Neural Networks - Lehmann (1994)
In this thesis we are concerned with the hardware implementation of learning algorithms
for analogue VLSI artificial neural networks. Artificial neural networks
(ANNs) are often successfully ... / these properties providing high performance systems. Analogue VLSI br Electronics Institute Hardware Learning In Analogue Vlsi
7745.2 Mechanisms for Distributed Shared Memory - Reinhardt (1996)
Distributed shared memory (DSM) systems simplify the task of writing distributed-memory
parallel programs by automating data distribution and communication. Unfortunately,
DSM systems control memory an... / integration. Typhoon achieves high performance by integrating key components br and describes a working hardware prototype of Typhoon- the
7735.1 Runtime Support For In-Core And Out-Of-Core Data-Parallel Programs - Thakur (1995)
Distributed memory parallel computers or distributed computer systems are widely
recognized as the only cost-effective means of achieving teraflops performance in the
near future. However, the fact re... / data-parallel language like High Performance Fortran HPF to node br not kept pace with advances in hardware. This thesis addresses several
7308.8 Bandwidth And Latency Guarantees In Low-Cost, High-Performance.. - Kim (1997)
ng limitations of existing solutions, we present a novel, cost-effective
resource control algorithm for service guarantees. Such cost-effective service guarantees not
only provide substantial benefits... / Guarantees In Low-Cost High-Performance Networks By Jae H. Kim br prefer better average performance higher throughput and lower
7234.9 Compiling for the Multiscalar Architecture - Vijaykumar (1998)
High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world... / i Abstract High-performance general-purpose br by speculation and verification in hardware. Since this thesis is the
6914.2 Multithreaded Architectures: Principles, Projects and Issues - Dennis, Gao (1994)
this paper benefited from discussions about their architectures. Anoop Gupta has helped us in the understanding of the DASH architecture and its memory hierarchy. Finally, the second author would like... / parallel computers in high performance scientific computation and br a new domain of computing hardware program models and compiling
6314.6 Future Research Directions In Problem Solving Environments For.. - Gallopoulos, Houstis, Rice (1991)
this report was partially supported by Grant CCR-90-24549 from the National
Science Foundation. This is a report to the National Science Foundation and other agencies; it is
not a report by or of the ... / be the reality of the s high performance computers combined with br of the underlying computer hardware or software system. One might say
5981.4 High-Performance All-Software Distributed Shared Memory - Johnson (1995)
The C Region Library (CRL) is a new all-software distributed shared memory (DSM) system.
CRL requires no special compiler, hardware, or operating system support beyond the ability to
send and receive ... / High-Performance All-Software Distributed br CRL requires no special compiler hardware or operating system support
5915.6 Architectures and Patterns for Developing High-performance, Real-time .. - Schmidt, Levine, Cleeland (1999)
Many types of applications can benefit from flexible and open
middleware. CORBA is an emerging middleware standard
for Object Request Brokers (ORBs) that simplifies the development
of distributed appl... / and Patterns for Developing High-performance Real-time ORB Endsystems br backplanes and shared memory. Hardware CORBA shields applications from
5864.9 Efficient Machine-Independent Programming of High-Performance.. - Tseng (1995)
mainly because the cost of interprocessor communication is too great compared to computation and local memory
accesses [74, 77]. To achieve high performance, COSMIC will perform communication analysis ... / Programming of High-Performance Multiprocessors Chau-Wen br efficient use of the underlying hardware. Shared-memory machines typically
5839.0 Design and Implementation of a Multi-purpose Cluster System Network.. - Ang (1999)
Today, the interface between a high speed network and a high performance computation
node is the least mature hardware technology in scalable general purpose
cluster computing. Currently, the one-inte... / a high speed network and a high performance computation node is the br superior message passing performance -higher bandwidth for large
5780.2 Complexity-Effective Superscalar Processors - Palacharla (1998)
The performance trade-off between hardware complexity and clock speed in the design of superscalar microarchitectures is first investigated. Using the results of this trade-off analysis, the thesis pr... / with the goal of achieving high performance by reducing complexity. This br The performance trade-off between hardware complexity and clock speed in the
5699.6 Distributed Runtime Support For Task And Data Management - Haines (1993)
OF PH.D. DISSERTATION
DISTRIBUTED RUNTIME SUPPORT FOR TASK AND DATA MANAGEMENT
High-performance computer architectures are evolving into larger and faster systems
and, in particular, distributed memor... / For Task And Data Management High-Performance Computer Architectures Are br . . Hardware-Based Approaches
5645.7 A Compiler-Directed Distributed Shared Memory System - Verma (1996)
of the Dissertation
A Compiler-Directed Distributed Shared Memory System
by
Manish Verma
Doctor of Philosophy
in
Computer Science
State University of New York at Stony Brook
1996
This dissertation p... / . Evolution in High Performance Computing . br . . . The Hardware Platform .
5629.1 Resource Management for Responsive Web Computing - Bestavros, Chen, Crovella, Heddaya.. (1996)
ion of generic classes for spatial operations, attributes, and indexing will leverage work in the IUE.
Established algorithms will be used for spatial subdivision (quad-trees, R*-trees) of images base... / to support the demands of High Performance Computing HPC applications. br the process of bringing hardware software and expertise from
5455.8 The Design of the TAO Real-Time Object Request Broker - Schmidt, Levine, Mungee (1999)
Many real-time application domains can benefit from flexible
and open distributed architectures, such as those defined
by the CORBA specification. CORBA is an architecture
for distributed object compu... / design of TAO which is our high-performance real-time CORBA-compliant br backplanes and shared memory. Hardware CORBA shields applications from
5243.8 VLIW Processor Codesign for Video Processing - Wilberg, Camposano (1997)
A codesign approach for complex video compression systems is presented. The system
is based on a flexible and programmable VLIW (Very Long Instruction Word) architecture. The
design approach can be ... / since most of the opcode for high-performance telecommunication systems is br for generating the processor hardware and the compiler back-end. The
5090.7 Goodness Definition And Goodness Measure For High Speed Transport.. - Sebuktekin (1992)
Recent advances in optical communications, VLSI, and fiber-optic technologies
have created new horizons for high-speed protocols and applications
seeking end-to-end data transport at Gb/s speeds. In t... / bottleneck today in most high performance wide area networks WANs br ment algorithms and providing hardware support for the purpose of
5068.0 Massively Parallel Computing: Mathematics and communications libraries - Johnsson, Mathur (1993)
Massively parallel computing holds the promise of extreme performance. The utility of
these systems will depend heavily upon the availability of libraries until compilation and run-time
system techn... / is addressed in the proposed High Performance Fortran standard Below br patterns efficiently through hardware and software. In addition load
5041.6 Synchronization, Coherence, and Consistency for High Performance.. - Dwarkadas (1992)
Although improved device technology has increased the performance of computer systems,
fundamental hardware limitations and the need to build faster systems using
existing technology have led many com... / and Consistency for High Performance Shared-Memory br of computer systems fundamental hardware limitations and the need to build
4961.8 Report of the Working Group on Storage I/O for Large-Scale Computing - Gibson, Vitter, Wilkes (1996)
We discuss the strategic directions and challenges in the management and use of storage systems -- those components of computer systems responsible for the storage and retrieval of data. The performan... / enough simply to store data. High performance access to data must be br of drives and cartridges. Storage hardware sales in topped billion
4896.6 Interconnection Networks And Data Prefetching For Large-Scale.. - Kim (1995)
This memory access bottleneck is more serious in shared-memory multiprocessor systems,
where processors and memory units are connected through interconnection networks.
Processors cooperate to perform... / a more serious bottleneck in high-performance computer systems. Therefore br . . Hardware Data Prefetching
4832.7 The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture - Bedichek (1994)
The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture
by Robert C. Bedichek
Co-Chairpersons of Supervisory Committee: Professor Henry M. Levy
Professor Edward D. Lazowska
Department of C... / . . Beyond Steady State High Performance Bus Signalling br results obtained from our hardware prototype and a calibrated
4830.8 An Application Perspective on High-Performance Computing and.. - Fox (1996)
We review possible and probable industrial applications of HPCC focusing on the software
and hardware issues. Thirty-three separate categories are illustrated by detailed descriptions
of five areas---... / An Application Perspective on High-Performance Computing and Communications br • MPPs used as high performance high capacity multi-media
4794.3 Code Optimizers and Register Organizations for Vector Architectures - Lee (1992)
A major challenge facing computer architects today is designing cost-effective hardware
that executes multiple operations simultaneously. The goal of such designs is to improve
performance by taking a... / . . High-Performance Memory System br today is designing cost-effective hardware that executes multiple
4723.4 Software and Hardware Requirements for Some Applications of Parallel.. - Fox (1995)
We discuss the hardware and software requirements that appear relevant
for a set of industrial applications of parallel computing. These
are divided into 33 separate categories, and come from a recent... / characteristics. We consider High Performance Fortran and its extensions br • MPPs used as high performance high capacity multi-media
4697.4 An Efficient Virtual Network Interface in the FUGU Scalable.. - Mackenzie (1998)
A scalable workstation is one vision of a mainstream parallel computer: a machine that combines
scalable, fine-grain communication facilities for parallel applications with virtual memory and preempti... / problem. The problem is that high performance communication for parallel br access to network interface hardware but also transparently backs the
4497.7 Region-Oriented Main Memory Management in Shared-Memory NUMA.. - Benjamin Gamsa (1992)
The need to achieve higher performance through greater degrees of parallelism necessitates
distributing the memory throughout a multiprocessor system to reduce contention
and increase scalability. Unf... / because of their promise of high performance and a familiar programming br Related Work . Hardware
4482.4 Distributed Laboratories: A Research Proposal - Fujimoto, Schwan, Ahamad, Hudson.. (1996)
this memory management to
multi-granular distributed computing environments. Issues that must be addressed include the lack
of a global memory pool and high communication latencies. New, efficient mem... / computations executing on high performance distributed computing br to support interactive and hardware-in-the-loop simulations by
4466.8 Lazy Release Consistency for Distributed Shared Memory - Keleher (1995)
A software distributed shared memory (DSM) system allows shared memory parallel
programs to execute on networks of workstations. This thesis presents a new class
of protocols that has lower communicat... / a viable alternative for high-performance parallel processing. br opportunities to bring high performance and high usability to a wide
4404.8 Optimizing Fortran 90D Programs for SIMD Execution - Roth (1993)
SIMD architectures offer an alternative to MIMD architectures for obtaining high
performance computation through parallelism. These architectures can offer impressive
price/performance ratios for cert... / architectures for obtaining high performance computation through br and exploit the massively parallel hardware closer to its full potential. To
4296.4 Naming, State Management, and User-Level Extensions in the Sprite.. - Welch (1990)
This memory use could be reduced by introducing a
shared buffer pool, or setting the limit below 50 server processes. This limit is somewhat
arbitrary because the server processes are multiplexed amon... / power of a network of high-performance personal workstations. Our br workstations. We felt that new hardware features changed our computing
4288.4 Data Prefetching for High-Performance Processors - Chen (1993)
Data Prefetching for High-Performance Processors by Tien-Fu Chen Chairperson of Supervisory Committee: Professor Jean-Loup Baer Department of Computer Science and Engineering Recent technological adva... / Data Prefetching for High-Performance Processors Tien-Fu Chen br problems. First we propose a hardware-based data prefetching approach
4252.4 Loop Optimization for Aggregate Array Computations - Liu, Stoller (1997)
An aggregate array computation is a loop that computes accumulated quantities over array
elements. Such computations are common in programs that use arrays, and the array elements
involved in such com... / The large body of work on high performance computing has dealt with br compiler optimizations. Changes in hardware design have reduced the
4187.7 Parallel Simulation Today - Nicol, Fujimoto (1994)
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the
tutorial are discussions on new protocols, mathematical performance analysis, time parallel... / and ready availability of high-performance multiprocessors. The number br analysis time parallelism hardware support for parallel simulation
4163.7 Directions in Parallel Programming: HPF, Shared Virtual Memory and.. - Bodin, Priol, Mehrotra, Gannon
Fortran and C++ are the dominant programming languages used in scientific computation.
Consequently, extensions to these languages are the most popular for programming massively
parallel computers. We... / and one approach to CThe High Performance Fortran Forum has designed br directly reflect the underlying hardware such an explicit-tasking
4153.2 Asynchronous Parallel Game-Tree Search - Brockington (1998)
Tree searching is a fundamental and computationally intensive problem in artificial intelligence.
Parallelization of tree-searching algorithms is one method of improving the
speed of these algorithms.... / these algorithms. However a high-performance parallel two-player game-tree br . . . The Hardware .
4128.6 Data Layout Optimizations for High-Performance Architectures - Chau-Wen Tseng
padding, transposing, and reindexing array
dimensions, and modifying heap allocation policies. Most optimizations must be applied at compile time,
but link-time and run-time optimizations are also pos... / Data Layout Optimizations for High-Performance Architectures Chau-Wen br with the details of the underlying hardware architecture. In particular
4037.9 Performance, Safety and Idioms in Parallel Programming Systems - Lu (1995)
ions are too low level. Many PPSs are designed around specific mechanisms,
instead of around problem-solving techniques. The programmer is responsible for
correctness and performance tuning.
The need ... / processing is to achieve high performance for applications at a br some basic concepts of parallel hardware. The features of the hardware
3994.2 Massively Parallel Computing: Data distribution and communication - Johnsson (1993)
We discuss some techniques for preserving locality of reference in index spaces
when mapped to memory units in a distributed memory architecture. In particular,
we discuss the use of multidimensional ... / the techniques used to achieve high performance for these primitives. br For fine grain architectures hardware techniques have been devised to
3975.3 Hardware Support for Flexible Distributed Shared Memory - Reinhardt, al.
Workstation-based parallel systems are attractive due to their low cost and competitive uniprocessor
performance. However, supporting a cache-coherent global address space on these systems involves si... / computers use the same high-performance microprocessors found in br must be obtained from the IEEE. Hardware Support for Flexible Distributed
3973.4 Rule-Based Program Restructuring For High Performance Parallel.. - Tenny (1992)
Writing good programs for high performance parallel computers is difficult. The
programmer must have a deep understanding of the underlying machine architecture.
Issues such as memory hierarchy, commu... / Program Restructuring For High Performance Parallel Processor Systems br From Algorithms to Languages to Hardware . .
3922.3 Compiler Optimizations For Parallel Loops With Fine-Grained.. - Chen (1994)
this paper, we presented and evaluated a new runtime
algorithm to parallelize these loops. Our scheme handles any type of data dependence pattern without
requiring any special architectural support. F... / an integral part of the future high performance parallelizing compilers. iv br processing approach. As the hardware technology enables us to keep
3920.1 Shared Virtual Memory: A Survey - Shi, Hu, Tang (1998)
1. Introduction ...SVM is an alias of softDSM. In the rest of this technical report, we will use them interchangeably. Although the hardware approach to implement DSM has been shown to perform quite wel... / Hu Zhimin Tang Center of High Performance Computing Institute of br them interchangeably. Although the hardware approach to implement DSM has
3911.0 Extensibility, Safety and Performance in the SPIN Operating System - Bershad, Savage, Pardyak, Sirer.. (1995)
This paper describes the motivation, architecture and
performance of SPIN, an extensible operating system.
SPIN provides an extension infrastructure together with
a core set of extensible services th... / by the need to support high performance applications which present br rather than runtime using either hardware or software mechanisms. Strict
3845.8 CTK: Configurable Object Abstractions for Multiprocessors - Silva, Schwan (1997)
ions for Multiprocessors
Dilma M. Silva Karsten Schwan
Computer Science Department College of Computing
University of São Paulo Georgia Institute of Technology
São Paulo, Brazil Atlanta, GA 30332
di... / Terms configurable systems high performance objects object br of the underlying multiprocessor hardware. • CTK provides efficient
3834.9 WebOS: Software Support for Scalable Web Services - Amin Vahdat (1997)
The burgeoning popularity of the Web is pushing against
the performance limits of the underlying infrastructure, presenting
a number of difficult challenges for the Web as a system.
We believe that re... / describes our requirements for high-performance scalable Web services. br mostly read-only requests to hardware-constrained servers over an
3785.4 Compiler Representations for Heterogeneous Processing - Weaver (1995)
The emergence of heterogeneous parallel systems opens the possibility of higher performance for
complex, heterogeneous applications. Unfortunately, heterogeneous parallel systems are even more
complex... / can deliver consistent high performance by incorporating multiple br use of heterogeneous hardware to execute a single application
3769.6 Programmable Arithmetic Devices for High Speed Digital Signal.. - Chen
The high-throughput computation requirements of real-time digital signal processing (DSP)
systems usually dictate hardware intensive solutions. Often attendant to hardware approaches are
problems of h... / . Techniques for High Performance br Compiled Code Processor Performance High Machine Language Level
3740.4 The Design and Performance of an I/O Subsystem for Real-time ORB.. - Schmidt, Kuhns, Bector, Levine
There is increasing demand to extend Object Request Broker
(ORB) middleware to support applications with stringent
quality of service (QoS) requirements. However, conventional
ORBs do not define stand... / Ace Orb tao Tao Is A High-Performance Real-Time Orb Endsystem br running on off-the-shelf hardware and software. Second it
3739.8 PASSION: Parallel And Scalable Software for Input-Output - Choudhary, Bordawekar, Harry.. (1994)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output
which provides software support for high performance parallel I/O. PASSION provides support
at the la... / provides software support for high performance parallel I O. PASSION br nCUBE etc. provide some kind of hardware and software support for parallel
3733.7 Efficient Runtime Support for Cluster-Based Distributed Shared Memory .. - Speight (1997)
Distributed shared memory (DSM) systems provide a shared memory programming
paradigm on top of a physically distributed network of computers. The DSM system
removes the necessity for programmers to mo... / interconnects constructing a high-performance multiprocessor from a network br . . . Hardware SMP Performance vs. Brazos DSM
3716.9 Disk-directed I/O for MIMD Multiprocessors - Kotz (1996)
Many scientific applications that run on today's multiprocessors, such as weather forecasting
and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor
is configured w... / technique provided consistent high performance that was largely independent br is configured with sufficient I O hardware the file-system software often
3716.0 Exploiting Fine-grain Parallelism in Concurrent Constraint Languages - Montelius (1997)
Montelius, J., 1997. Exploiting Fine-grain Parallelism in Concurrent Constraint
Languages, 220 pp. Uppsala Thesis in Computing Science 28, ISSN 0283-359X,
ISBN 91-506-1215-8. SICS Dissertation Series ... / system was implemented on a high-performance shared-memory multiprocessor. br . The Hardware
3705.6 Performance Measurement Tools For High-Level Parallel Programming.. - Irvin (1995)
Users of high-level parallel programming languages require accurate performance information
that is relevant to their source code. Furthermore, when their programs experience performance
problems at t... / languages cannot guarantee high performance because compilers are often br at the lowest levels of their hardware and software systems programmers
3657.1 Efficient Reliable Group Communication For Distributed Systems - Kaashoek, Tanenbaum (1994)
Many applications can profit from broadcast communication, but few operating systems provide primitives
that make broadcast communication available to user applications. In this paper we introduce pri... / multicasting to support high-performance multi-media applications br the n destinations fail due to hardware or software errors. In this
3645.5 DPF: A Data Parallel Fortran Benchmark Suite - Yu Hu (1995)
The Data Parallel Fortran (DPF) benchmark suite is designed for evaluating data parallel
compilers and scalable architectures. Many of the DPF codes are provided in three versions:
basic, optimized an... / intended target language is High Performance Fortran HPF However due br not be available until late in the hardware product cycle. This in turn
3637.6 Replication Using Group Communication Over a Partitioned Network - Amir (1995)
In systems based on the client-server model, a single server may serve many clients and
the heavy load on the server may cause the response time to be adversely affected. In such
circumstances, replic... / necessarily consistent reply. High performance of the architecture is br the available non-reliable hardware multicast for efficient
3602.0 Experience with a Clustered Parallel Reduction Machine - Beemster, Hartel, Hertzberger.. (1993)
A clustered architecture has been designed to exploit divide and conquer parallelism in
functional programs. The programming methodology developed for the machine is based
on explicit annotations and ... / as a basis for a high performance compiler of the functional br has been constructed with stock hardware. This paper describes the
3567.6 Alternative Analysis for Computational Holon Architectures - Zeigler, Vahie, Kim (1994)
Simulator
Appendix E. Examples of Human Performance Process Hierarchical Decomposition
Appendix F. Scalable Coherent Interfaces
Contents (c... / . High Performance Supercomputers br Models Hardware Architecture Requirements
3540.3 Efficient MultiThreaded User-Space Transport for Network Computing.. - Gomez, Rego (1997)
We present a novel user-space and transaction-oriented protocol for use in high-performance
distributed computing applications. The TRAP protocol is designed to support low-latency
communication in mu... / protocol for use in high-performance distributed computing br versions of TCP. But advances in hardware OS design and the pressures of
3535.0 Performance Prediction and Network Media Selection for PVM Clusters - Steed (1996)
Workstation clusters are becoming more popular as a parallel computing platform. Several
programming libraries, including Parallel Virtual Machine (PVM), allow workstation clusters
to be programmed as... / parallelism is the future of high-performance computing Although br by complex interactions between hardware and software. Performance
3513.9 Transport System Architectures for High-Performance Communications.. - Schmidt, Suda (1993)
Providing end-to-end gigabit communication support for
bandwidth-intensive distributed applications requires high-performance
transport systems. This paper describes and
classifies transport system mec... / System Architectures for High-Performance Communications Subsystems br and process management and hardware devices such as high-speed
3506.8 Runtime Mechanisms for Efficient Dynamic Multithreading - Karamcheti, Plevyak, Chien (1996)
High performance on distributed memory machines for programming models with dynamic thread
creation and multithreading requires efficient thread management and communication. Traditional multithreadin... / Abstract High performance on distributed memory br that assume minimal compiler and hardware support are suitable for
3497.1 ASHs: Application-Specific Handlers for High-Performance Messaging - Wallach (1996)
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute... / Handlers for High-Performance Messaging Deborah A. br to provide applications with hardware-level network performance. ASHs
3490.2 The Microarchitecture of Superscalar Processors - Smith, Sohi (1995)
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of ... / method for implementing high performance microprocessors. . . The br Processing Model Because hardware and software evolve it is rare
3487.5 Adaptive Operating System Abstractions: A Case Study of.. - Bodhisattwa Mukherjee (1994)
ions:
A Case Study of Multiprocessor
Locks
Bodhisattwa Mukherjee (bodhi@cc.gatech.edu)
Karsten Schwan (schwan@cc.gatech.edu)
GIT-CC-94/39
10 June 1994
Abstract
Operating system kernels typical... / However the attainment of high performance for a variety of parallel br properties of the underlying hardware • Adaptability how can
3470.5 Compiling for Heterogeneous Systems: A Survey and an Approach - McKinley, Moss, Singhai, Weaver.. (1995)
Large applications tend to contain several models of parallelism, but only a few of these map efficiently to the single model of parallelism embodied in a homogeneous parallel system. Heterogeneous pa... / resources for achieving high performance. Unfortunately heterogeneity br use of heterogeneous hardware to execute a single application
3455.8 Algorithmic Redistribution Methods for Block Cyclic Decompositions - Petitet, Dongarra (1998)
In a serial computational environment, transportable efficiency is the essential motivation
for developing blocking strategies and block-partitioned algorithms. An algorithmic blocking
factor adjust... / shown to be able to achieve high performance and efficiency for a given br to maximize the efficiency of the hardware resources. In a
3440.4 Parallel Rendering - Crockett (1995)
In computer graphics, rendering is the process by which an abstract description of a scene is converted to
an image. When the scene is complex, or when high-quality images or high frame rates are requ... / as a primary driver of high-performance graphics systems. By the end br this situation. Today parallel hardware is routinely used in graphics
3438.4 Studies of Integration and Optimization of Interpreted and Compiled.. - Fox, Li, Wen, Zhang (1997)
an view
our front end compiler as similar to the javac compiler's function of producing JavaVM
bytecodes. The II/CVM will naturally need the study of such issues as Just in Time compilation,
dynamic l... / of combining productivity with high performance. We intend to use Web br model within the same hardware technology. Thus a trend is to
3416.0 Pattern-Driven Automatic Parallelization - Christoph W. Keßler (1996)(Correct)
This paper describes a knowledge--based system for automatic parallelization of a wide class of
sequential numeric codes operating on vectors and dense matrices, and for execution on distributed
memor... / Vienna Fortran CMZ High Performance Fortran HPF and others. br protocols buffering undocumented hardware features and other problems.
3410.9 Software Strategies for Portable Computer Energy Management - Lorch, Smith (1998)(Correct)
Limiting the energy consumption of computers, especially portables, is becoming increasingly important. Thus, new energy-saving computer components and architectures have been and continue to be devel... / features have both high performance and low power modes with br created by existing and suggested hardware innovations. Introduction
3406.7 Filesystems for Network-Attached Secure Disks - Gibson, al. (1997)(Correct)
Network-attached storage enables network-striped data transfers directly between client and storage to provide clients with scalable bandwidth on large transfers. Network-attached storage also decoupl... / To be most effective all high-performance clients should be br encouraging the inclusion of hardware support for message digest to
3397.3 High Performance Software Coherence for Current and Future.. - Leonidas Kontothanassis (1994)(Correct)
Shared memory provides an attractive and intuitive programming model for large-scale parallel
computing, but requires a coherence mechanism to allow caching for performance while ensuring
that processo... / High Performance Software Coherence for br end of this spectrum to the other. Hardware cache coherence is fast but
3380.6 Programming Techniques For Eagersharing Distributed Memory Systems - Ai Li (1993)(Correct)
of the Dissertation: Programming Techniques for Eagersharing Distributed Memory Systems, by Ai Li, Doctor of Philosophy in Computer Science, State University of New York at Stony Brook, 1993.
To overcome th... / realize the potential for high performance and execution efficiency of br in this dissertation includes hardware and software techniques for
3336.7 High-Performance Local Area Communication With Fast Sockets - Rodrigues, Anderson, Culler (1997)(Correct)
Modern switched networks such as ATM and Myrinet
enable low-latency, high-bandwidth communication.
This performance has not been realized by current
applications, because of the high processing overhe... / High-Performance Local Area Communication With br to the ability of modern network hardware however. While TCP is capable
3298.1 An Empirical Evaluation of OS Support for Real-time CORBA Object.. - Levine, Flores-Gaitan, Schmidt (1999)(Correct)
There is increasing demand to extend Object Request Broker
(ORB) middleware to support distributed applications with
stringent real-time requirements. However, lack of proper OS
support can yield subs... / Overview Of Tao Tao Is A High-Performance Real-Time Orb Endsystem br ORB middleware. While holding the hardware and ORB constant we vary the
3285.5 Optimizing a CORBA Inter-ORB Protocol (IIOP) Engine for Minimal.. - Gokhale, Schmidt(Correct)
To support the quality of service (QoS) requirements of embedded
multimedia applications, such as real-time audio and
video, electronic mail and fax, and Internet telephony, off-the-shelf
middleware li... / for TAO which is our high-performance real-time ORB. Second we br imposed by embedded system hardware necessitates a minimal footprint
3283.2 Processor Allocation Policies for Message-Passing Parallel Computers - Mccann (1994)(Correct)
When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. This... / the potential for achieving high performance scalability and br . The Hardware and Software Environment
3248.7 Using Information from the Programmer to Implement System.. - Adve (1996)(Correct)
The memory consistency model of a shared-memory system is a formal specification of the semantics of shared memory.
The most commonly assumed model, sequential consistency, provides simple semantics bu... / but is not easily amenable to high performance. Researchers have proposed br uniprocessor systems including hardware and compiler overlap and reorder
3247.2 A Survey of User-Level Network Interfaces for System Area Networks - Mukherjee (1997)(Correct)
System Area Networks (SANs), such as Myricom Myrinet and IBM Vulcan, provide latency, bandwidth, and reliability that
are orders of magnitude better than traditional local area networks. SAN benefits ... / network. The demand for high performance communication subsystems-to br the operating system and the hardware latency to access e.g. read
3240.9 GLUnix: a Global Layer Unix for a Network of Workstations - Ghormley (1997)(Correct)
To provide remote execution of both parallel and sequential jobs, GLUnix extends some existing UNIX abstractions and introduces
new abstractions, borrowing heavily from MPP environments such as ... / By leveraging commodity high-performance workstations and networks br applications. Although viable hardware solutions are available today
3240.5 Overview of neural hardware - Heemskerk (1995)(Correct)
Neural hardware has undergone rapid development during the last few years. This paper presents an
overview of neural hardware projects within industries and academia. It describes digital, analog, and... / in many cases very high performance rates have been obtained. br Overview of neural hardware Jan N. H. Heemskerk Unit of
3232.2 Placement of Objects in Parallel Object-Based Systems - Ghandeharizadeh, Wilhite (1994)(Correct)
Parallelism is a viable solution to constructing high performance object-oriented database
systems. This paper analyzes the role of parallelism in such systems. In parallel systems
based on a shared-n... / solution to constructing high performance object-oriented database br SQL a Distributed High-Performance High Availability Implementation
3218.6 Alleviating Priority Inversion and Non-determinism in Real-time CORBA .. - Schmidt, Mungee, Gokhale (1998)(Correct)
There is increasing demand to extend CORBA to support applications with stringent real-time requirements. However, conventional CORBA Object Request Brokers (ORBs) exhibit substantial priority inversi... / components that support high-performance real-time applications br objects are written in what OS hardware platform they run on or what
3198.4 An Overview of Rewrite Rule Laboratory (RRL) - Kapur, Zhang (1995)(Correct)
RRL (Rewrite Rule Laboratory) was originally developed as an environment for experimenting with automated reasoning algorithms for equational logic based on rewrite techniques. It has now matured into... / specific purposes. Herky High-Performance Key Operations is a fast br the use of formal methods in hardware and software design. We provide
3192.8 Mechanisms for Efficient, Protected Messaging - Lee(Correct)
Fine-grain parallelism is the key to high performance multicomputing. By partitioning
problems into small sub-tasks -- grain-sizes as small as 70 cycles have been found in common
benchmark programs -- ... / parallelism is the key to high performance multicomputing. By br network interfaces fast hardware is defeated by software layers
3188.2 Willow: A Scalable Shared-Memory Multiprocessor - Bennett, Dwarkadas, Greenwood.. (1992)(Correct)
We are currently developing Willow, a shared-memory multiprocessor whose design provides system
capacity and performance capable of supporting over a thousand commercial microprocessors. Most
recently... / Virtually all designers of high performance computers confronted with br simulators one a detailed hardware-level simulator the other an
3174.0 A Conflict-Free Memory Design For Multiprocessors - Shing (1991)(Correct)
A Conflict-Free Memory Design for Multiprocessors, by Honda Shing.
Multiprocessors have been widely used in achieving high performance computation.
In a multiprocessor, applications are implemented wit... / been widely used in achieving high performance computation. In a br and still maintain good performance and high efficiency. Furthermore
3124.9 X Vision: A Portable Substrate for Real-Time Vision Applications - Hager, Toyama (1996)(Correct)
In the past several years, the speed of standard processors has reached the point where
interesting problems requiring visual tracking can be carried out on standard workstations.
However, relatively ... / vision which provides high performance on standard workstations br is accelerated using specialized hardware for a notable exception see
3117.0 Compiler Technology for Future Microprocessors - Hwu, Hank, Gallagher, Mahlke.. (1995)(Correct)
Advances in hardware technology have made it possible for microprocessors to execute a large
number of instructions concurrently (i.e., in parallel). These microprocessors take advantage of
the opport... / Center for Reliable and High-Performance Computing University of br Abstract Advances in hardware technology have made it possible
3099.1 Massively Parallel Programming Languages - A Classification of Design .. - Gellerich, Gutzmann(Correct)
This paper presents the results of a study in which
we examined about 50 parallel programming languages
in order to detect typical approaches towards supporting
massive parallelism. Based on a classif... / parallelism to achieve high performance and therefore excluded br parallel structures available in hardware yielding a mapped
3083.5 Scheduling Threads for Low Space Requirement and Good Locality - Girija Narlikar (1999)(Correct)
The running time and memory requirement of a parallel program
with dynamic, lightweight threads depends heavily on
the underlying thread scheduler. In this paper, we present
a simple, asynchronous, sp... / used work stealing to provide high performance br space efficient . Today's hardware-coherent shared memory
3083.0 Scheduler-Conscious Synchronization - Kontothanassis, Wisniewski, Scott (1994)(Correct)
Efficient synchronization is important for achieving good performance in parallel
programs, especially on large-scale multiprocessors. Most synchronization algorithms
have been designed to run on a de... / Science and Technology-High Performance Computing Software Science br ones provide more sophisticated hardware support for synchronization.
3072.0 Multiprocessor Cache Coherence Based on Virtual Memory Support - Petersen, Li (1995)(Correct)
Virtual memory based cache coherence is a mechanism that relies only on hardware
that already exists on the microprocessors of a shared memory multiprocessor system,
yet dynamically detects and res... / which focuses on building high-performance multicomputers from commodity br is a mechanism that relies only on hardware that already exists on the
3052.9 Architectural Considerations for Deterministic Real-Time ORB.. - Levine, Schmidt, Gill (1997)(Correct)
There is increasing demand to extend object-oriented middleware
to support applications with stringent quality of service
(QoS) requirements. However, conventional object-oriented
middleware does not ... / components that support high-performance real-time applications and br objects are written in what OS hardware platform they run on or what
3045.7 Implementation of Stack-Based Languages on Register Machines - Ertl (1996)(Correct)
Languages with programmer-visible stacks (stack-based languages) are
used widely, as intermediate languages (e.g., JavaVM, FCode), and as languages
for human programmers (e.g., Forth, PostScript). How... / processor architecture in high-performance computers from br efficiently on mainstream hardware using aggressive compiler
3044.1 Architectural Support for Single Address Space Operating Systems - Koldinger, Chase, Eggers (1992)(Correct)
Recent microprocessor announcements show a trend toward
wide-address computers: architectures that support
64 bits of virtual address space. Such architectures
facilitate fundamentally new operating s... / This simplifies the use of high-performance virtually indexed data br protection lookaside buffer a hardware structure that implements this
3041.3 Compiler Architectures for Heterogeneous Systems - McKinley, Singhai, Weaver, Weems (1995)(Correct)
Heterogeneous parallel systems incorporate diverse models of parallelism within a single machine or across machines and are better suited for diverse applications [25, 43, 30]. These systems are alr... / resources for achieving high performance. Unfortunately heterogeneity br use of heterogeneous hardware to execute a single application
3038.7 Software-Managed Address Translation - Jacob, Mudge (1997)(Correct)
In this paper we explore software-managed address translation.
The purpose of the study is to specify the memory
management design for a high clock-rate PowerPC implementation
in which a simple design... / International Symposium on High Performance Computer Architecture br is just as efficient as hardware managed address translation
3036.6 Effects of Communication Latency, Overhead, and Bandwidth in a.. - Martin, Vahdat, Culler, Anderson (1997)(Correct)
This work provides a systematic study of the impact of communication
performance on parallel applications in a high performance
network of workstations. We develop an experimental system in
which the ... / on parallel applications in a high performance network of workstations. We br realistic inputs on a flexible hardware prototype that can vary its
3031.3 A Multithreaded Communication System for ATM-Based High Performance.. - Park, Lee(Correct)
Current advances in processor technology and the rapid development of high-speed networking
technology (e.g., Asynchronous Transfer Mode (ATM), Myrinet, and Fast Ethernet) have made
network-based comp... / System for ATM-Based High Performance Distributed Computing br and MPCs where the networking hardware and communication software are
3028.5 HFS: A Performance-Oriented Flexible File System Based on.. - Krieger (1996)(Correct)
... / have poor support for high performance I O and as a result the br file systems both because of the hardware resources it must manage and
3009.8 Message Passing Support for Multi-grained, Multi-threading, and.. - Ang, Chiou, Rudolph, Arvind (1996)(Correct)
In order to become generally useful, message passing mechanisms not only need to provide
high performance, but also the three M's: multi-granularity, multi-threading and multiprocessing.
In this paper... / not only need to provide high performance but also the three M's br thread. Proper design of the hardware network interface can
2995.7 Measuring and Optimizing CORBA Latency and Scalability Over.. - Gokhale, Schmidt (1998)(Correct)
There is increasing demand to extend object-oriented middleware,
such as OMG CORBA, to support applications with
stringent quality of service (QoS) requirements. However,
conventional CORBA Object Req... / in TAO which is a high-performance real-time implementation of br backplanes and shared memory. Hardware CORBA shields applications from
2962.8 Hardware Support for Dynamic Access Ordering: Performance of Some.. - McKee (1993)(Correct)
Hardware Support for Dynamic Access Ordering: Performance of Some Design Options. Sally A. McKee, Department of Computer Science, University of Virginia, Charlottesville, VA 22903, mckee@virginia.edu.
Mem... / in the application of high performance microprocessors to br Hardware Support for Dynamic Access
2961.5 Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)(Correct)
Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors. Sally A. McKee, Department of Computer Science, University of Virginia, Charlottesville, VA 22903, mckee@cs.virginia.edu.
Memory bandwidth... / in the application of high performance microprocessors to br This paper describes the use of hardware-assisted access ordering in
2957.0 A Synopsis of the Legion Project - Grimshaw, Wulf, French, Weaver, Jr. (1994)(Correct)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer
comprised of a variety of geographically distributed high-performance machines and workstations. ... / of geographically distributed high-performance machines and workstations. To br These are software problems the hardware challenges are being addressed
2952.1 The Nexus Approach to Integrating Multithreading and Communication - Foster (1996)(Correct)
Lightweight threads have an important role to play in parallel systems: they can be used to exploit shared-memory parallelism, to mask communication and I/O latencies, to implement remote memory acces... / threads and communication in high-performance distributed-memory systems. br handlers At the lower-performance higher-functionality end of the
2952.1 Analysis and Applications of Receptive Safety Properties in.. - Matos(Correct)
Formal verification for complex concurrent systems is a computationally intensive and, in some
cases, intractable process. The complexity is an inherent part of the verification process due to
the ... / is required in reliable and high performance systems. We can specify the br user profiles heterogeneous hardware distributed execution
2947.5 Parallel Algorithms For Test Generation And Fault Simulation - Patil (1990)(Correct)
INTRODUCTION
1.1. Parallel Processing for VLSI CAD
With the increased complexity of VLSI circuits, existing Computer-Aided Design (CAD)
algorithms will not be able to handle large circuits in a reason... / called the HIPERCAD High Performance CAD project whose br interfaces. Parallel processing hardware has also become more affordable
2946.5 Experience with a Distributed File System Implementation - Wang, Anderson, Dahlin (1997)(Correct)
In this paper we report on some of the
lessons we have learned from the implementation
effort. We believe our experience may offer insight
for future system builders and encourage the development
of new ... / The recent emergence of high-performance local area networks br in xFS is close to those seen in hardware DSM systems such as DASH
2945.2 Phoneme Probability Estimation with Dynamic Sparsely Connected.. - Ström (1997)(Correct)
This paper presents new methods for training large neural networks for phoneme
probability estimation. An architecture combining time-delay windows and
recurrent connections is used to capture the imp... / for robust training of large high performance ANNs based on sparsely br is trained with special parallel hardware and a rather complex training
2935.3 Exploiting Multiprocessor Memory Hierarchies For Operating Systems - Xia (1996)(Correct)
d this mentorship into a joyful and valuable
life experience. Working very closely together, under his guidance, we persisted through
numerous difficult times together as well as shared many happy rew... / memory hierarchy is key to high performance. However the operating br trade-offs of the software hardware optimization schemes are also
2904.2 PANDA - Supporting Distributed Programming in C++ - Assenmacher, Breitbach, Buhler.. (1993)(Correct)
PANDA is a run-time package based on a very small operating system kernel
which supports distributed applications written in C++. It provides powerful abstractions
such as very efficient user-level ... / Presto Faust and Levy high performance and flexibility of br PANDA has been designed for a hardware platform consisting of a network
2901.3 The MIT Alewife Machine: A Large-Scale Distributed-Memory.. - Agarwal, Chaiken, Johnson, Kranz.. (1991)(Correct)
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low dimension direct interconnection network to provide scalable communi... / processor. Introduction High-performance computer design is driven by br and concentrates on the novel hardware features of the machine including
2886.3 Cache Performance of Garbage-Collected Programs - Reinhold (1994)(Correct)
As processor speeds continue to improve relative
to main-memory access times, cache performance is becoming
an increasingly important component of program performance.
Prior work on the cache perfor... / A cache miss on current high-performance machines costs tens of br This widening gap has motivated hardware designers to seek improved
2881.8 Message Passing Support on StarT-Voyager - Computation Structures(Correct)
No single message passing mechanism can efficiently support all the different types of communication
that occur naturally in most parallel or distributed programs. MIT's StarT-Voyager, a
hybrid messag... / mechanisms to achieve very high performance over a wide spectrum of br of communication types and sizes. Hardware and operating system enforced
2875.9 Application Performance and Flexibility on Exokernel Systems - Kaashoek, Engler, Ganger.. (1997)(Correct)
The exokernel operating system architecture safely gives untrusted
software efficient control over hardware and software resources by
separating management from protection. This paper describes an
exo... / applications to achieve high performance without sacrificing the br software efficient control over hardware and software resources by
2866.9 ASPEN: High-Performance Hardware Support for Distributed Shared-Memory - Maxham (1994)(Correct)
ASPEN: High-Performance Hardware Support for Distributed Shared-Memory. Kenneth Mark Maxham.
This thesis describes and evaluates an integrated memory and network subsystem
designed to provide the abstr... / Rice University Aspen High-Performance Hardware Support For br ASPEN High-Performance Hardware Support for Distributed
2859.9 Sather 2: A Language Design for Safe, High-Performance Computing - Gomes, Löwe, Quittek, Weissman (1997)(Correct)
Consistency of objects in a concurrent computing environment is usually ensured by serializing all incoming
method calls. However, for high performance parallel computing intra-object parallelism, i.e... / A Language Design for Safe High-Performance Computing Benedict Gomes br networks Myrinet ATM parallel hardware platforms are now more widely
2851.1 Synchronization and Communication in the T3E Multiprocessor - Scott (1996)(Correct)
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. We discuss what we have learned from the T3D p... / programming model e.g. High Performance Fortran HPF or the br memories. Load store performance highlights the memory pipelining
2829.5 Parallel Computers and Complex Systems - Fox, Coddington (1994)(Correct)
We present an overview of the state of the art and future trends in
high performance parallel and distributed computing, and discuss techniques
for using such computers in the simulation of complex pr... / the art and future trends in high performance parallel and distributed br before the end of the decade. Hardware trends imply that all computers
2824.2 Issues in Autonomous Mobile Robot Navigation - Singhal (1997)(Correct)
Three main problems facing outdoor autonomous mobile robot navigation are unstructured
environments, moving obstacles, and multiple sensors. Each of these leads
to uncertainties that usually cannot be... / passed off as robots. Today's high performance world demands precise and br Beacons for Localization B Hardware for Proposed Experimentation
2805.9 Lessons from FTM: an Experiment in the Design and Implementation of a .. - Muller, al. (1995)(Correct)
This report describes an experiment in the design of a general purpose
fault tolerant system, FTM. The main objective of the FTM design was to implement
a "low-cost" fault tolerant system that could... / our approach relies on a high performance stable storage br which can be implemented either by hardware or software. We first motivate
2805.0 Falcon: On-line Monitoring and Steering of Large-Scale Parallel.. - Gu (1995)(Correct)
Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such interactive steering is to improve its performance or to affect its execution behavior. The... / Introduction The high performance of current parallel br basis. Falcon runs on several hardware platforms including the Kendall
2804.2 Implementing Fine-Grain Distributed Shared Memory On Commodity SMP.. - Schoinas (1996)(Correct)
This paper reports our experience implementing the Blizzard fine-grain distributed shared memory system on a
network of unmodified dual-processor workstations running a commercial operating system. Th... / shared memory but that high performance requires either custom br optimized software commodity hardware and custom hardware a
2798.3 Vision-Based Road Detection in Automotive Systems: A Real-Time.. - Broggi, Bertè (1995)(Correct)
The main aim of this work is the development of a vision-based road detection system
fast enough to cope with the difficult real-time constraints imposed by moving vehicle
applications. The hardware p... / Figure .b But due to the high performance levels achieved it will be br moving vehicle applications. The hardware platform a special-purpose
2797.6 Object Models for Distributed or Persistent Programming - Cahill, Nixon (1997)(Correct)
As use of object orientation for application development has increased,
many researchers have investigated the design of object-based programming
languages for the distributed and persistent programmi... / as well as the traditional high performance community. A distributed br consideration for the underlying hardware except when this is part of the
2773.0 Flexibility and Performance of Parallel File Systems - Kotz, Nieuwejaar (1996)(Correct)
Many scientific applications for high-performance multiprocessors have tremendous I/O requirements.
As a result, the I/O system is often the limiting factor of application performance.
Several new par... / scientific applications for high-performance multiprocessors have br systems with sufficient I O hardware Kot Most of today's
2758.9 Implementing Multidestination Worms in Switch Based Parallel Systems: .. - Craig Stunkel(Correct)
Multidestination message passing has been proposed as an attractive mechanism for efficiently
implementing multicast and other collective operations on direct networks. However, applying this
mechanis... / on these systems to achieve high performance parallel computation. Many br traffic the central-queue based hardware multicast implementation affects
2756.6 An Interaction of Coherence Protocols and Memory Consistency Models.. - Shi, Hu, Tang (1997)(Correct)
Coherence protocols and memory consistency models are two important issues in hardware
coherent shared memory multiprocessors and software distributed shared memory (DSM) systems.
Over the years, many ... / Zhimin Tang Center of High Performance Computing Institute of br models are two important issues in hardware coherent shared memory
2755.3 MORPH: A System Architecture for Robust High Performance Using.. - Chien, Gupta (1996)(Correct)
Achieving 100 TeraOps performance within a ten-year
horizon will require massively-parallel architectures
that exploit both commodity software and hardware
technology for cost efficiency. Increasing cl... / System Architecture for Robust High Performance Using Customization An br both commodity software and hardware technology for cost efficiency.
2739.8 Informing Loads: Enabling Software To Observe And React To Memory.. - Horowitz, Martonosi, Mowry, Smith (1995)(Correct)
Memory latency is an important bottleneck in system performance that cannot be adequately solved by
hardware alone. Several promising software techniques have been shown to address this problem succes... / is already present in today's high-performance processors. Key Words and br cannot be adequately solved by hardware alone. Several promising software
2732.6 Indigo: User-level Support for Building Distributed Shared.. - Prince Kohli (1995)(Correct)
Prince Kohli, Mustaque Ahamad, Karsten Schwan. College of Computing, Georgia Institute of Technology. June 12, 1996.
Abstract: Distributed systems that consist of workstations connected by high p... / of workstations connected by high performance interconnects offer br the programmability of such hardware by presenting to application
2728.8 Lazy Threads: Implementing a Fast Parallel Call - Goldstein, Schauser, Culler (1996)(Correct)
In this paper we describe lazy threads, a new approach for implementing multi-threaded execution
models on conventional machines. We show how they can implement a parallel call at nearly the
efficienc... / thread of control such as High Performance Fortran or a fixed set br parallel execution directly in hardware In many cases the
2717.7 Calypso: An Environment for Reliable Distributed Parallel Processing - Baratloo, Dasgupta, Kedem (1995)(Correct)
The importance of adapting networks of workstations for use as parallel processing platforms
is well established. However, current solutions do not always satisfactorily address important
issues that ... / th IEEE Intl. Symp. on High Performance Distributed Computing br COTS Commercial Off-The-Shelf hardware software and operating systems.
2714.7 Active tracking of foveated feature clusters using affine structure - Reid, Murray (1996)(Correct)
We describe a novel method of obtaining a fixation point on a moving object for a real-time gaze control system. The method makes use of a real-time implementation of a corner detector and tracker a... / for the method in use with a high performance head eye platform and br in sluggish and unreliable performance -highlight the need for a
2707.8 Enterprise: An Interactive Graphical Programming Environment For.. - Chan, Lu, al. (1992)(Correct)
Workstation environments have been in use for more than a decade now. Although a network
of workstations together represents a large amount of aggregate computing power, single users
often cannot util... / the need for more costly high performance computers and utilizing br programs in a distributed hardware environment. Enterprise code
2706.0 Non-Blocking Algorithms and Preemption-Safe Locking on.. - Michael, Scott (1998)(Correct)
Most multiprocessors are multiprogrammed in order to achieve acceptable response time and to increase their utilization.
Unfortunately, inopportune preemption may significantly degrade the performance... / Science and Technology-High Performance Computing Software Science br Another alternative is hardware partitioning under which no two
2705.6 BSPlib: The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1998)(Correct)
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates t... / be able to run unchanged with high performance on any general purpose br provide a clear focus for future hardware developments. For a model to
2702.5 An Efficient and Scalable Approach for Implementing Fault Tolerant.. - Morin, al. (1997)(Correct)
Distributed Shared Memory (DSM) architectures are attractive to execute
high performance parallel applications. Made up of a large number of components,
these architectures have however a high proba... / are attractive to execute high performance parallel applications. Made br despite a significant increase in hardware reliability these architectures
2693.7 Communication Throughput of Interconnection Networks - Monien, Diekmann, Lüling (1994)(Correct)
Modern flow control techniques used for massively parallel computers
have made network capacity a more important parameter for the application
performance than network latency. Network latency is us... / a call for proposal for the US High Performance Computing project HPC br in their programming models and hardware realizations full custom design
2672.6 Data placement in shared-nothing parallel database systems - Mehta, DeWitt (1994)(Correct)
Data placement in shared-nothing database systems
has been studied extensively in the past and various
placement algorithms have been proposed. However, there is
no consensus on the most... / management is essential for high performance in such large systems. An br The results show that current hardware technology trends have
2657.3 Use of Computational Kernels in Full and Sparse Linear Solvers.. - Daydé, Duff (1996)(Correct)
We believe that the availability of portable and efficient serial and parallel
numerical libraries that can be used as building blocks is extremely important
for both simplifying application software ... / Efficient Code Design on High-Performance RISC Processors br as efficiently as possible the hardware of high-performance computers
2651.3 ADAPTIVE: A Flexible and Adaptive Transport System Architecture to.. - Schmidt, Box, Suda (1992)(Correct)
Transport systems integrate operating system services such
as memory and process management together with communication
protocols that utilize these OS services to support
distributed applications run... / for Multimedia Applications on High-Performance Networks Douglas C. br and general operating system hardware and software factors such as
2648.5 Compiling for Shared-Memory and Message-Passing Computer - Larus (1994)(Correct)
Many parallel languages presume a shared address space in which any portion
of a computation can access any datum. Some parallel computers directly support
this abstraction with hardware shared memory... / reduces communication High-Performance Fortran HPF for example br support this abstraction with hardware shared memory. Other computers
2646.7 Parallel Algorithms For CAD With Applications To Circuit Extraction - Belkhale (1991)(Correct)
INTRODUCTION
1.1. Parallel Processing for CAD
As the sizes of VLSI circuits increase in the future, the computational requirements for performing
various computer-aided design (CAD) tasks such as sim... / In the HIPERCAD HIgh PERformance CAD project currently br the use of special-purpose hardware accelerators. Special-purpose
2607.5 A Kernel Implementation of Distributed Shared Memory on a Network of.. - Brett Fleisch (1994)(Correct)
We describe the evolution of a distributed shared memory (DSM) system, Mirage,
and the difficulties encountered when moving the system from a Unix-based kernel on
the VAX to a UNIX-based kernel on... / Our goal was to design a high performance DSM system. However an br improved on conventional hardware by applying three well-known
2602.7 A Selective Caching Technique - John, Radhakrishnan(Correct)
Efficient caches are extremely important for achieving good performance from modern high
performance processors. Conventional cache architectures exploit locality, but do so rather
blindly. Since all ... / good performance from modern high performance processors. Conventional br It does not require complex hardware as in or detailed cache
2589.8 The Zebra Striped Network File System - Hartman, Ousterhout (1993)(Correct)
Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a ... / file system. This provides high performance for writes of small files as br Sprite file system on the same hardware. For small files the Zebra
2581.5 Replication Techniques For Speeding Up Parallel Applications On.. - Henri Bal (1992)(Correct)
This paper discusses the design choices involved in replicating objects and their effect on performance.
Important issues are: how to maintain consistency among different copies of an object;
how to i... / is intended for parallel high-performance applications. Orca is not br of Orca on different hardware configurations have been in use
2579.5 Fast Messages (FM): Efficient, Portable Communication for Workstation .. - Pakin, Karamcheti, Chien (1997)(Correct)
Illinois Fast Messages (FM) is a low-level software messaging layer designed to meet the
demands of high performance network hardware. It delivers much of the hardware's raw performance
to both applic... / to meet the demands of high performance network hardware. It delivers br of high performance network hardware. It delivers much of the
2562.1 Improving the Parallelism and Concurrency in Decoupled Architectures - K.J., Naresh.C(Correct)
This paper investigates a technique to facilitate anticipatory loading to queues even in presence of data dependent control dependencies. The proposed method consists of fetching along one or both pat... / between them and yield high performance and increased flexibility. br dynamically scheduled processors hardware a scoreboard could reorder
2537.7 An Optimized Hardware Architecture and Communication Protocol for.. - Shoemaker (1997)(Correct)
Managing communications in parallel processing systems has proven to be one of the
most critical problems facing designers. As processor speeds continue to increase, communication
latency and bandwidt... / the construction of modular high-performance digital systems in which the br An Optimized Hardware Architecture and Communication
2508.9 Implementing Multidestination Worms in Switch-Based Parallel Systems: .. - Craig Stunkel (1997)(Correct)
Multidestination message passing has been proposed as an attractive
mechanism for efficiently implementing multicast and other
collective operations on direct networks. However, applying this
mechanis... / on these systems to achieve high performance parallel computation. Many br traffic the central-buffer-based hardware multicast implementation affects
2504.2 Using Memory-Mapped Network Interfaces to Improve the Performance of.. - Leonidas Kontothanassis (1996)(Correct)
Shared memory is widely believed to provide an easier
programming model than message passing for expressing
parallel algorithms. Distributed Shared Memory (DSM)
systems provide the illusion of shared ... / International Conference on High Performance Computer Architecture br top of standard message passing hardware at very low implementation cost
2502.7 Cluster I/O with River: Making the Fast Case Common - Remzi Arpaci-Dusseau(Correct)
We introduce River, a data-flow programming environment
and I/O substrate for clusters of computers. River
is designed to provide maximum performance in the
common case --- even in the face of non-uni... / two simple design features a high-performance distributed queue and a br the face of non-uniformities in hardware software and workload. River is
2499.4 Requirements for Data-Parallel Programming Environments - Adve, Carle, Granston, Hiranandani.. (1994)(Correct)
this paper is to convey an understanding
of the tools and strategies that will be needed to adequately support efficient, machine-independent
data-parallel programming. To achieve our goal, we will exa... / languages such as Fortran D High Performance Fortran HPF and br extended to reflect the underlying hardware. For example distributed-memory
2496.3 Parallel Processing on Networks of Workstations: A Fault-Tolerant.. - Dasgupta, al. (1995)(Correct)
One of the most sought-after software innovations of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of su... / A Fault-Tolerant High Performance Approach Abstract One br systems and other software hardware artifacts. Such computations are
2493.8 Distributed and Parallel Database Systems - Özsu, Valduriez (1996)(Correct)
this paper, we present an overview of the distributed DBMS and parallel DBMS technologies,
highlight the unique characteristics of each, and indicate the similarities between them. This discussion
sho... / in order to deliver high-performance and high-availability br in order to deliver high-performance and high-availability database
2486.3 Languages and Tools for Real-time Systems: Problems, Solutions and.. - Gerber (1994)(Correct)
This report summarizes two talks I gave at the ACM SIGPLAN Workshop on Language,
Compiler, and Tool Support for Real-Time Systems, which took place on June 21, 1994, in
Orlando, Florida. The worksh... / tools CASE tool suites high-performance compilers etc. What I mean br buying some very expensive hardware which may not even be upgradable
2481.6 The MIT Alewife Machine - Agarwal, Bianchini, Chaiken, al (1991)(Correct)
A variety of models for parallel architectures such as shared memory,
message passing, and dataflow, have converged in the recent
past to a hybrid architecture form called distributed shared memory
(D... / allow programmers to write high-performance applications quickly. A br DSM By using a combination of hardware and software mechanisms DSM
2480.9 Unifying Data and Control Transformations for Distributed.. - Cierniak, Li (1994)(Correct)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / to be a serious obstacle to high performance on distributed shared memory br Most shared-memory machines both hardware and software based rely on data
2480.1 MRPC: A High Performance RPC System for MPMD Parallel Computing - Chang, Czajkowski, von Eicken (1997)(Correct)
MRPC is an RPC system that is designed and optimized for MPMD parallel computing. Existing systems based on
standard RPC incur an unnecessarily high cost when used on high-performance multi-computers,... / high cost when used on high-performance multi-computers limiting the br as well as of specialized hardware allowing it for example to
2479.5 Compiler Assisted Distributed Memory Parallelization of an Iterative.. - Pommerell, Rühl (1993)(Correct)
Distributed memory parallel processors (DMPPs) can deliver high peak performance comparable to
or higher than that of vector supercomputers while promising a better cost-performance ratio.
Programming, however, ... / a programming language like High Performance Fortran HPF support br low investment in computer hardware powerful and cheap
2473.1 Performance of the Galley Parallel File System - Nils Nieuwejaar (1996)(Correct)
As the I/O needs of parallel scientific applications increase,
file systems for multiprocessors are being designed to provide
applications with parallel access to multiple disks. Many
parallel file sy... / is capable of providing high-performance I O to applications that br physical limitations of storage hardware but a more significant reason
2467.1 Communication overlap in multi-tier parallel algorithms - Baden, Fink (1998)(Correct)
Hierarchically organized multicomputers such as
SMP clusters offer new opportunities and new challenges
for high-performance computation, but realizing
their full potential remains a formidable task.
... / and new challenges for high-performance computation but realizing br calculations realizing the hardware's potential remains a formidable
2465.7 BSP Clusters: High Performance, Reliable And Very Low Cost - Donaldson, Hill, Skillicorn (1998)(Correct)
We describe a transport protocol suitable for BSPlib programs running on a
cluster of PCs connected by a 100Mbps Ethernet switch. The protocol provides a
reliable packet-delivery mechanism that uses g... / Research Group Bsp Clusters High Performance Reliable And Very Low Cost br low-latency protocols on similar hardware but the addition of reliability
2464.3 Kaxiras@cs.wisc.edu - Sc Edu(Correct)
In this paper we propose Instruction-based Prediction as a means to optimize directory-based cache
coherent NUMA shared-memory. Instruction-based prediction is based on observing the behavior of load ... / used to transparently offer high performance while preserving programmers' br of Instruction-Based Prediction in Hardware SharedMemory Stefanos
2458.9 Performance Modeling of Distributed Memory Architectures - Johnsson (1991)(Correct)
We provide performance models for several primitive operations on data structures distributed
over memory units interconnected by a Boolean cube network. In particular, we
model single source, and mul... / of fundamental importance for high performance. We present analytic models br render a fair assessment of the hardware capabilities while modeling the
2456.5 Practical Parallel Algorithms for Personalized Communication and.. - Bader, Helman, JaJa (1995)(Correct)
A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms
which efficiently execute on general-purpose parallel machines. With the emergence of messa... / implementation will allow high performance implementations of a large br parallel algorithms. Each of our hardware platforms can be viewed as a
2454.3 Network Performance Under Hybrid Traffic Loads - Kim, Chien (1996)(Correct)
In actual multicomputer networks, communications consist of hybrid traffic in which messages
exhibit a variety of sizes. However to date, most studies on network performance are based on
traffic loads... / Background and Related Work High performance routing networks the subject br to packetization with different hardware requirements. Finally we study
2447.3 Optimizing Parallel Applications for Wide-Area Clusters - Bal, Plaat, Bakker, Dozy, Hofman (1998)(Correct)
Recent developments in networking technology cause a growing interest in connecting local-area
clusters of workstations over wide-area links, creating multilevel clusters. Often, latency and bandwidth... / into account most obtain high performance. The optimizations we used br for our research including the hardware and systems software. Also we
2444.6 The Impact of Data Transfer and Buffering Alternatives on Network.. - Mukherjee (1998)(Correct)
The explosive growth in the performance of microprocessors and networks has created a new opportunity to reduce the latency of fine-grain communication. Microprocessor clock speeds are now approaching... / International Symposium on High-Performance Computer Architecture HPCA br parallelism. Network hardware continues to advance towards
2442.2 StarT the Next Generation: Integrating Global Caches and Dataflow.. - Ang, Arvind, Chiou (1994)(Correct)
The implicitly parallel programming model provides an attractive approach to deal with the complexity
of parallel programming. Implementing this model efficiently, especially on stock processors, rema... / Fortran Fortran D and High Performance Fortran HPF is a very br as to make maximum use of existing hardware and software subsystems. This
2423.4 Process Introspection: A Checkpoint Mechanism for High Performance.. - Ferrari (1996)(Correct)
The Process Introspection project is a design and implementation effort, the main goal of which is to construct a general purpose, flexible, efficient checkpoint/restart mechanism appropriate for use ... / A Checkpoint Mechanism for High Performance Heterogeneous Distributed br computing and networking hardware have made the use of networks of
2420.3 A High-Performance, Portable Implementation of the MPI Message.. - Gropp (1996)(Correct)
MPI (Message Passing Interface) is a specification for a standard library for message
passing that was defined by the MPI Forum, a broadly based group of parallel computer
vendors, library writers, an... / A High-Performance Portable Implementation of br being followed the current hardware and software environment for
2415.5 Fine-grain Access Control for Distributed Shared Memory - Schoinas (1994)(Correct)
This paper discusses implementations of fine-grain memory
access control, which selectively restricts reads and
writes to cache-block-sized memory regions. Fine-grain
access control forms the basis of... / shared-memory machines achieve high performance by using hardware-intensive br require little or no additional hardware. These techniques permit