Hardware: High Performance

Tutorials/surveys/introductory articles (ordered by the degree of citation of authoritative articles)

This directory is created automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.

14758.6   A Survey of Multiprocessor Operating System Kernels - Mukherjee, Schwan, Gopinath (1993)
Multiprocessors have been accepted as vehicles for improved computing speeds, cost/performance, and enhanced reliability or availability. However, the added performance requirements of user programs a...

8455.8   Advanced Vector Architectures - Espasa (1997)
Vector architectures have long been the architecture of choice for numerical high performance computing. Their large memory bandwidth and the ability to tolerate relatively long memory latencies have ...

7911.2   Designing Memory Consistency Models For Shared-Memory Multiprocessors - Adve (1993)
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences both the performance and the programmability of the system. The simplest and most intuitive model for...

7792.0   Hardware Learning in Analogue VLSI Neural Networks - Lehmann (1994)
In this thesis we are concerned with the hardware implementation of learning algorithms for analogue VLSI artificial neural networks. Artificial neural networks (ANNs) are often successfully ...

7745.2   Mechanisms for Distributed Shared Memory - Reinhardt (1996)
Distributed shared memory (DSM) systems simplify the task of writing distributed-memory parallel programs by automating data distribution and communication. Unfortunately, DSM systems control memory an...

7735.1   Runtime Support For In-Core And Out-Of-Core Data-Parallel Programs - Thakur (1995)
Distributed memory parallel computers or distributed computer systems are widely recognized as the only cost-effective means of achieving teraflops performance in the near future. However, the fact re...

7308.8   Bandwidth And Latency Guarantees In Low-Cost, High-Performance.. - Kim (1997)
...ng limitations of existing solutions, we present a novel, cost-effective resource control algorithm for service guarantees. Such cost-effective service guarantees not only provide substantial benefits...

7234.9   Compiling for the Multiscalar Architecture - Vijaykumar (1998)
High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world...

6914.2   Multithreaded Architectures: Principles, Projects and Issues - Dennis, Gao (1994)
...this paper benefited from discussions about their architectures. Anoop Gupta has helped us in the understanding of the DASH architecture and its memory hierarchy. Finally, the second author would like...

6314.6   Future Research Directions In Problem Solving Environments For.. - Gallopoulos, Houstis, Rice (1991)
...this report was partially supported by Grant CCR-90-24549 from the National Science Foundation. This is a report to the National Science Foundation and other agencies; it is not a report by or of the ...

5981.4   High-Performance All-Software Distributed Shared Memory - Johnson (1995)
The C Region Library (CRL) is a new all-software distributed shared memory (DSM) system. CRL requires no special compiler, hardware, or operating system support beyond the ability to send and receive ...

5915.6   Architectures and Patterns for Developing High-performance, Real-time .. - Schmidt, Levine, Cleeland (1999)
Many types of applications can benefit from flexible and open middleware. CORBA is an emerging middleware standard for Object Request Brokers (ORBs) that simplifies the development of distributed appl...

5864.9   Efficient Machine-Independent Programming of High-Performance.. - Tseng (1995)
...mainly because the cost of interprocessor communication is too great compared to computation and local memory accesses [74, 77]. To achieve high performance, COSMIC will perform communication analysis ...

5839.0   Design and Implementation of a Multi-purpose Cluster System Network.. - Ang (1999)
Today, the interface between a high speed network and a high performance computation node is the least mature hardware technology in scalable general purpose cluster computing. Currently, the one-inte...

5780.2   Complexity-Effective Superscalar Processors - Palacharla (1998)
The performance trade-off between hardware complexity and clock speed in the design of superscalar microarchitectures is first investigated. Using the results of this trade-off analysis, the thesis pr...

5699.6   Distributed Runtime Support For Task And Data Management - Haines (1993)
High-performance computer architectures are evolving into larger and faster systems and, in particular, distributed memor...

5645.7   A Compiler-Directed Distributed Shared Memory System - Verma (1996)
...of the Dissertation: A Compiler-Directed Distributed Shared Memory System, by Manish Verma, Doctor of Philosophy in Computer Science, State University of New York at Stony Brook, 1996. This dissertation p...

5629.1   Resource Management for Responsive Web Computing - Bestavros, Chen, Crovella, Heddaya.. (1996)
...ion of generic classes for spatial operations, attributes, and indexing will leverage work in the IUE. Established algorithms will be used for spatial subdivision (quad-trees, R*-trees) of images base...

5455.8   The Design of the TAO Real-Time Object Request Broker - Schmidt, Levine, Mungee (1999)
Many real-time application domains can benefit from flexible and open distributed architectures, such as those defined by the CORBA specification. CORBA is an architecture for distributed object compu...

5243.8   VLIW Processor Codesign for Video Processing - Wilberg, Camposano (1997)
A codesign approach for complex video compression systems is presented. The system is based on a flexible and programmable VLIW (Very Long Instruction Word) architecture. The design approach can be ...

5090.7   Goodness Definition And Goodness Measure For High Speed Transport.. - Sebuktekin (1992)
Recent advances in optical communications, VLSI, and fiber-optic technologies have created new horizons for high-speed protocols and applications seeking end-to-end data transport at Gb/s speeds. In t...

5068.0   Massively Parallel Computing: Mathematics and communications libraries - Johnsson, Mathur (1993)
Massively parallel computing holds the promise of extreme performance. The utility of these systems will depend heavily upon the availability of libraries until compilation and run-time system techn...

5041.6   Synchronization, Coherence, and Consistency for High Performance.. - Dwarkadas (1992)
Although improved device technology has increased the performance of computer systems, fundamental hardware limitations and the need to build faster systems using existing technology have led many com...

4961.8   Report of the Working Group on Storage I/O for Large-Scale Computing - Gibson, Vitter, Wilkes (1996)
We discuss the strategic directions and challenges in the management and use of storage systems -- those components of computer systems responsible for the storage and retrieval of data. The performan...

4896.6   Interconnection Networks And Data Prefetching For Large-Scale.. - Kim (1995)
This memory access bottleneck is more serious in shared-memory multiprocessor systems, where processors and memory units are connected through interconnection networks. Processors cooperate to perform...

4832.7   The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture - Bedichek (1994)
The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture, by Robert C. Bedichek. Co-Chairpersons of Supervisory Committee: Professor Henry M. Levy, Professor Edward D. Lazowska. Department of C...

4830.8   An Application Perspective on High-Performance Computing and.. - Fox (1996)
We review possible and probable industrial applications of HPCC focusing on the software and hardware issues. Thirty-three separate categories are illustrated by detailed descriptions of five areas---...

4794.3   Code Optimizers and Register Organizations for Vector Architectures - Lee (1992)
A major challenge facing computer architects today is designing cost-effective hardware that executes multiple operations simultaneously. The goal of such designs is to improve performance by taking a...

4723.4   Software and Hardware Requirements for Some Applications of Parallel.. - Fox (1995)
We discuss the hardware and software requirements that appear relevant for a set of industrial applications of parallel computing. These are divided into 33 separate categories, and come from a recent...

4697.4   An Efficient Virtual Network Interface in the FUGU Scalable.. - Mackenzie (1998)
A scalable workstation is one vision of a mainstream parallel computer: a machine that combines scalable, fine-grain communication facilities for parallel applications with virtual memory and preempti...

4525.2   SPIN - An Extensible Microkernel for Application-specific Operating.. - Bershad, Chambers, Eggers, Maeda.. (1994)
Application domains, such as multimedia, databases, and parallel computing, require operating system services with high performance and high functionality. Existing operating systems provide fixed int...

4497.7   Region-Oriented Main Memory Management in Shared-Memory NUMA.. - Benjamin Gamsa (1992)
The need to achieve higher performance through greater degrees of parallelism necessitates distributing the memory throughout a multiprocessor system to reduce contention and increase scalability. Unf...

4482.4   Distributed Laboratories: A Research Proposal - Fujimoto, Schwan, Ahamad, Hudson.. (1996)
...this memory management to multi-granular distributed computing environments. Issues that must be addressed include the lack of a global memory pool and high communication latencies. New, efficient mem...

4466.8   Lazy Release Consistency for Distributed Shared Memory - Keleher (1995)
A software distributed shared memory (DSM) system allows shared memory parallel programs to execute on networks of workstations. This thesis presents a new class of protocols that has lower communicat...

4404.8   Optimizing Fortran 90D Programs for SIMD Execution - Roth (1993)
SIMD architectures offer an alternative to MIMD architectures for obtaining high performance computation through parallelism. These architectures can offer impressive price/performance ratios for cert...

4296.4   Naming, State Management, and User-Level Extensions in the Sprite.. - Welch (1990)
This memory use could be reduced by introducing a shared buffer pool, or setting the limit below 50 server processes. This limit is somewhat arbitrary because the server processes are multiplexed amon...

4288.4   Data Prefetching for High-Performance Processors - Chen (1993)
Data Prefetching for High-Performance Processors, by Tien-Fu Chen. Chairperson of Supervisory Committee: Professor Jean-Loup Baer, Department of Computer Science and Engineering. Recent technological adva...

4252.4   Loop Optimization for Aggregate Array Computations - Liu, Stoller (1997)
An aggregate array computation is a loop that computes accumulated quantities over array elements. Such computations are common in programs that use arrays, and the array elements involved in such com...

4187.7   Parallel Simulation Today - Nicol, Fujimoto (1994)
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallel...

4163.7   Directions in Parallel Programming: HPF, Shared Virtual Memory and.. - Bodin, Priol, Mehrotra, Gannon
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We...

4153.2   Asynchronous Parallel Game-Tree Search - Brockington (1998)
Tree searching is a fundamental and computationally intensive problem in artificial intelligence. Parallelization of tree-searching algorithms is one method of improving the speed of these algorithms....

4128.6   Data Layout Optimizations for High-Performance Architectures - Chau-Wen Tseng
...padding, transposing, and reindexing array dimensions, and modifying heap allocation policies. Most optimizations must be applied at compile time, but link-time and run-time optimizations are also pos...

4037.9   Performance, Safety and Idioms in Parallel Programming Systems - Lu (1995)
...ions are too low level. Many PPSs are designed around specific mechanisms, instead of around problem-solving techniques. The programmer is responsible for correctness and performance tuning. The need ...

3994.2   Massively Parallel Computing: Data distribution and communication - Johnsson (1993)
We discuss some techniques for preserving locality of reference in index spaces when mapped to memory units in a distributed memory architecture. In particular, we discuss the use of multidimensional ...

3975.3   Hardware Support for Flexible Distributed Shared Memory - Reinhardt, al.
Workstation-based parallel systems are attractive due to their low cost and competitive uniprocessor performance. However, supporting a cache-coherent global address space on these systems involves si...

3973.4   Rule-Based Program Restructuring For High Performance Parallel.. - Tenny (1992)
Writing good programs for high performance parallel computers is difficult. The programmer must have a deep understanding of the underlying machine architecture. Issues such as memory hierarchy, commu...

3922.3   Compiler Optimizations For Parallel Loops With Fine-Grained.. - Chen (1994)
...this paper, we presented and evaluated a new runtime algorithm to parallelize these loops. Our scheme handles any type of data dependence pattern without requiring any special architectural support. F...

3920.1   Shared Virtual Memory: A Survey - Shi, Hu, Tang (1998)
...SVM is an alias of softDSM. In the rest of this technical report, we will use them interchangeably. Although the hardware approach to implement DSM has been shown to perform quite wel...

3911.0   Extensibility, Safety and Performance in the SPIN Operating System - Bershad, Savage, Pardyak, Sirer.. (1995)
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure together with a core set of extensible services th...

3845.8   CTK: Configurable Object Abstractions for Multiprocessors - Silva, Schwan (1997)
...ions for Multiprocessors. Dilma M. Silva (Computer Science Department, University of São Paulo, São Paulo, Brazil), Karsten Schwan (College of Computing, Georgia Institute of Technology, Atlanta, GA 30332) di...

3834.9   WebOS: Software Support for Scalable Web Services - Amin Vahdat (1997)
The burgeoning popularity of the Web is pushing against the performance limits of the underlying infrastructure, presenting a number of difficult challenges for the Web as a system. We believe that re...

3785.4   Compiler Representations for Heterogeneous Processing - Weaver (1995)
The emergence of heterogeneous parallel systems opens the possibility of higher performance for complex, heterogeneous applications. Unfortunately, heterogeneous parallel systems are even more complex...

3769.6   Programmable Arithmetic Devices for High Speed Digital Signal.. - Chen
The high throughput computation requirements of real-time digital signal processing (DSP) systems usually dictate hardware-intensive solutions. Often attendant to hardware approaches are problems of h...

3740.4   The Design and Performance of an I/O Subsystem for Real-time ORB.. - Schmidt, Kuhns, Bector, Levine
There is increasing demand to extend Object Request Broker (ORB) middleware to support applications with stringent quality of service (QoS) requirements. However, conventional ORBs do not define stand...

3739.8   PASSION: Parallel And Scalable Software for Input-Output - Choudhary, Bordawekar, Harry.. (1994)
We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output, which provides software support for high performance parallel I/O. PASSION provides support at the la...

3733.7   Efficient Runtime Support for Cluster-Based Distributed Shared Memory .. - Speight (1997)
Distributed shared memory (DSM) systems provide a shared memory programming paradigm on top of a physically distributed network of computers. The DSM system removes the necessity for programmers to mo...

3716.9   Disk-directed I/O for MIMD Multiprocessors - Kotz (1996)
Many scientific applications that run on today's multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured w...

3716.0   Exploiting Fine-grain Parallelism in Concurrent Constraint Languages - Montelius (1997)
Montelius, J., 1997. Exploiting Fine-grain Parallelism in Concurrent Constraint Languages, 220 pp. Uppsala Thesis in Computing Science 28, ISSN 0283-359X, ISBN 91-506-1215-8. SICS Dissertation Series ...

3705.6   Performance Measurement Tools For High-level Parallel Programming.. - Irvin (1995)
Users of high-level parallel programming languages require accurate performance information that is relevant to their source code. Furthermore, when their programs experience performance problems at t...

3657.1   Efficient Reliable Group Communication For Distributed Systems - Kaashoek, Tanenbaum (1994)
Many applications can profit from broadcast communication, but few operating systems provide primitives that make broadcast communication available to user applications. In this paper we introduce pri...

3645.5   DPF: A Data Parallel Fortran Benchmark Suite - Yu Hu (1995)
The Data Parallel Fortran (DPF) benchmark suite is designed for evaluating data parallel compilers and scalable architectures. Many of the DPF codes are provided in three versions: basic, optimized an...

3637.6   Replication Using Group Communication Over a Partitioned Network - Amir (1995)
In systems based on the client-server model, a single server may serve many clients and the heavy load on the server may cause the response time to be adversely affected. In such circumstances, replic...

3602.0   Experience with a Clustered Parallel Reduction Machine - Beemster, Hartel, Hertzberger.. (1993)
A clustered architecture has been designed to exploit divide and conquer parallelism in functional programs. The programming methodology developed for the machine is based on explicit annotations and ...

3567.6   Alternative Analysis for Computational Holon Architectures - Zeigler, Vahie, Kim (1994)
Simulator; Appendix E. Examples of Human Performance Process Hierarchical Decomposition; Appendix F. Scalable Coherent Interfaces; Contents (c...

3540.3   Efficient MultiThreaded User-Space Transport for Network Computing.. - Gomez, Rego (1997)
We present a novel user-space and transaction-oriented protocol for use in high-performance distributed computing applications. The TRAP protocol is designed to support low-latency communication in mu...

3535.0   Performance Prediction and Network Media Selection for PVM Clusters - Steed (1996)
Workstation clusters are becoming more popular as a parallel computing platform. Several programming libraries, including Parallel Virtual Machine (PVM), allow workstation clusters to be programmed as...

3513.9   Transport System Architectures for High-Performance Communications.. - Schmidt, Suda (1993)
Providing end-to-end gigabit communication support for bandwidth-intensive distributed applications requires high-performance transport systems. This paper describes and classifies transport system mec...

3506.8   Runtime Mechanisms for Efficient Dynamic Multithreading - Karamcheti, Plevyak, Chien (1996)
High performance on distributed memory machines for programming models with dynamic thread creation and multithreading requires efficient thread management and communication. Traditional multithreadin...

3497.1   ASHs: Application-Specific Handlers for High-Performance Messaging - Wallach (1996)
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute...

3490.2   The Microarchitecture of Superscalar Processors - Smith, Sohi (1995)
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of ...

3487.5   Adaptive Operating System Abstractions: A Case Study of.. - Bodhisattwa Mukherjee (1994)
...ions: A Case Study of Multiprocessor Locks. Bodhisattwa Mukherjee (bodhi@cc.gatech.edu), Karsten Schwan (schwan@cc.gatech.edu). GIT-CC-94/39, 10 June 1994. Abstract: Operating system kernels typical...

3470.5   Compiling for Heterogeneous Systems: A Survey and an Approach - McKinley, Moss, Singhai, Weaver.. (1995)
Large applications tend to contain several models of parallelism, but only a few of these map efficiently to the single model of parallelism embodied in a homogeneous parallel system. Heterogeneous pa...

3455.8   Algorithmic Redistribution Methods for Block Cyclic Decompositions - Petitet, Dongarra (1998)   (Correct)
In a serial computational environment, transportable efficiency is the essential motivation for developing blocking strategies and block-partitioned algorithms. An algorithmic blocking factor adjust... / shown to be able to achieve high performance and efficiency for a given br to maximize the efficiency of the hardware resources. In a

3440.4   Parallel Rendering - Crockett (1995)   (Correct)
In computer graphics, rendering is the process by which an abstract description of a scene is converted to an image. When the scene is complex, or when high-quality images or high frame rates are requ... / as a primary driver of high-performance graphics systems. By the end br this situation. Today parallel hardware is routinely used in graphics

3438.4   Studies of Integration and Optimization of Interpreted and Compiled.. - Fox, Li, Wen, Zhang (1997)   (Correct)
an view our front end compiler as similar to the javac compiler's function of producing JavaVM bytecodes. The II/CVM will naturally need the study of such issues as Just in Time compilation, dynamic l... / of combining productivity with high performance. We intend to use Web br model within the same hardware technology. Thus a trend is to

3416.0   Pattern-Driven Automatic Parallelization - Christoph W. Keßler (1996)   (Correct)
This paper describes a knowledge-based system for automatic parallelization of a wide class of sequential numeric codes operating on vectors and dense matrices, and for execution on distributed memor... / Vienna Fortran CMZ High Performance Fortran HPF and others. br protocols buffering undocumented hardware features and other problems.

3410.9   Software Strategies for Portable Computer Energy Management - Lorch, Smith (1998)   (Correct)
Limiting the energy consumption of computers, especially portables, is becoming increasingly important. Thus, new energy-saving computer components and architectures have been and continue to be devel... / features have both high performance and low power modes with br created by existing and suggested hardware innovations. Introduction

3406.7   Filesystems for Network-Attached Secure Disks - Gibson, et al. (1997)   (Correct)
Network-attached storage enables network-striped data transfers directly between client and storage to provide clients with scalable bandwidth on large transfers. Network-attached storage also decoupl... / To be most effective all high-performance clients should be br encouraging the inclusion of hardware support for message digest to

3397.3   High Performance Software Coherence for Current and Future.. - Leonidas Kontothanassis (1994)   (Correct)
Shared memory provides an attractive and intuitive programming model for large-scale parallel computing, but requires a coherence mechanism to allow caching for performance while ensuring that processo... / High Performance Software Coherence for br end of this spectrum to the other. Hardware cache coherence is fast but

3380.6   Programming Techniques For Eagersharing Distributed Memory Systems - Ai Li (1993)   (Correct)
of the Dissertation PROGRAMMING TECHNIQUES FOR EAGERSHARING DISTRIBUTED MEMORY SYSTEMS by Ai Li Doctor of Philosophy in Computer Science State University of New York at Stony Brook 1993 To overcome th... / realize the potential for high performance and execution efficiency of br in this dissertation includes hardware and software iv techniques for

3336.7   High-Performance Local Area Communication With Fast Sockets - Rodrigues, Anderson, Culler (1997)   (Correct)
Modern switched networks such as ATM and Myrinet enable low-latency, high-bandwidth communication. This performance has not been realized by current applications, because of the high processing overhe... / High-Performance Local Area Communication With br to the ability of modern network hardware however. While TCP is capable

3298.1   An Empirical Evaluation of OS Support for Real-time CORBA Object.. - Levine, Flores-Gaitan, Schmidt (1999)   (Correct)
There is increasing demand to extend Object Request Broker (ORB) middleware to support distributed applications with stringent real-time requirements. However, lack of proper OS support can yield subs... / Overview Of Tao Tao Is A High-Performance Real-Time Orb Endsystem br ORB middleware. While holding the hardware and ORB constant we vary the

3285.5   Optimizing a CORBA Inter-ORB Protocol (IIOP) Engine for Minimal.. - Gokhale, Schmidt   (Correct)
To support the quality of service (QoS) requirements of embedded multimedia applications, such as real-time audio and video, electronic mail and fax, and Internet telephony, off-the-shelf middleware li... / for TAO which is our high-performance real-time ORB. Second we br imposed by embedded system hardware necessitates a minimal footprint

3283.2   Processor Allocation Policies for Message-Passing Parallel Computers - McCann (1994)   (Correct)
When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. This... / the potential for achieving high performance scalability and br . The Hardware and Software Environment

3248.7   Using Information from the Programmer to Implement System.. - Adve (1996)   (Correct)
The memory consistency model of a shared-memory system is a formal specification of the semantics of shared memory. The most commonly assumed model, sequential consistency, provides simple semantics bu... / but is not easily amenable to high performance. Researchers have proposed br uniprocessor systems including hardware and compiler overlap and reorder

3247.2   A Survey of User-Level Network Interfaces for System Area Networks - Mukherjee (1997)   (Correct)
System Area Networks (SANs), such as Myricom Myrinet and IBM Vulcan, provide latency, bandwidth, and reliability that are orders of magnitude better than traditional local area networks. SAN benefits ... / network. The demand for high performance communication subsystems-to br the operating system and the hardware latency to access e.g. read

3244.6   Parallelising Serial Code: A Comparison Of Three High-Performance.. - MacLaren (1997)   (Correct)
Declaration; Copyright; Acknowledgements; 1 Introduction; 1.1 Motivation; 1.2 Aims ... / Code A Comparison Of Three High-Performance Parallel Programming Methods br . Abstract Two hardware-based parallel programming

3240.9   GLUnix: a Global Layer Unix for a Network of Workstations - Ghormley (1997)   (Correct)
ions To provide remote execution of both parallel and sequential jobs, GLUnix extends some existing UNIX abstractions and introduces new abstractions, borrowing heavily from MPP environments such as ... / By leveraging commodity high-performance workstations and networks br applications. Although viable hardware solutions are available today

3240.5   Overview of neural hardware - Heemskerk (1995)   (Correct)
Neural hardware has undergone rapid development during the last few years. This paper presents an overview of neural hardware projects within industries and academia. It describes digital, analog, and... / in many cases very high performance rates have been obtained. br Overview of neural hardware Jan N. H. Heemskerk Unit of

3232.2   Placement of Objects in Parallel Object-Based Systems - Ghandeharizadeh, Wilhite (1994)   (Correct)
Parallelism is a viable solution to constructing high performance object-oriented database systems. This paper analyzes the role of parallelism in such systems. In parallel systems based on a shared-n... / solution to constructing high performance object-oriented database br SQL a Distributed High-Performance High Availability Implementation

3218.6   Alleviating Priority Inversion and Non-determinism in Real-time CORBA .. - Schmidt, Mungee, Gokhale (1998)   (Correct)
There is increasing demand to extend CORBA to support applications with stringent real-time requirements. However, conventional CORBA Object Request Brokers (ORBs) exhibit substantial priority inversi... / components that support high-performance real-time applications br objects are written in what OS hardware platform they run on or what

3198.4   An Overview of Rewrite Rule Laboratory (RRL) - Kapur, Zhang (1995)   (Correct)
RRL (Rewrite Rule Laboratory) was originally developed as an environment for experimenting with automated reasoning algorithms for equational logic based on rewrite techniques. It has now matured into... / specific purposes. Herky High-Performance Key Operations is a fast br the use of formal methods in hardware and software design. We provide

3192.8   Mechanisms for Efficient, Protected Messaging - Lee   (Correct)
Fine-grain parallelism is the key to high performance multicomputing. By partitioning problems into small sub-tasks -- grain-sizes as small as 70 cycles have been found in common benchmark programs -- ... / parallelism is the key to high performance multicomputing. By br network interfaces fast hardware is defeated by software layers

3188.2   Willow: A Scalable Shared-Memory Multiprocessor - Bennett, Dwarkadas, Greenwood.. (1992)   (Correct)
We are currently developing Willow, a shared-memory multiprocessor whose design provides system capacity and performance capable of supporting over a thousand commercial microprocessors. Most recently... / Virtually all designers of high performance computers confronted with br simulators one a detailed hardware-level simulator the other an

3174.0   A Conflict-Free Memory Design For Multiprocessors - Shing (1991)   (Correct)
A CONFLICT-FREE MEMORY DESIGN FOR MULTIPROCESSORS By Honda Shing Multiprocessors have been widely used in achieving high performance computation. In a multiprocessor, applications are implemented wit... / been widely used in achieving high performance computation. In a br and still maintain good performance and high efficiency. Furthermore

3161.7   Tender to III/97/31 Lot 5, Deliverable 1.1 - DISCO Report on the.. - Bertozzi, Chiola, Ciaccio, Conte.. (1998)   (Correct)
This report surveys the state of the art of Cluster Computing based mainly on low-cost PC or workstation technology. Real industrial applications as well as EU funded and international University/Res... / High performance interconnect technology br Hardware and Software Technology

3124.9   X Vision: A Portable Substrate for Real-Time Vision Applications - Hager, Toyama (1996)   (Correct)
In the past several years, the speed of standard processors has reached the point where interesting problems requiring visual tracking can be carried out on standard workstations. However, relatively ... / vision which provides high performance on standard workstations br is accelerated using specialized hardware for a notable exception see

3117.0   Compiler Technology for Future Microprocessors - Hwu, Hank, Gallagher, Mahlke.. (1995)   (Correct)
Advances in hardware technology have made it possible for microprocessors to execute a large number of instructions concurrently (i.e., in parallel). These microprocessors take advantage of the opport... / Center for Reliable and High-Performance Computing University of br Abstract Advances in hardware technology have made it possible

3099.1   Massively Parallel Programming Languages - A Classification of Design .. - Gellerich, Gutzmann   (Correct)
This paper presents the results of a study in which we examined about 50 parallel programming languages in order to detect typical approaches towards supporting massive parallelism. Based on a classif... / parallelism to achieve high performance and therefore excluded br parallel structures available in hardware yielding a mapped

3083.5   Scheduling Threads for Low Space Requirement and Good Locality - Girija Narlikar (1999)   (Correct)
The running time and memory requirement of a parallel program with dynamic, lightweight threads depends heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, sp... / used work stealing to provide high performance br space efficient . Today's hardware-coherent shared memory

3083.0   Scheduler-Conscious Synchronization - Kontothanassis, Wisniewski, Scott (1994)   (Correct)
Efficient synchronization is important for achieving good performance in parallel programs, especially on large-scale multiprocessors. Most synchronization algorithms have been designed to run on a de... / Science and Technology-High Performance Computing Software Science br ones provide more sophisticated hardware support for synchronization.

3072.0   Multiprocessor Cache Coherence Based on Virtual Memory Support - Petersen, Li (1995)   (Correct)
Virtual memory based cache coherence is a mechanism that relies only on hardware that already exists on the microprocessors of a shared memory multiprocessor system, yet dynamically detects and res... / which focuses on building high-performance multicomputers from commodity br is a mechanism that relies only on hardware that already exists on the

3052.9   Architectural Considerations for Deterministic Real-Time ORB.. - Levine, Schmidt, Gill (1997)   (Correct)
There is increasing demand to extend object-oriented middleware to support applications with stringent quality of service (QoS) requirements. However, conventional object-oriented middleware does not ... / components that support high-performance real-time applications and br objects are written in what OS hardware platform they run on or what

3045.7   Implementation of Stack-Based Languages on Register Machines - Ertl (1996)   (Correct)
Languages with programmer-visible stacks (stack-based languages) are used widely, as intermediate languages (e.g., JavaVM, FCode), and as languages for human programmers (e.g., Forth, PostScript). How... / processor architecture in high-performance computers from br efficiently on mainstream hardware using aggressive compiler

3044.1   Architectural Support for Single Address Space Operating Systems - Koldinger, Chase, Eggers (1992)   (Correct)
Recent microprocessor announcements show a trend toward wide-address computers: architectures that support 64 bits of virtual address space. Such architectures facilitate fundamentally new operating s... / This simplifies the use of high-performance virtually indexed data br protection lookaside buffer a hardware structure that implements this

3041.3   Compiler Architectures for Heterogeneous Systems - McKinley, Singhai, Weaver, Weems (1995)   (Correct)
Heterogeneous parallel systems incorporate diverse models of parallelism within a single machine or across machines and are better suited for diverse applications [25, 43, 30]. These systems are alr... / resources for achieving high performance. Unfortunately heterogeneity br use of heterogeneous hardware to execute a single application

3038.7   Software-Managed Address Translation - Jacob, Mudge (1997)   (Correct)
In this paper we explore software-managed address translation. The purpose of the study is to specify the memory management design for a high clock-rate PowerPC implementation in which a simple design... / International Symposium on High Performance Computer Architecture br is just as efficient as hardware managed address translation

3036.6   Effects of Communication Latency, Overhead, and Bandwidth in a.. - Martin, Vahdat, Culler, Anderson (1997)   (Correct)
This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the ... / on parallel applications in a high performance network of workstations. We br realistic inputs on a flexible hardware prototype that can vary its

3031.3   A Multithreaded Communication System for ATM-Based High Performance.. - Park, Lee   (Correct)
Current advances in processor technology and the rapid development of high-speed networking technology (e.g., Asynchronous Transfer Mode (ATM), Myrinet, and Fast Ethernet) have made network-based comp... / System for ATM-Based High Performance Distributed Computing br and MPCs where the networking hardware and communication software are

3028.5   HFS: A Performance-Oriented Flexible File System Based on.. - Krieger (1996)   (Correct)
ing with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and... / have poor support for high performance I/O and as a result the br file systems both because of the hardware resources it must manage and

3009.8   Message Passing Support for Multi-grained, Multi-threading, and.. - Ang, Chiou, Rudolph, Arvind (1996)   (Correct)
In order to become generally useful, message passing mechanisms not only need to provide high performance, but also the three M's: multi-granularity, multi-threading and multiprocessing. In this paper... / not only need to provide high performance but also the three M's br thread. Proper design of the hardware network interface can

2995.7   Measuring and Optimizing CORBA Latency and Scalability Over.. - Gokhale, Schmidt (1998)   (Correct)
There is increasing demand to extend object-oriented middleware, such as OMG CORBA, to support applications with stringent quality of service (QoS) requirements. However, conventional CORBA Object Req... / in TAO which is a high-performance real-time implementation of br backplanes and shared memory. Hardware CORBA shields applications from

2962.8   Hardware Support for Dynamic Access Ordering: Performance of Some.. - McKee (1993)   (Correct)
Hardware Support for Dynamic Access Ordering: Performance of Some Design Options Sally A. McKee Department of Computer Science University of Virginia Charlottesville, VA, 22903 mckee@virginia.edu Mem... / in the application of high performance microprocessors to br Hardware Support for Dynamic Access

2961.5   Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)   (Correct)
Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors Sally A. McKee Department of Computer Science University of Virginia Charlottesville, VA 22903 mckee@cs.virginia.edu Memory bandwidth... / in the application of high performance microprocessors to br This paper describes the use of hardware-assisted access ordering in

2957.0   A Synopsis of the Legion Project - Grimshaw, Wulf, French, Weaver, Jr. (1994)   (Correct)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer comprised of a variety of geographically distributed high-performance machines and workstations. ... / of geographically distributed high-performance machines and workstations. To br These are software problems the hardware challenges are being addressed

2952.1   The Nexus Approach to Integrating Multithreading and Communication - Foster (1996)   (Correct)
Lightweight threads have an important role to play in parallel systems: they can be used to exploit shared-memory parallelism, to mask communication and I/O latencies, to implement remote memory acces... / threads and communication in high-performance distributed-memory systems. br handlers At the lower-performance higher-functionality end of the

2952.1   Analysis and Applications of Receptive Safety Properties in.. - Matos   (Correct)
Formal verification for complex concurrent systems is a computationally intensive and, in some cases, intractable process. The complexity is an inherent part of the verification process due to the ... / is required in reliable and high performance systems. We can specify the br user profiles heterogeneous hardware distributed execution

2947.5   Parallel Algorithms For Test Generation And Fault Simulation - Patil (1990)   (Correct)
INTRODUCTION 1.1. Parallel Processing for VLSI CAD With the increased complexity of VLSI circuits, existing Computer-Aided Design (CAD) algorithms will not be able to handle large circuits in a reason... / called the HIPERCAD High Performance CAD project whose br interfaces. Parallel processing hardware has also become more affordable

2946.5   Experience with a Distributed File System Implementation - Wang, Anderson, Dahlin (1997)   (Correct)
this paper we report on some of the lessons we have learned from the implementation effort. We believe our experience may offer insight for future system builders and encourage the development of new ... / The recent emergence of high-performance local area networks br in xFS is close to those seen in hardware DSM systems such as DASH

2945.2   Phoneme Probability Estimation with Dynamic Sparsely Connected.. - Ström (1997)   (Correct)
This paper presents new methods for training large neural networks for phoneme probability estimation. An architecture combining time-delay windows and recurrent connections is used to capture the imp... / for robust training of large high performance ANNs based on sparsely br is trained with special parallel hardware and a rather complex training

2935.3   Exploiting Multiprocessor Memory Hierarchies For Operating Systems - Xia (1996)   (Correct)
d this mentorship into a joyful and valuable life experience. Working very closely together, under his guidance, we persisted through numerous difficult times together as well as shared many happy rew... / memory hierarchy is key to high performance. However the operating br trade-offs of the software hardware optimization schemes are also

2910.8   A Numerical Linear Algebra Problem Solving Environment Designer's.. - Petitet, Casanova, Dongarra, Robert, .. (1998)   (Correct)
This chapter discusses the design of modern numerical linear algebra problem solving environments. Particular emphasis is placed on three essential components out of which such environments are constr... / impact of the architecture of high performance computers on the design of br understanding of the exploitable hardware resources of that architecture.

2904.2   PANDA - Supporting Distributed Programming in C++ - Assenmacher, Breitbach, Buhler.. (1993)   (Correct)
PANDA is a run-time package based on a very small operating system kernel which supports distributed applications written in C++. It provides powerful abstractions such as very efficient user-level ... / Presto Faust and Levy high performance and flexibility of br PANDA has been designed for a hardware platform consisting of a network

2901.3   The MIT Alewife Machine: A Large-Scale Distributed-Memory.. - Agarwal, Chaiken, Johnson, Kranz.. (1991)   (Correct)
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low dimension direct interconnection network to provide scalable communi... / processor. Introduction High-performance computer design is driven by br and concentrates on the novel hardware features of the machine including

2886.3   Cache Performance of Garbage-Collected Programs - Reinhold (1994)   (Correct)
As processor speeds continue to improve relative to main-memory access times, cache performance is becoming an increasingly important component of program performance. Prior work on the cache perfor... / A cache miss on current high-performance machines costs tens of br This widening gap has motivated hardware designers to seek improved

2881.8   Message Passing Support on StarT-Voyager - Computation Structures   (Correct)
No single message passing mechanism can efficiently support all the different types of communication that occur naturally in most parallel or distributed programs. MIT's StarT-Voyager, a hybrid messag... / mechanisms to achieve very high performance over a wide spectrum of br of communication types and sizes. Hardware and operating system enforced

2875.9   Application Performance and Flexibility on Exokernel Systems - Kaashoek, Engler, Ganger.. (1997)   (Correct)
The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exo... / applications to achieve high performance without sacrificing the br software efficient control over hardware and software resources by

2866.9   ASPEN: High-Performance Hardware Support for Distributed Shared-Memory - Maxham (1994)   (Correct)
ASPEN: High-Performance Hardware Support for Distributed Shared-Memory Kenneth Mark Maxham This thesis describes and evaluates an integrated memory and network subsystem designed to provide the abstr... / Rice University Aspen High-Performance Hardware Support For br ASPEN High-Performance Hardware Support for Distributed

2859.9   Sather 2: A Language Design for Safe, High-Performance Computing - Gomes, Löwe, Quittek, Weissman (1997)   (Correct)
Consistency of objects in a concurrent computing environment is usually ensured by serializing all incoming method calls. However, for high performance parallel computing intra-object parallelism, i.e... / A Language Design for Safe High-Performance Computing Benedict Gomes br networks Myrinet ATM parallel hardware platforms are now more widely

2851.1   Synchronization and Communication in the T3E Multiprocessor - Scott (1996)   (Correct)
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. We discuss what we have learned from the T3D p... / programming model e.g. High Performance Fortran HPF or the br memories. Load store performance highlights the memory pipelining

2829.5   Parallel Computers and Complex Systems - Fox, Coddington (1994)   (Correct)
We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex pr... / the art and future trends in high performance parallel and distributed br before the end of the decade. Hardware trends imply that all computers

2824.2   Issues in Autonomous Mobile Robot Navigation - Singhal (1997)   (Correct)
Three main problems facing outdoor autonomous mobile robot navigation are unstructured environments, moving obstacles, and multiple sensors. Each of these leads to uncertainties that usually cannot be... / passed off as robots. Today's high performance world demands precise and br Beacons for Localization B Hardware for Proposed Experimentation

2805.9   Lessons from FTM: an Experiment in the Design and Implementation of a .. - Muller, et al. (1995)   (Correct)
This report describes an experiment in the design of a general purpose fault tolerant system, FTM. The main objective of the FTM design was to implement a "low-cost" fault tolerant system that could... / our approach relies on a high performance stable storage br which can be implemented either by hardware or software. We first motivate

2805.0   Falcon: On-line Monitoring and Steering of Large-Scale Parallel.. - Gu (1995)   (Correct)
Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such interactive steering is to improve its performance or to affect its execution behavior. The... / Introduction The high performance of current parallel br basis. Falcon runs on several hardware platforms including the Kendall

2804.2   Implementing Fine-Grain Distributed Shared Memory On Commodity SMP.. - Schoinas (1996)   (Correct)
This paper reports our experience implementing the Blizzard fine-grain distributed shared memory system on a network of unmodified dual-processor workstations running a commercial operating system. Th... / shared memory but that high performance requires either custom br optimized software commodity hardware and custom hardware a

2798.3   Vision-Based Road Detection in Automotive Systems: A Real-Time.. - Broggi, Bertè (1995)   (Correct)
The main aim of this work is the development of a vision-based road detection system fast enough to cope with the difficult real-time constraints imposed by moving vehicle applications. The hardware p... / Figure .b But due to the high performance levels achieved it will be br moving vehicle applications. The hardware platform a special-purpose

2797.6   Object Models for Distributed or Persistent Programming - Cahill, Nixon (1997)   (Correct)
As use of object orientation for application development has increased, many researchers have investigated the design of object-based programming languages for the distributed and persistent programmi... / as well as the traditional high performance community. A distributed br consideration for the underlying hardware except when this is part of the

2773.0   Flexibility and Performance of Parallel File Systems - Kotz, Nieuwejaar (1996)   (Correct)
Many scientific applications for high-performance multiprocessors have tremendous I/O requirements. As a result, the I/O system is often the limiting factor of application performance. Several new par... / scientific applications for high-performance multiprocessors have br systems with sufficient I/O hardware Kot Most of today's

2758.9   Implementing Multidestination Worms in Switch Based Parallel Systems: .. - Craig Stunkel   (Correct)
Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanis... / on these systems to achieve high performance parallel computation. Many br traffic the central-queue based hardware multicast implementation affects

2756.6   An Interaction of Coherence Protocols and Memory Consistency Models.. - Shi, Hu, Tang (1997)   (Correct)
Coherence protocols and memory consistency models are two important issues in hardware coherent shared memory multiprocessors and software distributed shared memory (DSM) systems. Over the years, many ... / Zhimin Tang Center of High Performance Computing Institute of br models are two important issues in hardware coherent shared memory

2755.3   MORPH: A System Architecture for Robust High Performance Using.. - Chien, Gupta (1996)   (Correct)
Achieving 100 TeraOps performance within a ten-year horizon will require massively-parallel architectures that exploit both commodity software and hardware technology for cost efficiency. Increasing cl... / System Architecture for Robust High Performance Using Customization An br both commodity software and hardware technology for cost efficiency.

2754.4   Architectural Mechanisms for Explicit Communication in Shared Memory.. - Ramachandran, Shah, Sivasubramaniam, .. (1995)   (Correct)
The goal of this work is to explore architectural mechanisms for supporting explicit communication in cache-coherent shared memory multiprocessors. The motivation stems from the observation that applic... / approach to achieving high performance in shared address space br protocols with little or no added hardware complexity. We show that with a

2739.8   Informing Loads: Enabling Software To Observe And React To Memory.. - Horowitz, Martonosi, Mowry, Smith (1995)   (Correct)
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem succes... / is already present in today's high-performance processors. Key Words and br cannot be adequately solved by hardware alone. Several promising software

2732.6   Indigo: User-level Support for Building Distributed Shared.. - Prince Kohli (1995)   (Correct)
Prince Kohli, Mustaque Ahamad, Karsten Schwan. College of Computing, Georgia Institute of Technology. June 12, 1996. Abstract: Distributed systems that consist of workstations connected by high p... / of workstations connected by high performance interconnects offer br the programmability of such hardware by presenting to application

2728.8   Lazy Threads: Implementing a Fast Parallel Call - Goldstein, Schauser, Culler (1996)   (Correct)
In this paper we describe lazy threads, a new approach for implementing multi-threaded execution models on conventional machines. We show how they can implement a parallel call at nearly the efficienc... / thread of control such as High Performance Fortran or a fixed set br parallel execution directly in hardware In many cases the

2717.7   Calypso: An Environment for Reliable Distributed Parallel Processing - Baratloo, Dasgupta, Kedem (1995)   (Correct)
The importance of adapting networks of workstations for use as parallel processing platforms is well established. However, current solutions do not always satisfactorily address important issues that ... / th IEEE Intl. Symp. on High Performance Distributed Computing br COTS Commercial Off-The-Shelf hardware software and operating systems.

2714.7   Active tracking of foveated feature clusters using affine structure - Reid, Murray (1996)   (Correct)
We describe a novel method of obtaining a fixation point on a moving object for a real-time gaze control system. The method makes use of a real-time implementation of a corner detector and tracker a... / for the method in use with a high performance head eye platform and br in sluggish and unreliable performance -highlight the need for a

2707.8   Enterprise: An Interactive Graphical Programming Environment For.. - Chan, Lu, al. (1992)   (Correct)
Workstation environments have been in use for more than a decade now. Although a network of workstations together represents a large amount of aggregate computing power, single users often cannot util... / the need for more costly high performance computers and utilizing br programs in a distributed hardware environment. Enterprise code

2706.0   Non-Blocking Algorithms and Preemption-Safe Locking on.. - Michael, Scott (1998)   (Correct)
Most multiprocessors are multiprogrammed in order to achieve acceptable response time and to increase their utilization. Unfortunately, inopportune preemption may significantly degrade the performance... / Science and Technology-High Performance Computing Software Science br Another alternative is hardware partitioning under which no two

2705.6   BSPlib: The BSP Programming Library - Hill, McColl, Stefanescu, Goudreau.. (1998)   (Correct)
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates t... / be able to run unchanged with high performance on any general purpose br provide a clear focus for future hardware developments. For a model to

2702.5   An Efficient and Scalable Approach for Implementing Fault Tolerant.. - Morin, al. (1997)   (Correct)
Distributed Shared Memory (DSM) architectures are attractive to execute high performance parallel applications. Made up of a large number of components, these architectures have however a high proba... / are attractive to execute high performance parallel applications. Made br despite a significant increase in hardware reliability these architectures

2693.7   Communication Throughput of Interconnection Networks - Monien, Diekmann, Lüling (1994)   (Correct)
Modern flow control techniques used for massively parallel computers have made network capacity a more important parameter for the application performance than network latency. Network latency is us... / a call for proposal for the US High Performance Computing project HPC br in their programming models and hardware realizations full custom design

2672.6   Data placement in shared-nothing parallel database systems - Mehta, DeWitt (1994)   (Correct)
Data placement in shared-nothing database systems has been studied extensively in the past and various placement algorithms have been proposed. However, there is no consensus on the most... / management is essential for high performance in such large systems. An br The results show that current hardware technology trends have

2657.3   Use of Computational Kernels in Full and Sparse Linear Solvers.. - Daydé, Duff (1996)   (Correct)
We believe that the availability of portable and efficient serial and parallel numerical libraries that can be used as building blocks is extremely important for both simplifying application software ... / Efficient Code Design on High-Performance RISC Processors br as efficiently as possible the hardware of high-performance computers

2651.3   ADAPTIVE: A Flexible and Adaptive Transport System Architecture to.. - Schmidt, Box, Suda (1992)   (Correct)
Transport systems integrate operating system services such as memory and process management together with communication protocols that utilize these OS services to support distributed applications run... / for Multimedia Applications on High-Performance Networks Douglas C. br and general operating system hardware and software factors such as

2648.5   Compiling for Shared-Memory and Message-Passing Computer - Larus (1994)   (Correct)
Many parallel languages presume a shared address space in which any portion of a computation can access any datum. Some parallel computers directly support this abstraction with hardware shared memory... / reduces communication High-Performance Fortran HPF for example br support this abstraction with hardware shared memory. Other computers

2646.7   Parallel Algorithms For CAD With Applications To Circuit Extraction - Belkhale (1991)   (Correct)
Introduction. 1.1 Parallel Processing for CAD. As the sizes of VLSI circuits increase in the future, the computational requirements for performing various computer-aided design (CAD) tasks such as sim... / In the HIPERCAD HIgh PERformance CAD project currently br the use of special-purpose hardware accelerators. Special-purpose

2607.5   A Kernel Implementation of Distributed Shared Memory on a Network of.. - Brett Fleisch (1994)   (Correct)
We describe the evolution of a distributed shared memory (DSM) system, Mirage, and the difficulties encountered when moving the system from a UNIX-based kernel on the VAX to a UNIX-based kernel on... / Our goal was to design a high performance DSM system. However an br improved on conventional hardware by applying three well-known

2602.7   A Selective Caching Technique - John, Radhakrishnan   (Correct)
Efficient caches are extremely important for achieving good performance from modern high performance processors. Conventional cache architectures exploit locality, but do so rather blindly. Since all ... / good performance from modern high performance processors. Conventional br It does not require complex hardware as in or detailed cache

2589.8   The Zebra Striped Network File System - Hartman, Ousterhout (1993)   (Correct)
Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a ... / file system. This provides high performance for writes of small files as br Sprite file system on the same hardware. For small files the Zebra

2581.5   Replication Techniques For Speeding Up Parallel Applications On.. - Henri Bal (1992)   (Correct)
This paper discusses the design choices involved in replicating objects and their effect on performance. Important issues are: how to maintain consistency among different copies of an object; how to i... / is intended for parallel high-performance applications. Orca is not br of Orca on different hardware configurations have been in use

2579.5   Fast Messages (FM): Efficient, Portable Communication for Workstation .. - Pakin, Karamcheti, Chien (1997)   (Correct)
Illinois Fast Messages (FM) is a low-level software messaging layer designed to meet the demands of high performance network hardware. It delivers much of the hardware's raw performance to both applic... / to meet the demands of high performance network hardware. It delivers br of high performance network hardware. It delivers much of the

2562.1   Improving the Parallelism and Concurrency in Decoupled Architectures - K.J., Naresh.C   (Correct)
This paper investigates a technique to facilitate anticipatory loading to queues even in presence of data dependent control dependencies. The proposed method consists of fetching along one or both pat... / between them and yield high performance and increased flexibility. br dynamically scheduled processors hardware a scoreboard could reorder

2547.8   The Effects of High-Performance Processors, Real-Time Priorities and.. - Claypool, Habermann, Riedl (1998)   (Correct)
Multimedia applications have the potential to enhance work for teams of users collaborating across distances. Jitter hampers the effectiveness of these multimedia applications. Jitter is the variation... / July The Effects of High-Performance Processors Real-Time br over the use of specialized hardware corporate and academic

2537.7   An Optimized Hardware Architecture and Communication Protocol for.. - Shoemaker (1997)   (Correct)
Managing communications in parallel processing systems has proven to be one of the most critical problems facing designers. As processor speeds continue to increase, communication latency and bandwidt... / the construction of modular high-performance digital systems in which the br An Optimized Hardware Architecture and Communication

2508.9   Implementing Multidestination Worms in Switch-Based Parallel Systems: .. - Craig Stunkel (1997)   (Correct)
Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanis... / on these systems to achieve high performance parallel computation. Many br traffic the central-buffer-based hardware multicast implementation affects

2504.2   Using Memory-Mapped Network Interfaces to Improve the Performance of.. - Leonidas Kontothanassis (1996)   (Correct)
Shared memory is widely believed to provide an easier programming model than message passing for expressing parallel algorithms. Distributed Shared Memory (DSM) systems provide the illusion of shared ... / International Conference on High Performance Computer Architecture br top of standard message passing hardware at very low implementation cost

2502.7   Cluster I/O with River: Making the Fast Case Common - Remzi Arpaci-Dusseau   (Correct)
We introduce River, a data-flow programming environment and I/O substrate for clusters of computers. River is designed to provide maximum performance in the common case --- even in the face of non-uni... / two simple design features a high-performance distributed queue and a br the face of non-uniformities in hardware software and workload. River is

2501.0   Automatic Differentiation Of Advanced CFD Codes For Multidisciplinary .. - Bischof, Corliss, Green, Griewank.. (1992)   (Correct)
This paper addresses one such synergism for computa-... Automatic Differentiation of Advanced CFD Codes for Multidisciplinary Design. C. Bischof, G. Corliss, Argonne National Laboratory, Argonne, ... / CAS grand challenge of the High Performance Computing and Communications br procedures. Advances in computer hardware and software electronic

2499.4   Requirements for Data-Parallel Programming Environments - Adve, Carle, Granston, Hiranandani.. (1994)   (Correct)
this paper is to convey an understanding of the tools and strategies that will be needed to adequately support efficient, machine-independent data-parallel programming. To achieve our goal, we will exa... / languages such as Fortran D High Performance Fortran HPF and br extended to reflect the underlying hardware. For example distributed-memory

2496.3   Parallel Processing on Networks of Workstations: A Fault-Tolerant.. - Dasgupta, al. (1995)   (Correct)
One of the most sought-after software innovations of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of su... / A Fault-Tolerant High Performance Approach Abstract One br systems and other software hardware artifacts. Such computations are

2493.8   Distributed and Parallel Database Systems - Özsu, Valduriez (1996)   (Correct)
In this paper, we present an overview of the distributed DBMS and parallel DBMS technologies, highlight the unique characteristics of each, and indicate the similarities between them. This discussion sho... / in order to deliver high-performance and high-availability br in order to deliver high-performance and high-availability database

2486.3   Languages and Tools for Real-time Systems: Problems, Solutions and.. - Gerber (1994)   (Correct)
This report summarizes two talks I gave at the ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Real-Time Systems, which took place on June 21, 1994, in Orlando, Florida. The worksh... / tools CASE tool suites high-performance compilers etc. What I mean br buying some very expensive hardware which may not even be upgradable

2481.6   The MIT Alewife Machine - Agarwal, Bianchini, Chaiken, al (1991)   (Correct)
A variety of models for parallel architectures such as shared memory, message passing, and dataflow, have converged in the recent past to a hybrid architecture form called distributed shared memory (D... / allow programmers to write high-performance applications quickly. A br DSM By using a combination of hardware and software mechanisms DSM

2480.9   Unifying Data and Control Transformations for Distributed.. - Cierniak, Li (1994)   (Correct)
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations inv... / to be a serious obstacle to high performance on distributed shared memory br Most shared-memory machines both hardware and software based rely on data

2480.1   MRPC: A High Performance RPC System for MPMD Parallel Computing - Chang, Czajkowski, von Eicken (1997)   (Correct)
MRPC is an RPC system that is designed and optimized for MPMD parallel computing. Existing systems based on standard RPC incur an unnecessarily high cost when used on high-performance multi-computers,... / high cost when used on high-performance multi-computers limiting the br as well as of specialized hardware allowing it for example to

2479.5   Compiler Assisted Distributed Memory Parallelization of an Iterative.. - Pommerell, Rühl (1993)   (Correct)
Distributed memory parallel processors (DMPPs) can deliver high peak performance comparable to or higher than vector supercomputers while promising a better cost-performance ratio. Programming, however, ... / a programming language like High Performance Fortran HPF support br low investment in computer hardware powerful and cheap

2473.1   Performance of the Galley Parallel File System - Nils Nieuwejaar (1996)   (Correct)
As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file sy... / is capable of providing high-performance I O to applications that br physical limitations of storage hardware but a more significant reason

2467.1   Communication overlap in multi-tier parallel algorithms - Baden, Fink (1998)   (Correct)
Hierarchically organized multicomputers such as SMP clusters offer new opportunities and new challenges for high-performance computation, but realizing their full potential remains a formidable task. ... / and new challenges for high-performance computation but realizing br calculations realizing the hardware's potential remains a formidable

2465.7   BSP Clusters: High Performance, Reliable And Very Low Cost - Donaldson, Hill, Skillicorn (1998)   (Correct)
We describe a transport protocol suitable for BSPlib programs running on a cluster of PCs connected by a 100Mbps Ethernet switch. The protocol provides a reliable packet-delivery mechanism that uses g... / Research Group Bsp Clusters High Performance Reliable And Very Low Cost br low-latency protocols on similar hardware but the addition of reliability

2464.3   Kaxiras@cs.wisc.edu - Sc Edu   (Correct)
In this paper we propose Instruction-based Prediction as a means to optimize directory-based cache coherent NUMA shared-memory. Instruction-based prediction is based on observing the behavior of load ... / used to transparently offer high performance while preserving programmers' br of Instruction-Based Prediction in Hardware SharedMemory Stefanos

2458.9   Performance Modeling of Distributed Memory Architectures - Johnsson (1991)   (Correct)
We provide performance models for several primitive operations on data structures distributed over memory units interconnected by a Boolean cube network. In particular, we model single source, and mul... / of fundamental importance for high performance. We present analytic models br render a fair assessment of the hardware capabilities while modeling the

2456.5   Practical Parallel Algorithms for Personalized Communication and.. - Bader, Helman, JaJa (1995)   (Correct)
A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms which efficiently execute on general-purpose parallel machines. With the emergence of messa... / implementation will allow high performance implementations of a large br parallel algorithms. Each of our hardware platforms can be viewed as a

2454.3   Network Performance Under Hybrid Traffic Loads - Kim, Chien (1996)   (Correct)
In actual multicomputer networks, communications consist of hybrid traffic in which messages exhibit a variety of sizes. However to date, most studies on network performance are based on traffic loads... / Background and Related Work High performance routing networks the subject br to packetization with different hardware requirements. Finally we study

2447.3   Optimizing Parallel Applications for Wide-Area Clusters - Bal, Plaat, Bakker, Dozy, Hofman (1998)   (Correct)
Recent developments in networking technology cause a growing interest in connecting local-area clusters of workstations over wide-area links, creating multilevel clusters. Often, latency and bandwidth... / into account most obtain high performance. The optimizations we used br for our research including the hardware and systems software. Also we

2444.6   The Impact of Data Transfer and Buffering Alternatives on Network.. - Mukherjee (1998)   (Correct)
The explosive growth in the performance of microprocessors and networks has created a new opportunity to reduce the latency of fine-grain communication. Microprocessor clock speeds are now approaching... / International Symposium on High-Performance Computer Architecture HPCA br parallelism. Network hardware continues to advance towards

2442.2   StarT the Next Generation: Integrating Global Caches and Dataflow.. - Ang, Arvind, Chiou (1994)   (Correct)
The implicitly parallel programming model provides an attractive approach to deal with the complexity of parallel programming. Implementing this model efficiently, especially on stock processors, rema... / Fortran Fortran D and High Performance Fortran HPF is a very br as to make maximum use of existing hardware and software subsystems. This

2423.4   Process Introspection: A Checkpoint Mechanism for High Performance.. - Ferrari (1996)   (Correct)
The Process Introspection project is a design and implementation effort, the main goal of which is to construct a general purpose, flexible, efficient checkpoint/restart mechanism appropriate for use ... / A Checkpoint Mechanism for High Performance Heterogeneous Distributed br computing and networking hardware have made the use of networks of

2420.3   A High-Performance, Portable Implementation of the MPI Message.. - Gropp (1996)   (Correct)
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, an... / A High-Performance Portable Implementation of br being followed the current hardware and software environment for

2415.5   Fine-grain Access Control for Distributed Shared Memory - Schoinas (1994)   (Correct)
This paper discusses implementations of fine-grain memory access control, which selectively restricts reads and writes to cache-block-sized memory regions. Fine-grain access control forms the basis of... / shared-memory machines achieve high performance by using hardware-intensive br require little or no additional hardware. These techniques permit

2415.2   The Design and Performance of a Pluggable Protocols Framework for.. - Kuhns, O'Ryan, Schmidt, Parsons (1999)   (Correct)
To be an effective platform for performance-sensitive real-time and embedded applications, off-the-shelf OO middleware like CORBA, DCOM, and Java RMI must preserve communication-layer quality of servic... / we describe how TAO our high-performance real-time CORBA-compliant br protocols and interconnects or hardware However the general lack of

CiteSeer - citeseer.org - Copyright © 1997-2002 NEC Research Institute