Parallel Computing on the Berkeley NOW
- In Proceedings of the 9th Joint Symposium on Parallel Processing (JSPP 97)
, 1997
Cited by 19 (2 self)
The Berkeley Network of Workstations (NOW) project demonstrates a new approach to large-scale system design enabled by technology advances that provide inexpensive, low-latency, high-bandwidth, scalable interconnection networks. This paper provides an overview of the hardware and software architecture of NOW and reports on the performance obtained at each layer of the system: Active Messages, MPI message passing, and benchmark parallel applications.
Evaluating Design Alternatives for Reliable Communication on High-Speed Networks
- In Proc. 9th Int. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-9)
, 2000
Cited by 14 (9 self)
We systematically evaluate the performance of five implementations of a single, user-level communication interface. Each implementation makes different architectural assumptions about the reliability of the network hardware and the capabilities of the network interface. The implementations differ accordingly in their division of protocol tasks between host software, network-interface firmware, and network hardware. Using microbenchmarks, parallel-programming systems, and parallel applications, we assess the performance impact of different protocol decompositions. We show how moving protocol tasks to a relatively slow network interface yields both performance advantages and disadvantages, depending on the characteristics of the application and the underlying parallel-programming system. In particular, we show that a communication system that assumes highly reliable network hardware and that uses network-interface support to process multicast traffic performs best for all applications.
SUPPLE: an Efficient Run-Time Support for Non-Uniform Parallel Loops
, 1996
Cited by 7 (4 self)
The efficient implementation of parallel loops on distributed-memory multicomputers is a hot topic of research. To this end, data parallel languages generally exploit static data layout and static scheduling of iterations. Unfortunately, when iteration execution costs vary considerably and are unpredictable, some processors may be assigned more work than others. Workload imbalance can be mitigated by cyclically distributing data and associated computations. Though this strategy often solves load balance issues, it may worsen data locality exploitation. This paper presents SUPPLE (SUPport for Parallel Loop Execution), an innovative run-time support for parallel loops with regular stencil data references and non-uniform iteration costs. SUPPLE relies upon a static block data distribution to exploit locality, and combines static and dynamic policies for scheduling non-uniform iterations. It adopts, as far as possible, a static scheduling policy derived from the owner computes...
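The trade-off between block and cyclic distribution described in this abstract can be illustrated with a small sketch. This is not SUPPLE's algorithm; the function names, the processor count, and the triangular cost profile are all assumptions made up for the illustration. It only shows how a static block distribution can leave some processors with far more work than a cyclic one when iteration costs are non-uniform.

```python
def block_owner(i, n, p):
    """Processor that owns iteration i when n iterations are split into p contiguous blocks."""
    chunk = -(-n // p)  # ceiling division: size of each block
    return i // chunk


def cyclic_owner(i, n, p):
    """Processor that owns iteration i when iterations are dealt out round-robin."""
    return i % p


def loads(costs, p, owner):
    """Total cost assigned to each of p processors under the given ownership rule."""
    n = len(costs)
    per_proc = [0] * p
    for i, cost in enumerate(costs):
        per_proc[owner(i, n, p)] += cost
    return per_proc


# Hypothetical non-uniform workload: iteration i costs i + 1 units.
costs = [i + 1 for i in range(8)]

block = loads(costs, 4, block_owner)    # contiguous blocks keep neighbours together
cyclic = loads(costs, 4, cyclic_owner)  # round-robin spreads the expensive iterations

print("block :", block, "max =", max(block))
print("cyclic:", cyclic, "max =", max(cyclic))
```

With this cost profile the most heavily loaded processor does less work under the cyclic rule, but neighbouring iterations land on different processors, which is the locality cost the abstract mentions for stencil references.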
An Analysis of VI Architecture Primitives in Support of Parallel and Distributed Communication
, 2002
Cited by 4 (0 self)
We present the results of a detailed study of the Virtual Interface (VI) paradigm as a communication foundation for a distributed computing environment. Using Active Messages and the Split-C global memory model, we analyze the inherent costs of using VI primitives to implement these high-level communication abstractions. We demonstrate a minimum mapping cost (i.e., the host processing required to map one abstraction to a lower abstraction) of 5.4 μsec for both Active Messages and Split-C using 4-way 550 MHz Pentium III SMPs and the Myrinet network. We break down this cost into the use of individual VI primitives in supporting flow control, buffer management, and event processing, and identify the completion queue as the source of the highest overhead. Bulk transfer performance plateaus at 44 Mbytes/sec for both implementations due to the addition of fragmentation requirements. Based on this analysis, we present the implications for the VI successor, InfiniBand.
pSNOW: A Tool to Evaluate Architectural Issues for NOW Environments
- Proceedings of the International Conference on Supercomputing
, 1997
Cited by 2 (0 self)
Performance evaluation plays a crucial role in the design of any system. Evaluation tools should clearly identify, isolate, and quantify the bottlenecks in the execution to help restructure the application for better performance, as well as suggest enhancements to the existing design. While there has been significant progress recently in novel network interface designs and system software solutions to lower the communication overheads on emerging high-performance Network of Workstations environments, performance evaluation tools for these environments have not kept pace with this progress. In this research, we present an execution-driven simulation tool called pSNOW that provides a unified framework to model different system software and architectural designs, and to evaluate these designs using real applications. Using this tool, we model three network interfaces and three communication software substrates, and evaluate their relative merits and demerits.
MRPC: A High Performance RPC System for MPMD Parallel Computing
- in NASA Multi-Spectral Image Processing”. Master’s Thesis. Northwestern
, 1997
Cited by 2 (2 self)
MRPC is an RPC system that is designed and optimized for MPMD parallel computing. Existing systems based on standard RPC incur an unnecessarily high cost when used on high-performance multi-computers, limiting the appeal of RPC-based languages in the parallel computing community. MRPC combines the efficient control and data transfer provided by Active Messages (AM) with a minimal multithreaded runtime system that extends AM with the features required to support MPMD. This approach introduces only the necessary runtime RPC overheads for an MPMD environment. MRPC has been integrated into Compositional C++ (CC++), a parallel extension of C++ that offers an MPMD programming model. Basic RPC performance in MRPC is within a factor of two from those of Split-C, a highly tuned SPMD language, and other messaging layers. CC++ applications perform within a factor of two to six from comparable Split-C versions, which represent an order of magnitude improvement over previous CC++ implementations. ...
Safe and Efficient Cluster Communication in Java Using Explicit Memory Management
, 2000
This thesis presents a framework for using explicit memory management to improve the communication performance of Java™ cluster applications. The framework allows programmers to explicitly manage Java communication buffers, called jbufs, which are directly accessed by the DMA engines of high-performance network interfaces and by Java programs as primitive-typed arrays. The central idea is to remove the hard separation between Java’s garbage-collected heap and the non-collected memory region in which DMA buffers must normally be allocated. The programmer controls when a jbuf is part of the garbage-collected heap so that the garbage collector can ensure it is safely re-used or deallocated, and when it is not so it can be used for DMA transfers. Unlike other techniques, jbufs preserve Java’s storage- and type-safety and do not depend on a particular garbage collection scheme. The safety, efficiency, and programmability of jbufs are demonstrated throughout this thesis with implementations of an interface to the Virtual In-
Network Interface Active Messages for Low Overhead Communication on SMP PC Clusters
- In Proc. on HPCN'99
, 1999
NICAM is a communication layer for SMP PC clusters connected via Myrinet, designed to reduce overhead and latency by directly utilizing a microprocessor on the network interface. It adopts remote memory operations to reduce much of the overhead found in message passing. NICAM employs an Active Messages framework for flexibility in programming on the network interface, and this flexibility will compensate for the large latency resulting from the relatively slow microprocessor. Running message handlers directly on the network interface reduces the overhead by freeing the main processors from the work of polling for incoming messages. The handlers also make synchronizations faster by avoiding the costly interactions between the main processors and the network interface. In addition, this implementation can completely hide the latency of barriers in data-parallel programs, because handlers running in the background of the main processors allow barriers to be repositioned to any place where...