Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
ACM Transactions on Computer Systems, 1991
Cited by 433 (29 self)
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than
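The local-spinning idea in this abstract can be illustrated with a small queue lock. The following is only a sketch under stated assumptions, not the paper's published algorithm: the names `QNode`, `QueueLock`, and `run_demo` are hypothetical, and the atomic swap and compare-and-swap steps are emulated with a guard lock because Python does not expose such primitives. Each waiting thread spins on the `locked` flag of its own node, and the releasing thread ends that spin with a single write.

```python
import threading
import time


class QNode:
    """Per-thread queue node; a waiter spins only on its own `locked` flag."""

    def __init__(self):
        self.next = None
        self.locked = False


class QueueLock:
    """Illustrative list-based queue lock with local spinning."""

    def __init__(self):
        self._tail = None
        # Assumption: this guard stands in for hardware atomic swap and
        # compare-and-swap, which plain Python does not provide.
        self._atomic = threading.Lock()

    def _swap_tail(self, node):
        with self._atomic:
            prev, self._tail = self._tail, node
            return prev

    def _cas_tail(self, expected, new):
        with self._atomic:
            if self._tail is expected:
                self._tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None
        node.locked = True
        prev = self._swap_tail(node)      # join the end of the queue
        if prev is not None:
            prev.next = node              # link behind the predecessor
            while node.locked:            # spin on our own flag only
                time.sleep(0)             # yield so other threads can run

    def release(self, node):
        if node.next is None:
            if self._cas_tail(node, None):
                return                    # no successor: the lock is free
            while node.next is None:      # a successor is still linking in
                time.sleep(0)
        node.next.locked = False          # one write ends the successor's spin


def run_demo(num_threads=4, iterations=200):
    """Increment a shared counter under the lock and return the final value."""
    lock = QueueLock()
    counter = [0]

    def worker():
        node = QNode()
        for _ in range(iterations):
            lock.acquire(node)
            counter[0] += 1               # critical section
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling `run_demo(4, 200)` runs four threads that each enter the critical section 200 times. On real hardware the point of the design is that each spin loop reads a location local to the waiter, which a Python emulation can show structurally but cannot measure.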
*T: A Multithreaded Massively Parallel Architecture
In Proceedings of the 19th International Symposium on Computer Architecture, 1992
Cited by 47 (1 self)
What should the architecture of each node in a general purpose, massively parallel architecture (MPA) be? We frame the question in concrete terms by describing two fundamental problems that must be solved well in any general purpose MPA. From this, we systematically develop the required logical organization of an MPA node, and present some details of *T (pronounced Start), a concrete architecture designed to these requirements. *T is a direct descendant of dynamic dataflow architectures, and unifies them with von Neumann architectures. We discuss a hand-compiled example and some compilation issues.
Cube-3: A Real-Time Architecture for High-Resolution Volume Visualization
In Proceedings of the 8th Eurographics Workshop on Graphics Hardware '93, 1994
Cited by 23 (10 self)
This paper describes a high-performance special-purpose system, Cube-3, for displaying and manipulating high-resolution volumetric datasets in real-time. A primary goal of Cube-3 is to render 512³, 16-bit per voxel, datasets at about 30 frames per second. Cube-3 implements a ray-casting algorithm in a highly-parallel and pipelined architecture, using a 3D skewed volume memory, a modular fast bus, 2D skewed buffers, 3D interpolation and shading units, and a ray projection cone. Cube-3 will allow users to interactively visualize and investigate in real-time static (3D) and dynamic (4D) high-resolution volumetric datasets.
The Evaluation of Massively Parallel Array Architectures
1994
Cited by 13 (7 self)
Computer Science. To the memory of my mother. Acknowledgments: This dissertation would not have been possible without the help of many people. First, I would like to thank my committee for their many helpful comments and suggestions. Specifically, Al Hanson, who taught me about computer vision; Wayne Burleson, who taught me about VLSI; and Don Towsley, who taught me about performance evaluation. Most especially, I’d like to thank my committee chair and my advisor and mentor for my entire graduate career, Chip Weems. Besides teaching me about architecture and writing, he suggested the final form of the topic, pulled me out of many blind alleys, and his vast store of knowledge was a constant help. Many other professors at UMass also contributed to my knowledge of computer science and so helped me with this dissertation. I would especially like to thank Arny Rosenberg, who not only taught me theory but, more importantly, how and where to apply it, and Ed Riseman, whose boundless energy and optimism serve as a model for all of us. The first level of discussion and comments is always with the fellow graduate students in one’s
A Survey of Multiprocessor Operating System Kernels
1993
Cited by 13 (3 self)
Multiprocessors have been accepted as vehicles for improved computing speeds, cost/performance, and enhanced reliability or availability. However, the added performance requirements of user programs and functional capabilities of parallel hardware introduce new challenges to operating system design and implementation. This paper reviews research and commercial developments in multiprocessor operating system kernels from the late 1970s to the early 1990s. The paper first discusses some common operating system structuring techniques and examines the advantages and disadvantages of using each technique. It then identifies some of the major design goals and key issues in multiprocessor operating systems. Issues and solution approaches are illustrated by review of a variety of research or commercial multiprocessor operating system kernels. College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332-0280. Contents: 1 Introduction; 2 Structuring an Operating System; 2....
An Efficient Delay-Optimal Distributed Termination Detection Algorithm
In Journal of Parallel and Distributed Computing (JPDC), 2001
Cited by 12 (2 self)
One of the important issues to be addressed when solving problems on parallel machines or distributed systems is that of efficient termination detection. Numerous schemes with different performance characteristics have been proposed in the past for this purpose. These schemes, while being efficient with regard to one performance metric, prove to be inefficient in terms of other metrics. A significant drawback shared by all previous methods is that they may take as long as Θ(P) time to detect and signal termination after its actual occurrence, where P is the total number of processing elements. Detection delay is arguably the most important metric to optimize, since it is directly related to the amount of idling of computing resources and to the delay in the utilization of results of the underlying computation. In this paper, we present a novel termination detection algorithm that is simultaneously optimal or near-optimal with respect to all relevant performance measures on any topology. In particular, our algorithm has a best-case detection delay of Θ(1) and a finite optimal worst-case detection delay on any topology equal in order terms to the time for an optimal one-to-all broadcast on that topology; we derive a general expression for an optimal one-to-all broadcast on an arbitrary topology, which is an interesting new result in itself. On k-ary n-cube tori and meshes, the worst-case delay is Θ(D), where D is the diameter of the architecture. Further, our algorithm has message and computational complexities of O(max(MD, P)) (Θ(max(M, P)) on the average for most applications, the same as other message-efficient algorithms) and an optimal space complexity of Θ(P), where M is the total number of messages used by the underlying computation. We also give a scheme using...
Least Common Ancestor Networks
1993
Cited by 12 (2 self)
Least Common Ancestor Networks (LCANs) are introduced and shown to be a class of networks that include fat-trees, baseline networks, SW-banyans and the router networks of the TRAC 1.1 and 2.0, and the CM-5. Some LCAN properties are stated and the permutation routing capabilities of an important subclass are analyzed. Simulation results for three permutation classes verify the accuracy of an iterative solution for a randomized routing strategy.
GeoSheet: A Distributed Visualization Tool for Geometric Algorithms
Int'l J. Computational Geometry & Applications, 1994
Cited by 10 (3 self)
GeoSheet (version 1.0) is an interactive tool for visualizing geometric algorithms in distributed environments. It provides features such as interactive visualization of program states for debugging, high-level graphical input/output manipulation facilities for geometric objects, reuse of existing data structure and algorithm implementations, and, more importantly, distributed execution on heterogeneous machines at different sites. To minimize the development effort for the tool, we make use of existing software packages available in the public domain. Specifically, we extend Xfig with a message-driven interface and a socket-based interprocess communication (IPC) mechanism. This extended Xfig is the backbone of this version of the tool. Object-oriented programming methodology is used to construct the visualization interface. By deriving from traditional data type and algorithm libraries, our abstract GeoObject representation super-classes are easy to use, easy to construct, and hig...
Secure File Transfer: A Computational Analog to the Furniture Moving Paradigm
Parallel and Distributed Computing Practices, 1999
Cited by 9 (8 self)
One of the most compelling illustrations of the power of parallelism is the furniture-moving paradigm. In it, a large item of furniture needs to be moved from one place to another. A single mover, working alone, must take the item apart, move each piece separately, and then reassemble the item at the new location, taking a long time to complete the job. By contrast, four movers can simply lift the item and quickly move it to its new location. Thus, the time required to accomplish the task is reduced by a factor significantly larger than four. This paper describes a computational analog to the furniture-moving paradigm. The computation in question is concerned with transferring a computer file from one computer system to another over an insecure communications channel. The file contains private or sensitive information whose secrecy and integrity need to be maintained. Cryptography is used to obtain a digital signature of the file, thereby protecting its integrity, and the...
Scalable Parallel Direct Volume Rendering for Nonrectilinear Computational Grids
1993
Cited by 8 (0 self)
Contents:
Acknowledgements
Publication History
1. Introduction
1.1 Introduction to Direct Volume Rendering
1.1.1 Volumetric Grids
1.1.2 Image-space Rendering Algorithms
1.1.3 Object-space Rendering Algorithms
1.1.4 Shear Transformations
1.1.5 Complexity
1.2 Motivation for Parallel Direct Volume Rendering
1.2.1 Scalability Is Important
1.3 Context for Use
1.3.1 Distributed Graphical Us...

