Results 1 - 10 of 316
NAMD2: Greater Scalability for Parallel Molecular Dynamics
- Journal of Computational Physics
, 1998
"... Molecular dynamics programs simulate the behavior of biomolecular systems, leading to insights and understanding of their functions. However, the computational complexity of such simulations is enormous. Parallel machines provide the potential to meet this computational challenge. To harness this ..."
Abstract
-
Cited by 322 (45 self)
- Add to MetaCart
Molecular dynamics programs simulate the behavior of biomolecular systems, leading to insights and understanding of their functions. However, the computational complexity of such simulations is enormous. Parallel machines provide the potential to meet this computational challenge. To harness this potential, it is necessary to develop a scalable program. It is also necessary that the program be easily modified by application-domain programmers. ...
The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers
- In Proceedings of the 1993 ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems
, 1993
"... We have developed a new technique for evaluating cache coherent, shared-memory computers. The Wisconsin Wind Tunnel (WWT) runs a parallel sharedmemory program on a parallel computer (CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution tim ..."
Abstract
-
Cited by 226 (29 self)
- Add to MetaCart
(Show Context)
We have developed a new technique for evaluating cache-coherent, shared-memory computers. The Wisconsin Wind Tunnel (WWT) runs a parallel shared-memory program on a parallel computer (CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution time. WWT is a virtual prototype that exploits similarities between the system under design (the target) and an existing evaluation platform (the host). The host directly executes all target program instructions and memory references that hit in the target cache. WWT's shared memory uses the CM-5 memory's error-correcting code (ECC) as valid bits for a fine-grained extension of shared virtual memory. Only memory references that miss in the target cache trap to WWT, which simulates a cache-coherence protocol. WWT correctly interleaves target machine events and calculates target program execution time. WWT runs on parallel computers with greater speed and memory capacity than uniprocessors. WWT'...
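The trap-on-miss design described above can be illustrated with a small sketch: references that hit in the modeled target cache execute directly against host memory, and only misses enter the simulator, which charges miss latency to a simulated target clock. This is a minimal Python illustration of the idea, not WWT's actual implementation (WWT is built on the CM-5 and its ECC hardware); all names and cost parameters here are hypothetical.

    # Minimal sketch of WWT's trap-on-miss idea (hypothetical code):
    # hits run at full speed against host memory; only misses enter the
    # simulator, which charges latency to the simulated target clock.
    class TargetCacheModel:
        def __init__(self, hit_cost=1, miss_cost=100):
            self.valid = set()          # stands in for the ECC-based valid bits
            self.hit_cost = hit_cost    # target cycles charged on a hit
            self.miss_cost = miss_cost  # target cycles charged on a miss
            self.clock = 0              # simulated target time

        def load(self, memory, addr):
            line = addr // 64           # assume 64-byte cache lines
            if line in self.valid:      # hit: direct execution, just read memory
                self.clock += self.hit_cost
            else:                       # miss: trap into the protocol simulator
                self.clock += self.miss_cost
                self.simulate_coherence(line)
            return memory[addr]

        def simulate_coherence(self, line):
            # Placeholder for the cache-coherence protocol simulation that
            # WWT runs on a miss; here we simply install the line.
            self.valid.add(line)

    memory = {0: 42, 64: 7}
    cache = TargetCacheModel()
    cache.load(memory, 0)   # miss: trap, simulate, install line 0
    cache.load(memory, 0)   # hit: direct-execution path
    print(cache.clock)      # 101 target cycles under the assumed costs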
The Workload on Parallel Supercomputers: Modeling the Characteristics of Rigid Jobs
- Journal of Parallel and Distributed Computing
, 2001
"... The analysis of workloads is important for understanding how systems are used. In addition, workload models are needed as input for the evaluation of new system designs, and for the comparison of system designs. This is especially important in costly large-scale parallel systems. Luckily, workloa ..."
Abstract
-
Cited by 144 (12 self)
- Add to MetaCart
(Show Context)
The analysis of workloads is important for understanding how systems are used. In addition, workload models are needed as input for the evaluation of new system designs, and for the comparison of system designs. This is especially important in costly large-scale parallel systems. Luckily, workload data is available in the form of accounting logs. Using such logs from three different sites, we analyze and model the job-level workloads with an emphasis on those aspects that are universal to all sites. As many distributions turn out to span a large range, we typically first apply a logarithmic transformation to the data, and then fit it to a novel hyper-Gamma distribution or one of its special cases. This is a generalization of distributions proposed previously, and leads to good goodness-of-fit scores. The parameters for the distribution are found using the iterative EM algorithm. The results of the analysis have been codified in a modeling program that creates a synthetic workload based on the results of the analysis.
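The hyper-Gamma distribution mentioned above is a probabilistic mixture of Gamma components whose parameters are found with EM. The following sketch shows that style of fit for a two-component mixture, assuming positive (log-transformed and, if necessary, shifted) samples; it is an illustration built on scipy, not the authors' modeling program.

    # Hypothetical EM fit for a two-component Gamma mixture (a simple
    # hyper-Gamma); x holds positive, log-transformed samples.
    import numpy as np
    from scipy.stats import gamma
    from scipy.optimize import minimize

    def fit_hyper_gamma(x, iters=50):
        x = np.sort(np.asarray(x))
        half = len(x) // 2
        params = []                       # (shape, scale) per component
        for part in (x[:half], x[half:]): # crude initialization: split in half
            a, _, s = gamma.fit(part, floc=0)
            params.append((a, s))
        p = 0.5                           # mixing probability
        for _ in range(iters):
            # E-step: responsibility of component 0 for each sample.
            d0 = p * gamma.pdf(x, params[0][0], scale=params[0][1])
            d1 = (1 - p) * gamma.pdf(x, params[1][0], scale=params[1][1])
            r = d0 / (d0 + d1 + 1e-300)
            p = r.mean()
            # M-step: weighted maximum likelihood for each component.
            for k, w in enumerate((r, 1 - r)):
                nll = lambda t: -np.sum(w * gamma.logpdf(x, t[0], scale=t[1]))
                res = minimize(nll, params[k], bounds=[(1e-3, None)] * 2)
                params[k] = tuple(res.x)
        return p, params

    rng = np.random.default_rng(0)
    sample = np.concatenate([rng.gamma(2.0, 1.5, 500), rng.gamma(9.0, 1.0, 500)])
    print(fit_hyper_gamma(sample))        # recovers two distinct components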
Scalable Load Balancing Techniques for Parallel Computers
, 1994
"... In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics : the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not po ..."
Abstract
-
Cited by 118 (15 self)
- Add to MetaCart
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh, and network of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristics...
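One widely studied scheme in this family is receiver-initiated random polling: an idle processor asks a randomly chosen peer for work, and a donor splits its local work in half. The sketch below is an illustrative simulation of that scheme under the problem model described above (independent, splittable work pieces); it is not code from the paper.

    # Illustrative random-polling load balancing (hypothetical code):
    # an idle processor polls a random peer; a donor gives away half
    # of its locally queued work pieces.
    import random

    def random_polling_step(queues, idle):
        peers = [i for i in range(len(queues)) if i != idle]
        victim = random.choice(peers)
        if len(queues[victim]) > 1:       # donor splits its work in half
            half = len(queues[victim]) // 2
            queues[idle].extend(queues[victim][:half])
            del queues[victim][:half]
            return True
        return False                      # unlucky poll; the idle node retries

    queues = [list(range(16)), [], [], []]  # all work starts on processor 0
    while not random_polling_step(queues, idle=1):
        pass                              # keep polling until work arrives
    print([len(q) for q in queues])       # e.g. [8, 8, 0, 0]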
Job Characteristics of a Production Parallel Scientific Workload on the NASA Ames iPSC/860
, 1995
"... . Statistics of a parallel workload on a 128-node iPSC/860 located at NASA Ames are presented. It is shown that while the number of sequential jobs dominates the number of parallel jobs, most of the resources (measured in node-seconds) were consumed by parallel jobs. Moreover, most of the sequen ..."
Abstract
-
Cited by 104 (26 self)
- Add to MetaCart
(Show Context)
Statistics of a parallel workload on a 128-node iPSC/860 located at NASA Ames are presented. It is shown that while the number of sequential jobs dominates the number of parallel jobs, most of the resources (measured in node-seconds) were consumed by parallel jobs. Moreover, most of the sequential jobs were for system administration. The average runtime of jobs grew with the number of nodes used, so the total resource requirements of large parallel jobs were larger by more than the number of nodes they used. The job submission rate during peak day activity was somewhat lower than one every two minutes, and the average job size was small. At night, the submission rate was low but job sizes and system utilization were high, mainly due to NQS. Submission rate and utilization over the weekend were lower than on weekdays. The overall utilization was 50%, after accounting for downtime. About 2/3 of the applications were executed repeatedly, some for a significant number of times....
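Node-seconds, the resource measure used above, is simply the number of nodes a job allocated multiplied by its runtime. A toy aggregation over an accounting log shows why parallel jobs can dominate resource consumption even when sequential jobs dominate the job count; the field names here are hypothetical, not the iPSC/860 log format.

    # Toy node-seconds aggregation (hypothetical log fields).
    jobs = [
        {"nodes": 1,   "runtime_s": 120,  "kind": "sequential"},
        {"nodes": 1,   "runtime_s": 30,   "kind": "sequential"},
        {"nodes": 64,  "runtime_s": 3600, "kind": "parallel"},
        {"nodes": 128, "runtime_s": 1800, "kind": "parallel"},
    ]
    usage = {}
    for job in jobs:
        usage[job["kind"]] = usage.get(job["kind"], 0) + job["nodes"] * job["runtime_s"]
    print(usage)  # {'sequential': 150, 'parallel': 460800}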
Analyzing Scalability of Parallel Algorithms and Architectures
- Journal of Parallel and Distributed Computing
, 1994
"... The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of ..."
Abstract
-
Cited by 96 (17 self)
- Add to MetaCart
(Show Context)
The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objective of this paper is to critically assess the state of the art in the theory of scalability analysis, and to motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms...
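The quantities behind such analyses are speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p. Under any concrete overhead model, the fixed-problem-size questions in the abstract (optimal processor count, maximum speedup) reduce to a one-dimensional search; the logarithmic-overhead model below is assumed purely for illustration and is not taken from the paper.

    # Toy scalability model: parallel time = W/p + c*log2(p) overhead.
    import math

    def speedup(W, p, c=5.0):
        t_serial = W                        # serial time for problem size W
        t_parallel = W / p + c * math.log2(p)
        return t_serial / t_parallel

    W = 10_000
    best_p = max(range(1, 2049), key=lambda p: speedup(W, p))
    print(best_p,                           # optimal processor count
          round(speedup(W, best_p), 1),     # maximum speedup at that count
          round(speedup(W, best_p) / best_p, 3))  # efficiency there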
Special Purpose Parallel Computing
- Lectures on Parallel Computation
, 1993
"... A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing ..."
Abstract
-
Cited by 82 (6 self)
- Add to MetaCart
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given.
1. Introduction
Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
Optimal distributed online prediction using mini-batches
, 2010
"... Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this wor ..."
Abstract
-
Cited by 73 (9 self)
- Add to MetaCart
(Show Context)
Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.
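The core mechanism is easy to state: a mini-batch of b inputs is split across k workers, each worker averages gradients over its b/k share, the averages are combined, and one serial-style update is applied, so the update sequence matches the serial algorithm run on averaged gradients. The sketch below simulates this for plain SGD on squared loss; it is a simplification of the paper's algorithm, which additionally handles inputs arriving during communication delays.

    # Simplified distributed mini-batch SGD (workers simulated by slicing).
    import numpy as np

    def grad(w, x, y):
        return 2 * (w @ x - y) * x           # squared-loss gradient, one input

    def distributed_minibatch_sgd(stream, dim, workers=4, batch=32, lr=0.01):
        w, buf = np.zeros(dim), []
        for x, y in stream:
            buf.append((x, y))
            if len(buf) == batch:
                # Each "worker" averages gradients over its share of the batch.
                shards = [buf[i::workers] for i in range(workers)]
                g = np.mean([np.mean([grad(w, x, y) for x, y in s], axis=0)
                             for s in shards], axis=0)
                w -= lr * g                  # one serial-style update per batch
                buf.clear()
        return w

    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])
    stream = [(x, w_true @ x) for x in rng.normal(size=(4096, 3))]
    print(distributed_minibatch_sgd(stream, dim=3))  # approaches w_true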
Aspects of Networking in Multiplayer Computer Games
, 2001
"... Distributed, real-time multiplayer computer games (MCGs) are in the vanguard of utilizing the networking possibilities. Although related research have been done in military simulations, virtual reality systems, and computer supported cooperative working, the suggested solutions diverge from the prob ..."
Abstract
-
Cited by 68 (1 self)
- Add to MetaCart
Distributed, real-time multiplayer computer games (MCGs) are in the vanguard of utilizing networking possibilities. Although related research has been done in military simulations, virtual reality systems, and computer supported cooperative working, the suggested solutions diverge from the problems posed by MCGs. With this in mind, this paper provides a concise overview of four aspects affecting networking in MCGs. Firstly, networking resources (bandwidth, latency, and computational power) set the technical boundaries within which the MCG must operate. Secondly, distribution concepts encompass communication architectures (peer-to-peer, client/server, server-network), and both data and control architectures (centralized, distributed, replicated). Thirdly, scalability allows the MCG to adapt to resource changes by parametrization. Finally, security aims at fighting back against cheating and vandalism, which are common in online gaming. Keywords: computer games, networking, online entertainment, distributed interactive simulation, virtual environments.