Results 1 - 10 of 3,995
Fast Parallel Algorithms for Short-Range Molecular Dynamics
- JOURNAL OF COMPUTATIONAL PHYSICS
, 1995
"... Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dyn ..."
Cited by 653 (7 self)
dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors
Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors
- In Proceedings of the 17th Annual International Symposium on Computer Architecture
, 1990
"... Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the f ..."
Cited by 730 (17 self)
and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture.
Shade: A Fast Instruction-Set Simulator for Execution Profiling
, 1994
"... Tracing tools are used widely to help analyze, design, and tune both hardware and software systems. This paper describes a tool called Shade which combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling an ..."
Cited by 383 (2 self)
Tracing tools are used widely to help analyze, design, and tune both hardware and software systems. This paper describes a tool called Shade which combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling and caching code to simulate and trace the application program. The user may control the extent of tracing in a variety of ways; arbitrarily detailed application state information may be collected during the simulation, but tracing less translates directly into greater efficiency. Current Shade implementations run on SPARC systems and simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets. This paper describes the capabilities, design, implementation, and performance of Shade, and discusses instruction set emulation in general.
Making Fast Strategic Decisions in High-Velocity Environments
- Academy of Management Journal
, 1989
"... How do executive teams make rapid decisions in the high-velocity microcomputer industry? This inductive study of eight microcomputer firms led to propositions exploring that question. Fast decision makers use more, not less, information than do slow decision makers. The former also develop more, not ..."
Cited by 400 (4 self)
How do executive teams make rapid decisions in the high-velocity microcomputer industry? This inductive study of eight microcomputer firms led to propositions exploring that question. Fast decision makers use more, not less, information than do slow decision makers. The former also develop more
Purify: Fast detection of memory leaks and access errors
- In Proc. of the Winter 1992 USENIX Conference
, 1991
"... This paper describes Purify™, a software testing and quality assurance tool that detects memory leaks and access errors. Purify inserts additional checking instructions directly into the object code produced by existing compilers. These instructions check every memory read and write performed by the ..."
Cited by 354 (0 self)
tracks memory usage and identifies individual memory leaks using a novel adaptation of garbage collection techniques. Purify produces standard executable files compatible with existing debuggers, and currently runs on Sun Microsystems' SPARC family of workstations. Purify's nearly
Parallelizing Operational Weather Forecast Models For Portable And Fast Execution
, 1995
"... This paper describes a high level library (The Nearest Neighbor Tool: NNT) that has been used to parallelize operational weather prediction models. NNT is part of the Scalable Modeling System (SMS), developed at the Forecast Systems Laboratory (FSL). Programs written in NNT rely on SMS's run- ..."
Cited by 6 (1 self)
This paper describes a high level library (The Nearest Neighbor Tool: NNT) that has been used to parallelize operational weather prediction models. NNT is part of the Scalable Modeling System (SMS), developed at the Forecast Systems Laboratory (FSL). Programs written in NNT rely on SMS's run-time system and port between a wide range of computing platforms, performing well in multiprocessor systems. We show, using examples from operational weather models, how large Fortran 77 codes can be parallelized using NNT. We compare the ease of programmability of NNT and High Performance Fortran (HPF). We also discuss optimizations like data movement overlap (in interprocessor communication and I/O operations), and the minimization of data exchanges through the use of redundant computations. We show that although HPF provides a simpler programming interface, NNT allows for program optimizations that increase performance considerably and still keeps a simple user interface. These optimizations h...
A parallel processor for fast execution of time-adaptive Jacobi algorithms
- In Proc. of the ProRISC/IEEE Workshop on Circuits, Systems and Signal Processing
, 1996
"... In this paper we take the class of Jacobi-type algorithms and present a systematic way to derive an architecture for execution of the time-adaptive QR and QR^{-1} algorithms, two members of the class. We know that Jacobi-type algorithms find natural expression in CORDIC arithmetic and that high ..."
Cited by 2 (1 self)
In this paper we take the class of Jacobi-type algorithms and present a systematic way to derive an architecture for execution of the time-adaptive QR and QR^{-1} algorithms, two members of the class. We know that Jacobi-type algorithms find natural expression in CORDIC arithmetic
Fast Execution of Simultaneous Breadth-First Searches on Sparse Graphs
"... The construction of efficient parallel graph algorithms is important for quickly solving problems in areas such as urban planning, social network analysis, and hardware verification. Existing GPU implementations of graph algorithms tend to be monolithic and thus contributions from the lite ..."
requiring many breadth-first searches that can be executed simultaneously. Although algorithms have implicitly leveraged this abstraction in the past, we provide an explicit, reusable implementation that efficiently maps this abstraction to the GPU, performing more than twice as fast as previous approaches
Fast Execution of Irregularly Structured Programs with Low Communication Frequency on the Hypercube
, 1995
"... In this paper, we study the problem of efficiently executing a parallel program composed of N tasks on an N-node hypercube assuming that • communications between tasks are irregular, i.e. any pair of tasks may want to communicate at any step of the program; • communications between any two tas ..."
In this paper, we study the problem of efficiently executing a parallel program composed of N tasks on an N-node hypercube assuming that • communications between tasks are irregular, i.e. any pair of tasks may want to communicate at any step of the program; • communications between any two
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we surve ..."
Cited by 349 (23 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we