Fast Parallel Implementation of Lazy Languages - The EQUALS Experience
 Journal of Functional Programming
, 1992
Cited by 12 (4 self)
This paper describes EQUALS, a fast parallel implementation of a lazy functional language on a commercially available shared-memory parallel machine, the Sequent Symmetry. In contrast to previous implementations, we detect parallelism automatically using strictness analysis. Another important
Fast Parallel Implementation of DFT Using Configurable Devices
, 1997
Cited by 11 (7 self)
In this paper we propose a fast parallel implementation of the Discrete Fourier Transform (DFT) using FPGAs. Our design is based on the Arithmetic Fourier Transform (AFT) using zero-order interpolation. For a given problem of size N, AFT requires only O(N^2) additions and O(N) real
Fast Parallel Algorithms for Short-Range Molecular Dynamics
 JOURNAL OF COMPUTATIONAL PHYSICS
, 1995
Cited by 660 (7 self)
dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors
A Simple, Fast Parallel Implementation of Quicksort and its Performance Evaluation on SUN Enterprise 10000
Cited by 18 (1 self)
This paper looks into the behavior of a simple, fine-grain parallel extension of Quicksort for cache-coherent shared address space multiprocessors. Quicksort has many nice properties: i) it is fast and general purpose; it is widely believed that Quicksort is the fastest general-purpose sorting
A Fast Parallel Implementation of the Wavelet Packet Best Basis Algorithm on the MP-2 for Real-Time MRI
, 1996
Cited by 3 (0 self)
attractive to implement. This note describes near real-time performance obtained with a parallel implementation of best basis algorithms for Wavelet Packet bases. The platform for our implementation is a DECmpp 12000/Sx 2000, a parallel machine identical to the MasPar MP-2. The DECmpp is a single instruction
Fast Folding and Comparison of RNA Secondary Structures (The Vienna RNA Package)
Cited by 807 (117 self)
implementations of modified algorithms on parallel computers with distributed memory. Performance analysis carried out on an Intel Hypercube shows that parallel computing becomes gradually more and more efficient the longer the sequences are.
LogP: Towards a Realistic Model of Parallel Computation
, 1993
Cited by 560 (15 self)
development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable
FAST VOLUME RENDERING USING A SHEAR-WARP FACTORIZATION OF THE VIEWING TRANSFORMATION
, 1995
Cited by 543 (2 self)
Volume rendering is a technique for visualizing 3D arrays of sampled data. It has applications in areas such as medical imaging and scientific visualization, but its use has been limited by its high computational expense. Early implementations of volume rendering used brute-force techniques
Parallel Numerical Linear Algebra
, 1993
Cited by 776 (23 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We
Implementation and performance of Munin
 IN PROCEEDINGS OF THE 13TH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES
, 1991
Cited by 585 (22 self)
Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin
