10 citations found.
T. Lippert, A. Seyfried, A. Bode, and K. Schilling. Hyper-systolic parallel computing. IEEE Trans. Paral. Distr. Syst., 9:97--108, 1998.


This paper is cited in the following contexts:
Framework Design, Parallelization and Force Computation in.. - Matthey

....on data replication is an easy but memory-expensive approach. It has poor scaling properties due to global communication [91]. Programs using this decomposition include UHGromos [22], Amber [121], CHARMM [15], Moldy [98], and an early version of EGO [31]. Systolic or hyper-systolic loop algorithms [78, 114] are a possible remedy to reduce the memory usage and to improve the scaling. Force decomposition involves either force matrix or systolic loop methods. It scales better than atom decomposition by reducing communication costs through the use of a block decomposition, as in LAMMPS [92], CHARMM ....

T. Lippert, A. Seyfried, A. Bode, and K. Schilling. Hyper-systolic parallel computing. IEEE Trans. Paral. Distr. Syst., 9:97--108, 1998.


Hyper-Systolic Processing On The Quadrics: Improving.. - Palazzari, Lippert..   Self-citation (Lippert Schilling)

No context found.

Th. Lippert, A. Seyfried, A. Bode, K. Schilling: `Hyper-Systolic Parallel Computing', Preprint Server HEP-LAT 9507021, Preprint WUB 95-13, HLRZ 32/95, submitted


Hyper-Systolic Routing for SIMD Systems - Hoferichter, Lippert, al. (1997)   Self-citation (Lippert Schilling)

....of routing. Our method needs a total number of interprocessor communication operations of $O(p^{3/2})$ next-neighbor data hoppings to perform a complete permutation of data between the processors, with p being the number of processors. The routing procedure is based on the hyper-systolic algorithm [3, 4, 5, 6]. In the hyper-systolic communication mode, the physical machine is mapped on a 1-dimensional systolic ring structure [7] along which the data is communicated among next neighbors in the ring or between neighbors at a distance of k processors, connected by a second set of fast communication lines ....

Th. Lippert, A. Seyfried, A. Bode, K. Schilling: `Hyper-Systolic Parallel Computing', Preprint Server HEP-LAT 9507021, Preprint WUB 95-13, HLRZ 32/95, submitted to IEEE Trans. of Parallel and Distributed Systems.


FFT for the APE Parallel Computer - Lippert, Schilling, Toschi.. (1998)   Self-citation (Lippert Schilling)

No context found.

Th. Lippert, A. Seyfried, A. Bode, K. Schilling: `Hyper-Systolic Parallel Computing', Preprint Server HEP-LAT 9507021, Preprint WUB 95-13, HLRZ 32/95, accepted for publication in IEEE Trans. of Parallel and Distributed Systems.


Hyper-systolic routing for SIMD systems - Hoferichter, Lippert, Schilling, ..   Self-citation (Lippert Schilling)

....parallel computing or for context switching. $O(p^{1/2})$ interprocessor communication operations per processor are needed (in terms of next-neighbour data transfer) to perform a complete permutation, with p being the number of processors. The router is based on the hyper-systolic algorithm [4-6]. 2. HYPER-SYSTOLIC ROUTING We start from an implementation network that supports (the embedding of) an abstract 1-dimensional ring connectivity, see Fig. 1. Fast access to next-neighbour processors and to processors in distances 1 and k in an homogeneous pattern is assumed along the ring. ....

....by $p - 1$ moves of distance 1 along the systolic ring. Therefore, the systolic method decreases linearly in efficiency with the processor number p and is not scalable on massively parallel systems. In order to improve the scalability of the method we are going to apply the hyper-systolic idea [4]. In the hyper-systolic algorithm, a redundancy of interprocessor communication, as can be identified in some systolic computations, is removed. This is achieved using intermediate internal systolic arrays. Algorithm 1 Systolic routing emulator. foreach processor $i = 1 : p \in$ systolic ring for j = ....
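
The cost quoted in the excerpt above (p - 1 moves of distance 1) can be illustrated with a small emulation of a standard systolic ring. This is only a sketch under stated assumptions: one data element per processor, a hypothetical pair function `f`, and a plain Python list standing in for the ring. It is not the routing emulator of the cited paper, whose Algorithm 1 is truncated in the excerpt.

```python
def systolic_all_pairs(x, f):
    """Emulate an all-pairs computation on a 1-D systolic ring.

    Assumptions (not from the source): one element of x per processor,
    and f(a, b) is a hypothetical pair interaction.
    Returns the per-processor results and the number of distance-1 shifts.
    """
    p = len(x)
    moving = list(x)      # copy of the data that travels around the ring
    y = [0.0] * p         # accumulator held by each processor
    shifts = 0
    for _ in range(p - 1):
        # Cyclic shift by one position: processor i receives from i - 1.
        moving = moving[-1:] + moving[:-1]
        shifts += 1
        # Every processor combines its resident element with the visitor.
        for i in range(p):
            y[i] += f(x[i], moving[i])
    return y, shifts


if __name__ == "__main__":
    result, n_shifts = systolic_all_pairs([1, 2, 3, 4], lambda a, b: a * b)
    print(result, n_shifts)
```

After k shifts processor i holds the element that started on processor (i - k) mod p, so p - 1 shifts are needed before every processor has seen every other element. This is the linear growth in communication that the excerpt says the hyper-systolic idea is meant to reduce.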

Th. Lippert, A. Seyfried, A. Bode, K. Schilling. Hyper-Systolic Parallel Computing. accepted for publication in IEEE Trans. of Parallel and Distributed Systems.


BLAS-3 for the Quadrics Parallel Computer - Lippert, Petkov, Schilling   Self-citation (Lippert Schilling)

.... communication structure of a parallel computer should contain a ring as a subset of the system connectivity, together with an additional regular non-local connectivity between the elements of the ring, such that the data flow can be organized following the hyper-systolic principle, as introduced [14, 15] for the treatment of exact n-body simulations. In the algorithm to be described below, each column is entirely assigned to one processor. Skew ordering of the matrix blocks is the second important feature in the distribution of the matrix elements across the processors, enabling us to carry out ....

....the arrangement shown in Fig. 1. Note that no storage of intermediate results is needed. 3 Hyper-Systolic Computation The hyper-systolic algorithm extends systolic one by introducing h, h p, copies of the data arrays and cyclical shifts with strides. The algorithm has been introduced in Refs. [14, 15] for the solution n-body problems. A discussion of the construction of hyper-systolic bases can be found in Ref. [17]. In Ref. [16] the symmetric hyper-systolic approach has been applied to general data structures, such as matrices. In this paper, we propose a variant of the hyper-systolic ....

[Article contains additional citation context not shown here]

Th. Lippert, A. Seyfried, A. Bode, K. Schilling. `Hyper-Systolic Parallel Computing', preprint HEP-LAT 9507021, WUB 95-13, HLRZ 32/95.


Hyper-Systolic Routing for SIMD Systems - Hoferichter, Lippert, Schilling, ..   Self-citation (Lippert Schilling)

....requires a number of interprocessor communication operations of $O(p^{1/2})$ per processor (in terms of next-neighbour data hoppings) to perform a complete permutation of data between the processors, with p being the number of processors. The routing procedure is based on the hyper-systolic algorithm [4, 5, 6, 7]. In order to exploit hyper-systolic communication, an abstract 1-dimensional ring structure [8] is mapped on the physical machine's geometry. Along this ring the data is communicated to (i) next neighbors in the ring or (ii) between processors at a distance of k elements along the ring. The ....

Th. Lippert, A. Seyfried, A. Bode, K. Schilling. Hyper-Systolic Parallel Computing. Preprint Server HEP-LAT 9507021, Preprint WUB 95-13, HLRZ 32/95, submitted to IEEE Trans. of Parallel and Distributed Systems.


Hyper-Systolic Algorithms for N-Body Computations and Parallel.. - Lippert (1998)   Self-citation (Lippert)

....law. The practical implementation of the symmetric hyper-systolic $n^2$-loop computation for a 1-array system on the ring requires a mapping of the array of n elements onto the ring of p processors with $n \gg p$. We employ the hierarchy mapping of the systolic array onto systolic sub-arrays [14]. The hierarchy mapping exploits the fact that the computational problem Eq. (1) can be split into $n/p$ partial systolic computations $y_{(l-1)p+i} = \sum_{m=1}^{n/p} \sum_{j=1}^{p} f(x_{(l-1)p+i}, x_{(m-1)p+j})$, $l = 1, 2, 3, \ldots, n/p$; $i = 1, 2, 3, \ldots, p$; with $(l-1)p+i \neq$ ....
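
The block splitting described in the excerpt above can be checked numerically with a short sketch. It assumes the element index reads (l - 1)p + i, that p divides n, and that the truncated side condition excludes the self-pair; the last point is an assumption, since the excerpt breaks off before the condition is complete. The function name and the pair function `f` are hypothetical.

```python
def blocked_all_pairs(x, p, f):
    """Evaluate an all-pairs sum block by block.

    The n elements are viewed as n/p blocks of p elements, with the
    1-based global index (l - 1) * p + i for block l and ring position i.
    Assumption: the self-pair is skipped (the source condition is truncated).
    """
    n = len(x)
    if n % p != 0:
        raise ValueError("this sketch assumes p divides n")
    blocks = n // p
    y = [0.0] * n
    for l in range(1, blocks + 1):          # block of the result element
        for i in range(1, p + 1):           # ring position inside block l
            a = (l - 1) * p + i
            for m in range(1, blocks + 1):  # block of the partner element
                for j in range(1, p + 1):   # ring position inside block m
                    b = (m - 1) * p + j
                    if a != b:              # assumed exclusion of the self-pair
                        y[a - 1] += f(x[a - 1], x[b - 1])
    return y
```

Each fixed pair of blocks (l, m) is one p-by-p partial computation that fits on the ring of p processors, which is why the mapping lets n be much larger than p.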

....matrix-matrix product is of $O(n^2 p^{1/2})$ (with n being the matrix dimension) and thus, it is comparable to best parallel standard methods. It can be shown that the memory requirements of the hyper-systolic matrix product are not larger than the requirements of Cannon's algorithm [14]. Disadvantages of Cannon's algorithm, a standard method for distributed BLAS3, are pre-skewing and the fact that if we want to use the optimal data layout for matrix-matrix multiplication, for matrix-vector multiplication, one-to-all broadcast operations are required and the distribution of the ....

Th. Lippert: `Hyper-Systolic Parallel Computing', Thesis, Rijksuniversiteit Groningen, May 1998.


Hyper-Systolic Algorithms for N-Body Computations and Parallel.. - Lippert (1998)   Self-citation (Lippert)

....discussed. Results from real implementations are presented. Keywords: systolic, hyper-systolic, parallel computing, SIMD, HPC, N-body problem, level 3 BLAS. 1 Introduction In this contribution I want to give an introduction to hyper-systolic algorithms which recently have been presented in Refs. [1,2]. This scheme defines a novel class of parallel computing structures which can be useful for a variety of parallel high performance computing (HPC) applications in science and engineering. Examples are n-body computations in astrophysics, convolution for image and signal processing, fully coupled ....

....offer this possibility. A suitable formulation of the hyper-systolic optimization problem is found in the language of Additive Number Theory. The requirement to minimize interprocessor communication leads to a so-called h-range problem defining the shift bases for the hyper-systolic communication [1]. In general, the determination of optimal bases (with minimized interprocessor communication) is difficult. Bases are either produced by direct calculation (for small systems) or can be found by simulated annealing methods. A sub-optimal solution is given by the so-called regular base algorithm ....

[Article contains additional citation context not shown here]

Th. Lippert, A. Seyfried, A. Bode, K. Schilling: `Hyper-Systolic Parallel Computing', IEEE Trans. on Parallel and Distributed Systems, 9 (1998) 1.


Large Scale Simulations of Quantum Chromodynamics in a.. - Güsken, Lippert.. (1998)   Self-citation (Lippert Schilling)

.... power of parallel supercomputers and (ii) can be implemented on SIMD systems, like hydrodynamics or computational electrodynamics [18]. Moreover, the effectivity of APE100 machines has been considerably enlarged by use of systolic and hyper-systolic algorithms for linear algebra tasks and FFT [19-25]. The APE100 Quadrics parallel computer has been designed for quantum field theory, mainly as cheap compute engine in the field of lattice quantum chromodynamics [6, 7, 26, 27]. Therefore, its endowments are reduced and it is optimized for efficient multiplications of $3 \times 3$ matrices and ....

Th. Lippert, A. Seyfried, A. Bode, K. Schilling: `Hyper-Systolic Parallel Computing', IEEE Trans. on Parallel and Distributed Systems, 9 (1998) 1.


CiteSeer.IST - Copyright Penn State and NEC