Parallel matrix multiplication on a linear array with a reconfigurable pipelined bus system
- Proceedings of IPPS/SPDP ’99 (2nd Merged Symp. of 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing)
, 1999
Cited by 21 (6 self)
The known fast sequential algorithms for multiplying two N × N matrices (over an arbitrary ring) have time complexity O(N^α), where 2 < α < 3. The current best value of α is less than 2.3755. We show that for all 1 ≤ p ≤ N^α, multiplying two N × N matrices can be performed on a p-processor linear array with a reconfigurable pipelined bus system (LARPBS) in O(N ...
Fast and processor efficient parallel matrix multiplication algorithms on a linear array with a reconfigurable pipelined bus system
- IEEE Trans. on Parallel and Distributed Systems
, 1998
Cited by 20 (9 self)
We present efficient parallel matrix multiplication algorithms for linear arrays with reconfigurable pipelined bus systems (LARPBS). Such systems can support a large volume of parallel communication of various patterns in constant time. An LARPBS can also be reconfigured into many independent subsystems and thus can support parallel implementations of divide-and-conquer computations like Strassen’s algorithm. The main contributions of the paper are as follows: We develop five matrix multiplication algorithms with varying degrees of parallelism on the LARPBS computing model, namely, MM1, MM2, MM3, and compound algorithms C1(ε) and C2(δ). Algorithm C1(ε) has adjustable time complexity at the sublinear level. Algorithm C2(δ) implies that it is feasible to achieve sublogarithmic time using o(N^3) processors for matrix multiplication on a realistic system. Algorithms MM3, C1(ε), and C2(δ) all have o(N^3) cost and, hence, are very processor efficient. Algorithms MM1, MM3, and C1(ε) are general-purpose matrix multiplication algorithms, where the array elements are in any ring. Algorithms MM2 and C2(δ) are applicable to array elements that are integers of bounded magnitude, floating-point values of bounded precision and magnitude, or Boolean values. Extensions of algorithms MM2 and C2(δ) to unbounded integers and reals are also discussed.
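For readers unfamiliar with the divide-and-conquer scheme the abstract mentions, the following is a minimal sequential sketch of Strassen's algorithm. It is not the paper's MM1, MM2, MM3, or compound algorithms, and it models no bus system; the function names and the restriction to square matrices whose size is a power of two are assumptions made for illustration. Each call splits the operands into four blocks and performs seven recursive block products instead of eight, which gives the O(N^(log2 7)) operation count. The seven products are mutually independent, which is what makes the method a natural fit for a machine that can be split into independent subsystems.

```python
def mat_add(x, y):
    """Entrywise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def mat_sub(x, y):
    """Entrywise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def split_blocks(m):
    """Split a 2h x 2h matrix into its four h x h quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def strassen(a, b):
    """Multiply two n x n matrices, n a power of two (illustrative assumption)."""
    n = len(a)
    if n == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split_blocks(a)
    b11, b12, b21, b22 = split_blocks(b)
    # Seven independent sub-products; on a parallel machine these could run
    # concurrently, here they simply run one after another.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22))
    m2 = strassen(mat_add(a21, a22), b11)
    m3 = strassen(a11, mat_sub(b12, b22))
    m4 = strassen(a22, mat_sub(b21, b11))
    m5 = strassen(mat_add(a11, a12), b22)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12))
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22))
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    # Stitch the quadrants back into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

The sketch uses only addition, subtraction, and multiplication of entries, so it works over any ring whose elements support those Python operators.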
Efficient Deterministic and Probabilistic Simulations of PRAMs on Linear Arrays with Reconfigurable Pipelined Bus Systems
- Journal of Supercomputing
, 2000
Cited by 16 (11 self)
In this paper, we present deterministic and probabilistic methods for simulating PRAM computations on linear arrays with reconfigurable pipelined bus systems (LARPBS). The following results are established in this paper. (1) Each step of a p-processor PRAM with m = O(p) shared memory cells can be simulated by a p-processor LARPBS in O(log p) time, where the constant in the big-O notation is small. (2) Each step of a p-processor PRAM with m = Ω(p) shared memory cells can be simulated by a p-processor LARPBS in O(log m) time. (3) Each step of a p-processor PRAM can be simulated by a p-processor LARPBS in O(log p) time with probability larger than 1 - 1/p^c for all c > 0. (4) As an interesting byproduct, we show that a p-processor LARPBS can sort p items in O(log p) time, with a small constant hidden in the big-O notation. Our results indicate that an LARPBS can simulate a PRAM very efficiently. Keywords: Concurrent read, concurrent write, deterministic simulation, linear array...
Integer Sorting and routing in arrays with reconfigurable optical buses
- Proceedings of International Conference of Parallel Processing
, 1996
Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers
- Journal of Parallel and Distributed Computing
Cited by 8 (0 self)
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α / log N processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all 1 ≤ p ≤ N^α / log N, multiplying two N × N matrices can be performed by a DMPC with p processors in O(N^α / p) time, i.e., linear speedup and cost optimality can be achieved in the range [1..N^α / log N]. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
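As a quick check of what cost optimality means here, the processor-time product can be written out. This is a sketch, not taken from the paper: it assumes α denotes the exponent of the sequential algorithm, that the processor bound reads N^α / log N, and it introduces T_p as hypothetical notation for the running time on p processors.

```latex
\[
  p \cdot T_p \;=\; p \cdot O\!\left(\frac{N^{\alpha}}{p}\right) \;=\; O\!\left(N^{\alpha}\right),
  \qquad
  \text{and at } p = \frac{N^{\alpha}}{\log N}:\quad
  \frac{N^{\alpha}}{\log N} \cdot O(\log N) \;=\; O\!\left(N^{\alpha}\right).
\]
```

In both cases the total work matches the sequential running time up to a constant factor, which is the sense in which the speedup is linear over the whole range of p.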
Efficient Parallel Algorithms for Distance Maps of 2-D Binary Images Using an Optical Bus
, 2002
Cited by 7 (4 self)
Computing a distance map (distance transform) is an operation that converts a two-dimensional (2-D) image consisting of black and white pixels to an image where each pixel has a value or a pair of coordinates that represents the distance to or location of the nearest black pixel. It is a basic operation in image processing and computer vision, and is used for expanding, shrinking, thinning, segmentation, clustering, computing shape, object reconstruction, etc. This paper examines the possibility of computing a distance map for an image efficiently using an optical bus. The computational model considered is the linear array with a reconfigurable pipelined bus system (LARPBS), which has been introduced recently based on current electronic and optical technologies. It is shown that the problem for an n × n image can be solved in O(log n log log n) bus cycles deterministically or in O(log n) bus cycles with high probability on an LARPBS with n^2 processors. By high probability, we mean a probability of O(1 - n^(-α)) for any constant α ≥ 1. We also show that the problem can be solved in O(log log n) bus cycles deterministically or in O(1) bus cycles with high probability on an LARPBS with n^3 processors. Scalability of the algorithms is also discussed briefly. The same problem can be solved using an LARPBS of p processors in O((n^2/p) log n log log n) time deterministically or in O((n^2/p) log n) time with high probability for any practical machine size of p. For processor arrays with practical sizes, a bus cycle is roughly the time of an arithmetic operation. Hence, the algorithm compares favorably to the best known parallel algorithms for the same problem in the literature.
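To make the definition of a distance map concrete, here is a minimal sequential brute-force sketch. It is not the paper's LARPBS algorithm: it simply compares every pixel against every black pixel. The function name, the encoding (1 for black, 0 for white), and the choice of Euclidean distance are assumptions made for illustration.

```python
from math import hypot


def distance_map(image):
    """Return, for each pixel, the Euclidean distance to the nearest black pixel.

    `image` is a list of rows; 1 marks a black pixel and 0 a white one
    (an encoding assumed for this sketch). Pixels of an image with no
    black pixel at all get distance infinity.
    """
    # Coordinates of every black pixel in the image.
    black = [(r, c)
             for r, row in enumerate(image)
             for c, value in enumerate(row)
             if value == 1]
    # For each pixel, take the minimum distance over all black pixels.
    return [
        [min((hypot(r - br, c - bc) for br, bc in black), default=float("inf"))
         for c in range(len(row))]
        for r, row in enumerate(image)
    ]
```

For an n × n image with k black pixels this does about n^2 · k distance evaluations, which is the kind of sequential workload that parallel distance-map algorithms aim to reduce.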
A Many-Core Implementation Based on the Reconfigurable Mesh Model
- in Proc. Int. Conf. on Field Programmable Logic and Applications, FPL
Cited by 7 (5 self)
The reconfigurable mesh is a model for massively parallel computing for which many algorithms with very low complexity have been developed. These algorithms execute cycles of bus configuration, communication, and constant-time computation on all processing elements in lock-step. In this paper, we investigate the use of reconfigurable meshes as coprocessors to accelerate important algorithmic kernels. We discuss the development of a reconfigurable mesh on FPGA technology, including the host integration and the programming tool flow. Then, we present implementation results and a proof-of-concept case study.
Optimally Scaling Permutation Routing on Reconfigurable Linear Arrays with Optical Buses
, 1999
Cited by 4 (1 self)
We present an optimal and scalable permutation routing algorithm for three reconfigurable models based on linear arrays that allow pipelining of information through an optical bus. Specifically, for any P ≤ N, our algorithm routes any permutation of N elements on a P-processor model optimally in O(N/P) steps. This algorithm extends naturally to one for routing h-relations optimally in O(h) steps. We also establish the equivalence of the three models, LARPBS, LPB, and POB. This implies an automatic translation of algorithms (without loss of speed or efficiency) among these models.
Sublogarithmic Deterministic Selection on Arrays with a Reconfigurable Bus
- IEEE Trans. Computers
, 2002
An Improved Randomized Selection Algorithm With an Experimental Study
- In Proc. The 2nd Workshop on Algorithm Engineering and Experiments (ALENEX00)
, 2000
Cited by 3 (3 self)
This paper presents an efficient randomized high-level parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank k, for an arbitrarily given integer k. Our general...
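The selection problem stated in the abstract can be illustrated with a minimal sequential sketch of randomized selection (quickselect). This is not the paper's parallel algorithm, whose details are cut off in the truncated abstract; the function name, the 1-based rank convention, and the optional `rng` parameter are assumptions made for illustration.

```python
import random


def select(items, k, rng=random):
    """Return the element of rank k (1-based: k=1 is the minimum).

    Sequential randomized quickselect: choose a random pivot, then keep
    only the side of the partition that must contain the rank-k element.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    pool = list(items)
    while True:
        pivot = rng.choice(pool)
        lower = [x for x in pool if x < pivot]
        equal_count = sum(1 for x in pool if x == pivot)
        if k <= len(lower):
            # The answer is strictly smaller than the pivot.
            pool = lower
        elif k <= len(lower) + equal_count:
            # The pivot itself has rank k.
            return pivot
        else:
            # Discard everything up to and including the pivot value.
            k -= len(lower) + equal_count
            pool = [x for x in pool if x > pivot]
```

The median of n elements is the special case k = (n + 1) // 2. Every round removes at least the pivot from the pool, so the loop always terminates.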