P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In HPDC, pages 42-49. IEEE CS Press, 1993.
....kernels on heterogeneous platforms. Matrix multiplication has been studied by [23, 2]. LU and QR decomposition have been discussed by Barbosa et al. [1]. Static partitioning schemes to map a two-dimensional data matrix onto heterogeneous resources have been investigated by Crandall and Quinn [14], Kaddoura, Ranka and Wang [22] and Beaumont et al. [3]. The main conclusions of these papers are drawn for three kinds of problems: Distributing independent chunks of work to unidimensional (linear) arrays of heterogeneous processors is easy (see the algorithm in [2]). Distributing ....
.... the best distribution of work for each processor arrangement along the two-dimensional grid, and there is an exponential number of such arrangements as the grid size increases (see [1, 2]). Relaxing the geometrical constraints induced by two-dimensional grids leads to irregular partitionings [14, 22, 3] that allow for a good load balancing but are much more difficult to implement. [Figure residue; legend entries: Greedy heuristic, Greedy heuristic max min fairness, Greedy heuristic quadratic resolution; caption fragment: ".... of the size of the solution ring, with a high communication to computation ratio: H W = 10."] ....
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42--49. IEEE Computer Society Press, 1993.
....etc) when applied hierarchically for clusters of subclusters. Some approaches to minimizing execution time are purely static. Algorithms for statically partitioning a large rectangular iteration space so as to load balance the work of different speed processors are given in Crandall and Quinn [14] and Kaddoura, Ranka and Wang [22]. Proposals for the extension of the ScaLAPACK library to heterogeneous processors are given by Barbosa, Tavares and Padilha [3], Kalinov and Lastovetsky [23] and Beaumont et al. [4, 5]. All these papers target a specific class of algorithms where processors are ....
P.E. Crandall and M.J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42-49. IEEE Computer Society Press, 1993.
....In [4] a more complicated recursive heuristic for PERI-SUM has been considered, for which a nice approximation bound is provided. Complexity results for all problems are summarized in Table 1. Although several partitioning algorithms have been proposed in the literature by Crandall and Quinn [8], Kalinov and Lastovetsky [12] and Kaddoura, Ranka and Wang [11], the complexity results stated in Table 1 are the first available (to the best of our knowledge). [Table 1 fragment; column headers: 1D, 2D (Column based, Recursive, Unconstrained); row P: Polynomial [3], NP hard, Guaranteed heuristic with 5=4 bound [4]; row max: Polynomial ....]
P.E. Crandall and M.J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42-49. IEEE Computer Society Press, 1993.
....A global synchronization is required at the end of each iteration to check the convergence criterion. The grid data is divided among the component processes of the application using a data partitioning algorithm called Recursive Bisection. This algorithm was proposed by Crandall and Quinn [5] for decomposing a two dimensional grid for heterogeneous computing environments such as clusters of workstations. This algorithm recursively divides the grid into rectangles such that the number of grid points in the rectangle 1 Note that on workstation clusters (as opposed to dedicated ....
P. Crandall and M. Quinn. Block Data Decomposition for Data-Parallel Programming on a Heterogeneous Workstation Network. In Proceedings of the 2nd Intl. Symp. on High-Performance Distributed Computing, pages 42--49, July 1993.
....the problem of static partitioning of an irregular grid into partitions that reduce the volume of interprocessor communication. Crandall and Quinn describe a number of partitioning strategies for uniform and non-uniform grid problems for (possibly heterogeneous) workstation cluster environments [11, 12, 13, 15]. The Binary Recursive Block Decomposition (BRBD) scheme [11] partitions a (possibly non-uniform) 2-D grid using orthogonal bisections recursively. Kaddoura et al. describe two array decomposition schemes for heterogeneous environments, called XY and Tile, that may give superior performance ....
....that reduce the volume of interprocessor communication. Crandall and Quinn describe a number of partitioning strategies for uniform and non-uniform grid problems for (possibly heterogeneous) workstation cluster environments [11, 12, 13, 15]. The Binary Recursive Block Decomposition (BRBD) scheme [11] partitions a (possibly non-uniform) 2-D grid using orthogonal bisections recursively. Kaddoura et al. describe two array decomposition schemes for heterogeneous environments, called XY and Tile, that may give superior performance compared to BRBD [18]. The Fair Binary Recursive Decomposition ....
Crandall, P., and Quinn, M. (1993). Block Data Decomposition for Data Parallel Programming on a Heterogeneous Workstation Network. In Proceedings of the 2nd International Symposium on High Performance Distributed Computing, Los Alamitos, pages 42-49.
....versus 0.5. The total cost is C = 6, to be compared to the cost C = 8 of a partitioning into 8 horizontal slices. [Figure 11: Illustrating contiguous block allocation; block areas 0.30, 0.20, 0.12, 0.10, 0.10, 0.05, 0.05, 0.08. The cost is as high as C = 8.2.] Another approach is proposed by Crandall and Quinn [15]. First they compare a contiguous block allocation (see Figure 11) to horizontal slicing; next they introduce a better processor arrangement: a recursive algorithm to tile the iteration space (i.e. partitioning the unit square) into p rectangles of prescribed area $s_1, s_2,$ ....
P.E. Crandall and M.J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42-49. IEEE Computer Society Press, 1993.
....problems with varying resource demands have been studied by Nicol and Saltz [13]. Nicol [12] has described the decomposition of unstructured grids into perfectly rectilinear partitions. There has been very little work done on decomposition of arrays for nonuniform machines. Crandall and Quinn [10] have presented an algorithm for decomposing a two-dimensional data array for a heterogeneous cluster of workstations. This algorithm uses an orthogonal recursive bisection to perform the decomposition. Snyder and Socha [15] have developed a polynomial time algorithm for allocating an $I \times J$ ....
....pt(p) This happens when partitioning of vertical subarrays continues along one dimension and the algorithm behaves as an XY algorithm. The worst case complexity of the algorithm is $O(2^{p-1})$. 6.3 Recursive Bisection algorithm. A simple algorithm was proposed by Crandall and Quinn [10] for decomposing a two-dimensional array for a NUCE (Figure 15). We shall refer to it as RecursiveBisection. RecursiveBisection partitions the current array according to the first half of the partitions available. Two simple variations are: 1. RecursiveBisection switches the dimension along which ....
Phyllis E. Crandall and Michael J. Quinn. "Block Data Decomposition for Data-Parallel Programming on a Heterogeneous Workstation Network," Proceedings of the Second International Symposium on High-Performance Distributed Computing, pp. 42--49, July 1993.
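The recursive bisection idea quoted above can be illustrated with a minimal Python sketch. It is not the algorithm as published by Crandall and Quinn: the function name `recursive_bisection`, the use of a continuous unit square instead of integer grid points, the split of the processor list at its midpoint, and the alternation of the cut direction at every level are all assumptions made for illustration. Each cut divides the current rectangle in proportion to the summed relative speeds of the two processor groups, so every processor ends up with an area proportional to its speed.

```python
def recursive_bisection(x, y, w, h, speeds, split_x=True):
    """Tile the rectangle (x, y, w, h) into len(speeds) rectangles.

    Each rectangle's area is proportional to the matching relative speed.
    Hypothetical sketch: the midpoint split of the processor list and the
    alternating cut direction are illustrative assumptions.
    """
    if len(speeds) == 1:
        return [(x, y, w, h)]

    # Give the first half of the processors one side of the cut.
    mid = len(speeds) // 2
    first, second = speeds[:mid], speeds[mid:]
    frac = sum(first) / sum(speeds)

    if split_x:
        # Vertical cut: divide the width in proportion to group speed.
        w1 = w * frac
        return (recursive_bisection(x, y, w1, h, first, not split_x)
                + recursive_bisection(x + w1, y, w - w1, h, second, not split_x))

    # Horizontal cut: divide the height in proportion to group speed.
    h1 = h * frac
    return (recursive_bisection(x, y, w, h1, first, not split_x)
            + recursive_bisection(x, y + h1, w, h - h1, second, not split_x))


if __name__ == "__main__":
    speeds = [6, 5, 4, 3, 2]
    for s, (rx, ry, rw, rh) in zip(speeds, recursive_bisection(0.0, 0.0, 1.0, 1.0, speeds)):
        print(f"speed {s}: origin=({rx:.3f}, {ry:.3f}) size=({rw:.3f} x {rh:.3f}) area={rw * rh:.3f}")
```

On an integer grid the cut positions would additionally have to be rounded to whole rows or columns, which makes the load balance approximate rather than exact.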
....efficiency, the problem space must be decomposed in a manner that reduces the time that the faster processors remain idle while waiting for slower processors to complete their computations. Earlier work has addressed two dimensional data partitioning in a heterogeneous workstation network [4, 5, 9]. This paper extends previous results by describing methods for partitioning a three dimensional problem space. We describe, mathematically characterize, and compare four data partitioning methods suitable for a three dimensional problem space: Contiguous Plane, Contiguous Point, Cube Scatter, ....
....Plane partitioning. Locating an arbitrary point is only slightly more difficult. A list may be maintained that records for each physical processor its minimum and maximum i, j, k values. By minimum, we mean the least i associated with the least j that is associated with the least k (i.e., [5][8][12] is less than [1][2][13] and [1][9][12]). Maximum is defined similarly. [Figure 6: An example of Contiguous Point decomposition with three heterogeneous processors (P0, P2, P1).] Again, binary search may be used, at a time cost of $O(\log p)$. Because of our assumption that each processor is ....
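The lookup described in this excerpt can be sketched in Python as follows. The names `order_key`, `build_index` and `find_owner` are hypothetical, and the sketch assumes each processor owns one contiguous run of points in the stated ordering (k most significant, then j, then i) and that the list of per-processor minima is already sorted in that ordering. Only the minima are needed for the search itself.

```python
from bisect import bisect_right


def order_key(point):
    """Map an (i, j, k) index to a tuple that sorts k first, then j, then i."""
    i, j, k = point
    return (k, j, i)


def build_index(minima):
    """Precompute sort keys from each processor's minimum (i, j, k) point.

    Assumes minima[n] belongs to the n-th processor in ownership order and
    that the list is already sorted by order_key.
    """
    return [order_key(m) for m in minima]


def find_owner(index, point):
    """Binary search, O(log p), for the position of the processor owning point."""
    pos = bisect_right(index, order_key(point)) - 1
    if pos < 0:
        raise ValueError("point precedes the first processor's minimum")
    return pos


if __name__ == "__main__":
    # Illustrative minima for three processors; the values are made up.
    index = build_index([(0, 0, 0), (3, 1, 4), (0, 0, 9)])
    print(find_owner(index, (5, 8, 12)))
```

The comparison in the excerpt holds under this key: `order_key((5, 8, 12))` sorts before both `order_key((1, 2, 13))` and `order_key((1, 9, 12))`.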
[Article contains additional citation context not shown here]
Phyllis E. Crandall and Michael J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In Proceedings of the Second International Symposium on High-Performance Distributed Computing, 1993.
....therefore the number of bytes that must be communicated, are minimized. This technique is applicable to both the distribution of non uniform problem spaces across homogeneous processors [3, 9, 12, 16, 17] and to the partitioning of uniform or non uniform problems across heterogeneous processors [5, 6, 7, 10]. The base assumption in pursuing block partitioning methods had been that transmission time was the dominating factor of communication cost. With the advent of faster networks and high speed switching technology, this assumption has become outmoded. As transmission speeds improve, the importance ....
....Socha [15] offer an algorithm that creates near rectangular and near bulky segments. Reed et al. present a discussion of communication patterns and partitioning methods in [14]. The exploration of block partitioning on clusters has only recently begun. Cheung and Reeves [5], Crandall and Quinn [6], and Kaddoura et al. [10] propose decomposition methods that address nonuniformity in the processors. Crandall and Quinn [7] describe a scheme that is adaptable to non-uniformity in the problem space as well. The preponderance of previous work in decomposition in the networked environment assumed ....
Phyllis E. Crandall and Michael J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In Proceedings of the Second International Symposium on High Performance Distributed Computing, pages 42--49, July 1993.
....of processors, network speed, and communication pattern. Bounds are established on the costs of these partitioning schemes. These may be used to make better decomposition decisions. Data partitioning in multiprocessors and distributed systems has received substantial treatment in the literature [1, 2, 4, 5, 9, 10, 12, 14, 15], but only recently has decomposition in the heterogeneous environment been explored. A number of systems for parallel computing in the heterogeneous workstation cluster have recently become available. PVM [8], p4 [3] and Dataparallel C [11] are examples of these. The increasing interest in ....
.... $\beta d n/\sqrt{p})$ (8). For NSEW patterns the number is $4p$ and the cost is $4(\alpha + \beta d n/\sqrt{p})$ (9). Usually the processors are not homogeneous and often $\sqrt{p}$ does not evenly divide $n$. Two methods proposed for partitioning the problem space for heterogeneous processors are binary recursive partitioning [5] (Figure 2(b)) and general quadrilateral decomposition [10] (Figure 2(c)), but these schemes often result in a larger number of communications than the other partitionings presented here. Crandall and Quinn [6] have shown that, in the worst case, binary recursive partitioning incurs $6p - 4$ ....
Phyllis E. Crandall and Michael J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In Proceedings of the Second International Symposium on High-Performance Distributed Computing, 1993.
....speeds is provided to the system through a data file that contains the identity and estimated performance level of each machine. There is also provision for data migration to respond to the changing workload on the multi-user participating machines. Cheung and Reeves [13] and Crandall and Quinn [14, 16] also examine block partitioning across a workstation cluster. Communication costs in the heterogeneous networked environment are mathematically characterized in Crandall and Quinn [15]. The mathematical models upon which this investigation is based are extensions of the work described in [15]. A ....
....and their number divides the problem space into p squares of equal size, block decomposition generates $4p$ communications for each 5-point stencil pattern. If the processors are heterogeneous and binary recursive block decomposition is used, a maximum of $6p - 4$ communications may be necessary [14, 15]. While this is greater than contiguous row or point, the number of data items that must be exchanged is reduced. A binary recursive block partitioning for 5 heterogeneous processors is shown in Figure 5(c). Table 1 gives the worst case number of communications required by each partitioning method ....
[Article contains additional citation context not shown here]
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In Proceedings of the Second International Symposium on High Performance Distributed Computing, pages 42--49, July 1993.
....and Bokhari, Crockett, and Nicol [5] have recently proposed a parametric binary dissection method. Snyder and Socha propose an algorithm for a static mapping of regular data parallel problems that creates near-rectangular and near bulky segments [13]. Cheung and Reeves [6] and Crandall and Quinn [7, 9] examine block partitioning across a workstation cluster. Communication costs in the heterogeneous networked environment are mathematically characterized in Crandall and Quinn [8]. The mathematical models upon which [Figure 1: Two common communication patterns for grid problems. (a) ....
....to be divided into p rectangles of equal size, produces $4p$ communications, but requires fewer data items to be transmitted than contiguous row partitioning. If the processors are heterogeneous and binary recursive block decomposition is used, a maximum of $6p - 4$ communications are necessary [7, 8]. While this is the same as for contiguous point, the number of data items that must be exchanged may be greatly reduced. An example of binary recursive block partitioning with five processors with relative speeds (6,5,4,3,2) is given in Figure 4. Table 1 gives the worst case number of ....
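As a small worked example of the two message counts quoted in this excerpt, the Python sketch below evaluates them for a given processor count. The function names are hypothetical. The formulas are taken directly from the excerpt: $4p$ for block decomposition into equal rectangles, and a worst case of $6p - 4$ for binary recursive block decomposition on heterogeneous processors. The sketch counts messages only; it does not model how many data items each message carries.

```python
def block_messages(p):
    """Messages for block decomposition into p equal rectangles (4p)."""
    return 4 * p


def binary_recursive_worst_case(p):
    """Worst-case messages for binary recursive block decomposition (6p - 4)."""
    return 6 * p - 4


if __name__ == "__main__":
    # Five processors, matching the (6,5,4,3,2) example in the excerpt.
    p = 5
    print(block_messages(p), binary_recursive_worst_case(p))
```

For five processors this gives 20 messages for the equal-rectangle case against a worst case of 26 for the binary recursive scheme.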
[Article contains additional citation context not shown here]
Phyllis E. Crandall and Michael J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In Proceedings of the Second International Symposium on High Performance Distributed Computing, pages 42--49, July 1993.
....workload, and Bokhari, et al. [19] have recently proposed a parametric binary dissection method. Snyder and Socha propose an algorithm for a static mapping of regular data parallel problems that creates near rectangular and near bulky segments [20]. Cheung and Reeves [21] and Crandall and Quinn [22, 23] examine block partitioning across a workstation cluster. Communication costs in the heterogeneous networked environment are mathematically characterized in Crandall and Quinn [24]. The mathematical models upon which this investigation is based are extensions of the work described in [24]. A ....
....boundaries of the partition are distorted to enable a more fine grained load balance among the heterogeneous processors. This, however, may result in an increase in the number of necessary off processor messages for some communication patterns. Figure 5(c) shows a binary recursive decomposition [22, 24] which partitions the problem space in the presence of heterogeneous processors according to their individual capabilities while maintaining rectangularity, and thereby generating fewer communications than the general quadrilateral method. Since certain communication patterns, such as the 5 point ....
[Article contains additional citation context not shown here]
P. E. Crandall and M. J. Quinn, "Block data decomposition for data-parallel programming on a heterogeneous workstation network," in Proceedings of the Second International Symposium on High-Performance Distributed Computing, pp. 42--49, July 1993.
....homogeneous processors [2, 3, 17, 20]. The grid points in these problems do not have equal computational requirements, or the grid is sparse. Decomposition reduces to the problem of assigning equal computational effort to each processor. Block decomposition has specifically been addressed by [2, 3, 8, 9, 15, 18], and data decomposition across heterogeneous processors has been discussed in [4, 10, 16]. A number of systems to facilitate parallel processing on workstation clusters have recently appeared. PVM (Parallel Virtual Machine), which is freely available, supports parallel computing across a wide ....
....[Figure caption fragment: block decomposition with 9 processors.] has increased from $2p$ to $4p$. Clearly, this is advantageous only when the savings in bandwidth requirements outweigh the additional cost in message preparation latency. The necessary conditions for this partitioning to be worthwhile are described in [9]. In the case we consider here, the processors are not completely homogeneous. The block decomposition method must be altered to accommodate heterogeneity. The general quadrilateral method described in [15] starts with a regularly partitioned grid. The boundaries are then distorted to achieve load ....
[Article contains additional citation context not shown here]
Phyllis E. Crandall and Michael J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In Proceedings of the Second International Symposium on High-Performance Distributed Computing, 1993.
No context found.
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In HPDC, pages 42-49. IEEE CS Press, 1993.
No context found.
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42--49. IEEE Computer Society Press, 1993.
No context found.
P.E. Crandall and M.J. Quinn, "Block Data Decomposition for Data-Parallel Programming on a Heterogeneous Workstation Network," Proc. Second Int'l Symp. High Performance Distributed Computing, pp. 42-49, 1993.
No context found.
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42--49. IEEE Computer Society Press, 1993.
No context found.
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42--49. IEEE Computer Society Press, 1993.
No context found.
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42--49. IEEE Computer Society Press, 1993.
No context found.
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42--49. IEEE Computer Society Press, 1993.
No context found.
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42--49. IEEE Computer Society Press, 1993.
No context found.
Phyllis E. Crandall and Michael J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In Proc. of HPDC, pages 42--49, 1993.
No context found.
P. E. Crandall and M. J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42--49. IEEE Computer Society Press, 1993.
No context found.
P.E. Crandall and M.J. Quinn. Block data decomposition for data-parallel programming on a heterogeneous workstation network. In 2nd International Symposium on High Performance Distributed Computing, pages 42-49. IEEE Computer Society Press, 1993.