S. Ranka, R. Shankar and K. Alsabti, "Many-to-Many Communication With Bounded Traffic," Proc. Symp. on the Frontiers of Massively Parallel Computation, 1995.
....operation on coarse grained, message passing massively parallel processors (MPPs). In the standard broadcast operation, one processor broadcasts a message to every other processor. Various implementations of this operation for architectures with different machine characteristics have been proposed [5, 9, 12, 13, 14]. Another well studied broadcasting operation is the all-to-all broadcast, in which every processor broadcasts a message to every other processor [3, 7, 8, 15]. Let p be the number of processors. Assume that s of the p processors, which we call source processors, contain a message to be broadcast ....
S. Ranka, R. Shankar and K. Alsabti, "Many-to-Many Communication With Bounded Traffic," Proc. Symp. on the Frontiers of Massively Parallel Computation, 1995.
....cost for load balancing assumes that there is no network congestion. This is a reasonable assumption for networks that are bandwidth rich, as is the case with most commercial systems. Without assuming anything about network congestion, the load balancing phase can be done using the transportation primitive [SAR95] in time 2 × (N/P) × t_w, provided N/P ≥ O(P^2). Splitting is done when the accumulated cost of communication becomes equal to the cost of moving records around. Footnote 1: If the message size is large, by routing the message in parts, this communication step can be done in time : t s t w ....
R. Shankar, K. Alsabti, and S. Ranka. Many-to-many communication with bounded traffic. In Frontiers '95, the fifth symposium on advances in massively parallel computation, McLean, VA, February 1995.
....cost for load balancing assumes that there is no network congestion. This is a reasonable assumption for networks that are bandwidth rich, as is the case with most commercial systems. Without assuming anything about network congestion, the load balancing phase can be done using the transportation primitive [22] in time 2 × (N/P) × t_w, provided N/P ≥ O(P^2). Splitting is done when the accumulated cost of communication becomes equal to the cost of moving records around in the splitting phase [14]. So splitting is done when: Σ (Communication Cost) ≥ Moving Cost + Load Balancing. This criterion for ....
R. Shankar, K. Alsabti, and S. Ranka. Many-to-many communication with bounded traffic. In Frontiers '95, the fifth symposium on advances in massively parallel computation, McLean, VA, February 1995.
....the communication. Our initial work in designing these algorithms appears in [13]. That work was motivated by the need to perform irregular and data dependent communication operations in parallelizing intermediate and high level vision problems. Recently, algorithms have been proposed by Ranka [4] and by Hambrusch [3] for communication operations. The algorithm in [4] reduces node contention and has been implemented on the CM5 using active messages. In this algorithm, the number of message passing start-ups is doubled; it is efficient when the traffic is large. The algorithms in [3] are ....
....in [13]. That work was motivated by the need to perform irregular and data dependent communication operations in parallelizing intermediate and high level vision problems. Recently, algorithms have been proposed by Ranka [4] and by Hambrusch [3] for communication operations. The algorithm in [4] reduces node contention and has been implemented on the CM5 using active messages. In this algorithm, the number of message passing start-ups is doubled; it is efficient when the traffic is large. The algorithms in [3] are motivated by mesh architecture and the messages are transmitted in nonblocking ....
[Article contains additional citation context not shown here]
S. Ranka, R. Shankar and K. Alsabti. "Many-to-Many Communication With Bounded Traffic," 1995 Symposium on Frontiers of Massively Parallel Computation.
....the resulting vector of size mp on one of the processors. Global Concatenate is the same as Gather except that the collected data is stored on each processor P_i. Each processor sends a distinct message of size m to every processor P_i in an All-to-All Communication. Transportation Primitive [9] performs many-to-many personalized communication with possibly high variance in message sizes in O(r/p) time, where r ≥ p^2 is the maximum of outgoing or incoming traffic at any processor. In Order Maintaining Data Movement each processor P_i contains two integers s_i and r_i, and has s_i ....
S. Ranka, R.V. Shankar and K.A. Alsabti, Many-to-many communication with bounded traffic, Proc. Frontiers of Massively Parallel Computation (1995).
....with possibly high variance in message size. Let r be the maximum of outgoing or incoming traffic at any processor. The transportation primitive breaks down the communication into two all-to-all communication phases where all the messages sent by any particular processor have uniform message sizes [12]. If r ≥ p^2, the running time of this operation is equal to two all-to-all communication operations with a maximum message size of O(r/p). 8. Order Maintaining Data Movement. Consider the following data movement problem, an abstraction of the data movement patterns that we repeatedly ....
S. Ranka, R.V. Shankar and K.A. Alsabti, Many-to-many communication with bounded traffic, Proc. Frontiers of Massively Parallel Computation (1995), to appear.
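The context above describes the transportation primitive as two all-to-all phases in which every message sent by a given processor has the same size. The following is a minimal Python sketch of that idea on a simulated machine, not the routing scheme of the cited paper: the function name two_phase_route, the (dest, payload) tuple format, and the round-robin dealing with a staggered start are all illustrative assumptions.

```python
def two_phase_route(outgoing, p):
    """Simulate a two-phase many-to-many exchange among p processors.

    outgoing[src] is a list of (dest, payload) pairs held by processor src.
    Returns (inbox, phase1_sizes), where inbox[dest] lists delivered payloads
    and phase1_sizes[src][mid] counts elements sent from src to mid in phase 1.
    """
    # Phase 1: each source deals its elements round-robin over all p
    # intermediate processors, ignoring the final destination. The messages
    # a source sends therefore differ in size by at most one element.
    staged = [[] for _ in range(p)]
    phase1_sizes = [[0] * p for _ in range(p)]
    for src in range(p):
        for k, (dest, payload) in enumerate(outgoing[src]):
            mid = (src + k) % p  # staggered start (an assumption of this sketch)
            staged[mid].append((dest, payload))
            phase1_sizes[src][mid] += 1

    # Phase 2: each intermediate forwards every element to its true destination.
    inbox = [[] for _ in range(p)]
    for mid in range(p):
        for dest, payload in staged[mid]:
            inbox[dest].append(payload)
    return inbox, phase1_sizes
```

With skewed input (one processor sending most of its data to a single destination), phase 1 still spreads that load across all intermediates, which is the property the quoted context attributes to the primitive. This sketch does not try to equalize phase 2 message sizes.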
.... processor is bounded by t, the time taken for the communication is 2 t ( lower order terms) when t O(p 2 p = If the outgoing and incoming traffic bounds are r and c instead, the communication takes time 2 (r c) lower order terms) when either r O(p 2 p = or c O(p 2 p = [20]. 3 Parallel Algorithms for Selection Parallel algorithms for selection are also iterative and work by reducing the number of elements to be considered from iteration to iteration. The elements are distributed across processors and each iteration is performed in parallel by all the processors. ....
S. Ranka, R.V. Shankar and K.A. Alsabti, Many-to-many communication with bounded traffic, Proc. Frontiers of Massively Parallel Computation (1995), to appear.
....with possibly high variance in message size. Let r be the maximum of outgoing or incoming traffic at any processor. The transportation primitive breaks down the communication into two all-to-all communication phases where all the messages sent by any particular processor have uniform message sizes [RSA95]. If r ≥ p^2, the running time of this operation is equal to two all-to-all communication operations with a maximum message size of O(r/p). 8. Order Maintaining Data Movement. Consider the following data movement problem, an abstraction of the data movement patterns that we encounter in ....
....generated received by a processor. The maximum number of messages sent out by a processor is ⌈n_max/n_avg⌉ + 1 and the maximum number of elements sent is n_max. The maximum number of elements received by a processor is n_avg. Therefore, the running time is O(n_max/n_avg + n_max) [RSA95]. The order maintaining load balance algorithm may generate much more communication than necessary. For example, consider the case where all processors have n_avg elements. Algorithm 5: Modified order maintaining load balance. n: Number of total elements; p ....
S. Ranka, R.V. Shankar, and K.A. Alsabti. Many-to-many communication with bounded traffic. In Proc. Frontiers of Massively Parallel Computation, 1995.
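The message-count bound quoted in the context above can be checked with a small simulation. The following Python sketch is an illustration under stated assumptions, not the algorithm of the citing thesis: the function name order_maintaining_plan is hypothetical, n_avg is taken to be ceil(n/p), and a processor's transfer to itself is counted as a message.

```python
from itertools import accumulate


def order_maintaining_plan(counts):
    """Plan an order-preserving rebalance of elements across processors.

    counts[i] is the number of elements currently on processor i.
    Returns plan, where plan[src] is a list of (dest, n_elements) pairs;
    after the moves, processor d holds global positions
    [d * n_avg, (d + 1) * n_avg), so the global element order is kept.
    """
    p = len(counts)
    n = sum(counts)
    n_avg = -(-n // p)  # ceil(n / p); assumption: the last processor may get fewer
    # Global index of the first element held by each processor (exclusive prefix sum).
    starts = [0] + list(accumulate(counts))[:-1]

    plan = []
    for src in range(p):
        lo, hi = starts[src], starts[src] + counts[src]
        msgs = []
        while lo < hi:
            dest = lo // n_avg                 # destination block containing position lo
            end = min(hi, (dest + 1) * n_avg)  # stop at the block boundary or the end of src's data
            msgs.append((dest, end - lo))
            lo = end
        plan.append(msgs)
    return plan
```

For counts = [7, 0, 1, 0], processor 0 sends four messages, within the bound ⌈7/2⌉ + 1 = 5, and every processor ends up with n_avg = 2 elements.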
....with possibly high variance in message size. Let r be the maximum of outgoing or incoming traffic at any processor. The transportation primitive breaks down the communication into two all-to-all communication phases where all the messages sent by any particular processor have uniform message sizes [10]. If r ≥ p^2, the running time of this operation is equal to two all-to-all communication operations with a maximum message size of O(r/p). 8. Order Maintaining Data Movement. Consider the following data movement problem, an abstraction of the data movement patterns that we repeatedly ....
S. Ranka, R.V. Shankar and K.A. Alsabti, Many-to-many communication with bounded traffic, Proc. Frontiers of Massively Parallel Computation (1995), to appear.
....with possibly high variance in message size. Let r be the maximum of outgoing or incoming traffic at any processor. The transportation primitive breaks down the communication into two all-to-all communication phases where all the messages sent by any particular processor have uniform message sizes [14]. If r ≥ p^2, the running time of this operation is equal to two all-to-all communication operations with a maximum message size of O(r/p). 8. Order Maintaining Data Movement. Consider the following data movement problem, an abstraction of the data movement patterns that we encounter in ....
S. Ranka, R.V. Shankar and K.A. Alsabti, Many-to-many communication with bounded traffic, Proc. Frontiers of Massively Parallel Computation (1995).
.... processor is bounded by t, the time taken for the communication is 2 t ( lower order terms) when t O(p 2 p = If the outgoing and incoming traffic bounds are r and c instead, the communication takes time 2 (r c) lower order terms) when either r O(p 2 p = or c O(p 2 p = [20]. 3 Parallel Algorithms for Selection Parallel algorithms for selection are also iterative and work by reducing the number of elements to be considered from iteration to iteration. The elements are distributed across processors and each iteration is performed in parallel by all the processors. ....
S. Ranka, R.V. Shankar and K.A. Alsabti, Many-to-many communication with bounded traffic, Proc. Frontiers of Massively Parallel Computation (1995), to appear.