I. Banicescu. Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm Parallelism. PhD thesis, Polytechnic University, 1996.
....means that those processors which finish early must remain idle and wait for those processors which finish later. By efficiently moving work from busy to idle processors, an effective utilization of resources that translates into shorter application execution times could be realized. Fractiling [1][14], a dynamic scheduling technique based on a probabilistic analysis, balances loads by handling both predictable and unpredictable events. It draws from earlier loop scheduling techniques that dynamically schedule iterates in decreasing size chunks and has been proven to significantly improve ....
....which handles processor load imbalances caused by both predictable events, such as uneven work distribution, and unpredictable events, such as systemic interference. It is based on a probabilistic analysis and maintains locality by exploiting the self-similarity properties of fractals [15][1]. Fractiling has been proven to consistently improve the performance of N-body simulation applications [1][2][3]. In Fractiling, the work is repeatedly subdivided into decreasing size chunks. The initial larger chunks incur little overhead while the processor finishing times are balanced by the ....
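The decreasing-chunk scheduling described in the excerpt above can be illustrated with a small sketch. It assumes a factoring-style rule in which each batch hands out half of the remaining iterates as equal chunks, one per processor; the excerpts do not state the exact rule, and the name `factoring_chunks` and its parameters are hypothetical. The sketch covers only chunk sizes, not the tiling step that Fractiling uses to preserve locality.

```python
import math


def factoring_chunks(total_iterates, processors):
    """Return the sequence of chunk sizes handed out by a factoring-style rule.

    Assumption (not taken from the excerpts): each batch consists of
    `processors` equal chunks that together cover about half of the
    iterates still unscheduled.
    """
    chunks = []
    remaining = total_iterates
    while remaining > 0:
        # Chunk size for this batch: half the remaining work split P ways.
        size = max(1, math.ceil(remaining / (2 * processors)))
        for _ in range(processors):
            if remaining == 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks


if __name__ == "__main__":
    print(factoring_chunks(100, 4))
```

The early chunks are large, so few scheduling operations are needed for most of the work; the small chunks at the end give idle processors something to pick up, which evens out finishing times.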
I. Banicescu, Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm, Ph.D. thesis, Department of Computer Science, Polytechnic University, 1996.
....Leathrum, Elliot, Lambert, Rankin, and Board 1995), one of the most efficient hierarchical algorithms used for N-body simulations, and two other options that incorporate load balancing techniques: LB (J. A. Board et al. 1992; J. A. Board et al. 1995) and Fractiling (Banicescu and Hummel 1995; Banicescu 1996; Banicescu and Lu 1998). LB is a load balancing option based on the total number of interactions (using the Costzones method (J. Singh et al. 1993)). Fractiling is a dynamic scheduling technique, based on a probabilistic analysis, that simultaneously balances processor loads and maintains ....
....on a range of number of processors and problem sizes using data sets which consisted of uniform and nonuniform distributions of particles. The experiments were conducted on a distributed memory shared address space environment (KSR 1 at the Cornell Theory Center) (Banicescu and Hummel 1995; Banicescu 1996) and distributed memory message passing environments (IBM SP2 at the Maui High Performance Computing Center and SuperMSPARC at the NSF Engineering Research Center for Computational Field Simulation) (Banicescu and Lu 1998; Lu 1997). The results of these experiments were evaluated in terms of ....
Banicescu, I. (1996, January). Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm.
....proven to be effective in scientific applications. It is implemented within individual applications and balances processor loads by migrating the application's data from busy to idle tasks. Applications such as N-body simulations have successfully employed Fractiling to improve their performance (Banicescu 1996)(Banicescu and Lu 1998). Dynamic coarse grained load balancing is achieved by moving tasks from heavily to lightly loaded processors. In general, this approach is application independent. It is implemented at the operating system level, relieving the application programmer from this ....
.... (ORB), the Costzones method, and a hashed oct-tree (HOT) method which employs the Morton order (Singh, Holt, Totsuka, et al. 1993)(Warren and Salmon 1993). Some experimentation with new scheduling schemes applied to scientific problems has been presented in (Hummel, Schonberg, and Flynn 1992)(Banicescu 1996)(Banicescu and Lu 1998)(Lu 1997). These schemes combine static techniques that exploit data locality with dynamic techniques that improve load balancing. In these schemes, work units and their associated data are initially placed on the same processor. Each processor executes its units in ....
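The Morton order mentioned in the excerpt above can be sketched as bit interleaving of integer cell coordinates. This is a generic illustration of the ordering, not the HOT implementation of Warren and Salmon; the function name `morton3d` and the `bits` parameter are hypothetical.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Morton (Z-order) key.

    Bit i of x lands at position 3*i, bit i of y at 3*i + 1, and bit i of z
    at 3*i + 2, so cells that are close in space tend to get nearby keys.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


if __name__ == "__main__":
    cells = [(1, 1, 1), (0, 0, 1), (2, 0, 0), (1, 0, 0)]
    # Sorting cells by key walks them along the Z-order curve.
    print(sorted(cells, key=lambda c: morton3d(*c)))
```

Sorting cells by such a key gives a one-dimensional ordering that can be cut into contiguous ranges, one per processor, which is one way a locality-preserving static placement can be obtained.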
Banicescu, I. (1996, January). Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm.
.... which can provide services such as task migration and can exploit detailed runtime performance information that is automatically gathered [19]. The other is the recently developed data parallel technique which demonstrated the effectiveness of balancing computational load dynamically [4]. Several modifications were needed to exploit these data parallel dynamic load balancing techniques inside the task parallel runtime environment. Some of these modifications are centered around support for fault tolerance. After a review of existing fault tolerant task parallel runtime ....
.... [24][25][22][9]. Some of these techniques include the Orthogonal Recursive Bisection (ORB), the Costzones method, and a Hashed Oct Tree (HOT) method which employs the Morton order [24][25]. Some experimentation with new scheduling schemes applied to scientific problems has been presented in [15][4][6]. These schemes combine static techniques that exploit data locality with dynamic techniques that improve load balancing. In these schemes, work units and their associated data are initially placed on the same processor. Each processor executes its units in decreasing size chunks to ....
I. Banicescu, Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm, Ph.D. Thesis, Department of Computer Science, Polytechnic University, 1996.
....property of fractals. Previous work on load balancing N-body simulations with Fractiling was applied to the parallel implementation of Greengard's 3-D Fast Multipole Algorithm and performed on a distributed memory shared address space environment, the KSR 1 at the Cornell Theory Center (Banicescu 1996). This paper attempts to experimentally extend the validity and test the benefits of this technique in a message passing environment, on a SuperMSPARC at the NSF Engineering Research Center for Computational Field Simulation and on an IBM SP2. Our approach to load balancing N-body simulations in this ....
....due to load imbalance (Grama, Kumar, and Sameh 1994). With random assignment, the load imbalances of individual subtiles mute each other out to some extent. Some experimentation with new scheduling schemes applied to scientific problems has been presented in (Hummel, Schonberg, and Flynn 1992; Banicescu 1996). These schemes combine static techniques that exploit data locality with dynamic techniques that improve load balancing. In these schemes, work units and their associated data are initially placed on the same processor. Each processor executes its units in decreasing size chunks to preserve load ....
Banicescu, I. (1996, January). Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm. Ph.D. thesis, Polytechnic University.
....in better resource utilization. Hector, a parallel runtime environment developed at Mississippi State University, performs coarse grain load balancing by migrating tasks [17]. Fractiling is a fine grain technique that achieves load balancing by migrating data from heavily loaded to idle tasks [2]. It has been successfully applied to N-body simulations. In this paper, we discuss the combination of these two strategies, Hector and fractiling, to achieve better load balancing. The combined system, resulting in better resource utilization, is proposed to be extended to the world wide ....
....considered to improve the performance of N-body simulations due to load imbalance [8]. With random assignment, the load imbalances of individual subtiles mute each other out to some extent. Some experimentation with new scheduling schemes applied to scientific problems has been presented in [10][2][5][12]. These schemes combine static techniques that exploit data locality with dynamic techniques that improve load balancing. In these schemes, work units and their associated data are initially placed on the same processor. Each processor executes its units in decreasing size chunks to ....
I. Banicescu: "Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm".- PhD thesis, Polytechnic University, (1996)
....that adapts to algorithmic and systemic load imbalances while maximizing data locality. It draws from earlier loop scheduling techniques where iterates are dynamically scheduled in decreasing size chunks to reduce synchronization and has been successfully implemented in N-body simulations [6][8]. The early large chunks have relatively little overhead and their uneven finishing times are smoothed over by later smaller chunks. Fractiling uses a tiling technique to optimize chunk shapes such that data locality and reuse are maximized. 3 C. An Integrated Strategy Advances in runtime ....
....combined scheme, chunk sizes are determined globally according to a Factoring rule, while chunk shapes are determined locally according to a Tiling rule. The Fractiling method was developed in response to the shortcomings of other methods and has successfully been applied to N-body simulations [6][7][8][19]. It is based on a probabilistic analysis and therefore accommodates load imbalances caused by predictable events (such as irregular data) and unpredictable events (such as data access latency). Fractiling adapts to algorithmic and system induced load imbalances while maximizing data ....
I. Banicescu, Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm, Ph.D. Thesis, Department of Computer Science, Polytechnic University, 1996.
....tasks to balance the load. If the amount of data is too large, the resulting corrections will be too coarse. If the amount of data is too small, the process of exchanging data will incur much overhead. The Fractiling method was developed in response to the shortcomings of these other methods [2][7]. It draws on earlier schemes that schedule loop iterations in decreasing size chunks: the early larger chunks have relatively little overhead, while the later smaller chunks smooth over their unevenness [10][8]. 2.2: Current Work and Recent Results Fractiling simultaneously balances ....
I. Banicescu, Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm, Ph.D. Thesis, Polytechnic University, Department of Computer Science, 1996.
....this new algorithm, the O(n^2) algorithm is found to have a q = Gamma1:2 and thus is less scalable than the newly discovered sequential algorithm that will have a q = Gamma1. 8. 3 The N-body Problem This example applies the Gamma opt measurement methodology to results previously obtained in [1] which describes a dynamic load balancing algorithm applied to N-body simulations. The N-body simulation represents the time integration of N interacting bodies given specified initial positions and velocities. The naive N-body algorithm has a complexity of O(n^2) per time step. Recently, O(n ....
....to analyze load balancing strategies using the Gamma opt metric. This example focuses on computing and comparing the effectiveness of implementing the parallel 3-D Fast Multipole Algorithm (PFMA) [4] and two of its recent options that incorporate load balancing techniques: LB and Fractiling [2][1] applied to a range of number of processors and problem sizes. LB is a static load balancing option strictly based on the total number of interactions. Fractiling is a dynamic scheduling technique, based on a probabilistic analysis, that simultaneously balances processor loads and maintains ....
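The quadratic cost of the naive N-body algorithm noted in the excerpt above comes from evaluating every pairwise interaction. A minimal direct-summation sketch follows, assuming Newtonian gravity with unit constant and a small softening term; the name `direct_accelerations` and the parameters `g` and `eps` are hypothetical and do not come from the cited work.

```python
def direct_accelerations(positions, masses, g=1.0, eps=1e-9):
    """Compute the acceleration on each body by summing over all other bodies.

    The double loop visits every ordered pair (i, j), which is the source
    of the O(n^2) cost per time step. `eps` is a softening term added to
    the squared distance to avoid division by zero for coincident bodies.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += g * masses[j] * d[k] * inv_r3
    return acc


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(direct_accelerations(pos, [1.0, 1.0], eps=0.0))
```

Hierarchical methods such as the Fast Multipole Algorithm avoid this all-pairs loop by approximating the effect of distant groups of bodies, which is why the distribution of work across cells becomes uneven for nonuniform particle distributions.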
I. Banicescu. Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm. PhD thesis, Polytechnic University, 1996.
....this new algorithm, the O(n^2) algorithm is found to have a q = Gamma1:2 and thus is less scalable than the newly discovered sequential algorithm that will have a q = Gamma1. 9. 3 The N-body Problem This example applies the Gamma opt measurement methodology to results previously obtained in [3] which describes a dynamic load balancing algorithm applied to N-body simulations. The N-body simulation represents the time integration of N interacting bodies given specified initial positions and velocities. The naive N-body algorithm has a complexity of O(n^2) per time step. Recently, O(n ....
....to analyze load balancing strategies using the Gamma opt metric. This example focuses on computing and comparing the effectiveness of implementing the parallel 3-D Fast Multipole Algorithm (PFMA) [6] and two of its recent options that incorporate load balancing techniques: LB and Fractiling [4][3] applied to a range of number of processors and problem sizes. LB is a static load balancing option strictly based on the total number of interactions. Fractiling is a dynamic scheduling technique, based on a probabilistic analysis, that simultaneously balances processor loads and maintains ....
I. Banicescu. Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm. PhD thesis, Polytechnic University, 1996.
I. Banicescu. Load Balancing and Data Locality in the Parallelization of the Fast Multipole Algorithm Parallelism. PhD thesis, Polytechnic University, 1996.