Foster, I.T., Worley, P.H.: Parallel algorithms for the spectral transform method. SIAM J. Sci. Stat. Comput. 18(3) (1997) 806--837

This paper is cited in the following contexts:
An Optimal Index Reshuffle Algorithm for Multidimensional Arrays.. - Ding (2001)

....problem solution time. An example is shown in Figure 1. The 3D fields of an atmosphere (or ocean) model are mapped onto 8 processors, with horizontal dimensions split among the processors. In spectral transform based models, such as the CCM atmospheric model[1, 2] and the shallow water equation[3], one often needs to dynamically remap between the height-local domain decomposition and the longitude-local decomposition for tasks of distinct nature. In grid-based atmosphere and ocean models, similar remappings are needed for polar filtering[4] and for data input/output[5]. An important ....

....arrays on each processor are viewed as a 1D array of blocks. This exchange involves all-to-all communications. Each processor sends P - 1 blocks out, each to a different processor. Each processor also receives P - 1 blocks, each from a different processor. The relevant code segment is well known [9, 10, 7, 3, 8]: All processors simultaneously do the following: do q = 1, P - 1 send a message to destination processor destID receive a message from source processor srcID. There are two popular methods to determine destID and srcID. One method is to set destID = srcID = (myID XOR q); here myID is ....

I. T. Foster and P. H. Worley. "Parallel algorithms for the spectral transform method," SIAM J. Sci. Stat. Comput., v.18, pp. 806-837. 1997.


MPI and OpenMP Paradigms on Cluster of SMP Architectures: the.. - He, Ding (2002)   (1 citation)

....and reducing total problem solution time. For example, the 3D fields of an atmosphere (or ocean) model are mapped onto 8 processors, with horizontal dimensions split among the processors. In spectral transform based models, such as the CCM atmospheric model[8] and the shallow water equation[9], one often needs to dynamically remap between the height-local domain decomposition and the longitude-local decomposition for tasks of distinct nature. In grid-based atmosphere and ocean models, similar remappings are needed for data input/output[10]. To transpose a multidimensional array, say ....

....of data blocks, each of size (G3) Do a local transpose on the local array viewed as viewed as . Local in-place algorithm is used for steps (G1) and (G3). For step (G2) global exchange, the following well-known all-to-all communication pattern[9, 12, 13] is used: All processors simultaneously do the following: do q = 1, P - 1 send a message to destination processor destID (C.3) receive a message from source processor srcID end do. Here we adopt destID = srcID = (myID XOR q), where myID is the processor id, and XOR is the bitwise exclusive ....

I. T. Foster and P. H. Worley. "Parallel algorithms for the spectral transform method," SIAM J. Sci. Stat. Comput. , v.18, pp. 806-837. 1997.
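
The pairwise exchange quoted in the context above (do q = 1, P - 1 with destID = srcID = myID XOR q) can be illustrated with a small single-machine simulation. This is only a sketch, not the cited authors' code: the function names and the block representation are hypothetical, no real message passing is performed, and the processor count is assumed to be a power of two so that XOR pairing visits every other processor exactly once.

```python
def xor_partner(my_id, step):
    """Partner of processor my_id at exchange step `step` (1 <= step <= P - 1)."""
    return my_id ^ step


def simulate_all_to_all(num_procs):
    """Simulate the pairwise exchange and return the received blocks.

    send[src][dst] stands for the block that src holds for dst;
    recv[dst][src] is the block that dst ends up receiving from src.
    """
    # Assumption: XOR pairing needs a power-of-two processor count.
    if num_procs < 1 or num_procs & (num_procs - 1):
        raise ValueError("XOR pairing assumes a power-of-two processor count")

    send = [[(src, dst) for dst in range(num_procs)] for src in range(num_procs)]
    recv = [[None] * num_procs for _ in range(num_procs)]

    for my_id in range(num_procs):
        recv[my_id][my_id] = send[my_id][my_id]  # own block never leaves

    for step in range(1, num_procs):       # do q = 1, P - 1
        for my_id in range(num_procs):     # "all processors simultaneously"
            dest_id = xor_partner(my_id, step)  # srcID is the same processor
            recv[dest_id][my_id] = send[my_id][dest_id]
    return recv
```

With four processors, step 1 pairs 0 with 1 and 2 with 3, step 2 pairs 0 with 2 and 1 with 3, and step 3 pairs 0 with 3 and 1 with 2, so each processor sends and receives exactly P - 1 blocks.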


An Optimal Index Reshuffle Algorithm for Multidimensional Arrays.. - Ding (1999)

....problem solution time. An example is shown in Figure 1. The 3D fields of an atmosphere (or ocean) model are mapped onto 8 processors, with horizontal dimensions split among the processors. In spectral transform based models, such as the CCM atmospheric model[1, 2] and the shallow water equation[3], one often needs to dynamically remap between the height-local domain decomposition and the longitude-local decomposition for tasks of distinct nature. In grid-based atmosphere and ocean models, similar remappings are needed for polar filtering[4] and for data input/output[5]. An important aspect ....

....3D arrays on each processor are viewed as a 1D array of blocks. This exchange involves all-to-all communications. Each processor sends P - 1 blocks out, each to a different processor. Each processor also receives P - 1 blocks, each from a different processor. The relevant code segment is well known [9, 10, 7, 3, 8]: All processors simultaneously do the following: do q = 1, P - 1 send a message to destination processor destID (C.3) receive a message from source processor srcID end do. There are two popular methods to determine destID and srcID. One method is to set destID = srcID = (myID XOR q) ....

I. T. Foster and P. H. Worley. "Parallel algorithms for the spectral transform method," SIAM J. Sci. Stat. Comput., v.18, pp. 806-837. 1997.


FOAM: Expanding the Horizons of Climate Modeling - Tobis, Schafer, al. (1997)   (1 citation)

....space. The spectral transform approach has useful properties from a numerical methods point of view (avoiding aliasing, accurate differentiation) at the cost of some complexity in sequential implementation. In a parallel implementation, however, it also introduces a need for global communication [8]. PCCM2 addresses these issues by incorporating parallel spectral transform algorithms developed at Argonne and Oak Ridge National Laboratories [6, 8] that support the use of several hundred or more processors, depending on model resolution. Additional modifications involved the semi-Lagrangian ....

....at the cost of some complexity in sequential implementation. In a parallel implementation, however, it also introduces a need for global communication [8]. PCCM2 addresses these issues by incorporating parallel spectral transform algorithms developed at Argonne and Oak Ridge National Laboratories [6, 8] that support the use of several hundred or more processors, depending on model resolution. Additional modifications involved the semi-Lagrangian representation of advection and techniques for load balancing [6]. Calculations in the third, vertical dimension, particularly those representing ....

I. Foster and P. Worley. Parallel algorithms for the spectral transform method. SIAM Journal of Scientific and Statistical Computation, 18(3), 1997.


Performance Portability for Coupled Atmosphere-Ocean General.. - Worley

....peak performance. While peak performance is rightfully criticized as being unobtainable even with optimized code, timings from models or kernels written specifically for a given architecture often show significantly better performance, as demonstrated by the before and after timings found in [1, 9, 10, 38]. This and similar research [4, 40, 41] has also shown how unpredictable this optimization can be. Even logically similar architectures require different techniques to achieve good performance. Despite the dissimilarity of techniques required to attain good performance on high performance ....

.... code uses an additional high-level call structure or framework to support multiple parallel algorithms and a low-level encapsulation of the message passing to support runtime tuning of the parallel code, allowing easy optimization of the message passing and load balancing on different platforms [9, 10, 43, 44]. b) The development of the parallel version of the PSU/NCAR MM5 regional atmospheric model [15]. This more than five year research collaboration between ANL and the Mesoscale and Microscale Meteorology Division (MMM) at NCAR explored numerous issues of relevance to this proposal. These include ....

[Article contains additional citation context not shown here]

I. T. Foster and P. H. Worley. Parallel algorithms for the spectral transform method. SIAM J. Sci. Comput., 18(3):806--837, May 1997.


New Methods for Global Atmospheric Dynamics on Parallel Computers - Drake, al.

....version of the CCM by the introduction of hybrid isentropic sigma coordinate. 1.2 Preliminary Studies. As part of a CHAMMP research project on numerical methods for climate modeling a set of test cases for shallow water equation models was defined [33] and several models implemented [20, 36, 8, 37]. Three methods proved particularly promising. We will briefly describe each of these methods that form the basis for our research proposal. 1.2.1 Semi-Lagrangian Models. The semi-Lagrangian transport (SLT) method is now an important part of many climate and weather models [24]. A three time ....

P.H. Worley and I.T. Foster. Parallel algorithms for the spectral transform method. ORNL Tech Report ORNL/TM--12507, Oak Ridge National Laboratory, Oak Ridge, TN, 1994.


Massively Parallel Semi-Lagrangian Advection - Thomas, Côté (1995)

....may not map directly onto a physical interconnection network, resulting in several processors attempting to send messages over the same channel at the same time. A reasonably accurate model of this behaviour is to scale the transfer rate by the number N of processors concurrently sending a message [6]. T_comm = t_s + h t_h + N s t_w (21). In fact, the above equation yields quite accurate predictions of the communication overhead for the PVM implementation of the advection problem on both the Intel iPSC/860 and the Cray T3D. 3.2. Target Architectures. In this article, the performance of the ....

....PVM. The interconnection network is a seven-dimensional hypercube with a peak node-to-node aggregate bandwidth of 2.8 Mbytes/s or t_w = 1.4 µs, internode hop time of t_h = 2 µs and a message startup time of t_s = 136 µs. Empirical studies have shown, however, that these represent optimal values [5] [6]. Realistic values of t_s, t_h and t_w based on scalability studies of a spectral transform shallow water model [6] are summarized in Table 1. The Cray T3D is a massively parallel processor based on a 3-D periodic torus mesh network and contains up to 2048 processing elements (PEs) [3]. In ....

[Article contains additional citation context not shown here]

I. Foster and P. Worley. Parallel algorithms for the spectral transform method. Technical Report MCS-P426-0494, Argonne National Laboratory, Argonne, Illinois, April 1994.
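
The contention-scaled cost model quoted in the Thomas and Côté context above can be written as a one-line function. This is a sketch under stated assumptions: equation (21) is damaged in the excerpt, and the form used here, T_comm = t_s + h*t_h + N*s*t_w, is a reading of it in which h is a hop count and s a message size, neither of which the excerpt defines. The function and parameter names are hypothetical.

```python
def comm_time(t_startup, t_hop, t_word, hops, message_size, concurrent_senders=1):
    """Estimate the time to send one message under link contention.

    Assumed form of the excerpt's equation (21):
        T_comm = t_s + h * t_h + N * s * t_w
    All times must share one unit, and message_size must be expressed
    in the unit that t_word is quoted per (an assumption of this sketch).
    """
    if hops < 0 or message_size < 0 or concurrent_senders < 1:
        raise ValueError("hops and message_size must be >= 0, senders >= 1")
    startup_cost = t_startup
    routing_cost = hops * t_hop
    # Contention is modelled by scaling the transfer term by the number of
    # processors sending at the same time.
    transfer_cost = concurrent_senders * message_size * t_word
    return startup_cost + routing_cost + transfer_cost
```

In this form only the transfer term grows with the number of concurrent senders, so for small messages the startup time dominates and contention matters little.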


A Parallel Spectral Model for Atmospheric Transport.. - Kindler, Schwan.. (1995)   (4 citations)

.... advantage of the natural parallelism offered by the spectral computations being performed (e.g. taking advantage of independently computable terms in equations). Our parallelization strategy and results also differ from the recent, extensive work on parallel climate models by Foster and Worley[9, 10], most of which was performed concurrently with our research. Specifically, Foster and Worley investigate message passing machines like the Intel Paragon, while our work includes a detailed study of performance overheads arising on shared memory multiprocessors. Parallelism is again attained by ....

....to include additional code modules, such as modules performing chemical calculations for specific constituents. While the implementation of TRANS does not offer a uniform or self-contained framework for inclusion of additional or complementary code modules as described in other research[9, 26], TRANS attains limited extensibility and more importantly, the ability to interact on-line with other programs potentially running on different machines by using a uniform format for exchange of binary input and output files. This format is described in detail in [27]. Briefly, it permits ....

[Article contains additional citation context not shown here]

I.T. Foster and P.H. Worley. Parallel algorithms for the spectral transform method. Technical Report ORNL/TM-12507, Oak Ridge National Laboratory, April 1994.


Opportunities and Tools for Highly Interactive.. - Eisenhauer, Gu.. (1996)   (6 citations)

....are more easily and accurately calculated in this spectral domain, though the variables must be transformed back into a grid domain for the chemistry calculations. Details of this solution approach, which is quite common in global models, can be found in [Hau40], [Sil54], [KHYK61], [WP86] or [FW94]. Our model contains 37 layers, which represent segments of the earth's atmosphere from the surface to approximately 50 km, with a horizontal resolution of 42 waves or 946 spectral values. In a grid system, this corresponds to a resolution of about 2.8 degrees by 2.8 degrees. Thus in each layer ....

I.T. Foster and P.H. Worley. Parallel algorithms for the spectral transform method. Technical Report ORNL/TM-12507, Oak Ridge National Laboratory, April 1994.
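
The figures in the context above (42 waves, 946 spectral values, about 2.8 degrees) can be checked with a little arithmetic. This sketch assumes a triangular truncation, which the excerpt does not state but which reproduces the 946 count, and it assumes a grid of 128 longitudes, a common companion to a 42-wave truncation that the excerpt also does not state. The function names are hypothetical.

```python
def triangular_coefficient_count(truncation):
    """Number of (m, n) spectral coefficients with 0 <= m <= n <= truncation."""
    if truncation < 0:
        raise ValueError("truncation must be non-negative")
    return (truncation + 1) * (truncation + 2) // 2


def longitude_spacing_degrees(num_longitudes):
    """Spacing in degrees between equally spaced longitudes on a full circle."""
    if num_longitudes < 1:
        raise ValueError("num_longitudes must be positive")
    return 360.0 / num_longitudes


# 42 waves under the assumed triangular truncation
print(triangular_coefficient_count(42))
# assumed 128-longitude grid
print(longitude_spacing_degrees(128))
```

Under these assumptions the count is 43 * 44 / 2 = 946 and the spacing is 360 / 128 = 2.8125 degrees, consistent with the quoted values.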


Scaling the Unscalable: A Case Study on the AlphaServer SC - Worley (2002)   Self-citation (Worley)

....parallel FFT, Legendre transform, and semi-Lagrangian transport algorithms each have numerous parallel implementation options. The options are differentiated by their message counts, message volumes, and MPI communication protocols. For more details on the parallel algorithms used in CCM MP 2D see [3, 5]. The parallel algorithms for both the spectral transform and semi-Lagrangian transport require significant interprocessor communication. For example, the total amount of data communicated per simulation day for CCM MP 2D as a function of the number of processors is described in Fig. 1. As can be ....

I. T. Foster and P. H. Worley, Parallel algorithms for the spectral transform method, SIAM J. Sci. Comput., 18 (1997), pp. 806--837.


Optimizing Collective I/O Performance on Parallel.. - Chen, Foster..   Self-citation (Foster)

....done on automatic performance optimization in parallel I/O community, however, such techniques have been explored by other communities. For example, Golding et al. [6] have proposed an attribute-managed storage system approach to the management of large scale storage systems. Foster and Worley [5] have used performance modeling techniques to guide algorithm selection in parallel climate models. 5 Conclusions. Our results show that both the DRA and Panda parallel I/O libraries achieve high I/O performance for interesting problems. However, high performance often requires that the user ....

I. Foster and P. Worley. Parallel algorithms for the spectral transform method. SIAM Journal of Scientific and Statistical Computation, 18(3), 1997. To appear.


Ornl/tm-13682 - Computer Science And   Self-citation (Worley)

.... protocols, we use an integrated suite of tests that are derived from or motivated by the Parallel Spectral Transform Shallow Water Model (PSTSWM) parallel application code [15], [16]. PSTSWM was developed to evaluate strategies for parallelizing spectral global atmospheric circulation models [4] [5], and has imbedded a large number of parallel algorithm options. Among these options are numerous choices for the communication protocols used to implement the different parallel algorithms and numerous choices of message passing layer. We use PSTSWM to examine (1) single processor performance, ....

I. T. Foster and P. H. Worley, Parallel algorithms for the spectral transform method, SIAM J. Sci. Comput., 18 (1997), pp. 806--837.


Design and Performance of a Scalable Parallel.. - Drake, Foster.. (1995)   (15 citations)  Self-citation (Foster Worley)

....is, computers with hundreds or thousands of processors. In the course of this study, we have developed new parallel algorithms for numerical methods used in atmospheric circulation models, and evaluated these and other parallel algorithms using both testbed codes and analytic performance models [8, 10, 11, 29, 32]. We have also incorporated some of the more promising algorithms into a production parallel climate model called PCCM2 [6]. The latter development has allowed us to validate performance results obtained using simpler testbed codes, and to investigate issues such as load balancing and parallel ....

....However, we restrict our attention to two dimensional decompositions, as these provide adequate parallelism on hundreds or thousands of processors and simplify code development. A variety of different decompositions, and hence parallel algorithms, are possible in spectral atmospheric models [10, 11]. Physical space is best partitioned by latitude and longitude, as a decomposition in the vertical dimension requires considerable communication in the physics component. However, Fourier and spectral space can be partitioned in several different ways. A latitude wavenumber decomposition of ....

[Article contains additional citation context not shown here]

I. T. Foster and P. H. Worley, Parallel algorithms for the spectral transform method, submitted for publication. Also available as Tech. Report ORNL/TM--12507, Oak Ridge National Laboratory, Oak Ridge, TN, May 1994.


Parallel Community CLIMATE MODEL: DESCRIPTION & USER'S.. - Drake, Flanery.. (1996)   (1 citation)  Self-citation (Foster Worley)

....processors can proceed independently. The rest of this section describes in more detail the data decomposition and parallel algorithms for the Legendre transform and the semi-Lagrangian transport. A detailed description and comparison of parallel algorithms can be found in the series of papers [14, 46, 10, 15]. More details on the parallel CCM2, and a description of load balancing techniques used in the physics component of the model, can be found in other papers [21, 47]. A data parallel implementation of PCCM2 is described in [18] and a modestly parallel implementation for shared memory and ....

....to hold the duplicated spectral coefficients. Since relatively little work is done in the spectral domain in CCM2, this redundant work has not proved to be an issue, and the vector sum has proved to be a viable parallel algorithm for PCCM2. For a more detailed discussion of these issues see [14, 46, 10, 15, 21]. 3.4. Semi-Lagrangian Transport. The advection of moisture in CCM2 uses a semi-Lagrangian transport (SLT) method in conjunction with shape preserving interpolation [41]. The method updates the value of the moisture field at a grid point (the arrival point, A) by first establishing a trajectory ....

I. T. Foster and P. H. Worley. Parallel algorithms for the spectral transform method. Technical Report ORNL/TM--12507, Oak Ridge National Laboratory, Oak Ridge, TN, 1994.


Parallel Algorithms for Semi-Lagrangian Transport in Global.. - Drake, Foster (1995)   Self-citation (Foster)

....This trajectory is found iteratively using the interpolated velocity field at the mid point, M, of the trajectory. From this mid point the departure point, D, is calculated and the moisture field is interpolated at D using shape preserving interpolation. Like the spectral transform method [2], there are a variety of parallel algorithms for semi-Lagrangian transport, in two major classes: distributed and transpose. All the calculations involve the same computational grid as the columnar physics, and are initially decomposed in the same way, a block decomposition of the latitude and ....

....algorithm is also of interest. While some of these parallel SLT algorithms have been examined by different researchers, a comprehensive comparison has not previously been attempted. In this study, we use the same methodology developed for the evaluation of parallel spectral transform methods [2]: 1. Implement all parallel algorithms in a code that solves the shallow water equations on a sphere, using fictitious vertical levels to obtain the correct granularity for three-dimensional GCMs. 2. Implement runtime options for each algorithm for selecting communication protocols, data ....

[Article contains additional citation context not shown here]

I. T. Foster and P. H. Worley, Parallel algorithms for the spectral transform method, Tech. Rep. ORNL/TM--12507, Oak Ridge National Laboratory, Oak Ridge, TN, May 1994.


Parallel Spectral Transform Shallow Water Model: A.. - Worley Foster (1994)   (6 citations)  Self-citation (Foster Worley)

....the same characterization of communication costs and communication pattern as the Theta(log Q) transpose algorithm. All parallel algorithms execute essentially the same computations, and, modulo load imbalances, differ only in communication costs. Load balance issues are discussed in detail in [4]. Note that the simple characterizations described here ignore link contention in the physical network, and, for example, the Theta(log Q) distributed Legendre transform algorithm is not automatically better than the Theta(Q) distributed algorithm. In summary, PSTSWM provides 12 different ....

....machine configurations to determine the best aspect ratio and parallel algorithms for a given number of processors. This information can then be saved and used whenever a given number of processors becomes available for a run. We defer the complete description of the algorithm comparison to [4], and only discuss the high level tuning for the largest configurations on each of the target machines. The best parallel algorithm combinations and logical aspect ratios are listed below. The parallel algorithms are denoted as follows: Theta(Q) transpose (TQ) Theta(log Q) transpose (TL) ....

[Article contains additional citation context not shown here]

I. T. Foster and P. H. Worley, Parallel algorithms for the spectral transform method, ORNL/TM--12507, Oak Ridge National Laboratory, Oak Ridge, TN, May 1994.


A Users' Guide To Pstswm - Worley, Toonen (1995)   Self-citation (Worley)

....benchmark code and parallel algorithm testbed that solves the nonlinear shallow water equations on a rotating sphere using the spectral transform method. PSTSWM was developed to evaluate parallel algorithms for the spectral transform method as it is used in global atmospheric circulation models [6]. Multiple parallel algorithms are embedded in the code and can be selected at run time, as can the problem size, number of processors, and data decomposition. Six different problem test cases are also supported, each with associated solution and error analysis options. The extensive selection of ....

....copying and exploiting our knowledge of the context in which they are called. In this report, we describe the practical issues of how to use PSTSWM. In a future report, we will describe the code structure and embedded parallel algorithms in detail. Algorithm comparison results are described in [6]. Benchmark results are described in [18] and [5]. The benchmarking philosophy inspired by PSTSWM is described in [17]. The rest of this report is as follows. Chapter 2 gives a brief history of the development of PSTSWM. Chapter 3 describes how to obtain, build, and run the code. Chapter 4 ....

[Article contains additional citation context not shown here]

I. T. Foster and P. H. Worley, Parallel algorithms for the spectral transform method, Tech. Report ORNL/TM--12507, Oak Ridge National Laboratory, Oak Ridge, TN, May 1994.


Algorithm Comparison And Benchmarking Using A Parallel.. - Patrick Worley   (2 citations)  Self-citation (Foster Worley)

No context found.

I. T. Foster and P. H. Worley, Parallel algorithms for the spectral transform method, Tech. Report ORNL/TM-12507, Oak Ridge National Laboratory, Oak Ridge, TN 37831, April 1994. (also available as a preprint from Argonne National Laboratory, IL 60439)


Using Runtime Measurements and Historical Traces for - Acquiring Knowledge In (2004)

No context found.

Foster, I.T., Worley, P.H.: Parallel algorithms for the spectral transform method. SIAM J. Sci. Stat. Comput. 3 (1997) 806--837


A Diskless Checkpointing Algorithm for Super-scale.. - Engelmann, Geist (2003)

No context found.

I. T. Foster and P. H. Worley. Parallel algorithms for the spectral transform method. SIAM Journal on Scientific Computing, 18(3):806--837, 1997.
