VIPP: Visual Interactive Parallel Performance Tool
Citations
996 | High Performance Fortran language specification
- High Performance Fortran Forum
- 1993
Citation Context ...e a significant amount of communication relative to the amount of computation (the surface-to-volume effect) [2]. VIPP does not support blocked interleaved decompositions, e.g. HPF-style BLOCK CYCLIC [8], in which the points in the template correspond to a rectangular block of cells. Such a strategy trades off communication overhead against load imbalance by increasing the granularity of the templates...
447 | The Paradyn parallel performance measurement tool.
- Miller, Callaghan, et al.
- 1995
Citation Context ...artitioning, and parallel efficiency. As compared with VIPP, hardware-level simulators such as Proteus [4] provide detailed performance information. Other performance measurement tools include Paradyn [14] and Pablo [18]. These tools enable the user to measure running programs on parallel computers and are general purpose. Paradyn is able to attribute performance measurements to specific higher-level ...
412 | The Stanford Dash multiprocessor.
- Lenoski, Laudon, et al.
- 1992
Citation Context ...comm = n \Delta # fi \Delta ` L + B \Delta fi 1024 ' (4) 3 In practice, we may improve performance dramatically by explicitly managing cache locality. See, for example, results with the Stanford Dash [13] and application-specific cache protocols with the Wisconsin Wind Tunnel [6]. Such techniques expose the message passing activity used to maintain cache coherence. 4 Nevertheless, there is no substitu...
349 | The Stanford FLASH Multiprocessor.
- Kuskin, Ofelt, et al.
- 1994
Citation Context ...en distributed local memories via message passing. Included are conventional message passing MPPs such as the Intel Paragon, certain distributed shared memory architectures such as the Stanford Flash [12], and workstation clusters. VIPP models the parallel execution of our application on such a machine by partitioning the mesh into subproblems and executing the subproblems one per processor in loosely...
344 | Partitioning of unstructured problems for parallel processing
- Simon
- 1991
Citation Context ...tion, Particle Methods, Education in Parallel Processing. 1 Introduction The load balancing problem arises in a number of irregular applications including particle methods [5], finite element methods [19], and adaptive mesh methods [3]. In practice, the task of balancing workloads entails juggling a diversity of data- and application-dependent performance tradeoffs. We have developed VIPP, an i...
277 | A partitioning strategy for nonuniform problems on multiprocessors
- Berger, Bokhari
- 1987
Citation Context ...n in Parallel Processing. 1 Introduction The load balancing problem arises in a number of irregular applications including particle methods [5], finite element methods [19], and adaptive mesh methods [3]. In practice, the task of balancing workloads entails juggling a diversity of data- and application-dependent performance tradeoffs. We have developed VIPP, an instructional tool for evaluatin...
240 | Proteus: A High-Performance Parallel Architecture Simulator
- Brewer, Dellarocas, et al.
- 1991
Citation Context ...d on these values, VIPP estimates and displays workload balance, communication overhead, domain partitioning, and parallel efficiency. As compared with VIPP, hardware-level simulators such as Proteus [4] provide detailed performance information. Other performance measurement tools include Paradyn [14] and Pablo [18]. These tools enable the user to measure running programs on parallel computers and ar...
199 | A parallel hashed oct-tree N-body algorithm.
- Warren, Salmon
- 1993
Citation Context ...ared memory architectures. It is intended for two-dimensional particle methods that employ a uniform single-level mesh to organize the computation [9], but does not support hierarchical representations [23, 20]. VIPP provides a convenient graphical interface that enables the user to interactively browse the load balancing parameter space. For example, the user may "dial in" performance characteristics of ex...
169 | Scalable Performance Analysis: The Pablo Performance Analysis Environment
- Reed, Aydt, et al.
- 1993
Citation Context ... parallel efficiency. As compared with VIPP, hardware-level simulators such as Proteus [4] provide detailed performance information. Other performance measurement tools include Paradyn [14] and Pablo [18]. These tools enable the user to measure running programs on parallel computers and are general purpose. Paradyn is able to attribute performance measurements to specific higher-level language constru...
94 | Application-Specific Protocols for User-Level Shared Memory. Supercomputing '94
- Falsafi, Lebeck, et al.
- 1994
Citation Context ...y improve performance dramatically by explicitly managing cache locality. See, for example, results with the Stanford Dash [13] and application-specific cache protocols with the Wisconsin Wind Tunnel [6]. Such techniques expose the message passing activity used to maintain cache coherence. 4 Nevertheless, there is no substitute for comparing VIPP's predictions against actual observations of a working...
42 | A parallel adaptive fast multipole method. In Supercomputing,
- Singh, Holt, et al.
- 1993
Citation Context ...ared memory architectures. It is intended for two-dimensional particle methods that employ a uniform single-level mesh to organize the computation [9], but does not support hierarchical representations [23, 20]. VIPP provides a convenient graphical interface that enables the user to interactively browse the load balancing parameter space. For example, the user may "dial in" performance characteristics of ex...
33 | Processor Self Scheduling for Multiple-Nested Parallel Loops
- Tang, Yew
- 1986
Citation Context ...ff communication overhead against load imbalance by increasing the granularity of the templates, and may be added in a future release of VIPP. 5 3.5 Processor Self-Scheduling Processor self-scheduling [22] is included as an "iron man" to establish a theoretical upper bound on how well workloads can be balanced, given that the problem has an inexact solution. While processor self-scheduling generally yi...
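The self-scheduling idea in this excerpt, in which each idle processor takes the next available unit of work, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not VIPP's implementation: the function name `self_schedule`, the representation of work as a list of per-unit costs, and the absence of any scheduling overhead are all hypothetical.

```python
import heapq


def self_schedule(work, nprocs):
    """Simulate processor self-scheduling and return each processor's load.

    Assumptions (not from the source): `work` is a list of per-unit costs
    handed out in order, and taking a unit costs nothing extra.
    """
    # Min-heap of (time at which the processor becomes idle, processor id).
    free_at = [(0.0, p) for p in range(nprocs)]
    heapq.heapify(free_at)
    loads = [0.0] * nprocs
    for cost in work:
        # The processor that becomes idle first takes the next unit.
        t, p = heapq.heappop(free_at)
        loads[p] += cost
        heapq.heappush(free_at, (t + cost, p))
    return loads
```

For example, `self_schedule([3, 1, 1, 1], 2)` gives one processor the expensive unit while the other absorbs the three cheap ones, so both finish with a load of 3.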
32 | Programming abstractions for dynamically partitioning and coordinating localized scientific calculations running on multiprocessors
- Baden
- 1991
Citation Context ...rkloads must be shuffled as the particles move; particles change owners as the result of their time evolution. Of the three communication activities, only the first appears to have a significant cost [17, 11, 1]; the remaining activities are ignored by VIPP. Partitioning is performed periodically according to a user-specified interval. VIPP will measure the actual partitioning time and scale the time to a us...
24 | Partitioning with spacefilling curves
- Pilkington, Baden
- 1994
Citation Context ...rkloads must be shuffled as the particles move; particles change owners as the result of their time evolution. Of the three communication activities, only the first appears to have a significant cost [17, 11, 1]; the remaining activities are ignored by VIPP. Partitioning is performed periodically according to a user-specified interval. VIPP will measure the actual partitioning time and scale the time to a us...
21 | A Parallel Software Infrastructure for Dynamic Block-Irregular Scientific Calculations
- Kohn
- 1995
Citation Context ...rkloads must be shuffled as the particles move; particles change owners as the result of their time evolution. Of the three communication activities, only the first appears to have a significant cost [17, 11, 1]; the remaining activities are ignored by VIPP. Partitioning is performed periodically according to a user-specified interval. VIPP will measure the actual partitioning time and scale the time to a us...
21 | An Analysis of Scatter Decomposition
- Nicol, Saltz
- 1990
Citation Context ...communication overheads. 3.4 Interleaved or Scattered Decomposition Whereas orthogonal recursive bisection produces coarse-grained connected partitions, an interleaved or scattered decomposition method [15] renders fine-grained partitions. Interleaved decomposition decomposes the partitioning mesh by assigning the bins of the mesh periodically to the processors based on a lexicographical ordering. The p...
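The periodic assignment of bins described in this excerpt can be sketched as follows. This is a minimal illustration, not VIPP's code: the function names, the row-major reading of "lexicographical ordering", and the round-robin rule (bin index modulo the processor count) are assumptions.

```python
def interleaved_owner(row, col, ncols, nprocs):
    """Return the processor that owns bin (row, col).

    Assumption: bins are numbered in row-major (lexicographic) order and
    dealt out to the processors round-robin.
    """
    return (row * ncols + col) % nprocs


def interleaved_partition(nrows, ncols, nprocs):
    """Build an nrows x ncols grid whose entries are owning processor ids."""
    return [
        [interleaved_owner(r, c, ncols, nprocs) for c in range(ncols)]
        for r in range(nrows)
    ]
```

Under these assumptions a 2 x 3 mesh on 4 processors is assigned as `[[0, 1, 2], [3, 0, 1]]`: neighbouring bins usually belong to different processors, which is why such partitions are fine-grained.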
13 | Fast Mapping And Remapping Algorithm For Irregular and Adaptive Problems
- Ou, Ranka, et al.
- 1993
Citation Context ...echniques yield highly balanced partitionings, and have been applied by Singh and Hennessy [20] and Warren and Salmon [23] to hierarchical N-body methods, and by Ou et al. to the Finite Element Method [16]. Pilkington and Baden have conducted empirical studies [17] in which the highly irregular partitionings were observed to incur modest communication overheads. 3.4 Interleaved or Scattered Decompositi...
11 | Hardware and software architectures for irregular problem architectures
- Fox
- 1992
Citation Context ...ters. VIPP models the parallel execution of our application on such a machine by partitioning the mesh into subproblems and executing the subproblems one per processor in loosely asynchronous fashion [7]. VIPP determines the parallel running time of the application by estimating the time spent in local and global computation, and adding this to an estimate of communication and load balancing times. T...
10 | A comparison of load balancing strategies for particle methods running on MIMD multiprocessors
- Baden, Kohn
- 1991
Citation Context .... Because of the fine-grained nature of the interleaving process, partitions tend to require a significant amount of communication relative to the amount of computation (the surface-to-volume effect) [2]. VIPP does not support blocked interleaved decompositions, e.g. HPF-style BLOCK CYCLIC [8], in which the points in the template correspond to a rectangular block of cells. Such a strategy trades off ...
5 | Parallelization using spatial decomposition for molecular dynamics
- Clark, Hanxleden, et al.
- 1994
Citation Context ...ncing, Performance Visualization, Particle Methods, Education in Parallel Processing. 1 Introduction The load balancing problem arises in a number of irregular applications including particle methods [5], finite element methods [19], and adaptive mesh methods [3]. In practice, the task of balancing workloads entails juggling a diversity of data- and application-dependent performance tradeoffs....
2 | Visualizing distributed data structures
- Srinivas
- 1995
Citation Context ...puters and are general purpose. Paradyn is able to attribute performance measurements to specific higher-level language constructs or to specific data structures. Along these lines Srinivas (Indiana) [21] has developed a tool for visualizing distributed data structures. By comparison to Paradyn and Pablo, VIPP is a domain-specific trace-driven simulator; it does not require a parallelized program or e...
1 | VIPP user and implementer's guide, tech
- Johnson, Baden
- 1995
Citation Context ...ers and buttons which allow the user to modify the various parallelization options. These controls are summarized in Table 2, and described more fully in the accompanying User and Implementor's Guide [10]. 5 VIPP in Action VIPP enables the user to explore a parameter space of load balancing alternatives and to note their effect on performance. The following capabilities are included; these examples il...