Arjan van Gemund, "Performance Modeling of Parallel Systems," PhD thesis, Technische Universiteit Delft, the Netherlands, 1996.
....show the usability of the model for full scale applications. Furthermore, without knowing the actual performance of the SHRIMP machine, the results of the analytical method and those of the simulator can only be verified against each other, not against the actual performance of the machine. In [77, 61], A. van Gemund presents a methodology that yields parameterized performance models of parallel programs running on shared memory as well as distributed memory (vector) machines. The aim of this research is to estimate performance degradation due to synchronization ....
A. van Gemund. Performance Modeling of Parallel Systems. Delft University Press, 1996.
....the communication overhead in the SHRIMP multicomputer under a variety of workloads: analytic modeling and event-driven simulation. Performance parameters of the system are mainly determined by details of DMA speed, network connectivity, and policies for arbitration of buses and network links. In [13], A. van Gemund presents a methodology that estimates performance degradation due to synchronization effects. 3 P³T: A Performance Estimator for Distributed and Parallel Programs. P³T is a state-of-the-art performance estimator that targets both distributed and parallel programs. Figure ....
A. van Gemund. Performance Modeling of Parallel Systems.
....the way the system really operates, the authors believe they do not introduce a significant error in the model. Furthermore, the model assumes that each processor executes the same program, that all messages are of the same size and that messages are sent to uniformly distributed destinations. In [36, 32], A. van Gemund presents a methodology that yields parameterized performance models of parallel programs running on shared memory as well as distributed memory (vector) machines. The aim of this research is to estimate performance degradation due to synchronization effects, covering both condition ....
A. van Gemund. Performance Modeling of Parallel Systems. Delft University Press, 1996.
....latency, occupancy of system buffers and network congestion. Their analytic model is based on two assumptions: (i) packet interarrival times and service times at every component are exponentially distributed, and (ii) the states of any pair of components are independent random variables. In [25, 24], A. van Gemund presents a methodology that yields parameterized performance models of parallel programs running on shared memory as well as distributed memory (vector) machines. The aim of this research is to estimate performance degradation due to synchronization effects, covering both condition ....
A. van Gemund. Performance Modeling of Parallel Systems. Delft University Press, 1996.
....are integrated) but also a specific part of such a process, e.g. a telematics application, in more detail. Many techniques have been developed for performance prediction of several classes of (concurrent) discrete-event systems, such as (parallel) computer systems (Ajmone Marsan et al. 1986; Gemund 1996; Jonkers 1995), telecommunication systems (Dowd and Gelenbe 1995), manufacturing systems (Yao 1994) and traffic. As opposed to e.g. the manufacturing sector, these techniques are not yet widely applied in the relatively new area of redesign of administrative business processes. However, with the ....
....exist, generally associated with different modelling formalisms. These methods mainly differ in the position they occupy on the trade-off between prediction accuracy and (computational) efficiency. Static techniques are used to quickly obtain first-order performance estimates (Fahringer 1995; Gemund 1996). Techniques based on (timed, stochastic) extensions of Petri nets (Ajmone Marsan et al. 1986) yield accurate results, but are time-consuming due to a state explosion (which results in a complexity that is exponential in the model size). Between these two extremes, several techniques exist to ....
[Article contains additional citation context not shown here]
Gemund, A.J.C. van. 1996. Performance Modeling of Parallel Systems. Ph.D. Thesis, Delft University of Technology, The Netherlands (Apr.).
....methods mainly differ in the position they occupy on the trade-off between prediction accuracy and computational efficiency. Static techniques, e.g. based on symbolic expressions or simple critical path algorithms, are used to efficiently obtain first-order performance estimates (Fahringer 1995; Gemund 1996). Techniques based on timed or stochastic extensions of Petri nets (Ajmone Marsan et al. 1986) yield accurate results, but are time-consuming due to a state space explosion (resulting in a complexity that is exponential in the model size). In the remainder of this paper, we will call techniques ....
....of computers K varying from 1 to 4. The results are shown in Table 1 and graphically in Figure 9. Simulation is used as the baseline to assess the accuracy of the analytical methods. The simulations were carried out using the C programming language with the PAMELA simulation library (Gemund 1996). We assume an exponential distribution for the service times of the resources. The table gives 95% confidence intervals for the completion times (in minutes) obtained from 500 simulation samples. The Glamis methodology as described in the previous section was applied with both exact MVA (Reiser ....
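The confidence-interval step mentioned in this excerpt can be illustrated with a minimal sketch. It does not use the PAMELA simulation library; it assumes a hypothetical workload in which a completion time is simply the sum of independent, exponentially distributed service times, and it uses a normal approximation (factor 1.96) for the 95% interval. The function names and the example mean service times are illustrative assumptions, not taken from the cited paper.

```python
import math
import random
import statistics


def completion_time(mean_service_times, rng):
    """One simulated completion time: a sum of exponential service times.

    Assumption: resources are visited one after another without queueing.
    """
    return sum(rng.expovariate(1.0 / m) for m in mean_service_times)


def confidence_interval_95(samples):
    """Normal-approximation 95% confidence interval for the sample mean."""
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width


def estimate(mean_service_times, runs=500, seed=1):
    """Draw `runs` samples (500, as in the excerpt) and return the interval."""
    rng = random.Random(seed)
    samples = [completion_time(mean_service_times, rng) for _ in range(runs)]
    return confidence_interval_95(samples)


if __name__ == "__main__":
    low, high = estimate([2.0, 3.0, 5.0])  # hypothetical mean service times
    print(f"95% CI for mean completion time: [{low:.2f}, {high:.2f}]")
```

With 500 samples the interval half-width shrinks roughly with the square root of the sample count, which is why a few hundred runs are often enough for a baseline of this kind.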
Gemund, A.J.C. van. 1996. Performance Modeling of Parallel Systems. Ph.D. Thesis, Delft University of Technology, The Netherlands (Apr.).
....a language for performance modeling, called Pamela (PerformAnce ModEling LAnguage), has been developed. The underlying analysis technique is based on compile-time deterministic path analysis with the assumption of deterministic task times (mean values) instead of accounting for variance [4]. We give an example to illustrate the deterministic analysis in Pamela. Consider the following program fragment consisting of a loop and a conditional statement: for (i=1; i<=n; i++) if (condition) statement; For the purpose of the example we assume that the total execution time is only contributed ....
....and the execution time of the statement respectively, while r is a uniform random variable with sample space [0, 1]. This implies that the if construct is modeled in terms of a Bernoulli trial with chance p of success. Currently, the existing analysis method, called serialization analysis [4], yields the execution time for the Pamela model in Eq. (1.1), given by T = n p τ (1.2). As long as n, p and τ are deterministic, T will also be deterministic and the analysis above is straightforward and trivial. However, the analysis will be far from trivial if the parameters are stochastic rather ....
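The mean-value result in this excerpt, the product of the loop count, the branch probability, and the statement's execution time, can be checked numerically with a small sketch. This is not the Pamela serialization analysis itself; it is a plain Monte Carlo comparison under the excerpt's assumptions (each iteration is a Bernoulli trial with success chance p, and only the statement contributes time). The name `tau` for the statement's execution time is an assumption made here for illustration.

```python
import random


def deterministic_estimate(n, p, tau):
    """Mean-value estimate: n iterations, each costing tau with probability p."""
    return n * p * tau


def simulate_once(n, p, tau, rng):
    """Execute the loop once, drawing r uniformly from [0, 1) per iteration."""
    total = 0.0
    for _ in range(n):
        if rng.random() < p:  # the 'if (condition)' modeled as a Bernoulli trial
            total += tau
    return total


def simulated_mean(n, p, tau, runs=2000, seed=0):
    """Average execution time over many simulated runs of the loop."""
    rng = random.Random(seed)
    return sum(simulate_once(n, p, tau, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    n, p, tau = 100, 0.25, 2.0  # hypothetical parameter values
    print("deterministic:", deterministic_estimate(n, p, tau))
    print("simulated mean:", simulated_mean(n, p, tau))
```

The simulated mean converges to the deterministic estimate, but individual runs vary around it, which is the variance that a purely deterministic analysis does not capture.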
[Article contains additional citation context not shown here]
A.J.C. van Gemund, Performance Modeling of Parallel Systems, PhD thesis, Delft University of Technology, The Netherlands, April 1996.
....of arithmetic parameters with the linear regression analysis method. Chapter 3 Application In this chapter the modeling approach is validated using Gauss elimination. First, control flow is discussed and the profiling method is introduced. Gauss elimination is then analyzed. Here the PAMELA model [2] is introduced to simplify the analysis by first transforming the calculation domain to the PAMELA domain and then to the time domain. Using the parameters that were measured in the previous chapter, we obtain the predicted execution time of Gauss elimination. We then calculate the error of our prediction with the ....
....concurrent communications. The measurements are shown in Table 4.3, in which the sender is s = 0,1,2,3. Table 4.3 shows that the communication time when the sender is processor 3 costs approximately half .... [Footnote 2: A small number of bytes is used due to a restriction of CSTools.] [Figure 4.4: Two pairs of concurrent communications inside cluster 1]
(s, r)     (s, r)     T_c [ms]
(0,1)      (2,3)      0.9
(0,1)      (3,2)      1.7
(0,3)      (1,2)      0.9
(4,5)      (6,7)      1.7
(4,5)      (7,6)      1.7
(4,7)      (5,6)      0.9
(8,9)      (10,11)    1.7
(8,9)      (11,10)    1.7
(8,11)     (9,10)     0.9
(12,13)    (14,15)    1.7
(12,13)    (15,14)    1.7
(12,15)    (13,14)    0.9 ....
A.J.C. van Gemund, Performance Modeling of Parallel Systems, PhD thesis, Delft University of Technology, The Netherlands, April 1996.
....in the speculatively parallelized program is minimized. This constitutes an interesting static optimization problem. Clearly it requires static run-time prediction techniques which compute statement frequencies from (estimated or profiled) loop counts, true ratios etc. for the parallelized program [Fah93, van96]. If there is no fill-in to be provided for in the program, we probably can completely eliminate all tests but the first one, which is located immediately after building the organizational data structures for the sparse matrix. 4.4 Symbolic data flow analysis for cross matching Now let us ....
Arjan van Gemund. Performance Modeling of Parallel Systems. PhD thesis, Technical University of Delft (NL), 1996.
....The C term refers to the structured use of mutual exclusion (ME) [1] in contention for (processing or data) resources. By imposing these specific restrictions in the synchronization structure, a performance analyzability is achieved that allows for reliable, closed form, analytic cost estimation [6]. This, in turn, unlocks the potential of automatic program optimization, which is the ultimate objective [7]. The crucial motivation for the SPC programming model is that in practice the loss of expressiveness due to the inherent synchronization restrictions seems to be limited to a (small) ....
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Technical University, Delft, The Netherlands, Apr. 1996.
....The C term refers to the structured use of mutual exclusion (ME) [1] in contention for (processing or data) resources. By imposing these specific restrictions in the synchronization structure, a performance analyzability is achieved that allows for reliable, closed form, analytic cost estimation [6]. This, in turn, unlocks the potential of automatic program optimization, which is the ultimate objective [4]. The crucial motivation for the SPC programming model is that in practice the loss of expressiveness due to the inherent synchronization restrictions is limited to a (small) constant. This ....
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Technical University, Delft, The Netherlands, Apr. 1996.
.... which implies structure with respect to the synchronization patterns that are possible (only SP task graphs). By imposing these specific restrictions in the synchronization structure, a performance analyzability is achieved that allows for reliable, closed form, analytic cost estimation [6]. This, in turn, unlocks the potential of automatic program optimization, which is the ultimate objective [5]. The aforementioned trade-off is based on the following conjecture [5], which states that the loss of parallelism when programming according to the SPC model is typically limited to a ....
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Technical University, Delft, The Netherlands, Apr. 1996.
....annotated with this assignment information. The annotated task graph then enables us to execute the application on a parallel computer and it also allows us to do performance optimization and scalability analysis using a performance simulation tool called Pamela (PerformAnce ModEling LAnguage [4,6]). For the parallel execution a general purpose task graph executor, TGEX, has been implemented using PVM for communication. In order to enable overlapping between computation and communication, TGEX has three autonomous processes on each processor: a sender, a receiver and an executer. These three ....
....includes a performance prediction module that is based on a technique that combines the high efficiency of analytic prediction techniques with the accuracy of performance simulation techniques. In the following we present a brief description of the technique. More details can be found in, e.g. [5,6]. As mentioned earlier, an important feature of TGEX is the use of concurrency, with respect to both computation and communication (computation tasks and communication tasks proceed simultaneously). Due to the dynamic scheduling of processing and communication resources involved in the ....
[Article contains additional citation context not shown here]
A.J.C. van Gemund, "Performance Modeling of Parallel Systems", Ph.D. thesis, Delft University of Technology, 1996.
....Chapter 1 Introduction This report documents research done during three months in the context of a program aimed at the development of the SPC programming model [4, 3]. The general purpose of this research program is to find a model for structured programming of parallel computations, based on the use of sequential parallel structures and mutual exclusion (contention) as the only ways to synchronize tasks. Most of the definition languages that support ....
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Technical University, Delft, The Netherlands, Apr. 1996.
....The C term refers to the structured use of mutual exclusion (ME) [1] in contention for (processing or data) resources. By imposing these specific restrictions in the synchronization structure, a performance analyzability is achieved that allows for reliable, closed form, analytic cost estimation [8]. This, in turn, unlocks the potential of automatic program optimization, which is the ultimate objective [7]. The crucial motivation for the SPC programming model is that in practice the loss of expressiveness due to the inherent synchronization restrictions is limited to a (small) constant. This ....
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Technical University, Delft, The Netherlands, Apr. 1996.
.... C term refers to the structured use of mutual exclusion (ME) [1] in contention for (processing or data) resources. By imposing these specific restrictions in the synchronization structure, a performance analyzability is achieved that allows for reliable, closed form, analytic cost estimation [6]. This, in turn, unlocks the potential of automatic program optimization, which is the ultimate objective [4]. The crucial motivation for the SPC programming model is that in practice the loss of expressiveness due to the inherent synchronization restrictions is limited to a (small) constant. This ....
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Technical University, Delft, The Netherlands, Apr. 1996.
....critical section (software server) or physical, e.g. model a CPU, memory bank, communication bus, etc. The scheduling policy (we will consider in this paper) associated with resources is simply FCFS with non-deterministic, fair conflict arbitration (but other scheduling disciplines are possible [14]). The notion of resources is universal in SPC, i.e. in principle, each process in SPC is always mapped onto at least one resource (see footnote 1). The underlying concept is that a process must always execute in the context of some resource (an instruction will cost cycles to at least one or more ....
.... to increase (nesting) readability as in main = par (i = 1, N) seq (j = 1, M) op(i,j). SPC also allows the expression of nested data parallelism, such as used in the following Divide-and-Conquer example: main = par (i = 1, N) par (j = 1, M) f(i,j) f(i,j) C . Note that SPC allows dynamic parallelism, as the values of N and M may be controlled at run time. Thus irregular problems can be expressed naturally. [Footnote 1: As multiple mappings may be specified, a process may simultaneously use a set of resources [14].] 2.2.2 Pipelining As the programming power of the popular data parallel ....
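The nested par/seq composition in this excerpt can be illustrated with a minimal sketch that evaluates the contention-free execution time of a series-parallel structure: sequential parts add up, parallel parts take the maximum. This is an illustration only and not the SPC or Pamela cost estimation algorithm; it deliberately ignores mutual exclusion on resources, and the node encoding, the helper names, and the cost function are all assumptions introduced here.

```python
def op(cost):
    """A leaf task with a fixed (hypothetical) execution time."""
    return ("op", cost)


def seq(*parts):
    """Sequential composition: parts run one after another."""
    return ("seq", parts)


def par(*parts):
    """Parallel composition: parts run concurrently and join at the end."""
    return ("par", parts)


def span(node):
    """Execution time without resource contention.

    seq -> sum of the parts, par -> maximum of the parts.
    """
    kind, body = node
    if kind == "op":
        return body
    times = [span(child) for child in body]
    return sum(times) if kind == "seq" else max(times, default=0)


def build_main(n, m, cost):
    """Mirror of 'main = par (i = 1, N) seq (j = 1, M) op(i,j)'."""
    return par(*(seq(*(op(cost(i, j)) for j in range(1, m + 1)))
                 for i in range(1, n + 1)))


if __name__ == "__main__":
    main = build_main(3, 4, lambda i, j: i)  # hypothetical cost: op(i,j) takes i units
    print(span(main))  # slowest of the three sequential chains
```

Because N and M are ordinary run-time values here, the same sketch also covers the dynamic parallelism the excerpt mentions.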
[Article contains additional citation context not shown here]
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Delft University of Technology, The Netherlands, Apr. 1996.
....Introduction. This thesis describes the initial version of a viewer for the visualization of execution traces of PAMELA models. The name PAMELA is an acronym for PerformAnce ModEling LAnguage [4]. The PAMELA language is a language for performance analysis of models that simulate a real-time multitasking and multiprogramming environment with process synchronization. The PAMELA language allows detailed simulation models of parallel systems. The PAMELA models built with the PAMELA language ....
A.J.C. van Gemund. Performance Modeling of Parallel Systems. PhD thesis, Delft University of Technology, Apr. 1996, Delft University Press, ISBN 90-407-1326-X.
....principles how the SPC model is applied to optimization issues by discussing vectorization. Section 4 concludes the paper. 2 Principle In this section we introduce our approach to the modeling of parallel computation. As a description vehicle we shall use Pamela, a performance modeling language [4]. Although Pamela has indeed been developed with the SPC modeling paradigm in mind, the SPC approach represents a distinct model of parallel computation that is independent of the underlying description language. First, we briefly introduce Pamela as far as it is needed for a full understanding of ....
....show that the average estimation error is less than a constant factor 2, while in most cases the average error is well within tens of percents. Although applied in the sequel, due to space limitations the cost estimation algorithm itself is not described in the paper. Details can be found in [4]. Despite the advantages in analytical sense, as described above, it would seem that the constraints imposed by the highly structured approach towards synchronization are quite severe. For example, the SP restriction with respect to CS would make it impossible to express a fundamental parallel ....
[Article contains additional citation context not shown here]
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Delft University of Technology, Apr. 1996.
....annotated with this assignment information. The annotated task graph then enables us to execute the application on a parallel computer and it also allows us to do performance optimisation and scalability analysis using a performance simulation tool called Pamela (PerformAnce ModEling LAnguage [4,6]). The low-level communication among the tasks is taken care of by TGEX; the user needs only to define the formats of the data structures to be communicated between the tasks. Notice that no explicit communication statements and no information about how to communicate need to be specified by the ....
....includes a performance prediction module that is based on a technique that combines the high efficiency of analytic prediction techniques with the accuracy of performance simulation techniques. In the following we present a brief description of the technique. More details can be found in, e.g. [5,6]. As mentioned earlier, an important feature of TGEX is the use of concurrency, with respect to both computation and communication (computation tasks and communication tasks proceed simultaneously). Due to the dynamic scheduling of processing and communication resources involved in the ....
[Article contains additional citation context not shown here]
A.J.C. van Gemund, "Performance Modeling of Parallel Systems", Ph.D. thesis, Delft University of Technology, 1996.
....model a critical SW section, or physical, e.g. model a CPU, memory bank, communication bus, etc. The scheduling policy we will consider in this paper associated with resources is simply FCFS with non-deterministic, fair conflict arbitration, but other scheduling disciplines can also be specified [15]. The notion of resources is universal in SPC, i.e. in principle, each process in SPC is always mapped onto at least one resource (see footnote 2). The underlying concept is that a process must always execute in the context of some resource (an instruction will cost cycles to at least one or more ....
....limitation mechanism. In summary, the SPC model of coordination is based on only a few constructs, i.e. the ; and if process composition operators and the resource assignment. Note that within SPC the resource concept is only used to express ME in order to express dynamic synchronization between program level components. Although possible, it is not intended to direct the actual mapping of processes onto physical processors or other machine resources as in our aim to study .... [Footnote 2: As multiple mappings may be specified, a process may simultaneously use a set of resources [15].]
[Article contains additional citation context not shown here]
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Delft University of Technology, Apr. 1996.
....cost is minimized while retaining a good load balance. After the mapping is known, the task graph is annotated with the assignment information. Apart from actually executing the annotated graph, TGEX comes with a performance simulator, implemented using Pamela (PerformAnce ModEling LAnguage [4, 5]) that enables performance optimisation and scalability analysis in order to investigate the influence of the various algorithm and machine properties. As mentioned earlier, TGEX has been implemented using PVM for communication. In order to enable true overlap between computation and ....
A.J.C. van Gemund, "Performance Modeling of Parallel Systems", Ph.D. thesis, Delft University of Technology, 1996.
No context found.
Arjan van Gemund, "Performance Modeling of Parallel Systems," PhD thesis, Technische Universiteit Delft, the Netherlands, 1996.
No context found.
Arjan van Gemund, "Performance Modeling of Parallel Systems," PhD thesis, Technische Universiteit Delft, the Netherlands, 1996.
No context found.
A.J.C. van Gemund, Performance Modeling of Parallel Systems. PhD thesis, Delft University of Technology, The Netherlands, Apr. 1996.