A. Karp and H. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5), 539--543, May 1990.
....the start of this project, we found basically two different types of concurrency models. In the first type, the concurrency models try to measure the performance of a parallel program and compare it with the performance of a sequential program. Good examples of this are [37], [25], [28], and [15]. They simulate the parallel program and measure its run time. This run time is used to estimate the run time of the parallel program if it is distributed over a set of processors and to estimate the run time of a sequential version of the program. They then compute the speedup realized by the ....
Karp, A.H., and H.P. Flatt, Measuring parallel processor performance. In: Communications of the ACM, Volume 33, Issue 5, May
....the fact that massively parallel, distributed systems are an attractive alternative to supercomputers in terms of both price and performance for many applications. One of the outcomes of research in such scalable systems has been the development of various models of performance. For example, [9], [13], and [15] discuss various aspects of speedup and efficiency of parallel systems. Although insightful, these do not provide explicit analytical models. One goal of our research is to identify an analytical model that predicts the execution time of an application given a set of parameters for a ....
A. Karp and H. Flatt. Measuring parallel processor performance. Commun. ACM, 33(5), 1990.
....poorest results whereas the 21 MMT scheduling algorithm generally produces the best results. Finally, lazy cancellation generally produces better performance than aggressive cancellation. 8.2. Experimentally Determined Parallel Fraction In this section we use a metric, based on one proposed in [25], called the experimentally determined parallel fraction to assess the performance of the parallel simulations. This metric is derived from Amdahl's law. In Amdahl's law, the execution time of a program on n processors is assumed to be given by $T(n) = T_s + T_p/n$, where $T_s$ is the time required ....
A. H. Karp and H. P. Flatt, "Measuring Parallel Processor Performance," Communications of the ACM, Vol. 33, No. 5, pp. 539-543, May 1990.
....that a linear scaled speedup curve corresponds to a constant efficiency with linear problem scaling. This indicates a linear isoefficiency function. When the isoefficiency is superlinear (or nonexistent) the scaled speedup curve is sublinear. A number of other scalability metrics have also been proposed [35, 25, 36, 8, 26, 9, 33, 17, 34, 30, 8]. A detailed discussion of these and other metrics is provided in [14]. These scalability metrics are relevant to petaFLOPS-scale computing to varying degrees. The isoefficiency metric is particularly useful because of the following features: It determines whether an algorithm can effectively ....
Alan H. Karp and Horace P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539-543, 1990.
.... including intra-process parallelism [1], scaled process parallelism [5], and clustered workload parallelism [3a, 3b, 3c]. It extends work in unifying parallel speedup models [5], which included other speedup models [2], [14], [13], [12] and accommodated speedup-limiting considerations developed in [10], [9], [16], and developed additional speedup modifiers including non-linear scaling, interconnection network Probability of Acceptance of requests, non-uniform memory access, and non-uniform workload. This paper identifies the Levels of Parallelism, integrates these parallel levels with two ....
A.H. Karp, H.P. Flatt, "Measuring Parallel Processor Performance.", Communications of the ACM, vol. 33, no. 5, May 1990.
....for parallel system performance evaluation should quantify this gap between available and delivered compute power since understanding the application and architectural bottlenecks is crucial for application restructuring and architectural enhancements. Many performance metrics [1], [2], [3], [4], [5], [6] have been proposed to quantify the match between application and architecture in a parallel system. While these metrics are useful for tracking overall performance trends, they provide little additional information about where performance is lost. Some of these metrics [4], [5], [6] attempt ....
....[2], [3], [4], [5], [6] have been proposed to quantify the match between application and architecture in a parallel system. While these metrics are useful for tracking overall performance trends, they provide little additional information about where performance is lost. Some of these metrics [4], [5], [6] attempt to identify the cause (the application or the architecture) of the problem when the parallel system does not perform as expected. Once the problem is identified, it is essential to find the individual application and architectural artifacts that lead to these bottlenecks and quantify ....
[Article contains additional citation context not shown here]
A. H. Karp and H. P. Flatt, "Measuring Parallel processor Performance," Communications of the ACM, vol. 33, no. 5, pp. 539--543, May 1990.
....speedup measures the parallelism inherent to the algorithm (refer to section 3.1). This definition is useful in comparing different architectures for a given algorithm. It is, however, useless in comparing different algorithm-architecture pairs for solving the same problem. Karp and Flatt [Karp90] use serial fraction f as a metric for measuring the performance of a parallel system on a fixed-size problem. They define $f = \frac{1/S - 1/n}{1 - 1/n}$, where S is the speedup, and n is the number of processors. Interestingly, if we rewrite this equation as $S = \frac{n}{1 + (n - 1) f}$, it becomes ....
Karp, A. H. and Flatt, H. P., "Measuring parallel processor performance," Communications of the ACM, vol. 33, no. 5, pp. 539-543, 1990.
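The rewriting mentioned in the excerpt above follows from the definition of the serial fraction in a few algebraic steps. The derivation below is a sketch of one consistent reading, using the excerpt's symbols (S for speedup, n for the number of processors, f for the serial fraction); the intermediate steps are not taken from the citing paper.

```latex
\begin{align*}
f &= \frac{1/S - 1/n}{1 - 1/n} \\
f\left(1 - \frac{1}{n}\right) &= \frac{1}{S} - \frac{1}{n} \\
\frac{1}{S} &= f + \frac{1 - f}{n} = \frac{1 + (n - 1) f}{n} \\
S &= \frac{n}{1 + (n - 1) f}
\end{align*}
```

The third line has the form of Amdahl's law with serial fraction f, which is why a measured f can be read as an empirical estimate of the serial part of the program.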
....the program optimizing the cpu communication overlapping and maintaining each processor as busy as possible. With this information in mind, this paper is going to describe some of the most useful metrics to analyze parallel programs: speedup, efficiency, experimentally determined serial fraction [8] and percentage of cpu communication overlapping. The remaining metrics: percentage of idle time per processor, load and communication balancing and synchronization time are described in detail in [10]. The speedup is the elapsed time of the best sequential algorithm divided by the elapsed time ....
....the program that must be executed sequentially. To calculate this serial fraction in an analytical way is a very complex problem, because there is not an easy way to determine for each program which fraction of it is executed serially and which fraction is not. For this reason, Karp and Flatt in [8] introduce the definition of experimentally determined serial fraction (serial fraction from now on) which is an empirical estimation of the theoretical serial fraction. The serial fraction is defined by the following equation: $f = \frac{1/A(n,p) - 1/p}{1 - 1/p}$, where A(n,p) is the ....
Karp, A. and Flatt, H., Measuring Parallel Processor Performance, Communications of the ACM, Vol. 33, No. 5, May 1990.
....system because one can often reach very misleading conclusions regarding the performance of a large parallel system by simply extrapolating the performance of a similar smaller system. Many different measures have been developed to study the scalability of parallel algorithms and architectures [3, 7, 10, 15, 19, 22, 28, 32, 33]. In this paper, we use the isoefficiency metric [18, 7, 19] to study the scalability of an iteration of the PCG algorithm on some important architectures. The isoefficiency function of a combination of a parallel algorithm and a parallel architecture relates the problem size to the number of ....
Alan H. Karp and Horace P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539--543, 1990.
....A framework for benchmark performance analysis, Roger Hockney [71] • Parallel Computers 2: Architectures, Programming and Algorithms, Roger Hockney and C. Jesshope [72] • Reevaluating Amdahl's law, J. Gustafson [73] • Measuring parallel processor performance, Alan Karp and H. Flatt [74] • Modeling the serial and parallel fractions of a parallel algorithm, E. Carmona and M. Rice [75] • Benchmarking parallel programs in a multiprogramming environment: the PARBENCH system, W. Nagel and M. Linn [76] • Toward a better parallel performance metric, X. Sun and J. ....
....time of the program by optimizing the cpu communication overlap and maintaining each processor as busy as possible. With these objectives in mind, this section describes some of the most useful metrics to analyze parallel programs: speedup, efficiency, experimentally determined serial fraction [74] and percentage of cpu communication overlap. The remaining metrics: percentage of idle time per processor, load and communication balancing and synchronization time are described in detail elsewhere [24]. Speedup is the elapsed time of the ....
[Article contains additional citation context not shown here]
Alan Karp and H. Flatt, "Measuring parallel processor performance", CACM, vol. 33, no. 5, pp. 539--543, May 1990.
....can reach very misleading conclusions regarding the performance of a large parallel system if one attempts to simply extrapolate its performance based on that for a similar smaller system. Many different measures have been developed to study the scalability of parallel algorithms and architectures [27, 26, 17, 48, 23, 49, 12, 44, 33]. In this paper, we analyze the scalability of the FFT algorithm on a few important architectures using the isoefficiency metric developed by Kumar and Rao [25, 13]. The isoefficiency function of a combination of a parallel algorithm and a parallel architecture relates the problem size to the ....
Alan H. Karp and Horace P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539--543, 1990.
....problem to increase at the rate necessary to maintain a fixed efficiency, then the parallel system should be considered unscalable from a practical point of view. 2.3 Relationship between Isoefficiency and Other Metrics A number of scalability metrics have been proposed by various researchers [22, 34, 36, 55, 61, 59, 60, 75, 80, 98, 108, 131, 130, 132, 137, 142, 146, 149]. We present a detailed survey of these metrics in [84]. After reviewing these various measures of scalability, one may ask whether there exists one measure that is better than all others [66]. The answer to this question is no, as different measures are suitable for different situations. One ....
....system over the other can be done using the standard speedup metric. Note that for any fixed problem size W, the speedup on a parallel system will saturate or peak at some value $S_{\max}(W)$ which can also be used as a metric. Scalability issues for the fixed problem size case are addressed in [36, 75, 55, 105, 134, 146]. Another possible scenario is that in which a parallel computer with a fixed number of processors is being used and the best parallel algorithm needs to be chosen for solving a particular problem. For a fixed p, the efficiency increases as the problem size is increased. The rate at which the ....
Alan H. Karp and Horace P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539--543, 1990.
....the overhead functions; and we use this simulator to study the scalability of five applications on shared memory platforms with different communication topologies. Several performance metrics such as speedup [2], scaled speedup [12], sizeup [30], experimentally determined serial fraction [14], and isoefficiency function [15] have been proposed for quantifying the scalability of parallel systems. While these metrics are extremely useful for tracking performance trends, they do not provide the information needed to understand the reason why an application does not scale well with an ....
Alan H. Karp and Horace P. Flatt. Measuring Parallel processor Performance. Communications of the ACM, 33(5):539--543, May 1990.
....The most important is execution time. Some metrics measure specific operations, such as instructions, cache misses, or execution pipeline stalls. These metrics do not relate the operations from one kernel to another, however. Others, such as speedup, sizeup [17] and measured serial fraction [5] attempt to relate some theoretical performance to the observed performance, but they are based on the full program resulting in one value per application; hence coupling is not represented. While the coupling parameter does relate a theoretical performance to an observed performance (kernels in ....
Alan H. Karp and Horace P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539--543, May 1990.
....in the application or in the machine. Similarly, when a parallel system exhibits non-ideal behavior in the memory-constrained or time-constrained scaling strategies, scaled speedup and sizeup fail to show whether the problem rests with the application and/or the architecture. Three other metrics [34, 32, 43] attempt to address this deficiency. Isoefficiency function [34] tries to capture the impact of problem sizes along the application dimension and the number of processors along the architectural dimension. For a problem with a fixed size, the processor utilization (efficiency) normally decreases ....
....a growth that is linear or less is indicative of a more scalable hardware. Apart from providing a bound on achievable performance (Amdahl's law), the theoretical serial fraction of an application is not very useful in giving a realistic estimate of performance on actual hardware. Karp and Flatt [32] use an experimentally determined serial fraction f for a problem with a fixed size in evaluating parallel systems. f is computed by executing the application on the actual hardware and calculating the effective loss in speedup. On an ideal architecture with no overheads introduced by the ....
[Article contains additional citation context not shown here]
A. H. Karp and H. P. Flatt. Measuring Parallel processor Performance. Communications of the ACM, 33(5):539-- 543, May 1990.
....used to quantify scalability. A drawback with this definition is that an architecture would be considered non-scalable if an algorithm running on it has a large sequential part. There have been several recent attempts at refining this notion and defining new scalability metrics (see for instance [11, 7, 13]). With the evolution of parallel architectures along two dimensions, namely shared memory and message passing, there has been a considerable debate regarding their scalability given that there is a potential for large latencies for remote operations (message communication and remote memory ....
Table 1: Conjugate Gradient, datasize = n = 14000; nonzeros = 2030000
Processors  Time       Speedup   Efficiency  Serial fraction
....                   6.31418   0.789       0.038141
16          126.51990  12.95340  0.810       0.015680
32          72.00830   22.75930  0.711       0.013097
.... remote caches) using the hardware performance monitor. Table 1 gives the speedup, efficiency, and the measured serial fraction [11] for the CG algorithm on the KSR 1. Figure 7 shows the corresponding speedup curve for the algorithm. Up to about 4 processors, the insufficient sizes of the sub-cache and local cache inhibit achieving very good speedups. However, notice that relative to the 4-processor performance the 8 and 16 ....
Alan H. Karp and Horace P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539--543, May 1990.
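The efficiency and serial-fraction values quoted for Table 1 in the excerpt above can be recomputed from the speedup values, assuming the usual definitions: efficiency is S/p, and the experimentally determined serial fraction is (1/S - 1/p)/(1 - 1/p). The sketch below uses only the two complete rows visible in the excerpt (16 and 32 processors); the function and variable names are illustrative and do not come from the cited papers.

```python
def serial_fraction(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction: (1/S - 1/p) / (1 - 1/p)."""
    if processors < 2:
        raise ValueError("the metric is undefined for fewer than 2 processors")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup / processors


# (processors, speedup) pairs copied from the two complete rows of the excerpt.
ROWS = [(16, 12.95340), (32, 22.75930)]

for p, s in ROWS:
    # Print processors, efficiency, and serial fraction for comparison with the table.
    print(p, round(efficiency(s, p), 3), round(serial_fraction(s, p), 6))
```

Both rows agree with the quoted table to its printed precision. The serial fraction falls slightly from 16 to 32 processors even though efficiency drops, which shows why the two metrics are reported side by side.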
....The most important is execution time. Some metrics measure specific operations, such as instructions, cache misses, or execution pipeline stalls. These metrics do not relate the operations from one kernel to another, however. Others, such as speedup, sizeup [13] and measured serial fraction [4] attempt to relate some theoretical performance to the observed performance, but they are based on the full program resulting in one value per application; hence coupling is not represented. While the coupling parameter does relate a theoretical performance to an observed performance (kernels in ....
Alan H. Karp and Horace P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539--543, May 1990.
....Finally, wavefront array architectures combine systolic data pipelining with an asynchronous dataflow execution paradigm. Benchmarking of concurrent computers is a very problematic area, despite the existence of several test suites. There is a growing body of experience in this area, for example [Karp 90] and the references in [Bell 89]. 4.2 The Future for Hardware There are at least four hardware-related aspects which could be influential in determining the form of future generation machines. These are processor and interconnection technology, memory technology, storage technology and machine ....
A H Karp and H P Flatt, Measuring Parallel Processor Performance, CACM v33 n5, pp539-543, May 1990.
....speedup to be close to linear, thus indicating that arbitrarily large instances of these problems can be solved in fixed time by simply increasing p. In [16], Gupta and Kumar identify the classes of parallel systems which yield linear and sublinear time-constrained speedup curves. Karp and Flatt [27] introduced experimentally determined serial fraction f as a new metric for measuring the performance of a parallel system on a fixed-size problem. If S is the speedup on a p-processor system, then f is defined as $\frac{1/S - 1/p}{1 - 1/p}$. The value of f is exactly equal to the serial fraction ....
....has no other overheads). Smaller values of f are considered better. If f increases with the number of processors, then it is considered as an indicator of rising communication overhead, and thus an indicator of poor scalability. If the value of f decreases with increasing p, then Karp and Flatt [27] consider it to be an anomaly to be explained by phenomena such as superlinear speedup effects or cache effects. On the contrary, our investigation shows that f can decrease for perfectly normal programs. Assuming that the serial and the parallel algorithms are the same, f can be approximated by ....
[Article contains additional citation context not shown here]
Alan H. Karp and Horace P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539--543, 1990.
.... may be modified by adding an overhead function term $T_o(p)$ to include the practical effects: $T(p) = t_s + t_p/p + T_o(p)$ (1.2). The modified Amdahl's model has been used for evaluating parallel processing performance on different architectures, such as the vector supercomputers (see e.g. [18]), distributed memory multicomputer hypercube (see e.g. [13]), and UMA shared memory multiprocessors (see e.g. [28]). The overhead function $T_o(p)$ can be affected by the structure of the application which influences the necessity for communication, task dispatching algorithm used to control ....
A. H. Karp, H. P. Flatt, "Measuring parallel processor performance", Communications of the ACM, Vol. 33, No. 5, 1990, pp. 539-543.
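The modified Amdahl model in the excerpt above, T(p) = t_s + t_p/p + T_o(p), can be explored numerically. The sketch below is illustrative only: the excerpt does not give a form for the overhead function, so `linear_overhead` is a hypothetical choice (an assumption), as are all names and the sample timings.

```python
from typing import Callable


def modified_amdahl_time(t_s: float, t_p: float, p: int,
                         overhead: Callable[[int], float]) -> float:
    """Execution time T(p) = t_s + t_p / p + T_o(p)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return t_s + t_p / p + overhead(p)


def linear_overhead(p: int, cost_per_extra_processor: float = 0.5) -> float:
    """Hypothetical overhead (an assumption): linear in p, zero at p = 1."""
    return cost_per_extra_processor * (p - 1)


def speedup(t_s: float, t_p: float, p: int,
            overhead: Callable[[int], float]) -> float:
    """Speedup relative to the same model evaluated on one processor."""
    return (modified_amdahl_time(t_s, t_p, 1, overhead)
            / modified_amdahl_time(t_s, t_p, p, overhead))


# Sample timings (made up for illustration): 10 s serial, 90 s parallelizable.
for p in (1, 2, 4, 8, 16, 32, 64):
    print(p, round(speedup(10.0, 90.0, p, linear_overhead), 2))
```

With an overhead that grows with p, the speedup rises, peaks, and then declines, whereas the plain Amdahl model (zero overhead) only saturates. Passing `lambda p: 0.0` as the overhead recovers the unmodified model.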
....on the concept of scaled speedup, intensive research has been conducted in recent years in the area of performance evaluation. Some other definitions of speedup have also been proposed, such as generalized speedup, cost related speedup, and superlinear speedup. Interested readers can refer to [14, 9, 16, 7, 18, 2, 8] for details. This paper is organized as follows. In Section 2 we introduce the program model and some basic terminologies. More generalized speedup formulations for the three models of speedup are presented in Section 3. Speedup formulations for simplified cases are studied in Section 4. The ....
Karp, A. H., and Flatt, H. P. Measuring parallel processor performance. Communications of the ACM 33, 5 (May 1990), 539--543.
....on the concept of scaled speedup, intensive research has been conducted in recent years in the area of performance evaluation. Some other definitions of speedup have also been proposed, such as generalized speedup, cost related speedup, and superlinear speedup. Interested readers can refer to [14, 9, 16, 7, 18, 2, 8] for details. This paper is organized as follows. In Section 2 we introduce the program model and some basic terminologies. More generalized speedup formulations for the three models of speedup are presented in Section 3. Speedup formulations for simplified cases are studied in Section 4. The ....
Karp, A. H., and Flatt, H. P. Measuring parallel processor performance. CACM 33, 5 (May 1990), 539--543.
No context found.
A. Karp and H. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5), 539--543, May 1990.
No context found.
A.H. Karp and H.P. Flatt, Measuring Parallel Processor Performance, Comm. ACM, vol. 33, no. 5, pp. 539-543, May 1990.
No context found.
A. H. Karp and H. P. Flatt, "Measuring Parallel Processor Performance," Communications of the ACM, Vol. 33, No. 5, 1990.