### Table 3. Parameter Heterogeneity

2003

"... In PAGE 6: ... For each input parameter shown, the eight bars represent the values of that parameter for each of the eight processors, divided by the value of that parameter when measured over all eight processors. The heterogeneity in various measures for particular barriers is also summarized in Table3 . The parameter Mave refers to the average of the distribution fM.... ..."

Cited by 3

### Table 3. Examples of Parameter Heterogeneity

2003

"... In PAGE 8: ... The heterogeneity in various measures for particular barriers is also summarized in Table 3. The degree of parameter heterogeneity, such as in the measures of and P (Ljw) in the gures and Table3 , is perhaps higher than might be expected. Some (irregular) applications, such as Radiosity, are inherently heterogeneous, and thus the per-processor memory request measures... ..."

Cited by 3

### Table 1: Results for heterogeneous machine

"... In PAGE 4: ... An object of n points is represented bya(3n;; 3n) matrix, thus increasing consider- ably the problem size as compared to the 2D analysis. Table1 presents results for a machine com- posed by f244, 244, 161, 161, 60, 50, 49g Mflops processors, connected by a 100 Mbit... ..."

### Table 1: Variables in the design space of the heterogeneous multi-processors.

1998

Cited by 15

### Table 1: Variables in the design space of the heterogeneous multi-processors.

1998

Cited by 2

### Table 1: Floating point performance characteristics of individual cores of modern, multi-core processor architectures. DGESV and SGESV are the LAPACK subroutines for dense system solution in double precision and single precision, respectively. (Columns: Architecture, Clock, DP Peak, SP Peak, time(DGESV)/time(SGESV).)

2007

"... In PAGE 2: ... When combined with the size of the register file of 128 registers, it is capable of delivering close to peak performance on many common computationally intensive workloads. Table1 shows the difference in peak performance between single precision (SP) and double precision (DP) of four modern processor architectures; also, on the last column is reported the ratio between the time needed to solve a dense linear system in double and single precision by means of the LAPACK DGESV and SGESV respec- tively. Following the recent trend in chip design, all of the presented processors are multi-core architectures.... In PAGE 6: ... For the Cell processor (see Figures 7 and 8), parallel implementations of Algo- rithms 2 and 3 have been produced in order to exploit the full computational power of the processor. Due to the large difference between the single precision and double precision floating point units (see Table1 ), the mixed precision solver performs up to 7 and 11 faster than the double precision peak in the unsymmetric and symmetric, positive definite cases respectively. Implementation details for this case can be found in [7, 8].... ..."

### Table 3. Performance results for heterogeneous configurations

1999

"... In PAGE 9: ... The percentage of work performed by each machine can be obtained from this information. In Table3 we present the percentage of work performed... In PAGE 10: ...From Table3 we see that in general we see some improvement in timings as we progressively increase the number of processors in the configuration. The exception is the 128 node run using the 2 ANL machines.... ..."

Cited by 9

### Table 2: Specifications of the Eleven Heterogeneous Computers

"... In PAGE 12: ... 5.2 Applications A small heterogeneous local network of 11 different Solaris and Linux workstations shown in Table2 is used in the experiments. The network is based on 100 Mbit Ethernet with a switch enabling parallel communications between the computers.... In PAGE 15: ... 7. Determination of a set with relatively few points used to build the speed functions of the processors X2-X5 whose specifications are shown in Table2 . As few as 6 points and 5 points are used to build an efficient speed function for matrix multiplication and LU factorization respectively with deviation approximately 5% from other speed functions built with more number of points.... In PAGE 15: ... Though the absolute speed must be obtained by multiplication of two dense non-square matrices, we observed that our serial version gives almost the same speeds for multiplication of two dense square matrices if the number of elements in a dense non-square matrix is the same as the number of elements in a dense square matrix. This is illustrated in Table 3 for computers X2-X5 whose specifications are shown in Table2 . Thus speed functions of the processors built using dense square matrices will be the same as those built using dense non-square matrices.... In PAGE 17: ... However allocation of a task to these computers, the size of which is greater than 36000000 and 81000000 for matrix-matrix multiplication and LU factorization respectively, will result in severe performance degradation of the parallel application. For each of these two applications, the largest problem size that can be solved on the network of heterogeneous networks shown in Table2 is just the sum of the largest sizes of the tasks that can be solved on each computer. There are three important issues in selecting a set of points to build a speed function of a processor: 1.... In PAGE 18: ... Speeds of the processors are assumed to be zero for problem sizes beyond their upper bounds. 
multiplication obtained using three sets of 6, 7, and 8 points and speed functions for LU factorization obtained using three sets of 5, 7, and 8 points for the computers X2-X5 whose specifications are shown in Table2 . It can be seen that 6 points and 5 points are enough to build an efficient speed function that fall within acceptable limits of deviation for matrix multiplication and LU factorization respectively.... ..."
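The per-processor speed functions described here, built from a handful of measured points with speed taken as zero beyond each computer's largest feasible problem size, might be sketched like this (the sample points in the test are made up, not measurements from Table 2):

```python
import numpy as np

def make_speed_function(sizes, speeds):
    """Build a piecewise-linear speed function from a few measured
    (problem size, speed) points, given sorted by size. Speed is
    zero for sizes beyond the largest measured point, matching the
    assumption stated in the text."""
    sizes = np.asarray(sizes, dtype=float)
    speeds = np.asarray(speeds, dtype=float)

    def speed(n):
        if n > sizes[-1]:
            return 0.0
        return float(np.interp(n, sizes, speeds))

    return speed
```

Per the snippet, as few as 5 or 6 well-chosen points can keep such a function within about 5% of versions built from more points, which is what makes the benchmarking cost affordable.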

### Table 5. Runtimes (in seconds) for heterogeneous parallelism on different numbers of processors using different data sizes.

1998

"... In PAGE 19: ... The speedups obtained when heterogeneous parallelism is implemented on di erent numbers of processors with di erent data sizes. Table5 shows the runtimes of the heterogeneous parallelism algorithm on di erent numbers of processors with di erent data sizes. This table shows: In the fth column (the number of processors is 6), runtimes are reported only for two data sizes (row 6 and 8).... ..."

Cited by 7

### Table 2: Results for a heterogeneous LAN with 8 processors. (Columns: Parameter, runtime, idle, balance.)

1997

"... In PAGE 7: ...2.1 Local Area Heterogeneous Networks Table2 contains results for the algorithm without load balancing (NLB), PLB with parameter = 50 and nally PLB with a dynamic adaption of power... ..."

Cited by 1