### Table 2: IEEE 754 floating-point characteristics and limits.

"... In PAGE 8: ...06062 That question is addressed by examination of the output of this simple hoc program: func c() \ { eps = $1 x = $2 sum = x if (x == 0) return 1 for (n = 2; n lt;= 30; ++n) \ { term = x^n/factorial(n) if (abs(term/sum) lt; eps) return n sum += term } return n } e32 = 2^-23 e64 = 2^-52 e80 = 2^-63 e128 = 2^-112 proc q() { println c(e32,$1), c(e64,$1), c(e80,$1), c(e128,$1) } q(ln(1/2)) 10 17 19 29 Function c(eps,x) returns the number of terms needed to sum the Taylor series to an accuracy eps. Procedure q() prints the counts for the four IEEE 754 machine epsilons (see Table2 for characteristics of that system). The results are shown in Table 3.... ..."

### Table 8: Millions of floating-point operations and MFLOPS: no large clusters of eigenvalues

"... In PAGE 10: ...orthogonalization of eigenvectors belonging to large clusters of eigenvalues is necessary with PSSYEVX. If eigenvalues are not clustered, PSSYEV needs about twice as many arithmetic operations as PSSYEVX (see Table8 ) and less communication (see Table 6) and because of the heavy usage of BLAS 1 routines the MFLOPS for large problems are lower than the measured ones of PSSYEVX. Otherwise PSSYEV can be much faster than PSSYEVX be- cause too much time is spent with computation on only one processor.... In PAGE 23: ... For example with n = 400; nb = 20; np = 4 each processor has a 200 200 matrix in its memory whereas with n = 400; nb = 16; np = 4 processor (0; 0) has a 208 208 matrix, processor (0; 1) has a 192 208 matrix, processor (1; 0) a 208 192 and processor (1; 1) a 192 192 matrix. As a result, the operations for PSSYEV without large clusters vary between 175 106 and 184 106 for nb = 20 (see Table8 ) and 170 106 and 195 106 for nb = 16. As expected, load balance is better for nb = 20, com-... In PAGE 25: ... The other reason is, that the sequential QR-algorithm within PSSYEV uses a lot of BLAS 1 routines and consequently reaches less MFLOPS per node than PSSYEVX. Although GA DIAG STD uses the same algorithm as PSSYEVX if there are no large clusters of eigenvalues the number of operations is much higher even for the node with the smallest number of operations (see Table8 ). Due to the poor load balance of GA DIAG STD, see Table 8, it is even higher than that of PSSYEV on the node with the highest operation count.... In PAGE 25: ... Although GA DIAG STD uses the same algorithm as PSSYEVX if there are no large clusters of eigenvalues the number of operations is much higher even for the node with the smallest number of operations (see Table 8). Due to the poor load balance of GA DIAG STD, see Table8 , it is even higher than that of PSSYEV on the node with the highest operation count. 
For large problems also the MFLOPS per node reached with GA DIAG STD are significantly lower than the ones reached by PSSYEV or PSSYEVX, mainly because it is completely based on BLAS 1 routines and therefore performance is additionally reduced by cache misses.... In PAGE 49: ... The values for F02FQFP were measured for the matrix dimensions given in Table 1. It can be seen from Table8 , that for no large clusters of eigenvalues the number of floating point operations is very much higher than for the other codes. Generally, F02FQFP needs less operations and lower execution time for the problems with one large cluster of eigenvalues (Table 9), but the values for the large problem show a strange behaviour: if the number of columns on each processor is even, (that is the case for the measurements for 4 nodes up to 32 nodes) the number of operations, compared to the values for no large clusters of eigenvalues, is reduced by a factor of up to 3.... In PAGE 50: ...The MFLOPS cannot be very high, since only BLAS 1 routines are used in F02FQFP. The values are nearly the same for all measurements in Table8 and for the small problems in Table 9. For the large problems in Table 9 the MFLOPS are lower for an even number of matrix columns per processor.... ..."

### Table 6: Floating point and integer performance specifications for the various CPUs

"... In PAGE 18: ... This observation is validated by other benchmarks where floating point and integer specifications, SPECfp95 and SPECint95, of various CPUs are compared[25,26]. Table6 shows the performance specifications for the Pentium, UltraSparc and the DEC Alpha. As noted earlier, compiler effects were quite significant, though the underlying phenomena could not be explained directly by the algorithm implementations.... ..."

### Table 3. Number of floating point operations per calculation for terrain following.

"... In PAGE 4: ...) The first step is to search the quadtree to determine which leaf node P is in. This is extremely fast, because two simple comparisons suffice to determine which child contains a given point in a non-leaf node (see Table3 ). One consequence of the high speed of this step is that it is not very important that the simple algorithm above does not lead to a well balanced quadtree, but rather to one where areas with many small polygons have more quadtree leaves than those with larger polygons.... ..."

### Table 5.6. Comparison with an algorithm using a more sophisticated interval Newton method.*

1987

Cited by 27

### Table 1. Floating-point Units

2004

"... In PAGE 6: ...7 and XPower. Table1 shows the resources, latencies and power val- ues for the floating-point units. For single precison, we used floating-point units of various depths and for division we used a approximate reciprocator (using a BRAM based lookup table) and a multiplier.... ..."

Cited by 2