### TABLE II POLYMIX-4 RESULTS FOR TWO MAIN PHASES

Cited by 1

### Table 6: Main outcome of the validation phase

### Table 2. Main phases in the development of a special-purpose tool

"... In PAGE 14: ... The first implementation of a thermal building simulator in IDA (in 1995) was actually a re-implementation of a traditional model, and this provided some opportunities for comparison also from this perspective. Table 2 lists estimated development times for the two approaches from [Sahlin 1996]: ... ..."

### TABLE 2. Main training phase stopping conditions

### Table 3: Timings in seconds for the main phases of out-of-core LU factorization

1996

"... In PAGE 22: ... when reading, so we would not expect Eq. 22 to hold. Thus, the version of the algorithm that stores the matrix in pivoted form is expected to be faster. This is borne out by the timings presented in Table 3 for an 8 × 8 process mesh. These timings are directly comparable with those of Table 2, and show that the version of the algorithm that stores the matrix in pivoted form is faster by 10-15%. ... In PAGE 24: ... The M = 8000 case in Table 4 failed, presumably because PFS was not able to handle the need to simultaneously read 8 Mbytes from each of 64 separate files. The M = 10000 case ran successfully out-of-core, and the results in Table 4 should be compared with those in Table 3, from which we observe that increasing n_g increases the time for I/O and factorization, but decreases the times for all other phases of the algorithm. The increase in I/O is an unexpected result, since increasing n_g should decrease the I/O cost. ... In PAGE 26: ... Communication overhead, together with the floating-point operation count, determines the performance of the computational phases of the algorithm as n_g changes. The failure of the M = 8000 case in Table 3 prompted us to devise a second way of implementing logically distributed files. Instead of opening a separate file for each process, the new method opens a single file and divides it into blocks, assigning one block to each process. ... ..."

Cited by 19

### Table 7: Timings in seconds for the main phases of out-of-core LU factorization

1996

"... In PAGE 27: ... This is in contrast with the matrix multiplication phase, which exhibits almost perfect speedup. In Table 7, timings are presented for the case n_g = 10 for an 8 × 8 process mesh. Comparing these results first with those given in Table 4 for a physically and logically distributed file, the decrease in the times for reading and writing is striking. ... In PAGE 29: ... This again shows that as n_g increases, thereby increasing the amount of data being read and written in each I/O operation, I/O performance starts to degrade quite significantly once n_g is sufficiently large. Table 8 shows timings for the M = 10000 case for the same problem parameters as in Table 7, but for n_g = 5. Comparing the results in Tables 6, 7, and 8, we see that the time for writing data does not decrease monotonically as n_g increases, but is smallest for n_g = 5. ... ..."

Cited by 19


### Table 3: Timings in seconds for the main phases of out-of-core LU factorization

1996

"... In PAGE 23: ... when reading, so we would not expect Eq. 22 to hold. Thus, the version of the algorithm that stores the matrix in pivoted form is expected to be faster. This is borne out by the timings presented in Table 3 for an 8 × 8 process mesh. These timings are directly comparable with those of Table 2, and show that the version of the algorithm that stores the matrix in pivoted form is faster by 10-15%. ... In PAGE 25: ... Thus, for the parameters of Table 4, the M = 5000 and M = 8000 cases fit in core, so we just read in the whole matrix, factorize it using the standard ScaLAPACK routine P GETRF, and then write it out again. In Table 4 it takes about 58 seconds to perform an in-core factorization of a 5000 × 5000 matrix, compared with 191 seconds for an out-of-core factorization (see Table 3). The M = 8000 case in Table 4 failed, presumably because PFS was not able to handle the need to simultaneously read 8 Mbytes from each of 64 separate files. ... In PAGE 25: ... The M = 8000 case in Table 4 failed, presumably because PFS was not able to handle the need to simultaneously read 8 Mbytes from each of 64 separate files. The M = 10000 case ran successfully out-of-core, and the results in Table 4 should be compared with those in Table 3, from which we observe that increasing n_g increases the time for I/O and factorization, but decreases the times for all other phases of the algorithm. The increase in I/O is an unexpected result, since increasing n_g should decrease the I/O cost. ... In PAGE 27: ... Communication overhead, together with the floating-point operation count, determines the performance of the computational phases of the algorithm as n_g changes. The failure of the M = 8000 case in Table 3 prompted us to devise a second way of implementing logically distributed files. Instead of opening a separate file for each process, the new method opens a single file and divides it into blocks, assigning one block to each process. ... ..."

Cited by 19