### Table 1. Execution time of the bitonic sort algorithm using the GCel (sec.).

1995

"... In PAGE 5: ...Table 1). Using keys of large size, a substantial speed-up of the application is reached.... In PAGE 5: ...substantial speed-up of the application is reached. Table 1 shows that the mapping kernel provides a decrease of execution time up to 39% (28% using small keys). Overall we see a significant speed-up of applications using the mapping kernel.... ..."

Cited by 8

### TABLE VI SPEED-UP VALUES FOR BITONIC SORT IN PARALLEL RESPECTING THE SEQUENTIAL ALGORITHM

### Table 4: Comparison of the times taken by the split radix sort and the bitonic sort (n keys each with d bits). The constants in the theoretical times are very small for both algorithms. On the Connection Machine, the bitonic sort is implemented in microcode whereas the split radix sort is implemented in macrocode, giving the bitonic sort an edge.

1989

"... In PAGE 14: ... The split radix sort is fast in the scan model, but is it fast in practice? After all, our architectural justification claimed that the scan primitives bring the P-RAM models closer to reality. Table 4 compares implementations of the split radix sort and Batcher's bitonic sort [4] on the Connection Machine. We choose the bitonic sort for comparison because it is commonly cited as the most practical parallel sorting algorithm.... ..."

Cited by 138
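
For readers unfamiliar with the algorithm these entries benchmark, the following is a minimal sequential sketch of Batcher's bitonic sort in Python. It is not taken from any of the cited papers: the function names `bitonic_sort` and `bitonic_merge` are illustrative, the input length is assumed to be a power of two, and the compare-exchange steps that a parallel machine would run concurrently are executed here in a plain loop.

```python
def bitonic_merge(a, lo, n, ascending):
    """Merge the bitonic run a[lo:lo+n] into sorted order, in place."""
    if n > 1:
        half = n // 2
        # Compare-exchange elements that are `half` apart. On a parallel
        # machine these comparisons are independent of one another.
        for i in range(lo, lo + half):
            if (a[i] > a[i + half]) == ascending:
                a[i], a[i + half] = a[i + half], a[i]
        bitonic_merge(a, lo, half, ascending)
        bitonic_merge(a, lo + half, half, ascending)


def bitonic_sort(a, lo=0, n=None, ascending=True):
    """Sort a[lo:lo+n] in place; assumes the length is a power of two."""
    if n is None:
        n = len(a)
        if n & (n - 1):
            raise ValueError("length must be a power of two")
    if n > 1:
        half = n // 2
        # Build a bitonic run: first half ascending, second half descending.
        bitonic_sort(a, lo, half, True)
        bitonic_sort(a, lo + half, half, False)
        bitonic_merge(a, lo, n, ascending)
    return a


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

The sequence of comparisons is fixed in advance and does not depend on the key values, which is the property that makes the network attractive on SIMD and hypercube machines like those in the tables listed here.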

### Table 7.2 summarizes the running times of the best known deterministic sorting algorithms for the hypercube over ascending ranges of the ratio n/p. MergeSort is listed first because it is the best known sorting method (in the sense of worst case asymptotic complexity) when n p. The last column indicates that MergeSort remains the best known algorithm up to n = (p), at which point BitonicSort has the same complexity. The "hybrid" entry refers to an algorithm to be defined and analyzed in Section 8.2.4. For n = (pq), where q = log3=2 p log log p, SmoothSort is the best known sorting algorithm and its complexity is given by the last entry in the table. Of course, when n exceeds p by a polynomial factor, CubeSort and ColumnSort also exhibit optimal complexity. Two more algorithms with this property will be described in Chapter 8.

### Table 3: Bitonic Sorting on SGI Power Challenge.

1998

Cited by 10

### Table 3: Bitonic Sorting on SGI Power Challenge with v PRAM processors

1998

Cited by 10

### Table 1 Timing results of heterogeneous process migration of the linpack and bitonic sort programs.

1999

"... In PAGE 7: ...Table 1 Timing results of heterogeneous process migration of the linpack and bitonic sort programs. As shown in Table 1, we have tested both programs with two different data sizes, which cause different size of data transmission (Tx Size) during process migration. The total costs of process migration can be split into three parts: the cost of scanning data structure of a migrating process (Scan), the cost of transmitting those data (Tx), and the cost of restoring them on a destination machine (Restore).... ..."

Cited by 3

### Table 2 lists the cycle statistics for basic and SIMDized bitonic sorts as well as the quick sort on the SPE. Note that the CPI is significantly lower with the optimized SIMD implementation (less than half of the CPI of basic bitonic sort and one third of the CPI of quick sort). Moreover, the SIMDized implementation with heavy loop unrolling and branch avoidance results in a smaller percentage of dependency and branch misses. Quick sort on the SPE has worse CPI and stall percentage than both basic and SIMDized bitonic sort, but as we have observed earlier it is faster than the basic bitonic sort, since it has a smaller total cycle count.

2007

"... In PAGE 9: ... It is important to point out that the pure SIMD nature of the SPEs, their large 128-bit register file, rich set of SIMD instruction set, and their lack of branch prediction hardware result in SIMDized bitonic sort to prevail over quick sort, which is not the case for SSE enhanced bitonic sort on the Intel processors such as the Xeon. Table 2: Cycle statistics metrics basic SIMD quick bitonic bitonic sort CPI (cycles per instruction) 2.26 1.... ..."

Cited by 2
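
The caption's last observation (quick sort has a worse CPI than basic bitonic sort yet runs faster) follows from the standard cycle-count identity. As a sketch, with $C$ denoting total cycles and $I$ the number of instructions executed (both symbols are introduced here for illustration and do not appear in the source):

$$
C = I \times \mathrm{CPI}
$$

If $C_{\text{quick}} < C_{\text{basic}}$ while $\mathrm{CPI}_{\text{quick}} > \mathrm{CPI}_{\text{basic}}$, then

$$
\frac{I_{\text{quick}}}{I_{\text{basic}}} < \frac{\mathrm{CPI}_{\text{basic}}}{\mathrm{CPI}_{\text{quick}}} < 1,
$$

so quick sort must execute fewer instructions than the basic bitonic sort, enough to outweigh its higher per-instruction cost.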

### Table 2 shows the predicted and the measured ratios. Besides the already described possible early termination, there also seems to be the effect that guarded S&M operations can speed up the algorithm on average. But neither the expected number of iterations of the block nor a similar analysis as for Odd-Even Merge Sort is yet known.

1996

"... In PAGE 12: ...7778 1.7892 Table 2: Ratio of the runtimes of Periodic Balanced Sort and Bitonic Sort.... ..."

Cited by 7
