3 citations found.
A. H. Karp, Bit reversal on uniprocessors, SIAM Rev., 38 (1996), pp. 1--26.


This paper is cited in the following contexts:
Fast Bit-Reversals On Uniprocessors And Shared-Memory.. - Zhang, Zhang (2001)

....is highly sensitive to how caches and memory hierarchies are used in the implementations. In other words, a fast bit reversal implementation must be cache effective. Several papers have well addressed the significance and effects of considering memory hierarchy to bit reversals (e.g., [2], [11], and [15]). Besides the important usage for FFT, different versions of bit reversal implementations can also be used as benchmark programs to evaluate the memory hierarchy of various computer systems. With the rapid development of RISC and VLSI technology, the speed of processors has increased ....

....to array Y in their bit reversal positions, $i'$ for i = 1, N, where $N = 2^n$. The above program says that X is a bit reversal reordering of Y. The indices $i$ and $i'$ of X and Y are represented by a sequence of $n$ binary digits. Position $i$ and its bit reversal $i'$ are defined in [11] as $i = \sum_{j=0}^{n-1} a_j 2^j$ and $i' = \sum_{j=0}^{n-1} a_j 2^{n-1-j}$, where $a_j$ is either 0 or 1. For example, a 5-bit reversal of $i = 10010$ is $i' = 01001$. The bit reversal operations have the following unique characteristics: First, in many implementations, each element in an array is used ....
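The definition quoted above can be illustrated with a minimal Python sketch. It is not code from Zhang and Zhang or from Karp, and it makes no attempt at the cache-aware techniques those papers study. The function names `bit_reverse` and `bit_reversal_reorder` are hypothetical, and the sketch assumes 0-based indices (the excerpt's loop runs from 1 to N) and an array length that is a power of two.

```python
def bit_reverse(i: int, n: int) -> int:
    """Return the index obtained by reversing the n low-order bits of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)  # append the lowest remaining bit of i to r
        i >>= 1
    return r


def bit_reversal_reorder(x: list) -> list:
    """Naive out-of-place reordering: element i of x goes to the
    bit-reversed position of i in the result (0-based indices, an
    assumption of this sketch)."""
    size = len(x)
    if size == 0 or size & (size - 1):
        raise ValueError("length must be a power of two")
    n = size.bit_length() - 1  # size == 2**n
    y = [None] * size
    for i, value in enumerate(x):
        y[bit_reverse(i, n)] = value
    return y


if __name__ == "__main__":
    # The 5-bit example from the excerpt: 10010 reverses to 01001.
    print(format(bit_reverse(0b10010, 5), "05b"))
    print(bit_reversal_reorder(list(range(8))))
```

This straightforward loop touches the destination array with a large, irregular stride, which is the kind of access pattern the cited papers identify as sensitive to cache and TLB behavior.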

[Article contains additional citation context not shown here]

A. H. Karp, Bit reversal on uniprocessors, SIAM Rev., 38 (1996), pp. 1--26.


Towards an Optimal Bit-Reversal Permutation Program - Carter, Gatlin (1998)   (6 citations)

....Given that each element is brought into cache only once, each page must be brought into TLB nearly b 2 = p 2b 1 k 1 times. The final section presents an optimized BitReverse program, and shows it is better than any other known method. This last task is made easier since a comprehensive study [K96] shows that Alan Karp's Hybrid bit reversal is superior to the 29 other algorithms he found in a thorough literature search. Our program beats Hybrid significantly. 2. The RoCol TM pebble game EQUIPMENT: Two buckets, labeled A and B. N pebbles (initially in A). An (infinitely large) Go ....

....cost (according to our experiments) when the data must be fetched from memory. The TLB has associativity 64 (it is fully associative), each page is 8 KBytes, and a TLB miss costs about 75 cycles. Figures 5 and 6 compare COBRA to the Hybrid program developed by Alan Karp. Karp's experiments [K96] demonstrate that on a wide variety of architectures, Hybrid is consistently either the best performing or near the best performing code, compared to 29 other methods he found in a thorough literature search. The sizes we selected for $|a|$ and $|c|$ were dependent on the size of cache. On the IBM ....

Karp, A.H., "Bit Reversal on Uniprocessors," SIAM Review, Vol 38, No. 1, pp 1-26, March 1996.


Memory Hierarchy Considerations for Fast Transpose and.. - Gatlin, Carter (1999)   (4 citations)

....the cache set. Consequently, a square matrix transpose problem needs to be unreasonably large to be TLB Murphy, but even modest sized bit reversals will be TLB Murphy. 2.1 Related Work In recent years, bit reversal reorderings have been extensively studied. In Alan Karp's excellent survey [11], thirty bit reversal reordering programs are compared, including Karp's own program which he calls the Hybrid method. In his experiments on a variety of computers, Hybrid is always competitive with the best algorithm and a large majority of the time is the best performing algorithm. For this ....

....the 12 cycles per element predicted. Again we attribute this minor difference to program overhead. On all three architectures our algorithm outperformed the Hybrid code, which was selected as a benchmark of performance as it was consistently the best or a top performing reordering algorithm in [11]. On the Power2, COBRA outperforms Hybrid by nearly a factor of 2 on large arrays, on the 21164 by a factor of 4 on large arrays, and by a factor of 3 on large arrays for the Ultrasparc2. See Table 5. [Table 5 headings: IBM Power2, Digital Alpha 21164, Sun Ultrasparc 2, in cycles/element. Lower bound for E reg = ....]

A. Karp. Bit reversals on uniprocessors. SIAM Review, 38(1):1--26, Mar. 1996.


CiteSeer.IST - Copyright Penn State and NEC