19 citations found.
S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In 11th ACM Symposium of Discrete Algorithms, pages 829--838, 2000.

This paper is cited in the following contexts:
Optimizing Graph Algorithms for Improved Cache Performance - Joon-Sang Park Michael (2002)   (Correct)

....improved cache performance using the Simplescalar simulator. 1. Introduction The topic of cache performance has been well studied in recent years. It has been clearly shown that the amount of processor memory traffic is the bottleneck for achieving high performance in many applications [4][21]. While cache performance has been well studied, much of the focus has been on dense linear algebra problems, such as matrix multiplication and FFT [4][9][15][24]. All of these problems possess very regular access patterns that are known at compile time. In this paper, we take a different ....

....better overall performance is a difficult problem. Modern microprocessors are including deeper and deeper memory hierarchies to hide the cost of cache misses. The performance of these deep memory hierarchies has been shown to differ significantly from predictions based on a single level of cache [21]. Different miss penalties for each level of the memory hierarchy as well as the TLB also play an important role in the effectiveness of cache-friendly optimizations. These penalties vary among processors and cause large variations in execution time. The area of graph problems are fundamental in ....

[Article contains additional citation context not shown here]

S. Sen, S. Chatterjee. Towards a Theory of Cache-Efficient Algorithms. In Proc. of Symposium on Discrete Algorithms, 2000.


Algorithm Engineering for Parallel Computation - Bader, Moret, Sanders (2002)   (Correct)

....are applicable to high performance algorithm engineering. However, many of these tools need further refinement. For example, cache efficient programming is a key to performance but it is not yet well understood, mainly because of complex machine-dependent issues like limited associativity [72, 75], virtual address translation [65] and increasingly deep hierarchies of high performance machines [31]. A key question is whether we can find simple models as a basis for algorithm development. For example, cache oblivious algorithms [31] are efficient at all levels of the memory hierarchy in ....

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In Proc. 11th Ann. Symp. Discrete Algorithms (SODA-00), pages 829--838, San Francisco, CA, 2000. ACM-SIAM.


Cache-Friendly Implementations of Transitive Closure - Penner, Prasanna (2001)   (Correct)

....overall performance is a difficult problem. Modern microprocessors are including deeper and deeper memory hierarchies to hide the cost of cache misses. The performance of these deep memory hierarchies has been shown to differ significantly from predictions based on a single level of cache [16]. Different miss penalties for each level of the memory hierarchy as well as the TLB also play an important role in the effectiveness of cache-friendly optimizations. These miss penalties vary from processor to processor and can cause large variations in experimental results. The All Pairs ....

....closure displays much longer running times. 2.3. Related Work A number of groups have done research in the area of cache performance analysis in implementing algorithms in recent years. Detailed cache models have been developed by Weikle, McKee, and Wulf in [20] and Sen and Chatterjee in [16]. Instead of eliminating cache misses, some groups develop methods to tolerate these misses. Multithreading has been discussed as one method of accomplishing this. Kwak and others discuss the effects of multithreading on cache performance in [11]. A number of papers have discussed the ....

[Article contains additional citation context not shown here]

S. Sen, S. Chatterjee. Towards a Theory of Cache-Efficient Algorithms. In Proc. of Symposium on Discrete Algorithms, 2000.


Scanning Multiple Sequences Via Cache Memory - Mehlhorn, Sanders (2003)   (Correct)

....Furthermore, their model can only be evaluated numerically for a particular set of parameters. The time for the evaluation grows at least quadratically in the number of memory accesses, whereas our analysis yields closed form and nearly matching upper and lower bounds. Sen and Chatterjee [SC00] describe a general method for emulating external memory algorithms in direct mapped cache memory. The emulation multiplies the number of cache faults by two and increases the instruction count by Theta(B) for every cache fault. In particular, the emulation implies that m sequences can be ....

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. SODA, pages 829--838, 2000.
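
The context above concerns scanning several sequences through a direct mapped cache. As a rough illustration only (this is not the emulation of Sen and Chatterjee, nor the analysis of Mehlhorn and Sanders), the sketch below counts misses in a toy direct mapped cache. The function names `count_misses` and `interleaved_scan`, and the choice to measure sizes in elements, are assumptions made for this example.

```python
def count_misses(addresses, cache_size, block_size):
    """Count misses of a toy direct mapped cache (hypothetical helper).

    cache_size and block_size are measured in elements, and each memory
    block can live in exactly one cache line: block number mod line count.
    """
    if cache_size % block_size != 0:
        raise ValueError("cache_size must be a multiple of block_size")
    num_lines = cache_size // block_size
    lines = [None] * num_lines  # block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // block_size
        slot = block % num_lines
        if lines[slot] != block:
            # Either a cold miss or a conflict miss: load the block.
            lines[slot] = block
            misses += 1
    return misses


def interleaved_scan(starts, length):
    """Addresses touched when several sequences are scanned in lockstep."""
    return [start + i for i in range(length) for start in starts]


if __name__ == "__main__":
    # Two sequences whose start addresses collide in the cache thrash it.
    print(count_misses(interleaved_scan([0, 64], 16), 64, 8))
    # Shifting one sequence by a block removes the conflicts.
    print(count_misses(interleaved_scan([0, 72], 16), 64, 8))
```

With colliding start addresses every access misses, whereas a one-block offset leaves only the unavoidable one miss per block, which is the kind of conflict behaviour that makes multi-sequence scanning in direct mapped caches worth analysing.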


Efficient Sorting Using Registers and Caches - Arge, Chase, Vitter.. (2000)   (4 citations)  (Correct)

....to and from it in blocks of fixed size (an I/O operation). Computation is performed on data that is in fast memory. Computation time is usually not taken into account because the I/O time dominates the total time. Algorithms are analyzed in this model in terms of number of I/O operations. Others [11, 12, 16] have proposed cache models similar to the I/O model. The cache is analogous to the (fast) main memory in the I/O model. Main memory takes the place of disks, and is assumed to contain all the data. The following parameters are defined in the model: N = number of elements in the problem instance ....

....of registers. The merge heap is an obvious candidate, as it is accessed multiple times for every key, and is relatively small. Storing the heap in registers reduces the number of instructions needed to access it, and also avoids cache interference misses, though this effect is small at small order [16], by eliminating memory accesses in the heap. However, the small number of registers limits the order (k) of the merge. The small order merge executes less instructions, but also causes more cache misses. Because the processor is not stalling while accessing memory, the increase in misses does not ....

[Article contains additional citation context not shown here]

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2000.


Cache-Oblivious B-Trees - Bender, Demaine, Farach-Colton (2000)   (25 citations)  (Correct)

....at each memory level. While this leads to accurate time predictions, it makes it difficult to design and analyze optimal algorithms in these models. A second body of work concentrates on two level memory hierarchies, either in the context of memory and disk [4, 9, 18, 37, 38] or cache and memory [30, 20]. In such a model there are only a few parameters, making it relatively easy to design efficient algorithms. The motivation is that it is common for one level of the memory hierarchy to dominate the running time. The difficulty with this approach is that the programmer must focus efforts on a ....

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In Proc. 11th ACM-SIAM Sympos. Discrete Algorithms, pp. 829--838, San Francisco, Jan. 2000.


Cache-Oblivious Algorithms (Extended Abstract) - Frigo, al.   (Correct)

.... The general idea that divide and conquer enhances memory locality has been known for a long time [29]. Previous theoretical work on understanding hierarchical memories and the I/O complexity of algorithms has been studied in cache-aware models lacking an automatic replacement strategy, although [10, 28] are recent exceptions. [Figure 4: Average time to transpose an N × N matrix, divided by N^2; series: iterative, recursive.] Hong and Kung [21] use the red-blue pebble game to prove lower bounds on the I/O complexity of matrix ....

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. Unpublished manuscript, 1999.
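
The context above contrasts iterative and recursive matrix transposition as an instance of divide and conquer improving locality. The following is a minimal sketch of the two access orders, assuming a square matrix stored row-major in a flat list. The function names and the `base` cutoff are hypothetical and are not taken from the cited paper, and Python timings will not reproduce the cache effects the figure reports.

```python
def transpose_iterative(a, n):
    """Transpose the n x n row-major list a with a plain double loop."""
    b = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            b[j * n + i] = a[i * n + j]
    return b


def transpose_recursive(a, n, base=4):
    """Transpose by repeatedly halving the larger dimension of a tile.

    Tiles no larger than base x base are copied directly, so both the
    reads from a and the writes to b stay within a small region.
    """
    b = [0] * (n * n)

    def rec(r0, c0, rows, cols):
        if rows <= base and cols <= base:
            for i in range(r0, r0 + rows):
                for j in range(c0, c0 + cols):
                    b[j * n + i] = a[i * n + j]
        elif rows >= cols:
            half = rows // 2
            rec(r0, c0, half, cols)
            rec(r0 + half, c0, rows - half, cols)
        else:
            half = cols // 2
            rec(r0, c0, rows, half)
            rec(r0, c0 + half, rows, cols - half)

    rec(0, 0, n, n)
    return b
```

Both functions produce the same result; only the order in which elements are visited differs, which is what matters for locality on a real memory hierarchy.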


Recursive Array Layouts and Fast Matrix Multiplication - Chatterjee, Lebeck.. (1999)   (13 citations)  Self-citation (Chatterjee)   (Correct)

No context found.

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 829--838, San Francisco, CA, Jan. 2000.


Nonlinear Array Layouts for Hierarchical Memory Systems - Chatterjee, Jain.. (1999)   (50 citations)  Self-citation (Chatterjee Lebeck)   (Correct)

No context found.

S. Sen, S. Chatterjee, and A. R. Lebeck. Towards a theory of cache-efficient algorithms. In preparation, Apr. 1999.


Towards a Theory of Cache-Efficient Algorithms - Sen, Chatterjee, Dumir (1999)   (13 citations)  Self-citation (Sen Chatterjee)   (Correct)

....of a memory hierarchy. In this paper, we address the issue of better and systematic utilization of caches starting from the algorithm design stage. Some of the results in this paper appeared in a preliminary form in the Proceedings of the Eleventh ACM-SIAM Symposium on Discrete Algorithms, 2000 [29]. This work is supported in part by DARPA Grant DABT63-98-1-0001, NSF Grants CDA-97-2637 and CDA-95-12356, The University of North Carolina at Chapel Hill, Duke University, and an equipment donation through Intel Corporation's Technology for Education 2000 Program. The views and conclusions ....

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In Proceedings of the Symposium on Discrete Algorithms, 2000.


Cache-Efficient Matrix Transposition - Chatterjee, Sen (2000)   (5 citations)  Self-citation (Sen Chatterjee)   (Correct)

....these techniques. In fact, Carter and Gatlin [10] conclude their recent paper saying "What is needed next is a study of messy details not modeled by UMH (particularly cache associativity) that are important to the performance of the remaining steps of the FFT algorithm." In a companion paper [35], we propose a two-level hierarchy to model the interaction between cache and main memory that resembles the two-level I/O model but incorporates the two salient features of caches listed above. Somewhat surprisingly, the work in that paper shows that the constraint imposed by limited ....

....otherwise. This way, we will account for the computation in cache also. In the context of the cache, we will continue to use M for cache size and B for block size. The block size B is much smaller (about 4-8 elements as opposed to 1000) and referred to as the cache line. Therefore our cache model [35] has four parameters, namely N, M, B and L, one more than the I/O model. Although we have chosen M for both the main memory size (in the context of the I/O model) and cache size (in the cache model), the reader should think of M as the size of the faster memory. A significant distinction between the ....

[Article contains additional citation context not shown here]

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. Submitted for publication, July 1999.


Nonlinear Array Layouts for Hierarchical Memory Systems - Chatterjee, Jain.. (1999)   (50 citations)  Self-citation (Chatterjee Lebeck)   (Correct)

....Our experimental implementations were based on C macros and functions, with no special compiler support. The observed performance is nonetheless quite respectable. It is our position that the ability to directly manipulate array layout has ramifications all the way up to algorithm design [50, 60], and is not something that compilers alone should manipulate. Replacing one layout by another is simple and easily mechanizable, but determining matching control-flow changes is significantly more complicated than loop tiling. Further research is needed to determine whether such changes can be ....

S. Sen, S. Chatterjee, and A. R. Lebeck. Towards a theory of cache-efficient algorithms. In preparation, Apr. 1999.


Scanning Multiple Sequences Via Cache Memory - Kurt Mehlhorn Peter (2003)   (Correct)

No context found.

S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In 11th ACM Symposium of Discrete Algorithms, pages 829--838, 2000.


Software Methods to Improve Data Locality and Cache Behavior - Beyls (2004)   (Correct)

No context found.

S. Sen, S. Chatterjee, and N. Dumir. Towards a theory of cache-efficient algorithms. Journal of the ACM, 49(6):828--858, 2002.


Predicting Memory-Access Cost Based on Data-Access Patterns - Byna, Sun, Gropp, Thakur (2004)   (Correct)

No context found.

S. Sen and S. Chatterjee, "Towards a theory of Cache efficient algorithms", SODA, 2000


An Analytical Model for Energy Minimization - Claude Tadonki And (2003)   (Correct)

No context found.

S. Sen and S. Chatterjee. Towards a Theory of Cache-Efficient Algorithms. In SODA, 2000.

