54 citations found.
J. Hong and H. Kung. I/O-complexity: The red blue pebble game. In Proceedings of ACM Symposium on Theory of Computing, 1981.


This paper is cited in the following contexts:


Cache-Oblivious B-Trees - Bender, Demaine, Farach-Colton (2000)   (25 citations)  (Correct)

....memory hierarchies [AACS87, ACS87, ACFS94, ABZ96, ABZ02, RW94, Sav95, Vit01, VS94b], though the proliferation of parameters in these models makes them cumbersome for algorithm design. A second body of work concentrates on two-level memory hierarchies, either main memory and disk [AV88, BV99, HK81, Vit01, VS94a] or cache and main memory [LL97, SC00]. With these models the programmer must anticipate which level of the memory hierarchy is the bottleneck, resulting in a program that is less flexible to different-scale problems and that does not adapt when the bottleneck changes, e.g. as a ....

Jia-Wei Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In Proceedings of the 13th Annual ACM Symposium on Theory of Computation, pages 326-333, Milwaukee, Wisconsin, May 1981.


Optimizing Graph Algorithms for Improved Cache Performance - Joon-Sang Park Michael (2002)   (Correct)

....1024 and 4096 nodes (1024 ≤ N ≤ 4096). Each data element is 8 bytes. Many processors currently on the market have in the range of 16 to 64 KB of level 1 cache and between 256 KB and 4 MB of level 2 cache. Many processors have a TLB with approximately 64 entries and a page size of 4 to 8 KB. In [11], it was shown that the lower bound on processor-memory traffic was W(N C ) for the usual implementation of matrix multiply. By examining the data dependency graphs for both matrix multiplication and the Floyd-Warshall algorithm, it can be shown that matrix multiplication reduces to the ....

J. Hong and H. Kung. I/O Complexity: The Red Blue Pebble Game. In Proc. of ACM Symposium on Theory of Computing, 1981.


Computing on Data Streams - Henzinger, Raghavan, Rajagopalan (1998)   (30 citations)  (Correct)

....In this context, they show (almost) tight upper and lower bounds for a large number of frequency moments and show how communication complexity techniques can be used to prove lower bounds on the space requirements. Our model appears at first sight to be closely related to papers on I/O complexity [HK81], hierarchical memory [AACS87], paging [ST85], and competitive analysis [KMRS88], as well as external memory algorithms [VV96]. However, our model is considerably more stringent: whereas in these papers on memory management one can bring back (into fast memory) a data item that was previously ....

J-W Hong and H.T. Kung. I/O Complexity: the red-blue pebble game. Proc. ACM STOC, 326--333, 1981.


Quantitative Performance Modeling of Scientific Computations and.. - Toledo (1995)   (2 citations)  (Correct)

....steps is Theta(k ) Theta(4n) Theta(n) By repeating this strategy, we can compute T steps with proportional efficiency, saving a factor of Theta(√M) I/Os over the naive method and using only a small constant factor more work, which results from redundant calculations. Hong and Kung [62] extend this result to save a factor of Theta(M^(1/d)/d) I/Os for d-dimensional meshes. We now describe a sufficient condition for the covering technique to work when all the iterations use the same update operator F. We associate each state variable x_v with a vertex v in a directed graph G, ....

....algorithms have a global update operator. This section proves that the covering technique cannot substantially reduce the number of I/Os, compared to the naive method, in algorithms with a global update operator. This lower bound is due to Leiserson, Rao and Toledo [74]. Hong and Kung [62] devised a formal model for studying the I/O requirements of out-of-core algorithms which use the covering techniques, called the red-blue pebble game. The model assumes that an algorithm is given as a directed acyclic graph (dag) in which nodes represent intermediate values in the computation, as ....

[Article contains additional citation context not shown here]

J.-W. Hong and H. T. Kung. I/O complexity: the red-blue pebble game. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pages 326--333, 1981.


Cache-Friendly Implementations of Transitive Closure - Penner, Prasanna (2001)   (Correct)

.... clearly shown that the amount of processor memory traffic is the bottleneck for achieving high performance in most applications [3, 17] While the topic of cache performance has been well studied, much of the focus has been on dense linear algebra problems, such as matrix multiplication and FFT [3, 10, 14, 21]. All of these problems possess very regular access patterns that are known at compile time. In this paper, we take a unique approach to this topic by focusing on the fundamental irregular problem of transitive closure. Optimizing cache performance to achieve better Supported by the US DARPA ....

....is reduced by a factor of r, where cache size is on the order of r2 compared with the baseline implementation. The maximum reduction factor in processor memory traffic to perform ordinary matrix multiplication given a limited internal memory is O( where M is the size of the internal memory [10]. Using the structure of the Floyd Warshall dependency graph, it can be shown: Theorem 3: Our USTR implementation of the Floyd Warshall algorithm is (asymptotically) optimal with respect to processor memory traffic. To illustrate this reduction in processor memory traffic we show results from ....

J. Hong and H. Kung. I/O Complexity: The Red Blue Pebble Game. In Proc. of ACM Symposium on Theory of Computing, 1981.


Lower Bounds for g-Cuts on Multi-Dimensional Rectangular Grids - Galtier (1996)   (Correct)

....for the minimum cut of grids into equal sized subsets have already been found by many researchers. An early proof for the two dimensional case is due to Hoffman, Martin and Rose [7] Since, a large number of proofs have been published for three and larger dimensional cases (See for instance [3, 8, 10, 12]) The best proof as far as we know extends to all dimensions and is described in [11] pp 223 226. The problem of cuts with another fl constant has only been approached in a coarse way. For instance, Vavasis [15] presents a lower bound to partition three dimensional graphs into p ....

....bound is asymptotically exact, but is defined as being C Delta f(n; p) where C is a constant that is not precisely calculated, and may be very large. Moreover, these techniques extend hardly to other dimensions. Some other research on the multi dimensional fl separator problem is presented in [8], but with also large C interdimensional multiply constants. More than that, no lower bound is known when the grid is rectangular. Why do we need to improve these constants? For the automatic design of object codes on specified machines, compilers need to know the exact maximum amount of data a ....

[Article contains additional citation context not shown here]

Jia-Wei Hong and H. T. Kung. I/O complexity: the red-blue pebble game. In 13th Annual ACM Symposium on Theory of Computing, 1981.


External Memory Algorithms with Dynamically Changing Memory.. - Barve, Vitter (1998)   (2 citations)  (Correct)

....dynamically optimal memory-adaptive algorithms. In Section 3, we present asymptotically tight resource consumption bounds for key problems such as permuting, sorting, FFT, permutation networks and matrix multiplication. Our lower bounds provide a reinterpretation of the lower bounds of [AV88] and [HK81] in a dynamic memory allocation context. In order to prove algorithms for the above problems to be dynamically optimal, we define natural, application-specific measures for the resource consumption at each I/O step. The measures determine how efficiently an algorithm adapts to memory fluctuations. ....

....tree operations, and Theta(n for (standard) matrix multiplication and LU decomposition. Below, we prove the lower bounds implicit in Theorem 1 for permuting, sorting, FFT, permutation networks and matrix multiplication by reinterpreting the original I/O lower bounds proved in [AV88] and [HK81, SV87] in a dynamic memory context. The lower bound for buffer tree operations can be proved by adapting the arguments of [AKL93] relating comparison tree lower bounds to I/O lower bounds to the dynamic memory model. In Section 8 and Section 13, we present dynamically optimal algorithms for ....

[Article contains additional citation context not shown here]

J. W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. Proc. 13th Annual ACM Symp. on Theory of Computation, pages 326--333, May 1981.


Communication Lower Bounds for Distributed-Memory Matrix.. - Irony, Toledo   (Correct)

....in this paper. The lemma shows that a processor that accesses at most N elements of A, at most N elements of B, and contributes to the computation of at most N elements of the product C = AB can perform at most O(N√N) useful arithmetic operations. Hong and Kung [12] proved a weaker form of this lemma. Their lemma only considers access to elements of A and B, not to contributions to elements of C, so it is too weak to be used in the proofs of our distributed memory lower bounds. Also, Hong and Kung stated the lemma using asymptotic notation, whereas we state ....

....data in the cache. The results themselves are not new, but they show how the proof technique that we use for the parallel communication bounds can be applied to the analysis of cache misses. Specifically, the bounds that we prove here are asymptotically the same as those proved by Hong and Kung [12] and again by Toledo [24]. Our bounds, however, specify constants, unlike Hong and Kung's result which is stated using asymptotic notation. The constants here are slightly stronger than those given in [24] but the proof technique is similar. In effect, we are using Toledo's proof technique but the ....

[Article contains additional citation context not shown here]

J.-W. Hong and H. T. Kung. I/O complexity: the red-blue pebble game. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pages 326--333, 1981.


Efficient Out-of-Core Algorithms for Linear Relaxation Using.. - Exte Nd Ed   (Correct)

....the entire state vector for one time step before proceeding to the next. This strategy causes Theta(TV) I/O transfers (I/Os) to be made for a T-step computation, if |V| ≥ 2M. For some graphs, however, there are more clever strategies that use many fewer I/Os. For instance, Hong and Kung [5] allude to a method for a T-step linear relaxation algorithm on a √n by √n mesh that uses only Theta(Tn/√M) I/Os, where the primary memory has size M. The idea is illustrated in Figure 1. We load into primary memory the initial state of a k by k submesh S. With this information in ....

....cycles of a √n by √n multigrid algorithm, where n ≥ 2M, the number of I/Os required for T cycles is Theta(Tn) as well. Can the number of I/Os be reduced for this multigrid computation? We shall show in Section 4 that in the red-blue pebble game model for I/O proposed by Hong and Kung [5], the answer is no, even if redundant computations are allowed. The naive algorithm is optimal. The problem is essentially that information propagates quickly in the multigrid because of its small diameter. Consequently, it is impossible to tile the graph with overlapping subgraphs so that each ....

[Article contains additional citation context not shown here]

J.-W. Hong and H.T. Kung. I/O complexity: the red-blue pebble game. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pages 326--333, 1981.


A Survey of Out-of-Core Algorithms in Numerical Linear Algebra - Toledo (1999)   (9 citations)  (Correct)

....√M) less I/O than the naive schedule. Can we do better yet? We cannot. The following theorem states that when main memory cannot store more than about one sixth of one input matrix, the blocked schedule described above is asymptotically optimal. This result was originally proved by Hong and Kung [33]; we give here a novel proof which is both simple and intuitive. Note that the theorem does not bound the number of transfers required to multiply two matrices, only the number of transfers that is required if we multiply the matrices using the conventional algorithm. Other matrix multiplication ....

....for the partitioned algorithms and showed that the partitioned algorithms generate asymptotically fewer page faults (I/O transfers) than conventional algorithms. Fischer and Probert [25] performed a similar analysis for Strassen's matrix multiplication algorithm. Hong and Kung proved in 1981 [33] that the partitioned schedule for matrix multiplication is asymptotically optimal in a model of I/O that allows redundant computation and does not assume that I/O is performed in blocks. They also proved lower bounds for other problems that are discussed in this survey, such as FFTs. Their bounds ....

[Article contains additional citation context not shown here]

J.-W. Hong and H. T. Kung. I/O complexity: the red-blue pebble game. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pages 326--333, 1981.
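
The contrast between the naive and the blocked (partitioned) schedule in the excerpts above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis from the cited papers: the names `count_misses`, `naive_trace`, and `blocked_trace` are hypothetical, fast memory is modelled as a fully associative cache of `capacity` words with LRU replacement (the pebble-game model allows arbitrary replacement), and each miss transfers a single word.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses of an LRU cache holding `capacity` words (an assumed policy)."""
    cache = OrderedDict()
    misses = 0
    for word in trace:
        if word in cache:
            cache.move_to_end(word)        # mark as most recently used
        else:
            misses += 1                    # one word transferred from slow memory
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used word
            cache[word] = True
    return misses


def naive_trace(n):
    """Word accesses of the conventional i-j-k triple loop for C = A * B."""
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield ("A", i, k)
                yield ("B", k, j)
                yield ("C", i, j)


def blocked_trace(n, b):
    """Same arithmetic, reordered so that b-by-b tiles are reused while resident."""
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            for kk in range(0, n, b):
                for i in range(ii, min(ii + b, n)):
                    for j in range(jj, min(jj + b, n)):
                        for k in range(kk, min(kk + b, n)):
                            yield ("A", i, k)
                            yield ("B", k, j)
                            yield ("C", i, j)


if __name__ == "__main__":
    n, capacity, b = 16, 48, 4  # three 4-by-4 tiles fit in 48 words
    print("naive  :", count_misses(naive_trace(n), capacity))
    print("blocked:", count_misses(blocked_trace(n, b), capacity))
```

With these illustrative parameters the blocked schedule incurs fewer misses than the naive one; the exact counts depend on the LRU assumption and on the tile size chosen.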


Towards a Theory of Cache-Efficient Algorithms - Sen, Chatterjee, Dumir (1999)   (13 citations)  (Correct)

....for theoretical analysis or simplistic to the point of lack of predictiveness. Memory hierarchy models used by computer architects to design caches have numerous parameters and suffer from the first shortcoming [1, 26]. Early algorithmic work in this area focussed on a two-layered memory model [21]: a very large capacity memory with slow access time (secondary memory) and a limited size faster memory (internal memory). All computation is performed on elements in the internal memory and there is no restriction on placement of elements in the internal memory (fully associative). The focus of ....

....Because of the tremendous difference in speeds, it ignores the cost of internal processing and counts only the number of I/Os. Floyd [15] originally defined a formal model and proved tight bounds on the number of I/Os required to transpose a matrix using two pages of internal memory. Hong and Kung [21] extended this model and studied the I/O complexity of FFT when the internal memory size is bounded by M. Aggarwal and Vitter [4] further refined the model by incorporating an additional parameter B, the number of (contiguous) elements transferred in a single I/O operation. They gave upper and ....

J. Hong and H. Kung. I/O complexity: The red blue pebble game. In Proceedings of ACM Symposium on Theory of Computing, 1981.


Cache-Oblivious B-Trees - Bender, Demaine, Farach-Colton (2000)   (25 citations)  (Correct)

....the relative speeds and block sizes at each memory level. While this leads to accurate time predictions, it makes it difficult to design and analyze optimal algorithms in these models. A second body of work concentrates on two level memory hierarchies, either in the context of memory and disk [4, 9, 18, 37, 38], or cache and memory [30, 20] In such a model there are only a few parameters, making it relatively easy to design efficient algorithms. The motivation is that it is common for one level of the memory hierarchy to dominate the running time. The difficulty with this approach is that the ....

J.-W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In Proc. 13th ACM Sympos. Theory of Computation, pp. 326-- 333, Milwaukee, Wisconsin, May 1981.


Optimum Binary Search Trees On The Hierarchical Memory Model - Thite (2001)   (2 citations)  (Correct)

....The I/O complexity of an algorithm is the cost of inputs and outputs between faster internal memory and slower secondary memory. Aggarwal and Vitter [AV88] proved tight upper and lower bounds for the I/O complexity of sorting, computing the FFT, permuting, and matrix transposition. Hong and Kung [HK81] introduced an abstract model of pebbling a computation graph to analyze the I/O complexity of algorithms. The vertices of the graph that hold pebbles represent data items that are loaded into main memory. With a limited number of pebbles available, the number of moves needed to transfer all the ....

J. Hong and H. Kung. I/O-complexity: The red blue pebble game. In Proceedings of ACM Symposium on Theory of Computing, 1981.
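
The pebbling model described in the excerpt above can be sketched as a small move checker. This is an illustrative sketch, not code from any of the cited papers: the function `play`, the move names, and the example graph are hypothetical, and the rules are assumed to be the commonly stated ones (a red pebble marks a value in fast memory, a blue pebble a value in slow memory; loading and storing each cost one I/O; a vertex may be computed only when all its immediate predecessors hold red pebbles; at most `red_limit` red pebbles are on the graph at once). Only removal of red pebbles is modelled.

```python
def play(preds, inputs, red_limit, moves):
    """Replay `moves` on a DAG and return the number of I/O moves.

    preds:     dict mapping each non-input vertex to its immediate predecessors
    inputs:    vertices that start with a blue pebble (data in slow memory)
    red_limit: maximum number of red pebbles (assumed fast-memory capacity)
    moves:     list of (op, vertex), op in {"load", "store", "compute", "discard"}
    """
    red, blue, io = set(), set(inputs), 0

    def place_red(v):
        if v not in red and len(red) >= red_limit:
            raise ValueError(f"no free red pebble for {v!r}")
        red.add(v)

    for op, v in moves:
        if op == "load":            # blue -> red, costs one I/O
            if v not in blue:
                raise ValueError(f"{v!r} is not in slow memory")
            place_red(v)
            io += 1
        elif op == "store":         # red -> blue, costs one I/O
            if v not in red:
                raise ValueError(f"{v!r} is not in fast memory")
            blue.add(v)
            io += 1
        elif op == "compute":       # free, but needs all predecessors red
            if v not in preds or not all(p in red for p in preds[v]):
                raise ValueError(f"cannot compute {v!r}")
            place_red(v)
        elif op == "discard":       # remove a red pebble, free
            red.discard(v)
        else:
            raise ValueError(f"unknown move {op!r}")
    return io


# Hypothetical example: t = (a + b) + (c + d).
PREDS = {"s1": ["a", "b"], "s2": ["c", "d"], "t": ["s1", "s2"]}
INPUTS = ["a", "b", "c", "d"]

# With four red pebbles, s1 can stay in fast memory while s2 is computed.
MOVES_4 = [("load", "a"), ("load", "b"), ("compute", "s1"),
           ("discard", "a"), ("discard", "b"),
           ("load", "c"), ("load", "d"), ("compute", "s2"),
           ("discard", "c"), ("discard", "d"),
           ("compute", "t"), ("store", "t")]

# With three red pebbles, s1 has to be stored and reloaded.
MOVES_3 = [("load", "a"), ("load", "b"), ("compute", "s1"),
           ("discard", "a"), ("discard", "b"),
           ("store", "s1"), ("discard", "s1"),
           ("load", "c"), ("load", "d"), ("compute", "s2"),
           ("discard", "c"), ("discard", "d"),
           ("load", "s1"), ("compute", "t"), ("store", "t")]

if __name__ == "__main__":
    print(play(PREDS, INPUTS, 4, MOVES_4))  # I/O moves with 4 red pebbles
    print(play(PREDS, INPUTS, 3, MOVES_3))  # I/O moves with 3 red pebbles
```

In this toy example the schedule with fewer red pebbles needs two extra I/O moves, which is the kind of trade-off between fast-memory size and I/O that the model is meant to capture.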


Efficient External-Memory Data Structures and Applications - Arge (1996)   (32 citations)  (Correct)

....chapter we first consider basic paradigms for designing I/O-efficient algorithms and then address the question of lower bounds in the I/O model. Early work on I/O complexity concentrated on sorting and sorting-related problems. Initial theoretical work was done by Floyd [59] and by Hong and Kung [72], who studied matrix transposition and fast Fourier transformation in restricted I/O models. The general I/O model was introduced by Aggarwal and Vitter [5] and the notion of parallel disks was introduced by Vitter and Shriver [133]. The latter papers also deal with fundamental problems such as ....

....in Section 2.3.2 consider a specific graph and prove a lower bound for playing the game on this graph. This result is then used to prove an I/O lower bound on the reduce operation. We prove the lower bound for a number of specific blockings. 2.3.1 The (M, B) Blocked Red-Blue Pebble Game. In [72] Hong and Kung defined a red-blue pebble game played on directed acyclic graphs in order to define I/O complexity. In their game there was no notion of blocks. Here we define a game which is also played on directed graphs with red and blue pebbles, but otherwise is rather different from the Hong ....

J. W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In Proc. ACM Symp. on Theory of Computation, pages 326--333, 1981.


Parallel Pointer-Based Join Algorithms in Memory Mapped .. - Buhr, Goel, Nishimura, .. (1996)   (1 citation)  (Correct)

....both in sequential and parallel settings. The notions of block transfer and hierarchy are developed further in a parallel model in which memory consists of a tree of modules, where computation takes place at the leaves [6]. I/O complexity models start with a single disk and CPU with block transfer [18, 4] and continue through parallel disks with flat memory and hierarchical memory [35, 36]. Our analytical model draws on ideas from several of these papers, though our intent is not to characterize the complexity of problems, but rather to predict performance on many real architectures. 2.3 Related ....

Hong, J.-W. and Kung, H. T. I/O Complexity: The Red-Blue Pebble Game. In ACM STOC, pp. 326--333, 1981.


Communication Efficient Multi-processor FFT - Johnsson, Jacquemin, Krawitz (1991)   (8 citations)  (Correct)

....FFT. For details of the derivations see for instance [16, 17, 18] As the radix of the FFT increases the number of arithmetic operations decreases somewhat. However, the main advantage from an increased radix in architectures with a limited memory bandwidth is a reduced need for memory accesses [5, 6]. The number of real operations (leading terms only) and memory accesses for radix 2, 4, and 8 kernels are given in Table 1. The number of arithmetic operations for the radix 8 algorithm is approximately 20 less than that of the radix 2 algorithm. The exact number of multiplications and additions ....

....[Diagram residue omitted; the accompanying table lists indices with their binary representations: x(15) 1111, x(14) 1110, x(13) 1101, x(12) 1100, x(11) 1011, x(10) 1010, x(9) ....]

[Article contains additional citation context not shown here]

J.W. Hong and H.T. Kung. I/O complexity: The red-blue pebble game. In Proc. of the 13th ACM Symposium on the Theory of Computation, pages 326--333. ACM, 1981.


Simple Randomized Mergesort on Parallel Disks - Rakesh Barve Dept (1996)   (41 citations)  (Correct)

....I/O as a fundamental, frequently used operation during sorting. One approach to alleviate the effects of the I/O bottleneck is to use parallel disk systems [HGK 94, PGK88, Uni89, GS84, Mag87]. Aggarwal and Vitter [AV88], generalizing initial work done by Floyd [Flo72] and Hong and Kung [HK81], laid the foundation for I/O algorithms by studying the I/O complexity of sorting and related problems. The model they studied [AV88] considers an internal memory of size M and I/O reads or writes that each result in a transfer of D blocks, where each block is comprised of B contiguous records, ....

J. W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. Proc. 13th Annual ACM Symp. on Theory of Computation, pages 326--333, May 1981.


Markov Analysis of Multiple-Disk Prefetching Strategies.. - Pai, Schäffer, Varman (1994)   (11 citations)  (Correct)

....subsystems [9, 16, 18]. Performance evaluation of different multiple-disk systems, and associated management strategies, have been studied in [19, 12, 17, 5, 6, 10], for example. A number of analytic studies of I/O performance for specific computational problems have been undertaken previously in [11, 20, 8, 1, 21, 2, 23, 22, 14, 4]. This paper suggests an analytic framework for the study of I/O parallelism by undertaking a specific case study of prefetching as a means for improving I/O performance in a multiple-disk environment. In particular, we study the tradeoff between the average disk parallelism and the cache size ....

J.-W. Hong and H. T. Kung. I/O Complexity: The Red-Blue Pebble Game. In Proc. Thirteenth ACM Symp. on Theory of Computing, pages 326--333, 1981.


Cache-Efficient Matrix Transposition - Chatterjee, Sen (2000)   (5 citations)  (Correct)

....processing. Because of the tremendous difference in speeds, it ignores the cost of internal processing and counts only the number of I/Os. Floyd [15] defined a formal model and proved tight bounds on the number of I/Os required to transpose a matrix using two internal memory pages. Hong and Kung [24] extended this model and studied the I/O complexity of FFT when the internal memory size is bounded by M. Aggarwal and Vitter [3] further refined the model by incorporating an additional parameter B, the number of (contiguous) elements transferred in a single I/O operation. They gave upper and ....

J. Hong and H. Kung. I/O complexity: The red blue pebble game. In Proceedings of ACM Symposium on Theory of Computing, 1981.


Optimum Binary Search Trees On The Hierarchical Memory Model - Shripad Thite University (2001)   (2 citations)  (Correct)

No context found.

J. Hong and H. Kung. I/O-complexity: The red blue pebble game. In Proceedings of ACM Symposium on Theory of Computing, 1981.


The Cost of Cache-Oblivious Searching - Bender, Brodal, Fagerberg, Ge.. (2003)   (Correct)

No context found.

J.-W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In Proc. of the 13th Ann. ACM Symp. on Theory of Computation (STOC), pages 326-333, 1981.


Communication Lower Bounds for Distributed-Memory Matrix.. - Irony, Toledo (2004)   (Correct)

No context found.

J.-W. Hong and H. T. Kung. I/O complexity: the red-blue pebble game. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pages 326-333, 1981.


Towards a Theory of Cache-Efficient Algorithms - Sen, Chatterjee (1999)   (13 citations)  (Correct)

No context found.

J. Hong and H. Kung. I/O complexity: The red blue pebble game. In Proceedings of ACM Symposium on Theory of Computing, 1981.


Cache-Oblivious B-Trees - Bender, Demaine, Farach-Colton (2000)   (25 citations)  (Correct)

No context found.

J.-W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In Proc. 13th ACM Sympos. Theory of Computation, pp. 326-- 333, Milwaukee, Wisconsin, May 1981.


The Parallel Hierarchical Memory Model - Ben Juurlink And (1994)   (7 citations)  (Correct)

No context found.

J.W. Hong and H.T. Kung. I/O Complexity: The Red-Blue Pebble Game. In Proc. of the 13-th Annual ACM Symp. on Theory of Computing, pages 326--333, May 1981.


CiteSeer.IST - Copyright Penn State and NEC