Michael Cox, Narendra Bhandari, and Michael Shantz, Multi-level texture caching for 3D graphics hardware, Proceedings of the 25th Symposium on Computer Architecture, 1998.
....systems have a limited amount of local texture memory, applications issue primitives in an order which exploits texture locality. The parallel API can reduce this locality since the rasterizers can interleave the rendering of several command streams. In architectures which use implicit caching [4, 10], the effectiveness of the cache can possibly be reduced. In architectures which utilize local texture memory as an explicit cache, texture management is complicated. In Argus, shared texture download is facilitated by shared memory, and locality of texture access is provided by the caching ....
M. Cox, N. Bhandari, and M. Shantz. Multi-Level Texture Caching for 3D Graphics Hardware. Proceedings of the 25th Symposium on Computer Architecture, 1998.
.... Hakura and Gupta examine different organizations for on-chip cache architectures which are useful for exploiting locality of reference in texture filtering, texture magnification, and to a limited extent, repeated textures [5]. Cox, Bhandari, and Shantz extend this work to multi-level caching [3]. They demonstrate that on-chip caches in conjunction with large off-chip caches can be used to exploit all of the aforementioned forms of texture locality as well as inter-frame texture locality. Thus, memory bandwidth requirements can be dramatically reduced for scenes in which the working set ....
....An obvious solution to this problem is caching. Many issues are resolved by integrating a small amount of high-speed, on-chip memory organized to match the access patterns of the texture system. According to our measurements (detailed in Section 5.1) as well as data found in other literature [3, 5], it is quite reasonable to expect miss rates on the order of 1.5% per access. Many texture systems are capable of providing the computation for a trilinearly mip-mapped fragment on every clock cycle. Thus, because there are eight texture accesses per cycle, the per-fragment texel miss rate is 12%. ....
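The arithmetic in the excerpt above can be checked with a short sketch. It assumes the quoted figures are percentages (the percent signs appear to have been lost in extraction) and that the per-fragment figure is simply the per-access miss rate multiplied by the eight texel fetches of a trilinear lookup. The function name and its parameters are illustrative, not taken from the cited paper.

```python
def expected_misses_per_fragment(miss_rate_per_access: float,
                                 accesses_per_fragment: int = 8) -> float:
    """Expected number of cache misses for one fragment.

    Assumes each texel access misses independently at the same rate,
    so the expectation is the simple product of the two inputs.
    """
    return miss_rate_per_access * accesses_per_fragment


# Trilinear mip-mapping reads 8 texels per fragment; 1.5% is the assumed
# reading of the per-access miss rate quoted in the excerpt.
per_fragment = expected_misses_per_fragment(0.015, accesses_per_fragment=8)
print(f"{per_fragment:.2%} of a miss per fragment")
```

Under these assumptions the product is 0.12, which matches the 12 quoted in the excerpt when read as a percentage.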
[Article contains additional citation context not shown here]
M. Cox, N. Bhandari, and M. Shantz. Multi-Level Texture Caching for 3D Graphics Hardware. Proceedings of the 25th International Symposium on Computer Architecture, 1998.
....a cache works very well in handling texture locality. They have suggested that the right parameters are a 16 KB size, an associativity of 4 and texture blocking with a block width of 4. In our evaluations we use these parameters. Cox, Bhandari and Shantz have evaluated a second level of caching [2] and have shown that it could catch inter-frame locality. Igehy, Eldridge and Proudfoot show in [8] that prefetching with a pixel buffer reaches the performance of a zero-latency system if the external bus provides enough bandwidth. The industrial trends and the studies presented before suggest ....
Michael Cox, Narendra Bhandari, and Michael Shantz. Multi-level texture caching for 3D graphics hardware. In Proceedings of the 25th International Symposium on Computer Architecture, 1998.
....Doing so, a manufacturer could build such a machine with more processors and take advantage of the continuous performance increase of such chips. 3D accelerators often use texture caches to store the texels (pixel on the texture) used most recently. As texture mapping has a good locality [6, 3, 8], such cache enables manufacturers to reduce the external bus size. However, if such components are used in a multiprocessor configuration, texture locality is reduced because data are used in different processors [13, 7, 14]. This phenomenon highly depends on the distribution scheme used. When ....
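As a rough illustration of the parameters quoted above (16 KB, 4-way associative, block width 4), the following is a minimal sketch of a blocked, set-associative texture cache model. Several details are assumptions not stated in the excerpt: 4 bytes per texel, one cache line per 4x4 texel block, LRU replacement, and a set index taken as the block number modulo the number of sets. The class and method names are hypothetical.

```python
from collections import OrderedDict


class TextureCache:
    """Toy model of a set-associative cache holding square texel blocks."""

    def __init__(self, size_bytes=16 * 1024, ways=4, block_width=4,
                 bytes_per_texel=4):
        self.block_width = block_width
        self.ways = ways
        # One line stores one block_width x block_width tile of texels.
        line_bytes = block_width * block_width * bytes_per_texel
        self.num_sets = size_bytes // (line_bytes * ways)
        # Each set is an OrderedDict used as an LRU list of block ids.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, u, v, tex_width):
        """Look up texel (u, v); return True on a hit, False on a miss."""
        bx, by = u // self.block_width, v // self.block_width
        blocks_per_row = (tex_width + self.block_width - 1) // self.block_width
        block_id = by * blocks_per_row + bx
        lines = self.sets[block_id % self.num_sets]
        if block_id in lines:
            lines.move_to_end(block_id)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)        # evict least recently used
        lines[block_id] = True
        self.misses += 1
        return False


cache = TextureCache()
cache.access(0, 0, tex_width=256)   # cold miss loads the 4x4 block
cache.access(3, 3, tex_width=256)   # same block, so this is a hit
cache.access(4, 0, tex_width=256)   # neighbouring block, another miss
```

With the assumed 64-byte lines, the 16 KB cache holds 256 lines arranged as 64 sets of 4 ways. Blocking means that neighbouring texels in both u and v share a line, which is the locality the cited work relies on.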
....PC graphics cards, a PC 3D accelerator manufacturer could not afford a cache size or a bus bandwidth increase if they were needed only by the multiprocessor configurations. On the other hand, the bus between the L2 cache (the graphic card memory) and the main memory has been shown by Cox et al. in [3] not to be a bottleneck and increasing its size in a parallel machine is feasible. Two main factors impact on the performance. As each processor is responsible for a statically interleaved fraction of the tiles, the smaller is the width, the better is the load balancing. On the other hand, the ....
[Article contains additional citation context not shown here]
M. Cox, N. Bhandari, and M. Shantz. Multi-level texture caching for 3D graphics hardware. In Proceedings of the 25th International Symposium on Computer Architecture, 1998.
....is important. Filtering is often used in texture mapping and requires eight different texels to draw one pixel on the screen. To reduce the traffic between the processor and the texture memory, recent PC 3D accelerators take advantage of a texture cache. As texture mapping has a good locality [11, 5, 13], the external memory bus size needed is reduced. However, if such components are used in a multiprocessor configuration, texture locality is reduced because data are used in different processors [22, 12, 23]. This phenomenon highly depends on the distribution scheme used. When building a high ....
....PC graphics cards. A PC 3D accelerator manufacturer could not afford a cache size or a bus bandwidth increase if they were needed only by the multiprocessor configurations. On the other hand, the bus between the L2 cache (the graphic card memory) and the main memory has been shown by Cox et al. in [5] not to be a bottleneck and increasing its size in a parallel machine is feasible. Two main factors impact on the performance. As each processor is responsible for a statically interleaved fraction of the tiles, the smaller is the width, the better is the load balancing. On the other hand, the ....
[Article contains additional citation context not shown here]
M. Cox, N. Bhandari, and M. Shantz. Multi-level texture caching for 3D graphics hardware. In Proceedings of the 25th International Symposium on Computer Architecture, 1998.
....the detail measure d is colored using a linearly varying color map in the figure (b). 3.4 Texture Caching. Although the zerobit encoding scheme offers fast reconstruction of texel values, texture caching can improve the rendering performance by exploiting the locality property of texel reference [8, 4]. In our scheme, when a texel value is necessary, all texels in the 4×4×4 texture cell containing it are simultaneously reconstructed for efficiency. Rather than instantly throwing away used decompressed cells, storing them in a cache for later use can possibly save decoding ....
M. Cox, N. Bhandari, and M. Shantz. Multi-level texture caching for 3D graphics hardware. In Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 86--97, June 1998.
....for the available memory bandwidth to providing adequate bandwidth for abundant computation capability. Thus, it has become increasingly critical to be thrifty with memory bandwidth. Texture caching is one effective technique that minimizes bandwidth demands by leveraging locality of reference [3, 5, 6]. Unfortunately, however, parallel rasterization algorithms diminish locality because work is divided into smaller tasks. Consequently, the efficiency of a parallel graphics architecture, as measured against a serial rasterizer, is in part determined by how well texture caching extends to parallel ....
....based on the number of cache sets to reduce conflict misses, thus leading to 6D texture tiling. This is explained in Section 4.3.1. Another important conclusion was that rasterization should also be done in a 4D tiled order rather than in a 2D scan-line order to maximize locality. Cox et al. [3] examined the use of a large secondary cache as a mechanism to take advantage of frame-to-frame coherence in texture data, finding that the inter-frame working set of texture data is on the order of several megabytes. Vartanian et al. [12] have evaluated the performance of texture caching with ....
M. Cox, N. Bhandari, and M. Shantz. Multi-Level Texture Caching for 3D Graphics Hardware. Proceedings of the 25th International Symposium on Computer Architecture, 1998.
....As texture mapping becomes pervasive in 3D applications, particularly games, a critical design issue is efficiently transferring bandwidth-intensive texture images between host memory and graphics card. There are two models used in existing graphics architectures: CPU push vs. card pull [6]. Traditionally, the CPU pushes the texture images required by the rasterizer to the graphics card. Since the CPU has no knowledge of the rasterization process, the unit of transfer in the CPU push model is the entire texture image, even when only a small portion of it is actually needed. The ....
....models. Each texel takes four bytes in QuakeII and three bytes in Graz and Library. Using texel blocks as the minimum texture transfer unit significantly cuts down the texture traffic bandwidth, as well as texture memory size requirement, and yet is amenable to direct hardware implementations [6]. The other way to improve the texture memory access efficiency is to cache texture images for subsequent reuse. Existing 3D applications, and architects have paid close attention to intra-frame texture locality [14]. In contrast, inter-frame texture locality is less explored in the literature, ....
M. Cox, N. Bhandari, and M. Shantz. Multi-Level Texture Caching for 3D Graphics Hardware. In Proceedings of ACM/IEEE International Symposium on Computer Architecture (ISCA), 1998.
....most of the earlier research on 3D graphics hardware tends to be less quantitative than qualitative. In particular, graphics hardware papers that discuss architectural tradeoffs based on concrete measurements from real-world applications only start to appear in the last two years [HG97, CB97, CBS98, C 98]. As a result, it has been impossible to empirically compare various graphics architectural ideas on an unbiased basis, and thus advance the field by drawing on lessons distilled from the comparisons. While it is difficult to solve the first problem, we believe the graphics hardware ....
....texture mapping becomes pervasive in 3D applications, particularly PC games, efficiently moving bandwidth-intensive texture images between host memory and graphics card increasingly becomes a critical design issue. There are two models used in existing graphics architectures: CPU push vs. card pull [CBS98]. Traditionally, the CPU pushes the texture images that are going to be needed by the rasterizer to the graphics card. Since the CPU has no knowledge of the rasterization process, the unit of transfer in the CPU push model is the entire texture image, even when only a small portion of it is ....
[Article contains additional citation context not shown here]
M. Cox, N. Bhandari, and M. Shantz. Multi-Level Texture Caching for 3D Graphics Hardware. In Proceedings of ACM/IEEE International Symposium on Computer Architecture (ISCA), 1998.
No context found.
Michael Cox, Narendra Bhandari, and Michael Shantz, Multi-level texture caching for 3D graphics hardware, Proceedings of the 25th Symposium on Computer Architecture, 1998.
No context found.
Michael Cox, Narendra Bhandari, and Michael Shantz. Multi-Level Texture Caching for 3D Graphics Hardware. In Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 86--97. IEEE Press, 1998.