Demystifying GPU Microarchitecture through Microbenchmarking (2010)

by Henry Wong, Misel-Myrto Papadopoulou, Maryam Sadooghi-alv, Andreas Moshovos
Venue: In ISPASS
Citations: 6 - 0 self

Active Bibliography

Compact Data Structure and Scalable Algorithms for the Sparse Grid Technique – Josef Weidendorfer, Gerrit Buse, Daniel Butnaru, Dirk Pflüger, Technische Universität München
CrystalGPU: Transparent and Efficient Utilization of GPU Power – Abdullah Gharaibeh, Samer Al-Kiswany, Matei Ripeanu
Author manuscript, published in "Computers and mathematics with applications (2010)" DOI: 10.1016/j.camwa.2010.01.054 – unknown authors - 2011
1 Data Layout Transformation for Structured-Grid Codes on GPU – I-Jui Sung, Wen-mei Hwu
7 Inter-Block GPU Communication via Fast Barrier Synchronization – Shucai Xiao, Wu-chun Feng
2 Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications – I-Jui Sung, John A. Stratton, Wen-mei W. Hwu
1 Streamlining GPU Applications On the Fly: Thread Divergence Elimination through Runtime Thread-Data Remapping – Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen
Architecture-Aware Mapping and Optimization on a 1600-Core GPU – Mayank Daga, Thomas Scogl, Wu-chun Feng
3 Correctly treating synchronizations in compiling fine-grained SPMD-threaded programs for CPU – Ziyu Guo, Eddy Z. Zhang, Xipeng Shen - 2011
Jit4OpenCL: A Compiler from Python to OpenCL – Xunhao Li, José Nelson Amaral, Duane Szafron, University of Alberta
19 Efficient sparse matrix-vector multiplication on CUDA – Nathan Bell, Michael Garland - 2008
29 Implementing sparse matrix-vector multiplication on throughput-oriented processors – Nathan Bell, Michael Garland - 2009
Kernels for Multi-Core CPUs – John A. Stratton, Sam S. Stone, Wen-mei W. Hwu
Efficient Mapping of Multiresolution Image Filtering Algorithms on Graphics Processors – Richard Membarth, Frank Hannig, Hritam Dutta, Jürgen Teich
Acceleration of Multiresolution Imaging Algorithms: A Comparative Study – Richard Membarth, Hritam Dutta, Frank Hannig, Jürgen Teich
University of Illinois at Urbana-Champaign Center for Reliable and High-Performance Computing – John A. Stratton, Sam S. Stone, Wen-mei W. Hwu - 2008
Parallel Hyperbolic PDE Simulation on Clusters: Cell versus GPU – Scott Rostrup, Hans De Sterck
Distributed Stream Processing with DUP – Kai Christian Bader, Tilo Eißler, Nathan Evans, Chris GauthierDickey, Christian Grothoff, Krista Grothoff, Jeff Keene, Harald Meier, Craig Ritzdorf, Matthew J. Rutherford, Technische Universität München
1 Gaussian Elimination Based Algorithms on the GPU – Aydın Buluç, John R. Gilbert, Ceren Budak - 2008