The Yin and Yang of Processing Data Warehousing Queries on GPU Devices
Cited by 8 (2 self)
Database community has made significant research efforts to optimize query processing on GPUs in the past few years. However, we can hardly find that GPUs have been truly adopted in major warehousing production systems. Preparing to merge GPUs to the warehousing systems, we have identified and addressed several critical issues in a three-dimensional study of warehousing queries on GPUs by varying query characteristics, software techniques, and GPU hardware configurations. We also propose an analytical model to understand and predict the query performance on GPUs. Based on our study, we present our performance insights for warehousing query execution on GPUs. The objective of our work is to provide a comprehensive guidance for GPU architects, software system designers, and database practitioners to narrow the speed gap between the GPU kernel execution (the fast mode) and data transfer to prepare GPU execution (the slow mode) for high performance in processing data warehousing queries. The GPU query engine developed in this work is open source to the public.
Dandelion: a compiler and runtime for heterogeneous systems
- in Proc. of the Twenty-Fourth ACM Symp. on Operating Systems Principles. ACM
Cited by 8 (0 self)
Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes, programming them remains extremely challenging. Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span diverse execution contexts including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general purpose programming languages such as C# and F#. It therefore provides an expressive data model and native language integration for user-defined functions, enabling programmers to write applications using standard high-level languages and development tools. Dandelion automatically and transparently distributes data-parallel portions of a program to available computing resources, including compute clusters for distributed execution and CPU and GPU cores of individual nodes for parallel execution. To enable automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses the PTask runtime [85] to manage GPU execution. This paper discusses the design and implementation of Dandelion, focusing on the distributed CPU and GPU implementation. We evaluate the system using a diverse set of workloads.
Stealing webpages rendered on your browser by exploiting gpu vulnerabilities
- In: Proc. of the IEEE Symposium on Security and Privacy; 2014
, 2015
Cited by 3 (0 self)
Graphics processing units (GPUs) are important components of modern computing devices for not only graphics rendering, but also efficient parallel computations. However, their security problems are ignored despite their importance and popularity. In this paper, we first perform an in-depth security analysis on GPUs to detect security vulnerabilities. We observe that contemporary, widely-used GPUs, both NVIDIA's and AMD's, do not initialize newly allocated GPU memory pages which may contain sensitive user data. By exploiting such vulnerabilities, we propose attack methods for revealing a victim program's data kept in GPU memory both during its execution and right after its termination. We further show the high applicability of the proposed attacks by applying them to the Chromium and Firefox web browsers which use GPUs for accelerating webpage rendering. We detect that both browsers leave rendered webpage textures in GPU memory, so that we can infer which webpages a victim user has visited by analyzing the remaining textures. The accuracy of our advanced inference attack that uses both pixel sequence matching and RGB histogram matching is up to 95.4%.
Red fox: an execution environment for relational query processing on GPUs
- In CGO
, 2014
Cited by 3 (0 self)
Modern enterprise applications represent an emergent application arena that requires the processing of queries and computations over massive amounts of data. Large-scale, multi-GPU cluster systems potentially present a vehicle for major improvements in throughput and consequently overall performance. However, throughput improvement using GPUs is challenged by the distinctive memory and computational characteristics of Relational Algebra (RA) operators that are central to queries for answering business questions. This paper introduces the design, implementation, and evaluation of Red Fox, a compiler and runtime infrastructure for executing relational queries on GPUs. Red Fox is comprised of i) a language front-end for LogiQL which is a commercial query language, ii) an RA to GPU compiler, iii) optimized GPU implementation of RA operators, and iv) a supporting runtime. We report the performance on the full set of industry standard TPC-H queries on a single node GPU. Compared with a commercial LogiQL system implementation optimized for a state of art CPU machine, Red Fox on average is 6.48x faster including PCIe transfer time. We point out key bottlenecks, propose potential solutions, and analyze the GPU implementation of these queries. To the best of our knowledge, this is the first reported end-to-end compilation and execution infrastructure that supports the full set of TPC-H queries on commodity GPUs.
Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on gpus
- in Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA-42)
, 2015
Cited by 1 (0 self)
GPUs have been proven effective for structured applications that map well to the rigid 1D-3D grid of threads in modern bulk synchronous parallel (BSP) programming languages. However, less success has been encountered in mapping data intensive irregular applications such as graph analytics, relational databases, and machine learning. Recently introduced nested device-side kernel launching functionality in the GPU is a step in the right direction, but still falls short of being able to effectively harness the GPU's performance potential. We propose a new mechanism called Dynamic Thread Block Launch (DTBL) to extend the current bulk synchronous parallel model underlying the current GPU execution model by supporting dynamic spawning of lightweight thread blocks. This mechanism supports the nested launching of thread blocks rather than kernels to execute dynamically occurring parallel work elements. This paper describes the execution model of DTBL, device-runtime support, and microarchitecture extensions to track and execute dynamically spawned thread blocks. Experiments with a set of irregular data intensive CUDA applications executing on a cycle-level simulator show that DTBL achieves average 1.21x speedup over the original flat implementation and average 1.40x over the implementation with device-side kernel launches using CUDA Dynamic Parallelism.
GPU-accelerated Database Systems: Survey and Open Challenges
Cited by 1 (0 self)
The vast amount of processing power and memory bandwidth provided by modern graphics cards make them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. In the past years, there were many approaches to make use of GPUs at different levels of a database system. In this paper, we explore the design space of GPU-accelerated database management systems. Based on this survey, we present key properties, important trade-offs and typical challenges of GPU-aware database architectures, and identify major open challenges. Additionally, we survey existing GPU-accelerated DBMSs and classify their architectural properties. Then, we summarize typical optimizations implemented in GPU-accelerated DBMSs. Finally, we propose a reference architecture, indicating how GPU acceleration can be integrated in existing DBMSs.
Oncilla: GAS GPU Cluster Model Collaboration with
- On-line and off-line analysis
- Retail analysis
- Forecasting
- Pricing
- Etc.
- Combination of relational data queries and computational kernels
- Current applications process 1 to 50 TBs of data [1]
- Not a traditional domain for GPU acceleration, but:
- Parallel queries experience good speedup on GPUs [2]
- GPU-related techniques can be applied to other "Big Data" problems like irregular graphs, sorting
Hetero-DB: Next Generation High-Performance Database Systems by Best Utilizing Heterogeneous Computing and Storage Resources
, 2015
With recent advancement on hardware technologies, new general-purpose high-performance devices have been widely adopted, such as the graphics processing unit (GPU) and solid state drive (SSD). GPU may offer an order of higher throughput for applications with massive data parallelism, compared with the multicore CPU. Moreover, new storage device SSD is also capable of offering a much higher I/O throughput and lower latency than a traditional hard disk device (HDD). These new hardware devices can significantly boost the performance of many applications; thus the database community has been actively engaging in adopting them into database systems. However, the performance benefit cannot be easily reaped if the new hardwares are improperly used. In this paper, we propose Hetero-DB, a high-performance database system by exploiting both the characteristics of the database system and the special properties of the new hardware devices in system’s design and implementation. Hetero-DB develops a GPU-aware query execution engine with GPU device memory
Load-Aware Inter-Co-Processor Parallelism in Database Query Processing
For a decade, the database community has been exploring graphics processing units and other co-processors to accelerate query processing. While the developed algorithms often outperform their CPU counterparts, it is not beneficial to keep processing devices idle while over utilizing others. Therefore, an approach is needed that efficiently distributes a workload on available (co-)processors while providing accurate performance estimates for the query optimizer. In this paper, we contribute heuristics that optimize query processing for response time and throughput simultaneously via inter-device parallelism. Our empirical evaluation reveals that the new approach achieves speedups up to 1.85 compared to state-of-the-art approaches while preserving accurate performance estimations. In a further series of experiments, we evaluate our approach on two new use cases: joining and sorting. Furthermore, we use a simulation to assess the performance of our approach for systems with multiple co-processors and derive some general rules that impact performance in those systems.
Concurrent Analytical Query Processing with GPUs
In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, we observe that the utilization of main GPU resources is only up to 25%. The underutilization leads to low system throughput. To address the problem, this paper proposes concurrent query execution as an effective solution. To efficiently share GPUs among concurrent queries for high throughput, the major challenge is to provide software support to control and resolve resource contention incurred by the sharing. Our solution relies on GPU query scheduling and device memory swapping policies to address this challenge. We have implemented a prototype system and evaluated it intensively. The experiment results confirm the effectiveness and performance advantage of our approach. By executing multiple GPU queries concurrently, system throughput can be improved by up to 55% compared with dedicated processing.