Query Processing Techniques for Solid State Drives, 2009
Cited by 17 (2 self)
Solid state drives perform random reads more than 100x faster than traditional magnetic hard disks, while offering comparable sequential read and write bandwidth. Because of their potential to speed up applications, as well as their reduced power consumption, these new drives are expected to gradually replace hard disks as the primary permanent storage media in large data centers. However, although they may immediately benefit applications that stress random reads, they may not improve database applications, especially those running long data analysis queries. Database query processing engines have been designed around the speed mismatch between random and sequential I/O on hard disks, and their algorithms currently emphasize sequential accesses for disk-resident data. In this paper, we investigate data structures and algorithms that leverage fast random reads to speed up selection, projection, and join operations in relational query processing. We first demonstrate how a column-based layout within each page reduces the amount of data read during selections and projections. We then introduce FlashJoin, a general pipelined join algorithm that minimizes accesses to base and intermediate relational data. FlashJoin’s binary join kernel accesses only the join attributes, producing partial results in the form of a join index. Subsequently, its fetch kernel retrieves the attributes for later nodes in the query plan as they are needed. FlashJoin significantly reduces memory and I/O requirements for each join in the query. We implemented these techniques inside Postgres and experimented with an enterprise SSD. Our techniques improved query runtimes by up to 6x for queries ranging from simple relational scans and joins to full TPC-H queries.
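The two-stage idea in this abstract (a join kernel that touches only the join attributes and emits a join index, followed by a fetch kernel that retrieves other attributes when needed) can be illustrated with a small in-memory sketch. This is not the FlashJoin implementation: the function names `join_kernel` and `fetch_kernel`, the column-dictionary layout, and the sample tables are all assumptions made for illustration, and a plain hash join stands in for whatever join method the paper uses.

```python
from collections import defaultdict


def join_kernel(left_keys, right_keys):
    """Hypothetical join kernel: reads only the join-attribute columns.

    Returns a join index, i.e. a list of (left_row_id, right_row_id) pairs.
    """
    buckets = defaultdict(list)
    for left_id, key in enumerate(left_keys):
        buckets[key].append(left_id)

    join_index = []
    for right_id, key in enumerate(right_keys):
        # .get avoids creating empty buckets for keys with no match
        for left_id in buckets.get(key, ()):
            join_index.append((left_id, right_id))
    return join_index


def fetch_kernel(join_index, left_cols, right_cols, want_left, want_right):
    """Hypothetical fetch kernel: looks up only the requested attributes,
    by row id, for the pairs recorded in the join index."""
    for left_id, right_id in join_index:
        row = {name: left_cols[name][left_id] for name in want_left}
        row.update({name: right_cols[name][right_id] for name in want_right})
        yield row


# Illustrative column-wise tables (made-up data)
orders = {"cust_id": [10, 20, 10], "total": [5.0, 7.5, 1.25]}
customers = {"cust_id": [10, 30], "name": ["Ada", "Bob"]}

index = join_kernel(orders["cust_id"], customers["cust_id"])
rows = list(fetch_kernel(index, orders, customers, ["total"], ["name"]))
```

In this sketch the `total` and `name` columns are never read during the join itself; they are touched only for row ids that survive it, which is the property the abstract attributes to the join index.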
Exploring Power-Performance Tradeoffs in Database Systems
Cited by 4 (0 self)
With the total energy consumption of computing systems increasing at a steep rate, much attention has been paid to the design of energy-efficient computing systems and applications. So far, database system design has focused on improving performance of query processing. The objective of this study is to experimentally explore the potential of power conservation in relational database management systems. We hypothesize that, by modifying the query optimizer in a DBMS to take the power cost of query plans into consideration, we will be able to reduce the power usage of database servers and control the tradeoffs between power consumption and system performance. We also identify the sources of such savings by investigating the resource consumption features during query processing in DBMSs. To that end, we provide an in-depth anatomy and qualitatively analyze the power profile of typical queries in the TPC benchmarks. We perform extensive experiments on a physical testbed based on the PostgreSQL system using workloads generated from the TPC benchmarks. Our hypothesis is supported by such experimental results: power savings in the range of 11% to 22% can be achieved by equipping the DBMS with a query optimizer that selects query plans based on both estimated processing time and power requirements.
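Selecting plans on both estimated time and estimated power can be pictured as minimizing a combined cost. The sketch below is one possible formulation, not the cost model from the paper: the `Plan` record, the `power_weight` knob, the normalization by the best candidate, and the example numbers are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    est_seconds: float  # estimated processing time (assumed > 0)
    est_watts: float    # estimated power draw (assumed > 0)


def choose_plan(plans, power_weight):
    """Pick the plan with the lowest weighted cost.

    power_weight = 0.0 considers time only; 1.0 considers power only.
    Each metric is divided by the best value among the candidates so that
    seconds and watts are comparable (an assumed normalization).
    """
    if not 0.0 <= power_weight <= 1.0:
        raise ValueError("power_weight must be in [0, 1]")
    best_time = min(p.est_seconds for p in plans)
    best_power = min(p.est_watts for p in plans)

    def cost(p):
        return ((1.0 - power_weight) * p.est_seconds / best_time
                + power_weight * p.est_watts / best_power)

    return min(plans, key=cost)


# Made-up candidate plans for one query
candidates = [
    Plan("hash_join", est_seconds=10.0, est_watts=200.0),
    Plan("index_nested_loop", est_seconds=12.0, est_watts=150.0),
]

fastest = choose_plan(candidates, power_weight=0.0)
greenest = choose_plan(candidates, power_weight=1.0)
```

Sweeping `power_weight` between 0 and 1 is one way to expose the tradeoff between power consumption and performance as a tunable setting.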
Workload-Aware Database Monitoring and Consolidation
Cited by 4 (0 self)
In most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization of less than 4%. This unused capacity can potentially be harnessed to consolidate multiple databases on fewer machines, reducing hardware and operational costs. Virtual machine (VM) technology is one popular way to approach this problem. However, as we demonstrate in this paper, VMs fail to adequately support database consolidation, because databases place a unique and challenging set of demands on hardware resources, which are not well-suited to the assumptions made by VM-based consolidation. Instead, our system for database consolidation, named Kairos, uses novel techniques to measure the hardware requirements of database workloads, as well as models to predict the combined resource utilization of those workloads. We formalize the consolidation problem as a non-linear optimization program, aiming to minimize the number of servers and balance load, while achieving near-zero performance degradation. We compare Kairos against virtual machines, showing up to 12× higher throughput on a TPC-C-like benchmark. We also tested the effectiveness of our approach on real-world data collected from production servers.
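To make the consolidation goal concrete (fewest servers, subject to capacity limits), here is a deliberately simplified stand-in. It is not the Kairos method: the abstract describes a non-linear optimization program with models of combined utilization, whereas this sketch uses greedy first-fit-decreasing bin packing and assumes CPU and RAM demands simply add up. The function name, capacities, and workload numbers are hypothetical.

```python
def consolidate(workloads, cpu_capacity, ram_capacity):
    """Greedy first-fit-decreasing placement (a simplification).

    workloads: dict mapping name -> (cpu_demand, ram_demand).
    Returns a list of servers, each a list of workload names.
    Assumes resource demands are additive, which real database
    workloads may violate.
    """
    servers = []  # each entry tracks used CPU, used RAM, and placed names
    by_cpu_desc = sorted(workloads.items(), key=lambda kv: kv[1][0], reverse=True)
    for name, (cpu, ram) in by_cpu_desc:
        if cpu > cpu_capacity or ram > ram_capacity:
            raise ValueError(f"{name} does not fit on any single server")
        for server in servers:
            if (server["cpu"] + cpu <= cpu_capacity
                    and server["ram"] + ram <= ram_capacity):
                server["cpu"] += cpu
                server["ram"] += ram
                server["names"].append(name)
                break
        else:
            # No existing server has room: open a new one
            servers.append({"cpu": cpu, "ram": ram, "names": [name]})
    return [server["names"] for server in servers]


# Made-up demands: (CPU percent, RAM in GB)
demands = {"db_a": (4, 8), "db_b": (3, 4), "db_c": (50, 16), "db_d": (40, 20)}
placement = consolidate(demands, cpu_capacity=100, ram_capacity=32)
```

Here `db_d` lands on its own server because of RAM rather than CPU, which hints at why a single utilization figure is not enough to decide placements.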
A Vision for Next Generation Query Processors and an Associated Research Agenda
Query processing is one of the most important mechanisms for data management, and there exist mature techniques for effective query optimization and efficient query execution. The vast majority of these techniques assume workloads of rather small transactional tasks with strong requirements for ACID properties. However, the emergence of new computing paradigms, such as grid and cloud computing, the increasingly large volumes of data commonly processed, the need to support data driven research, intensive data analysis and new scenarios, such as processing data streams on the fly or querying web services, the fact that the metadata fed to optimizers are often missing at compile time, and the growing interest in novel optimization criteria, such as monetary cost or energy consumption, create a unique set of new requirements for query processing systems. These requirements cannot be met by modern techniques in their entirety, although interesting solutions and efficient tools have already been developed for some of them in isolation. Next generation query processors are expected to combine features addressing all of these issues, and, consequently, lie at the confluence of several research initiatives. This paper aims to present a vision for such processors, to explain their functionality requirements, and to discuss the open issues, along with their challenges.
HP Labs
Enterprises rely on decision support systems to influence critical business choices. At the same time, IT-related power costs are growing and are a key concern for enterprise executives. Yet, there is little work to date characterizing the power use of decision support systems. Towards this end, we present the first holistic measurements and analysis of an audit-class system running the TPC-H decision support benchmark at the 300GB scale. We first provide a breakdown of the system’s power use into its core hardware components. We then explore its power-performance tradeoffs. This investigation shows that there is ample room to improve its energy use without sacrificing much performance. Moreover, the most energy-efficient configuration depends on the workload. These results suggest that, going forward, database software has an important role to play in optimizing for energy use.
Experimentation, Measurement, Performance.
Rising energy costs in large data centers are driving an agenda for energy-efficient computing. In this paper, we focus on the role of database software in affecting, and, ultimately, improving the energy efficiency of a server. We first characterize the power-use profiles of database operators under different configuration parameters. We find that common database operations can exercise the full dynamic power range of a server, and that the CPU power consumption of different operators, for the same CPU utilization, can differ by as much as 60%. We also find that for these operations CPU power does not vary linearly with CPU utilization. We then experiment with several classes of database systems and storage managers, varying parameters that span from different query plans to compression algorithms and from physical layout to CPU frequency and operating system scheduling. Contrary to what recent work has suggested, we find that within a single node intended for use in scale-out (shared-nothing) architectures, the most energy-efficient configuration is typically the highest performing one. We explain under which circumstances this is not the case, and argue that these circumstances do not warrant a retargeting of database system optimization goals. Further, our results reveal opportunities for cross-node energy optimizations and point out directions for new scale-out architectures.
Assessing Data Deduplication Trade-offs from an Energy and Performance Perspective
The energy costs of running computer systems are a growing concern: for large data centers, recent estimates put these costs higher than the cost of hardware itself. As a consequence, energy efficiency has become a pervasive theme for designing, deploying, and operating computer systems. This paper evaluates the energy trade-offs brought by data deduplication in distributed storage systems. Depending on the workload, deduplication can enable a lower storage footprint, reduce the I/O pressure on the storage system, and reduce network traffic, at the cost of increased computational overhead. From an energy perspective, data deduplication enables a trade-off between the energy consumed for additional computation and the energy saved by lower storage and network load. The main point our experiments and model bring home is the following: while for non-energy-proportional machines performance- and energy-centric optimizations have break-even points that are relatively close, for the newer generation of energy-proportional machines the break-even points are significantly different. An important consequence of this difference is that, with newer systems, there are higher energy inefficiencies when the system is optimized for performance.
Resiliency-Aware Data Management
Computing architectures are changing towards massively parallel environments with increasing numbers of heterogeneous components. The large scale in combination with decreasing feature sizes leads to dramatically increasing error rates. The heterogeneity further leads to new error types. Techniques for ensuring resiliency in terms of robustness regarding these errors are typically applied at hardware abstraction and operating system levels. However, as errors become the normal case, we observe increasing costs in terms of computation overhead for ensuring robustness. In this paper, we argue that ensuring resiliency at the data management level can reduce the required overhead by exploiting context knowledge of query processing and data storage. Apart from reacting to already detected errors, this has been mostly neglected in database research so far. We therefore give a broad overview of the background of resilient computing and existing techniques from the database perspective. Based on the lack of existing techniques at the data management level, we raise three fundamental challenges of resiliency-aware data management and present example use cases. Finally, our vision of resiliency-aware data management opens many directions of future work. Fundamental research, including the partial reuse of underlying mechanisms, would allow data management systems to cope with future hardware characteristics by effectively and efficiently ensuring resiliency.

