Abstract:
Modern CPUs have instructions that perform basic operations on several data elements in parallel. These instructions are called SIMD instructions, since they apply a single instruction to multiple data elements. SIMD technology was initially built into commodity processors to accelerate multimedia applications, but it also provides new opportunities for database engine design and implementation. We study various kinds of operations in a database context and show how their inner loops can be accelerated using SIMD instructions. The use of SIMD instructions has two immediate performance benefits: it allows a degree of parallelism, so that many operands can be processed at once, and it often leads to the elimination of conditional branch instructions, reducing branch mispredictions. We consider the most important database operations, including sequential scans, aggregation, index operations, and joins, and present techniques for implementing them using SIMD instructions. We show that there are significant benefits in redesigning traditional query processing algorithms so that they make better use of SIMD technology. Our study shows that, with a SIMD parallelism of four, the new algorithms reduce CPU time relative to the traditional algorithms by amounts ranging from 10% to more than a factor of four. Superlinear speedups (that is, speedups exceeding the SIMD parallelism of four) are obtained as a result of eliminating branch misprediction effects.
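To make the two benefits above concrete, here is a minimal sketch (not the authors' code) of two inner loops with a SIMD width of four: a sequential scan with aggregation (`SELECT SUM(y) FROM R WHERE x < c`) and an index-node search that counts how many keys are smaller than a probe. Both replace the per-element `if` with a comparison mask, so there is no data-dependent branch to mispredict. It assumes the GCC/Clang vector extension (`vector_size`) rather than a specific instruction set; the function names `simd_sum_where_less` and `simd_node_rank` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Four 32-bit lanes (SIMD parallelism of four) via the GCC/Clang vector
   extension; an assumption of this sketch, not a specific ISA. */
typedef int32_t v4i __attribute__((vector_size(16)));

/* SELECT SUM(y) FROM R WHERE x < c, with no data-dependent branch.
   Note: per-lane partial sums are 32-bit and could overflow on very
   large inputs; a production version would widen them. */
int64_t simd_sum_where_less(const int32_t *x, const int32_t *y,
                            size_t n, int32_t c)
{
    v4i vc = {c, c, c, c};          /* predicate constant in every lane */
    v4i acc = {0, 0, 0, 0};         /* one partial sum per lane */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4i vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* alignment-safe load of 4 values */
        memcpy(&vy, y + i, sizeof vy);
        v4i mask = vx < vc;             /* lane is -1 (all ones) if true, else 0 */
        acc += vy & mask;               /* keep y only where the predicate holds */
    }
    int64_t sum = (int64_t)acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++)                  /* scalar tail, still branch-free */
        sum += (int64_t)y[i] & -(int64_t)(x[i] < c);
    return sum;
}

/* Index-node search: number of keys in the node that are < probe.
   For a sorted node this is the slot of the child to follow. */
size_t simd_node_rank(const int32_t *keys, size_t nkeys, int32_t probe)
{
    v4i vp = {probe, probe, probe, probe};
    v4i cnt = {0, 0, 0, 0};
    size_t i = 0;
    for (; i + 4 <= nkeys; i += 4) {
        v4i vk;
        memcpy(&vk, keys + i, sizeof vk);
        cnt -= (vk < vp);               /* true lanes are -1, so this adds 1 */
    }
    size_t rank = (size_t)(cnt[0] + cnt[1] + cnt[2] + cnt[3]);
    for (; i < nkeys; i++)              /* leftover keys when nkeys % 4 != 0 */
        rank += (size_t)(keys[i] < probe);
    return rank;
}
```

For example, with `x = {1, 5, 2, 8, 3, 9, 0}`, `y = {10, 20, 30, 40, 50, 60, 70}` and `c = 4`, the scan returns 160. Counting keys instead of branching on each comparison trades a few extra comparisons for predictable control flow, which is the branch-elimination effect the abstract credits for the superlinear speedups.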