Hardware Acceleration of Database Operations


1 Hardware Acceleration of Database Operations
Jared Casper and Kunle Olukotun
Pervasive Parallelism Laboratory, Stanford University

2 Database machines
- Database machines from the late 1970s
  - Put some compute on the disk track/head/unit
  - Processors got faster, I/O performance did not
  - Processor could keep up with disk
  - No performance left on the table
- Today's database machines
  - Made up of general-purpose components
  - Massive amounts of memory
  - Very high-speed interconnect
  - Tables, even databases, fit entirely within memory

3 Database Operation Acceleration
- Processors cannot keep up with memory
  - Join performance is at 100s of millions of tuples per second
  - With 64-bit tuples, that is 2-3 GB/s
  - Chips can get over 100 GB/s
  - Performance is being left on the table
- Follow the 10x10 rule, build accelerators
- Three acceleration blocks
  - Selection, merge join, sort
  - Combine these to do a sort merge join
- Goal is to keep up with memory
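The bandwidth gap on this slide can be checked with a few lines of arithmetic. The sketch below assumes 300 million tuples per second as a representative value for "100s of millions" and uses 1 GB = 10^9 bytes; both are illustrative assumptions, not figures from the slides.

```python
TUPLE_BYTES = 8  # 64-bit tuples, as stated on the slide


def join_bandwidth_gb_per_s(tuples_per_second, tuple_bytes=TUPLE_BYTES):
    """Memory bandwidth consumed by a join running at the given tuple rate."""
    return tuples_per_second * tuple_bytes / 1e9


# Assumption: 300 million tuples/s stands in for "100s of millions".
used = join_bandwidth_gb_per_s(300e6)

# Fraction of a 100 GB/s memory system that such a join actually uses.
fraction_used = used / 100.0

print(f"{used:.1f} GB/s used, {fraction_used:.1%} of 100 GB/s")
```

Under these assumptions the join touches only a few percent of the available bandwidth, which is the "performance left on the table".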

4 Select
[Figure: select example applying the bitmask 1 0 1 1 1 0 0 1 to a column of values]
- Software implementation uses SIMD
  - Read data into SIMD register
  - Use SIMD shuffle operation to move selected data to one end of the register
  - Mask used as index into table for shuffle values
  - Unaligned write to append to output
- Limited by SIMD width, number of SIMD registers
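The software select described above can be modelled in plain Python. This is a minimal sketch that emulates an 8-lane SIMD register with lists; the lane count, the function names, and the input values are assumptions chosen for illustration, and a real implementation would use hardware shuffle instructions rather than list indexing.

```python
LANES = 8  # assumption: an 8-lane register, chosen to match the 8-bit mask


def build_shuffle_table(lanes=LANES):
    """For every possible mask, precompute the lane indices that move
    the selected lanes to one end of the register."""
    table = []
    for mask in range(1 << lanes):
        table.append([i for i in range(lanes) if (mask >> i) & 1])
    return table


SHUFFLE_TABLE = build_shuffle_table()


def simd_select(values, mask_bits):
    """Keep values[i] where mask_bits[i] == 1, one register at a time."""
    out = []
    for start in range(0, len(values), LANES):
        register = values[start:start + LANES]       # read data into a register
        bits = mask_bits[start:start + LANES]
        mask = sum(bit << i for i, bit in enumerate(bits))
        indices = SHUFFLE_TABLE[mask]                 # mask indexes the table
        shuffled = [register[i] for i in indices]     # shuffle selected lanes
        out.extend(shuffled)                          # append to the output
    return out


print(simd_select(list("CFEBECAF"), [1, 0, 1, 1, 1, 0, 0, 1]))
```

The table has 2^LANES entries, which is one way to see why the approach is limited by SIMD width: doubling the lane count squares the table size.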

5 Select

6 Merge Join
- Scan two sorted columns, output matching values
  - Can have associated values or record IDs
  - Output cross product when multiple values match
- Generally viewed as the free thing after sorting
  - More an indication of how slow sorting is
- Software implementations have bad branching behaviour
  - Limits the IPC, making it hard to keep up with memory
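For reference, a textbook software merge join looks roughly like the sketch below. It is not the authors' code; the `(key, payload)` tuple layout and the function name are assumptions. The three-way comparison in the loop is the data-dependent branch that limits IPC.

```python
def merge_join(left, right):
    """Join two lists of (key, payload) pairs that are sorted by key.

    Emits (key, left_payload, right_payload) for every matching pair,
    including the full cross product when a key repeats on both sides.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:        # data-dependent branch
            i += 1
        elif left_key > right_key:      # data-dependent branch
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Output the cross product of the two runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((left_key, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
```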

7 Merge Join
- Output is a bitmask of equal keys with the corresponding values
- Ready for input into the select block

8 Merge Sort
[Figure: 1st pass and 2nd pass of the merge sort]

9 Merge Sort Level

10 High Bandwidth Sort Merge Node

11 Sort Merge Join
- Sort, merge join, and select blocks are combined to perform a full sort merge join in hardware
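A software analogue of this composition is sketched below. The slides do not describe how the hardware forms its bitmask, so the stage boundaries, the function names, and the restriction to unique keys within each input are all assumptions made to keep the example small.

```python
def merge_join_bitmask(left, right):
    """Merge two sorted key lists (unique keys within each list).

    Returns the merged stream of candidate keys plus a bitmask with a 1
    wherever the candidate appeared in both inputs.
    """
    values, mask = [], []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            values.append(left[i])
            mask.append(1)
            i += 1
            j += 1
        elif left[i] < right[j]:
            values.append(left[i])
            mask.append(0)
            i += 1
        else:
            values.append(right[j])
            mask.append(0)
            j += 1
    return values, mask


def select(values, mask):
    """Compact the stream, keeping only entries whose mask bit is set."""
    return [value for value, bit in zip(values, mask) if bit]


def sort_merge_join(left, right):
    """Sort both inputs, merge join them, then select the matches."""
    values, mask = merge_join_bitmask(sorted(left), sorted(right))
    return select(values, mask)


print(sort_merge_join([5, 1, 9, 3], [3, 4, 5, 6]))
```

Splitting the join into "mark matches" and "compact matches" keeps each stage a simple streaming operation, which is the property the hardware blocks rely on.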

12 Prototyping Platform - Maxeler

13 Select Throughput
[Figure: throughput (GB/s) and % of line bandwidth versus cardinality (%); annotation: "Memory System Saturated!"]
- Software achieved 7 GB/s (33%)
- STREAM achieved 12 GB/s (57%)
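The two percentages let a reader back out the line bandwidth of the platform, which the slide does not state directly. The sketch below assumes both percentages are fractions of the same line bandwidth; the function name is made up for illustration.

```python
def implied_line_bandwidth(achieved_gb_per_s, fraction_of_line):
    """Line bandwidth implied by an achieved rate and its share of the line."""
    return achieved_gb_per_s / fraction_of_line


from_software = implied_line_bandwidth(7.0, 0.33)   # software select
from_stream = implied_line_bandwidth(12.0, 0.57)    # STREAM benchmark

print(f"{from_software:.1f} GB/s, {from_stream:.1f} GB/s")
```

Both figures land at roughly 21 GB/s, so the two data points are consistent with each other.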

14 Select Resources
[Figure: resource count (thousands) versus throughput (bytes/clock), with throughput also shown at 400 MHz; series: ROM bits, 16:1 mux, 4:1 mux, registers]

15 Merge Join Throughput
[Figure: throughput (GB/s) and % of total line throughput versus output ratio; series: m=1, m=2, m=3, m=]
- Resources required are a quadratic function of desired bandwidth
  - All in comparison logic; routing was the limiting factor
- Above 1.5x output, write bandwidth dominates
- Throughput above is input consumed

16 Sort Throughput
[Figure: million values per second versus size of input, with axis ticks up to 50B; series: 2 passes, 3 passes, 3 passes (projected)]
- Resources required are a linear function of desired input size
  - Dominated by the memory required to hold working sets
- Recent CPU/GPU numbers: ~300M 32-bit values per second

17 Sort Merge Join
- Performance limited by intra-FPGA link
- Total throughput is 800 million tuples/second
  - ~6.5 GB/s
- 8x previous work on software joins
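As a sanity check on these numbers, the short calculation below converts the tuple rate to bytes and backs out the software baseline implied by the 8x claim. It assumes the 64-bit tuples mentioned earlier in the deck and 1 GB = 10^9 bytes; the baseline figure is derived here, not quoted from the slides.

```python
TUPLE_BYTES = 8  # assumption: 64-bit tuples, as on the acceleration slide

hardware_tuples_per_second = 800e6

# 800 million tuples/s at 8 bytes each.
hardware_gb_per_s = hardware_tuples_per_second * TUPLE_BYTES / 1e9

# Software join rate implied by "8x previous work".
implied_software_tuples_per_second = hardware_tuples_per_second / 8

print(f"{hardware_gb_per_s:.1f} GB/s")
print(f"{implied_software_tuples_per_second / 1e6:.0f} million tuples/s")
```

The result, 6.4 GB/s, agrees with the slide's "~6.5 GB/s" to within rounding, and the implied baseline of about 100 million tuples per second matches the "100s of millions" range cited for software joins.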

18 Conclusions
- FPGAs can be used to saturate memory bandwidth in ways that processors cannot
- Make the most of every byte read
  - In some cases, address bandwidth is just as important as raw data bandwidth
- Scaling your design to high bandwidths can greatly influence the architecture
  - Think streaming
- Next step is to interact with the rest of the system

19 Questions?


Hardware Acceleration of Database Operations Jared Casper and Kunle Olukotun Pervasive Parallelism Laboratory Stanford University {jaredc, kunle}@stanford.edu ABSTRACT As the amount of memory in database