Dark Silicon Accelerators for Database Indexing

1 Dark Silicon Accelerators for Database Indexing Onur Kocberber, Kevin Lim, Babak Falsafi, Partha Ranganathan, Stavros Harizopoulos

2 Dark Silicon and Big Data Challenges. Data explosion: data is growing faster than technology. End of free energy: higher density, higher energy. Challenge: CPUs are ill-matched to server workloads; most of the time is spent waiting for data rather than computing. Need to specialize for data-centric workloads.

3 How Do Data-Centric Workloads Access Data? Databases create and use an index: a data structure for fast data lookup, most often a balanced tree or hash table, and frequently accessed. [Figure: hash table and tree.] Indexing is pointer-intensive and underutilizes general-purpose CPUs: IPCs as low as 0.25 on an OoO core.

4 Contribution: Database Indexing Widget. Index lookups on general-purpose CPUs are pointer-intensive (low IPC) and time-intensive (poor energy efficiency). The Database Indexing Widget is dedicated hardware for database index lookups. Full-service offload: the core sleeps when the widget runs. Up to 65% less energy per query.

5 Outline: Introduction, Indexing in Databases, Indexing Widget, Results

6 Modern Databases and Indexing. Two types of contemporary in-memory databases: column-store analytical processing (DSS) and scale-out transaction processing (OLTP). [Figure: tables with Customer, Date, and Product columns.] Two fundamental indexing operations: hash table probe and tree traversal.

7 How Much Time is Spent Indexing? Measurement on a Xeon 5670 CPU with hardware counters. [Figure: bar chart of execution time (0% to 100%) spent in indexing for the OLTP transactions Order Status and Payment and the DSS queries Query 2 and Query 17; bars are annotated with the index type used (tree, hash table, or both).] Indexing can account for up to 73% of execution time.

8 Example: Hash Join. SQL: SELECT A_name FROM A,B WHERE A_age = B_age. ❶ Build a hash table on the age column of Table A (2M rows). ❷ Probe it with the age column of Table B (60M rows). ❸ Emit the result. Hash table probes dominate execution.
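The build and probe steps of the hash join can be sketched in C as below. This is a minimal illustration under stated assumptions, not the code of any database named in the talk: the names `hj_build`, `hj_probe_count`, `hj_destroy`, and `hj_hash` are hypothetical, the hash function is an arbitrary multiplicative hash (the slides do not name one), and the join only counts matching pairs instead of materializing `A_name`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One chained hash-table entry: the join key (age) and the row id in table A. */
typedef struct Node {
    int32_t key;
    size_t row;
    struct Node *next;
} Node;

typedef struct {
    Node **buckets;   /* bucket heads */
    Node *pool;       /* all nodes, allocated as one block */
    size_t nbuckets;  /* assumed to be a power of two */
} HashTable;

/* Hypothetical hash function chosen for illustration only. */
size_t hj_hash(int32_t key, size_t nbuckets) {
    uint32_t h = (uint32_t)key * 2654435761u;
    h ^= h >> 16;
    return (size_t)h & (nbuckets - 1);
}

/* Step 1 (build): insert every row of the smaller table A. */
HashTable hj_build(const int32_t *a_age, size_t a_rows, size_t nbuckets) {
    HashTable ht;
    ht.nbuckets = nbuckets;
    ht.buckets = calloc(nbuckets, sizeof *ht.buckets);
    ht.pool = malloc((a_rows ? a_rows : 1) * sizeof *ht.pool);
    assert(ht.buckets && ht.pool);
    for (size_t i = 0; i < a_rows; i++) {
        size_t b = hj_hash(a_age[i], nbuckets);
        ht.pool[i] = (Node){ a_age[i], i, ht.buckets[b] };  /* push at chain head */
        ht.buckets[b] = &ht.pool[i];
    }
    return ht;
}

/* Steps 2 and 3 (probe, result): look up each row of B and count the matches. */
size_t hj_probe_count(const HashTable *ht, const int32_t *b_age, size_t b_rows) {
    size_t matches = 0;
    for (size_t j = 0; j < b_rows; j++) {
        const Node *n = ht->buckets[hj_hash(b_age[j], ht->nbuckets)];
        for (; n; n = n->next)          /* pointer chase along the chain */
            if (n->key == b_age[j])
                matches++;
    }
    return matches;
}

void hj_destroy(HashTable *ht) {
    free(ht->buckets);
    free(ht->pool);
}
```

A caller would build the table once from the ages of A, call `hj_probe_count` with the ages of B, and then release the table with `hj_destroy`. Because the probe loop runs once per row of the larger table, it is where most of the time goes.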

9 Indexing with Hash Table Probes. [Figure: a key passes through the hash function to select a bucket of the hash table; the bucket's chain is then walked, comparing keys.] Each hash probe operation: dynamic instructions: hash, then chase pointers; 50% memory references.

10 Indexing with Tree Traversals. SQL: SELECT A_Product,A_Customer FROM A WHERE A_age = 25. [Figure: an index on A_age maps each key to a tuple pointer into a table with Customer, Age, Date, and Product columns; the matching tuples form the result.]

11 Indexing with Tree Traversals. SQL: SELECT A_Product,A_Customer FROM A WHERE A_age = 25. Index on A_age. Each index traversal: 10K-15K dynamic instructions (lots of pointer chasing); 50-60% memory references.
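The pointer chasing in a tree traversal can be illustrated with a small B+-tree-style lookup in C. This is a sketch under assumptions: the slides only say "balanced tree", so the node layout, the `FANOUT` constant, and the `TreeNode` and `tree_lookup` names are hypothetical and do not describe the index format of VoltDB or MonetDB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FANOUT 4  /* illustrative; real index nodes are typically much wider */

typedef struct TreeNode {
    int is_leaf;
    int nkeys;                            /* at most FANOUT - 1 */
    int32_t keys[FANOUT - 1];
    union {
        struct TreeNode *child[FANOUT];   /* inner node: nkeys + 1 children */
        const void *tuple[FANOUT - 1];    /* leaf: one tuple pointer per key */
    } u;
} TreeNode;

/* Walk from the root to a leaf, then search the leaf for an exact match.
 * Returns the tuple pointer for `key`, or NULL if the key is absent. */
const void *tree_lookup(const TreeNode *node, int32_t key) {
    while (node && !node->is_leaf) {
        assert(node->nkeys < FANOUT);
        int i = 0;
        while (i < node->nkeys && key >= node->keys[i])
            i++;                          /* choose the child covering `key` */
        node = node->u.child[i];          /* dependent load: the pointer chase */
    }
    if (!node)
        return NULL;
    for (int i = 0; i < node->nkeys; i++)
        if (node->keys[i] == key)
            return node->u.tuple[i];
    return NULL;
}
```

Each level of the tree costs one dependent load: the address of the next node is unknown until the current node has been read and compared. An out-of-order core has little independent work to overlap with these loads.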

12 Outline: Introduction, Indexing in Databases, Indexing Widget, Results

13 Indexing Widget Overview. Dedicated offload engine for index lookups, activated on demand by the core. Full-service index lookup: the core sleeps when the widget runs. Widget features: Efficient: specialized control and functional units. Low-latency: caches frequently accessed index data. Tightly integrated: uses the core's L1-D and TLB.

14 Widget Details. Configuration registers (written from the core): Index Addr., Key, Search Type, Result Table Addr., Data Type. Controller (FSM): Hash, Tree. Computational logic. Buffer (SRAM). Steps: ❶ Configure ❷ Run ❸ Return.

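15 Widget Details: ❶ Configure. Configuration registers (written from the core): Index Addr., Key, Search Type, Result Table Addr., Data Type. Controller (FSM): Hash, Tree. Computational logic. Buffer (SRAM). Slide pseudocode for configuring the widget, with a software fallback when no widget is present:

    if (haswidget) {
        widget.index = &a;       /* Index Addr. */
        widget.key = &b;         /* Key */
        widget.type = equal;     /* Search Type */
        widget.result = &r;      /* Result Table Addr. */
        widget.data = int;       /* Data Type */
        widget.run();
    } else {
        Hashprobe();             /* software hash probe on the core */
    }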

16 Widget Details: ❷ Run. Configuration registers (written from the core): Index Addr., Key, Search Type, Result Table Addr., Data Type. Controller (FSM): Hash, Tree. Computational logic. Buffer (SRAM). Index data moves to and from the L1.

17 Widget Details: ❸ Return. Configuration registers (written from the core): Index Addr., Key, Search Type, Result Table Addr., Data Type. Controller (FSM): Hash, Tree. Computational logic. Buffer (SRAM). Results are stored through the L1 at the result table address (&Result Table, Key).
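The configure, run, and return steps can be modeled in software as a small state machine. The sketch below is an assumption-laden illustration of the control flow for the hash case only. The fields `index_addr`, `key`, and `result` mirror registers named on the slides, while `nbuckets`, `result_cap`, the state names, the modulo hash, and `widget_run` itself are hypothetical and are not taken from the hardware design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Entry {
    int32_t key;
    struct Entry *next;
} Entry;

/* Step 1 (configure): values the core would write before starting the widget. */
typedef struct {
    Entry **index_addr;    /* Index Addr.: bucket array of the hash index */
    size_t nbuckets;       /* hypothetical: not listed on the slides */
    int32_t key;           /* Key */
    const Entry **result;  /* Result Table Addr. */
    size_t result_cap;     /* hypothetical: capacity of the result table */
} WidgetConfig;

typedef enum { W_HASH, W_WALK, W_STORE, W_DONE } WidgetState;

/* Step 2 (run) and step 3 (return): walk the chain for cfg->key and store each
 * matching entry in the result table. Returns the number of results stored. */
size_t widget_run(const WidgetConfig *cfg) {
    assert(cfg->nbuckets > 0);
    WidgetState st = W_HASH;
    const Entry *cur = NULL;
    size_t nresults = 0;
    while (st != W_DONE) {
        switch (st) {
        case W_HASH:   /* compute the bucket and load its chain head */
            cur = cfg->index_addr[(uint32_t)cfg->key % cfg->nbuckets];
            st = W_WALK;
            break;
        case W_WALK:   /* compare keys, following next pointers */
            if (!cur)
                st = W_DONE;
            else if (cur->key == cfg->key)
                st = W_STORE;
            else
                cur = cur->next;
            break;
        case W_STORE:  /* write the match to the result table, then continue */
            if (nresults < cfg->result_cap)
                cfg->result[nresults++] = cur;
            cur = cur->next;
            st = W_WALK;
            break;
        case W_DONE:
            break;
        }
    }
    return nresults;
}
```

In this model the caller fills a `WidgetConfig`, calls `widget_run`, and reads the matches from the result table afterwards, which corresponds to the core sleeping during the lookup and resuming once the results are in memory.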

18 Methodology. First-order analytical model. Execution traces: Pin. Execution profiling: VTune, OProfile. Benchmark applications: OLTP: TPC-C on VoltDB; DSS: TPC-H on MonetDB. Model parameters: L1 / L2 / off-chip latency: 2 / 12 / 200 cycles; widget buffer: 2-way set-associative cache. Energy estimations: McPAT.
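The slides do not give the equations of the analytical model. As one plausible first-order ingredient, the C sketch below computes the expected latency of a memory reference from the three latency parameters above. The `Latency` type, the `avg_access_cycles` function, and the hit fractions passed to it are assumptions made for illustration, not values or formulas from the talk.

```c
#include <assert.h>

typedef struct {
    double l1_cycles;   /* 2 in the slides */
    double l2_cycles;   /* 12 in the slides */
    double mem_cycles;  /* 200 in the slides (off-chip) */
} Latency;

/* Expected cycles per memory reference, given the fraction of references
 * served by L1 and by L2; the remainder is assumed to go off-chip. */
double avg_access_cycles(Latency lat, double l1_frac, double l2_frac) {
    assert(l1_frac >= 0.0 && l2_frac >= 0.0 && l1_frac + l2_frac <= 1.0);
    double mem_frac = 1.0 - l1_frac - l2_frac;
    return l1_frac * lat.l1_cycles
         + l2_frac * lat.l2_cycles
         + mem_frac * lat.mem_cycles;
}
```

With the slide's latencies and a hypothetical split of 50% L1, 25% L2, and 25% off-chip, the expected cost is 0.5 × 2 + 0.25 × 12 + 0.25 × 200 = 54 cycles per reference. The off-chip term dominates, which shows how strongly off-chip references weigh on pointer-chasing code.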

19 Energy Efficiency with Indexing Widget. [Figure: reduction in energy (%) for Qry 17, Order Status, Payment, and Qry 2, shown as reduction over a conventional OoO core and over an ARM-like OoO core, together with application coverage (%).] Up to 65% reduction in energy.

20 Performance with Indexing Widget. [Figure: overall speedup for Qry 17, Order Status, Payment, and Qry 2 as a function of widget buffer size; labeled sizes include 1KB, 2KB, 4KB, and 8KB.] The widget does not hurt performance.

21 Conclusions. Data explosion and dark silicon trends call for specialization: rethinking of architectures to achieve efficiency. Databases spend significant time in indexing, mostly pointer chasing; general-purpose CPUs are poorly suited. Augment the CPU with an indexing widget: a dedicated offload engine (the core sleeps when the widget runs) that improves efficiency: 65% less energy, 3x faster query execution. More challenges: data types, data sharing, generalization.

22 Thanks!
