Meet the Walkers! Accelerating Index Traversals for In-Memory Databases"

Size: px

Start display at page:

Download "Meet the Walkers! Accelerating Index Traversals for In-Memory Databases""

Rudolf Miles Fletcher
5 years ago
Views:

1 Meet the Walkers! Accelerating Index Traversals for In-Memory Databases Onur Kocberber Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, Parthasarathy Ranganathan

structures Many requests, abundant parallelism

2 Our World is Data-Driven! Data resides in huge databases Most frequent task: find data Indexes used for fast data lookup Indexing efficiency is critical Rely on pointer-intensive data structures Many requests, abundant parallelism Power-limited hardware Index! Data! Need high-throughput and energy-efficient index lookups 2

3 Index Lookups on General-Purpose Cores Index Lookups! OoO Cores! Data in memory Pointer-chasing! Low MLP Inherent parallelism Limited OoO inst. window Index Lookups OoO Core Inst. Window One lookup at a time Memory OoO cores are ill-matched to indexing 3

4 Widx: an Indexing Widget Specialized: Custom HW for index lookups Switch fewer transistors per lookup Parallel: Multiple lookups at once Extract parallelism beyond the OoO exec. window Programmable: Simple RISC cores Target a wide range of DBMSs 3x higher throughput, 5.5x energy reduction vs. OoO 4

5 Outline Introduction Indexing in database systems Indexing inefficiencies in modern processors Widx Evaluation highlights Summary 5

Hash index: fundamental index structure Dominant

6 Modern Databases & Indexing Indexes are essential for all database operators Data structures for fast data lookup Hash index: fundamental index structure Dominant operation: join via hash index K! Hash Index R T C Q E K Walk 6

7 Join via Hash Index Lookup Join: find on the index matching for every values entry in A in and A B A Hash B Index on B Join Walk 7

Time 100 75 50 25 0 Index Scan Sort & Join Other 2 3 5 7 8 9

8 How Much Time is Spent Indexing?! Measurement on Xeon 5670 CPU with 100GB Dataset % of Execution Time Index Scan Sort & Join Other TPC-H TPC-DS Indexing is the biggest contributor to execution time 8

low cache locality Next lookup: Inherently

9 Dissecting Index Lookups Hash: Avg. 30% time of each lookup Computationally intensive, high cache locality Walk: Avg. 70% time of each lookup Trivial computation, low cache locality Next lookup: Inherently parallel Beyond the inst. window capacity OoO Core Inst. Window 9

10 Outline Introduction Indexing in Database Systems Indexing inefficiencies in modern processors Widx Evaluation highlights Summary 10

11 Roadmap for! Efficient and High-Throughput Indexing 1. Specialize Customize hardware for hashing and walking 2. Parallelize Perform multiple index lookups at a time 3. Generalize Use a programmable building block 11

12 Step 1: Specialize Design a dedicated unit for hash and walk Hash: compute hash values from a key list Walk: access the hash index and follow pointers Hash Walk General-purpose Specialized OoO hash and walk hardware 12

13 Step 2: Parallelize Serial Decoupled Decoupled & Parallel Time H W H W H H W

14 Widx unit: common building block for hash and walk Two-stage RISC core Custom ISA Widx units are programmable Step 3: Generalize Execute functions written in Widx ISA Support limitless number of data structure layouts hash( ) H walk( ) W Widx unit 14

15 Putting it all together: Widx When Widx runs, core goes idle Widx Walkers W W W OoO Core Result Hash Producer W Widx H P MMU L1 Simple, parallel hardware

16 Development Write code for each unit and compile for Widx ISA Hash DBMS Index Code Programming Model hash (arg1, arg2, ) { } Walk walk (arg1, arg2, ) { } Res. Produce emit (arg1, arg2, ) { } Execution Communicate Load the code query-specific inputs H W P 16

17 Outline Introduction Indexing in Database Systems Indexing inefficiencies in modern processors Widx Evaluation highlights Summary 17

18 Methodology Flexus simulation infrastructure [Wenisch '06] Benchmarks TPC-H on MonetDB TPC-DS on MonetDB Dataset: 100GB uarch Parameters Core Types Area and Power Synopsys Design Compiler Technology node: TSMC 40 nm, std. cell Frequency: 2GHz Widx Area: 0.24mm 2 Widx Power: 0.3W OoO: 4-wide, 128-entry ROB In-order: 2-wide Frequency: 2GHz L1 (I & D): 32KB LLC: 4MB 18

Widx Performance 6 OoO Widx Indexing Speedup 5 4 3 2 1 0 qry2 qry11 qry17 qry19 qry20

19 Widx Performance 6 OoO Widx Indexing Speedup qry2 qry11 qry17 qry19 qry20 qry22 qry5 qry37 qry40 qry52 qry64 qry82 TPC-H TPC-DS 3x higher indexing throughput 19

2.5 Widx Efficiency 2.5 2.5 Runtime 2 1.5 1 0.5 Energy 2 1.5 1 0.5 Energy-Delay 2 1.

20 2.5 Widx Efficiency Runtime Energy Energy-Delay x reduction in indexing energy vs. OoO 20

Conclusions Indexing is essential in modern

21 Conclusions Indexing is essential in modern DBMSs Modern CPUs spend significant time in index lookups Not efficient & fall short of extracting parallelism Widx: Specialized widget for index lookups Efficient, parallel & programmable 3x higher throughput, 5.5x energy reduction vs. OoO 21

22 Thanks! 22

Dark Silicon Accelerators for Database Indexing

Dark Silicon Accelerators for Database Indexing Onur Kocberber, Kevin Lim, Babak Falsafi, Partha Ranganathan, Stavros Harizopoulos Dark Silicon and Big Data Challenges Data explosion Data growing faster