Column-Oriented Database Systems

Size: px

Start display at page:

Download "Column-Oriented Database Systems"

Bertina Summers
5 years ago
Views:

1 Column-Oriented Database Systems Tutorial Peter Boncz (CWI) Adapted from VLDB 29 Tutorial Column-Oriented Database Systems with Daniel Abadi (Yale) Stavros Harizopuolos (HP Labs)

2 What is a column-store? row-store column-store Date Store Product Customer Price Date Store Product Customer Price + easy to add/modify a record - might read in unnecessary data + only need to read in relevant data - tuple writes require multiple accesses => suitable for read-mostly, read-intensive, large data repositories Tutorial Column-Oriented Database Systems 2

3 Are these two fundamentally different? The only fundamental difference is the storage layout However: we need to look at the big picture different storage layouts proposed row-stores row-stores++ row-stores++ converge? 7s 8s 9s s today column-stores new applications new bottleneck in hardware How did we get here, and where we are heading What are the column-specific optimizations? Part How do we improve CPU efficiency when operating on Cs Part 2 Tutorial Column-Oriented Database Systems Part

4 Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows and columns ever converge? Part 2: Column-oriented execution Part : MonetDB/X ( VectorWise ) and CPU efficiency Tutorial Column-Oriented Database Systems 4

5 Telco Data Warehousing example Typical DW installation dimension tables Real-world example account or RAM fact table usage source One Size Fits All? - Part 2: Benchmarking Results Stonebraker et al. CIDR 27 toll star schema QUERY 2 SELECT account.account_number, sum (usage.toll_airtime), sum (usage.toll_price) FROM usage, toll, source, account WHERE usage.toll_id = toll.toll_id AND usage.source_id = source.source_id AND usage.account_id = account.account_id AND toll.type_ind in ( AE. AA ) AND usage.toll_price > AND source.type!= CIBER AND toll.rating_method = IS AND usage.invoice_date = 25 GROUP BY account.account_number Column-store Row-store Query 2.6 Query Query.9 Query Query Why? Three main factors (next slides) Tutorial Column-Oriented Database Systems 5

6 Telco example explained (/): read efficiency row store column store read pages containing entire rows read only columns needed one row = 22 columns! in this example: 7 columns is this typical? (it depends) What about vertical partitioning? (it does not work with ad-hoc queries) caveats: select * not any faster clever disk prefetching clever tuple reconstruction Tutorial Column-Oriented Database Systems 6

7 Telco example explained (2/): compression efficiency Columns compress better than rows Typical row-store compression ratio : Column-store : Why? Rows contain values from different domains => more entropy, difficult to dense-pack Columns exhibit significantly less entropy Examples: Male, Female, Female, Female, Male 998, 998, 999, 999, 999, 2 Caveat: CPU cost (use lightweight compression) Tutorial Column-Oriented Database Systems 7

8 Telco example explained (/): sorting & indexing efficiency Compression and dense-packing free up space Use multiple overlapping column collections Sorted columns compress better Range queries are faster Use sparse clustered indexes What about heavily-indexed row-stores? (works well for single column access, cross-column joins become increasingly expensive) Tutorial Column-Oriented Database Systems 8

9 Additional opportunities for column-stores Block-tuple / vectorized processing Easier to build block-tuple operators Amortizes function-call cost, improves CPU cache performance Easier to apply vectorized primitives Software-based: bitwise operations Hardware-based: SIMD Opportunities with compressed columns Avoid decompression: operate directly on compressed Delay decompression (and tuple reconstruction) Also known as: late materialization Part Exploit columnar storage in other DBMS components Physical design (both static and dynamic) more in Part 2 See: Database Cracking, from CWI Tutorial Column-Oriented Database Systems 9

10 Time (sec) Effect on C-Store performance Column-Stores vs Row-Stores: How Different are They Really? Abadi, Hachem, and Madden. SIGMOD 28. Average for SSBM queries on C-store original C-store column-oriented join algorithm enable compression & operate on compressed enable late materialization Tutorial Column-Oriented Database Systems

11 Summary of column-store key features Storage layout columnar storage Part header/id elimination compression Part 2 Part multiple sort orders column operators Part Part 2 Execution engine avoid decompression late materialization Part 2 vectorized operations Part Design tools, optimizer Tutorial Column-Oriented Database Systems

12 Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows and columns ever converge? Part 2: Column-oriented execution Part : MonetDB/X ( VectorWise ) and CPU efficiency Tutorial Column-Oriented Database Systems 2

13 From DSM to Column-stores 7s -985: 985: DSM paper TOD: Time Oriented Database Wiederhold et al. "A Modular, Self-Describing Clinical Databank System," Computers and Biomedical Research, 975 More 97s: Transposed files, Lorie, Batory, Svensson. An overview of cantor: a new system for data analysis Karasalo, Svensson, SSDBM 98 A decomposition storage model Copeland and Khoshafian. SIGMOD s: Commercialization through SybaseIQ Late 9s 2s: Focus on main-memory performance DSM on steroids [997 now] Hybrid DSM/NSM [2 24] CWI: MonetDB Wisconsin: PAX, Fractured Mirrors Michigan: Data Morphing CMU: Clotho 25 : Re-birth of read-optimized DSM as column-store MIT: C-Store CWI: X VectorWise + startups Tutorial Column-Oriented Database Systems

14 The original DSM paper Proposed as an alternative to NSM 2 indexes: clustered on ID, non-clustered on value Speeds up queries projecting few columns Requires more storage ID A decomposition storage model Copeland and Khoshafian. SIGMOD value Tutorial Column-Oriented Database Systems 4

15 Memory wall and PAX 9s: Cache-conscious research to: from: Cache Conscious Algorithms for Relational Query Processing. Shatdal, Kant, Naughton. VLDB 994. Database Architecture Optimized for the New Bottleneck: Memory Access. Boncz, Manegold, Kersten. VLDB 999. and: DBMSs on a modern processor: Where does time go? Ailamaki, DeWitt, Hill, Wood. VLDB 999. PAX: Partition Attributes Across Retains NSM I/O pattern Optimizes cache-to-ram communication Weaving Relations for Cache Performance. Ailamaki, DeWitt, Hill, Skounakis, VLDB 2. Tutorial Column-Oriented Database Systems 5

16 More hybrid NSM/DSM schemes Dynamic PAX: Data Morphing Clotho: custom layout using scatter-gather I/O Fractured mirrors Data morphing: an adaptive, cache-conscious storage technique. Hankins, Patel, VLDB 2. Clotho: Decoupling Memory Page Layout from Storage Organization. Shao, Schindler, Schlosser, Ailamaki, and Ganger. VLDB 24. Smart mirroring with both NSM/DSM copies A Case For Fractured Mirrors. Ramamurthy, DeWitt, Su, VLDB 22. Tutorial Column-Oriented Database Systems 6

17 MonetDB (more in Part ) Late 99s, CWI: Boncz, Manegold, and Kersten Motivation: Main-memory Improve computational efficiency by avoiding expression interpreter DSM with virtual IDs natural choice Developed new query execution algebra Initial contributions: Pointed out memory-wall in DBMSs Cache-conscious projections and joins Tutorial Column-Oriented Database Systems 7

18 25: the (re)birth of column-stores New hardware and application realities Faster CPUs, larger memories, disk bandwidth limit Multi-terabyte Data Warehouses New approach: combine several techniques Read-optimized, fast multi-column access, disk/cpu efficiency, light-weight compression C-store paper: First comprehensive design description of a column-store MonetDB/X proper disk-based column store Explosion of new products Tutorial Column-Oriented Database Systems 8

19 Performance tradeoffs: columns vs. rows DSM traditionally was not favored by technology trends How has this changed? Optimized DSM in Fractured Mirrors, 22 Apples-to-apples comparison Follow-up study Main-memory DSM vs. NSM Flash-disks: a come-back for PAX? Fast Scans and Joins Using Flash Drives Shah, Harizopoulos, Wiener, Graefe. DaMoN 8 Read-Optimized Databases, In- Depth Holloway, DeWitt, VLDB 8 Performance Tradeoffs in Read- Optimized Databases Harizopoulos, Liang, Abadi, Madden, VLDB 6 DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing Boncz, Zukowski, Nes, DaMoN 8 Query Processing Techniques for Solid State Drives Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD 9 9

comparable to column stores sparse B-tree on ID Tutorial Column-Oriented Database Systems 2 A Case For Fractured Mirrors

20 Fractured mirrors: a closer look Store DSM relations inside a B-tree Leaf nodes contain values Eliminate IDs, amortize header overhead Custom implementation on Shore Tuple Header TID 2 Column Data a a2 a 4 a4 5 a5 a Similar: storage density comparable to column stores sparse B-tree on ID Tutorial Column-Oriented Database Systems 2 A Case For Fractured Mirrors Ramamurthy, DeWitt, Su, VLDB a2 a 4 a4 5 a5 a a2 a 4 a4 a5 Efficient columnar storage in B-trees Graefe. Sigmod Record /27.

21 Fractured mirrors: performance From PAX paper: time regular DSM column? row column? columns projected: Chunk-based tuple merging Read in segments of M pages Merge segments in memory Becomes CPU-bound after 5 pages optimized DSM Tutorial Column-Oriented Database Systems 2

22 Column-scanner implementation row scanner Performance Tradeoffs in Read- Optimized Databases Harizopoulos, Liang, Abadi, Madden, VLDB 6 column scanner apply predicate(s) Joe 45 2 Sue 7 Joe 45 S Direct I/O prefetch ~ms worth of data SELECT name, age WHERE age > 4 Joe Sue Joe 45 Tutorial Column-Oriented Database Systems S apply predicate # #POS 45 #POS S

23 Scan performance Large prefetch hides disk seeks in columns Column-CPU efficiency with lower selectivity Row-CPU suffers from memory stalls Memory stalls disappear in narrow tuples Compression: similar to narrow not shown, details in the paper Tutorial Column-Oriented Database Systems 2

24 time (sec) Varying prefetch size 4 2 no competing disk traffic selected bytes per tuple Column 2 Column 8 Column 6 Column 48 (x 28KB) Row (any prefetch size) No prefetching hurts columns in single scans Tutorial Column-Oriented Database Systems 26

25 time (sec) Varying prefetch size 4 2 no competing disk traffic 4 2 with competing disk traffic Column, 48 Row, selected bytes per tuple Column, 8 Row, No prefetching hurts columns in single scans Under competing traffic, columns outperform rows for any prefetch size Tutorial Column-Oriented Database Systems 27

26 CPU Performance DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing Boncz, Zukowski, Nes, DaMoN 8 Playing field: in-flight tuples, regardless input storage How is data stored in temporary in-memory buffers? Idea: on-the-fly conversion between NSM and DSM Insert data storage conversion operators Convert the entire in-flight tuple, or part of it Effectively hybrid layout Should be planned by query optimizer data layout planning Winners: DSM: sequential access (block fits in L2), random in L NSM: random access, SIMD for grouped Aggregation VLDB 29 Tutorial Column-Oriented Database Systems 28

27 Even faster column scans on flash SSDs New-generation SSDs Very fast random reads, slower random writes Fast sequential RW, comparable to HDD arrays No expensive seeks across columns K Read IOps, K Write Iops 25MB/s Read BW, 2MB/s Write FlashScan and Flashjoin: PAX on SSDs, inside Postgres Query Processing Techniques for Solid State Drives Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD 9 mini-pages with no qualified attributes are not accessed Tutorial Column-Oriented Database Systems 2

28 Column-scan performance over time regular DSM (2) column-store (26)..to.2x slower from 7x slower..to 2x slower..to same and x faster! optimized DSM (22) SSD Postgres/PAX (29) Tutorial Column-Oriented Database Systems

29 Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows and columns ever converge? Part 2: Column-oriented execution Part : MonetDB/X ( VectorWise ) and CPU efficiency Tutorial Column-Oriented Database Systems 4

Architecture of a column-store storage layout read-optimized: dense-packed, compressed organize in extends, batch updates multiple sort orders sparse indexes engine block-tuple operators new access

30 Architecture of a column-store storage layout read-optimized: dense-packed, compressed organize in extends, batch updates multiple sort orders sparse indexes engine block-tuple operators new access methods system-level work on compressed data system-wide column support optimized relational operators loading / updates scaling through multiple nodes transactions / redundancy Tutorial Column-Oriented Database Systems 5

31 Continuous Load and Query (Vertica) Hybrid Storage Architecture Trickle Load > Write Optimized Store (WOS) A B C TUPLE MOVER Asynchronous Data Transfer > Read Optimized Store (ROS) On disk Sorted / Compressed Segmented Large data loaded direct Memory based Unsorted / Uncompressed Segmented Low latency / Small quick inserts A B C (A B C A) Tutorial Column-Oriented Database Systems 8

32 Differential Updates in Column Stores Problem Definition: A table T is kept ordered on a sortkey SK table is stored column-wise, compressed Keep updates (INS,MOD,DEL) in a differential structure write store, in C-Store/Vertica Questions: What data structures/algorithms to use for updates? How are read-only queries impacted? Must merge differential structure with base table Maximal update throughput that can be sustained? Given constraints on RAM for differences IO bandwidth per-modification memory cost Currently viable approach for many scenarios Tutorial Column-Oriented Database Systems 4

33 Value-based Delta Tables (VDTs) How is the row-store organized? Simple approach: INS(C..Cn) table Keeps full record DEL(SK..SKm) Only stores keys of deleted tuples Modifications are DEL+INS SQL: T=inventory SK=[store,prod] Physical Relational Algebra: Problem: overhead on queries Always a MergeUnion (and MergeDiff) CPU cost Union/Diff need the SKs Many queries do not need these by themselves Now the column-store must do IO for them anyway Tutorial Column-Oriented Database Systems 4

Positional Updates Remember the position of an update rather than its SK less

unmodified until the next modification position no need to scan SK columns the

34 Positional Updates Remember the position of an update rather than its SK less CPU cost (especially in block-oriented processing) Scan passes tuples through unmodified until the next modification position no need to scan SK columns the SK=(store,prod) so inserts go in between!! TABLEx = state of TABLE at time x SID(t)=StableID=position of tuple t in columns=rid (t) Stable RIDx(t) = RowID of tuple t at time x HIGHLY VOLATILE!! Tutorial Column-Oriented Database Systems 42

35 RIDs and SIDs TABLEx = state of TABLE at time x SID(t)=StableID=position of tuple t in columns=rid (t) Stable RIDx(t) = RowID of tuple t at time x HIGHLY VOLATILE!! The difference between RID and SID: Tutorial Column-Oriented Database Systems 4

36 Enter Positional Delta Trees A counting B-Tree that keeps track of cumulative tuple shifts DIFF^ta_tb maintains a monotonic mapping between RID x (ta) and RID y (tb) DIFF = (PDT,VALS), VALS = value space Merge Operator used by each query to get up-todate table Tutorial Column-Oriented Database Systems 44

= value space Merge Operator used by each query to

37 Enter Positional Delta Trees A counting B-Tree that keeps track of cumulative tuple shifts DIFF^ta_tb maintains a monotonic mapping between RID x (ta) and RID y (tb) DIFF = (PDT,VALS), VALS = value space Merge Operator used by each query to get up-to-date table Tutorial Column-Oriented Database Systems 45

38 PDT-based Transactions Propagate(PDT^t2_t,PDT^t_t2) Requires Consecutive PDTs Serialize(PDT^t2_t,PDT^t_t) PDT^t_t2 Makes Overlapping PDTs Aligned May Fail!! transactional conflict detection Tutorial Column-Oriented Database Systems 46

39 Column-Oriented Database Systems Tutorial Compression Super-Scalar RAM-CPU Cache Compression Zukowski, Heman, Nes, Boncz, ICDE 6 Integrating Compression and Execution in Column- Oriented Database Systems Abadi, Madden, and Ferreira, SIGMOD 6 Query optimization in compressed database systems Chen, Gehrke, Korn, SIGMOD

40 Compression Trades I/O for CPU Increased column-store opportunities: Higher data value locality in column stores Techniques such as run length encoding far more useful Can use extra space to store multiple copies of data in different sort orders Tutorial Column-Oriented Database Systems 57

41 Run-length Encoding Quarter Product ID Price Quarter Product ID Price Q Q Q Q Q Q Q Q2 Q2 Q2 Q (value, start_pos, run_length) (Q,, ) (Q2,, 5) (Q, 65, 5) (Q4, 5, 6) (value, start_pos, run_length) (,, 5) (2, 6, 2) (,, ) (2, 4, ) Tutorial Column-Oriented Database Systems 58

42 Tutorial Column-Oriented Database Systems 59 Bit-vector Encoding Product ID ID: ID: 2 ID: Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 For each unique value, v, in column c, create bit-vector b b[i] = if c[i] = v Good for columns with few unique values Each bit-vector can be further compressed if sparse

43 Dictionary Encoding Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 For each unique value create dictionary entry Dictionary can be per-block or percolumn Column-stores have the advantage that dictionary entries may encode multiple values at once Quarter Q Q2 Q4 Q Q Q Q Q Q2 Q4 Q Q Quarter Dictionary : Q : Q2 2: Q : Q4 OR Quarter Dictionary 24: Q, Q2, Q4, Q 22: Q2, Q4, Q, Q 28: Q, Q, Q, Q Tutorial Column-Oriented Database Systems 6

44 Frame Of Reference Encoding Encodes values as b bit offset from chosen frame of reference Special escape code (e.g. all bits set to ) indicates a difference larger than can be stored in b bits After escape code, original (uncompressed) value is written Compressing Relations and Indexes Goldstein, Ramakrishnan, Shaft, ICDE 98 Price Price Frame: 5 Tutorial Column-Oriented Database Systems bits per value Exceptions (see part for a better way to deal with exceptions)

45 Differential Encoding Encodes values as b bit offset from previous value Special escape code (just like frame of reference encoding) indicates a difference larger than can be stored in b bits After escape code, original (uncompressed) value is written Performs well on columns containing increasing/decreasing sequences inverted lists timestamps object IDs sorted / clustered columns Improved Word-Aligned Binary Compression for Text Indexing Ahn, Moffat, TKDE 6 Time 5: 5:2 5: 5: 5:4 5:6 5:7 5:8 5: 5:5 5:6 5:6 Time 5: :5 2 bits per value Exception (see X for a better way to deal with exceptions) Tutorial Column-Oriented Database Systems 62

46 What Compression Scheme To Use? Does column appear in the sort key? Is the average run-length > 2 yes no yes no Are number of unique values < ~5 RLE Differential Encoding no yes Does this column appear frequently in selection predicates? Is the data numerical and exhibit good locality? yes no yes Frame of Reference Encoding no Leave Data Uncompressed Bit-vector Compression Dictionary Compression OR Heavyweight Compression Tutorial Column-Oriented Database Systems 6

47 Super-Scalar RAM-CPU Cache Compression Zukowski, Heman, Nes, Boncz, ICDE 6 Heavy-Weight Compression Schemes Modern disk arrays can achieve > GB/s / CPU for decompression GB/s needed Lightweight compression schemes are better Even better: operate directly on compressed data Tutorial Column-Oriented Database Systems 64

48 Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 Operating Directly on Compressed Data I/O - CPU tradeoff is no longer a tradeoff Reduces memory CPU bandwidth requirements Opens up possibility of operating on multiple records at once Tutorial Column-Oriented Database Systems 65

49 Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 Operating Directly on Compressed Data Quarter (Q,, ) (Q2,, 6) (Q, 7, 5) (Q4, 87, 6) -6 Index Lookup + Offset jump Product ID 2 ProductID, COUNT(*)) (, ) (2, ) (, 2) SELECT ProductID, Count(*) FROM table WHERE (Quarter = Q2) GROUP BY ProductID Tutorial Column-Oriented Database Systems 66

50 Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 Operating Directly on Compressed Data Aggregation Operator Selection Operator Compression- Aware Scan Operator Data isonevalue() isvaluesorted() isposcontiguous() issparse() getnext() decompressintoarray() getvalueatposition(pos) getmin() getmax() getsize() Block API SELECT ProductID, Count(*) FROM table WHERE (Quarter = Q2) GROUP BY ProductID (Q,, ) (Q2,, 6) (Q, 7, 5) (Q4, 87, 6) Tutorial Column-Oriented Database Systems 67

51 Column-Oriented Database Systems Tutorial Tuple Materialization and Column-Oriented Join Algorithms Materialization Strategies in a Column- Oriented DBMS Abadi, Myers, DeWitt, and Madden. ICDE 27. Self-organizing tuple reconstruction in column-stores, Idreos, Manegold, Kersten, SIGMOD 29 Column-Stores vs Row-Stores: How Different are They Really? Abadi, Madden, and Hachem. SIGMOD 28. Query Processing Techniques for Solid State Drives Tsirogiannis, Harizopoulos Shah, Wiener, and Graefe. SIGMOD 29. Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 24

52 When should columns be projected? Where should column projection operators be placed in a query plan? Row-store: Column projection involves removing unneeded columns from tuples Generally done as early as possible Column-store: Operation is almost completely opposite from a row-store Column projection involves reading needed columns from storage and extracting values for a listed set of tuples This process is called materialization Early materialization: project columns at beginning of query plan Straightforward since there is a one-to-one mapping across columns Late materialization: wait as long as possible for projecting columns More complicated since selection and join operators on one column obfuscates mapping to other columns from same table Most column-stores construct tuples and column projection time Many database interfaces expect output in regular tuples (rows) Rest of discussion will focus on this case Tutorial Column-Oriented Database Systems 69

53 When should tuples be constructed? Select + Aggregate QUERY: SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid Construct Solution : Create rows first (EM). But: (4,,4) Need to construct ALL tuples Need to decompress data Poor memory bandwidth utilization prodid storeid custid price Tutorial Column-Oriented Database Systems 7

54 Solution 2: Operate on columns Data Source custid Data Source prodid AGG AND Data Source price Data Source storeid Data Source prodid Data Source 2 storeid QUERY: SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid prodid storeid custid price Tutorial Column-Oriented Database Systems 7

55 Solution 2: Operate on columns QUERY: Data Source custid Data Source prodid AGG AND Data Source price Data Source storeid AND SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid prodid storeid custid price Tutorial Column-Oriented Database Systems 72

56 Solution 2: Operate on columns QUERY: Data Source custid Data Source prodid AGG AND Data Source price Data Source storeid 2 custid Data Source 8 Data Source price SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid prodid storeid custid price Tutorial Column-Oriented Database Systems 7

57 Solution 2: Operate on columns QUERY: Data Source custid AGG Data Source price 9 SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid Data Source prodid AND Data Source storeid AGG prodid storeid custid price Tutorial Column-Oriented Database Systems 74

58 Time (seconds) Materialization Strategies in a Column-Oriented DBMS Abadi, Myers, DeWitt, and Madden. ICDE 27. For plans without joins, late materialization is a win Low selectivity Medium selectivity High selectivity QUERY: SELECT C, SUM(C 2 ) FROM table WHERE (C < CONST) AND (C 2 < CONST) GROUP BY C Ran on 2 compressed columns from TPC-H scale data Early Materialization Late Materialization Tutorial Column-Oriented Database Systems 75

59 Time (seconds) Materialization Strategies in a Column-Oriented DBMS Abadi, Myers, DeWitt, and Madden. ICDE 27. Even on uncompressed data, late materialization is still a win QUERY: SELECT C, SUM(C 2 ) FROM table WHERE (C < CONST) AND (C 2 < CONST) GROUP BY C 2 Low selectivity Medium selectivity High selectivity Materializing late still works best Early Materialization Late Materialization Tutorial Column-Oriented Database Systems 76

60 What about for plans with joins? Select R.B, R.C, R2.E, R2.H, R.F From R, R2, R Where R.A = R2.D AND R2.G = R.K Project B, C, E, H, F B, C, E, G, H A, B, C B, C, E, H, F Join A = D Join 2 G = K Early materialization D, E, G, H F, K Scan R (F, K, L) B, C A G Join A = D D Join G = K Project Late materialization Scan R (F, K, L) G K F E, H Scan Scan Scan Scan R (A, B, C) R2 (D, E, G, H) R (A, B, C) R2 (D, E, G, H) Tutorial Column-Oriented Database Systems 77 77

61 Early Materialization Example 2 7 QUERY: Green White Brown SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName Construct Construct (4,,4) Green White Brown prodid storeid quantity custid price Facts custid lastname Customers Tutorial Column-Oriented Database Systems 78

62 Early Materialization Example White Brown Brown Brown QUERY: SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName Join Green White Brown Tutorial Column-Oriented Database Systems 79

63 Late Materialization Example QUERY: SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName (4,,4) Join 2 Green White Brown Late materialized join causes out of order probing of projected columns from the inner relation prodid storeid quantity custid price Facts custid lastname Customers Tutorial Column-Oriented Database Systems 8

64 Late Materialized Join Performance Naïve LM join about 2X slower than EM join on typical queries (due to random I/O) This number is very dependent on Amount of memory available Number of projected attributes Join cardinality But we can do better Invisible Join Jive/Flash Join Radix cluster/decluster join Tutorial Column-Oriented Database Systems 8

65 Invisible Join Column-Stores vs Row-Stores: How Different are They Really? Abadi, Madden, and Hachem. SIGMOD 28. Designed for typical joins when data is modeled using a star schema One ( fact ) table is joined with multiple dimension tables Typical query: select c_nation, s_nation, d_year, sum(lo_revenue) as revenue from customer, lineorder, supplier, date where lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_orderdate = d_datekey and c_region = 'ASIA and s_region = 'ASIA and d_year >= 992 and d_year <= 997 group by c_nation, s_nation, d_year order by d_year asc, revenue desc; Tutorial Column-Oriented Database Systems 82

66 Invisible Join Column-Stores vs Row-Stores: How Different are They Really? Abadi, Madden, and Hachem. SIGMOD 28. Apply region = Asia On Customer Table custkey region nation ASIA CHINA 2 ASIA INDIA ASIA INDIA 4 EUROPE FRANCE Apply region = Asia On Supplier Table suppkey region nation ASIA RUSSIA 2 EUROPE SPAIN ASIA JAPAN Apply year in [992,997] On Date Table dateid year Hash Table (or bit-map) Containing Keys, 2 and Hash Table (or bit-map) Containing Keys, Hash Table Containing Keys 997, 2997, and 997 Tutorial Column-Oriented Database Systems 8

67 orderkey Original Fact Table custkey suppkey orderdate revenue Column-Stores vs Row-Stores: How Different are They Really? Abadi et. al. SIGMOD 28. & & = Hash Table Containing Keys, 2 and + custkey 4 4 Hash Table Containing Keys and + suppkey Hash Table Containing Keys 997, 2997, and orderdate Tutorial Column-Oriented Database Systems 84

68 Tutorial Column-Oriented Database Systems 85 custkey suppkey orderdate JOIN CHINA INDIA INDIA = CHINA INDIA RUSSIA SPAIN = RUSSIA RUSSIA = FRANCE JAPAN

69 Invisible Join Column-Stores vs Row-Stores: How Different are They Really? Abadi, Madden, and Hachem. SIGMOD 28. Apply region = Asia On Customer Table custkey region nation ASIA CHINA 2 ASIA INDIA ASIA INDIA 4 EUROPE FRANCE Apply region = Asia On Supplier Table suppkey region nation ASIA RUSSIA 2 EUROPE SPAIN ASIA JAPAN Apply year in [992,997] On Date Table dateid year Hash Table (or bit-map) Containing Keys, 2 and Range [-] (between-predicate rewriting) Hash Table (or bit-map) Containing Keys, Hash Table Containing Keys 997, 2997, and 997 Tutorial Column-Oriented Database Systems 86

70 Invisible Join Bottom Line Many data warehouses model data using star/snowflake schemes Joins of one (fact) table with many dimension tables is common Invisible join takes advantage of this by making sure that the table that can be accessed in position order is the fact table for each join Position lists from the fact table are then intersected (in position order) This reduces the amount of data that must be accessed out of order from the dimension tables Between-predicate rewriting trick not relevant for this discussion Tutorial Column-Oriented Database Systems 87

71 custkey CHINA INDIA INDIA FRANCE = INDIA CHINA suppkey Still accessing table out of order RUSSIA SPAIN JAPAN = RUSSIA RUSSIA orderdate JOIN = Tutorial Column-Oriented Database Systems 88

72 Jive/Flash Join Fast Joins using Join Indices. Li and Ross, VLDBJ 8:-24, CHINA INDIA INDIA FRANCE = INDIA CHINA Still accessing table out of order Query Processing Techniques for Solid State Drives. Tsirogiannis, Harizopoulos et. al. SIGMOD 29. Tutorial Column-Oriented Database Systems 89

73 Jive/Flash Join. Add column with dense ascending integers from 2. Sort new position list by second column. Probe projected column in order using new sorted position list, keeping first column from position list around 4. Sort new result by first column CHINA INDIA INDIA FRANCE CHINA INDIA INDIA FRANCE = INDIA CHINA = 2 CHINA INDIA 2 INDIA CHINA Tutorial Column-Oriented Database Systems 9

74 Jive/Flash Join Bottom Line Instead of probing projected columns from inner table out of order: Sort join index Probe projected columns in order Sort result using an added column LM vs EM tradeoffs: LM has the extra sorts (EM accesses all columns in order) LM only has to fit join columns into memory (EM needs join columns and all projected columns) Results in big memory and CPU savings (see part for why there is CPU savings) LM only has to materialize relevant columns In many cases LM advantages outweigh disadvantages LM would be a clear winner if not for those pesky sorts can we do better? Tutorial Column-Oriented Database Systems 9

75 Radix Cluster/Decluster The full sort from the Jive join is actually overkill We just want to access the storage blocks in order (we don t mind random access within a block) So do a radix sort and stop early By stopping early, data within each block is accessed out of order, but in the order specified in the original join index Use this pseudo-order to accelerate the post-probe sort as well Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Tutorial Column-Oriented Database Systems 92

76 Radix Cluster/Decluster Bottom line Both sorts from the Jive join can be significantly reduced in overhead Only been tested when there is sufficient memory for the entire join index to be stored three times Technique is likely applicable to larger join indexes, but utility will go down a little Only works if random access within a storage block Don t want to use radix cluster/decluster if you have variablewidth column values or compression schemes that can only be decompressed starting from the beginning of the block Tutorial Column-Oriented Database Systems 9

77 LM vs EM joins Invisible, Jive, Flash, Cluster, Decluster techniques contain a bag of tricks to improve LM joins Research papers show that LM joins become 2X faster than EM joins (instead of 2X slower) for a wide array of query types Tutorial Column-Oriented Database Systems 94

78 Tuple Construction Heuristics For queries with selective predicates, aggregations, or compressed data, use late materialization For joins: Research papers: Always use late materialization Commercial systems: Inner table to a join often materialized before join (reduces system complexity): Some systems will use LM only if columns from inner table can fit entirely in memory Tutorial Column-Oriented Database Systems 95

Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows

79 Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows and columns ever converge? Part 2: Column-oriented execution Part : MonetDB/X ( VectorWise ) and CPU efficiency Tutorial Column-Oriented Database Systems 96

80 Outline Computational Efficiency of DB on modern hardware how column-stores can help here MonetDB & VectorWise in more depth CPU efficient column compression vectorized decompression Conclusions future work Tutorial Column-Oriented Database Systems 97

81 Column-Oriented Database Systems Tutorial 4 years of hardware evolution vs. DBMS computational efficiency

82 CPU Architecture Elements: Storage CPU caches L/L2/L Registers Execution Unit(s) Pipelined SIMD Tutorial Column-Oriented Database Systems 99

83 Super-Scalar Execution (pipelining) Tutorial Column-Oriented Database Systems 2

Hazards Data hazards Dependencies between instructions L data cache misses Control Hazards Branch mispredictions Computed branches (late binding) L instruction cache

84 Hazards Data hazards Dependencies between instructions L data cache misses Control Hazards Branch mispredictions Computed branches (late binding) L instruction cache misses Result: bubbles in the pipeline Out-of-order execution addresses data hazards control hazards typically more expensive Tutorial Column-Oriented Database Systems

85 SIMD Single Instruction Multiple Data Same operation applied on a vector of values MMX: 64 bits, SSE: 28bits, AVX: 256bits SSE, e.g. multiply 8 short integers Tutorial Column-Oriented Database Systems 4

86 A Look at the Query Pipeline 2 ivan 5 next() 2 ivan 7 77 * PROJECT SELECT id, name (age-)*5 AS bonus FROM employee WHERE age > next() 22 7 >? 2 ivan alice 7 22 FALSE TRUE SELECT next() 2 alice ivan 22 7 SCAN Tutorial Column-Oriented Database Systems 5

87 A Look at the Query Pipeline 2 ivan 5 next() next() next() 77 * >? PROJECT SELECT SCAN Operators Iterator interface -open() -next(): tuple -close() Tutorial Column-Oriented Database Systems 6

88 A Look at the Query Pipeline 2 ivan 5 next() 2 ivan 7 77 * Primitives next() next() 2 2 ivan alice 7 22 alice ivan >? PROJECT FALSE TRUE SELECT SCAN Provide computational functionality All arithmetic allowed in expressions, e.g. Multiplication 7 * 5 mult(int,int) int Tutorial Column-Oriented Database Systems 7

Database Architecture causes Hazards Data hazards Dependencies between instructions L data cache misses SIMD Out-of-order Execution Control Hazards Branch mispredictions Computed branches (late

89 Database Architecture causes Hazards Data hazards Dependencies between instructions L data cache misses SIMD Out-of-order Execution Control Hazards Branch mispredictions Computed branches (late binding) L instruction cache misses work on one tuple at a time Large Tree/Hash Structures Code footprint of all operators in query plan exceeds L cache Data-dependent conditions PROJECT SELECT Next() late binding SCAN method calls DBMSs On A Modern Processor: Where Does Time Go? Tree, List, Hash traversal Complex NSM record navigation Ailamaki, DeWitt, Hill, Wood, VLDB 99 Operators Iterator interface -open() -next(): tuple -close() Tutorial Column-Oriented Database Systems 8

90 DBMS Computational Efficiency TPC-H GB, query selects 98% of fact table, computes net prices and aggregates all Results: C program:? MySQL: 26.2s DBMS X : 28.s MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Tutorial Column-Oriented Database Systems 9

91 DBMS Computational Efficiency TPC-H GB, query selects 98% of fact table, computes net prices and aggregates all Results: C program:.2s MySQL: 26.2s DBMS X : 28.s MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Tutorial Column-Oriented Database Systems

92 Column-Oriented Database Systems Tutorial MonetDB

93 a column-store save disk I/O when scan-intensive queries columns avoid an expression interpreter to improve computational efficiency need a few Tutorial Column-Oriented Database Systems 2

94 RISC Database Algebra CPU? Give it nice code! - few dependencies (control,data) - CPU gets out-of-order execution - compiler can e.g. generate SIMD One loop for an entire column batcalc_minus_int(int* - no per-tuple interpretationres, int* col, - arrays: no record navigation int val, - better instruction cache int locality n) { SELECT for(i=; id, name, i<n; (age-)*5 i++) as bonus FROM res[i] people = col[i] - val; } WHERE age > Simple, hardcoded semantics in operators MATERIALIZED intermediate results Tutorial Column-Oriented Database Systems

95 a column-store save disk I/O when scan-intensive queries need a few columns avoid an expression interpreter to improve computational efficiency How? RISC query algebra: hard-coded semantics Decompose complex expressions in multiple operations Operators only handle simple arrays No code that handles slotted buffered record layout Relational algebra becomes array manipulation language Often SIMD for free Plus: use of cache-conscious algorithms for Sort/Aggr/Join Tutorial Column-Oriented Database Systems 4

96 You want efficiency Simple hard-coded operators I take scalability a Faustian pact Result materialization C program:.2s MonetDB:.7s MySQL: 26.2s DBMS X : 28.s Tutorial Column-Oriented Database Systems 5

relational engine Boncz, Grust, vankeulen, Rittinger, Teubner, SIGMOD 6 SW-Store: a vertically partitioned DBMS

97 SIGMOD 985 MIL Primitives for Querying a Fragmented World, Boncz, Kersten, VLDBJ 98 Flattening an Object Algebra to Provide Performance Boncz, Wilschut, Kersten, ICDE 98 MonetDB/XQuery: a fast XQuery processor powered by a relational engine Boncz, Grust, vankeulen, Rittinger, Teubner, SIGMOD 6 SW-Store: a vertically partitioned DBMS for Semantic Web data management Abadi, Marcus, Madden, Hollenbach, VLDBJ 9 Tutorial Column-Oriented Database Systems 7

98 Cache-Conscious Joins Cost Models, Radix-cluster, Radix-decluster MonetDB/XQuery: structural joins exploiting positional column access Cracking: on-the-fly automatic indexing without workload knowledge Recycling: optimizer frontend backend as a research platform using materialized intermediates Run-time Query Optimization: correlation-aware run-time optimization without cost model Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 MonetDB/XQuery: a fast XQuery processor powered by a relational engine Boncz, Grust, vankeulen, Rittinger, Teubner, SIGMOD 6 Database Cracking, CIDR 7 Updating a cracked database, SIGMOD 7 Self-organizing tuple reconstruction in columnstores, SIGMOD 9 (all Idreos, Manegold, Kersten) An architecture for recycling intermediates in a column-store, Ivanova, Kersten, Nes, Goncalves, SIGMOD 9 ROX: run-time optimization of XQueries, Abdelkader, Boncz, Manegold, vankeulen, SIGMOD 9 Tutorial Column-Oriented Database Systems 9

99 Cache-Conscious Joins Radix-Cluster Radix-Partitioned Hash Join create partitions << CPU cache small partitions many partitions many partitions multiple passes needed Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) VLDB 29 Tutorial Column-Oriented Database Systems 2

100 Radix-Partitioned Hash Join create partitions << CPU cache small partitions many partitions many partitions multiple passes needed Radix-Cluster Cache-Conscious Joins Radix-Cluster Radix-Sort with early stopping Each pass looks at Bi higher-most radix bits Splitting each input cluster into 2 Bi output clusters leaves relation partially ordered Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) VLDB 29 Tutorial Column-Oriented Database Systems 2

Cache-Conscious Joins Radix-Cluster Multiple clustering passes Limit number of clusters per pass Avoid cache/tlb trashing Trade memory cost for CPU cost Any data type (hashing) Database Architecture

101 Cache-Conscious Joins Radix-Cluster Multiple clustering passes Limit number of clusters per pass Avoid cache/tlb trashing Trade memory cost for CPU cost Any data type (hashing) Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) VLDB 29 Tutorial Column-Oriented Database Systems 22

102 Cache-Conscious Joins Accurate Cache Miss Cost Modeling VLDB 29 Tutorial Column-Oriented Database Systems 2

103 Cache-Conscious Joins Partitioned Hash Join VLDB 29 Tutorial Column-Oriented Database Systems 24

104 Cache-Conscious Joins Radix-Clustered Hash-Join: overall perf (64,, tuples) VLDB 29 Tutorial Column-Oriented Database Systems 25

105 Cache-Conscious Joins Pre-Projection vs. Post-Projection VLDB 29 Tutorial Column-Oriented Database Systems 26

106 Cache-Conscious Joins Pre-Projection vs. Post-Projection Radix-Decluster = cache-conscious post-projection VLDB 29 Tutorial Column-Oriented Database Systems 27

107 Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Partitioned Hash-Join Join Index! Join Indices Valduriez, TODS 87 VLDB 29 Tutorial Column-Oriented Database Systems 28

108 Partitioned Hash-Join Cluster on Left Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 29

109 Partitioned Hash-Join Cluster on Left Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB N- VLDB 29 Tutorial Column-Oriented Database Systems

110 Partitioned Hash-Join Cluster on Left Project Left Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems

111 Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Partitioned Hash-Join Cluster on Left Project Left Cluster on Right VLDB 29 Tutorial Column-Oriented Database Systems 2

112 Partitioned Hash-Join Cluster on Left Project Left Cluster on Right Project Right Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Radix-Decluster VLDB 29 Tutorial Column-Oriented Database Systems

113 Red column forms a dense domain {,,2,,N-} All subsequences are ordered e.g. Cache-Conscious Joins Radix-Decluster Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Task: merge subsequences into dense sequence VLDB 29 Tutorial Column-Oriented Database Systems 4

114 Red column forms a dense domain {,,2,,N-} All subsequences are ordered e.g. Cache-Conscious Joins Radix-Decluster Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Task: merge subsequences into dense sequence Approach : merge H=2 B lists cost O(N log(h)) many (H) cursors needed cache thrashing VLDB 29 Tutorial Column-Oriented Database Systems 5

115 Red column forms a dense domain {,,2,,N-} All subsequences are ordered e.g. Cache-Conscious Joins Radix-Decluster Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Task: merge subsequences into dense sequence Approach : merge H=2 B lists cost O(N log(h)), many (H) cursors needed cache thrashing Approach 2: insert by position many (H) sparse passes over the result no cache reuse VLDB 29 Tutorial Column-Oriented Database Systems 6

116 Red column forms a dense domain {,,2,,N-} All subsequences are ordered e.g. Cache-Conscious Joins Radix-Decluster Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Task: merge subsequences into dense sequence Approach : merge H=2 B lists cost O(N log(h)) many (H) cursors needed cache thrashing Approach 2: insert by position many (H) sparse passes over the result no cache reuse Radix-Decluster: insert by position with sliding window VLDB 29 Tutorial Column-Oriented Database Systems 7

117 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 8

118 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 clusters VLDB 29 Tutorial Column-Oriented Database Systems 9

119 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 destination positions VLDB 29 Tutorial Column-Oriented Database Systems 4

120 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Projection column (still) in wrong order VLDB 29 Tutorial Column-Oriented Database Systems 4

121 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Result column to fill insertion window of size 2 VLDB 29 Tutorial Column-Oriented Database Systems 42

122 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 4

123 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 44

124 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 45

125 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 46

126 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 47

127 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 48

128 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 49

129 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 5

130 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 5

131 Cache-Conscious Joins Radix-Decluster Memory Access Pattern Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Random access only to sliding window(<< cache size) Only compulsary misses cache-conscious VLDB 29 Tutorial Column-Oriented Database Systems 52

132 Cache-Conscious Joins Radix-Decluster Performance Tradeoff Radix-Decluster prefers small W (i.e. vertical fragmentation) Read Also: - Jive Join - Slam Join Fast Joins Using Join Indices Ross, Lei, VLDBJ 98 VLDB 29 Tutorial Column-Oriented Database Systems 5

133 Cache-Conscious Joins Pre-Projection vs. Post-Projection Query Processing Techniques for Solid State Drives Tsirogiannis, Harizopoulos,Shah, Wiener, Graefe, SIGMOD 9 VLDB 29 Tutorial Column-Oriented Database Systems 54

134 Column-Oriented Database Systems VLDB 29 Tutorial MonetDB/X vectorized query processing

135 MonetDB spin-off: MonetDB/X Materialization vs Pipelining Tutorial Column-Oriented Database Systems 78

136 MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Vectorized In Cache Processing Observations: next() called much less often more vector time = spent array in of primitives ~ less in overhead processed in a tight loop primitive calls process an array of values CPU cache in a loop: Resident CPU Efficiency depends on nice code - out-of-order execution - few dependencies (control,data) - compiler support Compilers like simple loops over arrays - loop-pipelining - automatic SIMD >? FALSE TRUE TRUE FALSE next() next() * 5 next() 2 ivan 5 4 for(i=; peggy 75 i<n; i++) 2 4 res[i] = - (col[i] * 5 > x) ivan peggy for(i=; i<n; > i++)? PROJECT res[i] alice = 22(col[i] FALSE - x) 2 ivan 7 TRUE 4 peggy 45 TRUE for(i=; 5 victor i<n; 25 FALSE i++) SELECT res[i] = (col[i] * x) alice ivan peggy victor SCAN Tutorial Column-Oriented Database Systems 79

137 map_mul_flt_val_flt_col( map_mul_flt_val_flt_col( float *res, float *res, int* sel, int* sel, float val, float val, Tricks float float being *col, int *col, int played: n) n) -{ { Late materialization if (many selected and narrow) for(int i=; i<n; i++) - Materialization for(int i=; i<n; res[i] val * avoidance i++) // SIMD!! col[sel[i]]; using } selection res[i] = val vectors * col[i]; else for(int i=; i<n; i++) selection vectors used to reduce res[i] = val * col[sel[i]]; vector copying } contain selected positions push selections up to get SIMD MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 next() next() next() alice ivan peggy victor * >? PROJECT SELECT SCAN Tutorial Column-Oriented Database Systems 8

138 MonetDB/X Both efficiency Vectorized primitives and scalability.. Pipelined query evaluation MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 C program:.2s MonetDB/X:.6s MonetDB:.7s MySQL: 26.2s DBMS X : 28.s Tutorial Column-Oriented Database Systems 8

139 Memory Hierarchy X query engine MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 CPU cache RAM ColumnBM (buffer manager) (raid) Disk(s) Tutorial Column-Oriented Database Systems 82

140 Memory Hierarchy X query engine MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Vectors are only the in-cache representation RAM & disk representation might actually be different CPU cache (vectorwise uses both PAX & DSM) RAM ColumnBM (buffer manager) (raid) Disk(s) Tutorial Column-Oriented Database Systems 8

141 Varying the Vector size MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Less and less iterator.next() and primitive function calls ( interpretation overhead ) Tutorial Column-Oriented Database Systems 85

142 Varying the Vector size MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Vectors start to exceed the CPU cache, causing additional memory traffic Tutorial Column-Oriented Database Systems 86

143 Varying the Vector size MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 The benefit of selection vectors Tutorial Column-Oriented Database Systems 87

144 MonetDB/MIL materializes columns MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 CPU cache RAM ColumnBM (buffer manager) (raid) Disk(s) Tutorial Column-Oriented Database Systems 88

COLUMN STORE DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: Column Stores - SoSe

COLUMN STORE DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: Column Stores - SoSe COLUMN STORE DATABASE SYSTEMS Prof. Dr. Uta Störl Big Data Technologies: Column Stores - SoSe 2016 1 Telco Data Warehousing Example (Real Life) Michael Stonebraker et al.: One Size Fits All? Part 2: Benchmarking