Column-Oriented Database Systems

Size: px
Start display at page:

Download "Column-Oriented Database Systems"

Transcription

1 Column-Oriented Database Systems Tutorial Peter Boncz (CWI) Adapted from VLDB 29 Tutorial Column-Oriented Database Systems with Daniel Abadi (Yale) Stavros Harizopuolos (HP Labs)

2 What is a column-store? row-store column-store Date Store Product Customer Price Date Store Product Customer Price + easy to add/modify a record - might read in unnecessary data + only need to read in relevant data - tuple writes require multiple accesses => suitable for read-mostly, read-intensive, large data repositories Tutorial Column-Oriented Database Systems 2

3 Are these two fundamentally different? The only fundamental difference is the storage layout However: we need to look at the big picture different storage layouts proposed row-stores row-stores++ row-stores++ converge? 7s 8s 9s s today column-stores new applications new bottleneck in hardware How did we get here, and where we are heading What are the column-specific optimizations? Part How do we improve CPU efficiency when operating on Cs Part 2 Tutorial Column-Oriented Database Systems Part

4 Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows and columns ever converge? Part 2: Column-oriented execution Part : MonetDB/X ( VectorWise ) and CPU efficiency Tutorial Column-Oriented Database Systems 4

5 Telco Data Warehousing example Typical DW installation dimension tables Real-world example account or RAM fact table usage source One Size Fits All? - Part 2: Benchmarking Results Stonebraker et al. CIDR 27 toll star schema QUERY 2 SELECT account.account_number, sum (usage.toll_airtime), sum (usage.toll_price) FROM usage, toll, source, account WHERE usage.toll_id = toll.toll_id AND usage.source_id = source.source_id AND usage.account_id = account.account_id AND toll.type_ind in ( AE. AA ) AND usage.toll_price > AND source.type!= CIBER AND toll.rating_method = IS AND usage.invoice_date = 25 GROUP BY account.account_number Column-store Row-store Query 2.6 Query Query.9 Query Query Why? Three main factors (next slides) Tutorial Column-Oriented Database Systems 5

6 Telco example explained (/): read efficiency row store column store read pages containing entire rows read only columns needed one row = 22 columns! in this example: 7 columns is this typical? (it depends) What about vertical partitioning? (it does not work with ad-hoc queries) caveats: select * not any faster clever disk prefetching clever tuple reconstruction Tutorial Column-Oriented Database Systems 6

7 Telco example explained (2/): compression efficiency Columns compress better than rows Typical row-store compression ratio : Column-store : Why? Rows contain values from different domains => more entropy, difficult to dense-pack Columns exhibit significantly less entropy Examples: Male, Female, Female, Female, Male 998, 998, 999, 999, 999, 2 Caveat: CPU cost (use lightweight compression) Tutorial Column-Oriented Database Systems 7

8 Telco example explained (/): sorting & indexing efficiency Compression and dense-packing free up space Use multiple overlapping column collections Sorted columns compress better Range queries are faster Use sparse clustered indexes What about heavily-indexed row-stores? (works well for single column access, cross-column joins become increasingly expensive) Tutorial Column-Oriented Database Systems 8

9 Additional opportunities for column-stores Block-tuple / vectorized processing Easier to build block-tuple operators Amortizes function-call cost, improves CPU cache performance Easier to apply vectorized primitives Software-based: bitwise operations Hardware-based: SIMD Opportunities with compressed columns Avoid decompression: operate directly on compressed Delay decompression (and tuple reconstruction) Also known as: late materialization Part Exploit columnar storage in other DBMS components Physical design (both static and dynamic) more in Part 2 See: Database Cracking, from CWI Tutorial Column-Oriented Database Systems 9

10 Time (sec) Effect on C-Store performance Column-Stores vs Row-Stores: How Different are They Really? Abadi, Hachem, and Madden. SIGMOD 28. Average for SSBM queries on C-store original C-store column-oriented join algorithm enable compression & operate on compressed enable late materialization Tutorial Column-Oriented Database Systems

11 Summary of column-store key features Storage layout columnar storage Part header/id elimination compression Part 2 Part multiple sort orders column operators Part Part 2 Execution engine avoid decompression late materialization Part 2 vectorized operations Part Design tools, optimizer Tutorial Column-Oriented Database Systems

12 Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows and columns ever converge? Part 2: Column-oriented execution Part : MonetDB/X ( VectorWise ) and CPU efficiency Tutorial Column-Oriented Database Systems 2

13 From DSM to Column-stores 7s -985: 985: DSM paper TOD: Time Oriented Database Wiederhold et al. "A Modular, Self-Describing Clinical Databank System," Computers and Biomedical Research, 975 More 97s: Transposed files, Lorie, Batory, Svensson. An overview of cantor: a new system for data analysis Karasalo, Svensson, SSDBM 98 A decomposition storage model Copeland and Khoshafian. SIGMOD s: Commercialization through SybaseIQ Late 9s 2s: Focus on main-memory performance DSM on steroids [997 now] Hybrid DSM/NSM [2 24] CWI: MonetDB Wisconsin: PAX, Fractured Mirrors Michigan: Data Morphing CMU: Clotho 25 : Re-birth of read-optimized DSM as column-store MIT: C-Store CWI: X VectorWise + startups Tutorial Column-Oriented Database Systems

14 The original DSM paper Proposed as an alternative to NSM 2 indexes: clustered on ID, non-clustered on value Speeds up queries projecting few columns Requires more storage ID A decomposition storage model Copeland and Khoshafian. SIGMOD value Tutorial Column-Oriented Database Systems 4

15 Memory wall and PAX 9s: Cache-conscious research to: from: Cache Conscious Algorithms for Relational Query Processing. Shatdal, Kant, Naughton. VLDB 994. Database Architecture Optimized for the New Bottleneck: Memory Access. Boncz, Manegold, Kersten. VLDB 999. and: DBMSs on a modern processor: Where does time go? Ailamaki, DeWitt, Hill, Wood. VLDB 999. PAX: Partition Attributes Across Retains NSM I/O pattern Optimizes cache-to-ram communication Weaving Relations for Cache Performance. Ailamaki, DeWitt, Hill, Skounakis, VLDB 2. Tutorial Column-Oriented Database Systems 5

16 More hybrid NSM/DSM schemes Dynamic PAX: Data Morphing Clotho: custom layout using scatter-gather I/O Fractured mirrors Data morphing: an adaptive, cache-conscious storage technique. Hankins, Patel, VLDB 2. Clotho: Decoupling Memory Page Layout from Storage Organization. Shao, Schindler, Schlosser, Ailamaki, and Ganger. VLDB 24. Smart mirroring with both NSM/DSM copies A Case For Fractured Mirrors. Ramamurthy, DeWitt, Su, VLDB 22. Tutorial Column-Oriented Database Systems 6

17 MonetDB (more in Part ) Late 99s, CWI: Boncz, Manegold, and Kersten Motivation: Main-memory Improve computational efficiency by avoiding expression interpreter DSM with virtual IDs natural choice Developed new query execution algebra Initial contributions: Pointed out memory-wall in DBMSs Cache-conscious projections and joins Tutorial Column-Oriented Database Systems 7

18 25: the (re)birth of column-stores New hardware and application realities Faster CPUs, larger memories, disk bandwidth limit Multi-terabyte Data Warehouses New approach: combine several techniques Read-optimized, fast multi-column access, disk/cpu efficiency, light-weight compression C-store paper: First comprehensive design description of a column-store MonetDB/X proper disk-based column store Explosion of new products Tutorial Column-Oriented Database Systems 8

19 Performance tradeoffs: columns vs. rows DSM traditionally was not favored by technology trends How has this changed? Optimized DSM in Fractured Mirrors, 22 Apples-to-apples comparison Follow-up study Main-memory DSM vs. NSM Flash-disks: a come-back for PAX? Fast Scans and Joins Using Flash Drives Shah, Harizopoulos, Wiener, Graefe. DaMoN 8 Read-Optimized Databases, In- Depth Holloway, DeWitt, VLDB 8 Performance Tradeoffs in Read- Optimized Databases Harizopoulos, Liang, Abadi, Madden, VLDB 6 DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing Boncz, Zukowski, Nes, DaMoN 8 Query Processing Techniques for Solid State Drives Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD 9 9

20 Fractured mirrors: a closer look Store DSM relations inside a B-tree Leaf nodes contain values Eliminate IDs, amortize header overhead Custom implementation on Shore Tuple Header TID 2 Column Data a a2 a 4 a4 5 a5 a Similar: storage density comparable to column stores sparse B-tree on ID Tutorial Column-Oriented Database Systems 2 A Case For Fractured Mirrors Ramamurthy, DeWitt, Su, VLDB a2 a 4 a4 5 a5 a a2 a 4 a4 a5 Efficient columnar storage in B-trees Graefe. Sigmod Record /27.

21 Fractured mirrors: performance From PAX paper: time regular DSM column? row column? columns projected: Chunk-based tuple merging Read in segments of M pages Merge segments in memory Becomes CPU-bound after 5 pages optimized DSM Tutorial Column-Oriented Database Systems 2

22 Column-scanner implementation row scanner Performance Tradeoffs in Read- Optimized Databases Harizopoulos, Liang, Abadi, Madden, VLDB 6 column scanner apply predicate(s) Joe 45 2 Sue 7 Joe 45 S Direct I/O prefetch ~ms worth of data SELECT name, age WHERE age > 4 Joe Sue Joe 45 Tutorial Column-Oriented Database Systems S apply predicate # #POS 45 #POS S

23 Scan performance Large prefetch hides disk seeks in columns Column-CPU efficiency with lower selectivity Row-CPU suffers from memory stalls Memory stalls disappear in narrow tuples Compression: similar to narrow not shown, details in the paper Tutorial Column-Oriented Database Systems 2

24 time (sec) Varying prefetch size 4 2 no competing disk traffic selected bytes per tuple Column 2 Column 8 Column 6 Column 48 (x 28KB) Row (any prefetch size) No prefetching hurts columns in single scans Tutorial Column-Oriented Database Systems 26

25 time (sec) Varying prefetch size 4 2 no competing disk traffic 4 2 with competing disk traffic Column, 48 Row, selected bytes per tuple Column, 8 Row, No prefetching hurts columns in single scans Under competing traffic, columns outperform rows for any prefetch size Tutorial Column-Oriented Database Systems 27

26 CPU Performance DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing Boncz, Zukowski, Nes, DaMoN 8 Playing field: in-flight tuples, regardless input storage How is data stored in temporary in-memory buffers? Idea: on-the-fly conversion between NSM and DSM Insert data storage conversion operators Convert the entire in-flight tuple, or part of it Effectively hybrid layout Should be planned by query optimizer data layout planning Winners: DSM: sequential access (block fits in L2), random in L NSM: random access, SIMD for grouped Aggregation VLDB 29 Tutorial Column-Oriented Database Systems 28

27 Even faster column scans on flash SSDs New-generation SSDs Very fast random reads, slower random writes Fast sequential RW, comparable to HDD arrays No expensive seeks across columns K Read IOps, K Write Iops 25MB/s Read BW, 2MB/s Write FlashScan and Flashjoin: PAX on SSDs, inside Postgres Query Processing Techniques for Solid State Drives Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD 9 mini-pages with no qualified attributes are not accessed Tutorial Column-Oriented Database Systems 2

28 Column-scan performance over time regular DSM (2) column-store (26)..to.2x slower from 7x slower..to 2x slower..to same and x faster! optimized DSM (22) SSD Postgres/PAX (29) Tutorial Column-Oriented Database Systems

29 Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows and columns ever converge? Part 2: Column-oriented execution Part : MonetDB/X ( VectorWise ) and CPU efficiency Tutorial Column-Oriented Database Systems 4

30 Architecture of a column-store storage layout read-optimized: dense-packed, compressed organize in extends, batch updates multiple sort orders sparse indexes engine block-tuple operators new access methods system-level work on compressed data system-wide column support optimized relational operators loading / updates scaling through multiple nodes transactions / redundancy Tutorial Column-Oriented Database Systems 5

31 Continuous Load and Query (Vertica) Hybrid Storage Architecture Trickle Load > Write Optimized Store (WOS) A B C TUPLE MOVER Asynchronous Data Transfer > Read Optimized Store (ROS) On disk Sorted / Compressed Segmented Large data loaded direct Memory based Unsorted / Uncompressed Segmented Low latency / Small quick inserts A B C (A B C A) Tutorial Column-Oriented Database Systems 8

32 Differential Updates in Column Stores Problem Definition: A table T is kept ordered on a sortkey SK table is stored column-wise, compressed Keep updates (INS,MOD,DEL) in a differential structure write store, in C-Store/Vertica Questions: What data structures/algorithms to use for updates? How are read-only queries impacted? Must merge differential structure with base table Maximal update throughput that can be sustained? Given constraints on RAM for differences IO bandwidth per-modification memory cost Currently viable approach for many scenarios Tutorial Column-Oriented Database Systems 4

33 Value-based Delta Tables (VDTs) How is the row-store organized? Simple approach: INS(C..Cn) table Keeps full record DEL(SK..SKm) Only stores keys of deleted tuples Modifications are DEL+INS SQL: T=inventory SK=[store,prod] Physical Relational Algebra: Problem: overhead on queries Always a MergeUnion (and MergeDiff) CPU cost Union/Diff need the SKs Many queries do not need these by themselves Now the column-store must do IO for them anyway Tutorial Column-Oriented Database Systems 4

34 Positional Updates Remember the position of an update rather than its SK less CPU cost (especially in block-oriented processing) Scan passes tuples through unmodified until the next modification position no need to scan SK columns the SK=(store,prod) so inserts go in between!! TABLEx = state of TABLE at time x SID(t)=StableID=position of tuple t in columns=rid (t) Stable RIDx(t) = RowID of tuple t at time x HIGHLY VOLATILE!! Tutorial Column-Oriented Database Systems 42

35 RIDs and SIDs TABLEx = state of TABLE at time x SID(t)=StableID=position of tuple t in columns=rid (t) Stable RIDx(t) = RowID of tuple t at time x HIGHLY VOLATILE!! The difference between RID and SID: Tutorial Column-Oriented Database Systems 4

36 Enter Positional Delta Trees A counting B-Tree that keeps track of cumulative tuple shifts DIFF^ta_tb maintains a monotonic mapping between RID x (ta) and RID y (tb) DIFF = (PDT,VALS), VALS = value space Merge Operator used by each query to get up-todate table Tutorial Column-Oriented Database Systems 44

37 Enter Positional Delta Trees A counting B-Tree that keeps track of cumulative tuple shifts DIFF^ta_tb maintains a monotonic mapping between RID x (ta) and RID y (tb) DIFF = (PDT,VALS), VALS = value space Merge Operator used by each query to get up-to-date table Tutorial Column-Oriented Database Systems 45

38 PDT-based Transactions Propagate(PDT^t2_t,PDT^t_t2) Requires Consecutive PDTs Serialize(PDT^t2_t,PDT^t_t) PDT^t_t2 Makes Overlapping PDTs Aligned May Fail!! transactional conflict detection Tutorial Column-Oriented Database Systems 46

39 Column-Oriented Database Systems Tutorial Compression Super-Scalar RAM-CPU Cache Compression Zukowski, Heman, Nes, Boncz, ICDE 6 Integrating Compression and Execution in Column- Oriented Database Systems Abadi, Madden, and Ferreira, SIGMOD 6 Query optimization in compressed database systems Chen, Gehrke, Korn, SIGMOD

40 Compression Trades I/O for CPU Increased column-store opportunities: Higher data value locality in column stores Techniques such as run length encoding far more useful Can use extra space to store multiple copies of data in different sort orders Tutorial Column-Oriented Database Systems 57

41 Run-length Encoding Quarter Product ID Price Quarter Product ID Price Q Q Q Q Q Q Q Q2 Q2 Q2 Q (value, start_pos, run_length) (Q,, ) (Q2,, 5) (Q, 65, 5) (Q4, 5, 6) (value, start_pos, run_length) (,, 5) (2, 6, 2) (,, ) (2, 4, ) Tutorial Column-Oriented Database Systems 58

42 Tutorial Column-Oriented Database Systems 59 Bit-vector Encoding Product ID ID: ID: 2 ID: Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 For each unique value, v, in column c, create bit-vector b b[i] = if c[i] = v Good for columns with few unique values Each bit-vector can be further compressed if sparse

43 Dictionary Encoding Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 For each unique value create dictionary entry Dictionary can be per-block or percolumn Column-stores have the advantage that dictionary entries may encode multiple values at once Quarter Q Q2 Q4 Q Q Q Q Q Q2 Q4 Q Q Quarter Dictionary : Q : Q2 2: Q : Q4 OR Quarter Dictionary 24: Q, Q2, Q4, Q 22: Q2, Q4, Q, Q 28: Q, Q, Q, Q Tutorial Column-Oriented Database Systems 6

44 Frame Of Reference Encoding Encodes values as b bit offset from chosen frame of reference Special escape code (e.g. all bits set to ) indicates a difference larger than can be stored in b bits After escape code, original (uncompressed) value is written Compressing Relations and Indexes Goldstein, Ramakrishnan, Shaft, ICDE 98 Price Price Frame: 5 Tutorial Column-Oriented Database Systems bits per value Exceptions (see part for a better way to deal with exceptions)

45 Differential Encoding Encodes values as b bit offset from previous value Special escape code (just like frame of reference encoding) indicates a difference larger than can be stored in b bits After escape code, original (uncompressed) value is written Performs well on columns containing increasing/decreasing sequences inverted lists timestamps object IDs sorted / clustered columns Improved Word-Aligned Binary Compression for Text Indexing Ahn, Moffat, TKDE 6 Time 5: 5:2 5: 5: 5:4 5:6 5:7 5:8 5: 5:5 5:6 5:6 Time 5: :5 2 bits per value Exception (see X for a better way to deal with exceptions) Tutorial Column-Oriented Database Systems 62

46 What Compression Scheme To Use? Does column appear in the sort key? Is the average run-length > 2 yes no yes no Are number of unique values < ~5 RLE Differential Encoding no yes Does this column appear frequently in selection predicates? Is the data numerical and exhibit good locality? yes no yes Frame of Reference Encoding no Leave Data Uncompressed Bit-vector Compression Dictionary Compression OR Heavyweight Compression Tutorial Column-Oriented Database Systems 6

47 Super-Scalar RAM-CPU Cache Compression Zukowski, Heman, Nes, Boncz, ICDE 6 Heavy-Weight Compression Schemes Modern disk arrays can achieve > GB/s / CPU for decompression GB/s needed Lightweight compression schemes are better Even better: operate directly on compressed data Tutorial Column-Oriented Database Systems 64

48 Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 Operating Directly on Compressed Data I/O - CPU tradeoff is no longer a tradeoff Reduces memory CPU bandwidth requirements Opens up possibility of operating on multiple records at once Tutorial Column-Oriented Database Systems 65

49 Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 Operating Directly on Compressed Data Quarter (Q,, ) (Q2,, 6) (Q, 7, 5) (Q4, 87, 6) -6 Index Lookup + Offset jump Product ID 2 ProductID, COUNT(*)) (, ) (2, ) (, 2) SELECT ProductID, Count(*) FROM table WHERE (Quarter = Q2) GROUP BY ProductID Tutorial Column-Oriented Database Systems 66

50 Integrating Compression and Execution in Column- Oriented Database Systems Abadi et. al, SIGMOD 6 Operating Directly on Compressed Data Aggregation Operator Selection Operator Compression- Aware Scan Operator Data isonevalue() isvaluesorted() isposcontiguous() issparse() getnext() decompressintoarray() getvalueatposition(pos) getmin() getmax() getsize() Block API SELECT ProductID, Count(*) FROM table WHERE (Quarter = Q2) GROUP BY ProductID (Q,, ) (Q2,, 6) (Q, 7, 5) (Q4, 87, 6) Tutorial Column-Oriented Database Systems 67

51 Column-Oriented Database Systems Tutorial Tuple Materialization and Column-Oriented Join Algorithms Materialization Strategies in a Column- Oriented DBMS Abadi, Myers, DeWitt, and Madden. ICDE 27. Self-organizing tuple reconstruction in column-stores, Idreos, Manegold, Kersten, SIGMOD 29 Column-Stores vs Row-Stores: How Different are They Really? Abadi, Madden, and Hachem. SIGMOD 28. Query Processing Techniques for Solid State Drives Tsirogiannis, Harizopoulos Shah, Wiener, and Graefe. SIGMOD 29. Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 24

52 When should columns be projected? Where should column projection operators be placed in a query plan? Row-store: Column projection involves removing unneeded columns from tuples Generally done as early as possible Column-store: Operation is almost completely opposite from a row-store Column projection involves reading needed columns from storage and extracting values for a listed set of tuples This process is called materialization Early materialization: project columns at beginning of query plan Straightforward since there is a one-to-one mapping across columns Late materialization: wait as long as possible for projecting columns More complicated since selection and join operators on one column obfuscates mapping to other columns from same table Most column-stores construct tuples and column projection time Many database interfaces expect output in regular tuples (rows) Rest of discussion will focus on this case Tutorial Column-Oriented Database Systems 69

53 When should tuples be constructed? Select + Aggregate QUERY: SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid Construct Solution : Create rows first (EM). But: (4,,4) Need to construct ALL tuples Need to decompress data Poor memory bandwidth utilization prodid storeid custid price Tutorial Column-Oriented Database Systems 7

54 Solution 2: Operate on columns Data Source custid Data Source prodid AGG AND Data Source price Data Source storeid Data Source prodid Data Source 2 storeid QUERY: SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid prodid storeid custid price Tutorial Column-Oriented Database Systems 7

55 Solution 2: Operate on columns QUERY: Data Source custid Data Source prodid AGG AND Data Source price Data Source storeid AND SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid prodid storeid custid price Tutorial Column-Oriented Database Systems 72

56 Solution 2: Operate on columns QUERY: Data Source custid Data Source prodid AGG AND Data Source price Data Source storeid 2 custid Data Source 8 Data Source price SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid prodid storeid custid price Tutorial Column-Oriented Database Systems 7

57 Solution 2: Operate on columns QUERY: Data Source custid AGG Data Source price 9 SELECT custid,sum(price) FROM table WHERE (prodid = 4) AND (storeid = ) AND GROUP BY custid Data Source prodid AND Data Source storeid AGG prodid storeid custid price Tutorial Column-Oriented Database Systems 74

58 Time (seconds) Materialization Strategies in a Column-Oriented DBMS Abadi, Myers, DeWitt, and Madden. ICDE 27. For plans without joins, late materialization is a win Low selectivity Medium selectivity High selectivity QUERY: SELECT C, SUM(C 2 ) FROM table WHERE (C < CONST) AND (C 2 < CONST) GROUP BY C Ran on 2 compressed columns from TPC-H scale data Early Materialization Late Materialization Tutorial Column-Oriented Database Systems 75

59 Time (seconds) Materialization Strategies in a Column-Oriented DBMS Abadi, Myers, DeWitt, and Madden. ICDE 27. Even on uncompressed data, late materialization is still a win QUERY: SELECT C, SUM(C 2 ) FROM table WHERE (C < CONST) AND (C 2 < CONST) GROUP BY C 2 Low selectivity Medium selectivity High selectivity Materializing late still works best Early Materialization Late Materialization Tutorial Column-Oriented Database Systems 76

60 What about for plans with joins? Select R.B, R.C, R2.E, R2.H, R.F From R, R2, R Where R.A = R2.D AND R2.G = R.K Project B, C, E, H, F B, C, E, G, H A, B, C B, C, E, H, F Join A = D Join 2 G = K Early materialization D, E, G, H F, K Scan R (F, K, L) B, C A G Join A = D D Join G = K Project Late materialization Scan R (F, K, L) G K F E, H Scan Scan Scan Scan R (A, B, C) R2 (D, E, G, H) R (A, B, C) R2 (D, E, G, H) Tutorial Column-Oriented Database Systems 77 77

61 Early Materialization Example 2 7 QUERY: Green White Brown SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName Construct Construct (4,,4) Green White Brown prodid storeid quantity custid price Facts custid lastname Customers Tutorial Column-Oriented Database Systems 78

62 Early Materialization Example White Brown Brown Brown QUERY: SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName Join Green White Brown Tutorial Column-Oriented Database Systems 79

63 Late Materialization Example QUERY: SELECT C.lastName,SUM(F.price) FROM facts AS F, customers AS C WHERE F.custID = C.custID GROUP BY C.lastName (4,,4) Join 2 Green White Brown Late materialized join causes out of order probing of projected columns from the inner relation prodid storeid quantity custid price Facts custid lastname Customers Tutorial Column-Oriented Database Systems 8

64 Late Materialized Join Performance Naïve LM join about 2X slower than EM join on typical queries (due to random I/O) This number is very dependent on Amount of memory available Number of projected attributes Join cardinality But we can do better Invisible Join Jive/Flash Join Radix cluster/decluster join Tutorial Column-Oriented Database Systems 8

65 Invisible Join Column-Stores vs Row-Stores: How Different are They Really? Abadi, Madden, and Hachem. SIGMOD 28. Designed for typical joins when data is modeled using a star schema One ( fact ) table is joined with multiple dimension tables Typical query: select c_nation, s_nation, d_year, sum(lo_revenue) as revenue from customer, lineorder, supplier, date where lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_orderdate = d_datekey and c_region = 'ASIA and s_region = 'ASIA and d_year >= 992 and d_year <= 997 group by c_nation, s_nation, d_year order by d_year asc, revenue desc; Tutorial Column-Oriented Database Systems 82

66 Invisible Join Column-Stores vs Row-Stores: How Different are They Really? Abadi, Madden, and Hachem. SIGMOD 28. Apply region = Asia On Customer Table custkey region nation ASIA CHINA 2 ASIA INDIA ASIA INDIA 4 EUROPE FRANCE Apply region = Asia On Supplier Table suppkey region nation ASIA RUSSIA 2 EUROPE SPAIN ASIA JAPAN Apply year in [992,997] On Date Table dateid year Hash Table (or bit-map) Containing Keys, 2 and Hash Table (or bit-map) Containing Keys, Hash Table Containing Keys 997, 2997, and 997 Tutorial Column-Oriented Database Systems 8

67 orderkey Original Fact Table custkey suppkey orderdate revenue Column-Stores vs Row-Stores: How Different are They Really? Abadi et. al. SIGMOD 28. & & = Hash Table Containing Keys, 2 and + custkey 4 4 Hash Table Containing Keys and + suppkey Hash Table Containing Keys 997, 2997, and orderdate Tutorial Column-Oriented Database Systems 84

68 Tutorial Column-Oriented Database Systems 85 custkey suppkey orderdate JOIN CHINA INDIA INDIA = CHINA INDIA RUSSIA SPAIN = RUSSIA RUSSIA = FRANCE JAPAN

69 Invisible Join Column-Stores vs Row-Stores: How Different are They Really? Abadi, Madden, and Hachem. SIGMOD 28. Apply region = Asia On Customer Table custkey region nation ASIA CHINA 2 ASIA INDIA ASIA INDIA 4 EUROPE FRANCE Apply region = Asia On Supplier Table suppkey region nation ASIA RUSSIA 2 EUROPE SPAIN ASIA JAPAN Apply year in [992,997] On Date Table dateid year Hash Table (or bit-map) Containing Keys, 2 and Range [-] (between-predicate rewriting) Hash Table (or bit-map) Containing Keys, Hash Table Containing Keys 997, 2997, and 997 Tutorial Column-Oriented Database Systems 86

70 Invisible Join Bottom Line Many data warehouses model data using star/snowflake schemes Joins of one (fact) table with many dimension tables is common Invisible join takes advantage of this by making sure that the table that can be accessed in position order is the fact table for each join Position lists from the fact table are then intersected (in position order) This reduces the amount of data that must be accessed out of order from the dimension tables Between-predicate rewriting trick not relevant for this discussion Tutorial Column-Oriented Database Systems 87

71 custkey CHINA INDIA INDIA FRANCE = INDIA CHINA suppkey Still accessing table out of order RUSSIA SPAIN JAPAN = RUSSIA RUSSIA orderdate JOIN = Tutorial Column-Oriented Database Systems 88

72 Jive/Flash Join Fast Joins using Join Indices. Li and Ross, VLDBJ 8:-24, CHINA INDIA INDIA FRANCE = INDIA CHINA Still accessing table out of order Query Processing Techniques for Solid State Drives. Tsirogiannis, Harizopoulos et. al. SIGMOD 29. Tutorial Column-Oriented Database Systems 89

73 Jive/Flash Join. Add column with dense ascending integers from 2. Sort new position list by second column. Probe projected column in order using new sorted position list, keeping first column from position list around 4. Sort new result by first column CHINA INDIA INDIA FRANCE CHINA INDIA INDIA FRANCE = INDIA CHINA = 2 CHINA INDIA 2 INDIA CHINA Tutorial Column-Oriented Database Systems 9

74 Jive/Flash Join Bottom Line Instead of probing projected columns from inner table out of order: Sort join index Probe projected columns in order Sort result using an added column LM vs EM tradeoffs: LM has the extra sorts (EM accesses all columns in order) LM only has to fit join columns into memory (EM needs join columns and all projected columns) Results in big memory and CPU savings (see part for why there is CPU savings) LM only has to materialize relevant columns In many cases LM advantages outweigh disadvantages LM would be a clear winner if not for those pesky sorts can we do better? Tutorial Column-Oriented Database Systems 9

75 Radix Cluster/Decluster The full sort from the Jive join is actually overkill We just want to access the storage blocks in order (we don t mind random access within a block) So do a radix sort and stop early By stopping early, data within each block is accessed out of order, but in the order specified in the original join index Use this pseudo-order to accelerate the post-probe sort as well Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Tutorial Column-Oriented Database Systems 92

76 Radix Cluster/Decluster Bottom line Both sorts from the Jive join can be significantly reduced in overhead Only been tested when there is sufficient memory for the entire join index to be stored three times Technique is likely applicable to larger join indexes, but utility will go down a little Only works if random access within a storage block Don t want to use radix cluster/decluster if you have variablewidth column values or compression schemes that can only be decompressed starting from the beginning of the block Tutorial Column-Oriented Database Systems 9

77 LM vs EM joins Invisible, Jive, Flash, Cluster, Decluster techniques contain a bag of tricks to improve LM joins Research papers show that LM joins become 2X faster than EM joins (instead of 2X slower) for a wide array of query types Tutorial Column-Oriented Database Systems 94

78 Tuple Construction Heuristics For queries with selective predicates, aggregations, or compressed data, use late materialization For joins: Research papers: Always use late materialization Commercial systems: Inner table to a join often materialized before join (reduces system complexity): Some systems will use LM only if columns from inner table can fit entirely in memory Tutorial Column-Oriented Database Systems 95

79 Outline Part : Basic concepts Introduction to key features From DSM to column-stores and performance tradeoffs Column-store architecture overview Will rows and columns ever converge? Part 2: Column-oriented execution Part : MonetDB/X ( VectorWise ) and CPU efficiency Tutorial Column-Oriented Database Systems 96

80 Outline Computational Efficiency of DB on modern hardware how column-stores can help here MonetDB & VectorWise in more depth CPU efficient column compression vectorized decompression Conclusions future work Tutorial Column-Oriented Database Systems 97

81 Column-Oriented Database Systems Tutorial 4 years of hardware evolution vs. DBMS computational efficiency

82 CPU Architecture Elements: Storage CPU caches L/L2/L Registers Execution Unit(s) Pipelined SIMD Tutorial Column-Oriented Database Systems 99

83 Super-Scalar Execution (pipelining) Tutorial Column-Oriented Database Systems 2

84 Hazards Data hazards Dependencies between instructions L data cache misses Control Hazards Branch mispredictions Computed branches (late binding) L instruction cache misses Result: bubbles in the pipeline Out-of-order execution addresses data hazards control hazards typically more expensive Tutorial Column-Oriented Database Systems

85 SIMD Single Instruction Multiple Data Same operation applied on a vector of values MMX: 64 bits, SSE: 28bits, AVX: 256bits SSE, e.g. multiply 8 short integers Tutorial Column-Oriented Database Systems 4

86 A Look at the Query Pipeline 2 ivan 5 next() 2 ivan 7 77 * PROJECT SELECT id, name (age-)*5 AS bonus FROM employee WHERE age > next() 22 7 >? 2 ivan alice 7 22 FALSE TRUE SELECT next() 2 alice ivan 22 7 SCAN Tutorial Column-Oriented Database Systems 5

87 A Look at the Query Pipeline 2 ivan 5 next() next() next() 77 * >? PROJECT SELECT SCAN Operators Iterator interface -open() -next(): tuple -close() Tutorial Column-Oriented Database Systems 6

88 A Look at the Query Pipeline 2 ivan 5 next() 2 ivan 7 77 * Primitives next() next() 2 2 ivan alice 7 22 alice ivan >? PROJECT FALSE TRUE SELECT SCAN Provide computational functionality All arithmetic allowed in expressions, e.g. Multiplication 7 * 5 mult(int,int) int Tutorial Column-Oriented Database Systems 7

89 Database Architecture causes Hazards Data hazards Dependencies between instructions L data cache misses SIMD Out-of-order Execution Control Hazards Branch mispredictions Computed branches (late binding) L instruction cache misses work on one tuple at a time Large Tree/Hash Structures Code footprint of all operators in query plan exceeds L cache Data-dependent conditions PROJECT SELECT Next() late binding SCAN method calls DBMSs On A Modern Processor: Where Does Time Go? Tree, List, Hash traversal Complex NSM record navigation Ailamaki, DeWitt, Hill, Wood, VLDB 99 Operators Iterator interface -open() -next(): tuple -close() Tutorial Column-Oriented Database Systems 8

90 DBMS Computational Efficiency TPC-H GB, query selects 98% of fact table, computes net prices and aggregates all Results: C program:? MySQL: 26.2s DBMS X : 28.s MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Tutorial Column-Oriented Database Systems 9

91 DBMS Computational Efficiency TPC-H GB, query selects 98% of fact table, computes net prices and aggregates all Results: C program:.2s MySQL: 26.2s DBMS X : 28.s MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Tutorial Column-Oriented Database Systems

92 Column-Oriented Database Systems Tutorial MonetDB

93 a column-store save disk I/O when scan-intensive queries columns avoid an expression interpreter to improve computational efficiency need a few Tutorial Column-Oriented Database Systems 2

94 RISC Database Algebra CPU? Give it nice code! - few dependencies (control,data) - CPU gets out-of-order execution - compiler can e.g. generate SIMD One loop for an entire column batcalc_minus_int(int* - no per-tuple interpretationres, int* col, - arrays: no record navigation int val, - better instruction cache int locality n) { SELECT for(i=; id, name, i<n; (age-)*5 i++) as bonus FROM res[i] people = col[i] - val; } WHERE age > Simple, hardcoded semantics in operators MATERIALIZED intermediate results Tutorial Column-Oriented Database Systems

95 a column-store save disk I/O when scan-intensive queries need a few columns avoid an expression interpreter to improve computational efficiency How? RISC query algebra: hard-coded semantics Decompose complex expressions in multiple operations Operators only handle simple arrays No code that handles slotted buffered record layout Relational algebra becomes array manipulation language Often SIMD for free Plus: use of cache-conscious algorithms for Sort/Aggr/Join Tutorial Column-Oriented Database Systems 4

96 You want efficiency Simple hard-coded operators I take scalability a Faustian pact Result materialization C program:.2s MonetDB:.7s MySQL: 26.2s DBMS X : 28.s Tutorial Column-Oriented Database Systems 5

97 SIGMOD 985 MIL Primitives for Querying a Fragmented World, Boncz, Kersten, VLDBJ 98 Flattening an Object Algebra to Provide Performance Boncz, Wilschut, Kersten, ICDE 98 MonetDB/XQuery: a fast XQuery processor powered by a relational engine Boncz, Grust, vankeulen, Rittinger, Teubner, SIGMOD 6 SW-Store: a vertically partitioned DBMS for Semantic Web data management Abadi, Marcus, Madden, Hollenbach, VLDBJ 9 Tutorial Column-Oriented Database Systems 7

98 Cache-Conscious Joins Cost Models, Radix-cluster, Radix-decluster MonetDB/XQuery: structural joins exploiting positional column access Cracking: on-the-fly automatic indexing without workload knowledge Recycling: optimizer frontend backend as a research platform using materialized intermediates Run-time Query Optimization: correlation-aware run-time optimization without cost model Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 MonetDB/XQuery: a fast XQuery processor powered by a relational engine Boncz, Grust, vankeulen, Rittinger, Teubner, SIGMOD 6 Database Cracking, CIDR 7 Updating a cracked database, SIGMOD 7 Self-organizing tuple reconstruction in columnstores, SIGMOD 9 (all Idreos, Manegold, Kersten) An architecture for recycling intermediates in a column-store, Ivanova, Kersten, Nes, Goncalves, SIGMOD 9 ROX: run-time optimization of XQueries, Abdelkader, Boncz, Manegold, vankeulen, SIGMOD 9 Tutorial Column-Oriented Database Systems 9

99 Cache-Conscious Joins Radix-Cluster Radix-Partitioned Hash Join create partitions << CPU cache small partitions many partitions many partitions multiple passes needed Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) VLDB 29 Tutorial Column-Oriented Database Systems 2

100 Radix-Partitioned Hash Join create partitions << CPU cache small partitions many partitions many partitions multiple passes needed Radix-Cluster Cache-Conscious Joins Radix-Cluster Radix-Sort with early stopping Each pass looks at Bi higher-most radix bits Splitting each input cluster into 2 Bi output clusters leaves relation partially ordered Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) VLDB 29 Tutorial Column-Oriented Database Systems 2

101 Cache-Conscious Joins Radix-Cluster Multiple clustering passes Limit number of clusters per pass Avoid cache/tlb trashing Trade memory cost for CPU cost Any data type (hashing) Database Architecture Optimized for the New Bottleneck: Memory Access VLDB 99 Generic Database Cost Models for Hierarchical Memory Systems, VLDB 2 (all Manegold, Boncz, Kersten) VLDB 29 Tutorial Column-Oriented Database Systems 22

102 Cache-Conscious Joins Accurate Cache Miss Cost Modeling VLDB 29 Tutorial Column-Oriented Database Systems 2

103 Cache-Conscious Joins Partitioned Hash Join VLDB 29 Tutorial Column-Oriented Database Systems 24

104 Cache-Conscious Joins Radix-Clustered Hash-Join: overall perf (64,, tuples) VLDB 29 Tutorial Column-Oriented Database Systems 25

105 Cache-Conscious Joins Pre-Projection vs. Post-Projection VLDB 29 Tutorial Column-Oriented Database Systems 26

106 Cache-Conscious Joins Pre-Projection vs. Post-Projection Radix-Decluster = cache-conscious post-projection VLDB 29 Tutorial Column-Oriented Database Systems 27

107 Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Partitioned Hash-Join Join Index! Join Indices Valduriez, TODS 87 VLDB 29 Tutorial Column-Oriented Database Systems 28

108 Partitioned Hash-Join Cluster on Left Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 29

109 Partitioned Hash-Join Cluster on Left Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB N- VLDB 29 Tutorial Column-Oriented Database Systems

110 Partitioned Hash-Join Cluster on Left Project Left Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems

111 Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Partitioned Hash-Join Cluster on Left Project Left Cluster on Right VLDB 29 Tutorial Column-Oriented Database Systems 2

112 Partitioned Hash-Join Cluster on Left Project Left Cluster on Right Project Right Cache-Conscious Joins Post-Projection Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Radix-Decluster VLDB 29 Tutorial Column-Oriented Database Systems

113 Red column forms a dense domain {,,2,,N-} All subsequences are ordered e.g. Cache-Conscious Joins Radix-Decluster Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Task: merge subsequences into dense sequence VLDB 29 Tutorial Column-Oriented Database Systems 4

114 Red column forms a dense domain {,,2,,N-} All subsequences are ordered e.g. Cache-Conscious Joins Radix-Decluster Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Task: merge subsequences into dense sequence Approach : merge H=2 B lists cost O(N log(h)) many (H) cursors needed cache thrashing VLDB 29 Tutorial Column-Oriented Database Systems 5

115 Red column forms a dense domain {,,2,,N-} All subsequences are ordered e.g. Cache-Conscious Joins Radix-Decluster Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Task: merge subsequences into dense sequence Approach : merge H=2 B lists cost O(N log(h)), many (H) cursors needed cache thrashing Approach 2: insert by position many (H) sparse passes over the result no cache reuse VLDB 29 Tutorial Column-Oriented Database Systems 6

116 Red column forms a dense domain {,,2,,N-} All subsequences are ordered e.g. Cache-Conscious Joins Radix-Decluster Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Task: merge subsequences into dense sequence Approach : merge H=2 B lists cost O(N log(h)) many (H) cursors needed cache thrashing Approach 2: insert by position many (H) sparse passes over the result no cache reuse Radix-Decluster: insert by position with sliding window VLDB 29 Tutorial Column-Oriented Database Systems 7

117 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 8

118 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 clusters VLDB 29 Tutorial Column-Oriented Database Systems 9

119 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 destination positions VLDB 29 Tutorial Column-Oriented Database Systems 4

120 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Projection column (still) in wrong order VLDB 29 Tutorial Column-Oriented Database Systems 4

121 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Result column to fill insertion window of size 2 VLDB 29 Tutorial Column-Oriented Database Systems 42

122 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 4

123 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 44

124 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 45

125 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 46

126 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 47

127 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 48

128 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 49

129 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 5

130 Cache-Conscious Joins Radix-Decluster In Action Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 VLDB 29 Tutorial Column-Oriented Database Systems 5

131 Cache-Conscious Joins Radix-Decluster Memory Access Pattern Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 4 Random access only to sliding window(<< cache size) Only compulsary misses cache-conscious VLDB 29 Tutorial Column-Oriented Database Systems 52

132 Cache-Conscious Joins Radix-Decluster Performance Tradeoff Radix-Decluster prefers small W (i.e. vertical fragmentation) Read Also: - Jive Join - Slam Join Fast Joins Using Join Indices Ross, Lei, VLDBJ 98 VLDB 29 Tutorial Column-Oriented Database Systems 5

133 Cache-Conscious Joins Pre-Projection vs. Post-Projection Query Processing Techniques for Solid State Drives Tsirogiannis, Harizopoulos,Shah, Wiener, Graefe, SIGMOD 9 VLDB 29 Tutorial Column-Oriented Database Systems 54

134 Column-Oriented Database Systems VLDB 29 Tutorial MonetDB/X vectorized query processing

135 MonetDB spin-off: MonetDB/X Materialization vs Pipelining Tutorial Column-Oriented Database Systems 78

136 MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Vectorized In Cache Processing Observations: next() called much less often more vector time = spent array in of primitives ~ less in overhead processed in a tight loop primitive calls process an array of values CPU cache in a loop: Resident CPU Efficiency depends on nice code - out-of-order execution - few dependencies (control,data) - compiler support Compilers like simple loops over arrays - loop-pipelining - automatic SIMD >? FALSE TRUE TRUE FALSE next() next() * 5 next() 2 ivan 5 4 for(i=; peggy 75 i<n; i++) 2 4 res[i] = - (col[i] * 5 > x) ivan peggy for(i=; i<n; > i++)? PROJECT res[i] alice = 22(col[i] FALSE - x) 2 ivan 7 TRUE 4 peggy 45 TRUE for(i=; 5 victor i<n; 25 FALSE i++) SELECT res[i] = (col[i] * x) alice ivan peggy victor SCAN Tutorial Column-Oriented Database Systems 79

137 map_mul_flt_val_flt_col( map_mul_flt_val_flt_col( float *res, float *res, int* sel, int* sel, float val, float val, Tricks float float being *col, int *col, int played: n) n) -{ { Late materialization if (many selected and narrow) for(int i=; i<n; i++) - Materialization for(int i=; i<n; res[i] val * avoidance i++) // SIMD!! col[sel[i]]; using } selection res[i] = val vectors * col[i]; else for(int i=; i<n; i++) selection vectors used to reduce res[i] = val * col[sel[i]]; vector copying } contain selected positions push selections up to get SIMD MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 next() next() next() alice ivan peggy victor * >? PROJECT SELECT SCAN Tutorial Column-Oriented Database Systems 8

138 MonetDB/X Both efficiency Vectorized primitives and scalability.. Pipelined query evaluation MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 C program:.2s MonetDB/X:.6s MonetDB:.7s MySQL: 26.2s DBMS X : 28.s Tutorial Column-Oriented Database Systems 8

139 Memory Hierarchy X query engine MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 CPU cache RAM ColumnBM (buffer manager) (raid) Disk(s) Tutorial Column-Oriented Database Systems 82

140 Memory Hierarchy X query engine MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Vectors are only the in-cache representation RAM & disk representation might actually be different CPU cache (vectorwise uses both PAX & DSM) RAM ColumnBM (buffer manager) (raid) Disk(s) Tutorial Column-Oriented Database Systems 8

141 Varying the Vector size MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Less and less iterator.next() and primitive function calls ( interpretation overhead ) Tutorial Column-Oriented Database Systems 85

142 Varying the Vector size MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 Vectors start to exceed the CPU cache, causing additional memory traffic Tutorial Column-Oriented Database Systems 86

143 Varying the Vector size MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 The benefit of selection vectors Tutorial Column-Oriented Database Systems 87

144 MonetDB/MIL materializes columns MonetDB/X: Hyper-Pipelining Query Execution Boncz, Zukowski, Nes, CIDR 5 CPU cache RAM ColumnBM (buffer manager) (raid) Disk(s) Tutorial Column-Oriented Database Systems 88

COLUMN STORE DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: Column Stores - SoSe

COLUMN STORE DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: Column Stores - SoSe COLUMN STORE DATABASE SYSTEMS Prof. Dr. Uta Störl Big Data Technologies: Column Stores - SoSe 2016 1 Telco Data Warehousing Example (Real Life) Michael Stonebraker et al.: One Size Fits All? Part 2: Benchmarking

More information

Column-Stores vs. Row-Stores: How Different Are They Really?

Column-Stores vs. Row-Stores: How Different Are They Really? Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel Madden and Nabil Hachem SIGMOD 2008 Presented by: Souvik Pal Subhro Bhattacharyya Department of Computer Science Indian

More information

Architecture-Conscious Database Systems

Architecture-Conscious Database Systems Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query

More information

1/3/2015. Column-Store: An Overview. Row-Store vs Column-Store. Column-Store Optimizations. Compression Compress values per column

1/3/2015. Column-Store: An Overview. Row-Store vs Column-Store. Column-Store Optimizations. Compression Compress values per column //5 Column-Store: An Overview Row-Store (Classic DBMS) Column-Store Store one tuple ata-time Store one column ata-time Row-Store vs Column-Store Row-Store Column-Store Tuple Insertion: + Fast Requires

More information

Column Stores vs. Row Stores How Different Are They Really?

Column Stores vs. Row Stores How Different Are They Really? Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background

More information

Large-Scale Data Engineering. Modern SQL-on-Hadoop Systems

Large-Scale Data Engineering. Modern SQL-on-Hadoop Systems Large-Scale Data Engineering Modern SQL-on-Hadoop Systems Analytical Database Systems Parallel (MPP): Teradata Paraccel Pivotal Vertica Redshift Oracle (IMM) DB2-BLU SQLserver (columnstore) Netteza InfoBright

More information

Big Data Infrastructures & Technologies

Big Data Infrastructures & Technologies Big Data Infrastructures & Technologies SQL on Big Data THE DEBATE: DATABASE SYSTEMS VS MAPREDUCE A major step backwards? MapReduce is a step backward in database access Schemas are good Separation of

More information

Big Data Infrastructures & Technologies. SQL on Big Data

Big Data Infrastructures & Technologies. SQL on Big Data Big Data Infrastructures & Technologies SQL on Big Data THE DEBATE: DATABASE SYSTEMS VS MAPREDUCE A major step backwards? MapReduce is a step backward in database access Schemas are good Separation of

More information

Large-Scale Data Engineering

Large-Scale Data Engineering Large-Scale Data Engineering SQL on Big Data THE DEBATE: DATABASE SYSTEMS VS MAPREDUCE A major step backwards? MapReduce is a step backward in database access Schemas are good Separation of the schema

More information

LOD2 Creating Knowledge out of Interlinked Data. Project Number: Start Date of Project: 01/09/2010 Duration: 48 months

LOD2 Creating Knowledge out of Interlinked Data. Project Number: Start Date of Project: 01/09/2010 Duration: 48 months Collaborative Project LOD2 Creating Knowledge out of Interlinked Data Project Number: 257943 Start Date of Project: 01/09/2010 Duration: 48 months Deliverable 2.3 Integration of MonetDB Technology in Virtuoso

More information

Column Stores - The solution to TB disk drives? David J. DeWitt Computer Sciences Dept. University of Wisconsin

Column Stores - The solution to TB disk drives? David J. DeWitt Computer Sciences Dept. University of Wisconsin Column Stores - The solution to TB disk drives? David J. DeWitt Computer Sciences Dept. University of Wisconsin Problem Statement TB disks are coming! Superwide, frequently sparse tables are common DB

More information

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses

More information

Bridging the Processor/Memory Performance Gap in Database Applications

Bridging the Processor/Memory Performance Gap in Database Applications Bridging the Processor/Memory Performance Gap in Database Applications Anastassia Ailamaki Carnegie Mellon http://www.cs.cmu.edu/~natassa Memory Hierarchies PROCESSOR EXECUTION PIPELINE L1 I-CACHE L1 D-CACHE

More information

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)

COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE) COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE) PRESENTATION BY PRANAV GOEL Introduction On analytical workloads, Column

More information

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007.

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS Marcin Zukowski Sandor Heman, Niels Nes, Peter Boncz CWI, Amsterdam VLDB 2007 Outline Scans in a DBMS Cooperative Scans Benchmarks DSM version VLDB,

More information

Designing Database Operators for Flash-enabled Memory Hierarchies

Designing Database Operators for Flash-enabled Memory Hierarchies Designing Database Operators for Flash-enabled Memory Hierarchies Goetz Graefe Stavros Harizopoulos Harumi Kuno Mehul A. Shah Dimitris Tsirogiannis Janet L. Wiener Hewlett-Packard Laboratories, Palo Alto,

More information

class 5 column stores 2.0 prof. Stratos Idreos

class 5 column stores 2.0 prof. Stratos Idreos class 5 column stores 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ worth thinking about what just happened? where is my data? email, cloud, social media, can we design systems

More information

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi

Column-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi Column-Stores vs. Row-Stores How Different are they Really? Arul Bharathi Authors Daniel J.Abadi Samuel R. Madden Nabil Hachem 2 Contents Introduction Row Oriented Execution Column Oriented Execution Column-Store

More information

Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation

Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation Harald Lang 1, Tobias Mühlbauer 1, Florian Funke 2,, Peter Boncz 3,, Thomas Neumann 1, Alfons Kemper 1 1

More information

In-Memory Data Management

In-Memory Data Management In-Memory Data Management Martin Faust Research Assistant Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam Agenda 2 1. Changed Hardware 2.

More information

Column-Stores vs. Row-Stores: How Different Are They Really?

Column-Stores vs. Row-Stores: How Different Are They Really? Column-Stores vs. Row-Stores: How Different Are They Really? Daniel Abadi, Samuel Madden, Nabil Hachem Presented by Guozhang Wang November 18 th, 2008 Several slides are from Daniel Abadi and Michael Stonebraker

More information

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Fall 2016 Lecture 14 - Data Warehousing and Column Stores References Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross-Tab, and

More information

Column Store Internals

Column Store Internals Column Store Internals Sebastian Meine SQL Stylist with sqlity.net sebastian@sqlity.net Outline Outline Column Store Storage Aggregates Batch Processing History 1 History First mention of idea to cluster

More information

Mammals Flourished Long Before Dinosaurs Became Extinct

Mammals Flourished Long Before Dinosaurs Became Extinct Mammals Flourished Long Before Dinosaurs Became Extinct VLDB 2009 Lyon - Ten Year Award Database Architecture Optimized For The New Bottleneck: Memory Access (VLDB 1999) Stefan Manegold (manegold@cwi.nl)

More information

Weaving Relations for Cache Performance

Weaving Relations for Cache Performance VLDB 2001, Rome, Italy Best Paper Award Weaving Relations for Cache Performance Anastassia Ailamaki David J. DeWitt Mark D. Hill Marios Skounakis Presented by: Ippokratis Pandis Bottleneck in DBMSs Processor

More information

Weaving Relations for Cache Performance

Weaving Relations for Cache Performance Weaving Relations for Cache Performance Anastassia Ailamaki Carnegie Mellon Computer Platforms in 198 Execution PROCESSOR 1 cycles/instruction Data and Instructions cycles

More information

complex plans and hybrid layouts

complex plans and hybrid layouts class 7 complex plans and hybrid layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ essential column-stores features virtual ids late tuple reconstruction (if ever) vectorized execution

More information

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Main-Memory Databases 1 / 25

Main-Memory Databases 1 / 25 1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low

More information

class 6 more about column-store plans and compression prof. Stratos Idreos

class 6 more about column-store plans and compression prof. Stratos Idreos class 6 more about column-store plans and compression prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ query compilation an ancient yet new topic/research challenge query->sql->interpet

More information

Weaving Relations for Cache Performance

Weaving Relations for Cache Performance Weaving Relations for Cache Performance Anastassia Ailamaki Carnegie Mellon David DeWitt, Mark Hill, and Marios Skounakis University of Wisconsin-Madison Memory Hierarchies PROCESSOR EXECUTION PIPELINE

More information

Column-Stores vs. Row-Stores How Different Are They Really?

Column-Stores vs. Row-Stores How Different Are They Really? Column-Stores vs. Row-Stores How Different Are They Really? Volodymyr Piven Wilhelm-Schickard-Institut für Informatik Eberhard-Karls-Universität Tübingen 2. Januar 2 Volodymyr Piven (Universität Tübingen)

More information

Column-Oriented Database Systems. Liliya Rudko University of Helsinki

Column-Oriented Database Systems. Liliya Rudko University of Helsinki Column-Oriented Database Systems Liliya Rudko University of Helsinki 2 Contents 1. Introduction 2. Storage engines 2.1 Evolutionary Column-Oriented Storage (ECOS) 2.2 HYRISE 3. Database management systems

More information

Join Processing for Flash SSDs: Remembering Past Lessons

Join Processing for Flash SSDs: Remembering Past Lessons Join Processing for Flash SSDs: Remembering Past Lessons Jaeyoung Do, Jignesh M. Patel Department of Computer Sciences University of Wisconsin-Madison $/MB GB Flash Solid State Drives (SSDs) Benefits of

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models RCFile: A Fast and Space-efficient Data

More information

Outline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs.

Outline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs. Parallel Database Systems STAVROS HARIZOPOULOS stavros@cs.cmu.edu Outline Background Hardware architectures and performance metrics Parallel database techniques Gamma Bonus: NCR / Teradata Conclusions

More information

Jignesh M. Patel. Blog:

Jignesh M. Patel. Blog: Jignesh M. Patel Blog: http://bigfastdata.blogspot.com Go back to the design Query Cache from Processing for Conscious 98s Modern (at Algorithms Hardware least for Hash Joins) 995 24 2 Processor Processor

More information

Fast Retrieval with Column Store using RLE Compression Algorithm

Fast Retrieval with Column Store using RLE Compression Algorithm Fast Retrieval with Column Store using RLE Compression Algorithm Ishtiaq Ahmed Sheesh Ahmad, Ph.D Durga Shankar Shukla ABSTRACT Column oriented database have continued to grow over the past few decades.

More information

Cache-Aware Database Systems Internals Chapter 7

Cache-Aware Database Systems Internals Chapter 7 Cache-Aware Database Systems Internals Chapter 7 1 Data Placement in RDBMSs A careful analysis of query processing operators and data placement schemes in RDBMS reveals a paradox: Workloads perform sequential

More information

DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing

DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing Marcin Zukowski Niels Nes Peter Boncz CWI, Amsterdam, The Netherlands {Firstname.Lastname}@cwi.nl ABSTRACT Comparisons between

More information

Data Modeling and Databases Ch 7: Schemas. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 7: Schemas. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 7: Schemas Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database schema A Database Schema captures: The concepts represented Their attributes

More information

Data Blocks: Hybrid OLTP and OLAP on compressed storage

Data Blocks: Hybrid OLTP and OLAP on compressed storage Data Blocks: Hybrid OLTP and OLAP on compressed storage Ben Brümmer Technische Universität München Fürstenfeldbruck, 26. November 208 Ben Brümmer 26..8 Lehrstuhl für Datenbanksysteme Problem HDD/Archive/Tape-Storage

More information

A high performance database kernel for query-intensive applications. Peter Boncz

A high performance database kernel for query-intensive applications. Peter Boncz MonetDB: A high performance database kernel for query-intensive applications Peter Boncz CWI Amsterdam The Netherlands boncz@cwi.nl Contents The Architecture of MonetDB The MIL language with examples Where

More information

will incur a disk seek. Updates have similar problems: each attribute value that is modified will require a seek.

will incur a disk seek. Updates have similar problems: each attribute value that is modified will require a seek. Read-Optimized Databases, In Depth Allison L. Holloway and David J. DeWitt University of Wisconsin Madison Computer Sciences Department {ahollowa, dewitt}@cs.wisc.edu ABSTRACT Recently, a number of papers

More information

MonetDB: Open-source Columnar Database Technology Beyond Textbooks

MonetDB: Open-source Columnar Database Technology Beyond Textbooks MonetDB: Open-source Columnar Database Technology Beyond Textbooks http://wwwmonetdborg/ Stefan Manegold StefanManegold@cwinl http://homepagescwinl/~manegold/ >5k downloads per month Why? Why? Motivation

More information

Hardware-Conscious DBMS Architecture for Data-Intensive Applications

Hardware-Conscious DBMS Architecture for Data-Intensive Applications Hardware-Conscious DBMS Architecture for Data-Intensive Applications Marcin Zukowski Centrum voor Wiskunde en Informatica Kruislaan 13 1090 GB Amsterdam The Netherlands M.Zukowski@cwi.nl Abstract Recent

More information

COLUMN DATABASES A NDREW C ROTTY & ALEX G ALAKATOS

COLUMN DATABASES A NDREW C ROTTY & ALEX G ALAKATOS COLUMN DATABASES A NDREW C ROTTY & ALEX G ALAKATOS OUTLINE RDBMS SQL Row Store Column Store C-Store Vertica MonetDB Hardware Optimizations FACULTY MEMBER VERSION EXPERIMENT Question: How does time spent

More information

Cache-Aware Database Systems Internals. Chapter 7

Cache-Aware Database Systems Internals. Chapter 7 Cache-Aware Database Systems Internals Chapter 7 Data Placement in RDBMSs A careful analysis of query processing operators and data placement schemes in RDBMS reveals a paradox: Workloads perform sequential

More information

BDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz

BDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz BDCC: Exploiting Fine-Grained Persistent Memories for OLAP Peter Boncz NVRAM System integration: NVMe: block devices on the PCIe bus NVDIMM: persistent RAM, byte-level access Low latency Lower than Flash,

More information

Performance Issues and Query Optimization in Monet

Performance Issues and Query Optimization in Monet Performance Issues and Query Optimization in Monet Stefan Manegold Stefan.Manegold@cwi.nl 1 Contents Modern Computer Architecture: CPU & Memory system Consequences for DBMS - Data structures: vertical

More information

class 9 fast scans 1.0 prof. Stratos Idreos

class 9 fast scans 1.0 prof. Stratos Idreos class 9 fast scans 1.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ 1 pass to merge into 8 sorted pages (2N pages) 1 pass to merge into 4 sorted pages (2N pages) 1 pass to merge into

More information

Materialization Strategies in a Column-Oriented DBMS Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel R. Madden

Materialization Strategies in a Column-Oriented DBMS Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel R. Madden Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-26-78 November 27, 26 Materialization Strategies in a Column-Oriented DBMS Daniel J. Abadi, Daniel S. Myers, David

More information

Architecture-Conscious Database Systems

Architecture-Conscious Database Systems Architecture-Conscious Database Systems Anastassia Ailamaki Ph.D. Examination November 30, 2000 A DBMS on a 1980 Computer DBMS Execution PROCESSOR 10 cycles/instruction DBMS Data and Instructions 6 cycles

More information

Information Systems (Informationssysteme)

Information Systems (Informationssysteme) Information Systems (Informationssysteme) Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2018 c Jens Teubner Information Systems Summer 2018 1 Part IX B-Trees c Jens Teubner Information

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Crescando: Predictable Performance for Unpredictable Workloads

Crescando: Predictable Performance for Unpredictable Workloads Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing

More information

Performance in the Multicore Era

Performance in the Multicore Era Performance in the Multicore Era Gustavo Alonso Systems Group -- ETH Zurich, Switzerland Systems Group Enterprise Computing Center Performance in the multicore era 2 BACKGROUND - SWISSBOX SwissBox: An

More information

Advanced Database Systems

Advanced Database Systems Lecture II Storage Layer Kyumars Sheykh Esmaili Course s Syllabus Core Topics Storage Layer Query Processing and Optimization Transaction Management and Recovery Advanced Topics Cloud Computing and Web

More information

In-Memory Data Structures and Databases Jens Krueger

In-Memory Data Structures and Databases Jens Krueger In-Memory Data Structures and Databases Jens Krueger Enterprise Platform and Integration Concepts Hasso Plattner Intitute What to take home from this talk? 2 Answer to the following questions: What makes

More information

Storage hierarchy. Textbook: chapters 11, 12, and 13

Storage hierarchy. Textbook: chapters 11, 12, and 13 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular

More information

Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri

Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri Overview of Data Exploration Techniques Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri data exploration not always sure what we are looking for (until we find it) data has always been big volume

More information

HYRISE In-Memory Storage Engine

HYRISE In-Memory Storage Engine HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University

More information

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11 DATABASE PERFORMANCE AND INDEXES CS121: Relational Databases Fall 2017 Lecture 11 Database Performance 2 Many situations where query performance needs to be improved e.g. as data size grows, query performance

More information

H2O: A Hands-free Adaptive Store

H2O: A Hands-free Adaptive Store H2O: A Hands-free Adaptive Store Ioannis Alagiannis Stratos Idreos Anastasia Ailamaki Ecole Polytechnique Fédérale de Lausanne {ioannis.alagiannis, anastasia.ailamaki}@epfl.ch Harvard University stratos@seas.harvard.edu

More information

Datenbanksysteme II: Caching and File Structures. Ulf Leser

Datenbanksysteme II: Caching and File Structures. Ulf Leser Datenbanksysteme II: Caching and File Structures Ulf Leser Content of this Lecture Caching Overview Accessing data Cache replacement strategies Prefetching File structure Index Files Ulf Leser: Implementation

More information

External Sorting Implementing Relational Operators

External Sorting Implementing Relational Operators External Sorting Implementing Relational Operators 1 Readings [RG] Ch. 13 (sorting) 2 Where we are Working our way up from hardware Disks File abstraction that supports insert/delete/scan Indexing for

More information

Addressing the Memory Wall

Addressing the Memory Wall Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the

More information

Hash Joins for Multi-core CPUs. Benjamin Wagner

Hash Joins for Multi-core CPUs. Benjamin Wagner Hash Joins for Multi-core CPUs Benjamin Wagner Joins fundamental operator in query processing variety of different algorithms many papers publishing different results main question: is tuning to modern

More information

Data Processing on Modern Hardware

Data Processing on Modern Hardware Data Processing on Modern Hardware Jens Teubner, TU Dortmund, DBIS Group jens.teubner@cs.tu-dortmund.de Summer 2014 c Jens Teubner Data Processing on Modern Hardware Summer 2014 1 Part III Instruction

More information

Real-World Performance Training Star Query Edge Conditions and Extreme Performance

Real-World Performance Training Star Query Edge Conditions and Extreme Performance Real-World Performance Training Star Query Edge Conditions and Extreme Performance Real-World Performance Team Dimensional Queries 1 2 3 4 The Dimensional Model and Star Queries Star Query Execution Star

More information

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact: Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

Data Processing on Modern Hardware

Data Processing on Modern Hardware Data Processing on Modern Hardware Jens Teubner, TU Dortmund, DBIS Group jens.teubner@cs.tu-dortmund.de Summer 2016 c Jens Teubner Data Processing on Modern Hardware Summer 2016 1 Part III Instruction

More information

C-Store: A column-oriented DBMS

C-Store: A column-oriented DBMS Presented by: Manoj Karthick Selva Kumar C-Store: A column-oriented DBMS MIT CSAIL, Brandeis University, UMass Boston, Brown University Proceedings of the 31 st VLDB Conference, Trondheim, Norway 2005

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

More information

HANA Performance. Efficient Speed and Scale-out for Real-time BI

HANA Performance. Efficient Speed and Scale-out for Real-time BI HANA Performance Efficient Speed and Scale-out for Real-time BI 1 HANA Performance: Efficient Speed and Scale-out for Real-time BI Introduction SAP HANA enables organizations to optimize their business

More information

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016 Datenbanksysteme II: Modern Hardware Stefan Sprenger November 23, 2016 Content of this Lecture Introduction to Modern Hardware CPUs, Cache Hierarchy Branch Prediction SIMD NUMA Cache-Sensitive Skip List

More information

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25 Indexing Jan Chomicki University at Buffalo Jan Chomicki () Indexing 1 / 25 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow (nanosec) (10 nanosec) (millisec) (sec) Very small Small

More information

STEPS Towards Cache-Resident Transaction Processing

STEPS Towards Cache-Resident Transaction Processing STEPS Towards Cache-Resident Transaction Processing Stavros Harizopoulos joint work with Anastassia Ailamaki VLDB 2004 Carnegie ellon CPI OLTP workloads on modern CPUs 6 4 2 L2-I stalls L2-D stalls L1-I

More information

Citation for published version (APA): Ydraios, E. (2010). Database cracking: towards auto-tunning database kernels

Citation for published version (APA): Ydraios, E. (2010). Database cracking: towards auto-tunning database kernels UvA-DARE (Digital Academic Repository) Database cracking: towards auto-tunning database kernels Ydraios, E. Link to publication Citation for published version (APA): Ydraios, E. (2010). Database cracking:

More information

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based

More information

class 17 updates prof. Stratos Idreos

class 17 updates prof. Stratos Idreos class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects

More information

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due

More information

I. Introduction. FlashQueryFile: Flash-Optimized Layout and Algorithms for Interactive Ad Hoc SQL on Big Data Rini T Kaushik 1

I. Introduction. FlashQueryFile: Flash-Optimized Layout and Algorithms for Interactive Ad Hoc SQL on Big Data Rini T Kaushik 1 FlashQueryFile: Flash-Optimized Layout and Algorithms for Interactive Ad Hoc SQL on Big Data Rini T Kaushik 1 1 IBM Research - Almaden Abstract High performance storage layer is vital for allowing interactive

More information

(Storage System) Access Methods Buffer Manager

(Storage System) Access Methods Buffer Manager 6.830 Lecture 5 9/20/2017 Project partners due next Wednesday. Lab 1 due next Monday start now!!! Recap Anatomy of a database system Major Components: Admission Control Connection Management ---------------------------------------(Query

More information

Query Processing on Multi-Core Architectures

Query Processing on Multi-Core Architectures Query Processing on Multi-Core Architectures Frank Huber and Johann-Christoph Freytag Department for Computer Science, Humboldt-Universität zu Berlin Rudower Chaussee 25, 12489 Berlin, Germany {huber,freytag}@dbis.informatik.hu-berlin.de

More information

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery

More information

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of

More information

CS122 Lecture 15 Winter Term,

CS122 Lecture 15 Winter Term, CS122 Lecture 15 Winter Term, 2014-2015 2 Index Op)miza)ons So far, only discussed implementing relational algebra operations to directly access heap Biles Indexes present an alternate access path for

More information

RAID in Practice, Overview of Indexing

RAID in Practice, Overview of Indexing RAID in Practice, Overview of Indexing CS634 Lecture 4, Feb 04 2014 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke 1 Disks and Files: RAID in practice For a big enterprise

More information

Principles of Data Management. Lecture #9 (Query Processing Overview)

Principles of Data Management. Lecture #9 (Query Processing Overview) Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm

More information

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17 Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa

More information

Main-Memory Database Management Systems

Main-Memory Database Management Systems Main-Memory Database Management Systems David Broneske Otto-von-Guericke University Magdeburg Summer Term 2018 Credits Parts of this lecture are based on content by Jens Teubner from TU Dortmund and Sebastian

More information

La Fragmentation Horizontale Revisitée: Prise en Compte de l Interaction de Requêtes

La Fragmentation Horizontale Revisitée: Prise en Compte de l Interaction de Requêtes National Engineering School of Mechanic & Aerotechnics 1, avenue Clément Ader - BP 40109-86961 Futuroscope cedex France La Fragmentation Horizontale Revisitée: Prise en Compte de l Interaction de Requêtes

More information

HyPer-sonic Combined Transaction AND Query Processing

HyPer-sonic Combined Transaction AND Query Processing HyPer-sonic Combined Transaction AND Query Processing Thomas Neumann Technische Universität München December 2, 2011 Motivation There are different scenarios for database usage: OLTP: Online Transaction

More information

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based

More information

Column- Stores vs. Row- Stores How Different Are They Really?

Column- Stores vs. Row- Stores How Different Are They Really? Column- Stores vs. Row- Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By Pragnya Addala Introduc?on Introduc?on Significant amount

More information

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why

More information