Oracle Database In-Memory Mark Weber Principal Sales Consultant November 12, 2014
Row Format Databases vs. Column Format Databases Row SALES Transactions run faster on row format Example: Insert or query a sales order Fast processing few rows, many columns Column SALES Analytics run faster on column format Example : Report on sales totals by region Fast accessing few columns, many rows Until Now Must Choose One Format and Suffer Tradeoffs 3
Breakthrough: Dual Format Database Existing Buffer Cache SALES Row Format New In-Memory Format SALES Column Format BOTH row and column formats for same table Simultaneously active and transactionally consistent OLTP uses proven row format Analytics & reporting use new in-memory Column format Same results regardless of which format is used 4
Oracle Database 12c In-Memory Option Goals Orders of Magnitude Faster Analytics Real Time Queries on OLTP Database or Data Warehouse Faster Mixed Workload OLTP Simple: Transparent to applications and easy to deploy Cost-Effective: Not require entire database to be in-memory
AGENDA 1. In-Memory Column Format 2. In-Memory Performance Optimizations 3. In-Memory Populate 4. In-Memory Compression 5. In-Memory Scale-Out 6
Oracle In-Memory Columnar Format Pure In-Memory Columnar SALES Available on all platforms Pure in-memory column format In-memory maintenance: allows fast OLTP No changes to disk format Enabled at tablespace, table, partition, subpartition level Can even be enabled for whole database 7
In-Memory Area: New Area within SGA System Global Area SGA Buffer Cache Large Pool Shared Pool Other Log Buffer In-Memory Area Contains data in the new In-Memory Column Format Controlled by INMEMORY_SIZE parameter Minimum size of 100MB SGA_TARGET must be large enough to accommodate this area 8
In-Memory Area: New Area within SGA SELECT * FROM V$SGA; NAME VALUE ------------------ --------- Fixed Size 2927176 Variable Size 570426808 Database Buffers 4634022912 Redo Buffers 13848576 In-Memory Area 10244836480 Contains In-Memory Column Format Controlled by INMEMORY_SIZE parameter Minimum size of 100MB SGA_TARGET must be large enough to accommodate this area 9
In-Memory Area: Composition In Memory Area IMCU IMCU IMCU IMCU IMCU IMCU IMCU IMCU Column Format Data SMU SMU SMU SMU SMU SMU SMU SMU Metadata Contains two subpools for: In Memory Compression Units (IMCUs) Snapshot Metadata Units (SMUs) IMCUs contain column formatted data SMUs contain metadata and transactional information Space required for SMU pool is a small percentage (typically <10%) of In-Memory Area Current pool sizes and status visible in V$INMEMORY_AREA view 10
In-Memory Compression Unit (IMCU) ROWID EMPID IMCU header Column CUs NAME DEPT SALARY Unit of column store allocation Columnar representation of a large number of rows (e.g. 0.5 million) Rows in one or more table extents Actual size depends on size of rows, compression factor, etc. Extent #13 Blocks 20-120 Extent #14 Blocks 82-182 Extent #15 Blocks 201-301 Each column stored as a separate contiguous Column Compression Unit (column CU) Rowids also stored as a Column CU 11
Column Compression Unit (CU) Dictionary VALUE ID Audi 0 BMW 1 Cadillac 2 Column value list BMW Audi BMW Cadillac BMW Audi Audi Column CU Min: Audi Max: Cadillac 2 0 2 1 2 0 0 Contiguous storage per column in an IMCU All CUs automatically store Min/Max values Multiple formats: For example, Dictionary Compression: CU stores (smaller) dictionary IDs instead of full values Additional compression also possible 12
In-Memory Area: Under the Hood V$INMEMORY_AREA: Current sizes of pools and pool statuses SQL> SELECT * from V$INMEMORY_AREA; POOL ALLOC_BYTES USED_BYTES POPULATE_STATUS ---------- ----------- ---------- --------------- 1MB POOL 1307574272 135266304 DONE 64KB POOL 318767104 524288 DONE V$IM_HEADER: List of IMCUs currently in the inmemory column store SQL> SELECT OBJD, TSN, ALLOCATED_LEN, NUM_ROWS, NUM_COLS FROM V$IM_HEADER; OBJD TSN ALLOCATED_LEN NUM_ROWS NUM_COLS ---------- ---------- ------------- ---------- ---------- 92918 5 19922944 532246 60 92918 5 19922944 532480 60 92918 5 19922944 522496 60 92918 5 11534336 532480 60 92918 5 19922944 530176 60 92918 5 19922944 532480 60 92918 5 15728640 419562 60
AGENDA 1. In-Memory Column Format 2. In-Memory Performance Optimizations 3. In-Memory Populate 4. In-Memory Compression 5. In-Memory Scale-Out and Scale-Up 14
In-Memory Performance Optimizations Columnar Format Vector Processing Operation pushdown In-Memory Storage Index Predicate Optimization 15
In-Memory Row Format: Slower for Analytics Buffer Cache SELECT COL4 FROM MYTABLE; X X X X X RESULT Row Format Needs to skip over unneeded data 16
In-Memory Columnar Format: Faster for Analytics IM Column Store SELECT COL4 FROM MYTABLE; Column Format X X X X RESULT Scans only the data required by the query 17
Vector Register STATE Vector Processing: Additional Advantage of Column Format Memory Example: Find all sales in state of CA Each CPU core scans only required columns CPU Load multiple region values CA CA CA CA Vector Compare all values in 1 instruction > 100x Faster SIMD vector instructions used to process multiple values in each instruction E.g. Intel AVX instructions with 256 bit vector registers Billions of rows/sec scan rate per CPU core Row format is millions/sec 18
Operation Pushdown: Reduce Rows Processed by Plan When possible, push operations down to In-Memory scan Greatly reduces # rows flowing up through the plan Similar to Exadata smart scan For example: Predicate Evaluation (for qualifying predicates equality, range, etc.): Inline predicate evaluation within CA the scan Each IMCU scan only returns qualifying rows instead of all rows CA CA Sales > 1000 IM scan Products State = CA IM scan Stores CA SALES > 1000 STATE = CA 19
In-Memory Storage Index: Eliminate IMCUs from Scan Min-Max Pruning Min/Max values serve as storage index Check predicate against min/max values Skip entire IMCU if predicate not satisfied Can prune for many predicates including equality, range, inlist, etc. Eliminates processing unnecessary IMCUs Example: Find stores with sales greater than $10,000 Min $4000 Max $7000 Min $8000 Max $12000 Min $13000 Max $15000 20
Predicate Optimization: Reduce Predicate Evaluations Avoid evaluating predicates against every column value Check range predicate against min/max values As before, skip IMCUs where min/max disqualifies predicate If min/max indicates all rows will qualify, no need to evaluate predicates on column values Example: Find stores with sales between $8000 and $14000 Min $4000 Max $7000 Min $8000 Max $13000 NO ROWS Skip IMCU ALL ROWS Skip Evaluation? Min $13000 Max $15000 SOME ROWS Needs evaluation 21
In-Memory Scans: Under the Hood Wealth of information in AWR reports V$SYSSTAT monitors all aspects of inmemory scans How many rows were scanned How many rows were avoided by dictionary pruning How many predicates were optimized How many IMCUs were min-max pruned Etc.. IM scan bytes in-memory IM scan bytes uncompressed IM scan CUs columns accessed IM scan CUs columns decompressed IM scan CUs columns theoretical max IM scan rows IM scan rows range excluded IM scan rows excluded IM scan rows optimized IM scan rows projected IM scan CUs predicates received IM scan CUs predicates applied IM scan CUs predicates optimized IM scan CUs pruned IM scan segments minmax eligible. 22
AGENDA 1. In-Memory Column Format 2. In-Memory Performance Optimizations 3. In-Memory Populate 4. In-Memory Compression 5. In-Memory Scale-Out
In-Memory Column Store Populate Populate: Generates IMCUs from Row Format Performed by populate servers (new background processes) Initialization parameter: INMEMORY_POPULATE_SERVERS Two basic kinds of populate 1. Initial Populate: Initial creation of IMCU 2. Repopulate: Recreate IMCU after it has been modified Threshold-based: After exceeding threshold of changes to IMCU Trickle-driven: Constant background activity An IMCU with any modifications is a potential candidate Initialization parameter to limit trickle activity: INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT Change threshold or trickle action Repopulate IMCU IMCU Row Format SMU SMU (Initial) Populate DML operations Workload transactions 24
In-Memory Column Store Populate: Enabling ALTER TABLE sales INMEMORY; ALTER TABLE revenue NO INMEMORY; CREATE TABLE customers PARTITION BY LIST (PARTITION p1 INMEMORY, (PARTITION p2 NO INMEMORY); ALTER TABLE accounts INMEMORY NO INMEMORY (photo); Selectively enable in-memory storage New INMEMORY clause Eligible: Tables, Partitions, Sub-Partitions, Materialized Views Ineligible: IOTs, Hash clusters (pure OLTP features) Can also exclude unneeded columns 25
In-Memory Column Store Populate: Prioritizing CREATE TABLE orders (c1 number, c2 varchar(20), c3 number) INMEMORY PRIORITY CRITICAL; ALTER TABLE sales INMEMORY PRIORITY MEDIUM; ALTER TABLE accounts INMEMORY PRIORITY NONE; PRIORITY sub-clause enables pre-populate On startup, create, alter Levels: CRITICAL > HIGH > MEDIUM > LOW Controls order (not speed) of populate Default PRIORITY is NONE Populate only on first access 26
In-Memory Column Store Populate: Under the Hood V$IM_SEGMENTS Shows current size of each segment in memory Shows how much remains to be populated SQL> select segment_name, populate_status, inmemory_priority, inmemory_size, bytes_not_populated from v$im_segments; SEGMENT_NAME POPULATE_STATUS INMEMORY_PRIORITY INMEMORY_SIZE BYTES_NOT_POPULATED ------------ --------------- ----------------- ------------- ------------------- ACCOUNTS STARTED HIGH 196606 2434886912 SALES COMPLETED CRITICAL 135790592 0
AGENDA 1. In-Memory Column Format 2. In-Memory Performance Optimizations 3. In-Memory Populate 4. In-Memory Compression 5. In-Memory Scale-Out 28
In-Memory Column Store Compression IMCUs compressed during population ALTER MATERIALIZED VIEW mv1 INMEMORY MEMCOMPRESS FOR QUERY; CREATE TABLE history (Name varchar(20), Desc varchar(200)) INMEMORY MEMCOMPRESS FOR CAPACITY LOW; Controlled by MEMCOMPRESS sub-clause Ascending order of compression levels: FOR DML (for heavily updated tables, slower for queries) FOR QUERY LOW (default: light compression and fastest) FOR QUERY HIGH (slightly more compression) FOR CAPACITY LOW (balances capacity and performance) FOR CAPACITY HIGH (maximizes capacity) Allows in-memory storage tiering 29
In-Memory Column Store Compression Mode Compression Factor Query Typical Range Observed Max Speed QUERY 2x-7x Up to 10x Fastest CAPACITY LOW 4x-9x Up to 20x Very Fast CAPACITY HIGH 7x-12x Up to 30x Fast Compression ratios can be highly inflated by choosing a bad uncompressed format, or reporting most compressible table. Our results are measured relative to Oracle s efficient row format for customer data.. MEMCOMPRESS FOR QUERY LOW or HIGH LOW: Fastest performance HIGH: Slightly greater compression Lightweight compression schemes Queries run on compressed data Faster than on uncompressed data 30
In-Memory Column Store Compression Mode Compression Factor Query Typical Range Observed Max Speed QUERY 2x-6x Up to 10x Fastest CAPACITY LOW 4x-9x Up to 20x Very Fast CAPACITY HIGH 7x-12x Up to 30x Fast Compression ratios can be highly inflated by choosing a bad uncompressed format, or reporting most compressible table. Our results are measured relative to Oracle s efficient row format for customer data.. MEMCOMPRESS FOR CAPACITY LOW Balances throughput and capacity Adds Oracle custom ZIP (OZIP) on top of COMPRESS FOR QUERY World s fastest decompressor, tuned for DB query performance 2x - 3x faster than LZO (standard for fast zip) Further optimized on SPARC M7 via Software on Silicon 31
In-Memory Column Store Compression Mode Compression Factor Query Typical Range Observed Max Speed QUERY 2x-6x Up to 10x Fastest CAPACITY LOW 4x-9x Up to 20x Very Fast CAPACITY HIGH 7x-12x Up to 30x Fast Compression ratios can be highly inflated by choosing a bad uncompressed format, or reporting most compressible table. Our results are measured relative to Oracle s efficient row format for customer data.. MEMCOMPRESS FOR CAPACITY HIGH Compress with emphasis on space-savings Heavier weight decompression required before query processing Trades off some performance for capacity Extra 1.5-2x compression 32
In-Memory Column Store Compression: Under the Hood
AGENDA 1. In-Memory Column Format 2. In-Memory Performance Optimizations 3. In-Memory Populate 4. In-Memory Compression 5. In-Memory Scale-Out
Scale-Out In-Memory Database to Any Size Scale-Out across servers to grow memory and CPUs In-Memory queries parallelized across servers to access local column data Direct-to-Wire InfiniBand protocol speeds messaging 35
Scale-Out: Unique Fault Tolerance Duplicate IMCUs on another node Enabled via DUPLICATE subclause Enabled per table/partition Application transparent Similar to storage mirroring Only Available on Engineered Systems Downtime eliminated by using duplicate after failure 36
Summary Dual Format Architecture Fully consistent row and column format Best of both worlds OLTP and Analytics performance. Typically, row format (Buffer cache) memory < 10% of column format memory New In-Memory Column Format In-memory only representation Seamlessly built into Oracle Database Engine Compatible with all Oracle Database features Cost Effective Use in-memory for hot data, flash for intermediate data, disk for cold data 37