Oracle Database In-Memory By Example
Andy Rivenes, Senior Principal Product Manager
DOAG 2015, November 18, 2015
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Program Agenda
- Getting Started
- Configuring & Populating In-Memory Column Store
- Querying the In-Memory Column Store
- Joins & Aggregation with In-Memory Column Store
In-Memory By Example: Schema Overview
- Star schema based on the TPC-H benchmark
- Lineitem & Orders tables combined to create the fact table
- Schema built with a 3GB scale factor
In-Memory By Example: Demonstration Details
- VirtualBox on my laptop
- Set of SQL scripts to show how Database In-Memory works
- SGA size = 4G
- IM column store size = 1504M
- _small_table_threshold = 1572864000
- Tables have the "cache" attribute set
- Each script connects as a new session
Program Agenda: Configuring & Populating In-Memory Column Store
Row Format Databases vs. Column Format Databases
- Rows stored contiguously (SALES): transactions run faster on row format
  - Example: query or insert a sales order
  - Fast processing of few rows, many columns
- Columns stored contiguously (SALES): analytics run faster on column format
  - Example: report on sales totals by region
  - Fast access to few columns, many rows
- Until now: must choose one format and suffer the tradeoffs
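The tradeoff can be made concrete with a small Python model. This is a sketch only: the four-column SALES table and its values are made up, and "values read" and "places visited" are rough stand-ins for I/O and memory access cost, not anything Oracle measures.

```python
# Toy model of one SALES table kept in two layouts.
# Column names and values are hypothetical, chosen only to illustrate access patterns.

COLUMNS = ["order_id", "region", "product", "amount"]

ROWS = [
    (1, "WEST", "shoes", 120),
    (2, "EAST", "hats", 35),
    (3, "WEST", "socks", 12),
    (4, "NORTH", "shoes", 99),
]

# Row format: all values of one record are stored together.
row_store = list(ROWS)

# Column format: all values of one column are stored together.
column_store = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}

AMOUNT = COLUMNS.index("amount")


def sum_amount_row_format():
    """Analytic query on row format: every whole row is read to reach one column."""
    total, values_read = 0, 0
    for record in row_store:
        values_read += len(record)
        total += record[AMOUNT]
    return total, values_read


def sum_amount_column_format():
    """Analytic query on column format: only the amount column is read."""
    amounts = column_store["amount"]
    return sum(amounts), len(amounts)


def fetch_record_row_format(position):
    """OLTP-style fetch on row format: one contiguous record, one place visited."""
    return row_store[position], 1


def fetch_record_column_format(position):
    """OLTP-style fetch on column format: one separate column array visited per column."""
    record = tuple(column_store[name][position] for name in COLUMNS)
    return record, len(COLUMNS)


print(sum_amount_row_format())        # (266, 16)
print(sum_amount_column_format())     # (266, 4)
print(fetch_record_row_format(2))     # ((3, 'WEST', 'socks', 12), 1)
print(fetch_record_column_format(2))  # ((3, 'WEST', 'socks', 12), 4)
```

The report-style sum reads a quarter of the values in column format, while fetching one complete order touches one place in row format but four in column format.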
Breakthrough: Dual Format Database
- BOTH row and column formats for the same table (SALES): the row format in the normal buffer cache, plus the new in-memory column format
- Simultaneously active and transactionally consistent
- Analytics & reporting use the new in-memory column format
- OLTP uses the proven row format
Configuring: In-Memory Column Store
- In-Memory Area: a new component of the System Global Area (SGA), alongside the buffer cache, shared pool, large pool, log buffer and other areas
- Static area (not a cache!)
- Controlled by the INMEMORY_SIZE parameter
  - Minimum size of 100MB
- SGA_TARGET must be large enough to accommodate it
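As a quick sizing check, the arithmetic can be written out in Python. This is a simplified sketch: it assumes the demo's 4G SGA means 4096MB, applies only the 100MB minimum and the requirement that SGA_TARGET holds the In-Memory area, and ignores granule rounding and the real minimum needs of the other SGA components.

```python
MIN_INMEMORY_MB = 100  # minimum INMEMORY_SIZE stated on the slide


def sga_left_for_other_components(sga_target_mb, inmemory_size_mb):
    """Return the MB of SGA_TARGET left for the buffer cache, shared pool, etc.

    Simplified check only: granule rounding and per-component minimums are ignored.
    """
    if inmemory_size_mb < MIN_INMEMORY_MB:
        raise ValueError(f"INMEMORY_SIZE must be at least {MIN_INMEMORY_MB}MB")
    remaining = sga_target_mb - inmemory_size_mb
    if remaining <= 0:
        raise ValueError("SGA_TARGET is too small to hold the In-Memory area")
    return remaining


# Demo environment from this deck: SGA = 4G (taken as 4096MB), column store = 1504M.
print(sga_left_for_other_components(4096, 1504))  # 2592
```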
Populating: In-Memory Column Store
New INMEMORY attribute:
ALTER TABLE sales INMEMORY;
ALTER TABLE sales NO INMEMORY;
CREATE TABLE customers … PARTITION BY LIST (…) (PARTITION p1 … INMEMORY, PARTITION p2 … NO INMEMORY);
ALTER TABLE sales INMEMORY NO INMEMORY (PROD_ID);
- Segment types eligible:
  - Tables
  - Partitions
  - Subpartitions
  - Materialized views
- Possible to populate only certain columns from a table or partition
Populating: In-Memory Column Store
- Population is performed by a new set of background processes (e.g. ora_w001_orcl)
- Number of processes controlled by the parameter INMEMORY_MAX_POPULATE_SERVERS
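As a rough illustration of parallel population, the sketch below converts chunks of rows into column units with a bounded worker pool. Python threads stand in for the ora_w00n background processes, the hypothetical max_populate_servers argument plays the role of INMEMORY_MAX_POPULATE_SERVERS, and the chunk size and unit layout are simplifications, not Oracle's actual in-memory format.

```python
from concurrent.futures import ThreadPoolExecutor


def build_column_unit(chunk):
    """Convert one chunk of rows into a column unit: per-column values plus min/max."""
    columns = [list(values) for values in zip(*chunk)]
    return {
        "columns": columns,
        "min_max": [(min(col), max(col)) for col in columns],
    }


def populate(rows, rows_per_unit, max_populate_servers):
    """Split rows into chunks and let a bounded pool of workers build the column units."""
    chunks = [rows[i:i + rows_per_unit] for i in range(0, len(rows), rows_per_unit)]
    # max_workers plays the role of INMEMORY_MAX_POPULATE_SERVERS in this sketch.
    with ThreadPoolExecutor(max_workers=max_populate_servers) as pool:
        # pool.map returns results in the original chunk order.
        return list(pool.map(build_column_unit, chunks))


rows = [(store_id, store_id * 10) for store_id in range(10)]
units = populate(rows, rows_per_unit=4, max_populate_servers=2)
print(len(units))             # 3
print(units[-1]["min_max"])   # [(8, 9), (80, 90)]
```

Raising the worker count lets more chunks be converted at once, which is the same lever the parameter gives you, at the cost of more CPU used for population.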
Configuring & Populating In-Memory Column Store Examples
Configuring & Populating In-Memory Column Store Summary
- Size of the column store is controlled by INMEMORY_SIZE
  - What size is the column store on your systems?
- To add a table to the column store, set the INMEMORY attribute: alter table lineorder inmemory;
- Monitor the column store using v$im_segments
  - v$im_segments also shows you the MEMCOMPRESS ratios
  - What was the largest compression ratio you saw?
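To see why compression ratios vary from segment to segment, here is a sketch of two generic columnar techniques, dictionary encoding and run-length encoding, plus the usual ratio calculation from the BYTES and INMEMORY_SIZE columns of v$im_segments. The actual MEMCOMPRESS formats are not reproduced here, and the byte counts are made-up inputs.

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1] = (code, runs[-1][1] + 1)
        else:
            runs.append((code, 1))
    return runs


def compression_ratio(bytes_on_disk, inmemory_bytes):
    """Ratio of a segment's size on disk (BYTES) to its size in the column store (INMEMORY_SIZE)."""
    return round(bytes_on_disk / inmemory_bytes, 2)


regions = ["EAST"] * 4 + ["WEST"] * 3 + ["EAST"] * 2
dictionary, codes = dictionary_encode(regions)
print(dictionary)                 # ['EAST', 'WEST']
print(run_length_encode(codes))   # [(0, 4), (1, 3), (0, 2)]
print(compression_ratio(3_000_000_000, 1_200_000_000))  # 2.5
```

Low-cardinality, sorted or clustered columns shrink the most under these techniques, which is one reason ratios differ between tables.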
Program Agenda: Querying the In-Memory Column Store
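Querying: Why is an In-Memory scan faster than the buffer cache?
Buffer Cache (row format): for SELECT COL4 FROM MYTABLE; every row has to be visited and COL4 picked out of it, so the other columns are read along the way before the RESULT is built.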
IM Column Store (column format): the same SELECT COL4 FROM MYTABLE; reads only the contiguous COL4 values to build the RESULT, without touching the other columns.
Querying: Oracle In-Memory Column Store Storage Index
- Each column is made up of multiple column units
- The min/max value is recorded for each column unit in a storage index
- Example: find all sales from stores with a store_id of 8. With column units whose min/max values are 1/3, 4/7, 8/12 and 7/15, only the last two can contain 8, so the first two are skipped without being scanned
- The storage index provides partition-pruning-like performance for ALL queries
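The pruning logic can be sketched in a few lines of Python, using the min/max ranges from the store_id example (1-3, 4-7, 8-12 and 7-15). The values stored inside each column unit are invented, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnUnit:
    """One column unit of a single column, with its storage-index min/max entry."""
    values: list
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equals(units, target):
    """Return (matching values, units scanned, units pruned) for column = target."""
    matches, scanned, pruned = [], 0, 0
    for unit in units:
        # Storage index check: skip a unit whose [min, max] range cannot hold target.
        if target < unit.min_value or target > unit.max_value:
            pruned += 1
            continue
        scanned += 1
        matches.extend(v for v in unit.values if v == target)
    return matches, scanned, pruned


# store_id column units with the min/max ranges from the slide: 1-3, 4-7, 8-12, 7-15.
# The values inside each unit are made up.
store_id_units = [
    ColumnUnit([1, 2, 3]),
    ColumnUnit([4, 5, 7]),
    ColumnUnit([8, 10, 12]),
    ColumnUnit([7, 8, 15]),
]
print(scan_equals(store_id_units, 8))  # ([8, 8], 2, 2)
```

Note that ranges may overlap (8-12 and 7-15 both qualify), so pruning only removes units that provably cannot match.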
Querying: Orders of Magnitude Faster Analytic Data Scans
- Example: find all sales in the state of California
- Each CPU core scans local in-memory columns
- SIMD vector instructions are used to process multiple values in each instruction: multiple STATE values (CA, CA, CA, CA) are loaded into a vector register and all of them are compared in 1 cycle, for > 100x faster scans
  - Originally designed for graphics & science
- Billions of rows/sec scan rate per CPU core
  - Row format is millions/sec
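Python cannot issue SIMD instructions, so as a rough sketch the idea is emulated below with SWAR (SIMD within a register): eight dictionary-encoded STATE values are packed into one 64-bit integer and compared with the search key using a few whole-word operations. The state codes are hypothetical, keys must fit in one byte, and this is not how Oracle's scan code is written.

```python
LANES = 8                   # eight 1-byte lanes in one 64-bit word
ONES = 0x0101010101010101   # 0x01 repeated in every lane
LOW7 = 0x7F7F7F7F7F7F7F7F   # the low 7 bits of every lane
HIGH = 0x8080808080808080   # the high bit of every lane

# Hypothetical dictionary codes for the STATE column.
STATE_CODE = {"CA": 0, "NY": 1, "OR": 2, "TX": 3, "WA": 4}


def pack(codes):
    """Pack up to 8 one-byte codes into one word; lane 0 is the lowest byte."""
    if len(codes) > LANES:
        raise ValueError("at most 8 codes fit in one word")
    word = 0
    for lane, code in enumerate(codes):
        word |= (code & 0xFF) << (8 * lane)
    return word


def lanes_equal(word, key, lane_count=LANES):
    """Return a bitmask with bit i set when lane i equals key (0 <= key <= 255).

    The comparison itself is a fixed handful of whole-word operations, no
    matter how many lanes match; only building the result mask loops.
    """
    x = word ^ (ONES * key)               # lanes equal to key become 0x00
    y = (x & LOW7) + LOW7                 # lane high bit set if its low 7 bits are non-zero
    zero_lanes = ~(y | x | LOW7) & HIGH   # high bit set only in lanes that were exactly 0x00
    mask = 0
    for lane in range(lane_count):
        if zero_lanes & (0x80 << (8 * lane)):
            mask |= 1 << lane
    return mask


states = ["CA", "NY", "CA", "TX", "CA", "WA", "OR", "CA"]
word = pack([STATE_CODE[s] for s in states])
print(bin(lanes_equal(word, STATE_CODE["CA"])))  # 0b10010101
```

The lane_count argument matters when fewer than eight values are packed: unused lanes hold 0 and would otherwise look like matches for code 0.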
Querying: Determining If In-Memory Is Used By A Query
- Examine the execution plan: an in-memory scan appears as TABLE ACCESS INMEMORY FULL
Querying: Determining If In-Memory Is Used
- Session-level statistics are the best way to determine if In-Memory was used
- They also indicate what the benefits were
- The demo will show the IM session-level statistics
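Reading these statistics usually means taking a snapshot of the session's values (for example by joining v$mystat to v$statname) before and after the query and comparing the two. The helper below is a hypothetical sketch of that comparison step only; the statistic names "IM scan rows" and "IM scan CUs pruned" are assumed examples, so check v$statname on your release for the exact names, and the snapshot numbers are illustrative, not measured.

```python
def stat_deltas(before, after, prefix="IM "):
    """Return the growth of every statistic whose name starts with prefix."""
    deltas = {}
    for name, value in after.items():
        if name.startswith(prefix):
            change = value - before.get(name, 0)
            if change:
                deltas[name] = change
    return deltas


# Illustrative snapshots only, not measured values.
before = {"IM scan rows": 0, "IM scan CUs pruned": 0, "session logical reads": 40}
after = {"IM scan rows": 5000, "IM scan CUs pruned": 3, "session logical reads": 52}
print(stat_deltas(before, after))  # {'IM scan rows': 5000, 'IM scan CUs pruned': 3}
```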
Querying: In-Memory Column Store Examples
Querying: In-Memory Column Store Summary
- Control the use of the column store using INMEMORY_QUERY: alter session set INMEMORY_QUERY=disable;
- INMEMORY & NO_INMEMORY hints
- Simple query shows the column store is 10X faster than the buffer cache
  - Use session statistics to check how little of the column store is accessed
- Column store is 40X faster than the buffer cache for a single row access with no index
  - An index is just as fast as the column store, but you have to maintain it
- Storage index MIN/MAX pruning significantly reduces the data to be scanned
  - Use session statistics to check how much
Program Agenda: Joins & Aggregation with In-Memory Column Store
Joining: Combining Data Also Dramatically Faster
- Example: find total sales in outlet stores
  - Scan STORES for Type = Outlet, producing the qualifying Store IDs (15, 38, 64)
  - Build a Bloom filter on those Store IDs (StoreID in 15, 38, 64), apply it while scanning the Store ID and Amount columns of SALES, then Sum the amounts
- Converts joins of data in multiple tables into fast column scans
- Bloom filter pushdown: filtering pushed down into the scan
- Joins tables 10x faster
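A minimal Bloom-filter join can be sketched in Python as follows. The STORES and SALES rows are hypothetical, the hashing scheme is arbitrary, and the exact set check in step 3 stands in for the final hash join; the sketch only illustrates the build, probe and recheck pattern, not Oracle's implementation.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from a stable hash of (seed, key).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Hypothetical STORES (store_id, type) and SALES (store_id, amount) rows.
stores = [(15, "Outlet"), (20, "Mall"), (38, "Outlet"), (64, "Outlet"), (70, "Mall")]
sales = [(15, 100), (20, 50), (38, 25), (70, 10), (64, 5), (99, 7)]

# 1. Scan the dimension with its predicate and build the filter on the join key.
outlet_ids = {store_id for store_id, store_type in stores if store_type == "Outlet"}
bloom = BloomFilter()
for store_id in outlet_ids:
    bloom.add(store_id)

# 2. Scan the fact table, discarding most non-matching rows with the cheap filter.
candidates = [(sid, amt) for sid, amt in sales if bloom.might_contain(sid)]

# 3. Finish the join exactly, which removes any false positives, then aggregate.
total = sum(amt for sid, amt in candidates if sid in outlet_ids)
print(total)  # 130
```

Because a Bloom filter never drops a real match, the pushed-down filter can only shrink the work; the exact join step guarantees the answer stays correct.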
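Joining: Bloom Filter
- How to identify a Bloom filter in the execution plan: look for JOIN FILTER CREATE and JOIN FILTER USE operations (named, for example, :BF0000)
- Why is this significant?
  - The join is converted into an additional filter applied to the fact table
  - The columnar format is very efficient at processing filters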
Joining: Multiple Bloom Filters
Select d.d_year, s.s_nation, sum(lo_revenue - lo_supplycost) profit
From LINEORDER l, DATE_DIM d, PART p, SUPPLIER s
Where l.lo_orderdate = d.d_datekey
And l.lo_partkey = p.p_partkey
And l.lo_suppkey = s.s_suppkey
And (p.p_mfgr = 'MFGR#12' or p.p_mfgr = 'mfgr#2')
And s.s_region = 'AMERICA'
Group by d.d_year, s.s_nation
Order by d.d_year, s.s_nation;
- Analytics is about finding patterns & trends by aggregating data
- Thanks to a sophisticated Optimizer, multiple Bloom filters can be applied to the fact table
- All of the dimension tables are joined simultaneously while the fact table is scanned
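The following sketch applies one filter per filtered dimension during a single pass over LINEORDER-like rows, mirroring the shape of the query above. Exact Python sets stand in for Bloom filters (a real Bloom filter can let a few false positives through to the join), and the rows and manufacturer values are hypothetical.

```python
from collections import defaultdict


def profit_by_year_nation(lineorder, date_dim, part, supplier, mfgrs, region):
    """Single pass over the fact rows, checking one filter per filtered dimension."""
    # Build the filters from the dimension predicates before scanning the fact table.
    part_filter = {key for key, mfgr in part.items() if mfgr in mfgrs}
    supplier_filter = {key for key, (_, reg) in supplier.items() if reg == region}

    totals = defaultdict(int)
    for orderdate, partkey, suppkey, revenue, supplycost in lineorder:
        # Both filters are applied during the same scan of the fact rows.
        if partkey not in part_filter or suppkey not in supplier_filter:
            continue
        year = date_dim[orderdate]      # DATE_DIM has no predicate, so it is only joined
        nation = supplier[suppkey][0]
        totals[(year, nation)] += revenue - supplycost
    return dict(sorted(totals.items()))


date_dim = {19970101: 1997, 19980101: 1998}
part = {1: "MFGR#1", 2: "MFGR#2", 3: "MFGR#3"}
supplier = {
    10: ("UNITED STATES", "AMERICA"),
    11: ("BRAZIL", "AMERICA"),
    12: ("FRANCE", "EUROPE"),
}
# (orderdate, partkey, suppkey, revenue, supplycost)
lineorder = [
    (19970101, 1, 10, 100, 60),
    (19970101, 3, 10, 500, 100),   # rejected by the PART filter
    (19980101, 2, 11, 80, 30),
    (19980101, 2, 12, 90, 10),     # rejected by the SUPPLIER filter
    (19970101, 2, 10, 70, 20),
]

result = profit_by_year_nation(lineorder, date_dim, part, supplier, {"MFGR#1", "MFGR#2"}, "AMERICA")
print(result)  # {(1997, 'UNITED STATES'): 90, (1998, 'BRAZIL'): 50}
```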
Joining: Multiple Bloom Filter Execution Plan
Aggregation: In-Memory Aggregation
Select p.product_name, st.store_name, sum(amount_sold) profit
From SALES s, STORES st, PRODUCTS p
Where s.prod_id = p.prod_id
And s.store_id = st.store_id
And st.type = 'OUTLETS'
And p.type = 'FOOTWEAR'
Group by p.product_name, st.store_name
Order by p.product_name, st.store_name;
- New Vector Group By enables extremely efficient in-memory, array-based aggregation
- Simultaneously accumulates aggregate values into in-memory arrays during the fact table scan
Aggregation: In-Memory Aggregation
- Example: report sales of footwear in outlet stores
- Dynamically creates an in-memory report outline (Footwear products by Outlet stores)
- The report outline is then filled in during the fast fact (Sales) scan
- Reports run much faster, without predefined cubes
- Also offloads report filtering to Exadata Storage servers
Aggregation: Vector Group By Step 1
-------------------------------------------------------
  Id  Operation                        Name
-------------------------------------------------------
   0  SELECT STATEMENT
   1  TEMP TABLE TRANSFORMATION
   2  LOAD AS SELECT
   3  PX COORDINATOR
   4  PX SEND QC (RANDOM)              :TQ10001
   5  BUFFER SORT
   6  VECTOR GROUP BY
   7  KEY VECTOR CREATE BUFFERED       :KV0000
   8  PX RECEIVE
   9  PX SEND HASH                     :TQ10000
  10  PX BLOCK ITERATOR
* 11  TABLE ACCESS INMEMORY FULL       PRODUCTS
  12  LOAD AS SELECT
  13  PX COORDINATOR
  14  PX SEND QC (RANDOM)              :TQ20001
  15  HASH GROUP BY
  16  PX RECEIVE
  17  PX SEND HASH                     :TQ20000
  18  VECTOR GROUP BY
  19  HASH GROUP BY
  20  KEY VECTOR CREATE BUFFERED       :KV0001
  21  PX BLOCK ITERATOR
* 22  TABLE ACCESS INMEMORY FULL       STORES
-------------------------------------------------------
- Scan each of the dimension tables and create a key vector
- The key vector is used to complete the join and enable the vector group by
Aggregation: Vector Group By Step 2
(Same plan as Step 1; this step is the LOAD AS SELECT operations, Ids 2 and 12.)
- Create a temporary table for the payload columns from the dimension tables
- This allows a more efficient join back later
Aggregation: Vector Group By Step 3
-------------------------------------------------------
  Id  Operation                        Name
-------------------------------------------------------
* 23  HASH JOIN
* 24  HASH JOIN
  25  TABLE ACCESS FULL                SYS_TEMP_0FD9D662E_2D4940
  26  VIEW                             VW_VT_80F21617
  27  HASH GROUP BY
  28  PX RECEIVE
  29  PX SEND HASH                     :TQ50000
  30  VECTOR GROUP BY
  31  HASH GROUP BY
  32  KEY VECTOR USE                   :KV0000
  33  KEY VECTOR USE                   :KV0001
  34  PX BLOCK ITERATOR
* 35  TABLE ACCESS INMEMORY FULL       SALES
  36  TABLE ACCESS FULL                SYS_TEMP_0FD9D662F_2D4940
  37  TABLE ACCESS FULL                SYS_TEMP_0FD9D662D_2D4940
-------------------------------------------------------
- Scan the SALES table with the help of the key vectors
- Join the results back to the temporary tables created with the payload columns
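Loosely following the three steps, a vector group by can be sketched in Python: each dimension scan yields a key vector (dimension key to dense group id, or None) and a payload list, one pass over SALES fills a dense two-dimensional array (the "report outline"), and the group ids are finally joined back to the payload values. Lists stand in for key vectors and temporary tables, dimension keys are assumed to be 0..n-1, and all rows and function names are invented for illustration.

```python
def build_key_vector(dim_rows, predicate, group_col):
    """Steps 1 and 2: scan a dimension, producing a key vector and a payload list.

    key_vector[key] is the dense group id of a qualifying row, or None.
    payload[group_id] is the grouping value (the "payload column").
    Assumes dimension keys are the list positions 0..n-1.
    """
    key_vector = [None] * len(dim_rows)
    payload, group_id_of = [], {}
    for key, row in enumerate(dim_rows):
        if not predicate(row):
            continue
        value = row[group_col]
        if value not in group_id_of:
            group_id_of[value] = len(payload)
            payload.append(value)
        key_vector[key] = group_id_of[value]
    return key_vector, payload


def vector_group_by(sales, prod_kv, prod_payload, store_kv, store_payload):
    """Step 3: one fact scan fills a dense grid, then group ids are joined back."""
    grid = [[0] * len(store_payload) for _ in prod_payload]
    seen = [[False] * len(store_payload) for _ in prod_payload]
    for prod_id, store_id, amount in sales:
        p, s = prod_kv[prod_id], store_kv[store_id]
        if p is None or s is None:   # key vector lookups act as the join filters
            continue
        grid[p][s] += amount
        seen[p][s] = True
    # Join the dense group ids back to the payload values and order the report.
    return sorted(
        (prod_payload[p], store_payload[s], grid[p][s])
        for p in range(len(prod_payload))
        for s in range(len(store_payload))
        if seen[p][s]
    )


products = [
    {"name": "Runner", "type": "FOOTWEAR"},
    {"name": "Cap", "type": "HATS"},
    {"name": "Sandal", "type": "FOOTWEAR"},
]
stores = [
    {"name": "Outlet A", "type": "OUTLETS"},
    {"name": "Mall B", "type": "MALL"},
    {"name": "Outlet C", "type": "OUTLETS"},
]
# (prod_id, store_id, amount_sold)
sales = [(0, 0, 10), (0, 2, 5), (1, 0, 99), (2, 0, 7), (0, 0, 3), (2, 1, 50)]

prod_kv, prod_payload = build_key_vector(products, lambda r: r["type"] == "FOOTWEAR", "name")
store_kv, store_payload = build_key_vector(stores, lambda r: r["type"] == "OUTLETS", "name")
report = vector_group_by(sales, prod_kv, prod_payload, store_kv, store_payload)
for line in report:
    print(line)
# ('Runner', 'Outlet A', 13)
# ('Runner', 'Outlet C', 5)
# ('Sandal', 'Outlet A', 7)
```

The fact scan never touches product or store names; it only does two array lookups and one array update per row, which is what makes accumulating during the scan cheap.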
Joins & Aggregation with In-Memory Column Store Examples
Joins & Aggregation with In-Memory Column Store Summary
- Bloom filters allow joins to be converted to filters on the fact table
- A Bloom filter on the column store is 10x faster than on the buffer cache
- Vector group by is used when the number of rows processed is large
- Vector group by allows aggregation to occur during the scan of the fact table
- Vector group by is 3x faster than the In-Memory column store alone
In-Memory Recap SUMMARY
Full Table Scan Performance: In-Memory vs. Buffer Cache, 700X faster (chart of elapsed time in seconds)
Single Record Lookup: In-Memory vs. Buffer Cache vs. Index Access, 700X faster (chart of elapsed time in seconds)
Multiple Table Joins: In-Memory vs. Buffer Cache, 12X faster (chart of elapsed time in seconds)
Multiple Column Aggregation: In-Memory without vector group by (No VG) vs. Vector Group By, 3X faster (chart of elapsed time in seconds)
Summary
- Size of the column store is controlled by INMEMORY_SIZE
  - Compression Advisor can tell you how much space you need
- To add a table to the column store, set the INMEMORY attribute: alter table lineorder inmemory;
  - Control the population of the column store via the PRIORITY attribute
  - Resource Manager can also be used to control the worker processes
- To disable the use of the column store, use INMEMORY_QUERY or a hint
- Monitor the column store using v$im_segments & v$mystat
- All the additional information you need is on the In-Memory blog: http://blogs.oracle.com/in-memory
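Under the usual PRIORITY semantics (NONE, the default, means a segment is populated on its first full scan; CRITICAL, HIGH, MEDIUM and LOW are populated automatically, most urgent first), the ordering can be sketched as below. This is a simplified model: the segment list and priorities are hypothetical, and the real population engine may work on several segments at once rather than strictly one after another.

```python
# Priority levels other than NONE, most urgent first.
PRIORITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def population_plan(segments):
    """Split (name, priority) pairs into an eager, priority-ordered list and a
    list of segments that wait for their first full scan (PRIORITY NONE)."""
    eager = [seg for seg in segments if seg[1] != "NONE"]
    # sorted() is stable, so segments with equal priority keep their input order.
    eager_order = [name for name, _ in sorted(eager, key=lambda seg: PRIORITY_RANK[seg[1]])]
    on_first_scan = [name for name, priority in segments if priority == "NONE"]
    return eager_order, on_first_scan


segments = [
    ("LINEORDER", "HIGH"),
    ("PART", "NONE"),
    ("DATE_DIM", "CRITICAL"),
    ("SUPPLIER", "LOW"),
    ("CUSTOMER", "MEDIUM"),
]
print(population_plan(segments))
# (['DATE_DIM', 'LINEORDER', 'CUSTOMER', 'SUPPLIER'], ['PART'])
```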
Additional Resources
Related White Papers
- Oracle Database In-Memory White Paper
- Oracle Database In-Memory Best Practices
- Oracle Database In-Memory Aggregation Paper
- When to use Oracle Database In-Memory
- Oracle Database In-Memory Advisor
Join the Conversation
- https://twitter.com/theinmemoryguy
- https://blogs.oracle.com/in-memory/
- https://www.facebook.com/oracledatabase
- http://www.oracle.com/goto/dbim.html
Related Videos
- In-Memory YouTube Channel
- Managing Oracle Database In-Memory
- Database In-Memory and Oracle Multitenant
- Industry Experts Discuss Oracle Database In-Memory
- Software on Silicon
Any Additional Questions
- Oracle Database In-Memory Blog
- My email: andy.rivenes@oracle.com