Controlling resources in an Exadata environment
Agenda
- Smart IO
- IO Resource Manager
- Compression
- Hands-on time
- Exadata Security
- Flash Cache
- Storage Indexes
- Parallel Execution
Agenda
- Smart IO
- IO Resource Manager
- Compression
SMART IO
How we (used to) read and write data
- Data reads and writes take many forms:
  - SQL statements
  - Full and incremental backups
  - Restores of backups
  - Loading of data
  - Creation of tablespaces/datafiles
- Exadata is designed to do this fast:
  - Latest and greatest hardware
  - Software optimized to work together with that hardware
Are fast nodes and storage enough? NO!
- The speed is determined by the weakest link:
  - Processing speed of the database nodes
  - Processing speed of the storage environment
  - The storage network and components that tie them together
- 4Gb fiber is not fast enough: 4 single-threaded sessions can easily saturate the throughput of a 4Gb card
Can we go any faster, besides hardware? Yes!
- Limit the amount of processing done on the database nodes:
  - Scan full tables on the storage, not on the DB nodes
  - Only retrieve the columns and rows you actually need
  - Encrypt and decrypt on the storage side, not on the DB nodes
  - Transfer tasks (writing zeros to datafiles) to the storage
- Free CPU power on the DB nodes:
  - Pay less in license fees because you need fewer DB nodes
  - Waste less power and heat because you have fewer systems
  - Pay less because you need less iron in your environment
- Fully use the CPUs on the storage side
Smart IO applications
- Smart Scan: query 1TB and only receive and process the actual results, not the full 1TB; retrieve parts from flash and parts from disk
- Smart file creation and block formatting: let the storage write the 5TB of zeros for new datafiles instead of the database nodes. Parallel and faster! Both new tablespaces and RMAN restores benefit
- Smart incremental backup: let the storage decide which blocks to back up. Parallel and fast, so no more full database scans by the RMAN process
- Smart Scans on encrypted columns and tablespaces
- Smart Scans for Data Mining scoring
Smart Scan: first, the basics
This is generic Oracle behavior:
- Database sessions read one or more blocks into the SGA
  - After reading/processing, the block stays in the SGA
  - Sessions can re-use the block in the SGA
  - Cache space is managed with a least-recently-used algorithm
- Database sessions read blocks into the PGA
  - When the amount of data is too large to fit in the SGA
  - Called direct reads; stored in the session's PGA
  - Blocks are evaluated as they come in
  - Blocks are discarded after use (a new query means reading them again)
- On Exadata, Smart Scan kicks in for direct reads, which serve the most resource-intensive queries
Datastream in an Oracle Database
Smart Scan functional summary
- Smart Scan is implemented by function shipping:
  - The statement is processed by the instance
  - Block IDs are determined per Exadata cell
  - Predicates (WHERE clause) and block IDs are shipped to the cell
  - The cell processes the blocks and returns the filtered rows
- The cell has libraries to understand the Oracle block format:
  - Predicate evaluation (functions in the WHERE clause)
  - Column selection
  - Join filtering through Bloom filters
- Works on compressed and uncompressed blocks
- Tablespace and column encryption are supported
Smart Scan: what happens?
- The database decides a direct read is needed (too much data to fit in the SGA)
- It determines the list of blocks that need to be accessed (either indexes or tables)
- The database detects that all required data is on Exadata
- It creates a list of blocks per Exadata storage cell
- It ships the list of blocks, the required columns, and the applicable WHERE predicates to each Exadata storage cell
Smart Scan: what happens? (cont.)
- Exadata storage cells retrieve the requested blocks
  - In parallel across cells, and in multiple threads per cell
  - Based on the column requirements and WHERE predicates, they extract data from the blocks
  - They gather the retrieved data into Oracle-like blocks
  - They ship the blocks with data to the database node(s)
- The database receives virtual blocks from all cells
  - Gathers them in the PGA and determines the result for the session
  - Sends the result to the session and discards the virtual blocks
- Query 1TB and only receive 10GB on the DB nodes!
Smart Scan: when is it used?
- The optimizer does not decide to use Smart Scan; it is a run-time decision
- First, a scan decides whether direct reads can be used, based on:
  - Table size
  - Number of dirty buffers
  - Amount of data already cached
  - Other heuristics (see the manual)
- This behavior is the same as on non-Exadata systems
Smart Scan: when is it used? (cont.)
- The CELL_OFFLOAD_PROCESSING parameter must be set (TRUE/FALSE)
- All the files of the tablespace need to reside on Exadata storage
- Smart Scans are used for scans, in sub-queries and in-line views as well
- Used for the following row sources:
  - Table scan
  - Index (fast full) scan
  - Bitmap index scan
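As a quick sketch, the parameter above can be toggled per session, which is handy when comparing offloaded and non-offloaded runs of the same statement:

```sql
-- Offload is enabled by default; toggle it per session for testing.
ALTER SESSION SET cell_offload_processing = FALSE;  -- force classic block IO
-- ... run and time the statement ...
ALTER SESSION SET cell_offload_processing = TRUE;   -- allow Smart Scan again
```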
Smart Scan: predicting offload
- A stable plan is important
  - The explain plan should not change in a running environment
  - No additional parsing for the same statement
- Explain plans help you see Exadata offload:
  - Operations that could be offloaded
  - Predicates that could be offloaded
  - Joins that could be offloaded through Bloom filtering
- A certain explain plan does not guarantee offloading!
- For more information on Oracle's Bloom filtering, see http://antognini.ch/papers/bloomfilters20080620.pdf
Explain plan example

    ----------------------------------------------------
    | Id | Operation                      | Name       |
    ----------------------------------------------------
    |  0 | SELECT STATEMENT               |            |
    |* 1 |  HASH JOIN                     |            |
    |* 2 |   HASH JOIN                    |            |
    |* 3 |    TABLE ACCESS STORAGE FULL   | SALES      |
    |* 4 |    TABLE ACCESS STORAGE FULL   | SALES      |
    |* 5 |   TABLE ACCESS STORAGE FULL    | SALES      |
    ----------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
      1 - access("T"."CUST_ID"="T2"."CUST_ID" AND "T1"."PROD_ID"="T2"."PROD_ID"
                 AND "T1"."CUST_ID"="T2"."CUST_ID")
      2 - access("T"."PROD_ID"="T1"."PROD_ID")
      3 - storage("T1"."PROD_ID"<200 AND
                  "T1"."AMOUNT_SOLD"*"T1"."QUANTITY_SOLD">10000 AND
                  "T1"."PROD_ID"<>45)
          filter("T1"."PROD_ID"<200 AND
                 "T1"."AMOUNT_SOLD"*"T1"."QUANTITY_SOLD">10000 AND
                 "T1"."PROD_ID"<>45)
      4 - storage("T"."PROD_ID"<200 AND "T"."PROD_ID"<>45)
          filter("T"."PROD_ID"<200 AND "T"."PROD_ID"<>45)
      5 - storage("T2"."PROD_ID"<200 AND "T2"."PROD_ID"<>45)
          filter("T2"."PROD_ID"<200 AND "T2"."PROD_ID"<>45)
Manipulating explain plan output: the CELL_OFFLOAD_PLAN_DISPLAY parameter
- AUTO (default): the explain plan shows predicate offload only if the tablespace resides on Exadata storage
- ALWAYS: the explain plan shows predicate offload whether or not the tablespace resides on Exadata storage
- NEVER: the explain plan never indicates predicate offload, even if the tablespace resides on Exadata storage
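A minimal sketch of how the setting is used; the sales query is borrowed from the explain plan example earlier in this deck:

```sql
-- Show the storage() offload annotations even on non-Exadata storage
ALTER SESSION SET cell_offload_plan_display = ALWAYS;

EXPLAIN PLAN FOR
  SELECT cust_id FROM sales WHERE prod_id < 200;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Remember: the storage() predicates in the output only indicate what could be offloaded, not what will be.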
Detecting scan offloads
- Trace the session executing the statement
- Query the (G)V$ views:
  - (G)V$SYSSTAT
  - (G)V$SQL
  - (G)V$SESSTAT
  - etc.
Example: V$SYSSTAT statistics
- cell physical IO interconnect bytes: bytes transferred between the storage nodes and the database nodes
- physical IO disk bytes: bytes physically read on the Exadata storage nodes; this includes IO performed both for block IO and for Smart Scans
- cell physical IO bytes eligible for predicate offload: bytes that were processed by the Smart Scan process using the column list and the WHERE predicates
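A quick way to eyeball these system-wide counters is a query like the following sketch (statistic names can vary slightly between versions, so check V$STATNAME on your release):

```sql
-- System-wide Smart Scan volumes, in GB
SELECT name,
       ROUND(value / 1024 / 1024 / 1024, 2) AS gb
FROM   v$sysstat
WHERE  name IN ('cell physical IO interconnect bytes',
                'cell physical IO bytes eligible for predicate offload')
   OR  name LIKE 'physical%IO disk bytes';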
V$SYSSTAT values and efficiency

    select name from table where col >= 100

    Phys. IO:                        10Gb
    Phys. IO eligible for offload:   10Gb
    Phys. IO interconnect:            2Gb

    Efficiency: 2Gb / 10Gb = 20%
V$SYSSTAT values and efficiency

    select a.name, b.* from table a, table b
    where a.id = b.id and a.col >= 100

    Phys. IO:                        10Gb
    Phys. IO eligible for offload:    5Gb
    Phys. IO interconnect:            5Gb + 1Gb

    Offload efficiency: 1Gb / 5Gb  = 20%
    Eligible fraction:  5Gb / 10Gb = 50%
V$SQL makes it easier. We can use the following columns:
- physical_read_bytes: how much data was read by the cells
- io_interconnect_bytes: how much data was transported over the interconnect
- io_cell_offload_eligible_bytes: how much of the physically read data could be processed in the cell
- io_cell_offload_returned_bytes: how much of the processed data was actually returned to the DB
These values are per statement.
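Putting the columns above together, a per-statement offload report could look like this sketch:

```sql
-- Per-statement offload savings: of the eligible bytes, how much
-- never had to cross the interconnect?
SELECT sql_id,
       io_cell_offload_eligible_bytes                         AS eligible_bytes,
       io_cell_offload_returned_bytes                         AS returned_bytes,
       ROUND(100 * (1 - io_cell_offload_returned_bytes /
                        NULLIF(io_cell_offload_eligible_bytes, 0)), 1)
                                                              AS saved_pct
FROM   v$sql
WHERE  io_cell_offload_eligible_bytes > 0
ORDER  BY io_cell_offload_eligible_bytes DESC;
```

The NULLIF guard avoids a divide-by-zero for statements that read nothing eligible for offload.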
Smart Scan inside the cell
- Smart Scan is handled by the CELLSRV process on the cell
- CELLSRV:
  - Is multi-threaded
  - Serves both block IO and smart IO
  - Runs a piece of RDBMS code to support smart IO
  - Can provide storage to one or more databases
  - Does not communicate with other cells
Predicate disk data flow
- PredicateCachePut: queues new IO requests
- IO jobs issue the IOs (PredicateDiskRead)
- PredicateFilter: filters the raw data
- PredicateCacheGet: sends the result back
- Jobs can execute concurrently:
  - Concurrent IOs can be issued for a single RDBMS client
  - Concurrent filter jobs can be applying predicates
- Exadata adds another level of parallelism to query processing
Other smart improvements: smart file creation
- Offloads the formatting of new blocks to the cell storage; block IDs (instead of formatted blocks) are shipped to the cells
- Smart file creation is used whenever a file is created:
  - Tablespace creation
  - File resize (increase in size)
  - RMAN restore
- Statistics involved (V$SYSSTAT):
  - cell physical IO bytes saved during optimized file creation
  - cell physical IO bytes saved during optimized RMAN file restore
Other smart improvements: smart incremental backup
- Offloads identifying the blocks to back up (based on SCN) to the Exadata cells
- Used automatically, unless the Fast Incremental Backup feature is used
- V$BACKUP_DATAFILE columns for smart incremental backup:
  - BLOCKS_SKIPPED_IN_CELL: number of blocks that were read and filtered out by the cells to optimize the RMAN incremental backup
  - BLOCKS: size of the backup datafile in blocks
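Using the two columns above, the share of blocks the cells filtered out can be estimated with a query along these lines (the ratio definition here is illustrative):

```sql
-- How many blocks did the cells skip per incremental backup piece?
SELECT file#,
       blocks,
       blocks_skipped_in_cell,
       ROUND(100 * blocks_skipped_in_cell /
             NULLIF(blocks + blocks_skipped_in_cell, 0), 1) AS skipped_pct
FROM   v$backup_datafile
WHERE  incremental_level > 0;
```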
New wait events
- Smart Scan:
  - cell smart table scan: the database is waiting for table scans to complete on a cell
  - cell smart index scan: the database is waiting for index or index-organized table (IOT) fast full scans
- Smart file creation:
  - cell smart file creation: appears when the database is waiting for the completion of a file creation on a cell
  - cell smart restore from backup: appears when the database is waiting for the completion of a file initialization for restore from backup on a cell
- Smart incremental backup:
  - cell smart incremental backup: appears when the database is waiting for the completion of an incremental backup on a cell
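All of the events above share the "cell smart" prefix, so a quick system-wide check can be sketched as:

```sql
-- Time spent in the Smart IO wait events, system-wide
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6, 1) AS seconds_waited
FROM   v$system_event
WHERE  event LIKE 'cell smart%'
ORDER  BY time_waited_micro DESC;
```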
Exadata Smart Features: summary
- Long-running actions benefit the most:
  - Smart Scans for optimizing full table/index/bitmap scans
  - Smart file creation for datafile creation and restoring backups
  - Smart incremental backups for incremental backup creation
- The explain plan displays possible offloads
  - It does not guarantee offload
  - The indicator STORAGE shows the offload options
- Various new wait events: do not get scared if you see them
Questions & Answers on Exadata Smart Features
Agenda
- Smart IO
- IO Resource Manager
- Compression
IO RESOURCE MANAGEMENT
Why would customers be interested in I/O Resource Manager?
- Exadata storage can be shared by multiple types of workloads and multiple databases
  - Sharing lowers administration costs
  - Sharing leads to more efficient usage of storage
- But workloads may not happily coexist:
  - ETL jobs interfere with DSS query performance
  - One production data warehouse can interfere with another
  - Extraordinary query performance also means that one query can use all of Exadata's I/O bandwidth!
  - Non-priority queries can substantially impact the performance of critical queries
- Customers need a way to control these workloads
Consequence of I/O bandwidth limits
(Diagram: production and development databases sharing the storage network)
- Production database: 200 MB/s OLTP plus a 15 GB/s query
- Development database: 15 GB/s desired
- Total desired bandwidth: 0.2 + 15 + 15 = 30.2 GB/s
- Available I/O bandwidth: 21 GB/s
IO Resource Manager solves the problem
(Diagram: the same databases, now governed by IORM)
- Production database: 200 MB/s plus 12.8 GB/s (instead of 15 GB/s)
- Development database: 8 GB/s (instead of 15 GB/s)
- Actual bandwidth: 0.2 + 12.8 + 8 = 21 GB/s, matching the available I/O bandwidth of 21 GB/s
When does I/O Resource Manager help the most?
- Conflicting workloads:
  - Multiple consumer groups
  - Multiple databases
  - Concurrent database administration (backup, ETL, file creation, etc.)
- Of course, only if I/O is a bottleneck:
  - A significant proportion of the wait events are for I/O, including the cell wait events
I/O scheduling, the traditional way
- With traditional storage, I/O schedulers are black boxes; you cannot influence their behavior!
- I/O requests are processed in FIFO order
- Some reordering may be done to improve disk efficiency
(Diagram: high- and low-priority RDBMS I/O requests interleaved in a single FIFO disk queue on a traditional storage server)
I/O scheduling, the Exadata way
- Exadata limits the number of outstanding I/O requests
  - It issues enough I/Os to keep the disks performing efficiently
  - The limit prevents a low-priority intensive workload from flooding the disks
- Subsequent I/O requests are internally queued
- Exadata dequeues I/O requests based on the database and the user's resource plans:
  - Inter-database plans divide bandwidth between multiple databases
  - Intra-database plans divide bandwidth between workloads within a database
I/O scheduling, the Exadata way
(Diagram: the Sales and Finance data warehouses each feed high- and low-priority consumer group queues in the Exadata cell; the I/O Resource Manager dequeues according to the Sales and Finance priorities)
Plans are known by CELLSRV
- Inter-database plans (between databases) are specified through CellCLI and pushed to CELLSRV
- Intra-database plans (inside a database) are pushed by the RDBMS to CELLSRV
  - Intra-database plans are regular Resource Manager plans
- I/Os are tagged:
  - Every ASM/RDBMS I/O is tagged with the sender's identity (database ID / consumer group ID)
  - CELLSRV uses a Resource Manager component (similar to the RDBMS) to schedule I/Os
Setting up IO Resource Management
- Inter-database resource plans:
  - Set up on the Exadata cell
  - Enabled/disabled at the Exadata cell level
- Intra-database resource plans:
  - IORM on the Exadata cell must be enabled; inter-database plans are not required
  - Set up Resource Manager on the database:
    - Map sessions to consumer groups
    - Create a database resource plan
    - Add permissions for users
    - Enable the database resource plan
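The intra-database steps above use the standard DBMS_RESOURCE_MANAGER package. A minimal sketch, with made-up group and plan names (REPORTING, DAYTIME), might look like this:

```sql
-- Sketch of an intra-database plan; a directive for OTHER_GROUPS is mandatory.
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('REPORTING', 'long-running reports');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN('DAYTIME', 'favor OLTP over reports');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'DAYTIME', group_or_subplan => 'OTHER_GROUPS',
    comment => 'everything else first', mgmt_p1 => 100);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'DAYTIME', group_or_subplan => 'REPORTING',
    comment => 'reports get the remainder', mgmt_p2 => 100);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

-- Enable the plan; CELLSRV then uses the same plan to schedule I/O
ALTER SYSTEM SET resource_manager_plan = 'DAYTIME';
```

Sessions still need to be mapped to the REPORTING group (via SET_CONSUMER_GROUP_MAPPING) and granted the switch privilege before the plan has any effect.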
What can we limit, and where?
- Per database or resource group
- At the disk level:
  - Minimum % of IO: only kicks in when workloads are fighting over disk IOs; speed may vary but has a guaranteed minimum
  - Limit % of IO: always limits IO to a certain %, even if the system is otherwise idle
- At the flash level:
  - No need to limit IO; flash is fast enough
  - Limit access to flash storage space instead
How to configure inter-database IORM
- Specify DB name, level, and allocation %
- Per level, 100% is available
- Various options are available; OTHER is a wildcard for non-specified databases

    CellCLI> alter iormplan dbplan = ((name = production, level = 1, allocation = 100),
                                      (name = test, level = 2, allocation = 80),
                                      (name = other, level = 2, allocation = 20))
    IORMPLAN successfully altered

    CellCLI> alter iormplan active
    IORMPLAN successfully altered
Limiting resources on disk (starting with patchset 11.2.0.2)
- Between users inside a database:
  - Use the Resource Manager directive max_utilization_limit; it applies to both CPU and IO
- Between databases:
  - Use the limit parameter in the IORM plan
- Example:

    CellCLI> ALTER IORMPLAN dbplan=((name=prod, limit=75),
                                    (name=test, limit=20),
                                    (name=other, limit=5))
Controlling access to the flash cache (starting with patchset 11.2.0.2)
- Prevent databases from accessing the flash cache
  - Low-priority databases, test databases, etc.
- A new attribute is introduced for IORM: flashCache

    CellCLI> ALTER IORMPLAN dbplan=((name=prod, flashcache=on),
                                    (name=dev, flashcache=on),
                                    (name=test, flashcache=off),
                                    (name=other, flashcache=off))
Measuring the benefit of IORM
- Method 1: monitor the performance of the target workload
  - Demonstrates the effectiveness of IORM to the workload owner
  - Measure the target workload's query times or transaction rates
- Method 2: monitor the I/O statistics of the target workload
  - Characterizes the I/O traffic per workload for each cell
  - Demonstrates the effect of IORM on each workload
  - Measure using the Exadata IORM metrics
Example
- Production database: users doing normal production work
- Development/test database:
  - Schemas used for testing
  - Schemas used for development
- Both databases run on the same cluster
Before IO Resource Management
(Chart: throughput of PROD, TEST, and DEV over time, without IORM)
Creating an IORM schema
- Setup on the Exadata cell (inter-database plan):
  - PROD: level 1, allocation 60%
  - DEVTST: level 2, allocation 40%
  - OTHER: 0%
- Setup inside the database Resource Manager (intra-database plan for DEVTST):
  - Resource group DEV = 75%
  - Resource group TEST = 25%
With and without IORM
(Chart: throughput of PROD, TEST, and DEV over time, first without IORM, then with IORM)
IO Resource Manager: summary
- IORM manages load effectively
  - Within a database
  - Between databases
  - Only if IO is the bottleneck; check AWR for this
- To verify IORM, monitor your critical workload:
  - With and without other workloads
  - With and without IORM
- Use the IORM metrics to monitor I/O rates and wait times, or use your own metrics / end-user experience
Questions & Answers on IO Resource Manager
Agenda
- Smart IO
- IO Resource Manager
- Compression
COMPRESSION
Data growth challenges
- IT must support exponentially growing amounts of data
  - Explosion in online access and content
  - Government data-retention regulations
- Performance often declines as data grows
- IT budgets are flat or decreasing
- The need: grow data without hurting performance and without growing cost
- Powerful and efficient compression is key
Oracle Database compression overview: compress all your data

    Compression feature                   | Application fit                          | Availability
    --------------------------------------+------------------------------------------+-----------------------------------
    Table compression                     | Data warehouses                          | Oracle 8
    Compress for Query                    | Data warehouses                          | Built on Hybrid Columnar Compression
    Compress for Archive                  | All applications                         | Built on Hybrid Columnar Compression
    OLTP Table Compression                | All applications, actively updated data  | Database 11g Advanced Compression
    SecureFiles Compression               | Unstructured (file) data                 | Database 11g Advanced Compression
    SecureFiles Deduplication (NEW)       | Unstructured (file) data                 | Database 11g Advanced Compression
    RMAN Backup Compression               | Backup                                   | Database 11g Advanced Compression
    Data Pump Compression                 | Backup                                   | Database 11g Advanced Compression
    Data Guard Redo Transport Compression | Network                                  | Database 11g Advanced Compression
Hybrid Columnar Compressed tables
- A new approach to compressed table storage
- Useful for data that is bulk loaded and queried, with light update activity
- 10x to 15x reduction
How it works:
- Tables are organized into compression units (CUs)
- CUs are larger than database blocks, usually around 32K
- Within a compression unit, data is organized by column instead of by row
- Column organization brings similar values close together, enhancing compression
Hybrid Columnar Compression technology overview
- Compression unit: a logical structure spanning multiple database blocks
- Data is organized by column during data load
- Each column is compressed separately
- All column data for a set of rows is stored in the compression unit
- Typically 32K (4 blocks x 8K block size)
(Diagram: a logical compression unit of four blocks, with block headers, a CU header, and column data C1 through C8 laid out across the block boundaries)
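The two HCC flavors discussed on the next slides are selected at table creation time. A sketch with made-up table names, using the 11.2 COMPRESS FOR syntax (CTAS is a direct-path load, so HCC applies):

```sql
-- Warehouse compression: optimized for scan performance
CREATE TABLE sales_hcc
  COMPRESS FOR QUERY HIGH
  AS SELECT * FROM sales;

-- Archive compression: optimized for maximum storage savings
CREATE TABLE sales_archive
  COMPRESS FOR ARCHIVE HIGH
  AS SELECT * FROM sales;
```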
Compress for Query: built on Hybrid Columnar Compression
- 10x average storage savings
  - A 100 TB database compresses to 10 TB
  - Reclaim 90 TB of disk space: room for 9 more 100 TB databases
- 10x average scan improvement
  - 1,000 IOPS reduced to 100 IOPS
Compress for Archive: built on Hybrid Columnar Compression
- Compression algorithm optimized for maximum storage savings
- Benefits any application with data-retention requirements
- Best approach for ILM and data archival:
  - Minimum storage footprint
  - No need to move data to tape or less expensive disks
  - Data is always online and always accessible:
    - Run queries against historical data (without recovering from tape)
    - Update historical data
    - Supports schema evolution (add/drop columns)
Compress for Archive: optimal workload characteristics
- Any application (OLTP, data warehouse)
- Cold or historical data
- Data loaded with bulk-load operations
- Minimal access and update requirements
  - Instead of a record lock, a compression unit lock
- 15x average storage savings
  - A 100 TB database compresses to 6.6 TB
  - Keep historical data online forever
  - Up to 70x savings seen on production customer data
Compression in Exadata: ILM and data-archiving strategies
- OLTP applications: table partitioning
  - Heavily accessed data (read and write): partitions using OLTP Table Compression
  - Cold or historical data: partitions using Compress for Archive
- Data warehouses: table partitioning
  - Heavily accessed data (read): partitions using Compress for Query
  - Cold or historical data: partitions using Compress for Archive
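The warehouse strategy above can be sketched as one range-partitioned table that mixes compression levels per partition; the table and partition names here are made up:

```sql
-- Hot partitions stay lightly compressed; cold partitions use Archive
CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER
)
PARTITION BY RANGE (order_date) (
  PARTITION p_2008 VALUES LESS THAN (DATE '2009-01-01') COMPRESS FOR ARCHIVE HIGH,
  PARTITION p_2010 VALUES LESS THAN (DATE '2011-01-01') COMPRESS FOR QUERY HIGH,
  PARTITION p_cur  VALUES LESS THAN (MAXVALUE)          COMPRESS FOR OLTP
);
```

Old partitions can later be moved to a heavier compression level with ALTER TABLE ... MOVE PARTITION as the data cools.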
Hybrid Columnar Compression outside of Exadata
- Only supported for data stored on:
  - Exadata Storage Cells (Exadata Database Machines, Exadata SPARC SuperCluster)
  - Oracle ZFS Appliance systems
  - Oracle Pillar storage systems
- Storing and accessing HCC data on other systems:
  - Storage is possible, including Data Guard, recovery, etc.
  - Access is only possible after decompression of the data, which can be done on any 11gR2+ system
Hybrid Columnar Compression: business as usual
- Fully supported with:
  - B-tree, bitmap, and text indexes
  - Materialized views
  - Exadata server and cells, including offload
  - Partitioning, parallel query, PDML, PDDL
  - Schema evolution: online add/drop of columns
  - Data Guard physical and logical standby (>11.2)
    - Data is only accessible if the standby supports HCC too!
- Streams is not supported; GoldenGate will be supported soon!
Hybrid Columnar Compressed tables: details
- Data loaded using direct load uses EHCC:
  - Parallel DML
  - INSERT /*+ APPEND */
  - Direct-path SQL*Loader
- Optimized algorithms avoid or greatly reduce the overhead of decompression during query
- Individual row lookups consume more CPU than with row format: the row must be reconstituted from the columnar format
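For example, only the direct-path variant of an insert produces HCC-compressed data; the staging table name below is hypothetical:

```sql
-- Direct-path load: rows are HCC-compressed as they are written
INSERT /*+ APPEND */ INTO sales_hcc
SELECT * FROM sales_staging;
COMMIT;   -- a direct-path load must be committed before the table is read again
```

A plain INSERT into the same table would instead use the lower compression level described on the next slide.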
Hybrid Columnar Compressed tables: details (cont.)
- Updated rows automatically migrate to a lower compression level to support frequent transactions
  - The table size will increase moderately
  - All un-migrated rows in the compression unit are locked during the migration
  - The row gets a new ROWID after the update
- Data loaded using conventional (non-bulk) insert uses the lower compression level
Hybrid Columnar Compressed tables: details (cont.)
- A specialized columnar query processing engine runs in the Exadata Storage Server, operating directly against compressed data
- Column-optimized processing of query projection and filtering
- The result is returned uncompressed
Compression Advisor
- New advisor in Oracle Database 11g Release 2: the DBMS_COMPRESSION PL/SQL package
- Estimates Hybrid Columnar Compression storage savings on non-Exadata hardware
- Requires patch # 8896202

    SQL> @advisor
    Table: GL_BALANCES
    Compression Type: Compress for Query HIGH
    Estimated Compression Ratio: 10.17
    PL/SQL procedure successfully completed.
    SQL>
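A script like the @advisor one above would presumably wrap DBMS_COMPRESSION.GET_COMPRESSION_RATIO; a sketch, assuming the 11.2 signature and a USERS scratch tablespace (check your release, as parameter names changed in later versions):

```sql
SET SERVEROUTPUT ON
DECLARE
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_cmp_ratio     NUMBER;
  l_comptype_str  VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',          -- scratch tablespace for sampling
    ownname        => 'GL',             -- hypothetical schema
    tabname        => 'GL_BALANCES',
    partname       => NULL,
    comptype       => DBMS_COMPRESSION.COMP_FOR_QUERY_HIGH,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Compression Type: ' || l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Estimated Compression Ratio: ' || l_cmp_ratio);
END;
/
```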
Hybrid Columnar Compression customer success stories
- Data warehouse customers (warehouse compression):
  - Top financial services 1: 11x
  - Top financial services 2: 24x
  - Top financial services 3: 18x
  - Top telco 1: 8x
  - Top telco 2: 14x
  - Top telco 3: 6x
- Scientific data customer (archive compression):
  - Top R&D customer (with PBs of data): 28x
- OLTP archive customers (archive compression):
  - Oracle E-Business Suite, Oracle Corp.: 23x
  - Custom call center application, top telco: 15x
Summary
- IT must support exponentially growing amounts of data
  - Without growing cost
  - Without hurting performance
- Exadata and Hybrid Columnar Compression deliver:
  - Extreme storage savings: Compress for Query, Compress for Archive
  - Improved I/O scan rates
Questions & Answers
Now you do it!