Maximize Database Join and Sort Performance Utilizing Exadata Flash Kasey Parker Sr. Enterprise Architect Kasey.Parker@centroid.com
Centroid Overview Leader in Technology, Consulting and Managed Services since 1997 Part of Oracle's Top 25 Strategic Partner Program Focused on Consulting, Managed Services, Cloud Services and Resell Capabilities From Oracle Applications to Technology to Infrastructure Oracle's Tech and Engineered Systems Partner of the Year: 2014 & 2015 Specializations Oracle Database & Core Technologies Oracle Engineered Systems Oracle Server & Storage Solutions Oracle Performance Tuning Oracle Data Warehousing Oracle Business Intelligence Oracle Applications
About Kasey Parker QUICK FACTS Reside in Frisco, TX with wife and 5 children Oracle Architect / DBA Over 15 Years of Oracle Experience Oracle Certified Exadata Specialist Oracle Certified Professional - DBA Performance Tuning specialist Data Warehousing specialist Last 5 years focused on Oracle Engineered Systems Many industries Financial Services, Manufacturing, Health/Nutrition, Government, Retail Utah Oracle Users Group (UTOUG) Board Member Academic Brigham Young University Alumnus B.S. in Management Information Systems
Agenda Exadata Overview Why Exadata? Exadata Flash Smart Flash Cache Flash-based Grid disks Using Flash for Temp to speed up joins & sorts Risk/Reward How to configure Q&A
EXADATA OVERVIEW
Exadata Architecture X6 Complete Optimized Fully Redundant Scale-Out Scale-Out Database Servers 2-socket server → 44 cores, 256GB to 768GB DDR4 DRAM or 8-socket server → 144 cores, up to 6TB DRAM Oracle Database, ASM, RAC, Oracle Linux Scale-Out, 2-socket Intelligent Storage Servers 20 Xeon cores/server enables DB offload to storage Extreme Flash Storage → 8x 3.2TB PCI Flash Drives or High Capacity Storage → 4x 3.2TB PCI Flash Cards + 12x 8TB SAS drives High-Speed InfiniBand Network Unified internal connectivity (40 Gb/sec) 10 Gb or 1 Gb Ethernet data center connectivity Slide Material courtesy of Oracle
Elastic Config: Incremental Scale Out Achieve any Level of Performance with Minimum Hardware Start Small 2 DB Servers 3 Storage Servers Incrementally add DB or Storage Servers Database Server 44 CPU Cores Extreme Flash Storage 25.6 TB PCI Flash 20 CPU Cores High-Capacity Storage 12.8 TB PCI Flash 96 TB Disk 20 CPU Cores Add Racks to Continue Scaling Full Rack Multi-Rack Enable DB CPU cores as needed with Capacity on Demand Expand older Exadata machines with new servers
Workload Optimized Configurations DB In-Memory Machine Wants many DB Servers few Storage Servers Extreme Flash OLTP Machine All-flash IOPs enables capacity based OLTP sizing Data Warehouse Machine More High Capacity Storage for longer data retention 16 Database Servers + 5 High Capacity Storage Servers 8 Database Servers + 8 Extreme Flash Storage Servers 8 Database Servers + 14 High Capacity Storage Servers Slide Material courtesy of Oracle
Exadata in the Cloud All Oracle Database EE features and options Extreme performance with In-Memory for OLTP, analytics Proven Exadata performance and technologies Business critical availability and security Easy no-risk migration for public or hybrid cloud Zero infrastructure management No CapEx, low OpEx with monthly subscription
Exadata X6-2 Standard Configs Use (OECA) for other configurations

                                    X6-2 Full        X6-2 Half        X6-2 Quarter    X6-2 Eighth
Database Servers                    8                4                2               2
Database Grid Cores                 352 (min 112)    176 (min 56)     88 (min 28)     44 (min 16)
Database Grid Memory (GB)           2048 (max 6144)  1024 (max 3072)  512 (max 1536)  512 (max 1536)
InfiniBand Switches                 3                3                2               2
Ethernet Switch                     1                1                1               1
Exadata Storage Servers             14               7                3               3
Storage Grid CPU Cores              280              140              60              30
Raw PCI Flash Capacity         EF   358.4 TB         179.2 TB         76.8 TB         38.4 TB
                               HC   179.2 TB         89.6 TB          38.4 TB         19.2 TB
Raw Hard Disk Capacity         EF   N/A              N/A              N/A             N/A
                               HC   1344 TB          672 TB           288 TB          144 TB
Usable Mirrored Capacity       EF   130 TB           65 TB            27.9 TB         13.9 TB
                               HC   508.3 TB         254.2 TB         108.9 TB        54.5 TB
Usable Triple-Mirrored Cap.    EF   102 TB           51 TB            21.9 TB         10.9 TB
                               HC   398.7 TB         199.4 TB         85.4 TB         42.7 TB
Exadata X6-2 SQL IO Performance

                               X6-2 Full    X6-2 Half    X6-2 Quarter   X6-2 Eighth
Flash (Cache) SQL Bandwidth 1,3
  Extreme Flash                350 GB/s     175 GB/s     75 GB/s        38 GB/s
  High Capacity                301 GB/s     150 GB/s     64 GB/s        32 GB/s
Flash SQL IOPS 2,3
  8K Reads                     4,500,000    2,250,000    1,125,000      562,500
  8K Writes                    4,144,000    2,072,000    1,036,000      518,000
Disk SQL Bandwidth 1,3
  Extreme Flash                N/A          N/A          N/A            N/A
  High Capacity                25 GB/s      12.5 GB/s    5.4 GB/s       2.7 GB/s
Disk SQL IOPS 2,3
  Extreme Flash                N/A          N/A          N/A            N/A
  High Capacity                36,000       18,000       7,800          3,900

1 - Bandwidth is peak physical scan bandwidth achieved running SQL, assuming no compression. Effective data bandwidth will be much higher when compression is factored in. 2 - IOPS based on read IO requests of size 8K running SQL, typically with sub-millisecond latencies. Note that IO size greatly affects flash IOPS. Others quote IOPS based on 2K, 4K or smaller IOs that are not relevant for databases, and measure IOs using low-level tools instead of SQL. 3 - Actual performance varies by application.
WHY EXADATA?
Why Exadata? Exadata is designed to eliminate the most common bottleneck for large databases: IO performance from storage to database
Why Exadata? Solving the IO Bottleneck Solution 1: Enlarge the pipe Physical disks, on all cells, work in parallel to serve IO requests Large InfiniBand pipe (40 Gb/sec)
Why Exadata? Can't we do that with other high-performance storage solutions? YES Nothing magical about Exadata hardware, and it's still the same Oracle Database
Why Exadata? Solving the IO Bottleneck Solution 2: Reduce IO operations Exadata's Secret Sauce: Storage Offloading, Smart Flash Cache and Hybrid Columnar Compression (HCC) A 10X reduction in data sent to the database servers is common
EXADATA FLASH
Smart Flash Cache I/Os 4.5 Million 8K Read and 4.1 Million 8K Write IOPS from SQL Caches Read and Write I/Os in PCI flash Transparently accelerates read and write intensive workloads Dual-format Columnar Flash Cache Persistent write cache speeds database recovery Exadata Flash Cache is much more effective than flash tiering architectures used by others Caches current hot data, not yesterday's Caches data in granules 8x to 16x smaller than tiering Greatly improves the effectiveness of flash Other flash features can be configured if needed E.g. Cache compression, Cache pinning, Flash Disks (for Temp)
Smart Flash Cache Coming Soon Announced at OOW 2016 for a future Exadata release When queries have high temp IO and become bottlenecked on disk, Smart Flash Cache intelligently caches temp IO Writes to flash for temp reduce elapsed time Reads from flash for temp reduce elapsed time further Once released and tested, it may remove the need for creating flash-based grid disks for temp
Flash Based Cell Disks Usage Smart Flash Cache Uses all available space by default Managed automatically for maximum efficiency Flash-based Grid Disks Premium, persistent database storage Requires deliberate planning for efficient usage One potential use case is for temp tablespace
Why Use Flash for Temp? Elapsed time greatly reduced for statements bottlenecked by temp I/O Even well-tuned data warehouses often have high temp I/O Particularly related to large hash joins Offloads IOPS from hard disks Improves temp I/O performance and frees up HDD I/O capacity By default Exadata does nothing to speed up temp I/O No storage cell offloading for temp Flash cache not used for temp operations (yet!) Must use flash-based grid disks to use flash for temp Newer Exadata systems have enough flash for both flash cache and temp
Caveats Evaluate the trade-offs and determine if temp on flash-based grid disks is right for your environment Reduced flash cache size Redundancy requirements for temp tablespace External redundancy carries availability risk, even for temp Normal redundancy requires using double the amount of flash May require additional maintenance during patching Is your database even bottlenecked by temp operations?
Benefits Case Study 1: Large organization in Utah Data warehouse running on Exadata X2-2 ¼ rack (1TB flash) Temp I/O a significant bottleneck Temp read and write I/O were the 4th and 5th top wait events on the DB Significant improvement after moving temp to a 340GB flash disk Dropped temp I/O out of the top 10 wait events Temp I/O latency reduced 8X Temp-heavy SQL saw an average of 3X performance improvement
Benefits Case Study 2: Large organization in Michigan Data warehouse running on Exadata X5-2 full rack (89TB flash) Significant improvement after moving temp to a 9TB flash disk: Dramatic performance gain on temp-heavy ETL: One job reduced from 17.5 hours to 2.7 hours (6.5X improvement) Another job reduced from 6.5 hours to 2.1 hours (3.1X improvement) BI team saw over 2.5X performance gain on reports overall

Report   HDD Temp   Flash Temp   Improvement
1        99         38           2.6X
2        12         4            3X
3        10         3            3.3X
4        794        217          3.7X
5        894        233          3.8X
6        5          1            5X
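The improvement column is simply the ratio of the two timings (units as reported on the slide). Recomputing it as a quick sanity check:

```shell
# Recompute each report's improvement factor: HDD temp time / flash temp time.
# Pairs are the HDD:flash timings from the table above.
for PAIR in 99:38 12:4 10:3 794:217 894:233 5:1; do
  HDD=${PAIR%:*}
  FLASH=${PAIR#*:}
  awk "BEGIN { printf \"%s -> %.1fX\n\", \"$PAIR\", $HDD / $FLASH }"
done
```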
Creating Temp Tablespace on Flash 1) Determine size of flash-based grid disk 2) Drop Flash Cache 3) Recreate Flash Cache at new reduced size 4) Create the flash grid disk 5) Create ASM Diskgroup on the flash grid disk 6) Create temporary tablespace on the new diskgroup 7) Alter users to use the new temp tablespace
Creating Flash-based Grid Disks Find current cell disk detail using CellCLI:
CELLCLI> list celldisk detail
  name:     CD_00_cm01celadm01
  disktype: HardDisk
  size:     3.6049652099609375T
  ... (12 total hard disks on a cell)
  name:     FD_00_cm01celadm01
  disktype: FlashDisk
  size:     1.455474853515625T
  ... (4 total flash disks on a cell)
Creating Flash-based Grid Disks Find current flash cache detail:
CELLCLI> list flashcache detail
  name: cm01celadm01_flashcache
  size: 5.82122802734375T
  ... (this is per storage cell)
Creating Flash-based Grid Disks Note: To keep databases online, do all steps one cell at a time; otherwise, steps can be done in parallel 1) Calculate the new flash cache size: E.g. need 9728GB for the flash grid disk on an X5 full rack 5961G (original flash cache) × 14 (# cells) − 9728G (flash for temp) = 73726G available for flash cache Divide by the number of cells (e.g. 14) = 5266.14G flash cache per cell
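The sizing arithmetic above can be sketched in shell (values from the X5 full-rack example; substitute your own cache size, cell count, and temp allocation):

```shell
# Flash cache sizing sketch -- numbers from the X5 full-rack example above.
FLASHCACHE_ORIG_GB=5961   # original flash cache size per cell (GB)
CELLS=14                  # number of storage cells in the rack
TEMP_FLASH_GB=9728        # total flash to reserve for temp grid disks (GB)

# Total flash across the rack, minus the temp reservation, spread back over cells.
TOTAL_GB=$((FLASHCACHE_ORIG_GB * CELLS))
AVAIL_GB=$((TOTAL_GB - TEMP_FLASH_GB))
PER_CELL_GB=$(awk "BEGIN { printf \"%.2f\", $AVAIL_GB / $CELLS }")

# prints: flash cache: 73726G total, 5266.14G per cell
echo "flash cache: ${AVAIL_GB}G total, ${PER_CELL_GB}G per cell"
```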
Creating Flash-based Grid Disks 2) Drop the flash cache: If keeping databases online, do one cell at a time; otherwise it can be done in parallel If write-back flash cache is enabled, flush it first:
CELLCLI> alter flashcache all flush
[oracle@cm01dbadm01 ~]$ dcli -c cm01celadm01 cellcli -e drop flashcache
cm01celadm01: Flash cache cm01celadm01_flashcache successfully dropped
Creating Flash-based Grid Disks 3) Create Flashcache at new size: Total flash minus the size of the new grid disk [oracle@cm01dbadm01 ~]$ dcli -c cm01celadm01 cellcli -e create flashcache all size=5.1426566t cm01celadm01: Flash cache cm01celadm01_flashcache successfully created Verify flash cache status is normal CELLCLI> list flashcache detail name: cm01celadm01_flashcache size: 5.14263916015625T status: normal
Creating Flash-based Grid Disks 4) Create new Flash-based Grid Disks [oracle@cm01dbadm01 ~]$ dcli -c cm01celadm01 cellcli -e create griddisk all flashdisk prefix=flash cm01celadm01: GridDisk flash_fd_00_cm01celadm01 successfully created cm01celadm01: GridDisk flash_fd_01_cm01celadm01 successfully created cm01celadm01: GridDisk flash_fd_02_cm01celadm01 successfully created cm01celadm01: GridDisk flash_fd_03_cm01celadm01 successfully created
Creating Flash-based Grid Disks Repeat Steps 2-4 for other storage cells E.g. dcli -c cm01celadm02 cellcli -e drop flashcache dcli -c cm01celadm02 cellcli -e create flashcache all size=5.1426566t dcli -c cm01celadm02 cellcli -e create griddisk all flashdisk prefix=flash and so on for each of the storage cells
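The per-cell repetition above can be wrapped in a loop. A dry-run sketch that prints the command sequence rather than executing it (cell names follow the cm01celadm* pattern used above; the flush step assumes write-back flash cache is enabled):

```shell
# Dry run: print the per-cell command sequence instead of executing it.
# Remove the leading "echo" only after verifying the sequence for your rack.
FLASHCACHE_SIZE=5.1426566t
for CELL in cm01celadm01 cm01celadm02 cm01celadm03; do
  echo "dcli -c $CELL cellcli -e alter flashcache all flush"
  echo "dcli -c $CELL cellcli -e drop flashcache"
  echo "dcli -c $CELL cellcli -e create flashcache all size=$FLASHCACHE_SIZE"
  echo "dcli -c $CELL cellcli -e create griddisk all flashdisk prefix=flash"
done
```

Running the cells sequentially like this keeps databases online; a parallel dcli call across all cells is faster but takes the flash cache down everywhere at once.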
Create Flash-based ASM Diskgroup 1) Login via SQLPlus to ASM instance as sysasm 2) Create new diskgroup on the flash grid disk SQL> create diskgroup FLASH_TEMP normal redundancy disk 'o/*/flash*' attribute 'compatible.rdbms'='11.2.0.4.0', 'compatible.asm'='12.1.0.2.0', 'cell.smart_scan_capable'='true', 'au_size'='4m'; Diskgroup created.
Create Flash-based ASM Diskgroup 3) Mount the new diskgroup on all ASM instances First, login to each ASM instance and verify its state Should already be mounted on the instance it was created on
SQL> SELECT GROUP_NUMBER AS GRP_NUM, NAME, STATE, TOTAL_MB, FREE_MB, USABLE_FILE_MB,
            ROUND(CASE WHEN TOTAL_MB != 0 THEN FREE_MB / TOTAL_MB ELSE 0 END, 2) * 100 || '%' AS PERCENT_FREE
     FROM V$ASM_DISKGROUP
     ORDER BY 1;
Create Flash-based ASM Diskgroup On Other ASM instances the diskgroup is dismounted SQL> SELECT GROUP_NUMBER AS GRP_NUM, NAME, STATE, TOTAL_MB, FREE_MB FROM V$ASM_DISKGROUP ORDER BY 1; GRP_NUM NAME STATE TOTAL_MB FREE_MB ------- ------------ ------------ ----------- ----------- 4 FLASH_TEMP DISMOUNTED 0 0 3 DATAC1 MOUNTED 571490304 409207004 2 DBFS_DG MOUNTED 4845120 4735532 1 RECOC1 MOUNTED 63555072 25188236 Note: Query all ASM instances at once using GV$ASM_DISKGROUP
Create Flash-based ASM Diskgroup Mount diskgroup on each ASM instance SQL> alter diskgroup FLASH_TEMP mount; Diskgroup altered. SQL> SELECT GROUP_NUMBER AS GRP_NUM, NAME, STATE, TOTAL_MB, FREE_MB FROM V$ASM_DISKGROUP ORDER BY 1; GRP_NUM NAME STATE TOTAL_MB FREE_MB ------- ------------ ------------ ----------- ----------- 1 DATAC1 MOUNTED 571490304 409207004 2 DBFS_DG MOUNTED 4845120 4735532 3 RECOC1 MOUNTED 63555072 25191856 4 FLASH_TEMP MOUNTED 9961728 9959904
Create Flash Temp Tablespace
SQL> CREATE TEMPORARY TABLESPACE FLASH_TEMP TEMPFILE
       '+FLASH_TEMP' SIZE 20480M AUTOEXTEND ON NEXT 4096M MAXSIZE 20480M,
       '+FLASH_TEMP' SIZE 20480M AUTOEXTEND ON NEXT 4096M MAXSIZE 20480M,
       ...
     TABLESPACE GROUP ''
     EXTENT MANAGEMENT LOCAL UNIFORM SIZE 64M
     /
Tablespace created.
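The slide elides the repeated tempfile clauses with "...". A hypothetical helper (gen_temp_ddl is not an Oracle tool, just a sketch) that generates the full statement for N tempfiles using the sizes shown above:

```shell
# Hypothetical generator for the DDL above: one tempfile clause per line,
# comma-separated, with the sizes from the slide's example.
gen_temp_ddl() {
  n=$1
  printf "CREATE TEMPORARY TABLESPACE FLASH_TEMP TEMPFILE\n"
  i=1
  while [ "$i" -le "$n" ]; do
    sep=","
    [ "$i" -eq "$n" ] && sep=""
    printf "  '+FLASH_TEMP' SIZE 20480M AUTOEXTEND ON NEXT 4096M MAXSIZE 20480M%s\n" "$sep"
    i=$((i + 1))
  done
  printf "EXTENT MANAGEMENT LOCAL UNIFORM SIZE 64M;\n"
}

# Generate a statement with 4 tempfiles and print it.
DDL=$(gen_temp_ddl 4)
echo "$DDL"
```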
Move Users to Temp Tablespace SQL> ALTER USER SH TEMPORARY TABLESPACE FLASH_TEMP; User altered. Recommend using only for users who need it
Conclusion When to use Exadata flash for the temp tablespace? Need to speed up large hash joins and sorts IO is bottlenecked A significant portion of IO is from temp operations Typically suited more to DW than OLTP workloads Because DW workloads typically don't use the flash cache as much and drive larger temp operations Newer Exadata versions have more flash available for temp
Questions?