Oracle Exadata High Availability Secrets Explained: Direct from Development Technical Presentation

Oracle Exadata High Availability Secrets Explained: Direct from Development Technical Presentation René Kundersma Consulting Member of Technical Staff; MAA and Exadata Best Practices Oracle Server Technologies

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Program Agenda
1. Exadata & Maximum Availability Architecture
2. MAA Reference Architectures
3. MAA Features in Exadata
4. Patching & Rolling Upgrades
5. Summary

Cost of Downtime
Gartner: cost of downtime is $336K - $540K per hour, or $11M - $100M per year.
IDC: $100K - $500K per hour, with critical failures up to $1M per hour; average cost of unplanned downtime is between $1.25B and $2.5B per year.
In addition: company reputation and customer loyalty.
Sources:
http://blogs.gartner.com/andrew-lerner/2014/07/16/the-cost-of-downtime/
https://kapost-files-prod.s3.amazonaws.com/published/54ef73ef2592468e25000438/idc-devops-and-the-cost-of-downtime-fortune-1000-best-practice-metrics-quantified.pdf

High Availability (HA) Business Challenges
Eliminate risk of downtime and data loss.
Improve service while increasing return on investment.

Exadata Addressing High Availability Challenges: Protection from Planned & Unplanned Outages

Planned outages
  Challenge: Disruptive schema changes due to application changes to meet ever-changing business requirements.
  Protection using Exadata: The impact of schema changes is greatly reduced with faster changes, index and object rebuilds, and reorganizations.
  Challenge: Downtime required for lifecycle management such as periodic upgrades of firmware and software, and data migration.
  Protection using Exadata: Downtime required for lifecycle management is mitigated using fast online upgrades, patching automation, standby-first patching, and zero downtime migration.

Unplanned outages
  Challenge: Data corruptions due to hardware/software faults and media issues.
  Protection using Exadata: Data corruptions are prevented, or the potential downtime is reduced dramatically, with additional corruption prevention, detection, and auto-repair.
  Challenge: Application brownouts due to server, instance, or storage failures, or due to planned maintenance.
  Protection using Exadata: Application brownout is reduced to sub-second with fastest instance recovery.
  Challenge: Disaster Recovery (DR) challenges where the DR site is not keeping up with production.
  Protection using Exadata: DR challenges are mitigated with fastest redo apply, resulting in a low Recovery Time Objective.

Exadata: Built-in Hardware Fault-Tolerance
Redundant Database Servers
  Active-Active highly available clustered servers
  Hot-swappable power supplies and fans
  Redundant power distribution units
Redundant Network
  Redundant 40Gb/s IB connections and switches
  Client access using HA bonded networks
Redundant Storage Grid
  Data mirrored across storage servers
  Redundant, non-blocking I/O paths

Oracle Maximum Availability Architecture (MAA)
Production Site
  RAC / RAC One: scalability, server HA
  Flashback: human error correction
  Application Continuity: application HA
  ASM: local storage protection
  Edition-Based Redefinition, Online Redefinition, Data Guard, GoldenGate: minimal downtime maintenance, upgrades, migrations
  Enterprise Manager Cloud Control: Site Guard, coordinated site failover
  Global Data Services: service failover / load balancing
  Sharding: horizontal partitioning, scalability, shared-nothing architecture
Active Standby Site
  Active Data Guard: data protection, DR, query offload
  GoldenGate: active-active replication, heterogeneous
RMAN, Oracle Secure Backup, Zero Data Loss Recovery Appliance: backup to disk, tape, or cloud

Deployment options
On-Premises: Exadata Database Machine; customer data center; purchased; customer managed.
Cloud at Customer: Exadata Cloud Machine; customer data center; subscription; Oracle managed.
Public Cloud: Exadata Cloud Service; Oracle Cloud; subscription; Oracle managed.

Oracle MAA Availability Tiers: Availability Service Levels for Unplanned and Planned Outages
PLATINUM: Zero outage for Platinum Ready applications. Zero data loss.
GOLD: Comprehensive HA and disaster protection. Recovery in seconds with zero or near-zero data loss.
SILVER: High Availability (HA) for recoverable local outages. Zero downtime rolling maintenance for patches and Patch Set Updates.
BRONZE: Basic Oracle Restart. Backups plus redo for Oracle data protection.

Bronze: Single Instance Database on Exadata
Diagram: Primary Datacenter (single instance database on a cluster, local backup); Remote Datacenter (replicated backup).
Bronze Summary
  Single instance database on a cluster with backups and auto-restart.
  Optional replication of backup data to a remote site.
  Restore from backup to resume service following unrecoverable outages.
Features
  Oracle Clusterware HA Capabilities
  Online Maintenance
  Corruption Protection
  Multitenant
  Flashback Technologies
  Recovery Manager
  Recovery Appliance / ZFS / Cloud Backup

Unplanned Outages and Planned Maintenance: Bronze - Single Instance Database on Exadata
Each event is listed with its downtime (RTO) and data loss potential (RPO).
Unplanned outages
  Database instance failure: RTO minutes; RPO zero.
  Recoverable server failure: RTO minutes to hours; RPO zero.
  Data corruptions, unrecoverable server failure, database or site failures: RTO hours to days; RPO since last backup, or near zero with Recovery Appliance.
Planned maintenance
  Online file move, reorganization/redefinition, and certain patches: RTO zero; RPO zero.
  Hardware or operating system maintenance and database patches, patch set updates and bundle patches: RTO minutes to hours; RPO zero.
  Database upgrades (patch sets and full database releases): RTO minutes to hours; RPO zero.
  Platform migrations: RTO hours to a day; RPO zero.
  Application upgrades that modify back-end database objects: RTO hours to days; RPO zero.

Silver: Active/Active Database Clustering
Diagram: Primary Datacenter (RAC / RAC One database, database files, local backup); Remote Datacenter (replicated backup).
Silver Summary
  RAC or RAC One with remote backups.
  Fast instance failovers for planned/unplanned outages.
  Backups replicated to a remote site for DR.
  Restore from backup to resume service following unrecoverable outages.
Features
  Multitenant
  Online Maintenance
  Exadata & Basic Corruption Protection plus Lost Write Protection
  Flashback Technologies
  Recovery Manager
  Recovery Appliance / ZFS / Backup Storage
  Real Application Cluster / RAC One

Unplanned Outages and Planned Maintenance: Silver - High Availability with Fast Failover
Each event is listed with its downtime (RTO) and data loss potential (RPO).
Unplanned outages
  Database instance failure: RTO seconds; RPO zero.
  Recoverable server failure: RTO seconds; RPO zero.
  Data corruptions, database unable to restart, site failure: RTO minutes to hours; RPO since last backup, or near zero with Recovery Appliance.
Planned maintenance
  Online file move, reorganization/redefinition, and patching: RTO zero; RPO zero.
  Hardware or operating system maintenance and database patches, patch set updates and bundle patches: RTO zero; RPO zero.
  Database upgrades (patch sets and full database releases): RTO minutes to hours; RPO zero.
  Platform migrations: RTO hours to days; RPO zero.
  App upgrades that modify back-end database objects: RTO hours to days; RPO zero.

Gold: Physical Replication, Zero Data Loss, Fast Failovers
Diagram: Primary Datacenter (primary, local standby, local backup); Remote Datacenter (remote standby, local backup).
Gold Summary
  RAC cluster provides HA within the primary data center.
  Active Data Guard replication to a remote data center for DR and comprehensive data protection.
  Optional replication to a local standby in another availability domain for automatic database and application role transitions.
  Read-only workloads are offloaded to the remote copy.
Features
  Multitenant
  Online Maintenance
  Exadata & Basic Corruption Protection
  Flashback Technologies
  Real Application Cluster
  Recovery Manager
  Recovery Appliance / ZFS / Backup Storage
  Active Data Guard with optional Far Sync
* Local standby or remote standby or both is required for Gold.

Unplanned Outages and Planned Maintenance: Gold - Comprehensive HA and Data Protection
Each event is listed with its downtime and data loss potential.
Unplanned outages
  Database instance failure: downtime seconds; data loss zero.
  Recoverable server failure: downtime seconds; data loss zero.
  Data corruptions, database unable to restart, site failure: downtime seconds to minutes; data loss zero to seconds.
Planned maintenance
  Online file move, reorganization/redefinition, and patching: downtime zero; data loss zero.
  Hardware or operating system maintenance and database patches, patch set updates and bundle patches: downtime zero; data loss zero.
  Database upgrades (patch sets, full database releases): downtime seconds to minutes; data loss zero.
  Platform migrations: downtime seconds to minutes; data loss zero.
  Application upgrades that modify database objects: downtime hours; data loss zero.

Platinum: Zero Application Outage, Zero Data Loss
Diagram: Primary Datacenter with two availability domains (local ADG standby / GoldenGate, Application Continuity, EBR, local backup); Remote Datacenter (remote standby, local backup).
Platinum Summary
  All benefits of Gold, plus:
  Zero application downtime with GoldenGate and Edition-Based Redefinition.
  Zero application impact with Application Continuity.
  Zero data loss replication configuration across data centers with Far Sync.
Features
  Active Data Guard
  Multitenant
  Online Maintenance
  Exadata & Basic Corruption Protection
  Flashback Technologies
  Recovery Manager
  Recovery Appliance / ZFS / Backup Storage
  Real Application Cluster
  Active Data Guard with Far Sync
  GoldenGate
  Application Continuity
  Edition-Based Redefinition

Unplanned Outages and Planned Maintenance: Platinum - Zero Outage for Platinum Ready Applications
Each event is listed with its downtime and data loss potential.
Unplanned outages
  Database instance failure: downtime zero; data loss zero.
  Recoverable server failure: downtime zero; data loss zero.
  Data corruption, database unable to restart, site failure: downtime zero; data loss zero.
Planned maintenance
  Online file move, reorganization/redefinition, patching: downtime zero; data loss zero.
  Hardware or operating system maintenance and database patches, patch set updates and bundle patches: downtime zero; data loss zero.
  Database upgrades (patch sets, full database releases): downtime zero; data loss zero.
  Platform migrations: downtime zero; data loss zero.
  Application upgrades that modify database objects: downtime zero; data loss zero.
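The four tier tables can be read as a lookup: given the longest outage a service can tolerate after a data corruption or site failure, pick the least demanding tier that still meets it. The following is only a minimal sketch of that reading, not an Oracle tool. The ordinal scale, the names SCALE, WORST_CASE_RTO and lowest_tier_meeting, and the choice to take the upper end of each quoted range are assumptions made for illustration.

```python
# Ordinal scale for outage durations, best to worst (an assumption for illustration).
SCALE = ["zero", "seconds", "minutes", "hours", "days"]

# Worst-case downtime for "data corruption, database unable to restart, site
# failure", taken from the upper end of the range quoted for each tier above.
# Tiers are ordered from least to most infrastructure.
WORST_CASE_RTO = {
    "Bronze": "days",       # quoted as "hours to days"
    "Silver": "hours",      # quoted as "minutes to hours"
    "Gold": "minutes",      # quoted as "seconds to minutes"
    "Platinum": "zero",     # quoted as "zero"
}


def lowest_tier_meeting(max_rto: str) -> str:
    """Return the first tier whose worst-case downtime is within max_rto."""
    limit = SCALE.index(max_rto)  # raises ValueError for an unknown label
    for tier, rto in WORST_CASE_RTO.items():
        if SCALE.index(rto) <= limit:
            return tier
    raise ValueError(f"no tier meets {max_rto!r}")


if __name__ == "__main__":
    for tolerance in SCALE:
        print(f"tolerates {tolerance:8s} -> {lowest_tier_meeting(tolerance)}")
```

A real tier decision also weighs data loss potential and the planned-maintenance rows, which this sketch leaves out.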

Exadata: Maximum Availability Architecture Features
Code & Configuration
Data Protection
Brownout Reduction
Quality of Service
Performance
Management

Failed Storage
Here lies disk drive in slot 11, who lived a long life and served its purpose.

Sick Component Handling Disks

Exadata: Data Protection: Corruption Detection, Prevention & Repair
If an application update in the database encounters corruption:
  The database reads from the ASM mirror.
  It repairs the corruption using the good copy.
  This repair happens without impacting other database processes or the application.
When a network packet in the I/O path between the DB server and the storage node is corrupted:
  The storage cell prevents the write.
  ASM retries by re-sending the packet.
  The application never encounters corruptions.
Automatic disk inspection and repair when disks are idle.
Hardware Assisted Resilient Data (HARD) compliant checks.

Exadata: Data Protection: Disk Sector / IO Error Detection, Prevention, and Repair
A disk sector goes bad; Exadata scrubbing finds the bad sector and ASM repairs it. The application never encounters the IO error.
Just in case the administrator would like to know, the following is logged on the cell side:
Begin scrubbing CellDisk:CD_06_cell06.
Read Error on Cell Disk CD_06_cell06 (/dev/sdg) at device offset 2794139418624 bytes with size 1048576 bytes (errno: Input/output error [5])
Read Error on Grid Disk RECOC1_CD_06_cell06 at grid disk offset 423267139584 bytes with size 1048576 bytes from disk scrub
Read Error on Cell Disk CD_06_cell06 (/dev/sdg) at device offset 2794140467200 bytes with size 1048576 bytes (errno: Input/output error [5])
Read Error on Grid Disk RECOC1_CD_06_cell06 at grid disk offset 423268188160 bytes with size 1048576 bytes from disk scrub
Broadcast: 1 events ASM REPAIR diskgroup of opcode 10 for diskgroup RECOC1 to: ClientHostName = dbnode1.domain.com, ClientPID = 46838
Broadcast: 1 events ASM REPAIR diskgroup of opcode 10 for diskgroup RECOC1 to: ClientHostName = dbnode1.domain.com, ClientPID = 40270
Finished scrubbing CellDisk:CD_06_cell06, scrubbed blocks (1MB):2860960, found bad blocks:2
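An administrator who wants to cross-check such a scrub report could parse the messages and confirm that the number of cell disk read errors matches the "found bad blocks" summary. The sketch below is one possible way to do that and is not an Oracle utility. The regular expressions are written against the exact message wording quoted on this slide and may not match other software versions; the names CELL_ERR, SUMMARY and summarize are invented for this example.

```python
import re

# Log excerpt copied from the slide above (cell side).
LOG = """\
Begin scrubbing CellDisk:CD_06_cell06.
Read Error on Cell Disk CD_06_cell06 (/dev/sdg) at device offset 2794139418624 bytes with size 1048576 bytes (errno: Input/output error [5])
Read Error on Grid Disk RECOC1_CD_06_cell06 at grid disk offset 423267139584 bytes with size 1048576 bytes from disk scrub
Read Error on Cell Disk CD_06_cell06 (/dev/sdg) at device offset 2794140467200 bytes with size 1048576 bytes (errno: Input/output error [5])
Read Error on Grid Disk RECOC1_CD_06_cell06 at grid disk offset 423268188160 bytes with size 1048576 bytes from disk scrub
Finished scrubbing CellDisk:CD_06_cell06, scrubbed blocks (1MB):2860960, found bad blocks:2
"""

# Patterns assume the message wording shown above.
CELL_ERR = re.compile(
    r"Read Error on Cell Disk (\S+) \(([^)\s]+)\) "
    r"at device offset (\d+) bytes with size (\d+) bytes"
)
SUMMARY = re.compile(
    r"Finished scrubbing CellDisk:([^,\s]+), "
    r"scrubbed blocks \(1MB\):(\d+), found bad blocks:(\d+)"
)


def summarize(log: str) -> dict:
    """Collect cell disk read errors and compare them with the scrub summary."""
    offsets = [int(m.group(3)) for m in CELL_ERR.finditer(log)]
    done = SUMMARY.search(log)
    if done is None:
        raise ValueError("scrub has not finished in this log excerpt")
    return {
        "cell_disk": done.group(1),
        "scrubbed_blocks": int(done.group(2)),
        "bad_blocks_reported": int(done.group(3)),
        "error_offsets": offsets,
        "counts_match": len(offsets) == int(done.group(3)),
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

In this excerpt the two failing device offsets are exactly 1048576 bytes apart, so the two bad blocks are adjacent 1 MB regions on the same disk.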

Exadata: Data Protection: Storage Failures
Resurrect drive
  When a drive is reported as failed but has not physically failed, the drive is automatically power cycled to avoid a false-positive drive failure.
  Works on both High Capacity disks and Extreme Flash cells.
When a drive predictively fails, Exadata avoids flooding it with rebalance IO while maintaining its status in the disk group in case it is needed.
A blue light indicates that it is safe to replace the disk, which avoids human errors.

Exadata: Data Protection: Efficient Rebalance with Service Level Protection
Intelligent and flexible rebalance power setting
  Testing in MAA labs to find the best balance between redundancy restoration timing and service level protection.
  MAA best practice default of 4, set at deployment time.
  MAA best practice maximum of 64, available as needed.
  MOS note 757552.1 is available with more information and guidance.
Performs database-aware priority restores
  Control files, log files, SP files, TDE key stores, OCR, wallets, and then database files (MOS 1968607.1).
12.2 ASM rebalance restores redundancy first, drastically reducing the secondary failure exposure window.
12.2 Exadata leverages flash cache for rebalance reads, improving redundancy restoration performance by up to 30%.

Exadata: Quality of Service: Capping for Optimal Performance
Cell Side IO Latency Capping (Hard Disk & Flash)
  When excessive IO is performed to a cell over PCI:
  The read IO is redirected to the partner cell.
  The write IO is canceled and temporarily written to healthy flash on the same cell.
Cell Side Disk Confinement
  When a disk goes bad and is taken offline, a diagnostic is automatically run on the disk to determine its health.
  If healthy, the disk is returned to ONLINE status and resynched.
  If unhealthy, health factor drop is performed, rebalance is performed, and the blue LED is lit after completion.
Database Side IO Latency Capping (new in 12.2.0.1)
Chart: LGWR delay after hung IO, in seconds: Exadata 1, traditional storage 30.

Exadata: Quality of Service - Priority
DBAs know that reports wreak havoc on OLTP performance.
Database-aware I/O priorities
  Exadata prioritizes OLTP transaction I/O ahead of I/O requests for reports and queries, for both flash and disks.
  Exadata prioritizes flash space for OLTP (write) data ahead of scan (read) data.
  Reduces IO wait times for log writes.
This is not possible on ordinary general-purpose storage.

Exadata: Management: Notification & Replacement Process for Any Faults
Intelligent hardware/software integration helps prevent human error.
Fault
  Components break.
  Components get sick.
  Cell shutdown causing application outage.
Management
  Fully automated notification and replacement process through ASR (Auto Service Request).
  Exadata is uniquely qualified to handle sick components with full stack integration. Exadata provides system/service level high availability. A blue light indicates that disk replacement can be performed.
  Cell shutdown prevention and notification when redundancy would be compromised. Smart handshake with the database tier during cell (or cellsrv) shutdown to prevent application outage.

Exadata: Management: Health Check using EXAchk Utility
EXAchk provides a configuration-specific, up-to-date health check across the entire stack.
  Covers critical issues in Exadata, Database, Grid Infrastructure, and ASM.
  Provides an MAA scorecard with MAA configuration gaps and guidance to mitigate them.
  Automated, periodically scheduled runs with email notifications.
  Continuous evolution of configuration checks.
Proactive health verification with EXAchk dramatically reduces downtime, saving time and money.
Currently has over 1000 checks per target.
Note: Automated EXAchk health check, MOS 1070954.1.

Development Backed Best Practices: Continuous Improvement, Always a Priority
Diagram (improvement cycle): Idea -> Weekly Expert Review / Testing -> Publication (MOS Note 757552.1) -> Default Exadata deployment (Engineered System with Best Practices) -> Exadata Health Check (EXAchk).
Callouts on the diagram: "You are here" and "But you are also here!"

EXAchk: Sample Reports
Assessment Report: health score, summary, findings.
Findings & Recommendations: how to solve the problem.
MAA Score Card: critical issues, incompatible feature usage.

Exadata AWR Support: One Stop Shopping for Performance Problems

Exadata AWR Support: Unique Configuration and Outlier Detection
Configuration differences detected across storage servers
  Exadata Storage Server Model
  Exadata Storage Version (group by package type / package version)
  Exadata Storage Information (group by all columns: flash cache size, flash log size, # hard disks, # flash, # griddisks)
  Exadata Griddisks (group by # griddisks, griddisk size, and disk type)
  Exadata Celldisks (group by disk type, celldisk size, # celldisks)
Statistical differences detected compared to data sheet limits
  Max IOPS/throughput for OS statistics are colored dark red.
  Outliers for OS and Cell Server statistics are colored (pinkish-red for high, yellow for low).

Performance Problem
"The system is slow today"

Exadata AWR Support Outlier Detection Example from a Real (Big) Customer

Brownouts and Blackouts: It's All About Service Levels
A brownout is a significant service level degradation. A blackout is a complete service level interruption.
Brownouts and blackouts translate to lost productivity and revenue.
Systems are complicated, with many components, and an issue at one layer can easily cascade to another layer and exacerbate the impact.
Engineered systems are uniquely qualified to solve this very tough problem.

Exadata Marquee/New HA Features: Reduced HA Brownout
Fast Node Death Detection on Database Nodes and Cells
Example of a database node power failure with an OLTP workload and CSS misscount=60.

Application Brownout in a Typical Configuration
Diagram layers: storage controller proprietary protocol timeouts; SAN/LAN; clusterware timeout; storage controller SCSI timeout.
Each layer of the application stack has its own failure detection method.
Vendors try to obfuscate these details by quoting client-side failure numbers.
In most cases the fault detection times are additive.
For example, if a storage controller crashes, it will take 2 SCSI timeouts for the database server to detect such a failure.
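The additive nature of fault detection can be illustrated with simple arithmetic. The following is only a sketch: the function name detection_time, the layer list, and every timeout value in it are hypothetical placeholders rather than measurements or vendor defaults. Only the idea that two SCSI timeouts elapse before the database server notices a storage controller crash comes from the slide.

```python
def detection_time(layers):
    """Total seconds until a failure is detected when each layer's timeout
    must expire in turn. Each entry is (name, timeout_seconds, repetitions)."""
    return sum(timeout * repetitions for _name, timeout, repetitions in layers)


# Hypothetical stack: all timeout values below are made up for illustration.
typical_stack = [
    ("SCSI timeout", 30.0, 2),         # two timeouts, as in the slide's example
    ("clusterware timeout", 60.0, 1),  # placeholder value
]

if __name__ == "__main__":
    total = detection_time(typical_stack)
    print(f"failure detected after about {total:.0f} s")
```

Because the terms add up, shortening one layer's timeout helps only as much as that layer contributes to the sum.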

Exadata: Unique Brownout Reduction Features: Instant Failure Detection, Maximum Application Uptime
If a server disappears from both InfiniBand switches, it is declared dead in less than two seconds.
  No waiting for long heartbeat timeouts.
  Reduces application brownouts from 30+ seconds to < 2 seconds.
Active/Active IB configuration provides:
  Extreme throughput: 40 Gb/s QDR.
  Extreme availability: RDS failover in 1-2 seconds with minimum application impact.
Chart: application brownout, in seconds: Exadata 0.8, 3rd party storage 300.

Brownouts and Blackouts Flex ASM Flex ASM enables continuous RDBMS<->ASM communication after an ASM instance crash without the need for a service failover Completely transparent to the application with no service level impact Flex ASM configured with cardinality=all on Exadata

Risk Mitigation: Downtime with Oracle Exadata Database Machine
Each metric is listed as: before Oracle Exadata; with Oracle Exadata; difference; % benefit.
Unplanned downtime
  Number of instances per year: 7.1; 0.7; 6.5; 90%
  MTTR (hours): 2.9; 0.4; 2.5; 86%
  Productive hours lost per 100 users per year: 1,021; 66; 955; 94%
Unplanned downtime revenue impact
  Total revenue impact per year: $423,700; $5,800; $417,900; 99%
Planned downtime
  Number of instances per year: 10.9; 6.0; 4.9; 45%
  MTTR (hours): 4.6; 1.9; 2.7; 59%
  Productive hours lost per 100 users per year: 68; 60; 8; 12%
Source: IDC
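The "% benefit" column appears to be the relative reduction, (before - with) / before, rounded to a whole percent. That formula is an inference from the numbers rather than something the slide states. The short check below recomputes each row from the before/with values quoted above; the names ROWS and percent_benefit are invented for this example.

```python
# (before Exadata, with Exadata, % benefit quoted on the slide)
ROWS = {
    "unplanned: instances per year": (7.1, 0.7, 90),
    "unplanned: MTTR (hours)": (2.9, 0.4, 86),
    "unplanned: productive hours lost per 100 users": (1021, 66, 94),
    "unplanned: revenue impact per year ($)": (423700, 5800, 99),
    "planned: instances per year": (10.9, 6.0, 45),
    "planned: MTTR (hours)": (4.6, 1.9, 59),
    "planned: productive hours lost per 100 users": (68, 60, 12),
}


def percent_benefit(before: float, after: float) -> int:
    """Relative reduction from before to after, as a whole percent."""
    return round(100 * (before - after) / before)


if __name__ == "__main__":
    for name, (before, after, quoted) in ROWS.items():
        print(f"{name}: computed {percent_benefit(before, after)}%, quoted {quoted}%")
```

Every row reproduces the quoted percentage under this formula. The "difference" of 6.5 for unplanned instances per year does not equal 7.1 - 0.7 exactly, which suggests the source values were rounded before publication.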

Exadata Database Machine Software Architecture Review (Bare Metal / Physical)
Database Grid: Oracle Database and Grid Infrastructure; Exadata (firmware, Linux, Exadata)
Storage Grid: Exadata (firmware, Linux, Exadata)
Networking: Exadata (InfiniBand switch software)
Other: Ethernet switch, PDU

Software Maintenance: Recommended Update Schedule
Every 3-12 months: Database / Grid Infrastructure Quarterly Update; Exadata Quarterly Update.
Every 1-2 years: Database / Grid Infrastructure Patch Set; Exadata New Release.
Every 2-4 years: Database / Grid Infrastructure New Release.
All software maintenance for Exadata: MOS 888828.1
Responses to security scan findings: MOS 1405320.1
Quality maintenance readiness with EXAchk
  Provides version recommendation.
  Critical issues exposure report.
Late-breaking issues: MOS Alerts for Hot Topics

Zero Downtime Software Maintenance: Rolling Software Update Support
Component to update and whether a rolling update is supported:
  Database / Grid Infrastructure: Yes*
  Exadata Database Server: Yes
  Exadata Storage Server: Yes
  Exadata InfiniBand switch: Yes
* OJVM PSU
Mitigate impact and risk
  Automatic client failover
  ASM high redundancy
  Out-of-place apply
  Test system
  Data Guard

Reduce Risk and Downtime with Data Guard: Data Guard Standby First Patching (MOS 1265700.1)
Diagram: Site A (standby, becoming primary) and Site B (primary, becoming standby), each with DB Dictionary, GI/DB Home, Exa DBnode, Exa Storage, and Exa Switches.
Standby First Patching Steps
1. Update software on Standby
2. Test new software
3. Switchover
4. Update software on Standby (the former primary)
5. Run SQL portion of BP (bundle patch) on Primary
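The ordering of the five steps matters: the primary is never updated while it is serving production, and the switchover happens only after the new software has been tested on the standby. The toy simulation below walks through that ordering. It does not call any Oracle tooling; the Site class, the standby_first_patch function, the version labels, and the test callback are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    role: str       # "primary" or "standby"
    software: str   # illustrative version label, not a real release name


def standby_first_patch(site_a, site_b, new_version, test):
    """Simulate the five standby-first steps and return a log of actions."""
    primary, standby = (site_a, site_b) if site_a.role == "primary" else (site_b, site_a)
    log = []

    standby.software = new_version                        # 1. update the standby
    log.append(f"1. updated software on standby {standby.name}")

    if not test(standby):                                 # 2. test before any role change
        raise RuntimeError("new software failed testing; primary left untouched")
    log.append(f"2. tested new software on {standby.name}")

    primary.role, standby.role = "standby", "primary"     # 3. switchover
    primary, standby = standby, primary
    log.append(f"3. switchover: {primary.name} is now primary")

    standby.software = new_version                        # 4. update the former primary
    log.append(f"4. updated software on standby {standby.name}")

    log.append(f"5. ran SQL portion of the patch on primary {primary.name}")
    return log


if __name__ == "__main__":
    a = Site("Site A", "standby", "old")
    b = Site("Site B", "primary", "old")
    for line in standby_first_patch(a, b, "new", test=lambda site: True):
        print(line)
```

If the test in step 2 fails, the simulation stops before the switchover, which mirrors the risk-reduction argument of the slide: the production primary still runs the old software.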

Grid Infrastructure and Database Software Upgrades: Exadata Database Server
Tools and methods are the same as on non-Exadata systems
  OPatch / OPatchAuto / OPlan
  OUI, DBUA, ASMCA, EM
Choices of upgrades
  Rolling or non-rolling
  In-place or out-of-place
Exadata-specific Quarterly Updates
  Superset of the generic PSU.
  Permitted on non-Exadata systems only when supporting an Exadata system (DR, test).

Firmware Upgrades: Exadata Database Server, Exadata Storage Server (Cell), Network Switches
Patch Manager (patchmgr) patches all the nodes, cells, and switches
  Can be run even from a non-Exadata system to patch all Exadata racks.
  Components are upgraded in parallel.
Rolling upgrades
  Database up all the time.
  A single component is upgraded at a time for availability.
Non-rolling upgrades
  Database down.
  Multiple DB nodes / storage cells / switches are upgraded in parallel.
5x speed-up in Storage Server software update with 12.2.1.1.0.

Best Practices for Exadata Planned Maintenance
Leverage EXAchk for simple software planning.
Configure for zero downtime software maintenance.
Reduce risk with Standby First Updating.
Leverage Lights Out Patching with notification function.
Take advantage of Exadata engineered defaults.
Configure storage disk groups as High Redundancy.

Program Agenda
1. Exadata & Maximum Availability Architecture
2. Oracle Database High Availability Features
3. MAA Reference Architectures
4. MAA Features in Exadata
5. Summary

Exadata is Highly Engineered and Standardized: Less Risk, High Uptime = Better Results
Less deployment risk and faster to market
  Delivered assembled, debugged, and ready to run.
Less performance and availability risk
  Optimized database-to-disk, including firmware, OS, and network.
  Industry experts at every layer of the stack help design, build, and support Exadata. Includes MAA input, bug fixes, and configuration practices.
Less operating risk
  All failure modes tested end-to-end. All systems identical.
  Reduces issue resolution times, reduces vendor management overhead, and improves SLAs.
  Operational Play Book (including online elasticity).

Summary: High Availability Decisions Made Easier
Each risk ("Protection From") is paired with the recommended approach ("Use Exadata + MAA").
Unintuitive double storage failures: High redundancy, or Normal redundancy with Data Guard.
Data loss and downtime: For disasters, Data Guard and GoldenGate -> see http://www.oracle.com/goto/maa. For local failures requiring recovery, test your restore/recovery strategy to ensure it works.
Unexpected issues during planned maintenance: If you can negotiate the downtime within service levels, take it. If not, leverage rolling patch capabilities available at every tier.
Unexpected production workload profile: Test environment similar to production, DBMS_WORKLOAD_REPLAY.
Known critical issues: Run EXAchk monthly and when a new release is published.
Resource depletion that affects service levels: Capacity Planning, Resource Management, Enterprise Manager, RAS for new customers.
Over-customization: Walk the Oracle line as much as you can, and you will gain the most bang for your buck from your engineered system.
Copyright 2016 Oracle and/or its affiliates. All rights reserved.