Virtualization 201
Management and Risk Mitigation
PASS Virtualization Virtual Chapter
2014.05.15

About David Klee
- @kleegeek
- davidklee.net
- gplus.to/kleegeek
- linked.com/a/davidaklee
Specialties / Focus Areas / Passions:
- Performance Tuning & Troubleshooting
- Virtualization
- Cloud Enablement
- Infrastructure Architecture
- High Availability
- Disaster Recovery
- Health Monitoring
- Capacity Management
virtualization.sqlpass.org
hadrvc.sqlpass.org

Virtualization 101
Refresher for the previous discussion at:
http://virtualization.sqlpass.org/home.aspx?eventid=882
Focused on:
- What is virtualization?
- What does it do for DBAs?
- Private cloud infrastructure concepts
- Resources and queues
- Data management benefits

201 Session Agenda
Management and Risk Mitigation
- What Changes?
- Deploying VMs
- Backups & Restores
- RPO, RTO, & MTTR
- Extending HA
- Extending DR

Managing VMs
What Changes? What Stays the Same?

Organization Silos
- DBAs are usually caught in the middle
- Remember the phrase: Trust But Verify

Performance Statistics
Verify through the use of objective metrics
Physical world, stats to collect:
- CPU
- Memory
- Storage
- Networking
- Windows Perfmon how-to at bit.ly/1sqsvns
Virtual world, stats to add:
- Host-level CPU, memory, storage, networking
- Host-level CPU contention (CPU Ready / CPU Wait Time Per Dispatch)
- Memory overcommitment and impact levels
- Physical I/O & network path consumption

Access to VM Manager
- One more layer to gain access to and interpret
- Interpret results & overlay with in-guest metrics
Statistic collection:
- Hyper-V: Perfmon, System Center
- VMware: PowerShell, vCenter

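As a worked example of the host-level CPU contention metric above, the sketch below converts a VMware CPU Ready summation value (milliseconds accumulated over a sample interval) into a percentage. It assumes the commonly used conversion of ready time divided by interval length, with a 20-second interval as the default (the real-time chart interval in vCenter). Dividing by the vCPU count to get a per-vCPU average is a simplification made here, and the function name is hypothetical.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into an average percent per vCPU.

    Assumptions (not from the slides): ready_ms is the summation value for the
    whole VM over one sample interval, and interval_s matches the chart or
    collection interval that produced it.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    interval_ms = interval_s * 1000.0
    return ready_ms * 100.0 / (interval_ms * vcpus)


# 1000 ms of ready time in a 20-second sample on a 1-vCPU VM
single = cpu_ready_percent(1000, interval_s=20, vcpus=1)

# The same per-vCPU pressure on a 4-vCPU VM shows up as a 4x larger summation
quad = cpu_ready_percent(4000, interval_s=20, vcpus=4)

print(f"1 vCPU: {single:.1f}%  4 vCPU: {quad:.1f}%")
```

Overlaying this percentage with in-guest Perfmon counters for the same time window is one way to "trust but verify" a claim that the host is not contended.
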
Deploying Templates
- Create a standardized virtual machine
- Partially or fully configured
- Convert it to a template
- Deploy a new VM from the template
- Post-deploy configuration
- Change resource allocations
- Ready to go!
(Diagram: Master Gold Template)

Backups
Which way is the right way?

Infinite Possibilities
- Physical server admins have their methods
- Virtual machine admins have their methods
- DBAs have their methods too
Backup strategies:
- VM-level backups?
- In-guest or guest-agnostic?
- Database level?

Technicalities
- Organizational requirement for data restoration point
- Can the VM admin meet it with traditional VM backups or SAN tools?
- Don't say it, demonstrate it.
- Is the VM-level backup application-aware? I.e., does it use a Windows VSS snapshot?
- Typical combo: VM-level + SQL Server-level
  - VM backup: OS + SQL Server instance binaries
  - SQL Server: full/diff/tran log backups

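To make the full/diff/tran log combination concrete, here is a minimal sketch of how a restore chain for a given restoration point could be selected from a backup history. It is an illustration only: the `Backup` type and `restore_chain` function are hypothetical, backups are matched by finish time rather than by log sequence numbers, and every differential is assumed to be based on the most recent full backup before it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str          # "full", "diff", or "log"
    finished: datetime


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Pick the backups needed to restore to `target`, in restore order."""
    ordered = sorted(backups, key=lambda b: b.finished)

    # 1. Latest full backup at or before the target.
    fulls = [b for b in ordered if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    full = fulls[-1]

    # 2. Latest differential after that full and at or before the target.
    diffs = [b for b in ordered
             if b.kind == "diff" and full.finished < b.finished <= target]
    base = diffs[-1] if diffs else full
    chain = [full] + ([base] if diffs else [])

    # 3. Every log backup after the base, up to the first one that covers
    #    the target. If none covers it, the chain ends at the last log.
    for b in ordered:
        if b.kind == "log" and b.finished > base.finished:
            chain.append(b)
            if b.finished >= target:
                break
    return chain


history = [
    Backup("full", datetime(2014, 5, 11, 0, 0)),
    Backup("diff", datetime(2014, 5, 12, 0, 0)),
    Backup("diff", datetime(2014, 5, 13, 0, 0)),
    Backup("log", datetime(2014, 5, 13, 1, 0)),
    Backup("log", datetime(2014, 5, 13, 2, 0)),
    Backup("log", datetime(2014, 5, 13, 3, 0)),
]
chain = restore_chain(history, datetime(2014, 5, 13, 2, 30))
print([b.kind for b in chain])
```

Walking a chain like this against the organization's required restoration point is one way to demonstrate, rather than just say, that the requirement can be met.
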
Restoration Testing
Dramatically easier to test the system restoration process than with physical servers
- Create an offline (non-routable) isolated VLAN
- Restore / hot-clone an Active Directory controller
- Restore your VM
- Fire up and test!
- Zero impact to production
- Repeatable and possibly scriptable

HA & DR
Endless Possibilities & Infinite Options

HA / DR Considerations
- Criticality of the application?
- Actual cost of an outage?
- Planned downtime window duration?
- Unplanned downtime recovery target?
- Organizational downtime SLAs?
- Licensing limitations?
Key terms:
- Recovery Point Objective (RPO)
- Recovery Time Objective (RTO)
- Mean Time to Recovery (MTTR)

Organizational Silo Priorities
- Defining the distinction between HA & DR
- Existing SQL Server & VM infrastructure HA/DR strategies
- Discuss and agree to a strategy before virtualizing
- Sufficient for production?
- Sufficient for pre-production?

High Availability
Minimize disruption for failures

High Availability
- Resiliency from unplanned system failures
- Risk mitigation with system recovery
- Distinguish between planned and unplanned downtime
- Metrics to care about:
  - Mean Time to Recovery (MTTR)
  - Recovery Point Objective (RPO)
  - Recovery Time Objective (RTO)
- High Availability is not Disaster Recovery
- Virtualization HA is probably not a replacement for SQL Server HA, but is complementary

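The relationship between MTTR and RTO in the metrics above can be sketched with a few lines of arithmetic. The recovery times and the 30-minute RTO below are made-up illustration values, and the function names are hypothetical. The point of the sketch is that an average (MTTR) can sit comfortably under the target while an individual outage still breaches it.

```python
from statistics import mean
from typing import List


def mttr_minutes(recovery_minutes: List[float]) -> float:
    """Mean Time to Recovery: the average of observed recovery durations."""
    if not recovery_minutes:
        raise ValueError("need at least one recovery duration")
    return mean(recovery_minutes)


def rto_breaches(recovery_minutes: List[float], rto_minutes: float) -> List[float]:
    """Individual recoveries that took longer than the Recovery Time Objective."""
    return [m for m in recovery_minutes if m > rto_minutes]


# Hypothetical outage history, in minutes to recover.
history = [4, 12, 45, 7]
rto = 30  # hypothetical target agreed with the business

print("MTTR:", mttr_minutes(history), "minutes")
print("Recoveries over RTO:", rto_breaches(history, rto))
```
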
HA Options
- No HA (stand-alone instance)
- Traditional shared-storage clustering with WSFC
- Database-level mirroring
- AlwaysOn Availability Groups
- Replication
- Log shipping
All about risk mitigation!

Reduce Your Risk
Single best example: two-node cluster
(Diagram: Active Node + Standby Node; after a failure, the remaining Active Node is now a single point of failure!)

Virtualization HA
(Average failover time: 2m 45s)

Virtualization HA is not SQL Server HA
- Provides resiliency for unplanned host failures
- OS reboots still required
- Does not eliminate downtime from hardware failures
- Does not protect you from architectural decisions
- Virtualization HA IS complementary to SQL Server HA

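To put the 2m 45s average failover time in context, the sketch below works out how much downtime an availability target allows per year and how many failovers of that length fit inside it. The availability targets (99.9% and 99.99%) are assumptions chosen for illustration, not figures from the session, and the function names are hypothetical.

```python
def downtime_budget_seconds(availability_pct: float, period_days: int = 365) -> float:
    """Allowed downtime (seconds) in the period for a given availability target."""
    if not 0 < availability_pct <= 100:
        raise ValueError("availability_pct must be in (0, 100]")
    return (100.0 - availability_pct) / 100.0 * period_days * 86400


def failovers_within_budget(availability_pct: float,
                            failover_seconds: float = 165.0,
                            period_days: int = 365) -> int:
    """Whole failovers of the given length that fit in the downtime budget."""
    budget = downtime_budget_seconds(availability_pct, period_days)
    return int(budget // failover_seconds)


# 2m 45s = 165 seconds, the average failover time quoted on the slide.
for target in (99.9, 99.99):  # hypothetical SLA targets
    budget_min = downtime_budget_seconds(target) / 60
    count = failovers_within_budget(target)
    print(f"{target}%: about {budget_min:.1f} min/year, {count} failovers of 165 s")
```

This counts only the failover itself. Any application recovery time after the OS reboot would come out of the same budget.
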
Failure Domains
- N+1 physical hosts might not always work
- Example: 16-node VM cluster, two blade chassis

Disaster Recovery
Now things get really interesting

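The failure-domain example (a 16-node VM cluster in two blade chassis) can be sketched as a simple capacity check. The even 8/8 split between chassis, the idea of measuring workload in "hosts' worth of capacity", and the function names are all assumptions made for illustration. The sketch shows why a cluster sized for N+1 survives the loss of one host but not the loss of one chassis.

```python
from typing import Dict


def survives_host_loss(hosts_per_domain: Dict[str, int], required_hosts: int) -> bool:
    """True if the workload still fits after losing any single host (N+1)."""
    return sum(hosts_per_domain.values()) - 1 >= required_hosts


def survives_domain_loss(hosts_per_domain: Dict[str, int], required_hosts: int) -> bool:
    """True if the workload still fits after losing any one whole failure domain."""
    total = sum(hosts_per_domain.values())
    return all(total - count >= required_hosts for count in hosts_per_domain.values())


# Assumed layout: 16 identical hosts split evenly across two blade chassis.
cluster = {"chassis-1": 8, "chassis-2": 8}

# Sized as N+1: the workload needs 15 hosts' worth of capacity.
print(survives_host_loss(cluster, required_hosts=15))    # one host down
print(survives_domain_loss(cluster, required_hosts=15))  # one chassis down

# To ride out a chassis failure, the workload must fit on the surviving 8 hosts.
print(survives_domain_loss(cluster, required_hosts=8))
```
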
May 11, 2014: 1.5 miles from my house
(Source: https://twitter.com/28storms/status/465644873149063168/photo/1)

Disaster Recovery
- Resiliency from unplanned site outages
- Risk mitigation with system recovery
- Distinguish between planned and unplanned downtime
- Metrics to care about:
  - Recovery Point Objective (RPO)
  - Recovery Time Objective (RTO)
- Define a disaster & scenarios, i.e., fail over the entire site or fail over a subset of VMs
- Consider fail back as well as fail over
- Disaster Recovery is not High Availability
- Virtualization DR might be a replacement for SQL Server DR

Disaster Recovery Options
- No DR (stand-alone instance)
- Off-site backups / restores (sneaker-net)
- Database-level (async) mirroring
- AlwaysOn Availability Groups
- Replication
- Log shipping
- Geo-cluster

Virtualization DR Options
VMware vSphere and Microsoft Hyper-V
- VM & backup replication
- VM-level restore with DB restore
- SAN to SAN replication (manual failover / failback)
- SAN to SAN replication (automated failover / failback)
- VM-level asynchronous block-level replication
- None! (Use only SQL Server DR features)

Infinite Possibilities
(Diagram: Primary Site Virtualization replicating to DR Site Virtualization over the WAN)
- VM A: 15 minute block-level replication
- VM B: 60 minute block-level replication

Infinite Possibilities
(Diagram: Primary Site Virtualization replicating to DR Site Virtualization)
- VM A and VM B: 15 minute LUN-level replication

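The replication intervals in the diagrams above translate directly into a worst-case Recovery Point. The sketch below uses a deliberately simple model, which is an assumption rather than anything stated in the session: the worst-case data loss is one full replication interval plus the time the last cycle takes to cross the WAN. The 30-minute RPO, the transfer times, and the function names are hypothetical.

```python
def worst_case_data_loss_minutes(interval_minutes: float,
                                 transfer_minutes: float = 0.0) -> float:
    """Upper bound on lost changes if the primary site fails at the worst moment.

    Simplified model: a failure just before a cycle finishes landing at the DR
    site loses everything since the previous completed cycle.
    """
    if interval_minutes < 0 or transfer_minutes < 0:
        raise ValueError("durations must be non-negative")
    return interval_minutes + transfer_minutes


def meets_rpo(interval_minutes: float, rpo_minutes: float,
              transfer_minutes: float = 0.0) -> bool:
    """True if the worst-case loss stays within the Recovery Point Objective."""
    return worst_case_data_loss_minutes(interval_minutes, transfer_minutes) <= rpo_minutes


# Intervals as read from the diagram; the 30-minute RPO is hypothetical.
schedules = {"VM A": 15, "VM B": 60}
for vm, interval in schedules.items():
    ok = meets_rpo(interval, rpo_minutes=30, transfer_minutes=5)
    print(f"{vm}: {interval}-minute replication, meets 30-minute RPO: {ok}")
```

A VM whose replication schedule cannot meet its RPO is a candidate for one of the database-level options instead of, or in addition to, VM-level replication.
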
Infinite Possibilities
(Diagram: an App at each site; VM AG1, VM AG2, and VM AG3 spread across Primary Site Virtualization and DR Site Virtualization, linked by Availability Group asynchronous replication and sync repl.)

Conclusions
- Virtualization crosses organizational silos
- Work with other silos to use the combination of features that best minimizes risk
- Stay tuned! 301 and 499 sessions coming soon!

Questions?
- @kleegeek
- davidklee.net
- gplus.to/kleegeek
- linked.com/a/davidaklee

Thanks for attending!