Cloud Consolidation with Oracle (RAC): How much is too much?
Markus Michalewicz, Senior Principal Product Manager, Oracle RAC, Oracle America
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Agenda
- Consolidation on Database Clouds
- Registered, Starting, Running
- Memory Management
- CPU(_COUNT) Management
- Database Resource Management
- Oracle RAC based Consolidation
  - Per Server Limits remain
  - Real Time (RT) Processes
  - Registered, Starting, Running
  - Cluster Limits
- Additional Considerations
- Summary
Consolidation on Database Clouds

Business Drivers
- Reduce IT Costs: lower CapEx and OpEx (servers, storage, S/W licenses; maintenance and management)
- Reduce Complexity: reduce configurations and services; standardize versions
- Increase Agility: enable resource elasticity, rapid provisioning, and fast deployment
- Increase Quality of Service: enhance IT service time, availability, and security

Database Cloud Architectures
Common building blocks are shared server and storage pools.
[Figure: Infrastructure Cloud vs. Database Cloud architectures hosting DW, CRM, and ERP workloads on hypervisor, server platform, and database layers]
11/15/11

Registered, Starting, Running

Oracle (RAC): Registered, Starting, Running makes a difference
When talking about limits in general, three stages need to be considered:
- Registered: n instances are defined to run on a machine (potentially).
- Starting: registered databases and instances start; by default, they start at the same time.
- Running: registered databases and instances are running (concurrently).
Typically, limits are discussed based on what is running at the same time. When using cluster-based consolidation, starting databases need to be considered in addition.
Oracle (RAC): Registered vs. Running
Registered databases and instances could potentially start or run at the same time. Oracle's Quality of Service Management or scripts can be used to model policies that run certain databases only at certain times, e.g. by geographic region over time. Assuming the cluster is based on PST:
- EMEA-based databases run 10:00pm to 8:00am PST
- APAC-based databases run 6:00pm to 3:00am PST
- USA-based databases run 8:00am to 6:00pm PST
[Figure: Quality of Service Management cycle: Define and Enable, Classify and Measure, Evaluate and Report, Analyze and Recommend, Implement and Control]
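The sketch below is a minimal model of such a time-based policy. It assumes the regional windows listed above and treats each window as half-open. It is not how Quality of Service Management implements policies. The names `WINDOWS`, `running_regions`, and `max_concurrent` are hypothetical. The sketch shows that three registered regions need not all run at the same time.

```python
from datetime import time

# Hypothetical policy table based on the slide: region -> (start, end) in PST.
# A window whose end is earlier than its start wraps past midnight.
WINDOWS = {
    "EMEA": (time(22, 0), time(8, 0)),
    "APAC": (time(18, 0), time(3, 0)),
    "USA": (time(8, 0), time(18, 0)),
}


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` lies in the half-open window [start, end)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window crosses midnight


def running_regions(now: time) -> list[str]:
    """Regions whose databases the policy keeps running at `now` (PST)."""
    return sorted(r for r, (s, e) in WINDOWS.items() if in_window(now, s, e))


def max_concurrent(step_minutes: int = 30) -> int:
    """Peak number of regions running at once, sampled over one day."""
    samples = [time(m // 60, m % 60) for m in range(0, 24 * 60, step_minutes)]
    return max(len(running_regions(t)) for t in samples)
```

Under these assumptions, three regions are registered, but at most two run at once. EMEA and APAC overlap between 10:00pm and 3:00am PST. The difference between registered and running is what such a policy exploits.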
Starting block: One Database instance per server
The simplest of all consolidation cases is one database instance per server. Multiple databases may still run in the cluster.
[Figure: Database Cloud with DW, ERP, and CRM databases, one instance per server]

Parameters to consider:
- Memory
- CPU
- I/O
- Processes
- Network
One instance per server as the basis
The Oracle Database is designed for this case:
- The base assumption is one instance per server.
- Instance parameters are based on this assumption.
- Main resources used: memory, CPU, I/O.
- Resources regulated by default: memory (SGA / PGA targets) and CPU.

Recommendation 1a: Manage Memory carefully (and dynamically)
- Avoid memory starvation and swapping, as they have a negative impact on the system.
- Do not oversubscribe system resources, e.g. shared memory identifiers and segments, or HugePages. See MOS note 361323.1 for details; see especially ER 10153 for use_large_pages.
- Set memory targets carefully. Rule of thumb:
  (sga_target + pga_aggregate_target) <= 80% of physically available memory (minus fixed allocation)
- If running a DW workload, weigh the PGA target (e.g. 3x).
- Memory targets do not consider third-party applications.
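The check below is a minimal sketch of the memory rule of thumb above. It assumes the 80% figure. It reads "(minus fixed allocation)" as subtracting the fixed allocation before the percentage is applied, which is an interpretation. The check sums over several instances, which is the multi-instance form of the same rule. `Instance` and `memory_budget_ok` are hypothetical names. The DW weighting of the PGA target is not modeled.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Instance:
    name: str
    sga_target: int            # bytes
    pga_aggregate_target: int  # bytes


def memory_budget_ok(instances, physical_mem, fixed_alloc, percent=80):
    """Rule of thumb (assumed reading): the sum over all instances of
    (sga_target + pga_aggregate_target) must not exceed `percent` of the
    physical memory left after the fixed allocation (e.g. OS memory).
    Returns (ok, total_targets, budget), all sizes in bytes."""
    total_targets = sum(i.sga_target + i.pga_aggregate_target for i in instances)
    budget = (physical_mem - fixed_alloc) * percent // 100  # integer math
    return total_targets <= budget, total_targets, budget
```

Consider a hypothetical 64 GiB server with 4 GiB fixed allocation. Instance A (20 + 8 GiB) and instance B (16 + 4 GiB) exactly fill the 48 GiB budget. Any further instance would break the rule.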
Recommendation 1b: Manage Memory carefully (and dynamically)
- Set memory targets carefully. Rule of thumb:
  (sga_target + pga_aggregate_target) <= 80% of physically available memory (minus fixed allocation)
- If running a DW workload, weigh the PGA target (e.g. 3x).
- Memory targets do not consider third-party applications.
- Fixed allocation: e.g. memory used by the OS.
- When using multiple instances per server, sum over the instances, e.g. for instances A and B:
  (sga_target A + pga_aggregate_target A) + (sga_target B + pga_aggregate_target B) <= 80% of physically available memory (minus fixed allocation)

CPU_COUNT determines further settings
CPU use per database (instance) can be regulated. With Oracle Database 11g Release 2, use CPU_COUNT and/or Instance Caging.
CPU_COUNT considerations:
- It is used in the pfile / spfile.
- It is an instance-specific initialization parameter.
- If more than one instance is used, the parameter still defaults to values suitable for one instance per server.
- CPU_COUNT counts CPUs as reported by the OS! This may include threads (the OS reports them as CPUs).
- CPU_COUNT is the basis for further resource allocation and performance considerations. f(CPU_COUNT) drives data structures, concurrency, parallelism, processes, memory allocation, and load calculation.
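The arithmetic below shows why the default CPU_COUNT can mislead on a consolidated server. It is a minimal sketch, assuming every instance keeps its default, which equals the CPUs the OS reports. The OS counts hardware threads, not cores. The function names and the 2-socket, 8-core, 2-thread host are hypothetical.

```python
def os_reported_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """CPUs as the OS reports them: hardware threads, not physical cores."""
    return sockets * cores_per_socket * threads_per_core


def default_cpu_count_sum(n_instances: int, os_cpus: int) -> int:
    """Sum of CPU_COUNT over a server if every instance keeps its default,
    which assumes one instance per server and so sees all OS CPUs."""
    return n_instances * os_cpus


def oversubscription(n_instances: int, sockets: int, cores_per_socket: int,
                     threads_per_core: int) -> float:
    """Summed default CPU_COUNT relative to the number of physical cores."""
    cores = sockets * cores_per_socket
    cpus = os_reported_cpus(sockets, cores_per_socket, threads_per_core)
    return default_cpu_count_sum(n_instances, cpus) / cores
```

On such a host, four instances left at their defaults add up to a CPU_COUNT of 128 on 16 physical cores, an 8x oversubscription. On a live host, Python's `os.cpu_count()` likewise returns logical CPUs, threads included.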
Recommendation 2: Use CPU_COUNT to cage instances
CPU usage should be regulated:
- The scheduler schedules CPU as requested by each individual instance.
- The scheduler does not know about the priority of the various instances on the server.
- Use CPU_COUNT or, ideally, Instance Caging.
Instance Caging is configured in just 2 steps:
1. Set the cpu_count parameter: the maximum number of CPUs the instance can use at any time.
2. Set the resource_manager_plan parameter: this enables the CPU Resource Manager, e.g. with the out-of-box plan DEFAULT_PLAN.

Instance Caging
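The two caging steps above map onto two ALTER SYSTEM statements. The sketch below only builds the SQL text for review; it does not connect to a database. The helper name and the SID `orcl1` are hypothetical. It assumes an spfile, so that `SCOPE = BOTH` and the `SID` clause apply.

```python
def instance_caging_statements(sid: str, cpu_count: int,
                               plan: str = "DEFAULT_PLAN") -> list[str]:
    """Build the two ALTER SYSTEM statements that enable Instance Caging
    for one instance: cage the CPUs, then enable a resource plan."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be a positive integer")
    return [
        # Step 1: maximum number of CPUs the instance can use at any time.
        f"ALTER SYSTEM SET cpu_count = {cpu_count} SCOPE = BOTH SID = '{sid}';",
        # Step 2: a resource plan enables the CPU Resource Manager.
        f"ALTER SYSTEM SET resource_manager_plan = '{plan}' "
        f"SCOPE = BOTH SID = '{sid}';",
    ]


if __name__ == "__main__":
    for stmt in instance_caging_statements("orcl1", 4):
        print(stmt)
```

Generating the statements per SID keeps the cage explicit for each instance. Otherwise, an instance left at its default CPU_COUNT would silently assume it owns the whole server.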
Instance Caging: Partitioning Approach
- Provides maximum isolation.
- For performance-critical databases.
- If one database instance is idle, its CPU allocation is unused.
[Figure: Instances A, B, C, and D caged at 4 CPUs each, without overlap]

Instance Caging: Over-Provisioning Approach
- Best used for non-critical databases that are typically well-behaved.
- Contention for CPU if database instances are sufficiently loaded.
- Typically not enough contention to destabilize the OS or instances.
- Best approach if the goal is fully utilized CPUs.
[Figure: Instance A: 4 CPUs, Instance B: 4 CPUs, Instance C: 6 CPUs, Instance D: 8 CPUs, with overlapping allocations]
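The contrast between the two approaches can be expressed as arithmetic on the cages. The sketch below is minimal and hypothetical: it assumes a 16-CPU server and the cage sizes from the figures. It shows that an idle instance's cage is wasted under partitioning but absorbed under over-provisioning.

```python
def caging_approach(cpu_counts: dict[str, int], server_cpus: int) -> str:
    """'partitioning' if the cages fit into the server, else 'over-provisioning'."""
    total = sum(cpu_counts.values())
    return "partitioning" if total <= server_cpus else "over-provisioning"


def usable_cpus(cpu_counts: dict[str, int], busy: list[str], server_cpus: int) -> int:
    """Most CPUs the busy instances can use together: each is limited by its
    cage and all of them by the server. Idle instances' cages go unused."""
    return min(server_cpus, sum(cpu_counts[name] for name in busy))


# Hypothetical cage sizes for a 16-CPU server.
PARTITIONED = {"A": 4, "B": 4, "C": 4, "D": 4}
OVER_PROVISIONED = {"A": 4, "B": 4, "C": 6, "D": 8}
```

Suppose instance D is idle. Partitioning then leaves A, B, and C with 12 of 16 CPUs, while over-provisioning lets them use 14. With all four busy, the over-provisioned cages ask for 22 CPUs on 16, which is where contention comes from.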
Instance Caging under the covers
If cpu_count is set to 4 on a 16-CPU server:
- All foreground processes make progress, but only 4 foregrounds are running at any time.
- Most background processes are not managed: they are critical and use very little CPU. MMON and Job Scheduler slaves are managed.
- There is no CPU affinity: all CPUs may be used.
- CPU utilization averaged across all CPUs: 25%.
[Figure: Partitioning vs. Over-Provisioning approaches side by side for Instances A to D]
More information: http://www.oracle.com/technetwork/database/focus-areas/performance/instance-caging-wp-654.pdf

Over-Provisioning Approach: it's still hardware that's the limit
- Best used for non-critical databases that are typically well-behaved.
- Contention for CPU if database instances are sufficiently loaded.
- Typically not enough contention to destabilize the OS or instances, if the assumptions listed are met.
- If the load is significant and of longer duration, system stability can be impacted.
[Figure: average CPU utilization over time (t1 to t12) for over-provisioned Instances A (4 CPUs), B (4 CPUs), C (6 CPUs), and D (8 CPUs)]
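The 25% figure above follows from the cage alone. The sketch below reproduces it under two simplifying assumptions. Only foreground processes are counted, and the server is 16 CPUs as in the example. The function names are hypothetical.

```python
def running_foregrounds(runnable: int, cpu_count: int) -> int:
    """With Instance Caging, at most cpu_count foregrounds run at any time;
    the others wait their turn but still make progress."""
    return min(runnable, cpu_count)


def avg_utilization(runnable: int, cpu_count: int, server_cpus: int) -> float:
    """Instance's utilization averaged over all server CPUs. There is no CPU
    affinity, so its foregrounds may run on any CPU. Background processes,
    which are mostly not caged, are ignored in this model."""
    return running_foregrounds(runnable, cpu_count) / server_cpus
```

With 10 runnable foregrounds and `cpu_count = 4`, only 4 run at any time. That is 4 of 16 CPUs, or 25% averaged across the server, regardless of which CPUs the OS picks.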
How much over-provisioning is OK?
As CPU_COUNT does not consider the quality of a CPU, an absolute maximum is hard to determine and depends on the system.
In general, the sum of the CPU_COUNT values set for all instances on one server should not exceed 3x the number of physical CPUs.
E.g. for a 16-core machine: SUM(CPU_COUNT) <= 3 x 16 = 48
[Figure: Instance A: 4 CPUs, Instance B: 4 CPUs, Instance C: 6 CPUs, Instance D: 8 CPUs]

Database Resource Management
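The 3x rule of thumb reduces to a single comparison. The sketch below is minimal; the function names are hypothetical. The factor is a parameter, because, as noted above, the right maximum depends on the system.

```python
def max_sum_cpu_count(physical_cpus: int, factor: int = 3) -> int:
    """Upper bound for SUM(CPU_COUNT) on one server (rule of thumb: 3x)."""
    return factor * physical_cpus


def over_provisioning_ok(cpu_counts: list[int], physical_cpus: int,
                         factor: int = 3) -> bool:
    """True if the summed CPU_COUNT of all instances stays within the bound."""
    return sum(cpu_counts) <= max_sum_cpu_count(physical_cpus, factor)
```

On a 16-core machine, the cages from the figure (4 + 4 + 6 + 8 = 22) are well within 48. Three instances at 16 plus one at 8 (56) would exceed it.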
Recommendation 3: Use Resource Manager for runaway queries
3 steps to use Resource Manager:
1. Group sessions with similar performance objectives into consumer groups.
2. Allocate resources to consumer groups using resource plans.
3. Enable the resource plan.

Summary
1. Set memory targets carefully:
   (sga_target A + pga_aggregate_target A) + (sga_target B + pga_aggregate_target B) <= 80% of physically available memory (minus fixed allocation)
2. Use CPU_COUNT or, ideally, Instance Caging: the sum of the CPU_COUNT values set for all instances on one server should not exceed 3x the number of physical CPUs.
[Figure: Instance A: 4 CPUs, Instance B: 4 CPUs, Instance C: 6 CPUs, Instance D: 8 CPUs]
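The three Resource Manager steps can be illustrated with a simplified proportional-share model. This is not Resource Manager's actual scheduler. In the database, plans are built with the DBMS_RESOURCE_MANAGER PL/SQL package. The mapping rules, group names other than the built-in OTHER_GROUPS, and the share values are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping rules (step 1): database user -> consumer group.
MAPPING = {"app_user": "OLTP", "batch_user": "BATCH"}
DEFAULT_GROUP = "OTHER_GROUPS"  # catch-all group for unmapped sessions

# Hypothetical resource plan (step 2): relative CPU shares per group.
PLAN = {"OLTP": 70, "BATCH": 20, "OTHER_GROUPS": 10}


def classify(sessions: list[tuple[int, str]]) -> dict[str, list[int]]:
    """Step 1: put each (session id, user) into a consumer group."""
    groups: dict[str, list[int]] = defaultdict(list)
    for sid, user in sessions:
        groups[MAPPING.get(user, DEFAULT_GROUP)].append(sid)
    return dict(groups)


def cpu_fractions(active_groups: list[str],
                  plan: dict[str, int] = PLAN) -> dict[str, float]:
    """Step 3 (plan enabled): split CPU among groups that have work, in
    proportion to their shares; shares of idle groups are redistributed."""
    total = sum(plan[g] for g in active_groups)
    return {g: plan[g] / total for g in active_groups}
```

In this model, a runaway batch query cannot take more than its group's share while OLTP sessions are active. With only OLTP and BATCH active, BATCH is limited to 20/90 of the CPU.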
Oracle RAC based Consolidation

Oracle RAC based Consolidation: Per server limits remain in principle
- Most per-server limits remain unchanged compared to a single instance (SI).
- Oracle RAC introduces a few more processes to consider.
- Typically, per-server limits are reached before cluster limits.
- Oracle RAC databases introduce LMS Real Time (RT) processes per instance. LMS RT processes need to be considered in particular.
[Figure: instances A1, B1, C1 and A2, B2, C2 on two Clusterware-managed servers, showing per-server limits and cluster limits]
Real Time (RT) Processes

Oracle RAC based Consolidation: Considerations for Real Time (RT) processes in general
- A Real Time process can only run on one CPU (core) at a time.
- Its usage of the CPU is typically short.
- The general rule of thumb is: the aggregated number of RT processes per server should not exceed the number of cores per server.
- One Oracle RAC instance typically has at least one RT process (LMS) by default.
- An Oracle ASM instance has one RT process.
- Oracle Clusterware uses various RT processes.
[Figure: instances A1, B1, C1 and A2, B2, C2 on two Clusterware-managed servers]
Oracle RAC based Consolidation: Background for the LMS Real Time (RT) process recommendation
- The number of LMS RT processes per instance is determined by a function of CPU_COUNT.
- To guarantee optimized performance and reliability, the general rule of thumb for RAC is: the aggregated number of LMS RT processes per server should not exceed [cores per server] - 1. See MOS note 558185.1 for details.
- This leaves one CPU free for additional RT processes to be assigned as needed, as an LMS RT process can stay on a CPU for a moment.
[Figure: instances A1, B1, C1 and A2, B2, C2 on two Clusterware-managed servers]

Oracle RAC based Consolidation: Deviating from the Real Time (RT) process recommendation
- When deviating from the RT rule, the per-server limits discussed for the single instance still apply.
- Changing LMS processes from RT to time sharing does not affect data consistency, but might affect performance under high load.
- For Oracle RAC One Node, the LMS process is mostly negligible and can be set to time sharing without impact.
[Figure: per-server limits for instances A1 to C1 and A2 to C2]
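Both RT rules of thumb (all RT processes <= cores; LMS RT processes <= cores - 1) can be checked per server. The sketch below is minimal and makes one notable assumption. The number of Clusterware RT processes is a required input, because the slides only say Clusterware uses "various" RT processes. The function names are hypothetical.

```python
def lms_rule_ok(lms_per_instance: list[int], cores: int) -> bool:
    """RAC rule of thumb: aggregated LMS RT processes <= cores - 1,
    leaving one core free for other RT processes."""
    return sum(lms_per_instance) <= cores - 1


def rt_rule_ok(lms_per_instance: list[int], asm_rt: int, clusterware_rt: int,
               cores: int) -> bool:
    """General rule of thumb: all RT processes on the server <= cores.
    clusterware_rt must come from the actual system; it is not derived here."""
    return sum(lms_per_instance) + asm_rt + clusterware_rt <= cores
```

On a hypothetical 4-core node, three RAC instances with one LMS each satisfy the LMS rule; a fourth does not. The general rule additionally counts the ASM instance's RT process and whatever Clusterware runs in RT.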
Oracle RAC based Consolidation: Automatic adjustment of LMS process priority in 11.2.0.3
- With 11.2.0.3, the number of LMS RT processes is periodically monitored and adjusted according to the number of CPUs on the node.
- The goal is to keep RT LMS processes <= # CPUs (physical CPUs / cores are considered).
- This excludes any ASM instance running on the system as well as any pre-11.2.0.3 database instance. The mechanism only monitors 11.2.0.3 or later instances.
[Figure: two 4-CPU servers running instances A1 to F1 and A2 to F2]

Oracle RAC based Consolidation: Recommendation 4: Over-provision only based on CPU_COUNT
- Do not over-provision the number of LMS RT processes on one server.
- Limit the number of RT LMS processes when over-provisioning is required:
  - Use CPU_COUNT first.
  - Directly reduce the number of LMS RT processes (gcs_server_processes).
  - Downgrade additional LMS RT processes to time share (TS).
- In 11.2.0.3 and later, the RT-to-CPU rule is enforced by automatically downgrading subsequently started LMS RT processes to TS.
[Figure: instances A1 to F1 and A2 to F2; Instance A: 4 CPUs, Instance B: 4 CPUs, Instance C: 6 CPUs, Instance D: 8 CPUs]
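The downgrade behavior described above can be approximated by a start-order model. The sketch below is a simplification, not Oracle's implementation. The real mechanism adjusts periodically, whereas the model only considers start order. It also assumes that unmonitored instances (ASM, pre-11.2.0.3) keep their RT LMS processes and are not counted. The function name is hypothetical.

```python
def assign_lms_priorities(instances: list[tuple[str, int, bool]],
                          cores: int) -> dict[str, tuple[int, int]]:
    """Simplified model of the 11.2.0.3 behavior, in start order.

    instances: (name, lms_processes, monitored), where monitored means an
    11.2.0.3 or later database instance. Returns name -> (rt, ts) counts.
    """
    rt_budget = cores  # goal: monitored RT LMS <= number of cores
    result: dict[str, tuple[int, int]] = {}
    for name, lms, monitored in instances:
        if not monitored:
            # ASM and pre-11.2.0.3 instances are not adjusted or counted.
            result[name] = (lms, 0)
            continue
        rt = min(lms, rt_budget)        # keep RT while budget remains
        rt_budget -= rt
        result[name] = (rt, lms - rt)   # the rest are downgraded to TS
    return result
```

With six single-LMS instances A to F starting on a 4-core node, A to D keep RT LMS processes and E and F run their LMS as TS. This mirrors the 4-CPU example in the figure.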
Registered, Starting, Running
Leverage Oracle RAC for Database Clouds: Generic Quality of Service Management (11.2.0.3)
- Support for Memory Guard: protects Oracle RAC EE policy-managed databases from memory stress outages.
- Support for Measure-Only Mode:
  - Define performance classes and policies.
  - Evaluate performance of test consolidations.
  - Monitor runtime health of consolidated deployments.
[Figure: QoS Management cycle: Define and Enable, Classify and Measure, Evaluate and Report, Analyze and Recommend, Implement and Control]

Oracle RAC based Consolidation: Cluster limits (starting databases are the main concern)
- In most cases, per-server limits will be reached before cluster limits are reached.
- Cluster limits apply to registered resources (databases) and to starting databases in the cluster. Reason: starting databases register with the cluster for fencing purposes.
- Currently, for example, 100 starting Oracle RAC databases on a 4-node cluster are supported. This number assumes each RAC database uses an instance on each node.
- Starting: registered databases and instances start; by default, they start at the same time.
[Figure: RAC databases A to D, each with one instance on each of two Clusterware nodes]
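A quick footprint calculation shows why per-server limits usually come first. The sketch below is minimal and rests on several assumptions: one instance of every database on each node, one LMS RT process per instance, and the LMS rule of cores - 1 without the 11.2.0.3 downgrade. The default of 100 comes from the 4-node example above and is not a general limit. The function names are hypothetical.

```python
def cluster_footprint(n_databases: int, nodes: int,
                      lms_per_instance: int = 1) -> dict[str, int]:
    """Instances and LMS RT processes when every RAC database runs one
    instance on each node (the assumption behind the slide's example)."""
    return {
        "instances_total": n_databases * nodes,
        "instances_per_node": n_databases,
        "lms_rt_per_node": n_databases * lms_per_instance,
    }


def first_limit_hit(n_databases: int, nodes: int, cores_per_node: int,
                    supported_starting: int = 100) -> str:
    """Which limit is hit first: the per-server LMS RT rule (cores - 1)
    or the number of starting databases the cluster supports."""
    fp = cluster_footprint(n_databases, nodes)
    if fp["lms_rt_per_node"] > cores_per_node - 1:
        return "per-server"
    if n_databases > supported_starting:
        return "cluster"
    return "none"
```

Take 100 databases on 4 nodes. That is 400 instances cluster-wide and 100 LMS RT processes per node, so a 16-core node breaks the per-server RT rule long before the cluster limit matters.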
Additional Considerations

Recommendation 5 (in general): Consider indirectly managed resources
- Quite a few resources used by the Oracle Database are allocated as a function of the CPU_COUNT parameter, as discussed, e.g. parallelism (PQ operations), processes, and load calculation.
- The recommendations given should address some of those additional resources.
- Processes and PQ operations need to be considered separately and in addition.
[Figure: f(CPU_COUNT) drives data structures, concurrency, parallelism, processes, memory allocation, and load calculation]
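The point about derived resources can be illustrated with placeholder multipliers. Oracle's real defaults differ by parameter and release and are not modeled here. `DERIVED_PER_CPU` and its values are entirely hypothetical. What carries over is the scaling: a setting derived per CPU grows with the summed CPU_COUNT, so over-provisioning CPU_COUNT over-provisions it too.

```python
# Hypothetical per-CPU multipliers for settings derived from CPU_COUNT.
# Real defaults differ by parameter and release; these are placeholders.
DERIVED_PER_CPU = {"pq_servers": 10, "processes": 20}


def derived_totals(cpu_counts: list[int],
                   per_cpu: dict[str, int] = DERIVED_PER_CPU) -> dict[str, int]:
    """Server-wide totals of CPU_COUNT-derived settings over all instances."""
    total_cpu = sum(cpu_counts)
    return {name: factor * total_cpu for name, factor in per_cpu.items()}
```

Three instances at CPU_COUNT 16 on a 16-core server stay within the 3x rule. Even so, they triple every derived total compared with a single instance, which is why processes and PQ operations need a separate check.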
The Benefits of Standardization: easier deployment and better predictability
- Standardization of software and hardware simplifies planning.
- Standardized hardware means predictable behavior if demand increases and additional hardware (nodes) needs to be added (horizontal scaling approach).
- Application profiling based on the current system(s) and performance baselines allows for predictable deployment of new applications on the same system using existing profiles.
[Figure: a new application is first profiled against the Small, Medium, and Large OLTP profiles, then deployed]

Summary
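The "first profile, then deploy" step can be sketched as choosing the smallest standard profile that covers a new application's measured baseline. The profile names come from the figure. Their CPU and memory sizes are hypothetical placeholders, not recommendations. The function name is also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    cpu_count: int
    sga_gib: int
    pga_gib: int


# Hypothetical standard profiles; sizes are placeholders, not recommendations.
PROFILES = [
    Profile("Small OLTP", 2, 4, 1),
    Profile("Medium OLTP", 4, 8, 2),
    Profile("Large OLTP", 8, 16, 4),
]


def pick_profile(peak_cpus: float, peak_sga_gib: float,
                 peak_pga_gib: float) -> Optional[Profile]:
    """Return the smallest standard profile covering the measured baseline,
    or None if the application does not fit any standard profile."""
    for p in PROFILES:  # ordered from smallest to largest
        if (p.cpu_count >= peak_cpus and p.sga_gib >= peak_sga_gib
                and p.pga_gib >= peak_pga_gib):
            return p
    return None
```

Because each profile carries fixed CPU_COUNT and memory targets, placing an application on a profile also plugs predictable numbers into the memory and CPU_COUNT rules of thumb above.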
(Oracle RAC) Summary
[Figure: memory targets + Instance Caging (Instances A to D) + f(CPU_COUNT) resources (data structures, concurrency, parallelism, processes, memory allocation, load calculation) = per-server limits for RAC instances A1 to D1 and A2 to D2]

Q&A