Taking a trip down vSphere memory lane

Memory Management Concepts

Memory virtualization - Beyond CPU virtualization, the next critical component is memory virtualization. This involves sharing the physical system memory and dynamically allocating it to virtual machines. Virtual machine memory virtualization is very similar to the virtual memory support provided by modern operating systems.

Processes see virtual memory - Applications see a contiguous address space that is not necessarily tied to the underlying physical memory. The operating system keeps a map of virtual memory addresses to physical memory addresses in a page table.

Guest operating systems use page tables to map virtual addresses to physical addresses - The page table walker receives the virtual address and traverses the page table tree to produce the corresponding physical address. When the walk completes, the virtual-to-physical mapping is inserted into the TLB to speed up future accesses to that address.

The MMU translates virtual addresses to physical addresses, and the TLB cache helps the MMU speed up these translations - All modern x86 CPUs include a Memory Management Unit (MMU) and a Translation Look-aside Buffer (TLB) to optimize virtual memory performance.

The page table is consulted if there is no TLB hit - The TLB is a cache which the MMU uses to speed up translations. If the requested address is in the TLB, the physical address is located and accessed quickly; this is a TLB hit. If the requested address is not in the TLB (a TLB miss), the page table has to be consulted.

The TLB is updated after a TLB miss - Once the page table walk completes, the resulting virtual-to-physical mapping is inserted into the TLB so that future accesses to that address hit in the TLB.
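To make the lookup flow concrete, here is a minimal sketch (not VMware or OS code) of the translation path just described: check the TLB first, walk the page table on a miss, then cache the result in the TLB. The page size, the dictionary-based page table and the class name are illustrative assumptions.

```python
PAGE_SIZE = 4096  # assume 4 KB pages for illustration

class SimpleMMU:
    """Toy model of MMU translation: TLB hit on the fast path, page-table walk on a miss."""

    def __init__(self, page_table):
        self.page_table = page_table  # {virtual page number: physical page number}
        self.tlb = {}                 # small cache of recent translations

    def translate(self, virtual_addr):
        vpn, offset = divmod(virtual_addr, PAGE_SIZE)
        if vpn in self.tlb:                      # TLB hit: physical address located quickly
            ppn = self.tlb[vpn]
        else:                                    # TLB miss: the page table must be consulted
            ppn = self.page_table[vpn]           # page-table walk (simplified to one lookup)
            self.tlb[vpn] = ppn                  # update the TLB for future accesses
        return ppn * PAGE_SIZE + offset

# Usage: map virtual page 5 to physical page 9 and translate an address within that page.
mmu = SimpleMMU({5: 9})
print(hex(mmu.translate(5 * PAGE_SIZE + 0x10)))  # -> 0x9010, an address in physical page 9
```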

MMU Virtualization

In order to run multiple virtual machines on a single system, another level of memory virtualization is required: host physical memory (also called machine memory). The guest operating system continues to control the mapping of virtual addresses to guest physical addresses, but it does not have direct access to host physical memory. Therefore, the VMM is responsible for mapping guest physical memory (PA) to host machine memory (MA). To accomplish this, the MMU must be virtualized. There are two techniques for virtualizing the MMU:

- Software, using shadow page tables
- Hardware, using either Intel's Extended Page Tables (EPT) or AMD's Rapid Virtualization Indexing (RVI)

Software MMU - Shadow Page Tables

To virtualize the MMU in software, the VMM creates a shadow page table for each primary page table that the virtual machine is using. The VMM populates the shadow page table with the composition of two mappings:

- VA > PA: virtual memory addresses to guest physical addresses. This mapping is specified by the guest operating system and is obtained from the primary page table.
- PA > MA: guest physical addresses to host physical (machine) addresses. This mapping is defined by the VMM and VMkernel.

By building shadow page tables that capture this composite mapping, the VMM can point the hardware MMU directly at the shadow page tables, allowing the memory accesses of the virtual machine to run at native speed. It also prevents the virtual machine from accessing host physical memory that is not associated with it.
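A minimal sketch of the composite mapping a shadow page table captures, assuming simple per-page dictionaries for the two mappings (illustrative only, not the VMM's actual data structures):

```python
def build_shadow_table(guest_page_table, pa_to_ma):
    """Compose VA->PA (from the guest page table) with PA->MA (from the VMM) into VA->MA."""
    shadow = {}
    for va_page, pa_page in guest_page_table.items():
        if pa_page in pa_to_ma:          # only map guest pages currently backed by machine memory
            shadow[va_page] = pa_to_ma[pa_page]
    return shadow

# Usage: the guest maps virtual page 1 to guest physical page 7;
# the VMM backs guest physical page 7 with machine page 42.
guest_pt = {1: 7, 2: 8}
vmm_map = {7: 42}        # guest physical page 8 is not currently backed
print(build_shadow_table(guest_pt, vmm_map))   # -> {1: 42}
```

Whenever the guest updates its primary page table, the VMM must rebuild or patch the corresponding shadow entries, which is exactly the synchronization overhead described under Memory Virtualization Overhead below.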

Hardware MMU Virtualization

With the software MMU, the VMM maps guest physical pages to host physical pages in the shadow page tables, which are exposed to the hardware, and the VMM synchronizes the shadow page tables with the guest page tables (the VA to PA mapping). With the hardware MMU, the guest operating system still performs the VA to PA mapping, while the VMM maintains the mapping of guest physical addresses (PA) to host physical addresses (MA) in an additional level of page tables called nested page tables. Both the guest page tables and the nested page tables are exposed to the hardware. When a virtual address is accessed, the hardware walks the guest page tables, as in native execution. However, for every guest physical page accessed during the guest page table walk, the hardware also walks the nested page tables to determine the corresponding host physical page.

This eliminates the need for the VMM to synchronize shadow page tables with guest page tables. However, the extra walk also increases the cost of a page walk, which affects the performance of applications that stress the TLB. This cost can be reduced by the use of large pages, which lowers the pressure on the TLB for applications with good spatial locality. When the hardware MMU is used, the ESX VMM and VMkernel aggressively try to use large pages for their own memory.

Memory Virtualization Overhead

With software MMU virtualization, shadow page tables are used to accelerate memory access and thereby improve memory performance. However, shadow page tables consume additional memory and incur CPU overhead in certain situations:

- New processes are created - the virtual machine updates a primary page table. The VMM must trap the update and propagate the change into the corresponding shadow page table(s). This slows down memory mapping operations and the creation of new processes in virtual machines.
- The virtual machine switches context from one process to another - the VMM must intervene to switch the physical MMU to the shadow page table root of the new process.
- A large number of processes are running - more shadow page tables need to be maintained.
- Pages are allocated - the shadow page table entry mapping this memory must be created on demand, slowing down the first access to that memory. (The native equivalent is a TLB miss.)

For most workloads, hardware MMU virtualization provides an overall performance win over shadow page tables. The exceptions are workloads that suffer frequent TLB misses or that perform few context switches or page table updates.
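The exception for TLB-miss-heavy workloads follows from the cost of the two-dimensional walk described above. The rough count below is an illustration only: the four-level table depth and the resulting figures are common assumptions, not numbers taken from this document.

```python
def walk_cost(guest_levels=4, nested_levels=4):
    """Rough memory-reference count for servicing one TLB miss.

    Native: one reference per guest page-table level.
    Nested (EPT/RVI): every guest-level entry is itself a guest-physical address that
    must be translated through the nested tables, plus the final data-page translation.
    """
    native = guest_levels
    nested = (guest_levels + 1) * (nested_levels + 1) - 1
    return native, nested

print(walk_cost())  # -> (4, 24): roughly 4 references natively vs about 24 with nested paging
```

This is also why large pages, which cut down the number of TLB misses, help most when the hardware MMU is in use.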

Memory Reclamation Challenges

Virtual machine memory de-allocation works just as it does in a native operating system: the guest operating system frees a piece of guest physical memory by adding the page numbers to its free list. This causes several challenges:

VM physical memory is not actually freed - The data in the freed memory might not be modified at all. As a result, when a particular piece of guest physical memory is freed, the mapped host physical memory does not usually change its state; only the guest free list is changed.

The hypervisor is not aware when the VM releases memory - It is difficult for the hypervisor to know when to free host physical memory when guest physical memory is de-allocated, because the guest operating system's free list is not accessible to the hypervisor. The hypervisor is completely unaware of which pages are free or allocated in the guest operating system. As a result, the hypervisor cannot reclaim host physical memory when the guest operating system frees guest physical memory.

VM Memory Reclamation Techniques

The hypervisor must therefore rely on memory reclamation techniques to reclaim the host physical memory freed up by the guest operating system. The techniques are:

Transparent page sharing (default) - When multiple virtual machines are running, some of them may have identical memory content. This presents opportunities for sharing memory across virtual machines (as well as within a single VM). With transparent page sharing, the hypervisor can reclaim the redundant copies and keep only one copy, which is then shared by multiple virtual machines in host physical memory.

Ballooning - Because of virtual machine isolation, the guest operating system is not aware that it is running inside a virtual machine, nor of the state of the other virtual machines on the same ESX/ESXi host. When the total amount of free host physical memory becomes low, none of the virtual machines will free guest physical memory, because the guest operating system cannot detect the host's memory shortage. Ballooning makes the guest operating system aware of the low host physical memory status so that it frees up some of its memory. If the virtual machine has plenty of idle or free guest physical memory, inflating the balloon will not induce guest paging and will not affect guest performance. If the guest is already under memory pressure, however, the guest operating system decides which guest physical pages are to be paged out.

Host-level swapping - In cases where transparent page sharing and ballooning are not sufficient to reclaim memory, ESX/ESXi employs host-level swapping. This is supported by creating a swap file (vswp) when the virtual machine is started; if necessary, the hypervisor can then swap guest physical memory out to the swap file directly, which frees host physical memory for other virtual machines.

Host-level swapping may, however, severely penalize guest performance. This is because the hypervisor has no knowledge of which guest physical pages should be swapped out, so its choices may conflict with the native memory management of the guest operating system. For example, the guest operating system will never page out its kernel pages, because they are critical to guest kernel performance; the hypervisor, however, cannot identify those guest kernel pages, so it might swap them out.

Memory Compression - Compressed pages are stored in a cache in memory, and decompressing a page is a sub-millisecond operation compared with swapping it back in from disk.

Memory Management Reporting

This example report shows the three reclamation techniques for a single clustered ESX host called VIXEN. Notice how the balloon driver (purple) is used to reclaim some memory, whilst on average about 3GB of memory is being shared. In this example, there is no requirement for swapping on this ESX host.
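As a rough illustration of how sharing like the 3GB shown above can be found, the sketch below groups identical pages by a hash of their contents and keeps a single copy. This is a simplification of transparent page sharing; the hashing approach and data structures here are assumptions for illustration, not ESXi internals, and a real implementation would verify pages bit for bit after a hash match before sharing them.

```python
import hashlib
from collections import defaultdict

def shareable_bytes(pages):
    """pages: {(vm, page_number): page_bytes}. Returns the bytes reclaimable by keeping
    one backing copy of each identical page and remapping the rest to it (copy-on-write)."""
    groups = defaultdict(list)
    for key, content in pages.items():
        digest = hashlib.sha1(content).hexdigest()   # candidate match by content hash
        groups[digest].append(key)
    reclaimed = 0
    for keys in groups.values():
        if len(keys) > 1:
            page_size = len(pages[keys[0]])
            reclaimed += page_size * (len(keys) - 1)  # all but one copy can be freed
    return reclaimed

# Usage: two VMs share an identical zero page; each also has a unique page.
zero = bytes(4096)
pages = {("vm1", 0): zero, ("vm2", 0): zero,
         ("vm1", 1): b"a" * 4096, ("vm2", 1): b"b" * 4096}
print(shareable_bytes(pages))   # -> 4096 bytes reclaimable
```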

Why does the Hypervisor Reclaim Memory?

The hypervisor reclaims memory to support ESX/ESXi memory overcommitment. Memory overcommitment provides two important benefits:

Higher memory utilization - With memory overcommitment, ESX/ESXi ensures that host physical memory is consumed by active guest memory as much as possible. Typically, some virtual machines will be lightly loaded and their memory will be idle much of the time. Memory overcommitment allows the hypervisor to use its reclamation techniques to take the inactive or unused host physical memory away from idle virtual machines and give it to other virtual machines that will use it.

Higher consolidation ratio - With memory overcommitment, each virtual machine has a smaller footprint in host physical memory, making it possible to fit more virtual machines on the host whilst still achieving good performance for all of them. In the example above, a host with 4GB of host physical memory can run three virtual machines with 2GB of guest physical memory each, assuming no memory reservations have been set.

When to reclaim host memory

ESX/ESXi maintains four host free memory states: high, soft, hard and low, reflected by four thresholds: 6%, 4%, 2% and 1% of host physical memory. Whether ballooning or host-level swapping is used to reclaim host physical memory is largely determined by the current host free memory state. (Transparent Page Sharing is enabled by default.)

Example: take an ESX host with 1GB of physical memory. If the amount of free host physical memory drops to 60MB, the VMkernel does nothing to reclaim memory. If that value drops to 40MB, the VMkernel starts ballooning virtual machines. If free memory drops to 20MB, the VMkernel starts swapping as well as ballooning, and if it drops to 10MB the VMkernel continues to swap until enough memory has been reclaimed for it to use for other purposes.

In the high state, the aggregate virtual machine guest memory usage is smaller than the host physical memory size. Whether or not host physical memory is overcommitted, the hypervisor will not reclaim memory through ballooning or host-level swapping.
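The thresholds and the 1GB example translate directly into a small lookup, sketched below. The threshold percentages come from the text; the function, the state-name strings and the 1000MB reading of "1GB" are illustrative assumptions.

```python
# Free-memory state thresholds as a fraction of host physical memory (per the text).
STATE_THRESHOLDS = [("high", 0.06), ("soft", 0.04), ("hard", 0.02), ("low", 0.01)]

def free_memory_state(free_mb, host_mb):
    """Return the host free-memory state for a given amount of free memory."""
    fraction = free_mb / host_mb
    for state, threshold in STATE_THRESHOLDS:
        if fraction >= threshold:
            return state
    return "low"

# The 1GB (taken as 1000MB) host example from the text:
for free in (60, 40, 20, 10):
    print(free, "MB free ->", free_memory_state(free, 1000))
# 60MB -> high (no reclamation), 40MB -> soft (ballooning),
# 20MB -> hard (ballooning + swapping), 10MB -> low (swap until recovered)
```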

NB: The high-state behaviour described above applies only when no virtual machine memory limit is set. As host free memory drops through the stated thresholds, the reclamation techniques used are, broadly: high - no ballooning or host-level swapping; soft - ballooning; hard - ballooning and swapping; low - swapping until enough memory has been reclaimed.

vswp file usage and placement guidelines

A vswp file is created for every VM - Swap (vswp) files are created for each virtual machine hosted on ESX/ESXi when memory is overcommitted. By default, these files are located with the virtual machine files in a VMFS datastore.

Placement of a virtual machine's swap file can affect the performance of vMotion - If the swap file is on shared storage, vMotion performance is good because the swap file does not need to be copied. If the swap file is on the host's local storage, vMotion performance is slightly degraded (usually negligibly) because the swap file has to be copied to the destination host.

VMkernel Swap

When an ESX host is very short of memory it may have to resort to using the (.vswp) swap files for virtual machine memory. At this point performance will be affected, because data that the guest OS believes is in memory is in reality now on disk. By default, a virtual machine can have up to 65% of its memory reclaimed by the balloon driver. It may also have a memory reservation; reserved memory cannot be swapped or ballooned. Therefore, any memory outside the 65% taken by the balloon driver and outside the reservation can be placed into the .vswp file. In reality you never want this to happen; a best practice to avoid it is to set the reservation to 35% of the virtual machine's memory, so that the reservation plus the 65% balloon ceiling accounts for all of the VM's memory. Swapping will only happen under extreme levels of memory contention.
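Under the model just described (65% balloon ceiling, reservation neither ballooned nor swapped), a VM's exposure to host-level swapping can be sketched as follows. The function name and the simple arithmetic are illustrative assumptions, not an ESX formula.

```python
def swap_exposure_mb(granted_mb, reservation_mb, balloon_ceiling=0.65):
    """How much of a VM's memory could end up in its .vswp file, per the model in the text:
    the reservation is never ballooned or swapped, the balloon can take up to 65% of memory,
    and whatever remains is the candidate for host-level swapping."""
    balloonable = granted_mb * balloon_ceiling
    swappable = granted_mb - balloonable - reservation_mb
    return max(0.0, round(swappable, 1))

print(swap_exposure_mb(4096, 0))            # -> 1433.6 MB exposed with no reservation
print(swap_exposure_mb(4096, 4096 * 0.35))  # -> 0.0, the 35% reservation best practice
```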

Resource Pool Memory Usage

Below is another example of Resource Pool monitoring, this time for Memory. This report plots the Guest Memory Usage within the Pool against its limit. The black line displays the Pool's Memory usage, which matches the pattern of the stacked Guest Usage; the slight gap between them is the Memory Overhead.

Use of Limits

Within the vSphere environment, it is possible to set CPU and Memory resource limits directly on a virtual machine and/or on a Resource Pool/vApp hosting virtual machines. If a limit is set, it overrides any other resource setting for your virtual machines. When a Memory limit is set lower than the virtual memory (vRAM) granted to a virtual machine, the hypervisor enforces the limit by invoking the balloon driver (vmmemctl.sys) to reclaim the difference between the granted memory and the limit.
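A minimal sketch of that enforcement rule (illustrative only): the balloon target is simply the gap between what the VM has been granted and its effective limit.

```python
def balloon_target_mb(granted_mb, limit_mb=None):
    """Memory the hypervisor will try to reclaim via ballooning to enforce a limit.
    With no limit set (or a limit above the granted memory) there is nothing to enforce."""
    if limit_mb is None or limit_mb >= granted_mb:
        return 0
    return granted_mb - limit_mb

# The Web2/Web3 scenario from the text: 4096MB granted, 2048MB limit.
print(balloon_target_mb(4096, 2048))  # -> 2048 MB gap; the charts show roughly 1500-1800MB actually reclaimed
```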

The example chart below shows the Average Granted Memory in MB (line) and Memory Reclamation by vmmemctl.sys (area) for two Web application virtual machines (Web2 and Web3) over a period of one day. Both virtual machines have been granted 4GB of vRAM.

The chart shows that the amount of Granted Memory decreases as the amount of Memory reclamation increases, most noticeably during the main part of the working day (8am to 6pm). It also shows that at the end of the day the amount of reclamation for the Web2 VM decreases, so its Average Granted Memory increases, whilst the Granted Memory for the Web3 VM decreases slightly over the same period.

Enforcing Limits - Web2 VM

Starting from the original chart of Average Granted Memory and Reclaimed Memory for Web2 and Web3, further investigation shows what Shares have been allocated and confirms what Memory limit, if any, is set.

The report above shows that:

- Web2 has a Normal value of Memory shares assigned, displayed as the pink line on the second Y axis. Normal shares are derived as 10 * the size of the VM's available memory in MB = 10 * 4096 = 40960.
- The Memory limit is shown as a red line set at 2048MB (2GB).
- The VM's profile across the day shows that the Memory reclamation taking place to enforce the limit is constant throughout most of the day, at around the 1800MB high-water mark.
- Between roughly 7:30am and 8:30am, and again at the end of the day, the Average Granted Memory rises towards its allocated value of 4096MB as the reclaimed memory falls; at the end of the day it is likely that Memory demand on the associated ESX host drops, allowing the hypervisor to grant the requested Memory back to the VM. Overall, however, the Average Granted Memory is close to or at the limit.

Enforcing Limits - Web3 VM

Creating the same report for the Web3 VM shows that it also has the same amount of shares, 40960 (Normal), and the same limit of 2048MB. So even though both virtual machines have been granted 4096MB of vRAM, they are in a resource pool that has a Memory limit set.

Taking a closer look at the amount of Memory reclamation, the high-water mark is at approximately 1500MB, which is 300MB less than Web2, allowing the amount of Granted Memory to stay higher on average. There are periods during the day where the Reclaimed Memory drops to near zero but the Granted Memory stays constant at around 2500MB rather than the 4096MB that might be expected. Why is this? The reason is that Web2 and Web3 are on different ESX hosts and are hosted alongside other VMs, all requesting different amounts of resources at different times during the chosen analysis period. So it would be prudent to analyze the overall Memory Usage per hosted VM further.

ESX Host (Web2) - VM Active Memory

This report displays the stacked Active Memory usage of all VMs on ESXVS18. Over the analysis period, Active Memory peaks at approximately 6000MB and the number of hosted VMs stays constant at 12. The report also shows that towards the end of the day the Active Memory usage for at least two of the VMs reduces, taking the overall usage down to 3500MB, a drop of 2500MB from the daily peak. When compared with the Memory Usage breakdown for Web2, it is over this period of the day that the Memory reclamation for Web2 reduces and its Average Granted Memory increases.

ESX Host (Web3) - VM Active Memory

If the same report is produced for the ESX server hosting Web3, we can see that as Active Memory reduces towards the end of the day on Web3, the host has taken in another VM (salmon pink) and some VMs' Memory Usage has also increased. Peak usage has risen from 8000MB during the middle of the day to just over 9000MB at 10pm. Referring back to the Memory report for Web3, it shows that although Memory Reclamation has reduced, the Granted Memory has also been slightly reduced. Why? Because a new VM has been absorbed, pushing the overall Active Memory usage of all VMs on the ESX host up by an extra 1000MB, even though Web3's own Active Memory has reduced. It is also worth mentioning at this point that not all of the VMs hosted alongside the Web VMs are in a Resource Pool with a Memory Limit, or have one directly applied.

Limits are enforced!

The Web2 and Web3 VMs are hosted in a Resource Pool with a Memory limit of 2048MB. The report above confirms that Memory is indeed being limited at this value for these VMs.

Memory Limits - A guide

Granted Memory is overruled by a Resource Pool limit - Whatever vRAM is given to a virtual machine, it will be overruled by any limit set either via a Resource Pool/vApp or directly on the VM.

Limits are enforced by reclaiming memory from the VM - When the Granted Memory value is higher than a Memory limit, the hypervisor will enforce the limit by reclamation (ballooning).

Be aware of any limits - Be aware of any resource limits assigned to virtual machines when attempting to produce accurate Capacity and Performance reports; otherwise inaccurate, confusing and potentially misleading reports will be created.

Monitor VM Active and Host Consumed Memory - As the examples in this presentation show, it is important to monitor these Memory metrics to understand overall VM Memory usage (which can lead to a reduction in Granted Memory), how much physical host Memory is actually being consumed and by whom, and to identify any virtual machines whose Memory may have been limited.

Reduce the Granted Memory rather than enforce limits - In some circumstances, certainly in the short term, it may seem reasonable to set Memory limits on virtual machines lower than what has been Granted, particularly if the hosted application is required 24/7. However, it is recommended to monitor the Active Memory for each virtual machine and identify where it is reasonable to reduce the amount of vRAM assigned rather than enforce a limit. This can bring benefits such as a reduction in vRAM licensing costs and in Memory management overhead, due to a reduction in Memory reclamation.

Use Reservations where necessary - When reducing Granted Memory in preference to setting limits, use Reservations to ensure that the virtual machine has that amount of memory guaranteed. This in turn ensures that the associated vswp file becomes zero bytes in size rather than the size of the Granted Memory (minus any smaller reservation). Example: VM1 with 4096MB Granted and a 2048MB Reservation has a 2048MB vswp file; VM2 with 4096MB Granted and a 4096MB Reservation has a 0MB vswp file.

NB: Reserving memory for virtual machines does restrict the ability to overcommit Memory on an ESX host, as it guarantees (locks) the memory.
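The vswp sizing rule in the example above is simple enough to express directly. This sketch covers only that rule (reservation subtracted from granted memory), not every factor ESX considers when creating the file.

```python
def vswp_size_mb(granted_mb, reservation_mb=0):
    """Size of the per-VM swap file: granted memory minus the reserved (guaranteed) memory."""
    return max(0, granted_mb - reservation_mb)

print(vswp_size_mb(4096, 2048))  # VM1 -> 2048 MB vswp file
print(vswp_size_mb(4096, 4096))  # VM2 -> 0 MB vswp file
```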

Monitoring VM and Host Memory Usage

Active - the amount of physical host memory currently being used by the guest; displayed as Guest Memory Usage in vCenter at the Guest level.

Consumed - the amount of physical ESX memory allocated (granted) to the guest, accounting for savings from memory sharing with other guests; includes memory used by the Service Console and VMkernel. Displayed as Memory Usage in vCenter at the Host level and as Host Memory Usage in vCenter at the Guest level.

If consumed host memory > active memory: host physical memory is not overcommitted; active guest usage is low but a large amount of host physical memory is assigned. This is perfectly normal. When monitoring memory usage, the question of why consumed host memory is greater than active memory may arise. The reason is that, for physical hosts that are not overcommitted on memory, consumed host memory represents the highest amount of memory used by the virtual machine; it is possible that in the past this virtual machine was actively using a very large amount of memory. Because the host is not overcommitted, there is no reason for the hypervisor to invoke any reclamation techniques. Therefore it is possible that whilst the active guest memory usage is low, the host physical memory assigned to it is high. This is a perfectly normal situation.

If consumed host memory <= active memory: active guest memory might not completely reside in host physical memory, which might point to potential performance degradation. This can occur if the guest's active memory has been reclaimed by the balloon driver or if the virtual machine has been swapped out by the hypervisor. In both cases, it is probably due to high memory overcommitment.

Active and Consumed Memory Report

This report shows the Active Memory and Consumed Host Memory for virtual machine ORMNVAT01. From the report it is clear that Active Memory does not exceed Consumed Memory, and thus host physical memory is not overcommitted. The report also shows what the Windows operating system thinks it is using, which is over double the Active Memory figure. Why? It is likely that the Windows operating system hosted in this virtual machine is not paravirtualized and therefore does not know it is running in a virtualized system.
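The interpretation rules above can be summarised as a small check. This is purely illustrative: the metric values are assumed to have been read already from vCenter or esxtop, which is not shown here.

```python
def interpret_memory(active_mb, consumed_mb):
    """Apply the consumed-vs-active rules of thumb described above."""
    if consumed_mb > active_mb:
        return ("Normal: host memory is not overcommitted; the VM simply retains "
                "host memory it used actively in the past.")
    return ("Investigate: active guest memory may not fully reside in host memory - "
            "check ballooning and host-level swapping (likely memory overcommitment).")

print(interpret_memory(active_mb=512, consumed_mb=2048))   # the normal case (e.g. ORMNVAT01)
print(interpret_memory(active_mb=2048, consumed_mb=1500))  # potential reclamation at work
```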

Memory Troubleshooting

Active host-level swapping - When ESX/ESXi is actively swapping the memory of a virtual machine in and out of disk, the performance of that virtual machine will degrade. The overhead of swapping a virtual machine's memory in and out from disk can also degrade the performance of other virtual machines. Monitor the memory swap in rate and memory swap out rate counters for the host; if either measurement is greater than zero, the host is actively swapping virtual machine memory. In addition, identify the virtual machines affected by monitoring the memory swap in and swap out rate counters at the virtual machine level.

Guest operating system paging - If overall demand for host physical memory is high, the ballooning mechanism might reclaim memory that is needed by an application or the guest operating system. In addition, some applications are highly sensitive to having any memory reclaimed by the balloon driver. Monitor the Balloon counter at both the host and virtual machine levels. If ballooning is reported at the virtual machine level, check for high paging activity within the guest operating system; Perfmon in Windows and vmstat/sar in Linux provide the values for paging activity.

When swapping occurs before ballooning - In some cases, virtual machine memory can remain swapped to disk even though the ESX/ESXi host is not actively swapping. This can occur when high memory activity caused some virtual machine memory to be swapped to disk and the virtual machine has not yet attempted to access this swapped memory; as soon as it is accessed, it will be swapped back in from disk. There is one common situation that can leave virtual machine memory swapped out even though no performance problem was observed: when a virtual machine's operating system first boots, there is a period of time before the balloon driver begins running, during which the virtual machine might access a large portion of its allocated memory. If many virtual machines are powered on at the same time, the spike in memory demand, together with the lack of running balloon drivers, might force the ESX/ESXi host to resort to host-level swapping.
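A simple triage sketch based on the counters named above. The counter values are assumed to have already been collected (for example from vCenter performance charts or esxtop); the function name and the zero thresholds are illustrative, not a VMware tool.

```python
def triage_vm_memory(swap_in_rate, swap_out_rate, balloon_mb, guest_paging_rate):
    """Return troubleshooting hints from host/VM memory counters, per the guidance above."""
    findings = []
    if swap_in_rate > 0 or swap_out_rate > 0:
        findings.append("Active host-level swapping: expect degraded VM performance.")
    if balloon_mb > 0:
        findings.append("Ballooning in progress: check guest paging (Perfmon / vmstat / sar).")
        if guest_paging_rate > 0:
            findings.append("Guest is paging while ballooned: likely memory pressure in the guest.")
    return findings or ["No memory reclamation symptoms in these counters."]

print(triage_vm_memory(swap_in_rate=0, swap_out_rate=12, balloon_mb=512, guest_paging_rate=30))
```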

Memory Performance Best Practices

Allocate enough memory to hold the working set of the applications running in the virtual machine - By monitoring the working set size of the applications over a specified period of time, taking into account any known busy periods but excluding random peaks, it is possible to allocate enough vRAM to avoid performance issues through swapping without overcommitting memory.

Never disable the balloon driver - Memory reclamation is performed by the balloon driver, which is installed when VMware Tools is installed in the virtual machine. It is highly recommended to install the Tools package whenever a new virtual machine is provisioned.

Keep transparent page sharing enabled - Transparent Page Sharing (TPS) is enabled by default within the vSphere environment and allows memory overcommitment by de-duplicating identical operating system pages. There is an option to disable this functionality, but doing so would remove the ability to overcommit memory.

Avoid overcommitting memory - It is recommended not to overcommit memory to the point at which the underlying ESX host is constantly reclaiming memory to satisfy the demands of the hosted virtual machines.

vRAM FAQs

What is vRAM? vRAM, or virtual RAM, is the total memory configured to a virtual machine.

How is configured vRAM capacity determined? Configured vRAM is equal to the sum of the vRAM configured to all powered-on virtual machines managed by a single instance of VMware vCenter Server, or by multiple instances of VMware vCenter Server in Linked Mode.

How am I compliant with this licensing model? Is there a hard stop at my vRAM limit? To be compliant, the 12-month rolling average of the daily high-water mark of configured vRAM must be equal to or less than the available pooled vRAM capacity. VMware vCenter Server will not impose a hard limit on configured vRAM (with the exception of VMware vCenter Server for Essentials), but it will provide alerts when configured vRAM is approaching or has surpassed the available pooled capacity. The VMware policy is that customers should buy licenses in advance of use.

How do I procure more vRAM? You simply need to buy and assign more VMware vSphere CPU licenses.

vRAM Licensing Model - How it works

VMware vSphere 5 is licensed on a per-processor-socket basis with a vRAM entitlement. Each VMware vSphere 5 processor license comes with an entitlement to a certain amount of vRAM capacity, i.e. memory configured to virtual machines. Unlike in vSphere 4.x, where core and physical RAM entitlements are tied to a server and cannot be shared among multiple hosts, the vRAM entitlements of vSphere 5 licenses are pooled, i.e. aggregated, across all vSphere servers managed by a vCenter Server instance or by multiple vCenter Server instances in Linked Mode.

Example: a user has two 2-CPU hosts (each CPU with 6 cores) with 128GB of physical RAM each, which they wish to license with VMware vSphere Enterprise edition. Each physical CPU requires a license, so a minimum of four VMware vSphere 5 Enterprise licenses are required. Each VMware vSphere 5 Enterprise license provides a vRAM entitlement of 64GB, so with 4 vSphere Enterprise licenses the user creates a vRAM pool of 4 x 64GB = 256GB.

So far the user has not created any virtual machines, so none of the vRAM pool's capacity is configured. When a virtual machine is powered on, the vRAM configured to that virtual machine counts against the pooled vRAM capacity, up to a maximum of 96GB (i.e. a virtual machine with 128GB of configured vRAM will only consume 96GB from the pooled vRAM capacity). All powered-on VMs, including virtual appliances or service VMs created by vSphere features or solutions running on vSphere, count against the vRAM pool capacity in the amount of their configured vRAM, up to a maximum of 96GB each.
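Putting the example numbers and the 96GB per-VM cap together gives a small accounting sketch. The function names and the VM sizes in the usage example are made up for illustration; note also that formal compliance is judged on the 12-month rolling average of the daily high-water mark (per the FAQ above), not on the point-in-time comparison shown here.

```python
PER_VM_CAP_GB = 96  # a single powered-on VM counts at most 96GB against the pool

def pooled_vram_gb(licenses, entitlement_gb_per_license=64):
    """Total pooled vRAM capacity, e.g. 4 Enterprise licenses x 64GB = 256GB."""
    return licenses * entitlement_gb_per_license

def counted_vram_gb(powered_on_vm_vram_gb):
    """Configured vRAM counted against the pool, applying the 96GB per-VM cap."""
    return sum(min(vram, PER_VM_CAP_GB) for vram in powered_on_vm_vram_gb)

pool = pooled_vram_gb(4)                # 256 GB pool from the example
usage = counted_vram_gb([128, 32, 16])  # the 128GB VM counts as only 96GB
print(pool, usage, usage <= pool)       # 256 144 True -> within the pooled capacity
```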

References

http://www.vmware.com/files/pdf/vsphere_pricing.pdf
http://www.vmware.com/technical-resources/performance/resources.html
http://www.metron-athene.com/training/webinars/index.html


Metron Metron, Metron-Athene and the Metron logo as well as athene and other names of products referred to herein are trade marks or registered trade marks of Metron Technology Limited. Other products and company names mentioned herein may be trade marks of the respective owners. Any rights not expressly granted herein are reserved. www.metron-athene.com