Understanding VMware Capacity


Topics
- Why OS monitoring can be misleading
- 5 key VMware metrics for understanding VMware capacity
- How VMware processor scheduling impacts CPU capacity measurements
- Measuring memory capacity
- Measuring disk storage latency
- Calculating headroom in VMs

Dangers with OS Metrics

Almost every time we discuss data capture for VMware, someone asks whether we can capture the utilization of specific VMs by monitoring the OS. The simple answer is no. The more complex answer is that we can capture the data from the OS, but it may not be reliable. Here is an example of why.

We have two VMs. Within the 1-second interval we are looking at, the first VM was only allocated the CPU for half a second. In that half second the VM used 50% of its possible CPU time, so from the OS's perspective it was running at 50% CPU utilization. If we look at the data from VMware, VMware knows the VM only used half of the CPU available to it for half a second, or 25%. The second VM was running on the CPU for the entire second and again used 50% of its possible CPU, so to the OS it appears to have been running at 50% CPU utilization, and VMware gives the same result.

The more contention there is for CPU time, the more time VMs will spend dormant/idle, and the further apart the two values will be. This effect means that any metric with an element of time in its calculation cannot be relied upon to be accurate.
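To make the arithmetic concrete, here is a minimal sketch using the hypothetical numbers from the example above, showing how the OS-reported and VMware-reported utilization diverge when a VM is descheduled for part of the interval:

```python
def os_vs_vmware_utilization(interval_s, scheduled_s, busy_s):
    """Compare CPU utilization as seen by the guest OS and by VMware.

    interval_s  -- length of the measurement interval (wall-clock seconds)
    scheduled_s -- seconds the VM was actually scheduled on a physical core
    busy_s      -- seconds of real CPU work the VM performed while scheduled
    """
    os_view = busy_s / scheduled_s * 100     # the OS only "sees" the time it was running
    vmware_view = busy_s / interval_s * 100  # VMware measures against wall-clock time
    return os_view, vmware_view

# VM 1: scheduled for 0.5 s, busy for 0.25 s -> OS says 50%, VMware says 25%
print(os_vs_vmware_utilization(1.0, 0.5, 0.25))
# VM 2: scheduled for the full second, busy for 0.5 s -> both report 50%
print(os_vs_vmware_utilization(1.0, 1.0, 0.5))
```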

Here is data from a real VM. The dark blue line (top) is the data captured from the OS, and the light blue line (bottom) is the data from VMware. There is clearly some correlation between the two. At the start of the chart there is about a 1.5% CPU difference; given we are only running at about 4.5% CPU, that is an overestimation by the OS of about 35%. But at about 09:00 the difference is ~0.5%, so the gap does not remain stable either. Historically it has not been unusual to see situations where the OS metric reports 70% CPU utilization while VMware reports 30%.

The effect we saw between the OS and VMware is caused by time slicing. In a typical VMware host we have more vCPUs assigned to VMs than we have physical cores, a situation known as over-provisioning, and to some extent the original purpose of virtualization. The processing time of the physical cores has to be shared among the vCPUs of the VMs. The more vCPUs we have, the less time each can spend on a core, and therefore the slower time passes for that VM. To keep the VM's clock correct, extra timer interrupts are sent in quick succession. So time passes slowly, and then very fast.
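As a rough illustration of why time slicing happens, here is a sketch (with hypothetical host numbers) of the vCPU-to-core over-provisioning ratio and the worst-case share of real time each vCPU could get if every vCPU wanted to run at once:

```python
def overprovisioning(physical_cores, total_vcpus):
    """Return the vCPU:core ratio and the worst-case share of core time per vCPU."""
    ratio = total_vcpus / physical_cores
    worst_case_share = 1.0 / ratio  # fraction of real time each vCPU gets if all are busy
    return ratio, worst_case_share

# Hypothetical host: 16 physical cores, 64 vCPUs allocated across its VMs
ratio, share = overprovisioning(16, 64)
print(f"{ratio:.1f}:1 over-provisioned; each busy vCPU could average {share:.0%} of real time")
```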

Time is no longer a constant, but the OS doesn't know that. So the safest approach is to avoid using anything from the OS that involves an element of time. Significant improvements have been made in this area over successive releases of VMware: VMware Tools has a number of tricks to bring the OS metrics as close as possible to reality, and CPU co-scheduling has improved, but the basic problem remains. Later I will discuss how it can be acceptable to use averages and estimates when reporting on the future; when we have the choice of accurate data from VMware or less accurate data from the OS, taking accuracy where we can easily do so has to be the better option.

There are plenty of metrics provided by VMware. Some are familiar to anybody who has ever monitored a computer (CPU%, IOPS, etc.). For a short list I have selected 5 metrics that are important and available in VMware environments. To some extent I have cheated to make a list of 5, because for some I am looking to get the same metric from different levels in the environment. Most of these are covered in greater detail in the sections that follow.

CPU MHz
Why not %? Because we cannot compare the CPU% used by a VM with the % of the host or the cluster. We want to judge the amount of processing power an individual VM requires, so that when we want to move it to another cluster, or are considering the size of the existing cluster, we have a comparable value to work with. I fully admit it is not perfect: the processing you can do with 1 MHz today is greater than that of 10 years ago. But it is still better than %.

Ready Time
This is a measure of CPU contention. The bigger the number, the less happy your users are. But it does not always mean you are short of CPU power.
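To show why MHz gives a comparable value across hosts and clusters, here is a minimal sketch that normalizes a VM's CPU percentage into MHz. VMware exposes an equivalent usage-in-MHz counter directly, so the function and its inputs are purely illustrative assumptions:

```python
def vm_cpu_mhz(vm_cpu_pct, vcpu_count, core_mhz):
    """Approximate a VM's CPU demand in MHz from its utilization percentage.

    vm_cpu_pct -- VM CPU utilization as a percentage of its vCPU allocation
    vcpu_count -- number of vCPUs configured on the VM
    core_mhz   -- clock speed of the host's cores in MHz
    """
    return vm_cpu_pct / 100.0 * vcpu_count * core_mhz

# A VM at 40% of 4 vCPUs on 2600 MHz cores demands ~4160 MHz,
# a figure we can compare directly with another cluster's capacity in MHz.
print(vm_cpu_mhz(40, 4, 2600))
```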

Active Memory
The amount of memory the VM has accessed in the last 20 seconds. If there is not space for all the active memory in the cluster, performance will be sub-optimal.

Ballooned Memory
Memory ballooning works by robbing Peter to pay Paul. When memory is in short supply and/or over-committed, pages of RAM in a VM's OS can be filled by a balloon. These pages are then not actually available to the OS, and the space freed up can be used by other VMs that need the RAM. As memory demand goes down, VMware deflates the VM's balloon and the RAM is again available to the VM. This is a sign of contention for the resource and introduces additional overhead on the VMware hypervisor.

Host Disk Latency
Among the metrics provided for hosts are the disk latency metrics. Disk IO performance is still the greatest performance challenge faced by most organizations I meet.

Imagine you are driving a car, and you are stationary. There could be several reasons for this: you may be waiting to pick someone up, you may have stopped to take a phone call, or you may have stopped at a red light. In the first two cases (the pick-up and the phone call) you decided to stop the car to perform a task. But in the third case, the red light is stopping you from doing something you want to do.

You spend the whole time at the red light ready to move away as soon as you get a green light. That time spent waiting at the red light is ready time. When a VM wants to use the processor but is stopped from doing so, it accumulates ready time, and this has a direct impact on the performance of the VM. Ready Time can be accumulated even if there are spare CPU MHz available.

For any processing to happen, all the vCPUs assigned to the VM must be running at the same time. This means that if you have a 4 vCPU VM, all 4 vCPUs need available cores or hyperthreads to run. So the fewer vCPUs a VM has, the more likely it is to get onto the processors. You can reduce contention by having as few vCPUs as possible in each VM. And if you monitor CPU threads, vCPUs and Ready Time for the whole cluster, you will be able to see whether there is a correlation between increasing vCPU numbers and Ready Time.

Here is a chart showing data collected for a VM. In each hour the VM is doing ~500 seconds of processing. The VM has 4 vCPUs. Despite doing just 500 seconds of processing, the Ready Time accumulated is between ~1200 and ~1500 seconds. So anything being processed spends roughly 3 times as long waiting to be processed as it does actually being processed, i.e. 1 second of processing could take 4 seconds to complete.
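The slowdown implied by those figures can be expressed as a simple ratio. This sketch uses the approximate numbers quoted in the text and is only illustrative:

```python
def ready_time_slowdown(cpu_busy_s, ready_s):
    """Estimate how much longer work takes when ready time is added to run time."""
    return (cpu_busy_s + ready_s) / cpu_busy_s

# 4 vCPU VM: ~500 s of processing per hour, ~1500 s of ready time -> ~4x elapsed time
print(ready_time_slowdown(500, 1500))  # 4.0
# The 2 vCPU VM described next: ~500 s of processing, ~150 s ready -> ~1.3x
print(ready_time_slowdown(500, 150))   # 1.3
```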

Now let's look at a VM on the same host, doing the same processing on the same day. Again we can see ~500 seconds of processing in each hour interval, but this time we only have 2 vCPUs. The ready time is about ~150 seconds, i.e. 1 second of processing takes about 1.3 seconds. By reducing the number of vCPUs in the first VM, we could improve transaction times to somewhere between a quarter and a third of their current value.

Here is an animation showing what happens inside the host to schedule the physical CPUs/cores to the vCPUs of the VMs. Clearly most hosts have more than 4 threads that can run concurrently, but let's keep this simple to follow.

1. VMs that are ready are moved onto the threads.
2. There is not enough space for all the vCPUs of all the VMs, so some are left behind. (CPU utilization = 75%, capacity used = 100%)
3. If a single-vCPU VM finishes processing, the spare threads can now be used to process a 2 vCPU VM. (CPU utilization = 100%)
4. A 4 vCPU VM needs to process.
5. Even if the two single-vCPU VMs finish processing, the 4 vCPU VM cannot use the CPU available.
6. And while it is accumulating Ready Time, other single-vCPU VMs are able to take advantage of the available threads.
7. Even if we end up in a situation where only a single vCPU is being used, the 4 vCPU VM cannot do any processing. (CPU utilization = 25%)

As mentioned when we discussed time slicing, improvements have been made in co-scheduling with each release of VMware. Among other things, the time allowed between individual vCPUs being scheduled onto the physical CPUs has increased, allowing greater flexibility in scheduling VMs with large numbers of vCPUs, and acceptable performance is now seen from larger VMs.

Along with Ready Time there is also a Co-Stop metric. Ready Time can be accumulated against any VM; Co-Stop is specific to VMs with 2 or more vCPUs and relates to time stopped due to co-scheduling contention, e.g. one or more vCPUs has been allocated a physical CPU, but the VM is stopped waiting for its other vCPUs to be scheduled. I'd love to do an animation of that, but my PowerPoint skills would need serious improvement. Imagine the bottom of a ready VM sliding across to a thread and the top sliding across as other VMs move off the threads, so the VM is no longer rigid but more of an elastic band.

Just to quickly recap Ready Time:
- Any ready time accumulated has an impact on performance (although performance may still be acceptable).
- It is not enough to monitor CPU%; you need to monitor Ready Time as well.
- The more vCPUs a VM has, the harder it is to schedule it onto the available CPU threads. Use as few vCPUs as possible.
- VMware provides Ready Time as a number of seconds in an interval. It is possible to convert this to a % value using the interval length, as shown in the sketch below. Anything more than 10% Ready Time is likely to indicate unacceptable performance.
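A minimal sketch of that conversion, assuming an hourly summation interval and normalizing per vCPU (the per-vCPU normalization and interval length are assumptions; adjust them to match how your data is collected):

```python
def ready_time_pct(ready_s, interval_s, vcpus=1):
    """Convert accumulated Ready Time (seconds) in an interval to a percentage.

    Ready time accrues per vCPU, so for a multi-vCPU VM the total possible
    'ready' seconds in the interval is interval_s * vcpus.
    """
    return ready_s / (interval_s * vcpus) * 100

# Hypothetical hourly sample: 4 vCPU VM with 1500 s of ready time in 3600 s
pct = ready_time_pct(1500, 3600, vcpus=4)
print(f"{pct:.1f}% ready")  # ~10.4%, above the 10% rule of thumb
if pct > 10:
    print("Likely to indicate unacceptable performance")
```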

Memory capacity is generally the limiting factor in a VMware cluster. This is improving as hardware specifically designed to be a host, with plenty of room for RAM, has become available, but clusters still generally have less memory headroom than CPU headroom. Memory on VMware is not just a case of monitoring % used. As with CPU, we want to be able to compare memory utilization in comparable units, so that's MB or GB. Then we move on to the VMware-specific metrics.

Reservations
A VM or a resource pool can have a reservation, which means that if it needs that memory, it gets to use it, no questions asked. However, you cannot reserve more memory than exists in the host server.

Limits
Just because a VM has been configured with 8GB RAM doesn't mean you can't set a lower limit of, say, 4GB RAM. If the VM then tries to use more than the limit, some data will be placed into the OS swap file, or VMware may balloon some of the VM's memory to free up pages for the OS to use.

Ballooning
When VMware wants to allocate memory to a VM but there is a shortage of memory, a balloon may be inflated inside the memory of one (or more) other VMs. This balloon pins itself into RAM and cannot be swapped out, which forces the OS to swap some of its own memory out to disk. The pages pinned by the balloon are not all actually stored in memory on the host, as the host knows the contents of the balloon are not important. This frees up pages that the host can allocate to the VM it wanted to give memory to.

One reason this works is that an OS will use a page to store some data, and later, when the program no longer needs that data, the OS puts the page on its free list. The hypervisor has no idea the page is no longer needed, because the data in the page has not changed. When the balloon inflates and the OS hands it those free pages, the hypervisor can identify the balloon's pages, and therefore the pages that the OS is not using for other processes.
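As a simplified model of how a reservation and a limit bound a VM's physical memory, here is a sketch. The function and the way demand above the limit is lumped together as "reclaimed" are illustrative assumptions, not VMware's actual reclamation algorithm:

```python
def physical_memory_outcome(configured_mb, reservation_mb, limit_mb, demand_mb):
    """Roughly model how much of a VM's memory demand is backed by host RAM.

    The VM is guaranteed its reservation, may consume host RAM up to its limit,
    and any demand above the limit has to be reclaimed (ballooned or swapped).
    """
    limit_mb = min(limit_mb, configured_mb)    # limit cannot exceed configured RAM
    demand_mb = min(demand_mb, configured_mb)  # guest cannot demand more than it has
    backed = max(reservation_mb, min(demand_mb, limit_mb))
    reclaimed = max(0, demand_mb - backed)     # must come from ballooning/swap
    return backed, reclaimed

# Hypothetical VM: configured 8 GB, reservation 2 GB, limit 4 GB, demanding 6 GB
print(physical_memory_outcome(8192, 2048, 4096, 6144))  # (4096, 2048)
```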

Shared Pages
Shared pages are pages in memory that are identical. Rather than store duplicates of the same page, the hypervisor stores a single copy and points the appropriate VMs to it. This works well where servers are running the same OS and doing the same job, so much of the memory in use is identical.

Active Memory
Active memory is the amount of memory that has been accessed in the last 20 seconds. Having sufficient memory to contain the active memory of the VMs is crucial to performance.

Memory Available for VMs
Memory available to the VMs shows what it says. Some memory in the cluster cannot be used by VMs because it is being used by the hypervisor to support them; the memory left over is available.

Here is a typical graph we might use to understand how memory is being used by a VM. The VM has been granted about 4GB RAM; that is what the OS thinks it has to use. Host memory in use shows how much physical RAM is currently being used by the VM. We can see that at 09:00 this increases, and at the same time shared memory reduces. Memory Used is our term for active memory, and it remains steady throughout, as does the memory overhead on the host used to support this VM. What we can see is that only about 400MB of memory is being accessed on a regular basis, and between 1.5GB and 2GB of memory is not unique to this VM but shared with others.
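Since the key test is whether the cluster can hold the active memory of its VMs, here is a minimal sketch of that comparison. The overhead figure and per-VM values are hypothetical; in practice the numbers would come from VMware's counters:

```python
def memory_headroom_mb(host_ram_mb, hypervisor_overhead_mb, active_mb_per_vm):
    """Compare memory available for VMs with the sum of their active memory."""
    available = host_ram_mb - hypervisor_overhead_mb
    active_total = sum(active_mb_per_vm)
    return available - active_total

# Hypothetical cluster: 256 GB RAM, ~12 GB hypervisor overhead,
# and a list of per-VM active memory figures in MB
headroom = memory_headroom_mb(262144, 12288, [400, 1200, 800, 2500, 640])
print(f"Headroom over active memory: {headroom} MB")
```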

Here is an example of the balloon driver in use. The balloon driver takes about 2.5GB of space in the memory of the VM. This causes the OS to swap out to disk until it cannot do any more, and then the hypervisor starts to swap out memory as well. At that point, performance is likely to be impacted. Where ballooning is persistent (rather than an occasional spike), changes should be made to ensure there is enough RAM available for the VMs in the cluster. Ballooning itself has an overhead on the hypervisor and so has the potential to impact performance for the whole host. The change does not necessarily have to be installing more RAM; the very first thing to consider is whether any VMs have been created and forgotten about.

If we look at the cluster as a whole, we can see a significant event at the same time the previous VM saw the balloon inflate. Shared memory plummets, which increases the demand on memory; in turn this causes the balloon driver (memory control) to consume more memory, and swapped memory to increase.

Then shared memory slowly recovers. The process that identifies shared pages only checks a set number of pages each interval, so it takes a while to identify all the shared pages and free up the space taken by duplicates. What caused the shared memory to drop so much? Windows updates and a reboot. When a VM starts, every page is unique until a duplicate is identified, which takes a short while.

Why don't we measure IO performance from the OS? It's the OS that needs the IO, after all. Partly because the OS is subject to time slicing when looking at things like IOPS, but also because VMware can give us a couple of really useful statistics. VMware can still provide KB/sec and IOPS, but it can also give us a breakdown of the latency for the host's datastores (the closest VMware can get to the hardware), which lets us identify whether IO time is being spent inside VMware or externally in the storage device. There are 3 metrics:
- Queue Latency
- Kernel Latency
- Device Latency
We only accumulate queue latency if we are not getting the performance we require out of kernel and device, so kernel and device latency remain the focus of any investigation into poor IO performance.
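As a simple triage of where datastore latency is being spent, here is a sketch. The millisecond thresholds are illustrative rules of thumb, not values from the text or from VMware:

```python
def classify_datastore_latency(kernel_ms, device_ms, threshold_ms=20):
    """Point an IO investigation at VMware (kernel) or the storage array (device)."""
    total_ms = kernel_ms + device_ms
    if total_ms < threshold_ms:
        return f"{total_ms} ms total: latency looks acceptable"
    if device_ms >= kernel_ms:
        return f"{total_ms} ms total: mostly device latency, look at the storage array"
    return f"{total_ms} ms total: mostly kernel latency, time is being spent inside VMware"

print(classify_datastore_latency(kernel_ms=2, device_ms=35))
print(classify_datastore_latency(kernel_ms=18, device_ms=6))
```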

We're talking about IO, so why are we suddenly looking at the CPUs? IO passed from an OS to the hypervisor has to be processed by the kernel, and the kernel can only run on processor 0. On this chart we have 2 distinct sets of CPUs: the reds and purples are a nice tight bunch, and the blues and greens are almost a nice tight bunch, except for processor 0. Processor 0 is always the busiest. The busier that processor gets, the more likely we are to experience problems with the performance of virtual hardware, such as IO. So if we are seeing high kernel latency, one simple thing to look for is high CPU utilization on processor 0. Fortunately, hosts tend to run out of memory capacity before CPU capacity.

It will come as no surprise that usually the vast majority of latency for a datastore is device latency. Here the green line is total latency and the dark blue is device latency; as you can see they are a pretty close pairing.
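When the opposite is true and kernel latency dominates, the check described above can be sketched as follows (the utilization threshold is an illustrative assumption):

```python
def check_kernel_latency(kernel_ms, device_ms, cpu0_util_pct, cpu0_threshold_pct=80):
    """If kernel latency dominates, flag high utilization on processor 0 as a likely cause."""
    if kernel_ms <= device_ms:
        return "Device latency dominates: investigate the storage device"
    if cpu0_util_pct >= cpu0_threshold_pct:
        return "Kernel latency dominates and CPU 0 is busy: the hypervisor kernel is likely the bottleneck"
    return "Kernel latency dominates but CPU 0 is not busy: investigate other hypervisor factors"

print(check_kernel_latency(kernel_ms=15, device_ms=4, cpu0_util_pct=92))
```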

So the question I get asked most when discussing VMware capacity is "How many more VMs can I fit in this cluster?" Which is rather like asking how many balls, used across a variety of sports, it takes to fill an Olympic swimming pool. Unfortunately, "it depends" is not an acceptable answer for a CIO. The business wants a number, so as a business-focused IT department an answer must be given. The key is that it is OK to estimate; anybody who has compared the average business forecast to what eventually happens in reality knows the business is OK with estimates.

So how do we figure out the number to tell the business? If we calculate the size of our average VM and the size of the cluster, then divide one by the other, that is the total number of VMs; now just take off the current number of VMs, right? It sounds simple, except that we need to define the size of our cluster. Are we allowing for one or more hosts to fail? Can we identify the size of the largest host(s)? We also need to decide what metrics we are going to size on. Do you want to size on the vCPU-to-core ratio, or on MHz CPU and MB memory, or some other limitation? Can you calculate what your average VM is at every point during the day and pick the peak or a percentile? Or would you agree an average size for Small, Medium and Large VMs, calculate the number of each currently, and extrapolate with the existing ratios? You have to be able to answer these questions before you can start the calculations.

Clearly you need data to work with for this. You can manually read information out of the vSphere client and note it down, but I would suggest you find a tool to automate the data collection. You will need to review the data and make sure it covers a good period for the exercise, e.g. not during Windows updates and a reboot of every VM! You should also try to include known projects: you might have 1000 VMs currently, but if there are 250 planned for implementation in the next 6 months you will want to take them into account.

Here is an example of a good peak (circled). The actual peak is a blip that we do not want to size on, but the circled peak is a nice clean example that is in line with other days.
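One way to avoid sizing on a blip is to work from a high percentile of the day's samples rather than the absolute maximum. A minimal sketch, where the 90th percentile and the sample values are illustrative choices rather than anything prescribed in the text:

```python
def sizing_value(samples, percentile=90):
    """Pick a sizing value from a day's samples, ignoring one-off spikes."""
    ordered = sorted(samples)
    index = round(percentile / 100 * (len(ordered) - 1))
    return ordered[index]

# Hypothetical hourly cluster CPU demand in MHz, with one blip at 61000
day = [22000, 23500, 25000, 30000, 31000, 32500, 61000, 33000, 31500, 28000]
print(sizing_value(day))  # 33000 -- the clean peak, not the blip
```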

Given the size of the cluster in MB of memory and MHz of CPU, the number of current VMs, the size of an average VM, and the size of the largest host, I put together a spreadsheet. A calculation takes the size of the largest host off the size of the cluster, then takes 90% of the result. It then calculates the number of average VMs that will fit, and the remaining headroom expressed in average VMs, for both memory and CPU. The smaller of the two values is displayed, along with either Memory or CPU as the "bound by" metric. Conditional formatting on the cell displaying the number of VMs available sets a Red/Amber/Green status. By including a sheet that holds the number of VMs needed for future projects, I calculated a second value that takes them into account.
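The spreadsheet logic described above can be sketched as follows. The 90% factor and the largest-host deduction follow the description in the text; the cluster, VM and project numbers are hypothetical:

```python
def vm_headroom(cluster_mhz, cluster_mb, largest_host_mhz, largest_host_mb,
                avg_vm_mhz, avg_vm_mb, current_vms, planned_vms=0):
    """Estimate how many more average-sized VMs fit in the cluster.

    Usable capacity = (cluster - largest host) * 0.9, i.e. survive one host
    failure and keep a 10% buffer; the tighter of CPU and memory wins.
    """
    usable_mhz = (cluster_mhz - largest_host_mhz) * 0.9
    usable_mb = (cluster_mb - largest_host_mb) * 0.9
    fit_by_cpu = int(usable_mhz // avg_vm_mhz)
    fit_by_mem = int(usable_mb // avg_vm_mb)
    bound_by = "CPU" if fit_by_cpu <= fit_by_mem else "Memory"
    headroom = min(fit_by_cpu, fit_by_mem) - current_vms - planned_vms
    return headroom, bound_by

# Hypothetical 8-host cluster, 250 current VMs, 40 more planned
print(vm_headroom(cluster_mhz=416_000, cluster_mb=4_194_304,
                  largest_host_mhz=52_000, largest_host_mb=524_288,
                  avg_vm_mhz=1_000, avg_vm_mb=8_192,
                  current_vms=250, planned_vms=40))  # (37, 'CPU')
```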

Exporting some of the values I calculate on a regular basis lets me trend over time the number of VMs available in the cluster, still taking into account the largest host failing and treating 90% of the remaining capacity as the maximum. In this case activity was actually falling over time, so the number of VMs available in the cluster was increasing in terms of CPU capacity.

A quick roundup:
- Ready time is important. It is a measure that directly shows negative effects on service time.
- VMs should have as few vCPUs as possible; it makes them easier to schedule onto CPUs.
- Memory is not just about occupancy: look at ballooning, active memory and swap as well.
- Disk latency can be observed and broken down into time spent inside and outside VMware.
- To report the number of VMs you can fit in a cluster, decide what the size of your cluster actually is for this exercise, find a good peak to work with, and trend your results over time. It is not a static picture.

Metron Metron, Metron-Athene and the Metron logo as well as athene and other names of products referred to herein are trade marks or registered trade marks of Metron Technology Limited. Other products and company names mentioned herein may be trade marks of the respective owners. Any rights not expressly granted herein are reserved. www.metron-athene.com