On BigFix Performance: Disk is King
How to get your infrastructure right the first time!
Case Study: IBM Cloud Development - WW IT Services
Authors: Shaun T. Kelley, Mark Leitch

Abstract: Rolling out large enterprise software across any organization requires a smart infrastructure plan and an eye toward future scalability if the deployment is going to be a success. IBM BigFix carries some specific recommendations for designing a deployment from a performance perspective. Here is how one team within IBM faced a performance challenge and solved it with a smart infrastructure plan.

The Challenge: BigFix can manage hundreds of thousands of endpoints, and it can do so on relatively modest hardware with a high degree of accuracy and responsiveness. The basic architecture is a massively distributed, scalable process model in which lightweight agents manage the endpoints and send status reports to the BigFix server. The challenge is to ensure the server is properly configured for this workload. While CPU and memory management are fairly standard and well understood, managing disk I/O is often more difficult. We will describe a case study in resolving a poorly configured I/O subsystem, and then a practical benchmark approach for understanding the behavior of your own I/O subsystem.

The Cause: At first glance, our case-study server appeared to meet all the performance requirements for the application (24 CPUs and 48 GB of RAM). Further investigation made it clear that the original disk configuration, a RAID 5 array, was the bottleneck of the system. Compared with other RAID levels, RAID 5 provides parity-based redundancy with a small read performance increase but reduced write performance, since every random write incurs extra parity reads and writes. As a result, it was not a scalable choice for long-term growth of the service.
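The write cost of each RAID level can be estimated with the commonly cited per-level "write penalty" (back-end I/Os per random write). The sketch below uses rule-of-thumb numbers, and the eight-drive, 180-IOPS-per-disk example is purely illustrative, not our hardware:

```python
# Rough effective-IOPS model using the commonly cited RAID write penalties:
# RAID 0 = 1, RAID 10 = 2, RAID 5 = 4, RAID 6 = 6 back-end I/Os per write.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}

def effective_write_iops(disks: int, iops_per_disk: int, level: str) -> int:
    """Random-write IOPS the array can sustain, given the RAID write penalty."""
    return disks * iops_per_disk // WRITE_PENALTY[level]

# Hypothetical example: eight drives at ~180 IOPS each.
print(effective_write_iops(8, 180, "raid5"))   # 360
print(effective_write_iops(8, 180, "raid10"))  # 720
```

Under this model, the same spindles deliver roughly twice the random-write IOPS as RAID 10 than as RAID 5, which is consistent with what we observed.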
The Solution: We began by investigating a switch to a RAID 10 array, because RAID 10 provides redundancy by mirroring the data. It is also excellent from a write performance perspective: data is striped across drives, which lets you use the full I/O bandwidth of multiple drives simultaneously. Through further research, we also discovered additional performance benefits in separating the physical disks for the SQL database, the application, and the operating system. Unfortunately, there were insufficient drive bays in the existing device to accommodate several arrays configured as RAID 10. Our next path of investigation was to move the SQL database to external storage, whether fibre-attached or network-attached. This solution has strong points, such as high throughput and bandwidth, but there is a major caveat to take into consideration. When processing computer reports, the IBM BigFix application performs a very large number of small writes to the disks, so the most important disk performance metrics are I/O operations per second (IOPS) and latency. External storage suffers on the latency front: the greater round-trip latency of an external array reduces the IOPS it can deliver, lengthening each write action. With these concepts in mind, we chose to acquire new hardware to resolve the issues. We then spent some time working with performance experts to ascertain the best possible configuration for our deployment. Through ongoing calls and emails we heard about a new option that sounded perfect for us: FusionIO (Enterprise io3) storage adapters. These are internal PCI Express expansion cards that use a large amount of solid-state memory to act as a disk. They have extremely high IOPS ratings (up to 375,000 write IOPS) and industry-leading latency as low as 15 µs. Here is what we ordered:
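The link between latency and IOPS can be made concrete: with a single outstanding I/O, the IOPS ceiling is simply the reciprocal of the per-operation latency. The latency figures below are illustrative examples, not measurements from our environment:

```python
# With one outstanding I/O, a device can complete at most 1/latency
# operations per second, so latency directly caps IOPS.
def max_iops_single_queue(latency_seconds: float) -> float:
    """Upper bound on IOPS at queue depth 1 for a given per-I/O latency."""
    return 1.0 / latency_seconds

# Hypothetical round-trip latencies:
print(max_iops_single_queue(1e-3))   # ~1 ms (networked array) -> 1,000 IOPS
print(max_iops_single_queue(15e-6))  # 15 us (PCIe flash)      -> ~66,667 IOPS
```

Deeper queues raise the ceiling, but for BigFix's many small writes, low per-operation latency is what keeps report processing fast.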
With this setup we were able to increase the IOPS of our old server by 8,000%! When we made the transition to the new server we immediately saw an extreme performance improvement. The BigFix server easily manages the workload with high speed and around-the-clock reliability.

Evaluating Your I/O Performance: So the basic question is: how do I know my I/O subsystem is performing adequately? While you can monitor live systems and often judge health from I/O wait or I/O queue length, the simplest approach is to run I/O benchmarks. A long-standing, basic tool for this is Iometer (URL). We will show a basic scenario for running Iometer step by step. When you first start Iometer, you will see a basic console with a machine (in this case, Cannibal) and a number of worker threads. The number of worker threads defaults to the number of CPUs (or cores) on the system. It is also possible to run Iometer against a remote target, vary the number of worker threads, and so on; however, for our purposes we will run a simple initial benchmark.
We first select the Disk Targets tab and enable a volume per worker. For the volume, we allocate 204,800 sectors per target. Given each sector is 512 bytes, this makes for a 100 MB target. This is relatively small, but it makes for a simple initial test.
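The sector arithmetic is easy to check (strictly speaking, the result is 100 MiB, conventionally rounded to "100 MB"):

```python
SECTOR_BYTES = 512
sectors = 204_800

target_bytes = sectors * SECTOR_BYTES
# 204,800 sectors * 512 bytes = 104,857,600 bytes = 100 MiB
print(target_bytes, target_bytes // (1024 * 1024))
```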
Our next step is to select a sample workload via the Access Specifications tab. In this case, we take a workload with a 4K block size, 25% read, and no random I/O. This straight sequential test is relatively ideal, but it makes a simple starting point. Once again, we make sure the specification is enabled across all of the workers (by either selecting each one or cloning the first worker). Finally, we select the Test Setup tab. Here we make the simple choice to scale the workers with the number of CPUs and to run all selected disk targets for all workers. This gives us a realistic concurrent I/O workload. A variety of options are available to capture results, scale the workload, and so on. While this example is small, in the real world we recommend running with large volumes for a long period of time: short-duration I/O tests tend to be unrealistic, because I/O performance can change over time.
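Iometer is the tool we used, but as a rough cross-platform sanity check, a few lines of Python can time small sequential writes. This is only a sketch: the writes go through the OS page cache, so without direct/synchronous I/O the numbers will overstate what the device itself sustains. Sizes here are deliberately tiny for illustration:

```python
import os
import tempfile
import time

BLOCK = 4096    # 4K block size, matching the access specification above
BLOCKS = 2560   # 10 MB total; illustrative only, real tests should be far larger

buf = os.urandom(BLOCK)
fd, path = tempfile.mkstemp()
try:
    start = time.perf_counter()
    for _ in range(BLOCKS):
        os.write(fd, buf)       # buffered: the page cache absorbs these writes
    os.fsync(fd)                # flush so the device actually receives the data
    elapsed = time.perf_counter() - start
finally:
    os.close(fd)
    os.unlink(path)

print(f"{BLOCKS / elapsed:,.0f} writes/s, {elapsed / BLOCKS * 1e6:.1f} us/write")
```

A dedicated tool (Iometer, or fio on Linux) remains the right choice for real measurements, since it controls queue depth, randomness, and caching behavior.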
To run the workload, we simply select the green flag to initiate the run. Note that we have selected the Last Update results view and set results to update every two seconds. The default update frequency is infinity, so if it appears nothing is happening, simply adjust the update frequency.
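When reading the results, it helps to translate an IOPS figure into bandwidth at the block size under test. A minimal sketch, assuming the 4 KiB blocks used above:

```python
def throughput_mb_s(iops: float, block_bytes: int = 4096) -> float:
    """Bandwidth (MB/s) implied by an IOPS figure at a given block size."""
    return iops * block_bytes / 1e6

# The IOPS range we consider healthy for BigFix, expressed as bandwidth:
print(throughput_mb_s(5_000))    # 20.48 MB/s
print(throughput_mb_s(10_000))   # 40.96 MB/s
```

The modest MB/s figures show why a raw bandwidth spec alone says little: small-block IOPS and latency, not streaming throughput, are what this workload demands.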
How do you know if the results are acceptable? For IBM BigFix, we consider results in the 5,000 to 10,000 IOPS range with 1 ms latency to be healthy. Typically there is a throughput and latency curve in which results improve with multiple workers and then start to degrade (the so-called "knee" in the curve). The intent is for the workload to scale well with the number of CPUs expected to drive I/O-intensive BigFix operations (e.g., logs, database containers, BigFix reports).

Summary: The takeaway from our experience is how important it is, with the IBM BigFix product, to properly assess the current and future footprint of your deployment and to prioritize the hardware and disk configuration, minimizing the risk of a forced migration to new hardware. When buying the BigFix product, make sure you ask your IBM sales and technical representatives whether the hardware you plan to deploy on will be able to support your environment. In addition, you should actually verify the capability of the I/O subsystem: many solutions look great on paper, but physical or virtual configuration issues can hamper real-world performance. By combining proper up-front design with real-world benchmarks, you can achieve a high-performance, high-scale BigFix implementation.

Further Reading: If further reading on managing BigFix at scale is desired, the following technical resources are available.
IBM developerWorks: IBM BigFix Query: Unleashing the Chief Security Officer @ Scale: URL
IBM BigFix Version 9.x: Capacity Planning, Performance, and Management Guide: URL