QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER
Hardware Sizing Using Amazon EC2
A QlikView Scalability Center Technical White Paper
June 2013
qlikview.com
Table of Contents

Executive Summary
A Challenge and a Solution
Hardware Setup
The Testing Data Model and Application
User Behavior
Testing Scenarios
Saturation / Soak Test
Concurrency
Conclusion
Notes and Considerations
Uploading Your Data to Amazon
Reproducibility
Overall Application Performance
Computational Efficiency
Network Connection / Latency Test
Reference
TASKS AND TOOLS

Capacity Planning - the process of determining the production capacity needed to meet demand for a product or service.

Sizing - an approximation of the hardware resources required to support a specific software implementation.

Amazon EC2 (AEC2) presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of operating systems and applications.

Challenge #1 - Do you have dedicated hardware to use for testing?

Challenge #2 - Do you possess the necessary skills to set up and configure your own hardware from the ground up?

Challenge #3 - Do you have the budget and time to procure new hardware for your sizing exercise, with hopes it will work as you expect?

Executive Summary

Sizing and capacity planning exercises are often conducted when preparing a high-performing, scalable Business Intelligence solution. However, identifying the appropriate hardware to sustain an optimal Business Intelligence deployment can be a challenge. All too commonly, estimates and guesses are made based on historical information provided by the software vendor or on knowledge of previously installed solutions. Not all environments are created equal, and unproven assumptions can therefore produce unpredictable results. One sure method is to perform your own tests on known hardware of equal or greater power. Although time, cost and skill sets may prevent many organizations from doing so, the general availability of Amazon Elastic Compute Cloud computing can make it possible.

This paper takes you through a few performance testing scenarios using QlikView 11 on two distinct systems with similar hardware specifications:

An Amazon EC2 (AEC2) instance, located in Europe
A dedicated machine, located in our scalability labs in Sweden

Analyses of the results from both systems are presented.
When compared, they reveal how a specific AEC2 instance can be used as an approximation for initial dedicated hardware sizing, as well as prove viable for a Proof-of-Concept environment when implementing QlikView 11. The result is a more flexible and affordable method of sizing for QlikView.

A Challenge and a Solution

Hardware sizing for new customers can be difficult for many reasons, the most common being hardware availability and time constraints. Moreover, attempting to procure hardware prior to having any concrete results may not meet company policy. With an AEC2 instance, there is an opportunity to rent a server in the cloud to perform the sizing exercises, as well as to use it for proof of concept, test, development and production QlikView deployments.
Hardware Setup

This investigation conducts various tests on two systems:

1 Amazon EC2 instance
1 physical, dedicated machine

Each server performs the same units of work in order to identify comparable performance results. The tests were implemented using the QV Scalability Tool available in the QlikCommunity.

NOTE: Amazon has recently launched a new EC2 instance called High Memory Cluster Eight Extra Large Instance. According to its specifications, it can be compared to a QlikTech whitelisted two-CPU-socket server, the IBM x3650 M4.

The following two server configurations were used:

                    Physical Server                     Amazon Instance
Server              IBM x3650 M4                        High Memory Cluster Eight Extra Large Instance
CPU                 2x Intel E5-2670                    2x Intel E5-2670 (88 EC2 Compute Units)
                    Clock speed: 2.6 GHz                Clock speed: 2.6 GHz
                    Max Turbo Boost: 3.3 GHz            Max Turbo Boost: 3.3 GHz
RAM                 192 GB                              244 GB
Storage             15k SAS Hard Drive                  SSD Instance Storage
Operating System    Windows 2008 R2 Enterprise          Windows 2008 R2 Datacenter
Location            Lund, Sweden                        Europe

Load Testing Client: QlikView Scalability Tool, using Apache JMeter (located in Lund, Sweden)

Table 1: Amazon EC2 & physical server hardware specifications
The Testing Data Model and Application

In all testing scenarios, the load testing client is used to access a Sales Dashboard application which allows for a variety of in-depth business discovery. The data model consists of either a 500 million row fact table or a subset of that table containing 50 million rows. The fact tables were defined with 25 fields of high cardinality (95% unique data). In addition, five more dimension tables were used to complete the data model and give the application more context.

Our tests utilized real-world production QlikView apps, containing multiple tabs and dozens of sheet objects, simulating users answering their own streams of questions. The simulation program executed actions such as applying selections, lassoing data, opening tabs, applying what-if analysis and employing different aggregations, simulating most discovery activities that a business user would perform.

NOTE: For detailed information and larger screenshots of the actual QlikView application used during these tests, please refer to page 7 of the QlikView Scalability Benchmark White Paper, under the QlikView Apps section.

Figure 1: Test QlikView application data model
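To make the cardinality figure above concrete, the following sketch generates a small synthetic fact table with the same shape: 25 fields in which roughly 95% of the values per field are unique. The field names, value format and the scaled-down row count are hypothetical placeholders for illustration; the paper's actual test data was not published.

```python
import random

NUM_FIELDS = 25      # as in the test fact table
NUM_ROWS = 1000      # scaled down from the 50M/500M rows used in the tests
UNIQUE_RATIO = 0.95  # ~95% unique data per field, as stated above

def make_field_values(n_rows, unique_ratio, rng):
    """Return n_rows values of which exactly unique_ratio are distinct."""
    n_unique = int(n_rows * unique_ratio)
    pool = rng.sample(range(10**9), n_unique)          # n_unique distinct values
    values = pool + rng.choices(pool, k=n_rows - n_unique)  # pad with repeats
    rng.shuffle(values)
    return values

rng = random.Random(0)
fact_table = {
    f"Field{i:02d}": make_field_values(NUM_ROWS, UNIQUE_RATIO, rng)
    for i in range(1, NUM_FIELDS + 1)
}

distinct = len(set(fact_table["Field01"]))
print(f"{len(fact_table)} fields, {NUM_ROWS} rows, "
      f"{distinct / NUM_ROWS:.0%} distinct values in Field01")
```

High cardinality matters here because QlikView's columnar storage compresses repeated values well; a 95%-unique data set is close to the worst case for both RAM and calculation cost, which is what a sizing exercise should stress.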
Figure 2: Dashboard
Figure 3: Profitability
Figure 4: Customer
Figure 5: Products
Figure 6: What-If Analysis
Figure 7: Order Detail
User Behavior

In order to emulate real-life usage, the tests simulated two user types:

Dashboard user - Users who have an average think-time of around 30 seconds in between selections. A session is typically active for 7-8 minutes.

Analyst user - Users who have an average think-time of around 20 seconds in between selections. A session is typically active for 10-12 minutes.

Testing Scenarios

Results were collected using three types of testing scenarios. The following scenarios were used to compare the similarities and differences between the AEC2 instance and a dedicated physical server:

Saturation / Soak test - a scenario designed to measure the following where more processing capacity was required than available:
  Throughput (number of clicks / selections per minute)
  Response time (seconds)
  CPU utilization (percentage)
  RAM consumption (GB)

Concurrency test - a scenario designed to measure the following with a high concurrency of users:
  Throughput (number of clicks / selections per minute)
  Response time (seconds)
  CPU utilization (percentage)
  RAM consumption (GB)

Network Connection / Latency test - a scenario designed to estimate differences in Round Trip Time (RTT) and bandwidth between the client and the QlikView Server. (See Notes and Considerations.)

Graph Key: Light green represents the Amazon EC2 instance. Blue represents the dedicated physical server.
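The two user profiles above can be expressed as a small simulation model. The sketch below is not the QlikView Scalability Tool itself (which drives real sessions through Apache JMeter); it only illustrates how the stated think-times and session lengths translate into selections per session. The jitter applied to the think-time is an assumption for illustration.

```python
import random

# User profiles as described above: average think-time between selections
# and the typical active session length, per user type.
USER_TYPES = {
    "dashboard": {"think_time_s": 30, "session_s": (7 * 60, 8 * 60)},
    "analyst":   {"think_time_s": 20, "session_s": (10 * 60, 12 * 60)},
}

def simulate_session(user_type, rng):
    """Return the number of selections one simulated user makes in a session."""
    profile = USER_TYPES[user_type]
    session_length = rng.uniform(*profile["session_s"])
    elapsed, clicks = 0.0, 0
    while True:
        # Jitter each think-time +/-25% around the stated average (assumption).
        think = profile["think_time_s"] * rng.uniform(0.75, 1.25)
        if elapsed + think > session_length:
            return clicks
        elapsed += think
        clicks += 1

rng = random.Random(42)
# The saturation test mix: 21 dashboard users and 9 analyst users.
total_clicks = sum(simulate_session("dashboard", rng) for _ in range(21)) \
             + sum(simulate_session("analyst", rng) for _ in range(9))
print("Selections in one pass of 30 sessions:", total_clicks)
```

A model like this gives a rough upper bound on demand; under saturation the measured throughput falls below it because server response time adds to each think-wait cycle.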
SATURATION / SOAK TEST

The saturation test, also known as soak testing, involves taxing a system with a significant load over a substantial period of time to discover how the system behaves under sustained use. For this test, a 500 million row data set was used along with the QlikView application previously described. A total of 30 concurrent users (21 Dashboard users and 9 Analyst users) was simulated. The server was forced to peak at 100% CPU at a frequency that prevented it from recovering between peaks, meaning that average CPU stayed above 60%; hence saturation.

The table and graphs below depict the results of the corresponding metrics, showing a similar pattern for resource utilization. The Amazon instance handles a lower number of requests but utilizes more CPU on average than the physical server. The test concludes that this particular AEC2 instance type performed similarly enough to the physical server to be used for estimation.

Saturation Test (Average) - Data: 500M Rows, 30 Concurrent Users

Server             Throughput (clicks / min)   Response Time (seconds)   CPU Utilization (%)   Total RAM (GB)
Amazon EC2         38                          4.9                       73                    117
Physical Machine   42                          2.5                       62                    130

Table 2: Saturation Test Results
Figure 8: Saturation Test - Throughput (clicks / min)
Figure 9: Saturation Test - Response time (average)
Figure 10: Saturation Test - CPU Utilization (average)
Figure 11: Saturation Test - RAM Utilization (GB) over time
Figure 12: Saturation Test - RAM Utilization (GB) over actions
Concurrency

Concurrency testing involves the execution of concurrent activities from multiple simulated users. The concurrency test used a 50 million row data set with 90 concurrent users (63 dashboard and 27 analyst users). This setup allows for more concurrent users since neither the RAM nor the CPU is saturated.

The table and graphs below depict the results of the corresponding metrics, showing a similar pattern for resource utilization. The Amazon instance handles a lower number of requests but utilizes slightly more CPU on average than the physical server. The test concludes that the AEC2 instance performed similarly to the physical server and can be used for estimation. It is important to note that response times varied somewhat during this test, due to network bandwidth and latency when accessing the specified systems, and this should be considered when gauging results. (See the section marked Notes and Considerations for additional information.)

Concurrency Test (Average) - Data: 50M Rows, 90 Concurrent Users

Server             Throughput (selections / min)   Response Time (seconds)   CPU Utilization (%)   Total RAM (GB)
Amazon EC2         143                             1.4                       18                    54
Physical Machine   148                             0.4                       12                    56

Table 3: Concurrency Test Results
Figure 13: Concurrency Test - Throughput per minute (number of clicks / selections)
Figure 14: Concurrency Test - Average response time (latency is a considerable factor here)
Figure 15: Concurrency Test - CPU Utilization
Figure 16: Concurrency Test - RAM Utilization
Figure 17: Concurrency Test - RAM Utilization (GB) over actions
Conclusion

Our tests conclude that an Amazon EC2 instance can be used to approximate computing power and resource utilization when sizing physical dedicated hardware for QlikView. RAM consumption is almost identical for a given application when considering how many clicks have been served. CPU consumption is higher for an Amazon instance, due to an intermediate virtualization layer. This means that Amazon will provide an upper bound on expected CPU usage for a corresponding physical machine. (See the section Computational Efficiency for more information on this topic.) With that taken into consideration, it is fair to state that a QlikView deployment that runs well in an Amazon instance will perform better on a physical dedicated server.

An Amazon EC2 instance proves to be a great alternative for sizing and capacity planning exercises when preparing a scalable Business Intelligence solution.
Notes and Considerations

UPLOADING YOUR DATA TO AMAZON

Uploading data to an Amazon instance is not a trivial task. Uploading through Remote Desktop (RDP) typically yields 200-300 KB per second, which amounts to about one hour per gigabyte of data. Amazon is aware of this and has created a service where you can ship your data to them. You can read more about this here: http://aws.amazon.com/importexport/

REPRODUCIBILITY

These tests, including the applications and the scripts, are available from the QlikView Scalability Center. The Scalability Center can be reached via email at ScalabilityLab@qliktech.com.

OVERALL APPLICATION PERFORMANCE

These tests and their user scenarios were set up specifically to avoid using the highly efficient QlikView cache. (The QlikView Server was forced, through unique selections, to calculate queries that had not previously been calculated.) This was necessary in order to compare the computational performance of the two setups. Modifying the user selections to reduce the uniqueness would allow for more concurrent users and significantly lower average response times.

The resource usage (RAM and CPU) can be measured and used as an approximation for a corresponding on-premise physical server. If the desired performance is achieved with the Amazon instance, then the physical server can be a good recommendation for QlikView in a production environment. Actual user-perceived response times may be hard to estimate due to latency/bandwidth dependencies as well as the additional latency caused by the virtualization layer.

The QlikView Server min working-set setting, found in the Management Console, can be changed during these tests to detect the required level of RAM. Too little RAM typically leads to increased CPU utilization as well as longer response times.

The following Scalability Tools are recommended when simulating QlikView usage during sizing activities: http://community.qlikview.com/docs/doc-2705.
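The transfer-time estimate under Uploading Your Data to Amazon is easy to verify with a little arithmetic; the sketch below assumes a binary gigabyte (1024^3 bytes) and the RDP transfer rates quoted above.

```python
# Check of the "about one hour per gigabyte" estimate at 200-300 KB/s over RDP.
GIB = 1024 ** 3  # bytes per gigabyte (binary)

for rate_kb_s in (200, 300):
    seconds = GIB / (rate_kb_s * 1024)
    print(f"{rate_kb_s} KB/s -> {seconds / 3600:.1f} h per GB")
```

At 300 KB/s a gigabyte takes just under an hour, and at 200 KB/s roughly an hour and a half, which is why Amazon's physical import service becomes attractive for the multi-gigabyte source data behind a 500 million row application.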
COMPUTATIONAL EFFICIENCY

Dividing consumed CPU clock cycles by throughput (number of clicks per minute) gives the average cost per action. The test results show that Amazon consumes more clock cycles than the physical machine for serving the same number of actions. Calculating this measure for the test executions in this paper shows that the physical server consumed around 35% fewer clock cycles per action for the concurrency test and around 23% fewer clock cycles for the saturation test. Other tests have shown a difference in efficiency ranging from 5-35% in favor of the physical server in comparison to the tested Amazon instance. This further supports our finding that this specific AEC2 instance can be used as an approximation of the physical server.

NETWORK CONNECTION / LATENCY TEST

A network connection test was performed in order to understand how connection speed impacts the overall testing results. We conducted a simple test that measured the elapsed time when downloading single files of various sizes through a web browser. Table 4 presents the results:

Network Connection / Latency Test

File Size   Amazon Instance   Physical Server
24 KB       280 ms            47 ms
240 KB      452 ms            62 ms
1.4 MB      670 ms            140 ms
14 MB       2.8 s             1.3 s

Table 4: Single file download

The data in Table 4 shows the Amazon instance to be at a disadvantage when compared to the dedicated physical server. This is because the physical server is connected to the same network switch as the client, whereas with the Amazon EC2 instance, client traffic flows through various points prior to reaching the Amazon instance, resulting in longer data transfer times. This factor contributes to the differences in response times for the Saturation and Concurrency tests. The communication from a QlikView AJAX client will typically consist of many small and a few large downloads during a session.
If a penalty is added to each of these individual requests, then the overall performance will degrade. This is one of the reasons why Amazon will not provide a good estimate of user-perceived response times when moving to a physical machine. Amazon sizing should only be performed for estimating resource consumption (i.e. RAM and CPU).
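The clock-cycle comparison described under Computational Efficiency can be approximated directly from Tables 2 and 3: since both machines use the same CPU model at the same clock speed, cost per action is proportional to average CPU utilization divided by throughput, and absolute clock cycles cancel out of the ratio.

```python
# Cost per action from the published test results:
# (throughput in clicks/min, average CPU utilization in %).
tests = {
    "saturation":  {"amazon": (38, 73), "physical": (42, 62)},   # Table 2
    "concurrency": {"amazon": (143, 18), "physical": (148, 12)}, # Table 3
}

for name, results in tests.items():
    cost = {srv: cpu / clicks for srv, (clicks, cpu) in results.items()}
    savings = 1 - cost["physical"] / cost["amazon"]
    print(f"{name}: physical server uses {savings:.0%} fewer cycles per action")
```

This reproduces the ~23% saturation figure exactly; the concurrency figure comes out near 36% rather than the ~35% quoted, a small discrepancy attributable to rounding in the published table values.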
Reference

QlikView Scalability Tool: http://community.qlikview.com/docs/doc-2705
QlikView Scalability Benchmark White Paper: http://www.qlikview.com/us/explore/resources/whitepapers/qlikview-scalability-benchmark
QlikView Server Memory Management and CPU Utilization Technical Brief: http://www.qlikview.com/us/explore/resources/technical-briefs?language=english
QlikView Server Linear Scaling Technical Brief: http://www.qlikview.com/us/explore/resources/technical-briefs?language=english
Scaling Up vs. Scaling Out in a QlikView Environment Technical Brief: http://www.qlikview.com/us/explore/resources/technical-briefs?language=english
QlikView Architecture and System Resource Usage Technical Brief: http://www.qlikview.com/us/explore/resources/technical-briefs?language=english
QlikView Scalability Overview Technology White Paper: http://www.qlikview.com/us/explore/resources/whitepapers/qlikview-scalability-overview

© 2013 QlikTech International AB. All rights reserved. QlikTech, QlikView, Qlik, Q, Simplifying Analysis for Everyone, Power of Simplicity, New Rules, The Uncontrollable Smile and other QlikTech products and services as well as their respective logos are trademarks or registered trademarks of QlikTech International AB. All other company names, products and services used herein are trademarks or registered trademarks of their respective owners. The information published herein is subject to change without notice. This publication is for informational purposes only, without representation or warranty of any kind, and QlikTech shall not be liable for errors or omissions with respect to this publication. The only warranties for QlikTech products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting any additional warranty.