Extrapolation Tool for Load Testing Results

Subhasri Duttagupta, Rajesh Mansharamani
Performance Engineering Lab, Tata Consultancy Services, Mumbai, India
subhasri.duttagupta@tcs.com, rajesh.mansharamani@tcs.com

Abstract

Load testing of IT applications is fraught with challenges: time to market, quality of results, the high cost of commercial tools, and the difficulty of accurately representing production-like scenarios. It would help IT projects to be able to test with a small number of users and extrapolate to scenarios with a much larger number of users. This in turn would cut down cycle times and costs and allow for a variety of extrapolations closer to production. We present a simple extrapolation technique based on statistical empirical modeling, which we have found to be more than 90% accurate across a range of applications running on a number of hardware servers. The technique has currently been validated for scenarios where the hardware is the bottleneck and is extensible to a wider range of scenarios as well.

Keywords: extrapolation; load testing; S-curves; regression

I. INTRODUCTION

Complex IT applications today need to scale to thousands of concurrent users. Their performance scalability is usually assessed through load testing, which is the process of subjecting a system to a desired work level. A typical IT application comprises multiple components with a multi-tiered architecture and is deployed in a complex distributed environment. Before the application is deployed on the production server, the application owners would like answers to the following questions through load testing:

1. What hardware and software resources are needed to guarantee that system performance meets the service level agreements (SLAs) under the given workloads?
2. What is the maximum load level that the system will be able to handle?
3. What would be the average response time, throughput and resource utilization under the expected workload?
4. What are the bottlenecks of the system?
To obtain a qualitative idea of how well a system functions in the "real world", it is desirable to perform load testing in a production-like environment. This calls for significant investment in a load test environment and load testing tools. Moreover, significant time needs to be invested to test and tune the application and meet the SLAs. We therefore explore an extrapolation strategy that uses the metrics obtained through actual load testing and is capable of extrapolating system performance metrics to a scenario other than the testing environment.

Actual system performance depends on many factors, such as the number of concurrent users, deployment architecture, workload characteristics, technology configurations and background load. Hence, extrapolation of system performance can be considered in many dimensions. This paper, however, focuses on the problem of extrapolating system throughput from smaller to larger numbers of concurrent users, referred to as the load of the system. Though there have been earlier attempts to address the problem using performance models such as simulation models [3], [13] or analytical models [1], [6], [9], most of these models deal with specific application benchmarks and are validated against one specific hardware configuration. We propose a generic extrapolation tool that is validated against a number of applications and is extensible to multiple hardware configurations.

The contributions of the paper are as follows. The proposed extrapolation technique requires load testing results for only a small set of points (e.g., 5 4 users) and is able to extrapolate throughput for more than 8 users, thus reducing the load test time drastically. The ingredients of our solution are simple mathematical tools such as linear regression and the statistical S-curve.
Thus, using two previously known techniques, we propose a novel extrapolation technique that provides high accuracy for a number of sample applications. The proposed technique does not require a modeling background or knowledge of complex mathematical theory. Further, the proposed technique is able to extrapolate throughput and response time for any number of users even with a 15-20% error in the estimated maximum throughput.

The paper is organized as follows. Section 2 deals with related work and Section 3 formulates the specific problem of extrapolation. Section 4 discusses the load testing setup used for testing various applications. Section 5 discusses techniques of extrapolation, followed by the derivation of the maximum throughput bound in Section 6. Section 7 describes the sample applications used, and the paper is concluded in Section 8.

II. RELATED WORK

Discrete-event simulation modeling [3], [11], [13] is an alternative scientific methodology for extrapolating from the test environment to the production configuration. But this involves careful analysis of each component of the infrastructure and representing it accurately in the queuing model while implementing the business function flow through the system. Analytical models based on various queuing networks can be a cost-effective alternative to simulation models, but these models are built for specific applications. In [9], the authors propose a non-state-space queuing network model for a specific J2EE application. The authors of [6] demonstrate how model building along with load testing information can help in making an application ready for deployment. In all these cases, however, model building requires knowledge of the application, whereas in our strategy an application can be treated as a black box and only the load testing results are required for extrapolation.

The challenge of performing load testing in a production-like environment can be addressed by performing it on the public cloud. Silk Performer Cloudburst [14] enables large load tests from multiple global points of reference using enterprise cloud services. Performance Engineering Associates (PEA) [11] provide methodologies that can be used to model application workloads and to predict performance when a server is upgraded.

III. PROBLEM OF EXTRAPOLATION

The paper considers load testing of an IT application that is accessed by N users, as shown in Figure 1. It is assumed that users submit requests and wait for responses. The average response time of a request is denoted by the symbol R. A user typically spends time entering the details of a request or reviewing the responses; the time that a user spends outside of waiting for a response is referred to as think time. The average think time of a user is denoted by the symbol Z. The number of requests per unit of time (usually seconds) is the throughput of the system and is denoted by the symbol X. Both X and R are functions of N.

Figure 1. IT Application with N Users

The problem of extrapolation can then be defined as follows: given the actual throughput X and response time R of the system for a number of users up to M in a specific deployment scenario, the extrapolation technique must provide an estimate of the performance of the system for a larger number of users. The difference between the predicted throughput and the actual throughput for a given number of users is referred to as the prediction error. The goal is to minimize the prediction error for all numbers of users, especially large ones. In this paper, virtual users in load testing are referred to as users.

IV. LOAD TESTING SETUP

We perform load testing on various applications. All load testing is done with Apache Tomcat 6.0 as the application server and MySQL 5.5 as the database server, which is hosted on a machine separate from the application server. Load testing is done using FASTEST [2], a framework for automated system performance testing based on The Grinder that provides a single report of load testing correlating different metrics. All the sample applications are tested with three server configurations, as given in Table I. These servers are categorized into high-range, mid-range and low-range (small-range) servers based on the number of CPUs, available RAM and amount of disk space.

TABLE I. SERVER CATEGORIES FOR SAMPLE APPLICATIONS

Server Category | Features
High-range servers | 8-core CPU, 2.66 GHz Xeon with 1 MB L2 cache, 8 GB physical RAM
Mid-range servers | Quad-core AMD Opteron CPU, 2.19 GHz with 2 MB L2 cache, 4 GB RAM
Low-range servers | Intel Core Duo CPU, 2.33 GHz with 4 MB cache, 2 GB RAM

V. EXTRAPOLATION TECHNIQUES

Throughput is obtained using a load testing tool for low numbers of virtual users. These values are then used to extrapolate throughput for a larger number of users. For an application that is scalable, we expect the throughput to increase gradually until it reaches the maximum throughput that the system can offer. Below we discuss two alternative techniques for extrapolation.

A. Extrapolation using Linear Regression

Linear regression is useful in many practical applications for extending an approximately linear function to points close to existing data points. However, the technique may cause larger errors when predicting results farther from the existing data points. As throughputs are known for lower numbers of users, linear extrapolation is used here. Its advantages are computational simplicity and ease of application. Extrapolation works best for slow-growth areas and short time horizons, whereas uncertainty, or forecasting error, increases for long time horizons and short areas. Linear regression assumes that the past trend continues into the future and that all information about the data trend is embedded in the past and present data series. It does not take into account any external conditions or constraints, and it fails if, due to some condition, the past trend of the data series does not continue.

The throughput of a system is limited by either hardware or software bottlenecks. Before a system encounters any bottleneck, the throughput increases linearly with the number of concurrent users. In such a scenario, each additional user receives additional pages from the server, leading to an increase in the total throughput at a constant rate (linear increase). This indicates that linear extrapolation is an obvious choice for predicting the throughput of a system until the system encounters a bottleneck. This hypothesis is validated below.

Figure 2. Extrapolation using Linear Regression (x-axis: number of users; y-axis: throughput in pages/sec; series: Actual Test Result, Extrapolated Result (Linear Regression))

Figure 2 shows the result of extrapolation using linear regression, where the x-axis gives the number of users and the y-axis gives the throughput in pages/sec. It also shows the actual throughput obtained from load testing of a sample application from 1 users to 4 users. Load testing results up to 4 users (M=4) are used by linear regression. In Figure 2, we observe that the predicted throughput is very close to the actual throughput until the number of users reaches 2. As the throughput approaches the upper bound, its rate of increase falls until it drops to zero when the throughput actually reaches the upper bound.
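The linear extrapolation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' tool: the function names and the measurement values are hypothetical, and the prediction error is expressed as a percentage of the actual throughput, which is an assumption about how the error is normalized.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_linear(users, throughput, target_users):
    """Fit throughput against user count and evaluate the line at new loads."""
    slope, intercept = fit_line(users, throughput)
    return [slope * n + intercept for n in target_users]


def prediction_error_pct(predicted, actual):
    """Prediction error as a percentage of the actual throughput (assumed normalization)."""
    return abs(predicted - actual) / actual * 100.0


# Hypothetical load-test measurements (users, pages/sec); not taken from the paper.
measured_users = [100, 200, 300, 400]
measured_tput = [14.0, 28.5, 42.0, 56.5]
targets = [500, 1000, 2000]

for n, x in zip(targets, extrapolate_linear(measured_users, measured_tput, targets)):
    print(f"N={n}: predicted throughput {x:.1f} pages/sec")
```

Because the fitted line keeps growing at a constant rate, such a sketch will overshoot once the real system starts to saturate.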
However, the throughput extrapolated using regression is not able to reflect this flattening near the upper bound. Consequently, beyond 2 users, the prediction error is high (> 1%).

B. Extrapolation using S-Curves

Mathematical S-curves, for example logistic curves, are sigmoid functions with the shape of the letter S. These curves are used to estimate or forecast the rate of adoption of a technology. An S-curve correctly represents the rate at which the performance of a technology improves or the market penetration of a product happens over time. Implicit in the S-curve are assumptions of slow initial growth, subsequent rapid growth, and then declining growth as product penetration reaches a saturation level. S-curves are also used in project management as a means of representing the expenditure of resources over the projected time of a project. The characteristic of initial increase followed by saturation makes the S-curve a natural choice for extrapolation of throughput before the saturation level. If the number of users for load testing is N, then the following formula represents the throughput X using an S-curve:

$X = \dfrac{X_{max}}{1 + a\,\exp(-bN)}$   (1)

Here $X_{max}$ gives the maximum throughput a system can achieve, and the constants a and b are estimated through standard linear estimation using the set of initial throughput values from the load testing tool.

Figure 3 shows the throughput obtained from extrapolation using the S-curve. This technique uses the actual throughput from 1 users to 4 users and predicts the throughput for the remaining 5 users to 4 users. The maximum throughput is taken as 575 pages/sec and is derived from service demand as outlined in Section 6. It can be observed that the S-curve has a steep rate of increase: from 5 users to 13 users, the throughput increases from 14 pages/sec to 541 pages/sec, thus reaching close to the maximum throughput.

Figure 3. Extrapolation using S-Curve (series: Actual Test Result, Extrapolated Result (S curve))

Thus, the S-curve incurs high error for lower loads, but the predicted throughput is close to the actual throughput when the throughput saturates at higher loads.

C. Extrapolation using Mixed Mode

This strategy makes use of the above two strategies, namely extrapolation using linear regression and using the S-curve. As the regression method performs better for smaller numbers of users, it should be used initially to predict the throughput. We propose that it be used until the predicted throughput reaches a certain threshold ($X_{th}$). This threshold indicates the load beyond which the rate of growth of throughput declines. Beyond this point, as linear regression gives larger error, the S-curve is used to predict the throughput. The resulting extrapolation, which uses both techniques, is referred to as mixed mode regression. Figure 4 shows the performance of all three techniques, and it can be observed that extrapolation using mixed mode outperforms the other two techniques. The mixed mode technique combines the benefits of the other techniques and incurs smaller prediction error for any number of users. However, two issues remain: a) finding an estimate for the maximum throughput $X_{max}$, which is addressed in the next section; and b) deciding upon a suitable value for $X_{th}$ in a specific test scenario.

Figure 4. Extrapolation using Mixed Mode

Figure 6 shows the actual response time from load testing and the response time using mixed mode. It can be observed that the estimated values are very close to the actual test results. Thus, mixed mode provides an approximation to both throughput and response time with high accuracy.

In order to use the mixed mode, linear regression is initially used to obtain throughput from 5 users to 15 users. For N=15 users, throughput is 313, which is more than 50% of the maximum throughput. In our test scenarios, various values of $X_{th}$ as a percentage of $X_{max}$ were tried out, and we observe that if the throughput is greater than 50% of $X_{max}$, then using the S-curve for extrapolation provides low prediction error. Hence, $X_{th}$ is taken as 313 (more than 50% of $X_{max}$). The value of N for which this occurs is referred to as $N_{th}$. The parameters (a and b in Equation (1)) of the S-curve are estimated using throughput values for N=11 to N=15. Then, extrapolation is done using the S-curve from 16 users to 4 users. The resulting mixed mode technique is able to estimate throughput for 4 users within 5% of the actual throughput. The flowchart for mixed mode is presented in Figure 5.

Figure 5. Flow Chart for Mixed Mode:
1. Initialize load testing results as ($N_i$, $X_i$), i = 1, ..., 5.
2. Estimate a linear regression using the set of ($N_i$, $X_i$) and extrapolate for higher values of N.
3. Is X > 50% of $X_{max}$? If no, continue the linear extrapolation for higher N. If yes, $X_{th}$ is reached; assign $N_{th}$ = N.
4. Estimate an S-curve using five pairs of ($N_j$, $X_j$) from linear regression corresponding to $N_{th}$, $N_{th-1}$, $N_{th-2}$, $N_{th-3}$ and $N_{th-4}$.
5. Extrapolate using the S-curve for larger N such that X reaches close to $X_{max}$.
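The mixed mode procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the S-curve constants are estimated by linearizing Equation (1) as ln(X_max/X - 1) = ln(a) - bN and fitting a straight line, which is one reading of "standard linear estimation"; the fixed user increment `step`, the function names and the sample data are all hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def s_curve(n, x_max, a, b):
    """Equation (1): X = X_max / (1 + a * exp(-b * N))."""
    return x_max / (1.0 + a * math.exp(-b * n))


def fit_s_curve(users, throughput, x_max):
    """Estimate a and b by a linear fit of ln(X_max / X - 1) against N."""
    if any(x <= 0 or x >= x_max for x in throughput):
        raise ValueError("throughput values must lie strictly between 0 and x_max")
    ys = [math.log(x_max / x - 1.0) for x in throughput]
    slope, intercept = fit_line(users, ys)
    return math.exp(intercept), -slope  # a, b


def mixed_mode(users, throughput, x_max, targets, step=100, threshold=0.5):
    """Linear regression up to N_th, S-curve beyond it (assumed reading of Figure 5)."""
    slope, intercept = fit_line(users, throughput)
    if slope <= 0:
        raise ValueError("measured throughput must grow with load")

    def linear(n):
        return slope * n + intercept

    # Raise the load in fixed steps until the linear prediction exceeds X_th.
    n_th = users[-1]
    while linear(n_th) <= threshold * x_max:
        n_th += step
    # Five linear-regression points ending at N_th feed the S-curve fit.
    anchor_users = [n_th - k * step for k in range(4, -1, -1)]
    a, b = fit_s_curve(anchor_users, [linear(n) for n in anchor_users], x_max)
    return [linear(n) if n <= n_th else s_curve(n, x_max, a, b) for n in targets]


# Hypothetical measurements and maximum throughput; not taken from the paper.
measured_users = [100, 200, 300, 400, 500]
measured_tput = [20.0, 40.0, 60.0, 80.0, 100.0]
for n, x in zip([1000, 1500, 4000],
                mixed_mode(measured_users, measured_tput, 575.0, [1000, 1500, 4000])):
    print(f"N={n}: predicted throughput {x:.1f} pages/sec")
```

The threshold of 0.5 mirrors the 50% of $X_{max}$ used in the text; other values can be passed to see how sensitive the switch-over point is.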

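The text does not state how response time is obtained from the extrapolated throughput. One standard option, assumed here, is the interactive response time law for a closed system with N users and think time Z, R = N / X - Z, using the symbols of Section III. The function name and the sample values below are hypothetical.

```python
def response_time(n_users, throughput, think_time):
    """Interactive response time law: R = N / X - Z (all times in seconds)."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return n_users / throughput - think_time


# Hypothetical values: 1000 users, 125 pages/sec predicted, 6.0 s think time.
print(f"R = {response_time(1000, 125.0, 6.0):.2f} sec")
```

Feeding the mixed mode throughput for each N into this relation yields a response time curve of the kind shown in Figure 6, provided the law applies to the system under test.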
Figure 6. Response time using Mixed Mode (y-axis: response time in sec; series: Actual Test Result, Mixed Mode)

VI. ASYMPTOTIC BOUNDS

The accuracy of the extrapolation scheme depends on correctly estimating the maximum throughput of an application running on a specific hardware configuration. In this section, only an approximate bound on the maximum throughput is obtained. The underlying principle is based on queuing theory, and the bound is derived from the service demands of all the available resources. In the absence of any software bottlenecks, we identify four main hardware resources, namely CPU, disk, network and memory, as shown in Figure 7.

Figure 7. Users receiving services from hardware resources.

While the throughput of an application is directly limited by the usage of CPU, disk and network, the maximum available memory limits the concurrent usage of these resources and thus limits the maximum throughput. Hence, memory usage is captured differently from the other three resources; this is discussed later in the description of one of the sample applications.

In order to compute service demand, a sample application script is run in single-user mode over a fixed duration. To normalize the differences among individual runs, the application script is executed for a number of iterations, and the usage statistics (in seconds) are then gathered over all these runs for loading a single page. For example, the script for a telecom application is run for 1 iterations and, using the atop utility of Linux, the following statistics are gathered:

CPU busy time: 27.2 sec
Disk busy time: 1.667 sec
Network busy time: 8.68 sec

The sample application script loads 13 pages. Hence, service demand is computed using the following formula:

$\text{Service demand} = \dfrac{\text{Resource busy time}}{I \times P}$

where I is the number of iterations and P is the number of pages the testing script accesses through URLs. To obtain the service demand for multi-core CPUs, it is further divided by the number of cores.
The resource with the maximum service demand determines the maximum throughput. In the above example, a 4-core CPU is used and the CPU has the maximum service demand, obtained by dividing the CPU busy time by I × P and by the number of cores. If the maximum service demand is denoted by $S_{Dmax}$, and N users are used for load testing with think time Z, then the maximum throughput satisfies the following formula:

$X_{max} \le \min\left(\dfrac{N}{\sum S_D + Z},\ \dfrac{1}{S_{Dmax}}\right)$

where $X_{max}$ gives the maximum throughput that an application can achieve for N users and $\sum S_D$ is the sum of the service demands of all the hardware resources. The first term limits $X_{max}$ at lighter load and the second term limits it at higher load. In the above example, Z is taken as 6.0 sec and the maximum throughput for N = 6 is about 956 pages/sec. This throughput is obtained from the second term.

The sensitivity of the extrapolation technique to the estimated maximum throughput is analyzed, and it is observed that even if $X_{max}$ is not estimated accurately, extrapolation using mixed mode is able to predict throughput with more than 90% accuracy in most scenarios. Figure 8 shows the throughput using mixed mode extrapolation for various values of $X_{max}$. The telecom application mentioned earlier has an actual maximum throughput of 752, and the proposed extrapolation technique is used for three values of $X_{max}$, namely 785, 860 and 950, which have 5%, 15% and 25% error relative to the actual $X_{max}$. For $X_{max}$ = 860, the throughput for N = 6 is 806; thus the error is about 7%.

In practice, the maximum throughput of an application is only 90% of the estimated $X_{max}$. This is because once any of the resources is 90% busy, the response time of that resource increases, which in turn leads to an increase in the overall response time. A higher N then does not result in higher throughput. Hence, a correction factor of 0.9 is used for the estimated $X_{max}$ in practice.
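The service demand calculation and the two-term throughput bound can be sketched as below. This is an illustration under assumptions: the bound is taken to be the standard asymptotic bound min(N / (sum of demands + Z), 1 / maximum demand), the function names are hypothetical, and the busy times, iteration count and page count are made-up values rather than the paper's measurements.

```python
CORRECTION_FACTOR = 0.9  # practical correction applied to the estimated X_max


def service_demand(busy_time_sec, iterations, pages, cores=1):
    """Per-page service demand: busy time / (I * P), divided by the core count."""
    return busy_time_sec / (iterations * pages) / cores


def throughput_bound(n_users, think_time, demands):
    """Upper bound on throughput for N users with think time Z."""
    light_load = n_users / (sum(demands.values()) + think_time)
    heavy_load = 1.0 / max(demands.values())
    return min(light_load, heavy_load)


def crossover_load(think_time, demands):
    """Load at which the light-load and heavy-load terms are equal."""
    return (sum(demands.values()) + think_time) / max(demands.values())


# Hypothetical single-user run: 100 iterations of a 10-page script on 4 cores.
demands = {
    "cpu": service_demand(80.0, 100, 10, cores=4),
    "disk": service_demand(5.0, 100, 10),
    "network": service_demand(10.0, 100, 10),
}
for n in (100, 1000):
    bound = throughput_bound(n, 6.0, demands)
    print(f"N={n}: bound {bound:.1f}, corrected {CORRECTION_FACTOR * bound:.1f} pages/sec")
print(f"terms cross at about {crossover_load(6.0, demands):.0f} users")
```

Below the crossover load the think time dominates the bound; above it the busiest resource does.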

Figure 8. Sensitivity of Extrapolation with X max (y-axis: throughput in pages/sec; series: Actual Results, X(max) Error 5%, X(max) Error 15%, X(max) Error 25%)

A. Estimating the knee of the curve

The maximum throughput bound helps in obtaining the shape of the throughput curve. But it is also important to know the number of users at which the throughput curve starts to saturate. This specific load of the system is denoted by N* and identifies the knee of the throughput curve. Using the two bounds on maximum throughput, i.e., the bounds at light load and heavy load (as mentioned earlier), N* is obtained by equating these two bounds. Thus,

$N^* = \dfrac{\sum S_D + Z}{S_{Dmax}}$

where $\sum S_D$ is the sum of the service demands of all the hardware resources, Z is the think time and $S_{Dmax}$ is the maximum service demand. In the previous example, where the sum of the service demands is negligible compared with Z, N* is estimated as 6.0/0.001046 ≈ 5736. N* can also be obtained from the mixed mode throughput curve as it approaches $X_{max}$. In the next section, we discuss the applicability of mixed mode extrapolation to a few sample applications and demonstrate the practicality of arriving at N* using mixed mode regression.

VII. TESTED APPLICATIONS

The proposed strategy is tested with various applications ranging from lower to higher complexity. Below we give a short description of each application tested and the important observations in each case.

DellDVD Store Application [5]: The Dell DVD Store is an open source simulation of an online ecommerce site with implementations in Microsoft SQL Server, Oracle and MySQL, along with driver programs and web applications. This application has very low service demand for hardware resources on all the platforms. Its maximum throughput is bounded by the first term in the expression for $X_{max}$, which is applicable in low-load situations and is governed mainly by the think time Z. In this case Z is taken as 3.6 seconds. In Figure 9, it can be observed that the throughput increases linearly with the load on the system and has not reached the saturation level at N=8. Saturation is expected to occur at a higher load, close to N=12.

Figure 9. Extrapolation in DellDVD application (series: Actual Results, Mixed Mode)

iBATIS JPetStore [7]: This is an ecommerce J2EE benchmark application. The basis of JPetStore is an online application where users can browse and search for various types of pets in five top-level categories. It displays details including prices, inventory and images for all items within each category and, with authenticated login, provides a full shopping cart facility that includes a credit card option for billing and shipping. The throughput of this application is shown earlier in Figure 4, when it is run on a low-range server. For this application, disk is the resource with the maximum service demand on all the tested platforms. For a mid-range server, mixed mode uses $X_{max}$ as 593 based on the disk service demand, and throughput is expected to saturate at 35 users. Using actual load testing, the maximum throughput obtained is 574, and it occurs at 3 users.

Telecom Reporting Application: This is a reporting application on mobile usage with a star schema comprising one big fact table and six dimension tables. Typical reporting queries find the customers with maximum roaming usage, the best month of the year in terms of minutes of usage, etc. Figure 10 shows the throughput of the telecom application on a mid-range server using mixed mode regression. On mid-range and high-range servers, the network is the bottleneck and the throughput saturates at 1148 when the network is more than 90% busy, whereas on a low-range server the CPU becomes the bottleneck and throughput saturates at 752 for 6 users.

Figure 10. Extrapolation in Telecom application (y-axis: throughput in pages/sec)

equiz Application [8]: This provides a web-enabled technology platform to assess and verify the technical skills of people throughout a large software company in an automated fashion. The application is implemented with Java servlets and stored procedures and incorporates an automatic code evaluation (ACE) framework. The system is extensible to any domain requiring a finite set of technical skills. Figure 11 shows the throughput predicted using mixed mode regression when the equiz application is run on a high-range server. For this application, it is found that memory on the database server becomes the bottleneck. We use the following technique to derive $X_{max}$ for a given virtual memory limit.

Figure 11. Extrapolation in equiz application

Using load testing, throughput is observed at the application server and the virtual memory size is observed at the database server while varying the load from 1 to 5 users. Using these observations, a relationship is derived between the throughput of the application server and the corresponding virtual memory size at the database server, which in turn is used to derive the maximum throughput limit of 225 pages/sec for the virtual memory limit of 8G on a mid-range server. Figure 11 shows the actual throughput and the throughput extrapolated using our technique with a high-range server as the application server and a mid-range server as the database server. We observe that the highest throughput obtained is 213 pages/sec. The maximum number of users supported is 5, after which throughput reduces due to the memory constraint.

In summary, Table II lists the two estimated bounds and N* under the three server categories that we discussed. We also list the maximum throughput obtained in actual load testing and the maximum N (maxn) after which the throughput starts decreasing.
For the equiz application, memory being the bottleneck, it is not possible to come up with an estimate of N* as done before. However, we estimated it using the mixed mode regression curve.

VIII. CONCLUSIONS

Load testing of IT projects faces many challenges: the high cost of commercial load testing tools, the accuracy of load testing results, the infeasibility of mirroring a production-like test environment, etc. These projects can reduce the cost of load testing and the effort involved in making the product ready for launch, provided there is a tool for extrapolating the load testing results from a small number of users to various deployment scenarios.

In this paper, we propose a strategy for extrapolating load testing results from a small number of users to a large number of users. We describe two methods of extrapolation, using the statistical S-curve and using linear regression, and articulate their merits and demerits. Utilizing the merits of these two methods, we propose a combined technique, mixed mode regression, that is able to predict the throughput with high accuracy. This technique is useful for predicting throughput before any of the hardware resources is saturated and under the assumption that no software bottleneck is affecting the system throughput.

This technique can be extended to situations where the hardware configuration changes to reflect the production environment or where the usage pattern of the application by the end users changes. The concept of virtual load testing [12] is useful in this regard. Incorporating the tool into a capacity planning model could speed up the process of making an application ready for deployment, as identified in [4]. We plan to further extend the capability of the proposed extrapolation strategy using a suitable analytical model for the system.

TABLE II. ESTIMATED BOUNDS AND THEIR ACTUAL VALUES

Application | Server category | X max (est) | X max | N* (est) | maxn
Telecom | High-range | 136 | 143 | 7 | 6
Telecom | Mid-range | 1198 | 1148 | 7 | 65
Telecom | Small-range | 785 | 752 | 65 | 6
PetStore | High-range | 558 | 546 | 32 | 3
PetStore | Mid-range | 593 | 574 | 35 | 3
PetStore | Small-range | 575 | 571 | 35 | 35
equiz | High-range | 225 | 212 | 5 | 4
equiz | Mid-range | 225 | 212 | 45 | 4
equiz | Small-range | 115 | 16.6 | 2 | 2

REFERENCES

[1] A. M. Ahmed, An efficient performance extrapolation for queuing models in transient analysis, in Proceedings of the 37th Conference on Winter Simulation, 2005.
[2] A. Khanapurkar, S. Malan, and R. Mansharamani, A Framework for Automated System Performance Testing, in Proceedings of the Computer Measurement Group's Conference, 2010.
[3] H. Arsham, Performance extrapolation in discrete-event systems simulation, Int. Journal of Systems Science, vol. 27, no. 9, 1996, pp. 863-869.
[4] P. Cremonesi and G. Nardiello, How to integrate Load Testing results with Capacity Planning techniques, in Proceedings of the Computer Measurement Group's Conference, 2009.
[5] Dell DVD Store Database Test Suite, http://linux.dell.com/dvdstore.
[6] R. Gimarc, A. Spellmann, and J. Reynolds, Moving Beyond Test and Guess: Using modeling with load testing to improve web application readiness, in Proceedings of the Computer Measurement Group's Conference, 2004.
[7] JPetStore Application, http://sourceforge.net/projects/ibatisjpetstore/
[8] A. Khanapurkar and M. Nanda, Talent search Technology Platform, Computer Society of India, 45th National Annual Convention, 2010.
[9] S. Kounev and A. Buchmann, Performance modeling and evaluation of large-scale J2EE applications, in Proceedings of the Computer Measurement Group's Conference, 2003.
[10] E. Lazowska, J. Zahorjan, G. Graham, and K. Sevcik, Quantitative System Performance: Computer System Analysis Using Queueing Network Models, Prentice-Hall, 1984.
[11] Methodology Packs by Performance Engineering Associates, http://www.pea-online.com/
[12] N. J. Gunther, Guerrilla Capacity Planning, Springer-Verlag, Heidelberg, Germany, 2007.
[13] R. Y. Rubinstein, Sensitivity Analysis and Performance Extrapolation for Computer Simulation Models, Operations Research, vol. 37, 1989, pp. 72-81.
[14] SilkPerformer from Micro Focus Inc., http://liant.com/products/silk/silkperformer.aspx/