Performance Extrapolation for Load Testing Results of Mixture of Applications

Subhasri Duttagupta, Manoj Nambiar
Tata Innovation Labs, Performance Engineering Research Center
Tata Consultancy Services, Mumbai, India
subhasri.duttagupta@tcs.com, m.nambiar@tcs.com

Abstract: Load testing of IT applications faces the challenge of producing high-quality test results that represent performance in production-like scenarios, without incurring the high cost of commercial load testing tools. It would help IT projects to be able to test with a small number of users and extrapolate to scenarios with a much larger number of users. Such an extrapolation strategy, when applied to a mixture of application workloads running in a shared server environment, must take into consideration application characteristics (CPU/IO intensive, memory bound) as well as server capabilities. The goal is to predict the performance of the workload mixture, the maximum throughput offered by the application mix, and the maximum number of users the system can support before throughput starts degrading. In this paper, we propose an extrapolation strategy that analyses a workload mix based on its service demand on various resources and extrapolates its performance using simple empirical modeling techniques. Moreover, its ability to extrapolate the throughput of an application mixture even when the mixture changes can help in capacity planning of the system.

Keywords: extrapolation; load testing; S-curve; multiple classes of jobs; mixture of applications

I. INTRODUCTION

A complex multi-tiered IT application comprises multiple transactions with various characteristics and is deployed in a distributed environment. Before the application is launched, load testing is performed to ensure that the application meets its SLA.
Application performance depends on many aspects such as workload characteristics, the number of users in the system, background load, and server hardware configuration. First, test-environment results with a small number of users cannot be directly mapped to a production environment where the system load may be hundreds or thousands of times higher. It is also not feasible to create a production-like test environment because of the high cost involved, and load testing software with a limited number of virtual-user licenses adds to the problem. The second issue is accurately characterizing the production server workload. For critical enterprise applications, every production workload has distinct performance characteristics in storage access, processing power, and memory requirements that affect the scalability of the application. Moreover, in some organizations, different workloads frequently run side by side on the same hardware. In such a situation, it is the aggregate demand of the multiple classes of workloads running together, rather than the demand of any individual workload, that determines the server bottleneck. Besides, the application access pattern may shift while the system is in operation, changing the mixture of production workloads, which in turn may necessitate redoing the entire testing exercise. Thus, to estimate application performance accurately under production workloads with differing characteristics, we require an extrapolation strategy that takes load testing results for a certain workload mixture and allows us to systematically predict the performance for a larger workload of the same or a different mixture. This paper proposes such an extrapolation strategy, which does not require knowledge of the application's functionality but is able to predict the performance of the system for varied workloads.
The significant contributions of the paper are as follows:
- Given the throughput of the application and the utilization of various system resources from load tests with only a small number of users, the proposed technique is able to extrapolate throughput for a much larger number of users, thus reducing load test time drastically.
- The extrapolation strategy is applicable to mixed workload scenarios where individual workloads may have very different system resource requirements.
- The ingredients of our solution are simple mathematical tools, namely linear regression and the statistical S-curve. Using these two previously known techniques, the proposed solution is able to extrapolate application performance without any details of the system's functionality.
- The proposed solution is verified with a number of sample applications and on a number of server configurations.

The paper is organized as follows: Section 2 outlines related work and Section 3 formulates the specific problem of extrapolation. Section 4 introduces the basic extrapolation strategy and shows extrapolation results for a sample application. Section 5 discusses how the strategy can be applied to a multi-application workload. The paper is concluded in Section 6.

II. RELATED WORK

Two well-known approaches for extrapolating from a test environment to a production configuration are discrete-event simulation modeling and analytical modeling. Extrapolation using simulation modeling [3] involves representing each component of the infrastructure in the simulation and implementing the business function flow through the system. Analytical models based on various queuing networks [1], [8] can be cost-effective solutions. The authors in [6] demonstrate how model building, together with load testing information, can help in making an application ready for deployment. A hybrid methodology combining layered queuing networks and industry benchmarks is proposed in [5] for extrapolating performance measures of an application when hardware resources change. In all these cases, however, model building requires knowledge of the application, whereas in our strategy an application can be treated as a black box and only the load testing results are required for extrapolation. Besides, earlier extrapolation techniques were not tested with various applications, nor tried out on various server platforms.

III. PROBLEM FORMULATION

The paper considers load testing of an IT application that is accessed by N users as shown in Fig. 1. The IT system may comprise multiple applications, and these N users may access more than one application or different transactions of the same application. The mixture of users is known a priori, i.e., if there are two applications, the percentage of users accessing each application is known beforehand. In an IT system, users submit requests and wait for responses. The average response time of a request is denoted by R. A user typically spends time entering the details of a request or reviewing responses; the time a user spends outside of waiting for a response is referred to as think time. The average think time of a user is denoted by Z.
The number of requests served per unit of time (usually seconds) is the throughput of the system and is denoted by X. Both X and R are functions of N. The problem of extrapolation can then be defined as follows: Given the actual throughput X and response time R of the system for a small number of users in a specific deployment scenario, the extrapolation technique must provide an estimate of the performance of the system for a larger number of users. Given a certain mixture of users (workload mixture), the technique should also be able to provide the performance metrics for larger loads even if the workload mixture changes in the future. In this paper, we deal with mixtures of multiple applications, but the same strategy is applicable to complex business applications with multiple transaction types; the latter scenario is commonly referred to as multiple job classes.

Figure 1. Load testing of an IT system.

The extrapolation strategy assumes that the server configuration on which the applications run and on which the initial performance metrics are gathered remains unchanged for larger numbers of users. Thus, the performance extrapolation of a set of applications is performed only in terms of load.

IV. EXTRAPOLATION OF LOAD TESTING OF INDIVIDUAL APPLICATIONS

In our earlier work [7], we proposed the basic extrapolation strategy, which uses a combination of linear regression and the statistical S-curve and is capable of predicting the maximum throughput as well as the maximum number of users that can be supported by an application. In this section, this strategy is explained briefly and the main steps are exemplified using two sample applications. In this paper, our earlier strategy is extended to multiple applications, where the transactions performed by users of one application may differ significantly from those of another. The proposed performance extrapolation technique takes two sets of input:
1. Load testing results of the application for a small number of users. Throughput is required for at least four distinct user counts.
2. Utilization of four hardware resources (CPU, disk, network, and memory) gathered from all the servers while performing the load test.

A. Load Testing Setup

We perform load testing on various applications. All load testing is done with Apache Tomcat 6.0 as the application server and MySQL 5.5 as the database server, hosted on a machine separate from the application server. Load testing is done using FASTEST [2], a framework for automated system performance testing based on Grinder that provides a single load testing report correlating different metrics. Our proposed strategy is tested with various sample applications on three server configurations, as given in Table I. These servers are categorized into high-, mid- and low-range servers based on the number of CPUs, available RAM, and amount of disk space. The sample applications include iBATIS JPetStore [4], an e-commerce J2EE benchmark; a telecom reporting application on mobile

usage with a star schema; and e-Quiz, an online quizzing system used to identify and reward the best technical talent in a large IT company.

TABLE I. SERVER CATEGORIES FOR SAMPLE APPLICATIONS

Server Category     Features
High-range servers  8-core Intel Xeon, 2.66 GHz, 1 MB L2 cache, 8 GB RAM
Mid-range servers   Quad-core AMD Opteron, 2.19 GHz, 2 MB L2 cache, 4 GB RAM;
                    Quad-core Sun Fire V890, 1.5 GHz UltraSPARC IV+, 16 GB RAM
Low-range servers   Intel Core Duo, 2.33 GHz, 4 MB cache, 2 GB RAM

B. Linear Regression and S-curve

The throughput of a system is limited by either hardware or software bottlenecks. Before a system encounters any bottleneck, throughput increases linearly with the number of concurrent users, which makes linear extrapolation an obvious choice for predicting throughput. Fig. 2 shows the result of extrapolation using linear regression, where the x-axis gives the number of users and the y-axis gives the throughput in pages/sec. Fig. 2 also shows the actual load testing results of the JPetStore application from 100 to 400 users. The throughputs from 100 to 400 users are used by linear regression to extrapolate throughput up to 4000 users. We observe that the predicted throughput is highly accurate until the number of users reaches about 2000. As the throughput starts to saturate, its rate of increase drops, but the extrapolated throughput does not show this trend. This specific problem is addressed by an alternate technique, the statistical S-curve. Mathematical S-curves are sigmoid functions with the shape of the letter S. These curves are used to represent the rate at which the performance of a technology improves, or the market penetration of a product grows, over time.
Implicit in the S-curve are assumptions of slow initial growth, subsequent rapid growth, and then declining growth close to the saturation level. This characteristic of initial increase followed by saturation makes the S-curve a natural choice for extrapolating throughput. If the number of users in the load test is N, then the S-curve represents the throughput X as

X = X_max / [1 + a exp(-bN)]   (1)

Here X_max is the maximum throughput the system can achieve, and the constants a and b are estimated using the initial throughput values from the load testing tool. Fig. 2 also shows the throughput obtained from extrapolation using the S-curve. This technique uses the actual throughput from 100 to 400 users and predicts the throughput for the remaining 500 to 4000 users. The maximum throughput is taken as 595 pages/sec and is derived from service demand as outlined in the next section.

Figure 2. Extrapolation of throughput using various techniques.

It can be observed that the S-curve has the problem of a steep rate of increase: from 500 to 1300 users, the predicted throughput increases from 140 pages/sec to 541 pages/sec. Finally, we propose an alternate solution, referred to as Mixed mode, which uses a combination of linear regression and the S-curve. Since regression provides better accuracy for smaller numbers of users, it is used initially to predict the throughput until the predicted throughput reaches a certain threshold (X_th). This threshold indicates the load beyond which the rate of growth of throughput declines; beyond this point, the S-curve is used. Fig. 2 shows the performance of all three techniques, and it can be observed that extrapolation using Mixed mode outperforms the other two. The Mixed mode technique combines the benefits of both and incurs a small prediction error (less than 5%) for any number of users. Details of the algorithm can be found in [7].
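The Mixed mode scheme can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are ours, X_max is assumed to come from the service-demand analysis of the next section, and the switch threshold x_th (about half of X_max, as stated in the conclusions) is left as a parameter.

```python
import numpy as np

def fit_mixed_mode(n_obs, x_obs, x_max):
    """Fit both ingredients of the Mixed mode extrapolator from the
    initial load test points (user counts n_obs, throughputs x_obs)."""
    # Linear regression X ~ slope*N + intercept for the light-load region.
    slope, intercept = np.polyfit(n_obs, x_obs, 1)
    # S-curve X = x_max / (1 + a*exp(-b*N)); linearise as
    # ln(x_max/X - 1) = ln(a) - b*N and fit by least squares.
    y = np.log(x_max / np.asarray(x_obs, dtype=float) - 1.0)
    neg_b, ln_a = np.polyfit(n_obs, y, 1)
    return slope, intercept, np.exp(ln_a), -neg_b

def predict(n, slope, intercept, a, b, x_max, x_th):
    """Linear regression below the threshold x_th, S-curve above it."""
    linear = slope * n + intercept
    if linear < x_th:
        return linear
    return x_max / (1.0 + a * np.exp(-b * n))
```

Fitting on the initial measurements (e.g., 100 to 400 users) and calling predict for the remaining range reproduces the kind of curve shown in Fig. 2.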
Extrapolation using Mixed mode requires an estimate of the maximum throughput, which is discussed below.

C. Maximum Throughput Computation Using Service Demand

The objective is to estimate the maximum throughput an application can achieve in the multi-tiered environment used for load testing. This is done by calculating the service demand of the different resources. In a typical load testing scenario, the application and database may run on different servers, so the resource set includes the CPU, memory, disk, and network of all the machines involved. During load testing, a sample application script is run for a certain duration and the resource utilization on each of the servers is captured. At the beginning of the test, the number of virtual users is increased slowly until it reaches the desired number; this duration is referred to as the ramp-up period. Similarly, the number of users is reduced gradually before the end of the test until it drops to zero; this duration is referred to as the ramp-down period. For resource utilization, it is essential to exclude these two periods and include only the duration over which the number of users remains approximately constant.
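That filtering step can be sketched as follows, assuming per-interval samples of (active users, utilization); the sample format and function name are ours, for illustration only.

```python
def steady_state_utilization(samples, target_users, tolerance=0.05):
    """Average utilization over the plateau of the test only.

    samples      : list of (active_users, utilization) pairs, one per
                   measurement interval
    target_users : the desired (plateau) user count of the test
    tolerance    : how close to target_users an interval must be to
                   count as steady state; intervals outside this band
                   belong to ramp-up or ramp-down and are excluded
    """
    plateau = [u for (n, u) in samples
               if abs(n - target_users) <= tolerance * target_users]
    if not plateau:
        raise ValueError("no steady-state samples found")
    return sum(plateau) / len(plateau)
```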

If the average utilization of a specific resource r during the observed period is U_r and the average throughput obtained in the load test is X requests/sec, then the service demand of that resource is given by

D_r = U_r / X   (2)

For example, if the average utilization of the disk is 67% and the average throughput is 400 pages/sec, then the service demand of the disk is D_r = 0.67/400 s = 1.68 ms. Another technique to compute the service demand of a resource is outlined in [7], where a sample web application script is run in single-user mode over a fixed duration and resource usage statistics are gathered (in seconds) for loading a single page or performing a single transaction. The resource with the maximum service demand across all the servers is the one that saturates first as the number of users or transactions increases. If the maximum service demand is denoted by D_max and the sum of the service demands of all the hardware resources by D, then the maximum throughput X_max for N users and think time Z satisfies

X_max = min( N / (Z + D), 1 / D_max )   (3)

The first term limits throughput at lighter load and the second term at higher load. For the JPetStore application on a low-range server, the disk is the resource with the maximum service demand of 1.68 ms, and the maximum throughput from the second term of (3) is 1/0.00168 = 595 pages/sec.

Knee of the curve: The maximum throughput provides an upper bound on the throughput the application can deliver, but it is also important to know the number of users at which the throughput curve starts to saturate. This load identifies the knee of the throughput curve and is denoted N*. N* is obtained by equating the two bounds of (3):

N* = (Z + D) / D_max

For the telecom reporting application on a mid-range server, Z is taken as 5.0 sec and the network service demand is 0.66 ms; hence N* = 5.0/0.00066, or about 7575. Throughput extrapolation is done at least up to N* users.
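The calculations of this section, the utilization law (2), the throughput bounds (3), and the knee N*, can be sketched directly. The helper names are ours; the numbers are the JPetStore and telecom figures from the text (note that 0.67/400 is 1.675 ms, which the text rounds to 1.68 ms, so the derived X_max is ~597 rather than exactly 595 pages/sec).

```python
def service_demand(utilization, throughput):
    """Eq. (2), the utilization law: D_r = U_r / X, in seconds."""
    return utilization / throughput

def max_throughput(demands):
    """Heavy-load bound of eq. (3): X_max = 1 / D_max."""
    return 1.0 / max(demands)

def knee_users(demands, think_time):
    """Knee of the throughput curve: N* = (Z + sum(D)) / D_max."""
    return (think_time + sum(demands)) / max(demands)

# JPetStore on the low-range server: 67% disk utilization at 400 pages/sec.
d_disk = service_demand(0.67, 400)    # 0.001675 s, i.e. ~1.68 ms
x_max = max_throughput([d_disk])      # ~597 pages/sec (595 in the text)

# Telecom on the mid-range server: network demand 0.66 ms, Z = 5.0 s.
n_star = knee_users([0.00066], 5.0)   # ~7576 users (7575 in the text)
```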
V. PERFORMANCE EXTRAPOLATION OF MIXTURE OF APPLICATIONS

In this section, we consider a situation where multiple applications with very different resource demands run simultaneously on the same server. The service demand of the mixture is obtained by taking a weighted average of the service demands of the individual applications, where the weights reflect the proportion of the workload corresponding to each application. We consider three resources, CPU, disk and network; the service demand of memory is handled differently. If the service demands of these three resources are known for applications 1 and 2, then the CPU service demand of the mixture is obtained as

D_CPU = w_1 D_CPU1 + w_2 D_CPU2   (4)

where w_i is the fraction of the workload belonging to the i-th application and D_CPU1, D_CPU2 are the CPU service demands of the two applications. Since the w_i are fractions, they add up to 1. Similarly, the service demands for disk and network are obtained by taking the weighted averages of D_Disk1, D_Disk2 and D_Net1, D_Net2. As for individual applications, it is the service demand of the workload mixture that decides the maximum throughput and the maximum number of users that can be supported. First, we verify through actual testing this method of computing the service demand of a mixture of applications. Second, the maximum throughput X_max computed from the maximum service demand is used in the proposed Mixed mode extrapolation strategy.

TABLE II. SERVICE DEMAND (IN MS) OF VARIOUS APPLICATIONS

              Mid-range server            Low-range server
Application   Disk    Network   CPU       Disk    CPU     Network
Telecom       0.1     0.66      0.79      0.1     0.9     0.64
PetStore      2.56    0.31      0.58      0.6     1.4     0.35
Mixture       1.3     0.54      0.67      0.3     1.1     0.53

In Table II, the service demands of two applications, the telecom reporting application and JPetStore, on the Sun mid-range server are shown for three resources. In the multi-class scenario, 50% of the workload belongs to telecom and 50% to JPetStore.
The telecom reporting application has high service demands for network and CPU and a very low service demand for disk. JPetStore, on the other hand, is an I/O-bound job with a disk service demand of 2.56 ms. The disk service demand of the mixture can be obtained using (4) as follows:

D_Disk = 0.5 x 2.56 + 0.5 x 0.1 = 1.33 ms

From the table we verify that the workload mixture indeed has a measured disk service demand of 1.3 ms. When the mixture changes, this value changes as well, since the weights applied to the individual service demands change. Next, the maximum throughput is obtained from the disk service demand as X_max = 1/D_Disk = 744 pages/sec. This value is used in the extrapolation of throughput for the mixture in which both applications have equal percentages. Fig. 3 shows the extrapolated throughput and response time using the Mixed mode technique along with the actual load testing results. It can be verified that, even for a mixture of applications, the Mixed mode extrapolation technique provides more than 90% accuracy.
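Eq. (4) and the derived maximum throughput can be checked in a few lines. The function name is ours, and the demands are the mid-range disk figures from Table II; note that 1/1.33 ms is about 752 pages/sec, slightly above the 744 pages/sec used in the text, which is based on the measured mixture demand.

```python
def mixture_demand(weights, demands):
    """Eq. (4): weighted average of per-application service demands.

    weights : workload fractions w_i, summing to 1
    demands : one resource's service demand per application, in ms
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * d for w, d in zip(weights, demands))

# 50/50 mixture of telecom (0.1 ms) and JPetStore (2.56 ms) disk demands.
d_disk = mixture_demand([0.5, 0.5], [0.1, 2.56])   # 1.33 ms
x_max = 1000.0 / d_disk                            # pages/sec (demand in ms)
```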

Figure 3. Extrapolation of throughput and response time for a mixture of applications.

The service demand of the applications mixture can also be used to find the maximum number of users supported. For Z = 5.0 sec, N* = 5.0/D_Disk = 3846. In Fig. 3, the maximum throughput is obtained at about 4500 users, beyond which the throughput is expected to degrade. Though this method of computing service demand is known from queueing theory, it has not previously been used to predict the throughput and the maximum number of supported users for a mixture of applications.

A. Applications with Common Bottleneck Resource

In this section, we consider a scenario where two applications, iBATIS JPetStore and the telecom reporting application, run on a low-range server. The mixture of the system workload is changed in order to find the effect on overall throughput. In Fig. 4, we show the extrapolated throughput as the percentage of the workload belonging to the telecom application varies from 20% to 80%. As the percentage of the telecom application increases, the maximum throughput of the combined workload is higher and the overall extrapolated throughput is also higher. For both of these applications, the CPU is the resource with the maximum service demand; however, the telecom application's CPU demand is lower. For a mixture of workloads, the maximum throughput depends on the service demand of the mixture, and throughput increases as the share of the application with the lower resource demand grows. As the percentage of the workload belonging to the telecom application varies between 20%, 50% and 80%, the maximum throughputs are 771, 865 and 1030 pages/sec respectively.
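The dependence of the maximum throughput on the mixture can be sketched by applying eq. (4) across all three resources and taking the heavy-load bound. The helper is ours, and the demands are the low-range server columns of Table II; the result reproduces the trend of the 771, 865 and 1030 pages/sec figures quoted above.

```python
def x_max_for_mix(w_telecom, telecom_ms, petstore_ms):
    """Maximum throughput (pages/sec) of a two-application mixture.

    telecom_ms, petstore_ms : per-resource service demands in ms, in the
    same resource order. The bottleneck is the resource with the largest
    weighted-average demand (eq. (4)); X_max = 1/D_max (eq. (3)).
    """
    w1, w2 = w_telecom, 1.0 - w_telecom
    mix = [w1 * d1 + w2 * d2 for d1, d2 in zip(telecom_ms, petstore_ms)]
    return 1000.0 / max(mix)

# Low-range server demands from Table II, as [disk, cpu, network]:
telecom = [0.1, 0.9, 0.64]
petstore = [0.6, 1.4, 0.35]
for share in (0.2, 0.5, 0.8):
    print(share, round(x_max_for_mix(share, telecom, petstore)))
```

In all three mixes the CPU remains the largest weighted demand, so increasing the telecom share (the application with the lower CPU demand) raises X_max, as observed in Fig. 4.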
Thus, at a load of 4000 users, the total throughput is 682 pages/sec with 20% telecom workload and 737 pages/sec when the percentage increases to 80%. Since we know X_max for the individual applications, it is possible to extrapolate throughput for any other mixture of workloads; Mixed mode extrapolation is thus capable of predicting the performance of a system even for its future usage pattern.

Figure 4. Extrapolation for applications with a common bottleneck resource.

B. Applications with Different Bottleneck Resources

This section deals with a situation where different applications in the workload mixture have different bottleneck resources. A mixture of JPetStore and the telecom reporting application runs simultaneously on the mid-range AMD server. For JPetStore, I/O is the bottleneck and the disk service demand is highest, whereas for the telecom application the network is the bottleneck. The disk service demand is 1.6 ms for JPetStore, whereas the network service demand is 0.7 ms for the telecom application. Thus, for a mixture of these two applications, the throughput is lowest when the telecom share is just 10% and increases significantly as more of the workload belongs to the telecom application. In Fig. 5, the extrapolated throughput is shown for three scenarios. With 50% of the load belonging to the telecom application, the throughput is 826 pages/sec at N = 4000 users, about 19% higher than the throughput (672 pages/sec) when only 10% of the load belongs to the telecom application. Workload belonging to the telecom application contends for the network, whereas workload belonging to JPetStore contends for I/O. Because the applications have different bottlenecks, higher throughput is achieved when the workload consists of equal percentages of both applications.
At N = 3000 users, JPetStore alone gives a throughput of 540 pages/sec and the telecom application alone 623 pages/sec, whereas a mixture with equal percentages provides a throughput of 1147 pages/sec at 6000 users, and the mixture can support more users. This result can be useful for obtaining higher throughput even in a virtualized environment where multiple applications run on a common shared server.

C. Applications with Bottleneck Resources on Different Servers

The third scenario we consider is when the applications have their bottleneck resources on different servers.

Throughput Throughput 15 15 1 1 5 Figure 5 2 4 6 8 Model:1 % Telecom Model:9% Telecom Extrapolation for Applications with Different bottleneck resources. reporting application and an e-quizzing application. For the telecom application, network is the bottleneck on the application server and in case of the e-quiz application, CPU is the bottleneck on the database server. Thus, workloads of different applications do not contend for the same resource and throughput of one application is mostly not affected by the other application. Fig. 6 shows the extrapolated throughput for three different mixtures of two applications as they run on a highrange server. In e-quiz application users views the questions, take a test and then submit their results. This application requires a thinktime (Z = 2 sec) more than that of the telecom reporting application (Z = 4 sec). Thus, in load testing, the maximum throughput for e-quiz is much lower. When this application constitutes 8% of the workload, the throughput for 5 users is 56 pages/sec and as we increase percentage of workload belonging to telecom application, higher throughput of 13 pages/sec is obtained. VI. CONCLUSIONS Model:5 % Telecom Actual: 5% Telecom Load testing of IT projects attempts to ensure that the application meets SLA before it is actually launched in the production environment. But, limitations of load testing are its applicability for large number of users, lack of knowledge about the exact production workload characteristics etc. This paper proposes an extrapolation strategy for load testing results which allows one to obtain throughput and response time of an application for large number of users. The strategy uses initial load testing results and service demand computed from the utilization statistics of hardware resources. The proposed solution uses linear regression until the throughput reaches about half of the maximum throughput, then it uses statistical S-curve to extrapolate throughput. 
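The two-phase scheme summarized in the conclusions can be sketched as follows. The choice of a logistic function for the S-curve, and tying its steepness to the slope of the linear fit so that the two phases join smoothly at Xmax/2, are assumptions on our part; the paper only states that linear regression is used up to about half of Xmax, followed by a statistical S-curve.

```python
# Sketch of the extrapolation strategy: fit a straight line to the low-load
# measurements (throughput below roughly X_max / 2), then switch to a
# logistic S-curve that saturates at X_max. The logistic form and the
# slope-matching at the handover point are assumptions.
import math

def fit_line(points):
    """Ordinary least squares y = a*x + b over (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def extrapolate(measured, x_max):
    """Build a throughput model X(N) from (users, throughput) measurements."""
    low = [(n, x) for n, x in measured if x <= 0.5 * x_max]
    a, b = fit_line(low)
    n_half = (0.5 * x_max - b) / a   # load at which the line reaches X_max / 2
    k = 4 * a / x_max                # match the logistic's slope to the line
    def model(n):
        linear = a * n + b
        if linear <= 0.5 * x_max:
            return linear            # low-load phase: linear regression
        return x_max / (1 + math.exp(-k * (n - n_half)))  # saturation phase
    return model

# Example with hypothetical measurements up to 400 users, where
# X_max = 1500 pages/sec is known from the bottleneck service demand.
model = extrapolate([(100, 180), (200, 350), (400, 640)], x_max=1500)
```

Because the logistic equals Xmax/2 at n_half and its slope there is set to the linear slope, the predicted curve is continuous and smooth at the handover, then flattens toward Xmax for large user counts.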
The paper considers a mixture of application workloads having different resource demands. It presents a formula for computing the service demand of a mixture of multiple applications and demonstrates how the mixed-mode extrapolation strategy can be applied to obtain the throughput for a mixture of workloads.

Figure 6. Extrapolation when bottleneck resources are on different servers (model curves for 20%, 50%, and 80% telecom load, with measured values for the 50% case).

Depending on the bottleneck resources and their locations, the maximum throughput of a mixture can vary. The strategy allows extrapolation for any mixture of applications, provided the service demand information of the individual applications is available. This can cut down load testing time drastically and help in analyzing different scenarios without actually performing the tests. Further, integrating this tool with a capacity planning model could speed up the process of making an application ready for deployment.

A few areas remain to be explored in future work. The proposed technique is currently undergoing validation in virtualized and cloud environments. It can be further extended to situations where hardware configurations change to reflect the production environment. This would truly bridge the gap between the test and production environments.