On the Use of Performance Models in Autonomic Computing. Daniel A. Menascé, Department of Computer Science, George Mason University. © 2012 D.A. Menascé. All Rights Reserved.
Motivation for AC. The main obstacle to further progress in IT is a looming software complexity crisis (from an IBM manifesto, Oct. 2001). Systems contain tens of millions of lines of code. Skilled IT professionals are required to install, configure, tune, and maintain them. Many heterogeneous systems need to be integrated. The limit of human capacity is being reached.
Motivation for AC (cont'd). It is harder to anticipate interactions between components at design time, so decisions need to be deferred to run time. Computer systems are becoming too massive and complex to be managed even by the most skilled IT professionals. The workload and environment conditions tend to change very rapidly with time.
Multi-scale time workload variation. [Figure: Web server workload over 3600 sec, with panels labeled 60 sec and 1 sec.]
Self-Optimizing Systems. Complex middleware and database systems have a very large number of configurable parameters.
Web Server (IIS 5.0): HTTP KeepAlive, Application Protection Level, Connection Timeout, Number of Connections, Logging Location, Resource Indexing, Performance Tuning Level, Application Optimization, MemCacheSize, MaxCachedFileSize, ListenBacklog, MaxPoolThreads.
Application Server (Tomcat 4.1): acceptcount, minprocessors, maxprocessors, worker.ajp13.cachesize.
Database Server (SQL Server 7.0): Cursor Threshold, Fill Factor, Locks, Max Worker Threads, Min Memory Per Query, Network Packet Size, Priority Boost, Recovery Interval, Set Working Set Size, Max Server Memory, Min Server Memory, User Connections.
Autonomic Computing. Systems that can manage themselves given high-level objectives. High-level objectives can be expressed in terms of service-level objectives or utility functions. Autonomic computing is inspired by the human autonomic nervous system. The autonomic nervous system consists of sensory neurons and motor neurons that run between the central nervous system and various internal organs such as the heart, lungs, viscera, and glands. It is responsible for monitoring conditions in the internal environment and bringing about appropriate changes in them. The contraction of both smooth muscle and cardiac muscle is controlled by motor neurons of the autonomic system. (Source: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/P/PNS.html) The autonomic nervous system functions in an involuntary, reflexive manner.
Autonomic Systems (self-* systems): self-managing, self-configuring, self-optimizing, self-healing, self-protecting.
Self-optimizing systems: motivation. [Figure: arriving requests queue for one of M software threads (1, 2, ..., M); the software threads use the CPU and disk.] Q: How does the response time vary with the number of software threads, M?
Obs: For each workload level there is an optimal number of threads. [Figure: response time (sec) vs. number of threads (M) for workload levels L=1, L=2, L=3, and L=3.5, with the SLA marked.]
Workload and QoS metrics. Computer system workload: transactions, HTTP requests, video downloads, calls to a call center. QoS metrics: response time, throughput, availability, page download time, revenue throughput, abandonment rate.
Self-optimizing System. [Figure: the computer system with its workload (transactions, HTTP requests, video downloads, calls to a call center) and QoS metrics (response time, throughput, availability, page download time, revenue throughput, abandonment rate), plus a Controller that takes the Service Level Objectives as input.]
Self-optimizing System: Monitor. [Figure: same diagram, with Monitor labels on the workload and on the QoS metrics.]
Self-optimizing System: Analyze & Plan. [Figure: same diagram, with the Controller labeled Analyze & Plan.]
Self-optimizing System: Execute. [Figure: same diagram, with an Execute label between the Controller and the computer system.]
IBM's MAPE-K Model for AC. [Figure: an autonomic manager containing Monitor, Analyze, Plan, and Execute functions around shared Knowledge, attached to a managed element.]
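One pass through the Monitor, Analyze, Plan, Execute loop over shared Knowledge might be sketched as follows. This is an illustration only: the `Knowledge` class, the four function names, the threshold rule in `analyze`, and the "add one thread" rule in `plan` are all hypothetical and are not part of IBM's model or of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge: the goal, the current configuration, and history."""
    sla_response_time: float            # service-level objective (sec)
    threads: int = 4                    # hypothetical tunable parameter
    history: list = field(default_factory=list)


def monitor(measured_response_time, k):
    """Monitor: record the latest measurement from the managed element."""
    k.history.append(measured_response_time)
    return measured_response_time


def analyze(response_time, k):
    """Analyze: decide whether the objective is violated."""
    return response_time > k.sla_response_time


def plan(k):
    """Plan: choose a new configuration (placeholder rule: add one thread)."""
    return k.threads + 1


def execute(new_threads, k):
    """Execute: apply the change (here it is only stored in the knowledge)."""
    k.threads = new_threads


def mape_k_step(measured_response_time, k):
    """Run one Monitor -> Analyze -> Plan -> Execute pass."""
    r = monitor(measured_response_time, k)
    if analyze(r, k):
        execute(plan(k), k)
    return k.threads
```

In a model-based controller, the placeholder rule in `plan` would be replaced by a search over candidate configurations scored with a performance model.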
Performance Model-Based Autonomic Computing. State: e.g., a set of configuration parameters. Value: e.g., a QoS metric or a utility function value. Goal: find the state that optimizes the value subject to cost constraints. The state space is large. The objective function does not have a closed form. Use performance models to compute the value at each state. Use combinatorial search techniques to find a near-optimal solution. [Figure: value plotted against state.]
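The search idea can be sketched with greedy hill climbing, one of several possible combinatorial search techniques; the slides do not say which technique is used. The function names, the two-parameter state, and `toy_value` (a stand-in for a real performance model) are assumptions made for illustration.

```python
def hill_climb(start, value, neighbors, max_steps=1000):
    """Greedy local search: move to the best neighbor while it improves value."""
    state, best = start, value(start)
    for _ in range(max_steps):
        candidates = [(value(n), n) for n in neighbors(state)]
        if not candidates:
            break
        top_value, top_state = max(candidates, key=lambda c: c[0])
        if top_value <= best:
            break  # local optimum: no neighbor is better
        state, best = top_state, top_value
    return state, best


def grid_neighbors(state, lo=1, hi=50):
    """Neighbors differ by +/-1 in exactly one parameter, within [lo, hi]."""
    result = []
    for i, x in enumerate(state):
        for step in (-1, 1):
            y = x + step
            if lo <= y <= hi:
                result.append(state[:i] + (y,) + state[i + 1:])
    return result


def toy_value(state):
    """Stand-in for a performance model (hypothetical): peaks at (12, 30)."""
    threads, connections = state
    return -((threads - 12) ** 2) - ((connections - 30) ** 2)
```

Calling `hill_climb((1, 1), toy_value, grid_neighbors)` returns `((12, 30), 0)`. With a real performance model the value surface may have several local optima, which is why the result is only near-optimal.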
Examples of Autonomic Computing Systems: e-commerce system, Internet data center, virtualized environment, SOA software systems.
Examples of Autonomic Computing Systems: e-commerce system. "Preserving QoS of E-commerce Sites Through Self-Tuning: A Performance Model Approach," D.A. Menascé, R. Dodge, and D. Barbara, Proc. 2001 ACM Conference on E-commerce, Tampa, FL, October 14-17, 2001.
Multi-tier Architecture [figure]
[Figure: QoS Controller architecture. Arriving requests enter the e-commerce computer system and leave as completing requests. The QoS Controller comprises a Workload Analyzer, Service Demand Computation, a Performance Model Solver, and the QoS Controller Algorithm, which receives the QoS goals. Arrows numbered (1) to (12) mark the interactions.]
Queuing Model Used to Compute QoS Values [figure]
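As a rough illustration of how a queuing model turns monitored data into a QoS value, the sketch below applies two standard operational results: the Service Demand Law, D_i = U_i / X0, and the open single-class queuing network formula R = sum over devices of D_i / (1 - lambda * D_i). The model in the cited paper may differ (for example, multiple classes or tiers); the function names and the sample numbers are assumptions.

```python
def service_demands(utilizations, throughput):
    """Service Demand Law: D_i = U_i / X0 for each device i."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return {dev: u / throughput for dev, u in utilizations.items()}


def response_time(demands, arrival_rate):
    """Open single-class queuing network: R = sum_i D_i / (1 - lambda * D_i)."""
    total = 0.0
    for d in demands.values():
        u = arrival_rate * d          # Utilization Law: U_i = lambda * D_i
        if u >= 1.0:
            return float("inf")       # device saturated: no steady state
        total += d / (1.0 - u)
    return total


# Hypothetical measurements: CPU 40% busy and disk 20% busy at 20 requests/sec.
demands = service_demands({"cpu": 0.4, "disk": 0.2}, 20.0)
predicted = response_time(demands, 25.0)   # predicted response time at 25 requests/sec
```

A controller can call `response_time` for each candidate configuration and arrival rate instead of trying the configuration on the live system.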
Experimental Results: Arrival Rate. [Figure: arrival rate (requests/sec) vs. time (controller intervals).]
Results of QoS Controller. [Figure: controlled QoS and uncontrolled QoS vs. arrival rate (req/sec).]
Experimental Results: Arrival Rate. [Figure: arrival rate (requests/sec) vs. time (controller intervals), annotated "QoS is not met!".]
Examples of Autonomic Computing Systems: Internet data center. "Resource Allocation for Autonomic Data Centers using Analytic Performance Models," M. Bennani and D. Menascé, Proc. 2005 IEEE Intl. Conf. Autonomic Computing (ICAC-05), Seattle, WA, June 13-16, 2005.
Motivation for Dynamic Resource Allocation in Data Centers. Large data centers host several application environments (AEs). AEs run customers' applications (either online transactions or batch processes) on assigned servers (resources). AEs are subject to workloads that exhibit wide and unpredictable variations in their intensities. Resources need to be redeployed dynamically among AEs to optimize some global utility function.
Dynamic Resource Allocation Problem. [Figure: Application Environments 1, 2, ..., M mapped onto Servers 1, 2, ..., N.]
Dynamic Resource Allocation Problem. The data center has the following characteristics: M application environments (batch and online); a fixed number, N, of physical resources (servers); all servers have the same installed software; a server can be assigned to any AE.
Dynamic Resource Allocation: Two-level Controllers. Global Controller: decides how many servers to assign to each AE. Local Controller: implements the Global Controller's decisions. [Figure: a Global Controller above one Local Controller per application environment (1 to M), each with its assigned servers.]
Dynamic Resource Allocation Problem: Utility Functions. The global controller uses a global utility function, $U_g$, to assess the adherence of the overall data center performance to desired service levels (SLAs):
$$U_g = h(U_1, \ldots, U_M), \qquad U_i = \sum_{s=1}^{S_i} a_{i,s}\, U_{i,s}, \qquad 0 < a_{i,s} < 1, \qquad \sum_{s=1}^{S_i} a_{i,s} = 1$$
where $U_{i,s}$ is the utility of class $s$ of AE $i$.
Utility Function as a Function of the Response Time:
$$U_{i,s} = \frac{K_{i,s}\, e^{-R_{i,s} + \beta_{i,s}}}{1 + e^{-R_{i,s} + \beta_{i,s}}}$$
[Figure: utility (0 to 100) vs. response time (sec).]
Utility Function as a Function of the Throughput:
$$U_{i,s} = K_{i,s} \left( \frac{1}{1 + e^{-X_{i,s} + \beta_{i,s}}} - \frac{1}{1 + e^{\beta_{i,s}}} \right)$$
[Figure: utility (0 to 100) vs. throughput (tps).]
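The utility functions can be sketched in code, assuming the two slide formulas read as U = K e^(-R + beta) / (1 + e^(-R + beta)) for response time and U = K (1 / (1 + e^(-X + beta)) - 1 / (1 + e^(beta))) for throughput, and that an AE's utility is the weighted sum of its class utilities. The function names and default values are illustrative assumptions.

```python
import math


def utility_response_time(r, beta, k=1.0):
    """U = K * e^(-R + beta) / (1 + e^(-R + beta)); decreases as R grows."""
    z = math.exp(-r + beta)
    return k * z / (1.0 + z)


def utility_throughput(x, beta, k=1.0):
    """U = K * (1/(1 + e^(-X + beta)) - 1/(1 + e^beta)); zero at X = 0."""
    return k * (1.0 / (1.0 + math.exp(-x + beta))
                - 1.0 / (1.0 + math.exp(beta)))


def ae_utility(class_utilities, weights):
    """U_i = sum_s a_{i,s} * U_{i,s}; weights must be positive and sum to 1."""
    if len(class_utilities) != len(weights):
        raise ValueError("one weight per class is required")
    if any(a <= 0 for a in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return sum(a * u for a, u in zip(weights, class_utilities))
```

Under this reading, the response-time utility equals K/2 when R equals beta, so beta behaves like the response time at which utility is halved, and the second term of the throughput utility makes it exactly zero when no work completes.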
Workload Variation for Online AEs. [Figure: arrival rate (tps) vs. control interval for Lambda1,1, Lambda1,2, Lambda1,3, Lambda2,1, and Lambda2,2.]
Response Times for Class 1 of AE 1. [Figure: response time (sec) vs. control interval for R1,1, with and without the controller.]
Response Times for Class 2 of AE 1. [Figure: response time (sec) vs. control interval for R1,2, with and without the controller.]
Response Times for Class 3 of AE 1. [Figure: response time (sec) vs. control interval for R1,3, with and without the controller.]
Variation of the Number of Servers. [Figure: number of servers vs. control interval for N1, N2, and N3, with and without the controller.]
Variation of Global Utility. [Figure: Ug vs. control interval, with and without the controller.]
Examples of Autonomic Computing Systems: virtualized environment. "Autonomic Virtualized Environments," D.A. Menascé and M.N. Bennani, IEEE International Conference on Autonomic and Autonomous Systems, July 19-21, 2006, Silicon Valley, CA, USA.
CPU Allocation Problem for Autonomic Virtualized Environments. Existing systems allow for manual allocation of CPU resources to VMs using CPU priorities or CPU shares. An automated mechanism is needed to adjust the CPU shares or the priorities of the virtual machines in order to maximize the global utility of the entire virtualized environment.
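A minimal sketch of automated share adjustment for two VMs follows. It rests on loudly stated assumptions that are not taken from the slides: a VM holding CPU share f sees its CPU service demand inflated to D / f, each VM is approximated by a single queue, the per-VM utility is a made-up SLA deviation measure, the global utility is the plain average, and the search is an exhaustive scan in 5% steps.

```python
def vm_response_time(demand, arrival_rate, share):
    """Single-queue approximation with CPU demand inflated to demand / share."""
    d = demand / share
    u = arrival_rate * d
    return float("inf") if u >= 1.0 else d / (1.0 - u)


def vm_utility(response, sla):
    """Hypothetical utility: positive when the SLA is met, -1 when saturated."""
    if response == float("inf"):
        return -1.0
    return (sla - response) / max(sla, response)


def best_shares(vms, step=0.05):
    """Scan every split of the CPU between two VMs; keep the best average utility."""
    best = None
    n = int(round(1.0 / step))
    for i in range(1, n):
        f1 = i * step
        shares = (f1, 1.0 - f1)
        utils = [
            vm_utility(vm_response_time(d, lam, f), sla)
            for (d, lam, sla), f in zip(vms, shares)
        ]
        u_g = sum(utils) / len(utils)
        if best is None or u_g > best[0]:
            best = (u_g, shares)
    return best


# Each VM is (CPU demand in sec, arrival rate in tps, SLA in sec); values are made up.
vms = [(0.02, 20.0, 0.5), (0.01, 10.0, 0.2)]
global_utility, (share_vm1, share_vm2) = best_shares(vms)
```

Rerunning `best_shares` at every control interval with freshly measured arrival rates gives the kind of share adjustment the slide calls for; a heuristic search would replace the scan when there are more VMs.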
CPU Allocation Problem for Autonomic Virtualized Environments (cont'd). [Figure: Workloads 1, 2, ..., n run on Virtual Machines 1, 2, ..., M in the virtualized environment; workload utilities (U 1,1, ..., U n,1) feed the per-VM utilities U 1, U 2, ..., U M, which combine into the global utility U g.]
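Virtualization Controller Architecture. [Figure: the controller comprises a Performance Monitor, a Performance Predictor (a model with a CPU and disks 1 to k), Utility Function Computation, and the Autonomic Controller Algorithm. Inputs are the SLAs and the workload; the output goes to the VMM resource manager.]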
CPU Shares Based Allocation: Workload Variation. [Figure: arrival rate (tps) vs. control interval.]
CPU Shares Based Allocation: CPU Shares Variation. [Figure: CPU_SHARE (in %) vs. control interval.]
CPU Shares Based Allocation: Response Time for VM1. [Figure: VM1 response time (sec) vs. control interval for R1 with and without the controller, with the SLA marked.]
CPU Shares Based Allocation: Response Time for VM2. [Figure: VM2 response time (sec) vs. control interval for R2 with and without the controller, with the SLA marked.]
CPU Shares Based Allocation: Global Utility. [Figure: virtual environment global utility Ug vs. control interval, with and without the controller.]
Examples of Autonomic Computing Systems: SOA software systems.
Self-Architecting SOA Software Systems (SASSY). SASSY allows domain experts to specify requirements using a high-level visual activity language. SASSY generates the initial software architecture optimized to maximize a utility function. The running system is constantly monitored, and SASSY automatically re-architects the system when needed.
[Figure: SASSY development process. Software engineers: Develop and Register Services (Service Directory), Develop QoS Architectural Patterns (QoS Pattern Library), Develop Software Adaptation Patterns (Adaptation Pattern Library). Domain experts: Specify SASs and SSSs. Steps: Generation of Base Architecture, Self-Architecting, Service Discovery, Service Binding and Coordination Logic Deployment. Artifacts: Service Activity Schema (SAS) + SSSs, Base Architecture, Near-optimal Architecture, Service Coordination, Running system.]
SASSY: Run-time Adaptation. [Figure: Monitor Running System; Analyze and Determine Need to Re-Architect; Plan for Re-architecting; Execute Software Adaptation. Other labels: Architecture of running system, Running system, Control, human-computer interaction. Key: model r/w, model read, communication.]
Architecture Adaptation Search Trajectories
Concluding Remarks. Self-optimization is a key attribute of AC systems. Performance models and combinatorial search techniques can be useful in dynamically adjusting a system's configuration as the workload changes.
Thanks! Questions? www.cs.gmu.edu/faculty/menasce.html