On the Use of Performance Models in Autonomic Computing

Similar documents
Resource allocation for autonomic data centers using analytic performance models.

ON THE USE OF PERFORMANCE MODELS TO DESIGN SELF-MANAGING COMPUTER SYSTEMS

Sparrow. Distributed Low-Latency Spark Scheduling. Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica

Performance Assurance in Virtualized Data Centers

Autonomic Workload Execution Control Using Throttling

A Framework for Utility-Based Service Oriented Design in SASSY

QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation

Quality of Service Aspects and Metrics in Grid Computing

Daniel A. Menascé, Ph. D. Dept. of Computer Science George Mason University

A Controller Based Approach for Web Services Virtualized Instance Allocation

Invited: Modeling and Optimization of Multititiered Server-Based Systems

Two-Level Iterative Queuing Modeling of Software Contention

SASSY: A Framework for Self-Architecting Service-Oriented Systems

Prediction-Based Admission Control for IaaS Clouds with Multiple Service Classes

Model-based Run-Time Software Adaptation for Distributed Hierarchical Service Coordination

Tuning Cognos ReportNet for a High Performance Environment

AUTONOMIC, OPTIMAL, AND NEAR-OPTIMAL RESOURCE ALLOCATION IN CLOUD COMPUTING

PARDA: Proportional Allocation of Resources for Distributed Storage Access

Treinamento em Análise Quantitativa & Planejamento de Capacidade. Virgilio A. F. Almeida

Autonomic Self-Optimization According to Business Objectives

A business-oriented load dispatching framework for online auction sites

An Autonomic Framework for Integrating Security and Quality of Service Support in Databases

Was ist dran an einer spezialisierten Data Warehousing platform?

Optimizing RDM Server Performance

Performance Evaluation of Distributed Software Systems

USING ECONOMIC MODELS TO TUNE RESOURCE ALLOCATIONS IN DATABASE MANAGEMENT SYSTEMS

Towards Model-based Management of Database Fragmentation

Monitoring & Tuning Azure SQL Database

White Paper. Major Performance Tuning Considerations for Weblogic Server

ITIL Capacity Management Deep Dive

Elastic Resource Provisioning for Cloud Data Center

Análise e Modelagem de Desempenho de Sistemas de Computação: Component Level Performance Models of Computer Systems

G Robert Grimm New York University

Typical scenario in shared infrastructures

Distributed File Systems Part II. Distributed File System Implementation

Treinamento em Análise Quantitativa & Planejamento de Capacidade. Virgilio A. F. Almeida

Resource Allocation Strategies for Multiple Job Classes

Cloud Performance Simulations

Oracle Exadata: Strategy and Roadmap

Full Text Search Agent Throughput

IZO MANAGED CLOUD FOR AZURE

Adaptive QoS Control Beyond Embedded Systems

When Average is Not Average: Large Response Time Fluctuations in n-tier Applications. Qingyang Wang, Yasuhiko Kanemasa, Calton Pu, Motoyuki Kawaba

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Adapting Mixed Workloads to Meet SLOs in Autonomic DBMSs

Exam Questions

Automated Control of Internet Services

Transactum Business Process Manager with High-Performance Elastic Scaling. November 2011 Ivan Klianev

vcloud Automation Center Reference Architecture vcloud Automation Center 5.2

Fit for Purpose Platform Positioning and Performance Architecture

Chapter 3. Design of Grid Scheduler. 3.1 Introduction

Case Study II: A Web Server

REGULATING RESPONSE TIME IN AN AUTONOMIC COMPUTING SYSTEM: A COMPARISON OF PROPORTIONAL CONTROL AND FUZZY CONTROL APPROACHES

The Descartes Modeling Language: Status Quo

OPERATING SYSTEMS CS3502 Spring Processor Scheduling. Chapter 5

On BigFix Performance: Disk is King. How to get your infrastructure right the first time! Case Study: IBM Cloud Development - WW IT Services

Predicting the Effect on Performance of Container-Managed Persistence in a Distributed Enterprise Application

Energy efficient mapping of virtual machines

Virtual SQL Servers. Actual Performance. 2016

S-Store: Streaming Meets Transaction Processing

IBM Security QRadar Deployment Intelligence app IBM

2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media,

Key aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling

Architecting & Tuning IIB / extreme Scale for Maximum Performance and Reliability

QoS Architectural Patterns for Self-Architecting Software Systems

Data Model Considerations for Radar Systems

ORACLE ENTERPRISE MANAGER 10g ORACLE DIAGNOSTICS PACK FOR NON-ORACLE MIDDLEWARE

DB2 Performance A Primer. Bill Arledge Principal Consultant CA Technologies Sept 14 th, 2011

Chapter. IT Infrastructure: Hardware and Software

Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud

instruction is 6 bytes, might span 2 pages 2 pages to handle from 2 pages to handle to Two major allocation schemes

Dispatcher. Phoenix. Dispatcher Phoenix Enterprise White Paper Version 0.2

On Network Flow Management in Virtualized Environments

Héctor Fernández and G. Pierre Vrije Universiteit Amsterdam

Dynamic control and Resource management for Mission Critical Multi-tier Applications in Cloud Data Center

Consolidating Microsoft SQL Server databases on PowerEdge R930 server

Batch Jobs Performance Testing

SWsoft ADVANCED VIRTUALIZATION AND WORKLOAD MANAGEMENT ON ITANIUM 2-BASED SERVERS

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

Empirical Evaluation of Latency-Sensitive Application Performance in the Cloud

Taming the Beast: Optimizing Oracle EBS for Radical Efficiency

A Reflective Database-Oriented Framework for Autonomic Managers

Lies, Damn Lies and Performance Metrics. PRESENTATION TITLE GOES HERE Barry Cooks Virtual Instruments

Crescando: Predictable Performance for Unpredictable Workloads

How Cisco IT Improves Commerce User Experience by Securely Sharing Internal Business Services with Partners

Software Architecture Design of an Autonomic System

Oracle Database 11g: Real Application Testing & Manageability Overview

A Survey of Self-Protecting Computing Systems

BlackBerry AtHoc Networked Crisis Communication Capacity Planning Guidelines. AtHoc SMS Codes

Adaptive RED: An Algorithm for Increasing the Robustness of RED s Active Queue Management or How I learned to stop worrying and love RED

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline

Extreme Performance Platform for Real-Time Streaming Analytics

Utility-based Optimal Service Selection for Business Processes in Service Oriented Architectures

Amazon Search Services. Christoph Schmitter

SLA Decomposition: Translating Service Level Objectives to System Level Thresholds

SQL Gone Wild: Taming Bad SQL the Easy Way (or the Hard Way) Sergey Koltakov Product Manager, Database Manageability

An Application Requirement-based Framework for Adaptive Middleware

Managing Web server performance with AutoTune agents

Runtime Software Architectural Models for Adaptation, Recovery and Evolution

Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日

Transcription:

On the Use of Performance Models in Autonomic Computing Daniel A. Menascé Department of Computer Science George Mason University 1 2012. D.A. Menasce. All Rights Reserved.

2

Motivation for AC main obstacle to further progress in IT is a looming software complexity crisis. (from an IBM manifesto, Oct. 2001). Tens of millions of lines of code Skilled IT professionals required to install, configure, tune, and maintain. Need to integrate many heterogeneous systems Limit of human capacity being achieved 3

Motivation for AC (cont d) Harder to anticipate interactions between components at design time: Need to defer decisions to run time Computer systems are becoming too massive, complex, to be managed even by the most skilled IT professionals The workload and environment conditions tend to change very rapidly with time 4

Multi-scale time workload variation 3600 sec of a Web Server 60 sec 1 sec 5

Self-Optimizing Systems Complex middleware and database systems have a very large number of configurable parameters. Web Server (IIS 5.0) Application Server (Tomcat 4.1) Database Server (SQL Server 7.0) HTTP KeepAlive acceptcount Cursor Threshold Application Protection Level minprocessors Fill Factor Connection Timeout maxprocessors Locks Number of Connections Logging Location Resource Indexing Performance Tuning Level Application Optimization MemCacheSize MaxCachedFileSize ListenBacklog MaxPoolThreads worker.ajp13.cachesize Max Worker Threads Min Memory Per Query Network Packet Size Priority Boost Recovery Interval Set Working Set Size Max Server Memory Min Server Memory User Connections 6

Autonomic Computing System that can manage themselves given high-level objectives. High-level objectives can be expressed in term of servicelevel objectives or utility functions. Autonomic computing is inspired in the human autonomic nervous system: The autonomic nervous system consists of sensory neurons and motor neurons that run between the central nervous system and various internal organs such as the: heart, lungs, viscera, glands. It is responsible for monitoring conditions in the internal environment and bringing about appropriate changes in them. The contraction of both smooth muscle and cardiac muscle is controlled by motor neurons of the autonomic system. http://users.rcn.com/jkimball.ma.ultranet/ BiologyPages/P/PNS.html The autonomic nervous system functions in an involuntary, reflexive manner. 7

Autonomic Systems Self-managing Self-configuring Self-optimizing Self-healing Self-protecting Self-* systems 8

Autonomic Systems Self-managing Self-configuring Self-optimizing Self-healing Self-protecting Self-* systems 9

Self-optimizing systems: motivation Arriving requests queue for threads 1 2... M Software threads cpu disk Q: How does the response time vary with the number of software threads, M? 10

2.5 2.0 Obs: For each workload level there is an optimal number of threads. Response Time (sec) 1.5 1.0 0.5 0.0 0 5 10 15 20 25 30 Number of Threads (M) L=1 L=2 L=3 L=3.5 SLA 11

Workload and QoS metrics Computer system Workload: transactions HTTP requests video downloads calls to Call Center QoS metrics: response time throughput availability page download time revenue throughput abandonment rate 12

Self-optimizing System Computer system Workload: transactions HTTP requests video downloads calls to Call Center Controller QoS metrics: response time throughput availability page download time revenue throughput abandonment rate Service Level Objectives 13

Self-optimizing System Computer system Workload: transactions HTTP requests Monitor Monitor video downloads calls to Call Center Controller QoS metrics: response time throughput availability page download time revenue throughput abandonment rate Service Level Objectives 14

Self-optimizing System Computer system Workload: transactions HTTP requests video downloads calls to Call Center Analyze & Plan Controller QoS metrics: response time throughput availability page download time revenue throughput abandonment rate Service Level Objectives 15

Self-optimizing System Workload: transactions HTTP requests video downloads calls to Call Center Computer system Controller Execute QoS metrics: response time throughput availability page download time revenue throughput abandonment rate Service Level Objectives 16

IBM s MAPE-K Model for AC AUTONOMIC MANAGER Analyze Plan Monitor Knowledge Execute Managed Element 17

Performance Model-Based state Autonomic Computing Value 18

Performance Model-Based state Autonomic Computing State: e.g., set of configuration parameters Value: e.g., QoS metric, utility function value Value Goal: find state that optimizes the value subject to cost constraints State space is large Objective function does not have a closed form Use performance models to compute value at each state. Use combinatorial search techniques to find nearoptimal solution. 19

Examples of Autonomic Computing Systems E-commerce system Internet Data Center Virtualized Environment SOA Software Systems 20

Examples of Autonomic Computing Systems E-commerce system Internet Data Center Virtualized Environment SOA Software Systems Preserving QoS of E-commerce Sites Through Self-Tuning: A Performance Model Approach," D.A. Menasce, R. Dodge and D. Barbara, Proc. 2001 ACM Conference on E-commerce, Tampa, FL, October 14-17, 2001 21

Multi-tier Architecture 22

arriving requests Computer E-commerce System completing requests (6) (1) (12) (7) Service Demand Computation (2) (8) QoS Controller Algorithm (5) (10) (11) QoS goals Workload Analyzer (9) Performance Model Solver (3) (4) QoS Controller 23

Queuing Model Used to Compute QoS Values 24

Experimental Results Arrival rate 100 90 80 70 Arrival Rate (requests/sec) 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 Time (Controller Intervals) 25

Results of QoS Controller 0.8 QoS 0.6 0.4 0.2 0.0-0.2-0.4-0.6-0.8 14.4 14.0 14.4 41.3 38.1 36.6 62.2 61.5 63.5 77.3 80.4 78.5 83.8 85.7 85.9 88.2 88.3 89.2 Arrival Rate (req/sec) 90.0 90.3 89.5 85.7 81.4 83.7 79.0 75.9 73.5 66.2 64.1 64.4 Controlled QoS Uncontrolled QoS 26

Experiment Results Arrival rate 100 90 80 QoS is not met! 70 Arrival Rate (requests/sec) 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 Time (Controller Intervals) 27

Examples of Autonomic Computing Systems E-commerce system Internet Data Center Virtualized Environment SOA Software Systems "Resource Allocation for Autonomic Data Centers using Analytic Performance Models," M. Bennani and D. Menascé, Proc. 2005 IEEE IntL. Conf. Autonomic Computing (ICAC-05), Seattle, WA, June 13-16, 2005. 28

Motivation for Dynamic Resource Allocation in Data Centers Large data centers host several application environments (AEs) AEs run customers applications (either online transactions or batch processes) on assigned servers (resources) AEs are subject to workloads that exhibit wide and unpredictable variations in their intensities Resources need to be redeployed dynamically among AEs to optimize some global utility function 29

Dynamic Resource Allocation Problem Application Environment 1 Application Environment 2... Application Environment M... Server 1 Server 2 Server N 30

Dynamic Resource Allocation Problem Application Environment 1 Application Environment 2... Application Environment M... Server 1 Server 2 Server N 31

Dynamic Resource Allocation Problem The data center has the following characteristics: M Application Environments (batch and online) A fixed number, N, of physical resources (servers) All servers have the same installed software A server can be assigned to any AE 32

Dynamic Resource Allocation Two-level Controllers Global Controller Decides how many servers to assign to each AE. Local Controller Local Controller Implements Global Controller s decisions. Server Server Server Server Server Server Application Environment 1 Application Environment M 33

34 Dynamic Resource Allocation Problem Utility Functions The global controller uses a global utility function, Ug, to assess the adherence of the overall data center performance to desired service levels (SLAs) 1 1 0 ),..., ( 1,, 1,, 1 = < < = = = = i i S s s i s i S s s i s i i M g a a U a U U U h U

Utility Function as a Function of the Response Time 100 90 80 Utility 70 60 50 40 U Ri, s + β Ki, s e i, s = R + β 1+ e i, s i, s i, s 30 20 10 0 0 2 4 6 8 10 12 14 Response Time (sec) 35

Utility Function as a Function of the Throughput 100 90 80 70 60 Utility 50 40 30 20 10 U 1+ e, s = Ki, s X + β β i i, s i, s i, s 1+ e 1 1 0 0 1 2 3 4 5 6 7 Throughput (tps) 36

Workload Variation for Online AEs 140 120 Arrival Rate (tps) 100 80 60 40 20 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 Control Interval Lambda1,1 Lambda1,2 Lambda1,3 Lambda2,1 Lambda2,2 37

Response Times for Class 1 of AE 1 0.40 0.35 0.30 Resp. Time (sec) 0.25 0.20 0.15 0.10 0.05 0.00 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 Control Interval R1,1 (no controller) R1,1 (controller) 38

Response Times for Class 2 of AE 1 0.35 0.30 Resp. Time (sec) 0.25 0.20 0.15 0.10 0.05 0.00 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 Control Interval R1,2 (no controller) R1,2 (controller) 39

Response Times for Class 3 of AE 1 0.40 0.35 0.30 Resp. Time (sec) 0.25 0.20 0.15 0.10 0.05 0.00 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 Control Interval R1,3 (no controller) R1,3 (controller) 40

Variation of the Number of Servers 14 12 10 No. Servers 8 6 4 2 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 Control Interval N1 (no controller) N1 (controller) N2 (no controller) N2 (controller) N3 (no controller) N3 (controller) 41

Variation of Global Utility 99 98 97 96 Ug 95 94 93 92 91 90 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 Control Interval No Controller With Controller 42

Examples of Autonomic Computing Systems E-commerce system Internet Data Center Virtualized Environment SOA Software Systems "Autonomic Virtualized Environments," D.A. Menasce and M.N. Bennani, IEEE International Conference on Autonomic and Autonomous Systems, July 19-21, 2006, Silicon Valley, CA, USA 43

CPU Allocation Problem for Autonomic Virtualized Environments Existing systems allow for manual allocation of CPU resources to VMs using CPU priorities or CPU shares. Need automated mechanism for the adjustment of CPU shares or the priorities of the virtual machines in order to maximize the global utility of the entire virtualized environment 44

CPU Allocation Problem for Autonomic Virtualized Environments (Cont d) Workload 1 Workload 2. Workload n U 1,1 U n,1 Virtual Machine 1 Virtual Machine 2... U 1 U 2 U M U g Virtualized Environment Virtual Machine M 45

Virtualization Controller Architecture CPU disk 1... Performance Predictor disk k Autonomic Controller Algorithm Utility Function Computation Performance Monitor to the VMM resource manager SLAs workload 46

CPU Shares Based Allocation: Workload Variation 70 60 50 Arrival Rate (tps) 40 30 20 10 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Control Interval 47

CPU Shares Based Allocation: CPU Shares Variation 90 80 70 60 CPU_SHARE (in %) 50 40 30 20 10 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Control Interval 48

CPU Shares Based Allocation: Response Time for VM1 10 9 8 VM1 Resp. Time (sec) 7 6 5 4 3 with controller no controller 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Control Interval R1 (no controller) R1 (controller) SLA 49

CPU Shares Based Allocation: Response Time for VM2 0.45 0.4 0.35 0.3 no controller VM2 Resp. Time (sec) 0.25 0.2 0.15 0.1 with controller 0.05 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Control Interval R2 (no controller) R2 (controller) SLA 50

CPU Shares Based Allocation: Global Utility 0.6 0.5 with controller 0.4 Virtual Environment Global Utility 0.3 0.2 0.1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30-0.1-0.2-0.3 no controller Control Interval Ug (no controller) Ug (controller) 51

Examples of Autonomic Computing Systems E-commerce system Internet Data Center Virtualized Environment SOA Software Systems 52

Self-Architecting SOA Software Systems (SASSY) SASSY allows domain experts to specify requirements using a high-level visual activity language. SASSY generates the initial software architecture optimized to maximize a utility function. The running system is constantly monitored and SASSY automatically rearchitects the system when needed. 53

Develop and Register Services Service Directory Develop QoS Architectural Patterns QoS Pattern Library Service Discovery Software Engineers Develop Software Adaptation Patterns Adaptation Pattern Library Domain Experts Specify SASs and SSSs Generation of Base Architecture Self- Architecting Service Binding and Coordination Logic Deployment Service Activity Schema (SAS) + SSSs Base Architecture Near-optimal Architecture Service Coordination Running system

SASSY: Run-time Adaptation Analyze and Determine Need to Re-Architect Architecture of running system Plan for Rearchitecting Monitor Running System Execute Software Adaptation Control KEY model r/w model read communication Running system humancomputer interaction

Architecture Adaptation Search Trajectories

Concluding Remarks Self-optimization is a key attribute of AC systems. Performance models and combinatorial search techniques can be useful in dynamically adjusting a system s configuration as the workload changes. 57

Thanks! Questions? www.cs.gmu.edu/faculty/menasce.html 58