Mean Value Analysis and Related Techniques

Raj Jain
Washington University in Saint Louis
Saint Louis, MO 63130
Jain@cse.wustl.edu
Audio/Video recordings of this lecture are available at:

Overview
1. Analysis of Open Queueing Networks
2. Mean-Value Analysis
3. Approximate MVA
4. Balanced Job Bounds

Analysis of Open Queueing Networks
Open queueing networks are used to represent transaction processing systems, such as airline reservation systems or banking systems.
The transaction arrival rate does not depend on the load on the computer system.
Arrivals are modeled as a Poisson process with a mean arrival rate $\lambda$.
Such systems can be analyzed exactly.
Assumption: All devices in the system can be modeled as either fixed-capacity service centers (single server with exponentially distributed service time) or delay centers (infinite servers with exponentially distributed service time).

Analysis of Open Queueing Networks (Cont)
For all fixed-capacity service centers in an open queueing network, the response time is: $R_i = S_i(1 + Q_i)$
On arrival at the $i$th device, the job sees $Q_i$ jobs ahead (including the one in service) and expects to wait $Q_i S_i$ seconds.
Including its own service, the job should expect a total response time of $S_i(1 + Q_i)$.
Assumption: Service is memoryless. This assumption is not operationally testable, so the equation is not an operational law.
Without the memoryless assumption, we would also need to know the time that the job currently in service has already consumed.

Mean Performance
Assuming job flow balance, the throughput of the system is equal to the arrival rate: $X = \lambda$
The throughput of the $i$th device, using the forced flow law, is: $X_i = X V_i$
The utilization of the $i$th device, using the utilization law, is: $U_i = X_i S_i = X V_i S_i = \lambda D_i$
The queue length of the $i$th device, using Little's law, is: $Q_i = X_i R_i = X_i S_i (1 + Q_i) = U_i (1 + Q_i)$
Solving for $Q_i$ gives: $Q_i = U_i / (1 - U_i)$
Notice that the above equation for $Q_i$ is identical to the equation for M/M/1 queues.

Mean Performance (Cont)
The device response times are: $R_i = S_i (1 + Q_i) = S_i / (1 - U_i)$
In delay centers, there are infinite servers and, therefore: $R_i = S_i$ and $Q_i = X_i R_i = X V_i S_i = U_i$
Notice that the utilization of the delay center represents the mean number of jobs receiving service and does not need to be less than one.

Example 34.1
A file server consists of a CPU and two disks, A and B, and serves 6 client systems.
Device utilizations are obtained using the utilization law, $U_i = X D_i$.
The device response times are obtained using Equation 34.2, giving a server response time of $R = 1.406$.
We can quantify the impact of the following changes:

Example 34.1 (Cont)
Q: What if we increase the number of clients to 8?
A: The request arrival rate will go up by a factor of 8/6.
Conclusion: Server response time will degrade by a factor of 6.76/1.406 = 4.8.

Example 34.1 (Cont)
Q: What if we use a cache for disk B with a hit rate of 50%, although it increases the CPU overhead by 30% and the disk B service time (per I/O) by 10%?
A: The number of visits to disk B is halved, the CPU service time increases by a factor of 1.3, and the disk B service time per I/O increases by a factor of 1.1.
Repeating the analysis for the changed system gives a server response time of 1.013.
Thus, if we use a cache for disk B, the server response time will improve by (1.406 - 1.013)/1.406 = 28%.

Example 34.1 (Cont)
Q: What if we have a lower-cost server with only one disk (disk A) and direct all I/O requests to it?
A: The server response time will degrade by a factor of 3.31/1.406 = 2.35.
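The what-if questions of Example 34.1 can be replayed with a few lines of Python. This is only a sketch: the helper name `open_network` is made up for illustration, and the measured inputs (throughput of 3 requests per second, visit counts of 16, 7, and 8, and service times of 0.01, 0.02, and 0.03 seconds) are assumptions, since the example's data does not appear in the text above. They were chosen because they reproduce the response times quoted in the example (1.406, 6.76, 1.013, and 3.31).

```python
def open_network(arrival_rate, demands):
    """Mean response time of an open network of fixed-capacity centers.

    demands maps a device name to its total service demand D_i = V_i * S_i.
    Returns (utilizations, system response time).
    """
    utilizations = {}
    response = 0.0
    for name, demand in demands.items():
        u = arrival_rate * demand            # utilization law: U_i = X * D_i
        if u >= 1.0:
            raise ValueError(f"{name} is saturated (U = {u:.2f})")
        utilizations[name] = u
        response += demand / (1.0 - u)       # V_i * R_i = D_i / (1 - U_i)
    return utilizations, response


# ASSUMED inputs: not given in the slide text, chosen to match its results.
X = 3.0                                      # requests per second, 6 clients
visits = {"CPU": 16, "A": 7, "B": 8}
service = {"CPU": 0.01, "A": 0.02, "B": 0.03}
base = {d: visits[d] * service[d] for d in visits}

_, r_base = open_network(X, base)

# 8 clients: the arrival rate scales by 8/6, the demands are unchanged.
_, r_8 = open_network(X * 8 / 6, base)

# Cache on disk B: half the visits, CPU time x1.3, disk B time per I/O x1.1.
cached = {
    "CPU": visits["CPU"] * service["CPU"] * 1.3,
    "A": base["A"],
    "B": (visits["B"] * 0.5) * (service["B"] * 1.1),
}
_, r_cache = open_network(X, cached)

# One disk only: all I/O visits go to disk A.
one_disk = {"CPU": base["CPU"], "A": (visits["A"] + visits["B"]) * service["A"]}
_, r_one = open_network(X, one_disk)

for label, r in [("base", r_base), ("8 clients", r_8),
                 ("cache on B", r_cache), ("one disk", r_one)]:
    print(f"{label:12s} R = {r:.3f}")
```

Working with demands rather than separate visit counts and service times keeps the sketch short; the system response time depends only on the products $V_i S_i$.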

Mean-Value Analysis (MVA)
Mean-value analysis (MVA) allows solving closed queueing networks in a manner similar to that used for open queueing networks.
It gives the mean performance. The variance cannot be computed using this technique.
Initially, the discussion is limited to fixed-capacity service centers. Delay centers and load-dependent service centers are considered later.
Given a closed queueing network with $N$ jobs: $R_i(N) = S_i (1 + Q_i(N-1))$
Here, $Q_i(N-1)$ is the mean queue length at the $i$th device with $N-1$ jobs in the network.
This equation assumes that the service is memoryless.

MVA (Cont)
Since the performance with no users ($N = 0$) can be easily computed, performance for any number of users can be computed iteratively.
Given the response times at individual devices, the system response time using the general response time law is: $R(N) = \sum_{i=1}^{M} V_i R_i(N)$
The system throughput using the interactive response time law is: $X(N) = N / (R(N) + Z)$

MVA (Cont)
The device throughputs measured in terms of jobs per second are: $X_i(N) = X(N) V_i$
The device queue lengths with $N$ jobs in the network, using Little's law, are: $Q_i(N) = X_i(N) R_i(N) = X(N) V_i R_i(N)$
The response time equation for delay centers is simply: $R_i(N) = S_i$
The earlier equations for device throughputs and queue lengths apply to delay centers as well.
Initial condition: $Q_i(0) = 0$

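Example 34.2
Consider a timesharing system.
Each user request makes ten I/O requests to disk A and five I/O requests to disk B.
The service times per visit to disk A and disk B are 300 and 200 milliseconds, respectively.
Each request takes two seconds of CPU time, and the user think time is four seconds.
The service demands are therefore $D_{CPU} = 2$, $D_A = 10 \times 0.3 = 3$, and $D_B = 5 \times 0.2 = 1$ seconds, with $Z = 4$ seconds.

Example 34.2 (Cont)
Initialization:
Number of users: $N = 0$
Device queue lengths: $Q_{CPU} = 0$, $Q_A = 0$, $Q_B = 0$
Iteration 1:
Number of users: $N = 1$
Device response times: with empty queues, $R_i(1) = S_i$, so the total time per request at each device is $V_i R_i(1) = D_i$, that is, 2 seconds at the CPU, 3 seconds at disk A, and 1 second at disk B.
System response time: $R(1) = 2 + 3 + 1 = 6$

Example 34.2 (Cont)
System throughput: $X(1) = N/(R + Z) = 1/(6 + 4) = 0.1$
Device queue lengths: $Q_i(1) = X(1) V_i R_i(1)$, so $Q_{CPU} = 0.2$, $Q_A = 0.3$, $Q_B = 0.1$
Iteration 2:
Number of users: $N = 2$
Device response times: $V_i R_i(2) = D_i (1 + Q_i(1))$, so the total time per request is $2 \times 1.2 = 2.4$ at the CPU, $3 \times 1.3 = 3.9$ at disk A, and $1 \times 1.1 = 1.1$ at disk B.

Example 34.2 (Cont)
System response time: $R(2) = 2.4 + 3.9 + 1.1 = 7.4$
System throughput: $X(2) = N/(R + Z) = 2/(7.4 + 4) = 0.175$
Device queue lengths: $Q_i(2) = X(2) V_i R_i(2)$, so $Q_{CPU} \approx 0.421$, $Q_A \approx 0.684$, $Q_B \approx 0.193$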

MVA Results for Example 34.2
[Figure: two plots against Number of Users (0 to 40): System Throughput (Jobs/sec) and Response Time in Seconds.]

Box 34.1: MVA Algorithms
Inputs:
$N$ = number of users
$Z$ = think time
$M$ = number of devices
$S_i$ = service time per visit to the $i$th device
$V_i$ = number of visits to the $i$th device
Outputs:
$X$ = system throughput
$Q_i$ = average number of jobs at the $i$th device
$R_i$ = response time of the $i$th device
$R$ = system response time
$U_i$ = utilization of the $i$th device
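The MVA recursion, with the inputs and outputs listed in Box 34.1, can be sketched in Python as follows. The function name `mva`, the dictionary-based layout, and the optional `delay` argument are illustrative choices, not part of the original box. The usage lines apply it to the timesharing system of Example 34.2; splitting the two seconds of CPU time into 16 visits of 0.125 seconds each is an assumption (only the product $V_i S_i$ affects $X$, $R$, $Q_i$, and $U_i$).

```python
def mva(n_users, think_time, service, visits, delay=()):
    """Exact MVA for a closed network of fixed-capacity and delay centers.

    service, visits: dicts mapping device name -> S_i and V_i.
    delay: names of devices to be treated as delay centers.
    Returns one result dict per population N = 1 .. n_users.
    """
    devices = list(service)
    q = {d: 0.0 for d in devices}                    # Q_i(0) = 0
    results = []
    for n in range(1, n_users + 1):
        # R_i(N) = S_i (1 + Q_i(N-1)); for delay centers R_i(N) = S_i
        r = {d: service[d] if d in delay else service[d] * (1.0 + q[d])
             for d in devices}
        r_sys = sum(visits[d] * r[d] for d in devices)  # general response time law
        x = n / (think_time + r_sys)                 # interactive response time law
        q = {d: x * visits[d] * r[d] for d in devices}  # Little's law
        u = {d: x * visits[d] * service[d] for d in devices}  # utilization law
        results.append({"N": n, "X": x, "R": r_sys, "R_i": r, "Q": q, "U": u})
    return results


# Example 34.2 (the CPU visit count and service time are an assumed split
# of the 2 seconds of CPU time per request).
service = {"CPU": 0.125, "A": 0.3, "B": 0.2}
visits = {"CPU": 16, "A": 10, "B": 5}
results = mva(2, 4.0, service, visits)
for row in results:
    print(row["N"], round(row["R"], 3), round(row["X"], 3))
```

Each iteration needs only the queue lengths from the previous population, so the cost grows linearly with both the number of users and the number of devices.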

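Quiz 34A: MVA
Formulas (with $Q_i$ taken from the previous row): $R_i = S_i (1 + Q_i)$, $R = \sum_{i=1}^{M} R_i V_i$, $X = N/(Z + R)$, $Q_i = X V_i R_i$
Think time: $Z = 5$ s

| Device  | $V_i$ | $S_i$           |
|---------|-------|-----------------|
| CPU (C) | 25    | 0.04 s (40 ms)  |
| Disk A  | 20    | 0.03 s (30 ms)  |
| Disk B  | 4     | 0.025 s (25 ms) |

Part 1: Fill in the rows for N=0 and N=1 only.

| N | $R_C$ | $R_A$ | $R_B$ | $V_C R_C$ | $V_A R_A$ | $V_B R_B$ | $R$ | $R+Z$ | $X$ | $Q_C$ | $Q_A$ | $Q_B$ |
|---|-------|-------|-------|-----------|-----------|-----------|-----|-------|-----|-------|-------|-------|
| 0 |       |       |       |           |           |           |     |       |     |       |       |       |
| 1 |       |       |       |           |           |           |     |       |     |       |       |       |
| 2 |       |       |       |           |           |           |     |       |     |       |       |       |

Part 2: Fill in the row for N=2.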

MVA Assumptions
MVA is applicable only if the network is a product-form network.
This means that the network should satisfy the conditions of job flow balance, one-step behavior, and device homogeneity.
It also assumes that all service centers are either fixed-capacity service centers or delay centers.
In both cases, we assumed exponentially distributed service times.

Homework 34A
For a timesharing system with two disks (user and system), the probabilities that a job completing service at the CPU goes to disk A, disk B, or the terminals were found to be 0.75, 0.15, and 0.1, respectively.
The user think time was measured to be 5 seconds, the disk service times are 30 milliseconds and 25 milliseconds, and the average service time per visit to the CPU was 40 milliseconds.
Using the queueing network model shown in Figure 32.8, use MVA to compute the system throughput and response time for N = 1, ..., 5 interactive users.

Balanced Job Bounds
A system without a bottleneck device is called a balanced system.
A balanced system has better performance than a similar unbalanced system.
This allows getting two-sided bounds on performance.
An unbalanced system's performance can always be improved by replacing the bottleneck device with a faster device.
Balanced system: Total service time demands on all devices are equal.

Balanced Job Bounds (Cont)
Thus, the response time and throughput of a time-sharing system can be bounded as follows:
Here, $D_{avg} = D/M$ is the average service demand per device.
These equations are known as balanced job bounds.
These bounds are very tight in that the upper and lower bounds are very close to each other and to the actual performance.
For batch systems, the bounds can be obtained by substituting $Z = 0$.

Balanced Job Bounds (Cont)
Assumption: All service centers except terminals are fixed-capacity service centers.
Terminals are represented by delay centers.
No other delay centers are allowed because the presence of delay centers invalidates several arguments related to $D_{max}$ and $D_{avg}$.

Derivation of Balanced Job Bounds
Steps:
1. Derive an expression for the throughput and response time of a balanced system.
2. Given an unbalanced system, construct a corresponding "best case" balanced system such that the number of devices is the same and the sum of demands is identical in the balanced and unbalanced systems. This produces the upper bounds on the performance.
3. Construct a corresponding "worst case" balanced system such that each device has a demand equal to the bottleneck and the number of devices is adjusted to make the sum of demands identical in the balanced and unbalanced systems. This produces the lower bounds on performance.
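One way the three steps could be carried out is sketched below in LaTeX. This is a reconstruction under stated assumptions rather than the derivation from the original slides: it uses the MVA recursion $V_i R_i(N) = D_i (1 + Q_i(N-1))$, the symmetry of a balanced system (each of the $M$ devices holds $Q(N)/M$ jobs), and the crude limits $D \le R(N-1) \le (N-1) D$ for $N \ge 2$. The subscript "bal" is a label introduced here for the balanced system.

```latex
% Step 1: balanced system with M devices, each with demand D_avg = D/M.
% Total number of jobs at the devices: Q(N) = X(N) R(N) = N R(N) / (Z + R(N)).
\begin{align*}
R_{bal}(N) &= \sum_{i=1}^{M} D_{avg}\left(1 + \frac{Q(N-1)}{M}\right)
            = D + D_{avg}\, Q(N-1) \\
           &= D + (N-1)\, D_{avg}\, \frac{R_{bal}(N-1)}{Z + R_{bal}(N-1)}
\end{align*}
% x / (Z + x) increases with x, and D <= R(N-1) <= (N-1) D, so:
\begin{align*}
D + (N-1)\, D_{avg}\, \frac{D}{D+Z}
  \;\le\; R_{bal}(N) \;\le\;
D + (N-1)\, D_{avg}\, \frac{(N-1)D}{(N-1)D + Z}
\end{align*}
% Step 2 (best case, demand D_avg per device) gives the lower bound on R(N).
% Step 3 (worst case, demand D_max per device) gives the upper bound on R(N).
\begin{align*}
D + (N-1)\, D_{avg}\, \frac{D}{D+Z}
  \;\le\; R(N) \;\le\;
D + (N-1)\, D_{max}\, \frac{(N-1)D}{(N-1)D + Z}
\end{align*}
```

Bounds on throughput then follow from the interactive response time law, $X(N) = N/(Z + R(N))$: an upper bound on $R(N)$ gives a lower bound on $X(N)$ and vice versa.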

Example 34.4
For the timesharing system of Example 34.2:
$D_{CPU} = 2$, $D_A = 3$, $D_B = 1$, $Z = 4$
$D = D_{CPU} + D_A + D_B = 2 + 3 + 1 = 6$
$D_{avg} = D/3 = 2$
$D_{max} = D_A = 3$
The balanced job bounds are obtained by substituting these values into the bound expressions.

Example 34.4 (Cont)
[Figure: two plots against # of Users (0 to 20), each comparing MVA and BJB curves: Throughput and Response Time.]
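The comparison between MVA and the balanced job bounds for this example can be sketched in Python. The bound expressions coded below are an assumption: they are a commonly quoted form of the balanced job bounds, combined with the asymptotic bound $R(N) \ge N D_{max} - Z$, and are not taken from the slide text. The function names are illustrative.

```python
def mva_response(n_users, think_time, demands):
    """System response time R(N) from exact MVA, using service demands D_i."""
    q = [0.0] * len(demands)                       # Q_i(0) = 0
    r_sys = 0.0
    for n in range(1, n_users + 1):
        resid = [d * (1.0 + qi) for d, qi in zip(demands, q)]  # V_i R_i(N)
        r_sys = sum(resid)
        x = n / (think_time + r_sys)
        q = [x * ri for ri in resid]               # Little's law
    return r_sys


def balanced_job_bounds(n, think_time, demands):
    """Return (R_low, R_high, X_low, X_high) for n >= 1 users.

    ASSUMED form of the bounds (not recovered from the slides):
      R_low  = max(N*D_max - Z, D + (N-1)*D_avg*D/(D+Z))
      R_high = D + (N-1)*D_max*(N-1)*D/((N-1)*D + Z)
    """
    d = sum(demands)
    d_max = max(demands)
    d_avg = d / len(demands)
    # Fraction of the other N-1 jobs assumed to be at the devices.
    frac_low = d / (d + think_time)
    frac_high = (n - 1) * d / ((n - 1) * d + think_time) if n > 1 else 0.0
    r_low = max(n * d_max - think_time, d + (n - 1) * d_avg * frac_low)
    r_high = d + (n - 1) * d_max * frac_high
    # Throughput bounds via X = N / (Z + R); note N/(Z + N*D_max - Z) = 1/D_max.
    return r_low, r_high, n / (think_time + r_high), n / (think_time + r_low)


demands = [2.0, 3.0, 1.0]      # D_CPU, D_A, D_B from Example 34.4
Z = 4.0
table = []
for n in range(1, 21):
    r_low, r_high, x_low, x_high = balanced_job_bounds(n, Z, demands)
    r_mva = mva_response(n, Z, demands)
    table.append((n, r_low, r_mva, r_high))
    print(f"N={n:2d}  {r_low:7.3f} <= R={r_mva:7.3f} <= {r_high:7.3f}")
```

For $N = 2$ the sketch brackets the MVA value of 7.4 seconds from Example 34.2 between 7.2 and 7.8 seconds.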

Summary
1. Open queueing networks of M/M/1 or M/M/∞ service centers can be analyzed exactly.
2. MVA allows exact analysis of closed queueing networks. Given the performance with N-1 users, it gives the performance with N users.
3. Balanced job bounds: A balanced system with $D_i = D_{avg}$ will have better performance, and an unbalanced system with some devices at $D_{max}$ and others at 0 will have worse performance.

Homework 34C
For the data of Homework 34A: Write the expressions for balanced job bounds on the system throughput and response time, and compute the bounds for up to N=30 users.