Workload Characterization Techniques

Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/ 6-1

Overview! Terminology! Components and Parameter Selection! Workload Characterization Techniques: Averaging, Single Parameter Histograms, Multi-parameter Histograms, Principal Component Analysis, Markov Models, Clustering! Clustering Method: Minimum Spanning Tree, Nearest Centroid! Problems with Clustering 6-2

Terminology! User = Entity that makes the service request! Workload components: " Applications " Sites " User Sessions! Workload parameters or Workload features: Measured quantities, service requests, or resource demands. For example: transaction types, instructions, packet sizes, source-destinations of a packet, and page reference pattern. 6-3

Components and Parameter Selection! The workload component should be at the SUT interface.! Each component should represent as homogeneous a group as possible. Combining very different users into a site workload may not be meaningful.! The domain of control affects the choice of component. Example: a mail system designer is more interested in characterizing a typical mail session than a typical user session.! Do not use parameters that depend upon the system, e.g., elapsed time or CPU time. 6-4

Components (Cont)! Characteristics of service requests: " Arrival Time " Type of request or the resource demanded " Duration of the request " Quantity of the resource demanded, for example, pages of memory! Exclude those parameters that have little impact. 6-5

Workload Characterization Techniques 1. Averaging 2. Single-Parameter Histograms 3. Multi-parameter Histograms 4. Principal Component Analysis 5. Markov Models 6. Clustering 6-6

Averaging! Mean! Standard deviation! Coefficient of variation: s/x̄! Mode (for categorical variables): most frequent value! Median: 50-percentile 6-7
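
A minimal Python sketch of these basic statistics; the measured values, variable names, and use of the standard-library statistics module are illustrative assumptions, not part of the original slides.

import statistics

# Hypothetical CPU-time measurements (seconds) for one workload component
cpu_times = [2.1, 2.4, 1.9, 2.2, 8.7, 2.0, 2.3]

mean = statistics.mean(cpu_times)
sdev = statistics.stdev(cpu_times)       # sample standard deviation s
cov = sdev / mean                        # coefficient of variation s/x̄
median = statistics.median(cpu_times)    # 50-percentile

# Mode is meaningful for categorical parameters, e.g., transaction types
transactions = ["read", "write", "read", "read", "commit"]
mode = statistics.mode(transactions)

print(mean, sdev, cov, median, mode)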

Case Study: Program Usage in Educational Environments! High Coefficient of Variation 6-8

Characteristics of an Average Editing Session! Reasonable variation 6-9

Single Parameter Histograms! With n buckets per histogram, m parameters, and k components, this requires n × m × k values.! Use only if the variance is high.! Ignores correlation among parameters. 6-10
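
As a rough illustration of the n × m × k storage, the sketch below builds one n-bucket histogram per (component, parameter) pair; the component names and measurements are hypothetical. Note that the correlation between the two parameters of each component is lost, which is the limitation stated above.

import numpy as np

# Hypothetical data: k = 3 components, each measured on m = 2 parameters
components = {
    "editor":   {"cpu_time": [0.2, 0.3, 0.25, 0.9], "disk_ios": [3, 4, 2, 10]},
    "compiler": {"cpu_time": [1.5, 2.0, 1.8, 2.2],  "disk_ios": [20, 25, 22, 30]},
    "backup":   {"cpu_time": [5.0, 6.2, 5.5, 5.8],  "disk_ios": [200, 250, 220, 240]},
}

n_buckets = 5   # n buckets per histogram, so n*m*k bucket counts in total

for name, params in components.items():
    for param, values in params.items():
        counts, edges = np.histogram(values, bins=n_buckets)
        print(name, param, counts)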

Multi-parameter Histograms! Difficult to plot joint histograms for more than two parameters. 6-11

Principal Component Analysis! Key Idea: Use a weighted sum of parameters to classify the components.! Let x_ij denote the ith parameter for the jth component. Then y_j = Σ_{i=1}^{n} w_i x_ij.! Principal component analysis assigns the weights w_i such that the y_j provide the maximum discrimination among the components.! The quantity y_j is called the principal factor.! The factors are ordered: the first factor explains the highest percentage of the variance. 6-12

Principal Component Analysis (Cont)! Statistically: " The y's are linear combinations of the x's: y_i = Σ_{j=1}^{n} a_ij x_j. Here, a_ij is called the loading of variable x_j on factor y_i. " The y's form an orthogonal set, that is, their inner product is zero: ⟨y_i, y_j⟩ = Σ_k a_ik a_jk = 0 for i ≠ j. This is equivalent to stating that the y_i's are uncorrelated with each other. " The y's form an ordered set such that y_1 explains the highest percentage of the variance in resource demands. 6-13

Finding Principal Factors! Find the correlation matrix.! Find the eigenvalues of the matrix and sort them in order of decreasing magnitude.! Find the corresponding eigenvectors. These give the required loadings. 6-14
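
A small numpy sketch of this procedure; the data matrix is hypothetical and this is a generic eigen-decomposition of the correlation matrix, not the slides' worked example.

import numpy as np

# Hypothetical measurements: rows = components, columns = parameters
# (e.g., CPU time and disk I/Os for each program)
X = np.array([[2.0, 30.0], [4.5, 45.0], [1.0, 10.0], [6.0, 60.0], [3.5, 25.0]])

# Normalize each parameter to zero mean and unit standard deviation
Xn = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Correlation matrix, then eigenvalues sorted in decreasing order
C = np.corrcoef(Xn, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are the loadings; principal factors y = Xn @ loadings
Y = Xn @ eigvecs
print(eigvals / eigvals.sum())   # fraction of variance explained by each factor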

Principal Component Example 6-15

Principal Component Example! Compute the mean and standard deviations of the variables: 6-16

Principal Component (Cont)! Similarly: 6-17

Principal Component (Cont)! Normalize the variables to zero mean and unit standard deviation. The normalized values x_s' and x_r' are given by x_s' = (x_s − x̄_s)/s_s and x_r' = (x_r − x̄_r)/s_r. 6-18

Principal Component (Cont)! Compute the correlation among the variables:! Prepare the correlation matrix: 6-19

Principal Component (Cont)! Compute the eigenvalues of the correlation matrix by solving the characteristic equation det(C − λI) = 0.! The eigenvalues are 1.916 and 0.084. 6-20

Principal Component (Cont)! Compute the eigenvectors of the correlation matrix. The eigenvector q_1 corresponding to λ_1 = 1.916 is defined by the relationship C q_1 = λ_1 q_1, which gives q_11 = q_21. 6-21

Principal Component (Cont)! Restricting the length of the eigenvectors to one:! Obtain the principal factors by multiplying the eigenvectors by the normalized vectors:! Compute the values of the principal factors.! Compute the sum and sum of squares of the principal factors. 6-22
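
A quick numeric check of this example: the correlation matrix here is 2 × 2, of the form [[1, r], [r, 1]], whose eigenvalues are 1 ± r, so the stated eigenvalues 1.916 and 0.084 correspond to r ≈ 0.916 (an inferred value, since the raw data are not reproduced above).

import numpy as np

r = 0.916                      # assumed, inferred from the eigenvalues 1 + r and 1 - r
C = np.array([[1.0, r], [r, 1.0]])

eigvals, eigvecs = np.linalg.eigh(C)   # returned in ascending order
print(eigvals)    # approximately [0.084, 1.916]
print(eigvecs)    # the eigenvector for 1.916 is proportional to (1, 1)/sqrt(2),
                  # i.e., q11 = q21, with its length restricted to one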

Principal Component (Cont)! The sum must be zero.! The sum of squares gives the percentage of variation explained. 6-23

Principal Component (Cont)! The first factor explains 32.565/(32.565+1.435) or 95.7% of the variation.! The second factor explains only 4.3% of the variation and can, thus, be ignored. 6-24

Markov Models! Markov assumption: the next request depends only on the last request.! Described by a transition matrix.! Transition matrices can also be used for application transitions, e.g., P(Link | Compile).! Used to specify page-reference locality: P(Reference module i | Referenced module j). 6-25

Transition Probability! Given the same relative frequency of requests of different types, it is possible to realize that frequency with several different transition matrices.! If order is important, measure the transition probabilities directly on the real system.! Example: two packet sizes: small (80%), large (20%). " One possibility: an average of four small packets is followed by one big packet, e.g., ssssbssssbssss. 6-26

Transition Probability (Cont) " Another possibility: eight small packets followed by two big packets.! Or generate the sizes randomly: generate a random number x; if x < 0.8, generate a small packet; otherwise generate a large packet. 6-27
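
The sketch below contrasts the two approaches. The transition probabilities are an assumption consistent with the repeating ssssb pattern (P(small→small) = 0.75, P(small→big) = 0.25, P(big→small) = 1), not values taken from the slides; both generators produce roughly 80% small packets, but only the Markov one preserves the ordering.

import random

# Assumed transition matrix consistent with the "ssssb" pattern
P = {
    "s": {"s": 0.75, "b": 0.25},
    "b": {"s": 1.00, "b": 0.00},
}

def markov_sequence(length, start="s"):
    # Next packet size depends only on the last one
    seq, state = [start], start
    for _ in range(length - 1):
        state = "s" if random.random() < P[state]["s"] else "b"
        seq.append(state)
    return "".join(seq)

def independent_sequence(length):
    # Ignores ordering: x < 0.8 -> small packet, otherwise large packet
    return "".join("s" if random.random() < 0.8 else "b" for _ in range(length))

print(markov_sequence(20))
print(independent_sequence(20))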

Clustering 6-28

Clustering Steps 1. Take a sample, that is, a subset of workload components. 2. Select workload parameters. 3. Select a distance measure. 4. Remove outliers. 5. Scale all observations. 6. Perform clustering. 7. Interpret results. 8. Change parameters, or number of clusters, and repeat steps 3-7. 9. Select representative components from each cluster. 6-29

1. Sampling! In one study, 2% of the population was chosen for analysis; later 99% of the population could be assigned to the clusters obtained.! Random selection! Select top consumers of a resource. 6-30

2. Parameter Selection! Criteria: " Impact on performance " Variance! Method: Redo clustering with one less parameter.! Principal component analysis: Identify parameters with the highest variance. 6-31

3. Transformation! If the distribution is highly skewed, consider a function of the parameter, e.g., log of CPU time 6-32

4. Outliers! Outliers = data points with extreme parameter values.! They affect normalization.! Can be excluded only if they do not consume a significant portion of the system resources. Example: a backup program. 6-33

5. Data Scaling 1. Normalize to zero mean and unit variance: x_ik' = (x_ik − x̄_k)/s_k 2. Weights: x_ik' = w_k x_ik, where w_k reflects the relative importance of the kth parameter, or w_k = 1/s_k 3. Range normalization: x_ik' = (x_ik − x_min,k)/(x_max,k − x_min,k); affected by outliers. 6-34
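
A short numpy sketch of these scaling options; the data matrix is hypothetical and the formulas follow the definitions above.

import numpy as np

# Hypothetical parameter matrix: rows = components, columns = parameters
X = np.array([[2.0, 300.0], [4.0, 150.0], [3.0, 600.0], [5.0, 450.0]])

# 1. Zero mean and unit variance: (x_ik - mean_k) / s_k
z_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Weights: w_k * x_ik, with w_k set by relative importance (here w_k = 1/s_k)
weighted = X / X.std(axis=0, ddof=1)

# 3. Range normalization: (x_ik - min_k) / (max_k - min_k); sensitive to outliers
ranged = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(z_scaled, weighted, ranged, sep="\n\n")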

Data Scaling (Cont) 4. Percentile Normalization: 6-35

Distance Metric 1. Euclidean Distance: Given {x_i1, x_i2, …, x_in} and {x_j1, x_j2, …, x_jn}, d = [Σ_{k=1}^{n} (x_ik − x_jk)²]^0.5 2. Weighted-Euclidean Distance: d = [Σ_{k=1}^{n} a_k (x_ik − x_jk)²]^0.5 Here a_k, k = 1, 2, …, n, are suitably chosen weights for the n parameters. 3. Chi-Square Distance: 6-36

Distance Metric (Cont)! The Euclidean distance is the most commonly used distance metric.! The weighted Euclidean distance is used if the parameters have not been scaled or if the parameters have significantly different levels of importance.! Use the chi-square distance only if the x_.k's are close to each other; parameters with low values of x_.k get higher weights. 6-37
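
A minimal Python sketch of the Euclidean and weighted-Euclidean distances (the chi-square variant is omitted since its exact form is not reproduced above); the component values and weights are hypothetical.

import math

def euclidean(xi, xj):
    # d = [sum over k of (x_ik - x_jk)^2]^0.5
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def weighted_euclidean(xi, xj, weights):
    # d = [sum over k of a_k (x_ik - x_jk)^2]^0.5
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xj)))

# Two hypothetical components described by (CPU time, disk I/Os)
A, B = (2.0, 30.0), (3.0, 45.0)
print(euclidean(A, B))
print(weighted_euclidean(A, B, weights=(1.0, 0.01)))   # down-weight disk I/Os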

Clustering Techniques! Goal: Partition into groups so the members of a group are as similar as possible and different groups are as dissimilar as possible.! Statistically, the intra-group variance should be as small as possible, and the inter-group variance should be as large as possible. Total Variance = Intra-group Variance + Inter-group Variance 6-38
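
This variance decomposition can be checked numerically; in the sketch below the two clusters and their values are hypothetical, and sums of squares are used as the (unnormalized) variance terms.

import numpy as np

# Hypothetical one-dimensional parameter values, already grouped into two clusters
groups = [np.array([1.0, 2.0, 3.0]), np.array([7.0, 8.0, 9.0])]
alldata = np.concatenate(groups)
grand_mean = alldata.mean()

total = ((alldata - grand_mean) ** 2).sum()                          # total variation
intra = sum(((g - g.mean()) ** 2).sum() for g in groups)             # within groups
inter = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # between groups

print(total, intra + inter)   # the two values agree: total = intra + inter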

Clustering Techniques (Cont)! Nonhierarchical techniques: Start with an arbitrary set of k clusters and move members until the intra-group variance is minimized.! Hierarchical techniques: " Agglomerative: Start with n clusters and merge. " Divisive: Start with one cluster and divide.! Two popular techniques: " Minimum spanning tree method (agglomerative) " Centroid method (divisive) 6-39

Minimum Spanning Tree Clustering Method 1. Start with k = n clusters. 2. Find the centroid of the ith cluster, i = 1, 2, …, k. 3. Compute the inter-cluster distance matrix. 4. Merge the two nearest clusters. 5. Repeat steps 2 through 4 until all components are part of one cluster. 6-40

Minimum Spanning Tree Example! Step 1: Consider five clusters, with the ith cluster consisting solely of the ith program.! Step 2: The centroids are {2, 4}, {3, 5}, {1, 6}, {4, 3}, and {5, 2}. 6-41

Spanning Tree Example (Cont)! Step 3: Compute the (squared) Euclidean distance between each pair of centroids.! Step 4: Minimum inter-cluster distance = 2. Merge A+B, D+E. 6-42

Spanning Tree Example (Cont)! Step 2: The centroid of cluster pair AB is {(2+3)/2, (4+5)/2}, that is, {2.5, 4.5}. Similarly, the centroid of pair DE is {4.5, 2.5}. 6-43

Spanning Tree Example (Cont)! Step 3: The distance matrix is:! Step 4: Merge AB and C.! Step 2: The centroid of cluster ABC is {(2+3+1)/3, (4+5+6)/3}, that is, {2, 5}. 6-44

Spanning Tree Example (Cont)! Step 3: The distance matrix is:! Step 4: The minimum distance is 12.5. Merge ABC and DE into a single cluster ABCDE. 6-45
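
The whole agglomerative procedure above can be sketched in a few lines of Python (a minimal illustration, not the slides' implementation). It merges one nearest pair of clusters per iteration using squared Euclidean distances between centroids and, for the five example programs, reproduces the merge sequence above: A+B and D+E at distance 2, then AB+C at 4.5, then ABC+DE at 12.5.

import itertools

# The five example programs and their two measured parameters from the slides
points = {"A": (2, 4), "B": (3, 5), "C": (1, 6), "D": (4, 3), "E": (5, 2)}

def centroid(cluster):
    xs = [points[p][0] for p in cluster]
    ys = [points[p][1] for p in cluster]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

clusters = [{p} for p in points]                  # step 1: k = n singleton clusters
while len(clusters) > 1:
    cents = [centroid(c) for c in clusters]       # step 2: centroids
    (i, j), d = min(                              # step 3: inter-cluster distances
        ((pair, sq_dist(cents[pair[0]], cents[pair[1]]))
         for pair in itertools.combinations(range(len(clusters)), 2)),
        key=lambda t: t[1])
    merged = clusters[i] | clusters[j]            # step 4: merge the nearest pair
    print("merge", sorted(merged), "at squared distance", d)
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]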

Dendrogram! Dendrogram = Spanning Tree! Purpose: Obtain clusters for any given maximum allowable intra-cluster distance. 6-46

Nearest Centroid Method 1. Start with k = 1. 2. Find the centroid and intra-cluster variance for the ith cluster, i = 1, 2, …, k. 3. Find the cluster with the highest variance and arbitrarily divide it into two clusters: " Find the two components that are farthest apart and assign the other components according to their distance from these points. " Alternately, place all components below the centroid in one cluster and all components above this hyperplane in the other. 4. Adjust the points in the two new clusters until the inter-cluster distance between the two clusters is maximum. 5. Set k = k + 1. Repeat steps 2 through 4 until k = n. 6-47

Cluster Interpretation! Assign all measured components to the clusters.! Clusters with very small populations and small total resource demands can be discarded. (Do not discard a cluster merely because its population is small.)! Interpret clusters in functional terms, e.g., a business application, or label clusters by their resource demands, for example, CPU-bound, I/O-bound, and so forth.! Select one or more representative components from each cluster for use as the test workload. 6-48

Problems with Clustering 6-49

Problems with Clustering (Cont)! Goal: Minimize variance.! The results of clustering are highly variable. No rules for: " Selection of parameters " Distance measure " Scaling! Labeling each cluster by functionality is difficult. " In one study, editing programs appeared in 23 different clusters.! Requires many repetitions of the analysis. 6-50

Summary! Workload Characterization = Models of workloads! Averaging, single-parameter histograms, multi-parameter histograms! Principal component analysis consists of finding parameter combinations that explain the most variation! Clustering: divide workloads into groups that can be represented by a single benchmark 6-51

Exercise 6.1! The CPU time and disk I/Os of seven programs are shown in the table below. Determine the equation for the principal factors. 6-52

Exercise 6.2! Using a spanning-tree algorithm for cluster analysis, prepare a dendrogram for the data shown in the table below. Interpret the result of your analysis. 6-53

Homework! Read Chapter 6.! Submit answers to Exercises 6.1 and 6.2. 6-55