Markov Chain Analysis Example


1 Markov Chain Analysis Example
Example: single server with a queue, the system holding at most 3 requests
λ denotes the arrival rate, μ denotes the service rate
Steady-state analysis: mapping of the Markov chain onto a set of linear equations
state 3: λ p2 = μ p3
state 2: λ p1 + μ p3 = λ p2 + μ p2
state 1: λ p0 + μ p2 = λ p1 + μ p1
steady state implies that arrivals = completions
state 0: λ p0 = μ p1
normalization equation: p0 + p1 + p2 + p3 = 1
resolution of the system of equations to derive the state probabilities:
p0 = 1 / (1 + λ/μ + (λ/μ)² + (λ/μ)³)
p1 = (λ/μ) / (1 + λ/μ + (λ/μ)² + (λ/μ)³); p2 = ...
derivation of mean values and distribution functions from the state probabilities:
utilization: U = 1 − p0; mean number of jobs in system: N = p1 + 2 p2 + 3 p3; blocking probability B; distribution functions for waiting time, response time, etc.
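The balance equations above can be checked numerically. The following sketch computes the state probabilities of the 3-request system; the rates λ = 2/s and μ = 3/s are illustrative values, not the slide's:

```python
# Steady-state probabilities of the birth-death chain above (M/M/1 with at
# most K requests in the system): p_i = (lam/mu)^i * p_0.
def mm1k_steady_state(lam, mu, K=3):
    rho = lam / mu
    norm = sum(rho**i for i in range(K + 1))      # normalization: sum of p_i = 1
    return [rho**i / norm for i in range(K + 1)]

lam, mu = 2.0, 3.0                                # illustrative rates
p = mm1k_steady_state(lam, mu)
U = 1 - p[0]                                      # utilization
N = sum(i * p[i] for i in range(len(p)))          # mean number of jobs in system
B = p[-1]                                         # blocking probability
```

With λ/μ = 2/3 this yields p0 = 27/65, so U = 38/65 and N = 66/65.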

2 Queuing Network Analysis Example
Example: communication system (unbounded queues)
[Figure: open queuing network with stations for the protocol stacks, the processors, and the physical medium/network, annotated with the packet arrival rate and the per-station service demands (MI, MIPS, Mbps)]
Results (assuming exponential service times and arrivals):
utilization ρi = λ/μi per station (e.g. physical network utilization = 8 %)
population per station ni = ρi / (1 − ρi)
mean total population N = Σi ni = 7.9
mean response time T = N/λ = 96 msec (Little's law)
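Under the exponential assumptions above, each unbounded station behaves like an M/M/1 queue, so the per-station populations and the total response time follow directly. The arrival and service rates below are illustrative placeholders, not the slide's values:

```python
# Open network of independent M/M/1 stations: mean population per station is
# rho/(1-rho); the total response time follows from Little's law, T = N/lam.
def mm1_population(lam, mu):
    rho = lam / mu                      # utilization of the station
    return rho / (1 - rho)              # mean population n_i of the station

lam = 25.0                              # arrival rate in packets/s (illustrative)
service_rates = [50.0, 40.0, 125.0]     # per-station service rates (illustrative)
N = sum(mm1_population(lam, mu) for mu in service_rates)
T = N / lam                             # mean response time (Little's law)
```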

3 Discrete-event Simulation
Idea: model the relevant events of the system; process the events according to their temporal order (similar to the real execution, with the exception that only simple processing blocks are modeled)
Example: execution on a two-processor system (non-preemptive)
[Figure: task graph annotated with execution time / process priority per node, and the resulting schedule on the two processors; legend: green = execution time, blue = process priority]
Discussion:
no assumptions
danger of including too many details
evaluation is time consuming
problems with rare events
large set of tools and models available
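A minimal sketch of the event-processing idea, using a priority queue as the future-event list; the single-server model and the deterministic arrival gap and service time are illustrative assumptions:

```python
import heapq

# Minimal discrete-event simulation of a single-server queue: events are
# (time, kind) pairs processed in temporal order via a priority queue.
def simulate(arrival_gap, service_time, horizon=100.0):
    events = [(0.0, "arrival")]            # future-event list
    queue, busy, completed = 0, False, 0
    while events:
        t, kind = heapq.heappop(events)    # next event in temporal order
        if t > horizon:
            break
        if kind == "arrival":
            heapq.heappush(events, (t + arrival_gap, "arrival"))
            if busy:
                queue += 1                 # server occupied: wait in queue
            else:
                busy = True
                heapq.heappush(events, (t + service_time, "departure"))
        else:                              # departure: service completed
            completed += 1
            if queue:
                queue -= 1
                heapq.heappush(events, (t + service_time, "departure"))
            else:
                busy = False
    return completed

done = simulate(arrival_gap=2.0, service_time=1.0)   # deterministic toy run
```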

4 Comparison of Methods
[Table: comparison of schedulability analysis (utilization-based and response-time-based), Markov chains, queuing networks, operational analysis, simulation, and measurements along the criteria best-case / worst-case / average analysis, analysis of multiprocessor systems, verification of real-time requirements, and modelling of parallel processing with precedence constraints (process graph / task graph)]

5 Heuristic Search
Most heuristics are based on an iterative search comprising the following elements:
selection of an initial (intermediate) solution (e.g. a sequence)
evaluation of the quality of the intermediate solution
check of termination criteria
[Flowchart: select initial solution → select next solution based on the previous solution (search strategy) → evaluate quality → if the acceptance criteria are satisfied, accept the solution as the best solution so far → repeat until the termination criteria are satisfied]

6 Hill-Climbing Discussion
simple
local optimization only: the algorithm is not able to pass through a valley to finally reach a higher peak
the idea is applicable only to small parts of optimization algorithms and needs to be complemented with other strategies to overcome local optima
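A sketch of plain hill-climbing on a toy integer landscape, illustrating the valley problem discussed above; the objective function and bounds are invented for illustration:

```python
# Hill-climbing on the integers: always move to the best improving neighbor,
# stop when none improves. Gets stuck at a local optimum left of the valley.
def hill_climb(f, x, lo, hi):
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbors, key=f)
        if f(best) <= f(x):              # no improving neighbor: local optimum
            return x
        x = best

# two peaks: a low one at x = 2 and the global one at x = 8
f = lambda x: -(x - 2) ** 2 if x < 5 else 10 - (x - 8) ** 2
print(hill_climb(f, 0, 0, 10))   # -> 2: started left of the valley, stuck
print(hill_climb(f, 6, 0, 10))   # -> 8: started right of the valley
```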

7 Random Search
also called Monte Carlo algorithm
Idea: random selection of the candidates for a change of intermediate solutions, or random selection of the solutions (no use of a neighborhood)
Discussion:
simple (no neighborhood relation is needed)
not time efficient, especially where the time to evaluate solutions is high
sometimes used as a reference algorithm to evaluate and compare the quality of heuristic optimization algorithms
the idea of randomization is applied in other techniques, e.g. genetic algorithms and simulated annealing

8 Simulated Annealing
Idea: simulate the annealing process of material: the slow cooling of the material leads to a state with minimal energy, i.e. the global optimum
Classification:
Search strategy: random local search
Acceptance criteria: unconditional acceptance of the selected solution if it represents an improvement over previous solutions; otherwise probabilistic acceptance
Termination criteria: static bound on the number of iterations (cooling process)
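The acceptance rule can be sketched with the usual Metropolis criterion and geometric cooling; the objective, the neighborhood, and all parameter values are illustrative assumptions, not prescribed by the slide:

```python
import math
import random

# Simulated annealing: random local search, improvements accepted
# unconditionally, worsenings accepted with probability e^(delta/t),
# static iteration bound as termination criterion.
def anneal(f, x, neighbors, t0=10.0, alpha=0.95, iters=500, seed=1):
    rng = random.Random(seed)
    best, t = x, t0
    for _ in range(iters):
        cand = rng.choice(neighbors(x))          # random neighbor
        delta = f(cand) - f(x)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            x = cand                             # probabilistic acceptance
        if f(x) > f(best):
            best = x                             # track best solution so far
        t *= alpha                               # geometric cooling (one variant)
    return best

f = lambda x: -(x - 2) ** 2 if x < 5 else 10 - (x - 8) ** 2   # two peaks
nb = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 10]
best = anneal(f, 0, nb)   # can cross the valley that traps hill-climbing
```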

9 Simulated Annealing Discussion and Variants
Discussion:
parameter settings for the cooling process are essential (but complicated): a slow decrease results in long run times, a fast decrease results in poor solutions; it is debated whether the temperature decrease should be linear or logarithmic
straightforward to implement
Variants:
deterministic acceptance
nonlinear cooling (slow cooling in the middle of the process)
adaptive cooling based on the accepted solutions at a temperature
reheating

10 Genetic Algorithms Basic Operations crossover mutation

11 Genetic Algorithms Basic Algorithm
[Cycle: population → selection → crossover → mutation → replacement]
Replacement and selection rely on some cost function defining the quality of each solution.
Different replacement strategies, e.g. survival of the fittest.
General parameters:
size of the population
mutation probability
candidate selection strategy (mapping quality onto probability)
replacement strategy (replace own parents, replace the weakest, influence of probability)
Application-specific parameters:
mapping of the problem onto an appropriate coding
handling of invalid solutions in the coding
The crossover point is typically selected at random.
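The basic loop can be sketched on a bit-string coding; the one-max fitness (count of ones), the truncation selection, and all parameter values are illustrative choices, not the slide's:

```python
import random

# Genetic algorithm sketch: one-point crossover and bit-flip mutation on
# bit strings, survival-of-the-fittest replacement.
def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))       # crossover point chosen at random
    return a[:cut] + b[cut:]

def mutate(bits, p, rng):
    return [b ^ 1 if rng.random() < p else b for b in bits]

def ga(fitness, n_bits=12, pop_size=20, generations=60, p_mut=0.02, seed=3):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]   # selection: keep the fittest half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   p_mut, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children         # replacement: weakest replaced
    return max(pop, key=fitness)

best = ga(sum)                           # fitness: number of ones in the string
```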

12 Genetic Algorithms Minimum Spanning Tree
a small population results in inbreeding
a larger population works well with a small mutation rate
tradeoff between the size of the population and the number of iterations

13 Genetic Algorithms Basic Operations Mutation => creating a new member of the population by changing one member

14 Genetic Algorithms Basic Operations Crossover => creating a new member of the population from two members

15 Genetic Algorithms Traveling Salesman Problem
minimal impact of the mutation rate with a small population
negative impact of a high mutation rate with a larger population (increased randomness)
impact not quite clear

16 Genetic Algorithms Discussion
finding an appropriate coding of the binary vectors for the specific application at hand is not intuitive; problems are redundant codings and codings that do not represent a valid solution (e.g. codings for a sequencing problem)
tuning of genetic algorithms may be time consuming; parameter settings highly depend on problem specifics
well suited for parallelization of the optimization

17 Tabu Search
Idea: extension of hill-climbing to avoid being trapped in local optima:
allow intermediate solutions with lower quality
maintain a history to avoid running in cycles
Classification:
Search strategy: deterministic local search
Acceptance criteria: acceptance of the best solution in the neighborhood which is not tabu
Termination criteria: static bound on the number of iterations, or dynamic, e.g. based on the quality improvements of solutions

18 Tabu Search Algorithm
[Flowchart: select initial solution → select neighborhood set (based on the current solution) → remove tabu solutions from the set → if the set is empty, increase the neighborhood; otherwise evaluate quality and select the best solution from the set → update the tabu list → repeat until the termination criteria are satisfied]
The brain of the algorithm is the tabu list, which stores and maintains information about the history of the search. In the simplest case a number of previous solutions are stored in the tabu list. More advanced techniques maintain attributes of the solutions rather than the solutions themselves.
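The loop above can be sketched as follows; the two-peak objective and the neighborhood are an illustrative toy problem of the kind where plain hill-climbing stops at the lower peak, while the tabu list forces the search across the valley:

```python
from collections import deque

# Tabu search: always move to the best non-tabu neighbor (even if worse),
# with a recency-based tabu list of previous solutions (the simplest case).
def tabu_search(f, x, neighbors, tabu_len=4, iters=50):
    tabu = deque([x], maxlen=tabu_len)
    best = x
    for _ in range(iters):
        cands = [n for n in neighbors(x) if n not in tabu]
        if not cands:                     # neighborhood exhausted
            break
        x = max(cands, key=f)             # best non-tabu neighbor, even if worse
        tabu.append(x)                    # update the tabu list (recency-based)
        if f(x) > f(best):
            best = x
    return best

f = lambda x: -(x - 2) ** 2 if x < 5 else 10 - (x - 8) ** 2   # two peaks
nb = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 10]
best = tabu_search(f, 0, nb)   # reaches the higher peak at x = 8
```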

19 Tabu Search Organisation of the History
The history is maintained by the tabu list. Attributes of solutions are a very flexible means to control the search.
Example attributes for a HW/SW partitioning problem with 8 tasks assigned to a number of different HW entities:
(A1) change of the value of a task assignment variable
(A2) move to HW
(A3) move to SW
(A4) combined change of some attributes
(A5) improvement of the quality of two subsequent solutions over or below a threshold value
Aspiration criteria: under certain conditions tabus may be ignored, e.g. if
a tabu solution is the best solution found so far
all solutions in a neighborhood are tabu
a tabu solution is better than the solution that triggered the respective tabu conditions
Intensification checks whether good solutions share some common properties.
Diversification searches for solutions that do not share common properties.
The update of the history information may be recency-based or frequency-based (i.e. depending on the frequency with which the attribute has been activated).

20 Tabu Search Discussion
easy to implement (at least the neighborhood search as such)
non-trivial tuning of parameters; tuning is crucial to avoid cyclic search
advantage of the use of knowledge, i.e. feedback from the search (evaluation of solutions) to control the search (e.g. for the controlled removal of bottlenecks)

21 Heuristic Search Methods Classification
Search strategy:
search area: global search (potentially all solutions considered) or local search (direct neighbors only, stepwise optimization)
selection strategy: deterministic selection, i.e. according to some deterministic rules; random selection from the set of possible solutions; probabilistic selection, i.e. based on some probabilistic function
history dependence, i.e. the degree to which the selection of the new candidate solution depends on the history of the search: no dependence, one-step dependence, or multi-step dependence
Acceptance criteria: deterministic acceptance, i.e. based on some deterministic function; probabilistic acceptance, i.e. influenced by some random factor
Termination criteria: static, i.e. independent of the actual solutions visited during the search; dynamic, i.e. dependent on the search history

22 Heuristic Search Methods Classification
[Table: hill-climbing, tabu search, simulated annealing, genetic algorithms, and random search classified along the dimensions search area (local/global), selection strategy (deterministic/probabilistic/random), history dependence (none/one-step/multi-step), acceptance criterion (deterministic/probabilistic), and termination criterion (static/dynamic)]

23 Single Pass Approaches
The techniques covered so far search through a high number of solutions. Idea underlying single pass approaches:
intelligent construction of a single solution (instead of updating and modifying a number of solutions)
the solution is constructed by successively solving a number of subproblems
Discussion:
single-pass algorithms are very quick
the quality of the solutions is often low
not applicable where lots of constraints are present (which require some kind of backtracking)
Important applications of the idea:
list scheduling: successive selection of a task to be scheduled until the complete schedule has been computed
clustering: successive merging of nodes/modules until a small number of clusters remains such that each cluster can be assigned to a single HW unit

24 Single Pass Approaches Framework
[Flowchart: derive guidelines for solution construction → select subproblem → decide subproblem based on the guidelines → possibly recompute or adapt the guidelines → repeat until the final solution is constructed]
The guidelines are crucial and represent the intelligence of the algorithm.

25 List Scheduling
List scheduling: successive selection of a task to be scheduled on some processor (or HW entity); the operation is similar to a dynamic task scheduler of an operating system.
[Flowchart: assign priorities to the tasks according to some strategy (prioritization strategy) → select the executable task with the highest priority → assign the task to a processor according to some strategy (assignment strategy) → repeat until the schedule is complete]
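A sketch of this loop with an HLFET-style priority (level = length of the longest path to the sink) and assignment to the processor that becomes free first; the diamond-shaped task graph and its times are invented for illustration, and communication costs are ignored:

```python
# List scheduling: repeatedly pick the ready task with the highest level
# (longest path to the sink) and place it on the earliest-free processor.
def levels(succ, time):
    memo = {}
    def level(t):                     # longest path from t to the sink
        if t not in memo:
            memo[t] = time[t] + max((level(s) for s in succ[t]), default=0.0)
        return memo[t]
    return {t: level(t) for t in succ}

def list_schedule(succ, time, n_proc=2):
    prio = levels(succ, time)
    preds = {t: {p for p in succ if t in succ[p]} for t in succ}
    proc_free = [0.0] * n_proc
    done_at, order = {}, []
    while len(done_at) < len(succ):
        ready = [t for t in succ if t not in done_at and preds[t] <= set(done_at)]
        t = max(ready, key=lambda t: prio[t])                 # highest level first
        p = min(range(n_proc), key=lambda i: proc_free[i])    # earliest-free proc
        start = max(proc_free[p],
                    max((done_at[q] for q in preds[t]), default=0.0))
        done_at[t] = start + time[t]
        proc_free[p] = done_at[t]
        order.append(t)
    return order, max(done_at.values())

succ = {1: {2, 3}, 2: {4}, 3: {4}, 4: set()}      # diamond task graph
time = {1: 2.0, 2: 3.0, 3: 1.0, 4: 2.0}
order, makespan = list_schedule(succ, time)
```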

26 List Scheduling Example (1)
Problem: 2 processors, 6 tasks with precedence constraints; find the schedule with minimal execution time.
HLFET (highest level first with estimated times): priority = length of the longest (critical) path to the sink node (node 6)
Assignment strategy: first fit
[Figure: task graph annotated with estimated times (green) and levels/priorities (red), and the resulting schedule on the two processors]

27 List Scheduling Example (2)
Problem (unchanged): 2 processors, 6 tasks with precedence constraints; find the schedule with minimal execution time.
SCFET (smallest co-level first with estimated times): priority = length of the longest (critical) path to the source node
Assignment strategy: first fit
[Figure: task graph annotated with estimated times (green) and co-levels/priorities (blue), and the resulting schedule on the two processors]

28 List Scheduling Example (3)
Problem: 2 processors, 6 tasks with precedence constraints; find the schedule with minimal execution time.
HLFET (highest level first with estimated times): priority = length of the longest (critical) path to the sink node (node 6)
Assignment strategy: first fit
[Figure: task graph annotated with estimated times (green) and levels/priorities (red), and the resulting schedule on the two processors]

29 Clustering Basics
Partitioning of a set of nodes into a given number of subsets.
[Flowchart: assign each node to a different cluster → compute the distance between any pair of clusters → select the pair of clusters with the highest affinity → merge the clusters → repeat until the termination criteria hold]
Applications:
processor assignment (load balancing, minimization of interprocess communication)
scheduling (minimization of the critical path)
HW/SW partitioning
Clustering may be employed as part of the optimization process, i.e. combined with other techniques.

30 Clustering
probabilistic: each node belongs with certain probabilities to different clusters
deterministic: each node either belongs to exactly one cluster or to none
hierarchical: starts with a distance matrix over each pair of nodes; exact method (always the same result); terminates after all nodes belong to one cluster
partitioning: starts with a given number of K clusters (independent of the nodes); results depend on the chosen initial set of clusters; terminates after a given number of iterations

31 Hierarchical Clustering
Stepwise reduction of the number of clusters:
determine the distance between each pair of nodes
select the smallest distance
replace the selected pair in the distance matrix by a cluster representative
recompute the distance matrix
repeat until all nodes are in one cluster (the merge history yields a dendrogram)
The algorithm is a kind of successive merging of nearest neighbors (nodes/clusters).
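The merge loop can be sketched with single-linkage distances on 1-D points (illustrative data); for simplicity the sketch terminates at a target number of clusters rather than building the full dendrogram:

```python
# Agglomerative single-linkage clustering: repeatedly merge the two clusters
# with the smallest inter-cluster distance (nearest-neighbor merging).
def single_linkage(points, k):
    clusters = [[p] for p in points]          # start: one cluster per node
    def dist(a, b):                           # smallest pairwise distance
        return min(abs(x - y) for x in a for y in b)
    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)        # merge the nearest pair
    return [sorted(c) for c in clusters]

print(single_linkage([1.0, 1.5, 2.0, 8.0, 8.4, 20.0], k=3))
# -> [[1.0, 1.5, 2.0], [8.0, 8.4], [20.0]]
```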

32 Hierarchical Clustering Dendrogram
[Dendrogram: successive merging of nearest neighbors (nodes/clusters)]

33 Partitioning Clustering (k-means)
choose the positions of k initial cluster representatives
assign each node to the nearest cluster representative
recompute the positions of the cluster representatives based on the positions of the nodes in each cluster
repeat until the number of iterations is reached
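The loop above in a few lines for 1-D data; the points and the initial representative positions are illustrative:

```python
# k-means: assign each node to the nearest representative, then recompute
# each representative as the mean of its cluster; iterate a fixed number of times.
def kmeans(points, reps, iterations=10):
    for _ in range(iterations):                    # termination: iteration bound
        groups = [[] for _ in reps]
        for p in points:
            i = min(range(len(reps)), key=lambda i: abs(p - reps[i]))
            groups[i].append(p)                    # nearest representative
        reps = [sum(g) / len(g) if g else reps[i]  # recompute positions
                for i, g in enumerate(groups)]
    return reps, groups

reps, groups = kmeans([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], reps=[0.0, 5.0])
```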

34 Clustering Application to Load Balancing
Optimization goal: minimize inter-process (inter-cluster) communication; limit the maximum load per processor (cluster) to a given capacity.
[Flowchart: assign each node to a different cluster → compute the sum of the communication cost between any pair of clusters → select the pair of clusters with the highest communication cost that does not violate the capacity constraints → merge the clusters → repeat while a reduction of the communication cost without violation of the constraints is possible]

35 Clustering Application to Load Balancing

36 Clustering Variants
Clustering methods:
Partitioning methods: k-means, Fuzzy-c-means, SOM, Clique, One Pass, Gustafson-Kessel algorithm
Hierarchical methods:
Agglomerative (bottom up): single linkage, complete linkage, group average, centroid, MST, ROCK
Divisive (top down): Ward's, Tree Structural Vector Quantification, Macnaughton-Smith algorithm
Distance metrics: Euclidean, Manhattan, Minkowski, Mahalanobis, Jaccard, Canberra, Chebyshev, correlation, chi-square, Kendall's rank correlation

37 Clustering Hierarchical Algorithms
Single linkage, complete linkage, and centroid-based algorithms implement different methods to compute the distance between two clusters.

38 Clustering Single Linkage
The distance between groups is defined as the smallest distance between entities of the two groups.
[Example: distance matrix for points P1..P7; the pair with the minimal distance is merged first]

39 Clustering Single Linkage
[Continuation of the example: updated distance matrices after successive merges of the nearest clusters]

40 Clustering Group Average
The distance between groups is defined as the average distance between all pairs of entities of the two groups.
[Example: distance matrix for points P1..P7]

41 Clustering Group Average
[Continuation of the example: updated distance matrices after successive merges]

42 Clustering Centroid-based
Determine the distances between the centroids and merge the centroids with the least distance:
d(k, l) = sqrt((x_k − x_l)² + (y_k − y_l)²)
[Example: points P1..P7 with cluster centroids marked, and the initial distance matrix]

43 Clustering Centroid-based
[Continuation of the example: updated centroid distance matrices after successive merges]

44 Differences between Clustering Algorithms
[Figure: the same data set clustered by single linkage, complete linkage, centroid linkage, k-means, and Ward; the algorithms produce clearly different partitions]

45 Clustering Discussion
Results: exact results (single linkage); non-exact results, often several iterations are necessary (k-means)
Metrics: strong impact on the clustering results; not every metric is suitable for every clustering algorithm; decision for one- or multi-criteria metrics (separate or joint clustering)
Selection of the algorithm: depends strongly on the structure of the data set and the expected results
some algorithms tend to separate outliers into their own clusters, yielding some large clusters and a lot of very small clusters (complete linkage)
only few algorithms are able to detect branched, curved or cyclic clusters (single linkage)
some algorithms tend to return clusters of nearly equal size (k-means, Ward)
Quality of clustering results: the mean variance of the elements in each cluster (affinity parameter) is often used; in general the homogeneity within clusters and the heterogeneity between clusters can be measured; however, the quality prediction can only be as good as the quality of the used metric!

46 Branch and Bound with Underestimates
Application of the A* algorithm to the scheduling problem.
Example: scheduling on a two-processor system (processors A and B); process graph with processing and communication times (legend: green = processing times, blue = communication times).
f(x) = g(x) + h(x)
g(x): exact value of the partial schedule
h(x): underestimate for the remainder, min(alternative remaining processing, remaining communication + remaining processing)
The search is terminated when min {f(x)} is a terminal node (in the search tree).
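The A* idea can be sketched on a simplified version of the problem: independent tasks (no communication) on two processors, with h(x) taken as the "perfect balance" bound on the remaining work, which never overestimates, so the first terminal node popped is optimal. The task times are illustrative:

```python
import heapq

# A*-style branch and bound: expand partial schedules in order of
# f = g + h, where g is the exact makespan of the partial schedule and
# h is an underestimate of the remainder (remaining work spread evenly).
def astar_schedule(times):
    total = sum(times)
    heap = [(total / 2, (0.0, 0.0), 0)]     # (f, processor loads, next task)
    while heap:
        f, (a, b), i = heapq.heappop(heap)
        if i == len(times):                 # terminal node: optimal makespan
            return max(a, b)
        rest = sum(times[i + 1:])
        for loads in ((a + times[i], b), (a, b + times[i])):  # task i -> A or B
            g = max(loads)                                   # exact partial value
            h = max(0.0, (sum(loads) + rest) / 2 - g)        # underestimate
            heapq.heappush(heap, (g + h, loads, i + 1))
    return None

print(astar_schedule([4.0, 3.0, 2.0, 1.0]))   # -> 5.0 (split 4+1 / 3+2)
```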

47 Branch and Bound with Underestimates
[Worked example: computation of f(x) for the assignment of a process to processor A or B; in each case g(x) is the exact value of the partial schedule and h(x) the underestimate for the remainder, f(x) = g(x) + h(x)]

48 Branch and Bound with Underestimates
Application of the A* algorithm to the scheduling problem.
Example: scheduling on a two-processor system (processors A and B); process graph (legend: green = processing times, blue = communication times).
[Search tree: each node assigns the next process to processor A or B and is labeled with its f value, f(x) = g(x) + h(x); g(x) is the exact value of the partial schedule, h(x) the underestimate for the remainder]
The search is terminated when min {f(x)} is a terminal node (in the search tree).

49 References
A. Mitschele-Thiel: Systems Engineering with SDL Developing Performance-Critical Communication Systems. Wiley, 2001.
C.R. Reeves (ed.): Modern Heuristic Techniques for Combinatorial Problems. Blackwell Scientific Publications, 1993.
H.U. Heiss: Prozessorzuteilung in Parallelrechnern. BI-Wissenschaftsverlag, Reihe Informatik, Band 98.
M. Garey, D. Johnson: Computers and Intractability. W.H. Freeman, New York, 1979.