Accurate emulation of CPU performance


Accurate emulation of CPU performance
Tomasz Buchert (1), Lucas Nussbaum (2), Jens Gustedt (1)
(1) INRIA Nancy Grand Est, (2) LORIA / Nancy-Université

Validation of distributed systems

Approaches:
- Theoretical approach (paper and pencil): the most general results and understanding, but very hard (leads to unsolvability results).
- Experimentation (real application on a real environment): realistic context and credibility, but difficult to prepare and control, with questionable reproducibility.
- Simulation (modeled application inside a modeled environment): very simple and perfectly reproducible, but subject to experimental bias and possibly unrealistic.
- Emulation (real application inside a modeled environment): control over the experiment parameters, but difficult.

Emulation

The perfect emulated environment should emulate (independently):
- network bandwidth, latency and topology
- performance and number of CPUs
- memory capabilities
- background noise (network, CPU, faults)

This is already implemented in Wrekavoc, a tool to define and control the heterogeneity of a cluster (but not perfect yet!). In this talk, however, we specifically concentrate on the emulation of the CPU.

CPU emulation

Various elements of the CPU architecture could be emulated:
- speed
- number of cores
- sizes and properties of caches (and their topology)
- memory access speed (especially for NUMA systems)

In this talk, we focus on the degradation of CPU speed.

An example

[Diagram: four CPUs/cores throttled independently to 50 %, 50 %, 70 % and 30 % of their speed; the remaining capacity of each CPU is left unused.]

(1) Controlling the speed of each CPU/core independently.

An example (continued)

[Same diagram: four CPUs/cores throttled to 50 %, 50 %, 70 % and 30 %.]

(2) Being able to create separate scheduling zones.

Dynamic frequency scaling (CPU-Freq)

Also known as Intel Enhanced SpeedStep or AMD Cool'n'Quiet: a hardware feature designed to reduce heat, noise and power usage.

For: no emulation overhead, completely unintrusive, CPU time measurements remain meaningful.
Against: only a finite set of frequency levels is available.
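
As an illustration only (not from the original slides), the sketch below shows how CPU-Freq-style throttling is commonly driven on Linux through the cpufreq sysfs interface, assuming a driver that exposes the "userspace" governor; the paths and the chosen frequency are assumptions.

```python
# Minimal sketch of CPU-Freq-style frequency scaling via the Linux cpufreq
# sysfs interface. Hypothetical example, not code from the presentation;
# requires root and a CPU/driver that exposes the "userspace" governor.
from pathlib import Path

def set_cpu_frequency(cpu: int, freq_khz: int) -> None:
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq")
    available = [int(f) for f in
                 (base / "scaling_available_frequencies").read_text().split()]
    if freq_khz not in available:
        # only a finite set of frequency levels is supported by the hardware
        raise ValueError(f"{freq_khz} kHz not in {available}")
    (base / "scaling_governor").write_text("userspace")
    (base / "scaling_setspeed").write_text(str(freq_khz))

if __name__ == "__main__":
    set_cpu_frequency(cpu=0, freq_khz=1_600_000)  # pin core 0 to 1.6 GHz, if supported
```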

CPU-Lim

The method available in Wrekavoc. Algorithm:
- if CPU usage >= threshold, send SIGSTOP to the process
- if CPU usage < threshold, send SIGCONT to the process
where CPU usage = (CPU time of the process) / (process lifetime).

For: easy and almost POSIX-compliant.
Against: intrusive and not scalable, the decision is based on one process instead of global CPU usage, and sleeping is indistinguishable from preemption.
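
To make the algorithm concrete, here is a hypothetical reimplementation of a CPU-Lim-style limiter (a sketch, not the Wrekavoc code): it derives CPU usage from /proc/<pid>/stat as the ratio defined above and toggles SIGSTOP/SIGCONT around the threshold.

```python
# Hypothetical sketch of a CPU-Lim-style limiter (not the Wrekavoc code).
# CPU usage = CPU time of the process / process lifetime; the process is
# stopped when usage reaches the threshold and resumed when it drops below.
import os, signal, time

CLK_TCK = os.sysconf("SC_CLK_TCK")

def cpu_usage(pid: int) -> float:
    fields = open(f"/proc/{pid}/stat").read().rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])      # CPU time, in clock ticks
    start_time = int(fields[19]) / CLK_TCK               # process start, seconds since boot
    uptime = float(open("/proc/uptime").read().split()[0])
    lifetime = max(uptime - start_time, 1e-6)
    return ((utime + stime) / CLK_TCK) / lifetime

def limit(pid: int, threshold: float, period: float = 0.1) -> None:
    while True:
        if cpu_usage(pid) >= threshold:
            os.kill(pid, signal.SIGSTOP)                  # preempt the process
        else:
            os.kill(pid, signal.SIGCONT)                  # let it run again
        time.sleep(period)
```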

Fracas

Based on an idea from KRASH (a load injection tool). Uses Linux cgroups and the Completely Fair Scheduler: a predefined portion of each CPU is given to tasks that burn CPU, and all other processes are given the remaining CPU time.

[Diagram: one CPU burner per core (Core 1, Core 2, Core 3), with the emulated processes sharing the remaining time of each core.]

Fracas (continued)

For: unintrusive, scalable.
Against: not portable to other systems, sensitive to the configuration of the scheduler.
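
The core mechanism can be sketched with cgroups: the example below is illustrative only, assumes the cgroup v1 cpu controller is mounted at /sys/fs/cgroup/cpu, and is not the actual Fracas implementation. It creates a "burner" group that receives a fixed fraction of the CPU via cpu.shares and leaves the remainder to an "emulated" group.

```python
# Hypothetical sketch of the Fracas idea with cgroup v1 and CFS cpu.shares:
# a "burner" group receives a fixed fraction of the CPU, the "emulated"
# group gets the remainder. Requires root; not the actual Fracas code.
import os
from pathlib import Path

CG = Path("/sys/fs/cgroup/cpu")           # assumes the cgroup v1 cpu controller here
TOTAL_SHARES = 1024

def make_group(name: str, shares: int) -> Path:
    g = CG / name
    g.mkdir(exist_ok=True)
    (g / "cpu.shares").write_text(str(shares))
    return g

def burn() -> None:
    while True:                           # busy loop consuming CPU time
        pass

if __name__ == "__main__":
    burned_fraction = 0.7                 # emulate a CPU running at 30 % of its speed
    burner = make_group("burner", int(TOTAL_SHARES * burned_fraction))
    emulated = make_group("emulated", int(TOTAL_SHARES * (1 - burned_fraction)))

    pid = os.fork()
    if pid == 0:                          # child: join the burner group and burn CPU
        (burner / "tasks").write_text(str(os.getpid()))
        burn()
    else:                                 # parent: emulated processes go here
        (emulated / "tasks").write_text(str(os.getpid()))
```

With both groups runnable, CFS divides CPU time proportionally to cpu.shares, which is what gives the emulated processes their reduced share; the real tool also needs one burner per core and CPU pinning.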

Fracas and the latency of the scheduler

[Plot: GFLOP/s versus emulated CPU frequency (GHz) for scheduler latencies of 1 ms, 10 ms, 100 ms and 1000 ms.]

The smaller the latency, the better the emulation.

Evaluation

Based on different types of work: CPU-intensive (Linpack benchmark), IO-bound, multiprocessing, multithreading, and memory speed (STREAM benchmark).

X axis: emulated frequency. Y axis: speed perceived by the benchmark. Each test was repeated 10 times; results are averages with 95 % confidence intervals computed using Student's t distribution.

Evaluation was performed on the Grid'5000 platform, on nodes with two quad-core Intel Xeon X5570 processors and nodes with a pair of single-core AMD Opteron 252 processors.
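
For reference, the mean and 95 % confidence interval over 10 repetitions can be computed as in this short sketch (the sample values are made up, not data from the evaluation).

```python
# Sketch: mean and 95 % confidence interval of repeated measurements using
# Student's t distribution. The sample values are illustrative only.
from statistics import mean, stdev
from scipy.stats import t

def confidence_interval(samples, confidence=0.95):
    n = len(samples)
    m, s = mean(samples), stdev(samples)                 # sample mean and std deviation
    half_width = t.ppf((1 + confidence) / 2, n - 1) * s / n ** 0.5
    return m, half_width

if __name__ == "__main__":
    gflops = [2.61, 2.58, 2.64, 2.59, 2.62, 2.60, 2.63, 2.57, 2.61, 2.60]
    m, h = confidence_interval(gflops)
    print(f"{m:.3f} ± {h:.3f} GFLOP/s (95 % CI, n={len(gflops)})")
```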

Grid'5000: 9 sites, 1600 machines (Lille, Rennes, Orsay, Nancy, Bordeaux, Lyon, Grenoble, Toulouse, Sophia). Dedicated to research on distributed systems and HPC.

CPU-intensive work

[Plot: GFLOP/s versus emulated CPU frequency (GHz) for CPU-Freq, CPU-Lim and Fracas.]

CPU-Lim is less predictable (the outcome has higher variance).

IO-bound work

[Plot: loops/s versus emulated CPU frequency (GHz) for CPU-Freq, CPU-Lim and Fracas.]

CPU-Lim gives an (unfair) advantage to IO-bound tasks.

Multiprocessing

[Plot: loops/s versus emulated CPU frequency (GHz) for CPU-Freq, CPU-Lim and Fracas.]

Fracas cannot emulate CPU speed for multi-task computation.

Multithreading

[Plot: loops/s versus emulated CPU frequency (GHz) for CPU-Freq, CPU-Lim and Fracas.]

CPU-Lim controls processes instead of scheduling entities.

Memory speed

[Plot: GB/s versus emulated CPU frequency (GHz) for CPU-Freq, CPU-Lim and Fracas.]

Memory speed is affected differently by each method.

Summary of the evaluation

CPU-Freq: very good results, but coarse granularity.
CPU-Lim: not scalable due to its implementation, intrusive, higher variance, and controls processes rather than threads.
Fracas: good behavior for single-task workloads and scalable, but bad behavior for multi-task workloads.

Future work

- Explore other approaches.
- Improve Fracas to cover multitasking.
- Emulate memory bandwidth.
- Emulate other aspects of the CPU.
- Integrate Fracas into Wrekavoc.
- Take over the world :)

Conclusions

Presented Fracas, a method for CPU performance emulation based on Linux cgroups. Compared it with CPU-Freq and CPU-Lim (from Wrekavoc). Evaluated all three experimentally on Grid'5000. None of the methods is perfect:
- CPU-Freq: coarse-grained
- CPU-Lim: implementation problems, not scalable
- Fracas: works perfectly in the single-thread/process case, needs work in the multi-thread/process case

Questions?
