Variability in Architectural Simulations of Multi-threaded

1 Variability in Architectural Simulations of Multi-threaded Workloads. Alaa R. Alameldeen and David A. Wood, University of Wisconsin-Madison

2 Motivation. Experimental scientists use statistics. Computer architects in simulation experiments don't! Why ignore statistics? Simulations are deterministic. This can lead to wrong conclusions!

3 Workload Variability. [Figure: OLTP cycles per transaction (millions) vs. DRAM access latency (ns)]

4 Workload Variability. [Figure: OLTP cycles per transaction (millions) vs. DRAM access latency (ns), annotated "Slower memory is better!"]

5 What Went Wrong? Many executions are possible for each configuration. Why? Different timing effects; OS scheduling decisions; different orders of lock acquisition; different transaction mixes. This is magnified by short simulations. Variability can lead to wrong conclusions.

6 Overview. Variability is a real phenomenon for multi-threaded workloads: runs from the same initial state can be different. Variability is a challenge for simulations: simulations are short. Our solution accounts for variability: multiple runs, statistical techniques.

7 Outline. Motivation and Overview; Variability in Real Systems (Time and Space Variability); Variability in Simulations; Accounting for Variability; Conclusions.

8 What is Variability? Differences between multiple estimates of a workload's performance. Time variability: performance changes during different phases of a single run. Space variability: runs starting from the same state follow different execution paths.

9 Time Variability in Real Systems. [Figure: OLTP cycles per transaction (millions) vs. time (sec), one-second intervals]

10 Time Variability Example (Cont'd). How is this handled in real experiments? Solution: run your experiment long enough! [Figure: OLTP cycles per transaction (millions) vs. time (sec), one-minute intervals]

11 Space Variability in Real Systems. [Figure: OLTP cycles per transaction (millions) vs. time (sec), one-second averages, 5 runs]

12 Space Variability Example (Cont'd). How is this handled in real experiments? Same solution: run your experiment long enough! [Figure: OLTP cycles per transaction (millions) vs. time (sec), one-minute averages, 5 runs, annotated "16-day simulation"]

13 Outline. Motivation and Overview; Variability in Real Systems; Variability in Simulations (Simulation Infrastructure, Injecting Randomness, The Wrong Conclusion Ratio); Accounting for Variability; Conclusions.

14 Simulation Infrastructure. Workloads: two scientific and five commercial benchmarks. Target system: E10000-like 16-node system. Full-system simulation: Virtutech Simics running Solaris 8 on SPARC V9. Processor models: a blocking processor model (Simics) and an out-of-order (OoO) processor model (TFSim; Mauer et al., SIGMETRICS '02). Memory system simulator: MOSI invalidation-based broadcast coherence protocol (Martin et al., HPCA-02).

15 Simulating Space Variability? Simulations are deterministic. Variability cannot be ignored for multi-threaded applications: one execution may not be representative, and execution paths affect simulation conclusions. We need to obtain a space of results.

16 Injecting Randomness. We introduce artificial random perturbations in each simulation run. For each memory access, the latency in nanoseconds becomes Latency + r, where r is drawn uniformly from {-2, -1, 0, 1, 2} ns. This roughly models contention due to DMA traffic. Other methods are possible.
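The perturbation rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' simulator code: `perturbed_latency`, `simulate_run`, the 80 ns base latency, and the use of total latency as the "result" of a run are all hypothetical stand-ins. In a full-system simulator the perturbation matters because it changes thread interleavings, which this toy does not model.

```python
import random

# Values of r listed on the slide, in nanoseconds.
PERTURBATIONS_NS = (-2, -1, 0, 1, 2)

def perturbed_latency(base_latency_ns, rng):
    """Return the base latency plus r, with r chosen uniformly."""
    return base_latency_ns + rng.choice(PERTURBATIONS_NS)

def simulate_run(base_latency_ns, num_accesses, seed):
    """Hypothetical stand-in for one simulation run.

    Each run gets its own seeded generator, so a run is reproducible
    while different seeds yield different perturbation sequences.
    """
    rng = random.Random(seed)
    return sum(perturbed_latency(base_latency_ns, rng)
               for _ in range(num_accesses))

# Twenty runs of the same configuration, one seed per run (assumed values).
runs = [simulate_run(80, 1000, seed) for seed in range(20)]
```

Seeding each run separately keeps every individual experiment repeatable while still producing a space of results for one configuration.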

17 Simulated Space Variability. [Figure: normalized runtime (max, avg, min over 20 runs, ~10 hrs sim.) for Barnes-Hut, Ocean, ECPerf, Slashcode, OLTP, Apache, and SPECjbb] Space variability exists in our benchmarks.

18 Quantifying Variability: The Wrong Conclusion Ratio (WCR). [Figure: OLTP cycles per transaction (millions) vs. ROB size; max, avg, min over 20 runs of 50 transactions] WCR(16,32) = 18%, WCR(16,64) = 7.5%, WCR(32,64) = 26%.

19 Outline. Motivation and Overview; Variability in Real Systems; Variability in Simulations; Accounting for Variability; Conclusions.

20 Definition: Confidence Intervals. A confidence interval is the range of values expected to include a population parameter (e.g. the mean). Confidence probability: the probability that the true mean lies inside the confidence interval. For the same confidence probability, a larger sample size gives a narrower confidence interval.

21 Accounting for Space Variability. [Figure: OLTP cycles per transaction (millions) vs. sample size (number of runs)]

22 Accounting for Space Variability. [Figure: OLTP cycles per transaction (millions) vs. sample size (number of runs)] Simple solution: estimate the number of runs such that confidence intervals do not overlap. Tests of hypotheses can also be used (see paper).
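One way to estimate the number of runs is sketched below, assuming a small pilot sample from each configuration. The function name and the pilot data are hypothetical, and the sketch substitutes a normal quantile for the Student t-value so that it stays within the Python standard library. That substitution underestimates the interval width for small samples, so the result is a rough starting point rather than a guarantee.

```python
import math
from statistics import NormalDist, mean, stdev

def runs_for_disjoint_intervals(pilot_a, pilot_b, confidence=0.95):
    """Estimate runs per configuration so the two intervals separate.

    Solves z*s_a/sqrt(n) + z*s_b/sqrt(n) < |mean_a - mean_b| for n,
    using the pilot samples' standard deviations and means.
    Assumption: normal quantile z in place of the t-value.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    gap = abs(mean(pilot_a) - mean(pilot_b))
    if gap == 0:
        raise ValueError("pilot means are equal; no n separates them")
    bound = (z * (stdev(pilot_a) + stdev(pilot_b)) / gap) ** 2
    # Smallest integer strictly greater than the bound, and at least 2
    # so that a standard deviation can be computed.
    return max(2, math.floor(bound) + 1)

# Illustrative pilot data (made up), e.g. cycles per transaction.
needed = runs_for_disjoint_intervals([10, 12, 14], [11, 13, 15])
```

The closer the two means are relative to the spread of the runs, the faster the required number of runs grows, since it scales with the square of the ratio.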

23 Conclusions. Short runs of multi-threaded workloads exhibit variability. Variability can lead to wrong simulation conclusions. Our solution: inject randomness, perform multiple runs, and apply statistical techniques.

24 Backup Slides

25 Effects of OS Scheduling. [Figure: same threads vs. different threads; labels: L2 set size, cycles]

26 WCR Definition. The percentage of comparison simulation experiments that reach a wrong conclusion. The correct conclusion is the relationship between the averages of the two populations. WCR can be used to estimate the wrong-conclusion probability for single experiments.
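The definition can be made concrete with a short sketch. It assumes that a "comparison experiment" pairs one run of configuration A with one run of configuration B, and that lower values (e.g. cycles per transaction) are better. The function name, the handling of ties, and the sample data are illustrative choices, not taken from the slides.

```python
from itertools import product
from statistics import mean

def wrong_conclusion_ratio(runs_a, runs_b):
    """Percentage of single-run comparisons that contradict the averages.

    The correct conclusion is taken from the two sample means. A pair
    (a, b) is wrong if it does not strictly agree with that ordering,
    so ties count as wrong (an assumption of this sketch).
    """
    a_is_better = mean(runs_a) < mean(runs_b)
    pairs = list(product(runs_a, runs_b))
    if a_is_better:
        wrong = sum(1 for a, b in pairs if a >= b)
    else:
        wrong = sum(1 for a, b in pairs if a <= b)
    return 100.0 * wrong / len(pairs)

# Made-up runtimes for two configurations.
wcr = wrong_conclusion_ratio([10, 11, 12], [11.5, 12.5, 13.5])
```

In this made-up data only the pair (12, 11.5) contradicts the averages, so one of the nine comparisons is wrong.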

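27 Confidence Intervals - Equations. The confidence interval for the mean $\mu$ of a normally distributed infinite population, given sample mean $\bar{y}$, sample standard deviation $s$, sample size $n$, and t-value $t$: $\bar{y} - \frac{t s}{\sqrt{n}} \le \mu \le \bar{y} + \frac{t s}{\sqrt{n}}$. Sample size needed to limit the relative error of the mean to $r$: $n = \left( \frac{t s}{r \bar{y}} \right)^2$.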
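Both equations translate directly into code. In the sketch below the t-value is passed in by the caller. The small table of two-sided 95% critical values is included for convenience, and the sample data is made up. Because t itself depends on the sample size, the sample-size estimate may need to be repeated with an updated t-value.

```python
import math
from statistics import mean, stdev

# Two-sided 95% Student t critical values, keyed by degrees of freedom (n - 1).
T_95_TWO_SIDED = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def confidence_interval(samples, t_value):
    """Return (low, high) = y_bar -/+ t*s/sqrt(n)."""
    n = len(samples)
    y_bar = mean(samples)
    half_width = t_value * stdev(samples) / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width

def runs_needed(samples, t_value, rel_error):
    """Return n = (t*s / (r*y_bar))^2, rounded up to a whole run."""
    y_bar = mean(samples)
    s = stdev(samples)
    return math.ceil((t_value * s / (rel_error * y_bar)) ** 2)

# Made-up cycles-per-transaction values (millions) from five runs.
pilot = [4.0, 4.2, 3.9, 4.1, 3.8]
t = T_95_TWO_SIDED[len(pilot) - 1]
low, high = confidence_interval(pilot, t)
n_for_2_percent = runs_needed(pilot, t, rel_error=0.02)
```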

28 Hypothesis Testing. Tests whether there is no difference between two population means. The hypothesis $\mu_{32} = \mu_{64}$ tests whether the means of the 32-entry and 64-entry ROB configurations are different. The hypothesis is tested using sample means and variances. If the hypothesis is rejected, our conclusion is significant.
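The slide does not say which test statistic is used. As one possibility, the sketch below computes Welch's two-sample t statistic, which needs only the sample means and variances and does not assume equal variances. The function name and the data are illustrative.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test.

    t is the difference of sample means divided by its standard error;
    df is the Welch-Satterthwaite approximation of degrees of freedom.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    term_a = variance(sample_a) / n_a
    term_b = variance(sample_b) / n_b
    std_err_sq = term_a + term_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(std_err_sq)
    df = std_err_sq ** 2 / (term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1))
    return t_stat, df

# Made-up samples standing in for the 32-entry and 64-entry ROB runs.
t_stat, df = welch_t([1, 2, 3], [4, 5, 6])
```

The null hypothesis of equal means is rejected when the absolute value of `t_stat` exceeds the critical t-value for `df` degrees of freedom at the chosen significance level, looked up from a t table.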

29 Accounting for Time Variability. Is time variability caused by the same effects that cause space variability? Use Analysis of Variance (ANOVA). If time variability is caused by different effects, we need to obtain a time sample: observations obtained from different starting points.

30 Multi-threaded Workloads and Simulation. Multi-threaded workloads are important: they are the workloads for commercial servers, and new architectures support multi-threading. Performance metrics are different from traditional benchmarks: they are throughput-oriented (transactions), and IPC is not appropriate (idle time!). Simulation challenge: comparing systems running multi-threaded applications.

31 Simulation of Multi-threaded Workloads. Simulation is slow! We cannot simulate the whole workload. Solution: run for a fixed number of transactions, measure the per-transaction runtime (cycles per transaction), and use it to compare different systems.
