Statistical Simulation of Superscalar Architectures using Commercial Workloads


1 Statistical Simulation of Superscalar Architectures using Commercial Workloads Lieven Eeckhout and Koen De Bosschere Dept. of Electronics and Information Systems (ELIS) Ghent University, Belgium CAECW 01, January 21, 2001

2 Outline: Introduction; Statistical Simulation (statistical profiling, synthetic trace generation); Methodology; Evaluation; Conclusion

3 Introduction. Architectural simulation, whether trace-driven or execution-driven, is accurate but requires long simulation times and long traces to be stored. Hence the need for fast simulation techniques: taking part of a full trace, analytical modeling, trace sampling, statistical simulation.

4 Goal. Previous work used SPEC benchmarks to evaluate statistical simulation. In this talk we use both commercial and scientific workloads: SPECint, SPECfp, system traces, multimedia, X graphics, database.

5 Statistical Simulation. Three steps: (1) extract a statistical profile from a program execution; (2) generate a synthetic trace from it; (3) simulate the synthetic trace on a trace-driven simulator. Two major advantages: the statistical profile is more compact than a full trace; simulation is fast due to its statistical nature, allowing design space exploration in limited time.

6 Statistical Simulation (flow diagram). A real trace (e.g. a SPEC benchmark) goes through branch profiling, cache profiling and instruction profiling, yielding branch statistics, cache statistics and instruction statistics. Together these form the statistical profile, which feeds the synthetic trace generator; the resulting synthetic trace is run on the trace-driven simulator.

7 Statistical Profiling. Microarchitecture-independent statistics: instruction statistics. Microarchitecture-dependent statistics: branch statistics, cache statistics. Result: statistical simulation can only be used to explore design options of the processor core (cache and branch predictor are fixed).

8 Statistical Profiling: Instruction Statistics. Instruction mix (13 classes). Number of register operands. Age of register operands: probability that a register operand was produced δ instructions before it in the trace (RAW only). Memory dependencies: probability that a load is memory-dependent on the δ-th store before it in the trace (RAW only).
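
The instruction statistics listed on this slide can be gathered in a single pass over a trace. The following is a minimal sketch, not the authors' profiler: the trace record layout `(opclass, dest, srcs)` and the function name `profile_instructions` are hypothetical, and only the instruction mix, operand count and register-operand age (RAW distance δ) are collected. Memory dependencies would follow the same pattern, with store addresses in place of register names.

```python
from collections import Counter

def profile_instructions(trace):
    """Collect instruction statistics from a trace.

    trace: list of (opclass, dest_reg_or_None, [src_regs]) tuples.
    This record layout is an assumption made for illustration.
    """
    mix = Counter()             # instruction class -> count
    operand_counts = Counter()  # number of register source operands -> count
    age_hist = Counter()        # RAW distance delta -> count
    last_writer = {}            # register -> index of its most recent producer

    for i, (opclass, dest, srcs) in enumerate(trace):
        mix[opclass] += 1
        operand_counts[len(srcs)] += 1
        for reg in srcs:
            if reg in last_writer:
                # delta = how many instructions back the operand was produced
                age_hist[i - last_writer[reg]] += 1
        if dest is not None:
            last_writer[dest] = i

    n = len(trace)
    total_deps = sum(age_hist.values())
    return {
        "mix": {k: v / n for k, v in mix.items()},
        "operands": {k: v / n for k, v in operand_counts.items()},
        "age": {d: c / total_deps for d, c in age_hist.items()} if total_deps else {},
    }
```

Each entry of the returned dictionaries is a probability, so every distribution can be sampled directly when generating a synthetic trace.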

9 Statistical Profiling: Branch Statistics. Six branch types: conditional branch, unconditional branch, call with offset, indirect jump, indirect call, return. Two accuracies are distinguished: branch prediction accuracy (the pipeline is refilled on a branch misprediction) and branch target prediction accuracy (a single-cycle bubble in the pipeline on a correct branch prediction with a target misprediction).
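
To make the two accuracies concrete, here is a small sketch that computes them per branch type from a list of predictor outcomes. The record layout `(branch_type, direction_correct, target_correct)` and the name `profile_branches` are hypothetical. Measuring target accuracy only over branches whose direction was predicted correctly is an assumption that matches the slide's "correct branch prediction but target misprediction" case.

```python
from collections import defaultdict

def profile_branches(branches):
    """Per-branch-type prediction statistics.

    branches: iterable of (branch_type, direction_correct, target_correct).
    This record layout is an assumption made for illustration.
    """
    # per type: [total, direction correct, direction and target correct]
    counts = defaultdict(lambda: [0, 0, 0])
    for btype, dir_ok, tgt_ok in branches:
        c = counts[btype]
        c[0] += 1
        if dir_ok:
            c[1] += 1
            if tgt_ok:
                c[2] += 1

    stats = {}
    for btype, (total, dir_ok, both_ok) in counts.items():
        stats[btype] = {
            # fraction of branches whose direction was predicted correctly
            "prediction_accuracy": dir_ok / total,
            # among those, fraction whose target was also predicted correctly
            "target_accuracy": both_ok / dir_ok if dir_ok else 0.0,
        }
    return stats
```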

10 Statistical Profiling: Cache Statistics. D-cache statistics: L1 D-cache miss rate, L2 D-cache miss rate. I-cache statistics: L1 I-cache miss rate, L2 I-cache miss rate.

11 Synthetic Trace Generation. Instruction-by-instruction, through random number generation. For each instruction, determine: instruction type, number of operands, age of register operands, memory dependency, branch behavior, D-cache behavior, I-cache behavior. (Figure: an example synthetic trace of st, add, ld and br instructions annotated with I-cache miss, D-cache miss and mispredicted.)
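
The instruction-by-instruction generation described on this slide can be sketched as repeated sampling from the profile's distributions. This is an illustrative sketch under stated assumptions, not the authors' generator: the profile keys (`mix`, `operands`, `age`, `l1i_miss`, `l1d_miss`, `branch_accuracy`) and the class names `ld`, `st` and `br` are hypothetical, and memory dependencies, L2 behavior and branch target mispredictions are left out to keep it short.

```python
import random

def sample(dist, rng):
    """Draw one key from a {value: probability} dictionary."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]

def generate_trace(profile, n, seed=0):
    """Generate n synthetic instructions from a statistical profile.

    The profile layout is an assumption made for illustration.
    """
    rng = random.Random(seed)  # seeded, so a given profile is reproducible
    trace = []
    for _ in range(n):
        op = sample(profile["mix"], rng)             # instruction type
        num_src = sample(profile["operands"], rng)   # number of operands
        instr = {
            "op": op,
            # age (RAW distance) of each register operand
            "src_ages": [sample(profile["age"], rng) for _ in range(num_src)],
            # I-cache behavior applies to every fetched instruction
            "icache_miss": rng.random() < profile["l1i_miss"],
        }
        if op in ("ld", "st"):
            # D-cache behavior only for memory instructions
            instr["dcache_miss"] = rng.random() < profile["l1d_miss"]
        if op == "br":
            # branch behavior: mispredicted with probability 1 - accuracy
            instr["mispredicted"] = rng.random() < 1.0 - profile["branch_accuracy"]
        trace.append(instr)
    return trace
```

Because each field is drawn independently, a synthetic trace reproduces the profiled distributions but not the original instruction order.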

12 Methodology: microarchitecture. Out-of-order processor: 8- and 16-issue, with windows of 64 and 128 instructions. McFarling branch predictor. Small cache configuration: 8KB direct-mapped (DM) L1 I-cache, 8KB DM L1 D-cache, 64KB 2-way set-associative (2WSA) unified L2 cache. Large cache configuration: 32KB DM L1 I-cache, 64KB 2WSA L1 D-cache, 512KB 4WSA unified L2 cache. Access times: L1 I-cache (1 cycle), L1 D-cache (2 cycles), L2 cache (10 cycles), main memory (80 cycles).

13 Methodology: benchmarks. 8 SPECint95 benchmarks; 5 SPECfp95 benchmarks (hydro2d, su2cor, swim, tomcatv, wave5); 8 IBS system traces (mpeg, jpeg, gs, verilog, gcc, sdet, nroff, groff); 4 MediaBench applications (g721, gs, gsm, mpeg2); 4 X graphics benchmarks (DooM, POVRay, Xanim, Quake); 2 TPC-D queries running on Postgres 6.3. Approximately 200 million instructions per trace.

14 Evaluation. IPC prediction error = $(\mathrm{IPC}_{\text{real trace}} - \mathrm{IPC}_{\text{synthetic trace}}) / \mathrm{IPC}_{\text{real trace}}$, where $\mathrm{IPC}_{\text{real trace}}$ is the IPC when running the real trace on the trace-driven simulator and $\mathrm{IPC}_{\text{synthetic trace}}$ is the IPC when running the synthetic trace generated from the statistical profile of the real trace. Simulation speed: $\sigma_{\mathrm{IPC}} / \bar{x}_{\mathrm{IPC}}$ is less than 1% after simulating 1 million instructions.
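
Both evaluation metrics on this slide reduce to short computations. The sketch below makes two assumptions: the sign convention is real minus synthetic, as the slide writes it, and the simulation-speed criterion is read as the ratio of the standard deviation to the mean of IPC over several synthetic-trace runs, for example with different random seeds. The function names are hypothetical.

```python
import statistics

def ipc_prediction_error(ipc_real, ipc_synthetic):
    """Relative IPC prediction error, using the slide's sign convention."""
    return (ipc_real - ipc_synthetic) / ipc_real

def coefficient_of_variation(ipc_samples):
    """Sample standard deviation over mean of IPC.

    ipc_samples: IPC values from at least two synthetic-trace runs
    (taking them over different seeds is an assumption).
    """
    return statistics.stdev(ipc_samples) / statistics.mean(ipc_samples)
```

With this convention a positive error means the synthetic trace underestimates the IPC of the real trace, and a negative error means it overestimates it.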

15 IPC prediction error (1). [Figure: bar chart of IPC prediction error per benchmark, y-axis from -30% to 40%. Benchmarks are grouped as SPECint95 (li, gcc, compress, go, ijpeg, vortex, m88ksim, perl), SPECfp95 (hydro2d, su2cor, swim, tomcatv, wave5), IBS (mpeg, jpeg, gs, verilog, real_gcc, sdet, nroff, groff), MediaBench (g721_e, gs, gsm_e, mpeg2), X graphics (xanim, xdoom, xpovray, xquake) and TPC-D (tpc-d.17, tpc-d.2). Two bars exceed the axis range, labeled 157% and 135% and annotated "high D-cache miss rate". Configuration: 16-issue, 128-entry window, small cache configuration.]

16 IPC prediction error (2). [Figure: bar chart of IPC prediction error per benchmark, y-axis from -30% to 30%, for the same benchmarks grouped as SPECint95, SPECfp95, IBS, MediaBench, X graphics and TPC-D. Configuration: 16-issue, 128-entry window, large cache configuration.]

17 IPC prediction error vs. static instruction count. [Figure: scatter plot of IPC prediction error (y-axis, -40% to 160%) against static instruction count, i.e. the number of instructions executed at least once (x-axis). Four series: w = 64, i = 8, 'small' cache; w = 128, i = 16, 'small' cache; w = 64, i = 8, 'large' cache; w = 128, i = 16, 'large' cache. Labeled points: nroff, jpeg (IBS), verilog, DooM, mpeg (IBS), sdet, groff, Quake, gs (IBS), gcc, vortex, go, TPC-D, gcc (IBS).]

18 Conclusion (1). Higher IPC prediction errors for applications with a smaller static instruction count: MediaBench applications, SPECfp95 benchmarks, 2 X graphics benchmarks (POVRay and Xanim), 5 SPECint95 benchmarks.

19 Conclusion (2). Smaller IPC prediction errors for applications with a larger instruction footprint: IBS system traces, TPC-D traces, 2 X graphics benchmarks (DooM and Quake), 3 SPECint95 benchmarks (go, gcc, vortex). For these, the IPC prediction error lies between -1% and 25%.

20 Conclusion (3). Statistical simulation is a useful fast simulation technique for commercial workloads: their larger instruction footprint gives higher variability in instructions, which makes a statistical technique more powerful.
