A Fast Instruction Set Simulator for RISC-V

Size: px

Start display at page:

Download "A Fast Instruction Set Simulator for RISC-V"

Luke Richard
6 years ago
Views:

1 A Fast Instruction Set Simulator for RISC-V Esperanto Technologies, Inc. 5 th RISC-V Workshop November 30, 2016 Esperanto Technologies: A Fast ISA Simulator for RISC-V 1 5 th RISC-V Workshop Nov 30, 2016

2 Background Esperanto is a stealth mode startup designing chips with RISC-V. Esperanto wanted a fast RISC-V ISA simulator capable of: Running large applications with minimal (<5x) slowdown Running large number of threads with good scalability Providing flexibility in testing instruction extensions Providing flexibility in gathering performance data Fast simulation is a key productivity tool Gives a fast compile, run and test/debug loop prior to silicon Current simulators were judged to be too slow Undertook project to modify Eltechs ExaGear to run RISC-V Esperanto Technologies: A Fast ISA Simulator for RISC-V 2 5 th RISC-V Workshop Nov 30, 2016

3 Motivation Evaluated two existing options: QEMU and Spike Would either be sufficient? We compiled several tests from Spec2006 benchmark both for x86-64 and RISC-V with the same compiler version and options: GCC O2 Esperanto Technologies: A Fast ISA Simulator for RISC-V 3 5 th RISC-V Workshop Nov 30, 2016

4 Comparison of native, QEMU and Spike runtimes Benchmark x86-64 native (in seconds) QEMU system mode ( in seconds) Spike pk (in seconds) 099.go , bzip2 475 Failed 41, bwaves 416.gamess 445.gobmk 435.gromacs 444 Failed 48,859 66* Failed 16,045 64* , Failed * Only one subtask Esperanto Technologies: A Fast ISA Simulator for RISC-V 4 5 th RISC-V Workshop Nov 30, 2016

5 Result of Spike and QEMU evaluation Spike is a very slow simulator: 86x 267x times slower than native Spike user mode (pk) requires static binaries not applicable in some cases QEMU system mode was not fast enough: 34x 68x times slower than native QEMU had no user mode support for RISC-V when we started the evaluation Esperanto Technologies: A Fast ISA Simulator for RISC-V 5 5 th RISC-V Workshop Nov 30, 2016

6 Fast Simulator Design Esperanto Technologies: A Fast ISA Simulator for RISC-V 6 5 th RISC-V Workshop Nov 30, 2016

7 User mode approach User mode emulation means we only need to run the RISC-V instructions in the application, not the OS. The OS still runs compiled to the native hardware. User mode approach advantages: Faster emulation (does not need sophisticated software MMU techniques) Does not need stable RISC-V kernel right now Kernel code does not blur performance results Esperanto Technologies: A Fast ISA Simulator for RISC-V 7 5 th RISC-V Workshop Nov 30, 2016

8 User mode approach RISC-V World! RISC-V Apps RISC-V Libs Fast Simulator x86 Apps x86 Libs x86-64 Linux x86-64 Native Hardware Esperanto Technologies: A Fast ISA Simulator for RISC-V 8 5 th RISC-V Workshop Nov 30, 2016

9 User mode key points Fast Simulator can only execute Linux RISC-V ELFbinaries with user mode RISC-V code is translated into corresponding x86-64 instructions RISC-V Linux system calls are translated into corresponding x86-64 Linux system calls RISC-V applications can only see a RISC-V world (some kind of chroot), but can communicate with host via kernel (for example, using sockets) Esperanto Technologies: A Fast ISA Simulator for RISC-V 9 5 th RISC-V Workshop Nov 30, 2016

Fast Simulator Translation Flow Perform optimizations here 3.

previously translated? 2. If the trace was not translated, then translate it 4.

Check if the trace was previously translated 5.

10 Fast Simulator Translation Flow Perform optimizations here 3. Run translated trace on x86 RISC-V APPLICATION FAST TRANSLATION Was this trace previously translated? 2. If the trace was not translated, then translate it 4. Save translated trace in cache RUNNING TRANSLATED TRACE { RISC-V traces 1. Check if the trace was previously translated 5. If trace was translated before restore it from cache CACHE 6. Run restored trace on x86 Esperanto Technologies: A Fast ISA Simulator for RISC-V 10 5 th RISC-V Workshop Nov 30, 2016

11 Fast Simulator Translation Flow Key Points Fast Simulator translates traces All compiled traces are saved in a Translation Cache and are then reused Several optimizations are applied, including efficient register allocation, peephole optimization, dynamic jump cache, etc. Floating point calculations directly use hardware FPU and FPU registers Esperanto Technologies: A Fast ISA Simulator for RISC-V 11 5 th RISC-V Workshop Nov 30, 2016

12 Fast Simulator Design Frontends New Arch ARM v7 x86-32 IR Component ARM v7 ARM v8 x86-64 Backends Esperanto Technologies: A Fast ISA Simulator for RISC-V 12 5 th RISC-V Workshop Nov 30, 2016

13 Fast Simulator Design Frontends RISC-V 64bit ARM v7 x86-32 IR Component ARM v7 ARM v8 x86-64 Backends Esperanto Technologies: A Fast ISA Simulator for RISC-V 13 5 th RISC-V Workshop Nov 30, 2016

14 Fast Simulator Design Benefits x86-64 backend was implemented previously and we just reused it Intermediate Representation (IR) was also reused All reused components are very reliable due to exhaustive testing in previous products For first reasonable version we had to implement only RISC-V frontend and tune other components Many optimizations worked without changes Esperanto Technologies: A Fast ISA Simulator for RISC-V 14 5 th RISC-V Workshop Nov 30, 2016

15 Development time frame About 2 months for a first reliable version which is able to run full Spec2006 benchmark About one month of performance tuning the optimizations to get the presented numbers Esperanto Technologies: A Fast ISA Simulator for RISC-V 15 5 th RISC-V Workshop Nov 30, 2016

16 As a Result Fast development High reliability High performance Esperanto Technologies: A Fast ISA Simulator for RISC-V 16 5 th RISC-V Workshop Nov 30, 2016

17 Performance results Esperanto Technologies: A Fast ISA Simulator for RISC-V 17 5 th RISC-V Workshop Nov 30, 2016

18 Performance results Intel(R) Core(TM) i GHz GCC 6.1.0, O2 optimization level RISC-V toolchain available on 24 OCT 2016 QEMU (RISC-V user mode appears!) Spec2006 benchmarks Esperanto Technologies: A Fast ISA Simulator for RISC-V 18 5 th RISC-V Workshop Nov 30, 2016

19 CINT2006 Fast Sim/x86-64 performance comparison x86-64 Fast Sim Spec name (in seconds) (in seconds) Ratio 400.perlbench bzip gcc mcf gobmk hmmer sjeng libquantum h264ref omnetpp astar xalancbmk GeoMean 2.47 Esperanto Technologies: A Fast ISA Simulator for RISC-V 19 5 th RISC-V Workshop Nov 30, 2016

20 CFP2006 Fast Sim/x86-64 performance comparison Spec name x86-64 (in seconds) Fast Sim (in seconds) Ratio 410.bwaves gamess milc zeusmp gromacs cactusADM leslie3d namd dealII soplex povray calculix GemsFDTD tonto lbm wrf sphinx GeoMean 3.48 Esperanto Technologies: A Fast ISA Simulator for RISC-V 20 5 th RISC-V Workshop Nov 30, 2016

21 CINT2006 QEMU user mode/fast Sim performance comparison Fast Sim QEMU user mode Spec name (in seconds) (in seconds) Ratio 400.perlbench bzip gcc mcf gobmk hmmer sjeng libquantum h264ref omnetpp astar xalancbmk GeoMean 1.71 Esperanto Technologies: A Fast ISA Simulator for RISC-V 21 5 th RISC-V Workshop Nov 30, 2016

22 CFP2006 QEMU user mode/fast Sim performance comparison Spec name Fast Sim (in seconds) QEMU user mode (in seconds) Ratio 410.bwaves gamess milc zeusmp gromacs cactusADM leslie3d namd dealII soplex povray calculix GemsFDTD tonto lbm wrf sphinx GeoMean 7.29 Esperanto Technologies: A Fast ISA Simulator for RISC-V 22 5 th RISC-V Workshop Nov 30, 2016

23 CINT2006 Spike pk/fast Sim performance comparison Fast Sim Spike pk Spec name (in seconds) (in seconds) Ratio 400.perlbench bzip gcc 890 failed 429.mcf gobmk hmmer sjeng libquantum h264ref 1986 failed 471.omnetpp astar xalancbmk 828 failed GeoMean Esperanto Technologies: A Fast ISA Simulator for RISC-V 23 5 th RISC-V Workshop Nov 30, 2016

24 CFP2006 Spike pk/fast Sim performance comparison Spec name Fast Sim (in seconds) Spike pk (in seconds) Ratio 410.bwaves gamess milc zeusmp gromacs 1360 failed 436.cactusADM leslie3d namd dealII soplex povray calculix GemsFDTD tonto lbm wrf 2280 failed 482.sphinx failed GeoMean Esperanto Technologies: A Fast ISA Simulator for RISC-V 24 5 th RISC-V Workshop Nov 30, 2016

25 Performance summary Fast Simulator is only 2.5 times slower than native on SpecInt2006 and 3.5 times on SpecFPU2006 (3x in average and 6.2x in the worst case) Fast Simulator is 1.7 times faster than QEMU user mode on SpecInt2006 and 7.3 times faster on SpecFPU2006 (4x in average and up to 13.5x on some tests ) Fast simulator ~30 times faster than Spike Esperanto Technologies: A Fast ISA Simulator for RISC-V 25 5 th RISC-V Workshop Nov 30, 2016

26 Why is Fast Simulator so fast? Use of a performance oriented architecture: Compiler style Intermediate Representation (allows implementing more optimizations) Smart trace collection Optimized x86-64 backend Many of optimization tricks in other components are enabled by default Hardware floating point emulation Esperanto Technologies: A Fast ISA Simulator for RISC-V 26 5 th RISC-V Workshop Nov 30, 2016

27 Good Multithreading Scalability Intel(R) Xeon Phi(TM) CPU 1.30GHz, 64 cores GeoBenchmark: $ mig_benchmark Number of threads x86-64 seconds Ratio Fast Sim seconds Ratio Esperanto Technologies: A Fast ISA Simulator for RISC-V 27 5 th RISC-V Workshop Nov 30, 2016

28 Future plans Still a proprietary simulator under development Future plans may include: Additional optimizations to improve performance Improved floating point precision support Support for additional extensions Support for additional metrics Improved debugging options System Mode? Esperanto Technologies: A Fast ISA Simulator for RISC-V 28 5 th RISC-V Workshop Nov 30, 2016

29 Summary Internal working name is ExaGear-RVx ExaGear-RVx is an extremely fast simulator Only 2.5 (int) to 3.5x(float) slower than native execution Up to 10+ times faster(4x in average) than QEMU user mode Good multithreading scalability Industrial level reliability Reasonable design flexibility Allows quick support for hardware ISA changes Esperanto Technologies: A Fast ISA Simulator for RISC-V 29 5 th RISC-V Workshop Nov 30, 2016

Lightweight Memory Tracing

Lightweight Memory Tracing Mathias Payer*, Enrico Kravina, Thomas Gross Department of Computer Science ETH Zürich, Switzerland * now at UC Berkeley Memory Tracing via Memlets Execute code (memlets) for