Introduction to gem5
Nizamudheen Ahmed, Texas Instruments
Introduction
- A full-system computer architecture simulator
- Open source tool focused on architectural modeling (BSD license)
- Encompasses system-level architecture as well as processor micro-architecture
- The gem5 simulation infrastructure merges the best aspects of M5 and the best aspects of GEMS
  - M5: highly configurable simulation framework supporting multiple ISAs and diverse CPU models; developed at the University of Michigan
  - GEMS (General Execution-driven Multiprocessor Simulator): detailed and flexible memory system model, including support for multiple cache coherence protocols and interconnect models; developed at the University of Wisconsin-Madison
Ref: http://gem5.org/dist/tutorials/hipeac2012/gem5_hipeac.pdf
Features - Framework
- A simulation framework
  - C++ based, simple discrete event simulation kernel
  - SystemC-style sc_thread, sc_cthread and wait are not supported
  - gem5 Events provide the mechanism to schedule, deschedule and reschedule events on the simulation time-line
  - Objects (derived from SimObject) schedule their own events on the EventQueue
- Python based front-end interface
  - Python scripts construct the topology being simulated
  - Initialization, configuration, simulation control and statistics
- ISA support: Alpha, ARM, MIPS, Power, SPARC, and x86
  - ARM: detailed configuration similar to Cortex-A15, including support for the Thumb, Thumb-2, VFPv3 and NEON instruction set extensions
- Multiple system simulation
  - Example: multiple SoCs connected over a simulated Ethernet link
- Boots Linux and Android
  - Enough IP models are supported to boot Linux
  - VNC (graphics) capabilities
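The schedule / deschedule / reschedule mechanism listed above can be illustrated with a small, self-contained Python sketch. This is not gem5 code: the Event and EventQueue classes below are hypothetical stand-ins that only borrow gem5's vocabulary, and the lazy-removal strategy is an assumption made to keep the example short.

```python
import heapq
import itertools


class Event:
    """A named callback; `when` holds its tick while it is scheduled."""

    def __init__(self, name, callback):
        self.name = name
        self.callback = callback
        self.when = None   # tick at which the event fires, None if idle
        self._seq = None   # identifies the currently valid heap entry


class EventQueue:
    """Toy time-ordered event queue (hypothetical, not gem5's class)."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._counter = itertools.count()  # unique id, also breaks ties

    def schedule(self, event, when):
        assert event.when is None, "event is already scheduled"
        assert when >= self.now, "cannot schedule in the past"
        event.when = when
        event._seq = next(self._counter)
        heapq.heappush(self._heap, (when, event._seq, event))

    def deschedule(self, event):
        # Lazy removal: invalidate the entry; run() skips it when popped.
        assert event.when is not None, "event is not scheduled"
        event.when = None
        event._seq = None

    def reschedule(self, event, when):
        if event.when is not None:
            self.deschedule(event)
        self.schedule(event, when)

    def run(self):
        # Pop events in time order and advance simulated time to each one.
        while self._heap:
            when, seq, event = heapq.heappop(self._heap)
            if event._seq != seq:
                continue  # stale entry left behind by deschedule/reschedule
            self.now = when
            event.when = None
            event._seq = None
            event.callback(self.now)


log = []
queue = EventQueue()
tick = Event("tick", lambda now: log.append(("tick", now)))
irq = Event("irq", lambda now: log.append(("irq", now)))
queue.schedule(tick, 10)
queue.schedule(irq, 5)
queue.reschedule(irq, 20)   # the irq event moves from tick 5 to tick 20
queue.run()
```

After the run, `log` holds the tick event at time 10 followed by the irq event at time 20, which shows that rescheduling replaced the original firing time rather than adding a second one.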
Features - System modes
gem5 supports 2 fundamental modes of operation:
- Full system (FS)
  - Models bare hardware, including devices
  - Interrupts, exceptions, privileged instructions, fault handlers
  - Use-case: booting an operating system and running workloads on top of it
  - Additional feature: simulated UART output and frame buffer output
- Syscall emulation (SE)
  - Models the user-visible ISA plus common system calls
  - System calls are emulated, typically by calling the host OS
  - Simplified address translation model, no scheduling
  - Use-case: benchmarking individual applications, or a set of applications on MP
Features - CPU Models
- Configurable CPU models: supports 3 CPU models
  - Simple (Atomic/Timing): fast CPU model
  - InOrder: detailed pipelined in-order CPU model
  - O3: detailed pipelined out-of-order CPU model
- Supports a domain specific language to represent ISA details
  - Includes the information needed to generate the decode function
- Example:

    def bitfield OPCODE <31:26>;
    def bitfield IMM <12>;
    def signed bitfield MEMDISP <15:0>;

    decode OPCODE {
        0: Integer::add({{ Rc = Ra + Rb; }});
        1: Integer::sub({{ Rc = Ra - Rb; }});
    }
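To make the ISA description example more concrete, the following Python sketch shows roughly what a decode function generated from those three bitfield definitions and the two-entry decode block would have to do. It is an illustration only, not output of the gem5 ISA parser: the helper names (bits, sbits, execute) are hypothetical, and because the slide does not say how Ra, Rb and Rc are encoded, operand values are passed in directly.

```python
def bits(word, hi, lo):
    """Return the unsigned field word<hi:lo> (bit positions inclusive)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sbits(word, hi, lo):
    """Return word<hi:lo> sign-extended from the field's top bit."""
    width = hi - lo + 1
    value = bits(word, hi, lo)
    if value & (1 << (width - 1)):
        value -= 1 << width
    return value


# Field accessors mirroring the three bitfield definitions on the slide.
def opcode(word):
    return bits(word, 31, 26)      # def bitfield OPCODE <31:26>


def imm(word):
    return bits(word, 12, 12)      # def bitfield IMM <12>


def memdisp(word):
    return sbits(word, 15, 0)      # def signed bitfield MEMDISP <15:0>


# Decode table mirroring "decode OPCODE { 0: add; 1: sub; }".
EXECUTE = {
    0: lambda ra, rb: ra + rb,     # Rc = Ra + Rb
    1: lambda ra, rb: ra - rb,     # Rc = Ra - Rb
}


def execute(word, ra, rb):
    """Decode the opcode of `word` and return the value written to Rc."""
    op = opcode(word)
    if op not in EXECUTE:
        raise ValueError(f"unknown opcode {op}")
    return EXECUTE[op](ra, rb)


add_result = execute(0 << 26, 7, 5)   # opcode 0 selects add
sub_result = execute(1 << 26, 7, 5)   # opcode 1 selects sub
```

Keeping the field extraction separate from the decode table matches the split in the description language, where bitfields are declared once and then reused by any number of decode blocks.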
Features - Memory reference interfaces
Three transport interfaces: functional, atomic, timing
- Functional
  - Similar to TLM debug transport
  - Untimed call
  - No state change intended
  - Use-case: loading binaries, memory introspection, etc.
- Atomic
  - Similar to TLM blocking transport (but without wait), with time annotation
  - State change allowed (cache fill, eviction and so on)
  - Use-case: LT-style use-cases, cache warming, etc.
- Timing
  - Similar to TLM non-blocking transport
  - Non-blocking interface, time annotations, multiple phases
  - Use-case: detailed analysis of memory access behavior
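The difference between the three interfaces can be sketched with a toy memory object. Everything in this Python example is hypothetical (class name, method names, the fixed latency, and the access counter used as a stand-in for timing-relevant state); it does not reproduce gem5's port API, only the calling styles described above.

```python
class ToyMemory:
    """Toy memory offering functional, atomic and timing style accesses."""

    def __init__(self, latency=10):
        self.data = {}
        self.latency = latency
        self.accesses = 0   # stand-in for timing-relevant state

    def functional_write(self, addr, value):
        # Functional: untimed backdoor access, e.g. for loading a binary.
        # It deliberately leaves the access counter untouched.
        self.data[addr] = value

    def functional_read(self, addr):
        # Functional: untimed introspection, no timing state is changed.
        return self.data.get(addr, 0)

    def atomic_read(self, addr):
        # Atomic: the whole access completes inside this call and the
        # latency is returned as an annotation instead of being waited on.
        self.accesses += 1
        return self.data.get(addr, 0), self.latency

    def timing_read(self, addr, now, schedule, on_response):
        # Timing: the request is accepted now; the response arrives later
        # through a callback that the caller-supplied scheduler invokes.
        self.accesses += 1
        value = self.data.get(addr, 0)
        schedule(now + self.latency,
                 lambda when: on_response(addr, value, when))
        return True   # request accepted


pending = []       # minimal scheduler: list of (time, callback) pairs
responses = []

mem = ToyMemory(latency=10)
mem.functional_write(0x100, 42)
peeked = mem.functional_read(0x100)
value, delay = mem.atomic_read(0x100)
mem.timing_read(0x100, now=5,
                schedule=lambda when, fn: pending.append((when, fn)),
                on_response=lambda a, v, t: responses.append((a, v, t)))

# Deliver the scheduled responses in time order.
for when, fn in sorted(pending, key=lambda item: item[0]):
    fn(when)
```

In this sketch the atomic read returns its data and a 10-tick latency immediately, while the timing read issued at tick 5 delivers its response at tick 15.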
Features - Memory System (1)
- Two memory system models
  - Classic (from M5): fast and configurable memory system model
  - Ruby (from GEMS): framework/infrastructure to model a variety of cache-coherent memory systems
- Classic memory model
  - Fast and easily configurable memory model
  - Supports atomic as well as timing mode operation
  - Higher simulation speed compared to Ruby
  - Models a simplistic snooping cache coherence protocol
  - Less accurate than the detailed Ruby model
Features - Memory System (2)
- Ruby
  - Detailed model for the memory subsystem
  - Supports the timing access interface; does not support the atomic access interface
  - Supports a domain specific language called SLICC (Specification Language for Implementing Cache Coherence)
  - Supports a wide variety of cache coherence protocols, from directory to snooping protocols and several points in between
- SLICC flow: SLICC file -> SLICC compiler -> documentation and cache controller model code for cache coherency
- Includes
  - Inclusive/exclusive cache hierarchies
  - Various replacement policies
  - Coherence protocols
  - Interconnection network
  - DMA and memory controller
- Ruby accurately models on-chip network contention and flow control
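A coherence protocol of the kind SLICC describes is, at its core, a table of (state, event) to next-state transitions for each cache controller. The Python sketch below shows that idea for a heavily simplified textbook MSI protocol. It is not SLICC syntax and not a protocol shipped with gem5: the event names (Load, Store, Inv), the action strings, and the absence of transient states are all simplifying assumptions.

```python
# (current state, event) -> (next state, action description)
# States: I = Invalid, S = Shared, M = Modified.
# Events: Load/Store come from the local CPU, Inv is a remote invalidation.
TRANSITIONS = {
    ("I", "Load"):  ("S", "fetch block with read permission"),
    ("I", "Store"): ("M", "fetch block with write permission"),
    ("I", "Inv"):   ("I", "nothing to do"),
    ("S", "Load"):  ("S", "hit"),
    ("S", "Store"): ("M", "upgrade and invalidate other sharers"),
    ("S", "Inv"):   ("I", "drop block"),
    ("M", "Load"):  ("M", "hit"),
    ("M", "Store"): ("M", "hit"),
    ("M", "Inv"):   ("I", "write back dirty data, then drop block"),
}


def run(events, state="I"):
    """Apply events in order; return the final state and the trace."""
    trace = []
    for event in events:
        next_state, _action = TRANSITIONS[(state, event)]
        trace.append((state, event, next_state))
        state = next_state
    return state, trace


final_state, trace = run(["Load", "Store", "Inv"])
```

Here a block is first read (I to S), then written (S to M), then invalidated by another cache (M to I). A real Ruby protocol adds transient states and message queues on top of this basic table structure.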
Features - Checkpointing & Fast-forward
- Checkpointing
  - Snapshot the relevant system state and restore it later
  - The ISA, number of cores and memory map need to be the same to restore the session
  - Uses the serialize and unserialize concepts
  - Supported on the classic memory model as well as the Ruby memory model
- Fast-forward
  - The idea is to start the simulation in atomic mode and switch over to detailed mode for the relevant/important simulation period
  - The switch may consume a few more simulation cycles to drain outstanding memory access requests
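The serialize / unserialize idea, together with the rule that the restoring system must match the checkpointed one, can be sketched in a few lines of Python. This is an illustration under stated assumptions: the Counter object, the JSON checkpoint layout, and the function names take_checkpoint and restore_checkpoint are hypothetical and unrelated to gem5's actual checkpoint format.

```python
import json


class Counter:
    """Toy simulated object that knows how to save and load its state."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value

    def serialize(self):
        return {"value": self.value}

    def unserialize(self, state):
        self.value = state["value"]


def take_checkpoint(config, objects):
    """Snapshot the configuration and every object's state as JSON text."""
    return json.dumps({
        "config": config,
        "objects": {obj.name: obj.serialize() for obj in objects},
    })


def restore_checkpoint(blob, config, objects):
    """Restore object state, refusing checkpoints from a different system."""
    checkpoint = json.loads(blob)
    if checkpoint["config"] != config:
        raise ValueError("checkpoint does not match this configuration")
    for obj in objects:
        obj.unserialize(checkpoint["objects"][obj.name])


# ISA, core count and memory map must match between save and restore.
config = {"isa": "arm", "num_cores": 2, "mem_map": [[0, 4096]]}
cycles = Counter("cycles", value=7)

blob = take_checkpoint(config, [cycles])
cycles.value = 99                      # simulation continues past the snapshot
restore_checkpoint(blob, config, [cycles])
restored_value = cycles.value          # back to the snapshot value
```

Storing the configuration next to the object state is one simple way to enforce the compatibility rule; restoring with a different core count or ISA raises an error instead of silently loading mismatched state.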
Flexibility
Source: "The gem5 Simulator", May 2011 issue of ACM SIGARCH Computer Architecture News
GEM5 accuracy
- Real system: ST-Ericsson Nova A9500 processor
  - Dual-core ARM Cortex-A9 processor (1 GHz) running a Linux kernel
  - Also features a number of DSP and ASIP cores along with a Mali-400 GPU
- GEM5 system
  - Dual-core ARM Cortex-A9 core running at 1 GHz
  - 32-kB private L1 data and instruction caches, 512-kB shared L2 cache
  - DDR physical memory running at 400 MHz
  - Linux kernel 2.6.38
- Conclusion
  - According to the results, the mismatch between gem5 and the real system varies from 1.39% to 17.94% depending on the memory traffic
  - In the worst scenario, the mismatch has been shown to result from an overly simple model of the external DDR memory...
Ref: A. Butko, R. Garibotti, L. Ost, and G. Sassatelli, "Accuracy Evaluation of GEM5 Simulator System", in Proceedings of the IEEE International Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), York, United Kingdom, July 2012.
TI & GEM5
- Wrapped the GEM5 ISA simulator in a SystemC wrapper and plugged it into the architecture modeling tool chain (1H2011)
  - SystemC scheduler integration
  - Classic memory model to TLM 2.0 bridge
- Working closely with a third party to upstream the SystemC wrapper and TLM 2.0 bridge into the standard gem5 code base
  - Plan to close this by 1H13
- Enabled full-system performance optimization for next-gen heterogeneous SoCs
  - Running complex Linux workloads
  - Heavily used to address many-core challenges
SystemC Integration (1)
[Figure: gem5 models place events on the gem5 event queue along a time axis. Labels: "sc_event.notify(t)", "Pop when the time comes".]
Ref: "Integrating gem5 in SystemC simulations", Alexandre Romaña, Texas Instruments, http://www.m5sim.org/wiki/images/7/72/gem5_workshop_systemc_integration_ext.pdf
SystemC Integration (2)
[Figure labels: GEM5 classic, simulation bridge, generic TLM 2.0, protocol bridge, AMBA TLM 2.0, SystemC AMBA model. Note on the figure: "Free from Carbon Design Systems".]
Ref: "Integrating gem5 in SystemC simulations", Alexandre Romaña, Texas Instruments, http://www.m5sim.org/wiki/images/7/72/gem5_workshop_systemc_integration_ext.pdf
Tool dependency
- GCC 4.2+
- Python
- SWIG
- SCons (build)
- Google Protocol Buffers
Summary
- gem5 introduction
- High-level features (CPU/Memory/System)
- Active gem5 community: the gem5 community and user group is very active
  - Past 100 days: ~850 mails on the gem5-users mailing list
  - Past 100 days: ~1200 mails on the gem5-dev mailing list
- Resources
  - Subscribe to the mailing lists
    - gem5-users: questions about using/running gem5
    - gem5-dev: questions about modifying the simulator
  - Submit a patch to our ReviewBoard: http://reviews.gem5.org
  - Read and contribute to the wiki: http://www.gem5.org
Q & A
Envisioned use-cases for system simulation
- SW development and verification
  - Binary translation models (QEMU/OVP) are fast enough to do this and have a mature SW development environment
- HW/SW performance verification
  - Needs a performance measure of first-order accuracy, capturing the things that actually matter
- Early architecture exploration
  - Needs an environment where it is fast and easy to model and connect the key architectural components of the hardware platform
- HW/SW functional verification
  - RTL is representative enough, has enough visibility, and has a mature methodology
Courtesy: http://gem5.org/dist/tutorials/hipeac2012/02.introduction.m4v