Introduction to gem5. Nizamudheen Ahmed Texas Instruments


Introduction
- A full-system computer architecture simulator
- Open-source tool focused on architectural modeling
  - BSD license
- Encompasses system-level architecture as well as processor micro-architecture
- The gem5 simulation infrastructure is the merger of the best aspects of M5 and the best aspects of GEMS
- M5
  - Highly configurable simulation framework supporting multiple ISAs and diverse CPU models
  - Developed at the University of Michigan
- GEMS (General Execution-driven Multiprocessor Simulator)
  - Detailed and flexible memory system model
  - Includes support for multiple cache coherence protocols and interconnect models
  - Developed at the University of Wisconsin-Madison

Ref: http://gem5.org/dist/tutorials/hipeac2012/gem5_hipeac.pdf

Features: Framework
- A simulation framework
  - C++ based, simple discrete-event simulation kernel
  - sc_thread, sc_cthread and wait are not supported
  - gem5 Events provide a mechanism to schedule, deschedule and reschedule events on the simulation time-line
  - Objects (derived from SimObject) schedule their own events on the EventQueue
- Python-based front-end interface
  - Python scripts construct the topology being simulated
  - Initialization, configuration, simulation control and statistics
- Supported ISAs: Alpha, ARM, MIPS, Power, SPARC, and x86
  - ARM: detailed configuration similar to Cortex-A15, including support for the Thumb, Thumb-2, VFPv3 and NEON instruction set extensions
- Multiple-system simulation
  - Example: multiple SoCs connected over a simulated Ethernet link
- Boots Linux and Android
  - Enough IP models are supported to boot Linux
  - VNC capabilities (graphics capabilities)
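The schedule / deschedule / reschedule mechanism listed above can be illustrated with a minimal sketch. This is a toy written in Python, not gem5's C++ EventQueue; the class names `EventQueue` and `Timer`, the method names, and the `demo` function are all hypothetical and chosen only for this illustration.

```python
import heapq
import itertools


class EventQueue:
    """Toy discrete-event queue (hypothetical; not gem5's C++ EventQueue)."""

    def __init__(self):
        self._heap = []                # entries: [tick, seq, callback]
        self._seq = itertools.count()  # tie-breaker: same-tick events stay FIFO
        self._live = {}                # callback -> its live heap entry
        self.now = 0

    def schedule(self, callback, tick):
        if tick < self.now:
            raise ValueError("cannot schedule in the past")
        if callback in self._live:
            raise ValueError("event already scheduled")
        entry = [tick, next(self._seq), callback]
        self._live[callback] = entry
        heapq.heappush(self._heap, entry)

    def deschedule(self, callback):
        # Lazy deletion: mark the entry dead; run() skips it later.
        entry = self._live.pop(callback)
        entry[2] = None

    def reschedule(self, callback, tick):
        if callback in self._live:
            self.deschedule(callback)
        self.schedule(callback, tick)

    def run(self):
        while self._heap:
            tick, _, callback = heapq.heappop(self._heap)
            if callback is None:       # descheduled entry
                continue
            del self._live[callback]
            self.now = tick            # time jumps straight to the next event
            callback()


class Timer:
    """Hypothetical object that owns its event, in the spirit of a SimObject."""

    def __init__(self, queue, log):
        self.queue, self.log = queue, log

    def fire(self):
        self.log.append(self.queue.now)


def demo():
    queue, log = EventQueue(), []
    a, b, c = Timer(queue, log), Timer(queue, log), Timer(queue, log)
    queue.schedule(a.fire, 10)
    queue.schedule(b.fire, 20)
    queue.schedule(c.fire, 15)
    queue.reschedule(b.fire, 5)   # move b earlier
    queue.deschedule(c.fire)      # c never fires
    queue.run()
    return log
```

The sketch uses lazy deletion so that deschedule does not need to search the heap; that is a design choice of this toy, not a statement about how gem5 implements it.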

Features: System modes
gem5 supports two fundamental modes of operation.
- Full system (FS)
  - Models bare hardware, including devices
  - Interrupts, exceptions, privileged instructions, fault handlers
  - Use-case: benchmarking individual applications, or a set of applications on an MP system
  - Additional feature: simulated UART output and frame buffer output
- Syscall emulation (SE)
  - Models the user-visible ISA plus common system calls
  - System calls are emulated, typically by calling the host OS
  - Simplified address translation model, no scheduling
  - Use-case: OS fast-boot
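The idea behind syscall emulation, where a guest system call is serviced by calling the host OS instead of running a simulated kernel, can be sketched as a dispatch table. The syscall numbers, the `guest_mem` byte array, and every function name below are assumptions made for this illustration; they do not correspond to gem5's actual syscall tables.

```python
import os

# Hypothetical guest syscall numbers, used only in this sketch.
SYS_GETPID = 1
SYS_WRITE = 2


def sys_getpid(guest_mem):
    # Forward directly to the host OS.
    return os.getpid()


def sys_write(guest_mem, fd, addr, length):
    # Copy the bytes out of simulated guest memory, then let the host write them.
    return os.write(fd, bytes(guest_mem[addr:addr + length]))


SYSCALL_TABLE = {SYS_GETPID: sys_getpid, SYS_WRITE: sys_write}


def emulate_syscall(guest_mem, number, *args):
    """Service one guest syscall on the host; no simulated kernel is involved."""
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        raise NotImplementedError(f"unimplemented syscall {number}")
    return handler(guest_mem, *args)
```

Only the "common system calls" need an entry in such a table, which is why an unknown number raises instead of being silently ignored.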

Features: CPU Models
- Configurable CPU models: three CPU models are supported
  - Simple Atomic/Timing: fast CPU model
  - InOrder: detailed pipelined in-order CPU model
  - O3: detailed pipelined out-of-order CPU model
- Supports a domain-specific language to represent ISA details
  - Includes the information needed to generate the decode function
- Example:

    def bitfield OPCODE <31:26>;
    def bitfield IMM <12>;
    def signed bitfield MEMDISP <15:0>;

    decode OPCODE {
        0: Integer::add({{ Rc = Ra + Rb; }});
        1: Integer::sub({{ Rc = Ra - Rb; }});
    }
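To make the ISA description example more concrete, here is a rough Python sketch of the kind of decode logic such a description stands for: extract the OPCODE bitfield and dispatch to add or sub. The slide does not say where the Ra, Rb and Rc register fields live, so the positions used here (Ra in bits 25:21, Rb in bits 20:16, Rc in bits 4:0) are an assumption, as are all function names. This is not the code the gem5 ISA parser generates.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit field as a two's-complement signed number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def memdisp(word):
    # Mirrors 'def signed bitfield MEMDISP <15:0>'.
    return sign_extend(bits(word, 15, 0), 16)


def encode(opcode, ra, rb, rc):
    # Helper for the example; field positions are assumed, not from the slide.
    return (opcode << 26) | (ra << 21) | (rb << 16) | rc


def execute(word, regs):
    """Decode on OPCODE <31:26> and run the matching operation on regs."""
    opcode = bits(word, 31, 26)
    ra, rb, rc = bits(word, 25, 21), bits(word, 20, 16), bits(word, 4, 0)
    if opcode == 0:      # Integer::add
        regs[rc] = regs[ra] + regs[rb]
    elif opcode == 1:    # Integer::sub
        regs[rc] = regs[ra] - regs[rb]
    else:
        raise ValueError(f"unknown opcode {opcode}")
```

For example, with `regs[1] = 7` and `regs[2] = 3`, executing `encode(0, 1, 2, 3)` leaves the sum in `regs[3]`.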

Features: Memory reference interfaces
Three transport interfaces: functional, atomic, timing.
- Functional
  - Similar to TLM debug transport
  - Untimed call
  - No state change intended
  - Use-case: loading binaries, memory introspection, etc.
- Atomic
  - Similar to TLM blocking transport (but no wait), with time annotation
  - State change allowed (cache fill, eviction and so on)
  - Use-case: LT-style use-cases, cache warming, etc.
- Timing
  - Similar to TLM non-blocking transport
  - Non-blocking interface, time annotations, multiple phases
  - Use-case: detailed memory access behavior analysis
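The difference between the three interfaces can be sketched with a toy memory that has a one-level "cache" represented as a set of addresses. Everything here is hypothetical: the class `ToyCachedMemory`, its method names, and the latency values are assumptions for illustration and are not gem5's port API.

```python
class ToyCachedMemory:
    HIT_LATENCY = 2      # assumed latencies, in ticks
    MISS_LATENCY = 100

    def __init__(self, data):
        self.data = dict(data)
        self.cached = set()   # addresses currently held in the toy cache
        self.pending = []     # (ready_tick, addr, callback) for timing mode

    def functional_read(self, addr):
        # Untimed, and must not disturb cache state (debug-style access).
        return self.data[addr]

    def atomic_read(self, addr):
        # Completes in one call, returns a latency estimate, may change state.
        latency = self.HIT_LATENCY if addr in self.cached else self.MISS_LATENCY
        self.cached.add(addr)  # cache fill
        return self.data[addr], latency

    def timing_read(self, addr, now, callback):
        # Non-blocking: accept the request now, deliver the response later.
        _, latency = self.atomic_read(addr)
        self.pending.append((now + latency, addr, callback))
        return True            # request accepted

    def advance(self, now):
        # Deliver every response whose time has come, oldest first.
        ready = sorted((p for p in self.pending if p[0] <= now),
                       key=lambda p: p[0])
        self.pending = [p for p in self.pending if p[0] > now]
        for tick, addr, callback in ready:
            callback(addr, self.data[addr], tick)
```

Reusing `atomic_read` inside `timing_read` is a shortcut of this sketch; a real timing model would go through several request and response phases.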

Features: Memory System (1)
- Memory system
  - Classic (from M5): fast and configurable memory system model
  - Ruby (from GEMS): framework/infrastructure to model a variety of cache-coherent memory systems
- Classic memory model
  - Fast and easily configurable memory model
  - Supports atomic as well as timing mode operation
  - Higher simulation speed compared to Ruby
  - Models a simplistic snooping cache coherency protocol
  - Less accurate than the detailed Ruby model

Features: Memory System (2)
- Ruby
  - Detailed model for the memory subsystem
  - Supports the timing access interface
  - Does not support the atomic access interface
- Supports a domain-specific language called SLICC (Specification Language for Implementing Cache Coherence)
  - Supports a wide variety of cache coherence protocols, from directory to snooping protocols and several points in between
  - Flow: SLICC file -> SLICC compiler -> documentation and cache controller model code for cache coherency
- Includes
  - Inclusive/exclusive cache hierarchies
  - Various replacement policies
  - Coherence protocols
  - Interconnection network
  - DMA and memory controller
- Ruby accurately models on-chip network contention and flow control
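A coherence protocol specification is essentially a table of (state, event) pairs mapped to actions and next states. The Python sketch below shows that idea for a heavily simplified MSI cache controller with no transient states. It is not SLICC syntax and not one of the protocols shipped with gem5; the event names and action strings are made up for this illustration.

```python
# (current state, event) -> (next state, action)
# States: M(odified), S(hared), I(nvalid).
# Load/Store come from the local CPU; OtherGetS/OtherGetM are requests
# observed from another cache (hypothetical event names).
MSI = {
    ("I", "Load"):      ("S", "issue GetS"),
    ("I", "Store"):     ("M", "issue GetM"),
    ("I", "OtherGetS"): ("I", "ignore"),
    ("I", "OtherGetM"): ("I", "ignore"),
    ("S", "Load"):      ("S", "hit"),
    ("S", "Store"):     ("M", "issue GetM"),
    ("S", "OtherGetS"): ("S", "ignore"),
    ("S", "OtherGetM"): ("I", "invalidate"),
    ("M", "Load"):      ("M", "hit"),
    ("M", "Store"):     ("M", "hit"),
    ("M", "OtherGetS"): ("S", "write back data"),
    ("M", "OtherGetM"): ("I", "write back data"),
}


def step(state, event):
    """Look up one transition; a missing entry would be a protocol bug."""
    return MSI[(state, event)]


def run(events, state="I"):
    """Apply a sequence of events and return (final state, list of actions)."""
    actions = []
    for event in events:
        state, action = step(state, event)
        actions.append(action)
    return state, actions
```

A real protocol adds transient states for requests that are in flight, which is a large part of what makes a dedicated specification language useful.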

Features: Checkpointing & Fast-forward
- Checkpointing
  - Snapshot the relevant system state
  - Restore it later
  - The ISA, number of cores and memory map need to be the same to restore the session
  - Uses the serialize and unserialize concepts
  - Supported on the classic memory model as well as the Ruby memory model
- Fast-forward
  - The idea is to start the simulation in atomic mode and switch over to detailed mode for the relevant/important simulation period
  - The switch may consume a few more simulation cycles to drain outstanding memory-access requests
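The serialize/unserialize idea, together with the rule that ISA, core count and memory map must match on restore, can be sketched as follows. `ToySystem`, its fields, and the JSON checkpoint format are assumptions for this illustration only; gem5's real checkpoints use its own C++ serialization and file layout.

```python
import json


class ToySystem:
    """Hypothetical system with just enough state to checkpoint."""

    def __init__(self, isa, num_cores, mem_map):
        self.isa = isa
        self.num_cores = num_cores
        self.mem_map = mem_map                      # list of [base, size]
        self.tick = 0
        self.regs = [[0] * 4 for _ in range(num_cores)]

    def _shape(self):
        # The parts of the configuration that must match on restore.
        return {"isa": self.isa,
                "num_cores": self.num_cores,
                "mem_map": [list(r) for r in self.mem_map]}

    def serialize(self):
        return json.dumps({"shape": self._shape(),
                           "tick": self.tick,
                           "regs": self.regs})

    def unserialize(self, blob):
        state = json.loads(blob)
        if state["shape"] != self._shape():
            raise ValueError("checkpoint does not match this configuration")
        self.tick = state["tick"]
        self.regs = state["regs"]
```

Usage would be to call `serialize()` on a running system, build a fresh system with the same configuration, and call `unserialize()` on it; a system with a different core count rejects the checkpoint.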

Flexibility
[Figure] Source: "The gem5 Simulator", ACM SIGARCH Computer Architecture News, May 2011 issue

GEM5 accuracy
- Real system: ST-Ericsson Nova A9500 processor
  - Dual-core ARM Cortex-A9 processor (1 GHz) running a Linux kernel
  - It also features a number of DSP and ASIP cores along with a Mali-400 GPU
- GEM5 system
  - Dual-core ARM Cortex-A9 core running at 1 GHz
  - 32-kB private L1 data and instruction caches, 512-kB shared L2 cache
  - DDR physical memory running at 400 MHz
  - Linux kernel 2.6.38
- Ref: A. Butko, R. Garibotti, L. Ost, and G. Sassatelli, "Accuracy Evaluation of GEM5 Simulator System", in Proceedings of the IEEE International Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), York, United Kingdom, July 2012.
- Conclusion: "According to the results, the accuracy varies from 1.39% to 17.94% depending on the memory traffic. In the worst scenario, mismatch has been shown to result from overly simple model of the external DDR memory..."
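The slide does not define how the mismatch percentage was computed. One common reading, assumed here, is the relative error of the simulated value against the value measured on hardware. The function name and the sample numbers in the usage note are hypothetical and are not taken from the cited paper.

```python
def percent_error(simulated, measured):
    """Relative mismatch of a simulated metric against hardware, in percent.

    Assumed definition: |simulated - measured| / |measured| * 100.
    """
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(simulated - measured) / abs(measured) * 100.0
```

For instance, a hypothetical benchmark that takes 100 ms on hardware and 110 ms in simulation would show a mismatch of about 10 percent under this definition.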

TI & GEM5
- Wrapped the GEM5 ISA simulator in a SystemC wrapper and plugged that into the architecture modeling tool chain (1H2011)
  - SystemC scheduler integration
  - Classic memory model to TLM 2.0 bridge
- Working closely with a third party (3P) to upstream the SystemC wrapper and TLM 2.0 bridge into the standard gem5 code base
  - Plan to close this by 1H13
- Enabled full-system performance optimization for next-gen heterogeneous SoCs
  - Running complex Linux workloads
  - Heavily used to address many-core challenges

SystemC Integration (1)
[Diagram: gem5 models post events to the gem5 event queue along a time axis; sc_event.notify(t) is issued for the event time, and the event is popped when the time comes.]
Ref: "Integrating gem5 in SystemC simulations", Alexandre Romaña, Texas Instruments, http://www.m5sim.org/wiki/images/7/72/gem5_workshop_systemc_integration_ext.pdf
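One way to read the diagram is that the gem5 event queue no longer advances time on its own: for each scheduled event it asks the host scheduler for a notification at the event's time, and pops the event when that notification fires. The Python sketch below illustrates that reading with stand-in classes. `HostKernel` only imitates a SystemC-like scheduler, and both class names and all methods are assumptions; the actual integration described in the referenced talk is written in C++ against the SystemC API.

```python
import heapq


class HostKernel:
    """Stand-in for the SystemC scheduler (hypothetical, pure Python)."""

    def __init__(self):
        self.now = 0
        self._timers = []   # (time, seq, callback)
        self._seq = 0

    def notify(self, delay, callback):
        # Loosely analogous to sc_event.notify(t): fire after 'delay'.
        self._seq += 1
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))

    def run(self):
        while self._timers:
            self.now, _, callback = heapq.heappop(self._timers)
            callback()


class BridgedEventQueue:
    """Toy gem5-side queue whose notion of time is owned by the host."""

    def __init__(self, host):
        self.host = host
        self._events = []   # (tick, seq, callback)
        self._seq = 0

    def schedule(self, tick, callback):
        self._seq += 1
        heapq.heappush(self._events, (tick, self._seq, callback))
        # Ask the host to wake this queue when the event's time comes.
        self.host.notify(tick - self.host.now, self._wake)

    def _wake(self):
        # Pop every event that is due; a wake-up with nothing due is harmless.
        while self._events and self._events[0][0] <= self.host.now:
            _, _, callback = heapq.heappop(self._events)
            callback()
```

With this arrangement, events from the gem5 side and native host processes interleave in a single global time order, because only the host kernel ever advances `now`.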

SystemC Integration (2)
[Diagram labels: GEM5 classic memory system; simulation bridge; generic TLM 2.0 protocol; SystemC AMBA bridge model; AMBA TLM 2.0. Annotation: free from Carbon Design Systems.]
Ref: "Integrating gem5 in SystemC simulations", Alexandre Romaña, Texas Instruments, http://www.m5sim.org/wiki/images/7/72/gem5_workshop_systemc_integration_ext.pdf

Tool dependency
- GCC 4.2+
- Python
- SWIG
- SCons (build)
- Google Protocol Buffers

Summary
- gem5 introduction
  - High-level features (CPU/Memory/System)
- Active gem5 community
  - The gem5 community and user group is very active
  - In the past 100 days: ~850 mails on the gem5-users mailing list and ~1200 mails on the gem5-dev mailing list
- Resources
  - Subscribe to the mailing lists
    - gem5-users: questions about using/running gem5
    - gem5-dev: questions about modifying the simulator
  - Submit a patch to the ReviewBoard: http://reviews.gem5.org
  - Read and contribute to the wiki: http://www.gem5.org

Q & A

Envisioned use-cases for system simulation
- SW development and verification
  - Binary translation models (QEMU/OVP) are fast enough to do this and have a mature SW development environment
- HW/SW performance verification
  - Needs performance measures of first-order accuracy, capturing the things that actually matter
- Early architecture exploration
  - Needs an environment where it is fast and easy to model and connect the key architectural components of the hardware platform
- HW/SW functional verification
  - RTL is representative enough, has enough visibility, and has a mature methodology
Courtesy: http://gem5.org/dist/tutorials/hipeac2012/02.introduction.m4v