Exploring SW Performance using SoC Transaction-level Modeling

Similar documents
Transaction level modeling of SoC with SystemC 2.0

Transaction Level Modeling with SystemC. Thorsten Grötker Engineering Manager Synopsys, Inc.

Transaction Level Modeling with SystemC. Thorsten Grötker Engineering Manager Synopsys, Inc.

Fast and Accurate Source-Level Simulation Considering Target-Specific Compiler Optimizations

Modeling Software with SystemC 3.0

Hardware/Software Co-design

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer)

SpecC Methodology for High-Level Modeling

4 th European SystemC Users Group Meeting

Embedded System Design and Modeling EE382V, Fall 2008

Hardware Design and Simulation for Verification

Test and Verification Solutions. ARM Based SOC Design and Verification

System-level design refinement using SystemC. Robert Dale Walstrom. A thesis submitted to the graduate faculty

Long Term Trends for Embedded System Design

ReconOS: An RTOS Supporting Hardware and Software Threads

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

SystemC abstractions and design refinement for HW- SW SoC design. Dündar Dumlugöl. Vice President of Engineering, CoWare, Inc.

Design methodology for multi processor systems design on regular platforms

Chapter 2 M3-SCoPE: Performance Modeling of Multi-Processor Embedded Systems for Fast Design Space Exploration

Introduction to MLM. SoC FPGA. Embedded HW/SW Systems

Co-synthesis and Accelerator based Embedded System Design

EEM870 Embedded System and Experiment Lecture 4: SoC Design Flow and Tools

Embedded System Design

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Flexible and Executable Hardware/Software Interface Modeling For Multiprocessor SoC Design Using SystemC

Energy Estimation Based on Hierarchical Bus Models for Power-Aware Smart Cards

Embedded System Design Modeling, Synthesis, Verification

Cover Page. The handle holds various files of this Leiden University dissertation

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

The Architects View Framework: A Modeling Environment for Architectural Exploration and HW/SW Partitioning

Software Driven Verification at SoC Level. Perspec System Verifier Overview

The Challenges of System Design. Raising Performance and Reducing Power Consumption

Simulink -based Programming Environment for Heterogeneous MPSoC

Rapid-Prototyping Emulation System using a SystemC Control System Environment and Reconfigurable Multimedia Hardware Development Platform

Embedded System Design and Modeling EE382N.23, Fall 2015

Design Space Exploration of Systems-on-Chip: DIPLODOCUS

Transaction-Level Modeling Definitions and Approximations. 2. Definitions of Transaction-Level Modeling

A Cache Hierarchy in a Computer System

Performance Verification for ESL Design Methodology from AADL Models

Experiences and Challenges of Transaction-Level Modelling with SystemC 2.0

Automatic Instrumentation of Embedded Software for High Level Hardware/Software Co-Simulation

ARM System-Level Modeling. Platform constructed from welltested

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications

Platform-based Design

Cycle accurate transaction-driven simulation with multiple processor simulators

Communication Abstractions for System-Level Design and Synthesis

Concurrent Programming

Concurrent Programming. Implementation Alternatives. Content. Real-Time Systems, Lecture 2. Historical Implementation Alternatives.

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Model-based Analysis of Event-driven Distributed Real-time Embedded Systems

Integrating Instruction Set Simulator into a System Level Design Environment

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Codesign Framework. Parts of this lecture are borrowed from lectures of Johan Lilius of TUCS and ASV/LL of UC Berkeley available in their web.

Implementing a Real-Time Architecting Method in a Commercial CASE Tool

Doğan Fennibay MS Thesis Presentation - June 7th, Supervisor: Assoc. Prof. Arda Yurdakul

Operating Systems (2INC0) 2018/19. Introduction (01) Dr. Tanir Ozcelebi. Courtesy of Prof. Dr. Johan Lukkien. System Architecture and Networking Group

V&V: Model-based testing

System Level Design with IBM PowerPC Models

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Vertical Reuse of functional verification from subsystem to SoC level (with seamless SoC emulation)

Code Generation for QEMU-SystemC Cosimulation from SysML

System-level simulation (HW/SW co-simulation) Outline. EE290A: Design of Embedded System ASV/LL 9/10

A Generic RTOS Model for Real-time Systems Simulation with SystemC

Universal Verification Methodology(UVM)

The Next Generation of Virtual Prototyping: Ultra-fast Yet Accurate Simulation of HW/SW Systems

I/O Management and Disk Scheduling. Chapter 11

A Hybrid Instruction Set Simulator for System Level Design

Jump-Start Software-Driven Hardware Verification with a Verification Framework

Design of Embedded Hardware and Firmware

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING

Modeling and Simulating Discrete Event Systems in Metropolis

An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform

Hardware-Software Codesign. 6. System Simulation

Wireless Base Band Device (bbdev) Amr Mokhtar DPDK Summit Userspace - Dublin- 2017

Hardware-Software Codesign

Top-Level View of Computer Organization

System-On-Chip Architecture Modeling Style Guide

SPACE: SystemC Partitioning of Architectures for Co-design of real-time Embedded systems

System Debugging Tools Overview

OCCN: A Network-On-Chip Modeling and Simulation Framework. M.Coppola, S.Curaba, M.Grammatikakis, R.Locatelli, G.Maruccia, F.Papariello, L.

Overview. Technology Details. D/AVE NX Preliminary Product Brief

2 TEST: A Tracer for Extracting Speculative Threads

Quantitative Analysis of Transaction Level Models for the AMBA Bus

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany

Embedded Systems 1: On Chip Bus

Efficient Power Estimation Techniques for HW/SW Systems

A Frame Study for Post-Processing Analysis on System Behavior: A Case Study of Deadline Miss Detection

EE382V: System-on-a-Chip (SoC) Design

SCope: Efficient HdS simulation for MpSoC with NoC

Integration of System-On-Chip Simulation Models

Intel Thread Building Blocks

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

Boost Verification Results by Bridging the Hardware/Software Testbench Gap

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Diplomatic Design Patterns

Modeling and SW Synthesis for

Intel Thread Building Blocks

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010

Abstraction. Abstraction

Transcription:

Exploring SW Performance using SoC Transaction-level Modeling Imed Moussa, Thierry Grellier and Giang Nguyen TNI-Valiosys, France Email: {imed.moussa, thierry.grellier@tni-valiosys.com Abstract This paper presents VISTA, a new methodology and tool dedicated to analyse system level performance by executing full-scale SW application code on a transaction-level model of the SoC platform. The SoC provider provides a cycle-accurate functional model of the SoC architecture using the basic SystemC Transaction Level Modeling (TLM) components provided by VISTA : bus models, memories, IPs, CPUs, and RTOS generic services. These components have been carefully designed to be integrated into a SoC design flow with an implementation path for automatic generation of IP HW interfaces and SW device drivers. The application developer can then integrate the application code onto the SoC architecture as a set of SystemC modules. VISTA supports cross-compilation on the target processor and back annotation, therefore bypassing the use of an ISS. We illustrate the features of VISTA through the design and simulation of an MPEG video decoder application. 1. Introduction One of the reasons for the focus on SW is the lagging SW design productivity compared to rising complexity. Although SoCs and board-level designs share the general trend toward using software for flexibility, the criticality of the software reuse problem is much worse with SoCs. The functions required of these embedded systems have increased markedly in complexity, and the number of functions is growing just as fast. Coupled with quickly changing design specifications, these trends have made it very difficult to predict development cycle time. Meanwhile, the traditional embedded systems software industry has so far not addressed issues of "hard constraints," such as reaction speed, memory footprint and power consumption, because they are relatively unimportant for traditional, board-level development systems [1, 2, 3, 4]. But, these issues are critical for embedded software running on SoCs. VISTA is targeted to address the system-level design needs and the SW design reuse needs for SoC design. 2. VISTA Methodology Approach System level architects and application SW developers look for performance analysis and overall behavior of the system. They do not necessarily need and cannot make use of a cycle-accurate model of the SoC platform. However, a pure un-timed C model is not satisfactory either since some timing notions will still be required for performance analysis or power consumption. The capability of VISTA to bring HW/SW SoC modeling and characterization to the fore is key. VISTA can be decomposed into at least two major use paradigms: 1- Creation of the SoC virtual platform for system analysis and architecture exploration. 2- Use of the SoC virtual platform for SW development and system analysis by the systems houses SW or system designers. Before system performance analysis, the designer has to leverage the existing HW library delivered with VISTA to create the appropriate SoC platform architecture. The generic VISTA elements delivered in the library include: Bus (STBus, AHB/APB), Memories, Peripherals (timers, DMA, IOs, etc.) and RTOS elements.

Platform capture Generated Skeleton and accesses System Back-Annotation C++ com pilation Instrumented SystemC Model Vista Libs : Extended SystemC Kernel RTOS model Bus model System Analysis System Simulator Figure 1: VISTA Design and Analysis Flow Using the provided virtual model of the SoC, including the hardware abstraction layer and the RTOS layer, system designers and application SW designers can use VISTA to simulate and analyze system performance in terms of latency, bus loading, memory accesses, arbitration policy, tasks activity. These parameters allow for exploring SW performance as it can be seen for the video decoder application presented in the next section. The first step during the design flow is to create the application tasks in C. Next, using VISTA graphical front-end, the applications tasks and RTOS resources can be allocated on the black-box. Preliminary simulation can now be performed by executing the application code on the virtual platform to ensure that the code is functionally correct. Then, the code is crosscompiled on the target processor. VISTA will process the output from the cross-compilation phase in order to create timing information, which can be back annotated on the original application code to reflect in situ execution of the code on the target platform. Finally, timed simulation can be performed to analyze for system performance and power consumption. The information extracted from this phase can enable the SW developer to further develop and refine the application code. VISTA s simulator is a compiled C++ program that takes advantage of the SystemC2.0 kernel, modified so as to get proper debugging and tracing information. The simulator is launched either as a stand-alone process or as a dynamic library loaded into the process of the GUI. Ideally both options should be available as option one is easier to debug and option two is faster. Several simulation modes are possible. First, interactive simulation: the user can set graphical input devices on ports of the diagram, like sliders, knobs, etc. Outputs can be monitored similarly by setting waveform viewers like the one shown below, which gathers in one single window the functionality of an oscilloscope (continuous traces) and of a logic analyzer (discrete traces). Interactive simulation is essential in the prototyping phases of a software development. Secondly, batch simulation will also be possible. 2. 1 Abstract Communication The accesses are a mechanism to abstract the communication between the modules and allow a seamless refinement of the communication into the IP modules from the functional level to the transactional level. SystemC2.0 already provides ports and channels for this purpose. We have enriched the channel notion with introducing a pattern of three related modules: 1. The slave access first publishes the operations to the channel. It is notified the channel transactions corresponding to these operation invocations, and accepts or rejects them. For example, the amba slave access notifies the amba bus channel whether a transaction is ok, fails or needs to be split. Note that the publishing mechanism makes that a slave

access doesn t need to reproduce the slave module interfaces. A slave access can be shared by several modules, but can only be bound to a single channel. 2. The channel is a pure transactional model of the communication. It guards the actual invocation of the slave operation until the communication transactions are completed. Data are not circulating through the channel, so far they are transmitted within the parameters of the operation. So only the relevant information for an arbitration are given, such as the amount of bytes to transmit, and the word size. If we refer to the simple bus transactional interface defined in [5], a vista channel interface is quite similar, only the parameters regarding the data transmission are discarded. The channel can also contain a runtime modifiable configuration to analyse a system performance. For example a bus channel can have a configurable cycle rate, a FIFO channel may have a configurable depth. It may also contain some measures from which reveling statistics can be extracted or computed within the channel model. 3. The master access is used to encapsulate the invocation of a slave operation through a channel. Each encapsulated operation is called in two steps, the first step is the decode phase to identify the serving slave operation, and then the invocation of the operation through the channel. This invocation adds new parameters to the slave operation which configures the channel transactions to be generated. Note that a master access only addresses a single channel, and that it can invocate a subset of the operations of several slaves, so that it has its own interface which may differ from the union of the slaves interfaces and doesn t require to be an explicit sc_interface. IP Master Master Access TLM Channel Slave Access IP Slave void process() { access->op1 (); void op1() { op1_t op = decode(); (*op)(); Generic Bus + Config. (AHB, Stbus,..) void op1() ; void op2() ; op1_t m_op1; op2_t m_op2 Or Fifo Figure 2 : Abstract Communication Access Defining the accesses module is very straight forward so far they follow a simple pattern. This is why our tool is able of generating the accesses code with very little configuration information once the communication channel is chosen. Figure2 shows the example of accesses to a bus communication channel. It combines UML class diagrams to describe modules and SystemC2.0 [6] graphical notation for ports and interfaces. Furthermore a graphical notation has been introduced to stereotype the accesses modules. It can be used as a full replacement to the class diagram. Both the master and the slave accesses can be customized by a policy. In the case of the master access, this policy indicates whether two concurrent operation calls by the master IP are serialized or have interlaced transactions within the access. The channel is indeed only capable of managing the concurrency between master accesses, but remains a concurrency between the threads of the modules sharing a master access. The slave access policy determines the concurrency on the operations provided by the slave IP. Several masters can generate several interleaved transactions to the same slave, especially when the slave splits the transactions, requiring the slave to have a customizable concurrency policy. IP Master TLM Bus Saccess Maccess TLM Fifo IP Slave void process() { access->op1 (); write(n); read(n); Composition push(n); pop(n) void op1() ; void op2() ; op1_t m_op1; op2_t m_op2 Figure 3 : Channel composition : Master and slave accesses composition

At last, one may think of composing channels. For example, a slave IP can be first attached to a FIFO channel which master access could then become the slave access of a bus channel. These compositions can be the beginning of a methodology to refine the communication of a system and progressively going from a purely functional model to an architectural timed model. A first step would be to detach some functions of the sequential algorithm into modules that prefigures the architecture. Then regarding the iteration on the data, between the functions, one may start ahead the detached function as soon as it has enough data to start with, for example a subset of a picture. This is quite the same idea of setting a pipeline communication, so one may then prepare a communication through a Fifo channel to synchronize the execution flow of the detached functions. Note that the Fifo can also store control so far these a operation calls which are stacked. At this stage we a system level model very similar to what we can have with SDL. We can introduce timing at this stage to analyse the the parallelism of the all system. Then one may decide the partitioning of the system and thus share the communication resources. Hence a bus can refine the fifo communications, some data passing can be referring a shared memory. This is generally the impact of hardware level resource sharing which is ignored by the other specification languages which stops their analysis at the Fifo level. 2.2 Timing Annotation Providing timing annotations is a complex task. The camera module example shows the limited underlying systemc 2.0 mechanisms. Mainly we rely on an event which is never notified but this event has a timeout to resume the execution. This is satisfying for the hardware models, not for the software models. These models shall distinguish active wait from passive wait. The approach is very similar to SystemC 3.0 direction [8], with the wait and consume functions, though some other aspects are also taken into consideration which leads to have several consume-like functions. The goal is to avoid annotating and recompiling anew the system each time a new cpu is selected, or each time a cpu frequency changes. Hence the annotation doesn t contain a timing, but a timing table index. The table indexes are the cpu index and the code chunk index. The cpu index is maintained by the simulation engine. void consume(int cycle_idx, int ad_idx = 0); vs_timing_table_t consume(vs_timing_table_t, int cycle_idx, vs_address_table_t = 0, int ad_idx); Because of the function inlining, preserving the modularity of the compilation requires to have a separate consume function that allows to change the current timing table and thus using the inline function tables. The timing in the table doesn t include absolute time, but a number of CPU cycles so that the elapsed time can be calculated accordingly to the CPU frequency. The last parameter is for a future version : the idea is to use a code offset to take into account the effect of an instruction cache. The consume functions shall be regarded as transaction calls to the operating system which guards the execution of the code. Consume suspends the execution of the code until the cycles required to execute this code have been allocated to that particular segment. Then one have to decide of the good tradeoffs for the simulation speed and the granularity of the annotations. 3. MPEG Video Decoder Case Study The MPEG video decoder is a demo built to show the way of modeling a system at transaction level with the access mechanism. Its purpose was to get a hint on how fast a simulation of this level could, and which details are necessary to build an analysis. The first step was to design the platform architecture. The edit mode of the tool allows to draw the hierarchical blocks of the design with their operations. We can either define new modules or reuse existing ones that are already included into a component library. Some components are provided with the tool such as the AMBA AHB bus. We made a simplified system with 8 modules having one to two sc_threads : A control unit (SW) organizing all the data flows between the modules. It contains 2 threads, one to control the reception and the other to control the emission. The CPU has not been modeled. A memory, present uniquely to model the transactions to it. Figure 4 : MPEG system capture A camera (HW), getting the speaker video sequence, and writing video frames to memory. An extract of the code is shown in figure 5 and figure 6. The module define an operation start_capture following the access pattern. It also contains a sc_thread that repesents the filming action. A

sc_event is used to trigger the sc_thread within start_capture. Once the camera is filming the sc_thread has a timed execution with a wait. This wait is not triggered by the start_capture but by a timeout. SystemC 2.0 currently enforces the usage of an event here though. // master access definition struct camera_master:vs_amba_ahb_master { void write(const vs_address_t ad, const int num_bytes) { vs_amba_ahb_transaction tx( this, m_write->slave(), ad, num_bytes, // slave vs_amba_ahb_transaction::tx_write, access definition typedef vs_amba_ahb_slave camera_slave; typedef camera_slave::communication_if:: op<void, false, TYPELIST_1(const vs_address_t)> vs_amba_ahb_transaction::hsz_32_bits, vs_amba_ahb_transaction::brst_incr*256, vs_amba_ahb_transaction::tr_sequential); (*m_write)(vs_bus_base::master_exclusive, 1,&tx, ad, num_bytes); memory_write_op* m_write; void end_of_elaboration() {decode(m_write, mem::begin, mem::write); camera_master(const char* name) : vs_amba_ahb_master(name) { ; Figure 5 : Access code A graphic card (HW), reading and composing the in and out images to print them. A codec which emulates the memory transactions to encode and decode the images. One would have prefer reusing an IP which hasn t been delivered in time, so we have made it a transaction generator. A modem (HW) which emulates the transaction to emit the encoded picture and receive a picture accordingly to a baud rate. A video card which combines both pictures and synchronize with a GUI posix thread tracing the images. A keyboard to activate the flows and switch between simple and double display. The tool generates code for the accesses to the bus and a skeleton of module to be completed with the operation implementation. Then everything is compiled and linked to obtain a simulator of the platform. The simulator can be launched as a self standing simulator like a classical SystemC2.0 program, or one may use the VISTA tool to control and monitor the execution of the simulation (step by step, continuous, pause, resume). // serving module definition SC_MODULE(camera) { sc_port<vs_slave_access_if> m_slave_access; sc_port<camera_master> m_master_access; void start_capture(const vs_address_t) { { m_dest = dest; m_capturing = true; m_vid_seq.reset(); m_do_capture.notify(); camera_start_capture_op m_start_capture; enum { START_CAPTURE = 0x000000F0, ; void end_of_elaboration() { m_slave_access->add_op(&m_start_capture, START_CAPTURE); void capture_image() { while (1) { if (m_capturing) vs_wait(17, SC_MS); // 60 img/second else wait(m_do_capture); m_vid_seq.next_frame(); sc_event m_do_capture; bool m_capturing; qcif_reader m_vid_seq; vs_address_t m_dest; SC_CTOR(camera) : m_start_capture(this, &camera::start_capture), m_vid_seq("back"), m_capturing(false), { SC_THREAD(capture_image);;end_module(); ; Figure 6 : IP Module code Figure 7: VISTA Analysis session The tool also allows to drag observable variables to an oscilloscope which displays their states. These variables are figured like kind of output ports in the tool. The tool can interact with the simulation to change some settings of the system for analysis purpose. Some configuration variables can be introduced in a system model. They are figured like kind of inputs ports. These observable and configuration variables are provided with the vista library : there are no signals at the transactional level of abstraction. VISTA embeds a protocol which allows to run the monitoring and the simulator tools on two different

hosts. It even allows to monitor SystemC programs without having modeled them with the tool. Using these vista variables allows to drive an analysis session illustrated in figure 7. For example, one may think of increasing or decreasing the baud rate of the modem or its buffer size. One may also maintain a an observable Boolean variable, true as long as a percentile of missed frame occurs. 4. Conclusion We have presented a new methodology and tool for modeling SoC virtual platform for SW development and system level performance analysis and exploration. Using our VISTA methodology approach, we have been able to simulate the performances of a limited Visio phone system by generating 10 000 bus transactions per second, and running 0.5 second real time simulation in 20s while having the traces activated. This environment is not primary intended for debugging a system. Some defects may be rather difficult to be shown with the tool, and the assertions may hardly be used for other purposes than performance and parallelism analysis due to the computational model. But we think that such a level of modelling can allow to extract a formal specification of the system, if the communication channels are previously formalized. [8,9]. 5. References [1] Kanishka Lahiri, Anand Raghunathan, Sujit Dey, Fast Performance Analysis of Bus-Based System-On- Chip Communication Architectures. Proceedings of the IEEE/ACM International Conference on Computer- Aided Design (ICCAD),November 1999. [2] Frederic Doucet, Rajesh K. Gupta, Microelectronic System-on-Chip Modeling using Objects and their Relationships ; in IEEE D&T of Computer 2000. [3] A. Clouard, G. Mastrorocco, F. Carbognani, A. Perrin, F, Ghenassia. Towards Bridging the Precision Gap between SoC Transactional and Cycle Accurate Levels, DATE 2002. [4] A. Ferrari and A. Sangiovanni-Vincentelli, System Design. Traditional Concepts and New Paradigms. Proceedings of the 1999 Int. Conf. On Comp. Des, Oct 1999, Austin. [5] Jon Connell and Bruce Johnson, Early Hardware/Software Integration Using SystemC2.0 ; in Class 552, ESC San Francisco 2002. [6] Functional Specification for SystemC 2.0, Version 2.0-P, 0ct 2001 [7] Thorsten Grotker. Modeling Software with SystemC3.0. 6 th European SystemC User Group Meeting, Italy October 2002. [8] Fabrice Baray. Contribution à l intégration de la vérification de modèle dans le processus de conception Codesign. Thesis, University of Clermont Ferrand 2001. [9] S. Dellacherie, S. Devulder, J-L. Lamber. Software verification based on linear programming. Formal Verification Conference 1999.