Towards an automatic co-generator for manycores' architecture and runtime: STHORM case-study


Procedia Computer Science, Volume 51, 2015, Pages 2809-2813
ICCS 2015 International Conference On Computational Science

Charly Bechara, Karim Ben Chehida and Farhat Thabet
CEA, LIST, 91191 Gif-sur-Yvette CEDEX, FRANCE
charly.bechara@cea.fr, karim.ben-chehida@cea.fr, farhat.thabet@cea.fr

Selection and peer-review under responsibility of the Scientific Programme Committee of ICCS 2015. © The Authors. Published by Elsevier B.V. doi:10.1016/j.procs.2015.05.439

Keywords: Runtime, Manycore, IP-XACT, Automatic generator, STHORM, SESAM

Introduction

The increasing design complexity of manycore architectures at the hardware (HW) and software (SW) levels calls for powerful tools capable of validating every functional and non-functional property of the architecture. At the design phase, the chip architect needs to explore several parameters of the design space and iterate on different instances of the architecture in order to meet the defined requirements. Each new architectural instance requires configuring and generating a new hardware model/simulator, its runtime, and the applications that will run on the platform, which is a very long and error-prone task. In this context, the IP-XACT [3] standard has become widely used in the semiconductor industry to package IPs and provide a low-level SW stack that eases their integration. In this work, we present preliminary work on a methodology for automatically configuring and assembling an IP-XACT golden model and generating the corresponding manycore architecture HW model, low-level software runtime and applications. We use the STHORM [1] manycore architecture as a case study.

Automatic generator methodology

The idea is to work on a single IP-XACT model with the different abstractions (mainly at the interface level) commonly used in the design space exploration (DSE) and implementation phases, in order to guarantee the coherency of the TLM (Transaction Level Modeling) and RTL (Register Transfer Level) architecture models. The DSE phase is based on fast TLM simulations, analysis of the results against the target optimization criteria (performance, power, and reliability), and modification of the global parameters of the IP-XACT model to close the loop and guide its convergence over the iterations.

The IP-XACT design flow methodology, shown in Figure 1, is composed of four main steps:

1. IP-XACT platform model: an IP-XACT model of the manycore architecture is assembled from the IP-XACT IP (Intellectual Property) library, taking the different IP parameters into account. From the IP-XACT platform model, which is in XML format, two design configurations can be derived to target the TLM-level and RTL-level interconnect abstractions.

2. Platform generators: in order to build a platform simulator corresponding to the design parameters of the current DSE iteration, it is important to automate the generation of the corresponding TLM or RTL simulators, the software runtime and the application (using, for example, the IP-XACT standardized Tight Generator Interface (TGI)), and to adapt them to a set of parameters corresponding to the DSE iteration (such as the number of processors/clusters, the degree of parallelism, the custom IPs used, etc.).

a. TLM/RTL simulator: starting from the TLM/RTL models, the IP libraries and the configuration parameters, a custom generator can produce the corresponding TLM or RTL simulator.

b. SW runtime: the low-level hardware-dependent software (HDS) layer (corresponding mainly to simple register accesses and the system memory map) can be generated by aggregating the IP-level HDS information. The SW runtime used in this study [4] is a set of libraries (communication, execution engines, synchronization, resource management, ...) in which the resource management library is built on top of the HDS layer. A custom generator can build a new runtime for this design iteration.

c. Application: a custom generator can exploit the new configuration parameters to restructure the application accordingly. For instance, OpenMP pragmas can be inserted.

Figure 1: The unified IP-XACT based design flow for fast design space exploration

3. Manycore architecture simulator: the fast simulation phase is based on a timed TLM simulator designed in our laboratory, called SESAM [2], which delivers reports and statistics on functional and non-functional criteria such as performance, power and reliability. SESAM takes as input the generated TLM top netlist, the TLM IP library, the generated SW runtime, and the compiled application, and launches a global simulation. SESAM also supports the integration of RTL models for co-simulation. After convergence of the DSE loop, the final step is to generate the RTL netlist of the overall manycore architecture from the IP-XACT model, and then to follow the traditional hardware simulation and emulation flow with the corresponding EDA (Electronic Design Automation) tools.

4. Design analysis & optimization: the design analysis tool compares the resulting metrics with the initial system requirements. Based on the comparison results, the design optimization engine uses heuristics to modify the initial IP-XACT model parameters, and even its specifications.

STHORM case-study

In this work, we use the STHORM [1] manycore architecture and the HBDC (Human Body Detection Counter) application as a case study. In order to model the STHORM architecture in SESAM (Figure 2), we extract the following information from the architectural description: the modules that perform the actual computation or processing (such as the STxP70 processor, the Hardware Synchronizer HWS [5], the Fabric Controller, and other elements), the memories and caches, the interconnection networks, and the latencies of the different modules (measured using special counters on the emulated HW design or on the real chip). Each component is a SystemC model with TLM interfaces. From the IP-XACT model of the whole architecture, the toolchain generates the top-level netlist for SESAM, the low-level runtime software, and the system map of the architecture. This corresponds to phases 1, 2.a and part of 2.b of our methodology.
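To make the HDS generation of step 2.b more concrete, the following is a minimal sketch of how a memory-map header could be derived from a platform description. It is not the generator used in this work. The XML layout, the component names and addresses, and the function `generate_hds_header` are hypothetical; the XML is a heavily simplified stand-in for a real IP-XACT file, which uses namespaced elements and a much richer schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified platform description. This is NOT the
# IP-XACT schema: element names, component names and addresses are made up
# for illustration only.
PLATFORM_XML = """
<platform name="demo_manycore">
  <component name="HWS" base="0x10000000">
    <register name="CTRL" offset="0x0"/>
    <register name="STATUS" offset="0x4"/>
  </component>
  <component name="DMA" base="0x10001000">
    <register name="SRC" offset="0x0"/>
    <register name="DST" offset="0x8"/>
  </component>
</platform>
"""


def generate_hds_header(xml_text):
    """Aggregate per-component register information into one C header."""
    root = ET.fromstring(xml_text)
    guard = root.get("name").upper() + "_MAP_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    for comp in root.findall("component"):
        cname = comp.get("name").upper()
        base = int(comp.get("base"), 16)
        lines.append(f"#define {cname}_BASE 0x{base:08X}u")
        for reg in comp.findall("register"):
            # Absolute register address = component base + register offset.
            addr = base + int(reg.get("offset"), 16)
            rname = reg.get("name").upper()
            lines.append(f"#define {cname}_{rname} 0x{addr:08X}u")
        lines.append("")
    lines.append(f"#endif /* {guard} */")
    return "\n".join(lines)


header = generate_hds_header(PLATFORM_XML)
print(header)
```

Regenerating the header from the same description used to build the simulator netlist is one way to keep the runtime and the HW model consistent when a DSE iteration changes the number or placement of components.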

Figure 2: STHORM model in SESAM

The HBDC application runs in an airport security context and counts the number of passengers that pass in front of a camera or a multi-camera configuration. In our case, the real-time requirements are: 4 cameras with HD resolution, 30 fps, and 10 detected humans per image. The overall computation power needed is around 50 GOPS. Profiling the application showed that 90% of the execution time is spent in the human extraction part. This part is dynamic and highly parallelizable by sub-images, and can thus be run on multiple processors. This is a promising property for the DSE.

Conclusion and Future work

In this preliminary study, we have introduced the problem of system model coherency in the design space exploration flow for digital systems. The current work consists of building the automation of the generators for the configurable SW runtime and the applications. In addition, we are currently working on the fourth and last phase of the methodology (design analysis & optimization) in order to obtain a closed-loop automated DSE flow.
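As a rough illustration of why the HBDC profiling result is relevant to the DSE, the sketch below applies Amdahl's law to the reported 90% share of execution time spent in human extraction. This is a back-of-the-envelope bound added for illustration, not an analysis from this study. It assumes that the human extraction part scales perfectly with the number of processors and that the remaining 10% stays serial. The function names and the core counts are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup when only parallel_fraction scales with n_cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def min_cores_for_speedup(parallel_fraction, target, max_cores=1024):
    """Smallest core count whose bound reaches target, or None if unreachable."""
    for n in range(1, max_cores + 1):
        if amdahl_speedup(parallel_fraction, n) >= target:
            return n
    return None


P = 0.90  # share of execution time in human extraction, from the profiling above

for n in (1, 4, 16, 64):
    print(f"{n:3d} cores -> speedup bound {amdahl_speedup(P, n):.2f}")

print("cores needed for 6x :", min_cores_for_speedup(P, 6.0))
print("cores needed for 12x:", min_cores_for_speedup(P, 12.0))
```

Under these assumptions the bound saturates at 10x, so a DSE loop that only increases the number of processors would see diminishing returns unless the remaining serial part is also addressed.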

References

[1] D. Melpignano, L. Benini, E. Flamand, B. Jego, T. Lepley, G. Haugou, F. Clermidy, and D. Dutoit. Platform 2012, a many-core computing accelerator for embedded SoCs: performance evaluation of visual analytics applications. In Proceedings of the 49th Annual Design Automation Conference (DAC '12), 2012.

[2] N. Ventroux, A. Guerre, T. Sassolas, L. Moutaoukil, G. Blanc, C. Bechara, and R. David. SESAM: An MPSoC Simulation Environment for Dynamic Application Processing. In 10th IEEE International Conference on Computer and Information Technology, June 2010.

[3] IEEE Standard for IP-XACT, Standard Structure for Packaging, Integrating, and Reusing IP within Tool Flows. IEEE Computer Society and the IEEE Standards Association Corporate Advisory Group. IEEE Std 1685-2009, 18 Feb. 2010.

[4] Y. Lhuillier, M. Ojail, A. Guerre, J.M. Philippe, K. Ben Chehida, F. Thabet, C. Andriamisaina, C. Jaber, and R. David. HARS: A hardware-assisted runtime software for embedded many-core architectures. ACM Trans. Embed. Comput. Syst., March 2014.

[5] F. Thabet, Y. Lhuillier, C. Andriamisaina, J.M. Philippe, and R. David. An efficient and flexible hardware support for accelerating synchronization operations on the STHORM many-core architecture. In Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 531-534, 18-22 March 2013.