Mapping of Applications to Multi-Processor Systems

Peter Marwedel, TU Dortmund, Informatik 12, Germany
Marwedel, 2003. Graphics: Alexandra Nolte, Gesine
December 9, 2011
These slides use Microsoft clip arts. Microsoft copyright restrictions apply.

Structure of this course
Application Knowledge; Design repository; Design
2: Specification
3: ES-hardware
4: System software (RTOS, middleware, ...)
5: Evaluation & validation (energy, cost, performance, ...)
6: Application mapping
7: Optimization
8: Test
Numbers denote sequence of chapters - 2 -

The need to support heterogeneous architectures
Energy efficiency is a key constraint, e.g. for mobile systems.
Unconventional architectures are close to IPE.
Sources: Hugo De Man/Philips, 2007; Renesas, MPSoC 07
How to map to these architectures? - 3 -

Practical problem in automotive design: Which processor should run the software? - 4 -

A Simple Classification
Architecture fixed / Auto-parallelizing | Fixed architecture | Architecture to be designed
Starting from given task graph | Map to CELL; Hopes; Qiang XU (HK); Simunic (UCSD) | COOL codesign tool; EXPO/SPEA2; SystemCodesigner
Auto-parallelizing | Mnemee (Dortmund); Franke (Edinburgh); MAPS | Daedalus
- 5 -

Example: System Synthesis L. Thiele, ETHZ - 6 -

Basic Model Problem Graph L. Thiele, ETHZ - 7 -

Basic Model: Specification Graph L. Thiele, ETHZ - 8 -

Design Space
Communication templates and computation templates: Cipher, FPGA, DSP, RISC, SDRAM, LookUp, µE
Scheduling/arbitration: EDF, proportional share, WFQ, TDMA, FCFS, dynamic fixed priority, static
Which architecture is better suited for our application?
[Figure: Architecture #1 and Architecture #2, each built from LookUp, Cipher, RISC, DSP and µE components with EDF, TDMA, priority, WFQ and static scheduling]
L. Thiele, ETHZ - 9 -

Evolutionary Algorithms for Design Space Exploration (DSE) L. Thiele, ETHZ - 10 -

Challenges L. Thiele, ETHZ - 11 -

EXPO Tool architecture (1)
[Figure: exploration cycle linking MOSES, EXPO and SPEA 2. Labels: system architecture; performance values; task graph, scenario graph, flows & resources; selection of good architectures]
L. Thiele, ETHZ - 12 -
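The exploration cycle named on this slide (propose architectures, obtain performance values, select good architectures) can be illustrated with a small sketch. This is not EXPO, MOSES or SPEA 2: it is a simplified evolutionary loop with a Pareto archive, and every task name, resource name, execution time and cost below is a hypothetical value chosen for illustration. Latency is approximated by the load of the busiest resource, which is an assumption of this sketch.

```python
import random

# Hypothetical execution times (task -> resource type -> time units).
EXEC_TIME = {
    "lookup": {"RISC": 8, "DSP": 6, "FPGA": 2},
    "cipher": {"RISC": 12, "DSP": 5, "FPGA": 3},
    "voice": {"RISC": 9, "DSP": 3, "FPGA": 4},
}
COST = {"RISC": 1, "DSP": 2, "FPGA": 5}  # hypothetical cost per allocated resource
TASKS = sorted(EXEC_TIME)
RESOURCES = sorted(COST)


def evaluate(binding):
    """Return (cost, latency) for a binding given as one resource per task."""
    load = {}
    for task, res in zip(TASKS, binding):
        load[res] = load.get(res, 0) + EXEC_TIME[task][res]
    cost = sum(COST[res] for res in load)   # only allocated resources cost money
    return cost, max(load.values())         # latency proxy: busiest resource


def dominates(a, b):
    """True if a is no worse than b in every objective and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_front(points):
    """Keep only the bindings whose objectives are not dominated."""
    return {p: o for p, o in points.items()
            if not any(dominates(q, o) for q in points.values())}


def explore(generations=50, population=8, seed=0):
    rng = random.Random(seed)
    archive = {}
    pop = [tuple(rng.choice(RESOURCES) for _ in TASKS) for _ in range(population)]
    for _ in range(generations):
        for binding in pop:                  # evaluation step
            archive[binding] = evaluate(binding)
        archive = pareto_front(archive)      # selection of good architectures
        parents = list(archive)
        pop = []
        for _ in range(population):          # variation: mutate one task binding
            child = list(rng.choice(parents))
            child[rng.randrange(len(TASKS))] = rng.choice(RESOURCES)
            pop.append(tuple(child))
    return archive
```

Calling `explore()` returns the non-dominated bindings found, each with its (cost, latency) pair. In a real flow, `evaluate` would be replaced by simulation or analysis of the candidate architecture.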

EXPO Tool architecture (2) Tool available online: http://www.tik.ee.ethz.ch/expo/expo.html L. Thiele, ETHZ - 13 -

EXPO Tool (3) L. Thiele, ETHZ - 14 -

Application Model Example of a simple stream processing task structure: L. Thiele, ETHZ - 15 -

Exploration Case Study (1) L. Thiele, ETHZ - 16 -

Exploration Case Study (2) L. Thiele, ETHZ - 17 -

Exploration Case Study (3) L. Thiele, ETHZ - 18 -

More Results
[Figure panels: performance for encryption/decryption; performance for RT voice processing]
L. Thiele, ETHZ - 19 -

Design Space Exploration with SystemCoDesigner (Teich et al., Erlangen)
System synthesis comprises: resource allocation, actor binding, channel mapping, transaction modeling.
Idea:
Formulate the synthesis problem as a 0-1 ILP.
Use a Pseudo-Boolean (PB) solver to find a feasible solution.
Use a multi-objective evolutionary algorithm (MOEA) to optimize the decision strategy of the PB solver.
J. Teich, U. Erlangen-Nürnberg - 20 -
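To make the 0-1 formulation concrete, the following sketch encodes actor binding with binary variables x[(a, r)] (1 if actor a is bound to resource r) and checks linear constraints over them. It is only an illustration under stated assumptions: the actors, resources, allowed mapping edges and capacities are hypothetical, and exhaustive enumeration stands in for a real Pseudo-Boolean solver.

```python
from itertools import product

ACTORS = ["src", "filter", "sink"]          # hypothetical actors
RESOURCES = ["cpu0", "cpu1", "hw_acc"]      # hypothetical resources
# Hypothetical mapping edges: the resources each actor may be bound to.
ALLOWED = {"src": {"cpu0", "cpu1"}, "filter": {"cpu1", "hw_acc"}, "sink": {"cpu0"}}
CAPACITY = {"cpu0": 2, "cpu1": 1, "hw_acc": 1}  # max actors per resource (assumed)


def is_feasible(x):
    """Check the 0-1 constraints for an assignment x[(actor, resource)] in {0, 1}."""
    for a in ACTORS:
        # Each actor is bound to exactly one resource: sum_r x[a, r] == 1.
        if sum(x[(a, r)] for r in RESOURCES) != 1:
            return False
        # Bindings must follow a mapping edge: x[a, r] <= allowed[a, r].
        if any(x[(a, r)] and r not in ALLOWED[a] for r in RESOURCES):
            return False
    # Resource capacity: sum_a x[a, r] <= capacity[r].
    return all(sum(x[(a, r)] for a in ACTORS) <= CAPACITY[r] for r in RESOURCES)


def feasible_bindings():
    """Enumerate all 0-1 assignments and yield the feasible actor bindings."""
    keys = [(a, r) for a in ACTORS for r in RESOURCES]
    for bits in product((0, 1), repeat=len(keys)):
        x = dict(zip(keys, bits))
        if is_feasible(x):
            yield {a: r for (a, r), value in x.items() if value}
```

Enumeration is only workable for toy instances like this one (nine variables). The point of using a PB solver is to find feasible assignments without visiting the whole space.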

A 3rd approach based on evolutionary algorithms: SYMTA/S [R. Ernst et al.: A framework for modular analysis and exploration of heterogeneous embedded systems, Real-time Systems, 2006, p. 124] - 21 -

A fixed architecture approach: Map CELL
Martino Ruggiero, Luca Benini: Mapping task graphs to the CELL BE processor, 1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008 - 23 -

Partitioning into Allocation and Scheduling Ruggiero, Benini, 2008 - 24 -
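The idea of splitting the mapping problem into an allocation stage and a scheduling stage can be sketched as follows. This is not the method of Ruggiero and Benini: it is a minimal two-stage illustration using a greedy load-balancing allocation followed by a simple list scheduler. The task graph and durations are hypothetical, communication costs are ignored, and the graph is assumed to be acyclic.

```python
# Hypothetical task graph: task -> (duration, list of predecessors).
TASKS = {
    "read": (2, []),
    "dct": (4, ["read"]),
    "quant": (3, ["dct"]),
    "audio": (5, ["read"]),
    "write": (1, ["quant", "audio"]),
}


def allocate(tasks, n_procs):
    """Stage 1: bind tasks to processors, longest task first, least-loaded processor."""
    load = [0] * n_procs
    binding = {}
    for name in sorted(tasks, key=lambda t: -tasks[t][0]):
        proc = load.index(min(load))
        binding[name] = proc
        load[proc] += tasks[name][0]
    return binding


def schedule(tasks, binding, n_procs):
    """Stage 2: with the binding fixed, start each task as early as possible."""
    proc_free = [0] * n_procs       # time at which each processor becomes idle
    start, finish = {}, {}
    remaining = set(tasks)

    def earliest(t):
        preds_done = max((finish[p] for p in tasks[t][1]), default=0)
        return max(proc_free[binding[t]], preds_done)

    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining if all(p in finish for p in tasks[t][1])]
        nxt = min(ready, key=lambda t: (earliest(t), t))   # ties broken by name
        start[nxt] = earliest(nxt)
        finish[nxt] = start[nxt] + tasks[nxt][0]
        proc_free[binding[nxt]] = finish[nxt]
        remaining.remove(nxt)
    return start, finish
```

A typical use is `binding = allocate(TASKS, 2)` followed by `start, finish = schedule(TASKS, binding, 2)`; the makespan is then `max(finish.values())`. Decomposed approaches usually iterate between the two stages when the schedule turns out to be infeasible or poor, which this sketch does not do.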

Daedalus Design-flow
[Figure: design flow in two levels.
System-level specification (high-level models): sequential application; automatic parallelization (KPNgen); parallel application specification as a Kahn Process Network; platform specification; mapping specification; common XML interface; system-level design space exploration (Sesame); explore, modify, select instances.
RTL-level specification (RTL-level models): library of IP cores; system-level synthesis (ESPAM); synthesizable VHDL and C/C++ code for processors; Xilinx Platform Studio (XPS); MP-SoC (multi-processor system on chip).]
Ed Deprettere et al.: Toward Composable Multimedia MP-SoC Design, 1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008
E. Deprettere, U. Leiden - 27 -
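The parallel application in this flow is specified as a Kahn Process Network: processes that communicate only through FIFO channels and block when reading from an empty channel. The sketch below shows these semantics with Python threads and queues. It is not output of KPNgen or any Daedalus tool; the process names, the doubling function and the end-of-stream marker are assumptions made for illustration.

```python
import queue
import threading

END = object()  # end-of-stream marker (a convention of this sketch, not of KPNs)


def source(out, data):
    """Producer process: writes every item to its output channel."""
    for item in data:
        out.put(item)
    out.put(END)


def transform(inp, out, fn):
    """Worker process: blocking read, apply fn, write the result."""
    while True:
        item = inp.get()            # blocks while the channel is empty
        if item is END:
            out.put(END)
            return
        out.put(fn(item))


def sink(inp, results):
    """Consumer process: collects everything it reads."""
    while True:
        item = inp.get()
        if item is END:
            return
        results.append(item)


def run_network(data):
    c1, c2 = queue.Queue(), queue.Queue()   # unbounded FIFO channels
    results = []
    processes = [
        threading.Thread(target=source, args=(c1, data)),
        threading.Thread(target=transform, args=(c1, c2, lambda v: v * 2)),
        threading.Thread(target=sink, args=(c2, results)),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return results
```

Because each process only performs blocking reads on its own input channel, the result does not depend on how the threads are interleaved. This determinism is what makes the model attractive for mapping the same application onto different numbers of processors.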

JPEG/JPEG2000 case study
Example architecture instances for a single-tile JPEG encoder:
2 MicroBlaze processors (50KB)
1 MicroBlaze, 1 HW DCT (36KB)
6 MicroBlaze processors (120KB)
4 MicroBlaze, 2 HW DCT (68KB)
[Figure: distribution of the tasks Vin, DCT, Q, VLE and Vout over the components of each instance, with per-component memory sizes]
E. Deprettere, U. Leiden - 28 -

Sesame DSE results: Single JPEG encoder DSE E. Deprettere, U. Leiden - 29 -

Auto-Parallelizing Compilers
Discipline "High Performance Computing": research on vectorizing compilers for more than 25 years. Traditionally: Fortran compilers.
Such vectorizing compilers are usually inappropriate for multi-DSPs, since their assumptions on the memory model are unrealistic:
Communication between processors via shared memory
Memory has only one single common address space
De facto no auto-parallelizing compiler for multi-DSPs!
Work of Franke, O'Boyle (Edinburgh)
Falk - 31 -

Introduction of Memory Architecture-Aware Optimization
The MACC PMS (Processor/Memory/Switch) Model
Explicit memory architecture
API provides access to memory information
[Figure: MACC_System model accessed from C code. Components: CPU1, CPU2, CPU3, three SPMs, two L1$, BUS1, MM1, MM2, BRI, L2$, MM3, BUS2]
- 32 -

MACC Modeling Example via GUI - 33 -

Toolflow Detailed View
[Figure: START (Sequential C Source Code); MACC Eco-System; (1) Dynamic Data Type Optimizations; (2) Map source code to task graphs; (3) Parallelization Implem.; (4) Dynamic Memory Management Optimizations; MPSoC Parallelization Assistant (MPA); Memory Hierarchy (MH)]
MNEMEE Toolflow
1. Optimization of dynamic data structures
2. Extraction of potential parallelism
3. Implementation of parallelism; placement of static data
4. Placement of dynamic data
- 34 -

Toolflow Detailed View
[Figure: (5) Scenario Based Mapping; (5) Memory Aware Mapping; Platform DB; (6) RTLIB Mapping; (7) Scratchpad Memory Optimizations per PE; END (Optimized Source Code)]
MNEMEE Toolflow
5. Perform mapping to processing elements (scenario based, memory aware)
6. Transform the code to implement the mapping
7. Perform scratchpad memory optimizations for each processing element
- 35 -

MAPS-TCT Framework
Rainer Leupers, Weihua Sheng: MAPS: An Integrated Framework for MPSoC Application Parallelization, 1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008 - 36 -

Summary
There is a clear trend toward multi-processor systems for embedded systems, and a large design space exists.
Using these architectures crucially depends on mapping tools.
Mapping applications onto heterogeneous MP systems needs allocation (if hardware is not fixed), binding of tasks to resources, and scheduling.
Two criteria for classification:
Fixed / flexible architecture
Auto-parallelizing / non-parallelizing
Introduction to proposed Mnemee tool chain
Evolutionary algorithms are currently the best choice
- 37 -