Mapping of Applications to Multi-Processor Systems

Peter Marwedel, TU Dortmund, Informatik 12, Germany
9 December 2011
Graphics: Alexandra Nolte, Gesine Marwedel, 2003
These slides use Microsoft clip art. Microsoft copyright restrictions apply.

Structure of this course
Figure: course overview with application knowledge, a design repository and the design flow. Chapters: 2: Specification; 3: ES-hardware; 4: System software (RTOS, middleware, …); 5: Evaluation & validation (energy, cost, performance, …); 6: Application mapping; 7: Optimization; 8: Test. Numbers denote the sequence of chapters.

The need to support heterogeneous architectures
Energy efficiency is a key constraint, e.g. for mobile systems. Unconventional architectures come close to the IPE (figures: Hugo De Man/Philips, 2007; Renesas, MPSoC '07). How can applications be mapped to these architectures?

Practical problem in automotive design
Which processor should run the software?

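A Simple Classification
Criteria: fixed architecture vs. architecture to be designed; starting from a given task graph vs. auto-parallelizing.
- Given task graph, fixed architecture: Map to CELL, Hopes, Qiang Xu (HK), Simunic (UCSD)
- Given task graph, architecture to be designed: COOL codesign tool, EXPO/SPEA2, SystemCoDesigner
- Auto-parallelizing, fixed architecture: Mnemee (Dortmund), Franke (Edinburgh), MAPS
- Auto-parallelizing, architecture to be designed: Daedalus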

Example: System Synthesis [L. Thiele, ETHZ]

Basic Model: Problem Graph [L. Thiele, ETHZ]

Basic Model: Specification Graph [L. Thiele, ETHZ]
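The problem graph and specification graph can be made concrete with a small sketch. It assumes the common reading of this model: a problem graph of tasks and data dependencies, an architecture graph of resources and links, and mapping edges stating which task may run on which resource. An allocation is a chosen set of resources; a binding picks one mapping edge per task and is feasible only if the resource is allocated and dependent tasks sit on the same or directly linked resources. All task and resource names, and the exhaustive search, are hypothetical illustrations.

```python
from itertools import product

# Hypothetical problem graph: data dependencies between tasks.
PROBLEM_EDGES = [("src", "filter"), ("filter", "sink")]

# Hypothetical architecture graph: direct links between resources.
# RISC and FPGA are deliberately NOT linked.
ARCH_LINKS = {frozenset({"RISC", "DSP"}), frozenset({"DSP", "FPGA"})}

# Mapping edges of the specification graph: task -> resources able to run it.
MAPPING = {
    "src": ["RISC"],
    "filter": ["DSP", "FPGA", "RISC"],
    "sink": ["RISC"],
}


def can_communicate(r1, r2):
    """Tasks on the same resource communicate locally; otherwise a link is needed."""
    return r1 == r2 or frozenset({r1, r2}) in ARCH_LINKS


def is_feasible(binding, allocation):
    """binding: task -> resource. Feasible if each task uses an allocated resource
    offered by a mapping edge and every dependency can be realised."""
    for task, res in binding.items():
        if res not in allocation or res not in MAPPING[task]:
            return False
    return all(can_communicate(binding[a], binding[b]) for a, b in PROBLEM_EDGES)


def feasible_bindings(allocation):
    """Enumerate all feasible bindings (exhaustive, only for tiny examples)."""
    tasks = list(MAPPING)
    for choice in product(*(MAPPING[t] for t in tasks)):
        binding = dict(zip(tasks, choice))
        if is_feasible(binding, allocation):
            yield binding


print(list(feasible_bindings({"RISC", "DSP", "FPGA"})))
```

Binding `filter` to the FPGA is rejected because its neighbours must run on the RISC, which has no link to the FPGA; shrinking the allocation to `{"RISC"}` leaves only the all-RISC binding.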

Design Space [L. Thiele, ETHZ]
Computation and communication templates: Cipher, FPGA, DSP, RISC, SDRAM, LookUp, µE. Scheduling/arbitration policies: EDF, proportional share, WFQ, TDMA, FCFS, fixed priority; static and dynamic. Which architecture is better suited for our application?
Figure: two candidates, Architecture #1 and Architecture #2, composed of LookUp, Cipher, RISC, DSP and several µE units with EDF, TDMA, priority, WFQ and static arbitration.

Evolutionary Algorithms for Design Space Exploration (DSE) [L. Thiele, ETHZ]
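DSE with evolutionary algorithms is multi-objective: candidate architectures are compared by Pareto dominance rather than by a single cost figure. A minimal sketch of the dominance test and of filtering a set of candidates down to its non-dominated front, assuming all objectives are minimized; the (cost, latency) values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised):
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points not dominated by any other point (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, latency) values of five candidate architectures.
candidates = [(10, 5.0), (8, 7.0), (12, 4.0), (9, 7.5), (10, 6.0)]
print(pareto_front(candidates))  # [(10, 5.0), (8, 7.0), (12, 4.0)]
```

(9, 7.5) is dominated by (8, 7.0), and (10, 6.0) by (10, 5.0); the three survivors are mutually incomparable trade-offs, which is the kind of result an EA-based DSE hands to the designer instead of a single "best" architecture.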

Challenges [L. Thiele, ETHZ]

EXPO Tool Architecture (1) [L. Thiele, ETHZ]
Exploration cycle: SPEA2 performs the selection of good architectures; EXPO evaluates each proposed system architecture and returns its performance values. The system is modeled with MOSES (task graph, scenario graph, flows & resources).
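The exploration cycle can be pictured as a loop in which a selector proposes architectures and an evaluator returns their performance values. The sketch below is a deliberately simplified stand-in, not SPEA2 and not EXPO: variation is a single bit flip of a resource-allocation vector, selection keeps an archive of non-dominated designs, and `evaluate` is a hypothetical toy model whose resource costs and speeds are invented for illustration.

```python
import random

# Hypothetical resource types: name -> (cost, speed contribution when allocated).
RESOURCES = {"Cipher": (6, 1.5), "DSP": (8, 2.0), "FPGA": (15, 4.0), "RISC": (5, 1.0)}
NAMES = sorted(RESOURCES)


def evaluate(alloc):
    """Toy evaluator standing in for EXPO: maps an allocation bit vector to
    (cost, latency), both to be minimised."""
    chosen = [n for n, bit in zip(NAMES, alloc) if bit]
    if not chosen:
        # Nothing allocated: the application cannot run at all.
        return (float("inf"), float("inf"))
    cost = sum(RESOURCES[n][0] for n in chosen)
    latency = 100.0 / sum(RESOURCES[n][1] for n in chosen)
    return (cost, latency)


def dominates(a, b):
    """Pareto dominance for minimisation."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def explore(generations=200, seed=1):
    """Exploration cycle: vary an archived design, evaluate it, and keep
    the archive free of dominated designs."""
    rng = random.Random(seed)
    archive = {}  # allocation bit tuple -> objective vector
    parent = tuple(rng.randint(0, 1) for _ in NAMES)
    for _ in range(generations):
        # Variation: flip one randomly chosen allocation bit.
        child = list(parent)
        child[rng.randrange(len(child))] ^= 1
        child = tuple(child)
        obj = evaluate(child)
        # Selection: accept the child unless an archived design dominates it,
        # then drop archived designs that the child dominates.
        if not any(dominates(o, obj) for o in archive.values()):
            archive = {a: o for a, o in archive.items() if not dominates(obj, o)}
            archive[child] = obj
        # Next parent: a random member of the current non-dominated archive.
        parent = rng.choice(list(archive)) if archive else child
    return archive


for alloc, (cost, lat) in sorted(explore().items(), key=lambda kv: kv[1]):
    chosen = [n for n, bit in zip(NAMES, alloc) if bit]
    print(cost, round(lat, 1), chosen)
```

SPEA2 itself adds strength-based fitness assignment, density estimation and a bounded archive with truncation; in EXPO the toy `evaluate` would be replaced by the actual performance analysis of each system architecture.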

EXPO Tool Architecture (2) [L. Thiele, ETHZ]
Tool available online: http://www.tik.ee.ethz.ch/expo/expo.html

EXPO Tool (3) [L. Thiele, ETHZ]

Application Model [L. Thiele, ETHZ]
Figure: example of a simple stream processing task structure.

Exploration Case Study (1) [L. Thiele, ETHZ]

Exploration Case Study (2) [L. Thiele, ETHZ]

Exploration Case Study (3) [L. Thiele, ETHZ]

More Results [L. Thiele, ETHZ]
Figures: performance for encryption/decryption; performance for real-time voice processing.

Design Space Exploration with SystemCoDesigner (Teich et al., Erlangen) [J. Teich, U. Erlangen-Nürnberg]
System synthesis comprises:
- resource allocation
- actor binding
- channel mapping
- transaction modeling
Idea:
- formulate the synthesis problem as a 0-1 ILP
- use a pseudo-Boolean (PB) solver to find a feasible solution
- use a multi-objective evolutionary algorithm (MOEA) to optimize the decision strategy of the PB solver
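To make "formulate the synthesis problem as a 0-1 ILP" concrete, here is a minimal sketch of typical allocation and binding constraints under assumed notation. It is not the exact SystemCoDesigner encoding, which is richer (it also covers the channel mapping and transaction modeling listed above); the symbols $T$, $R$, $M$, $E$, $a_r$, $b_{t,r}$ and $\ell_{r,r'}$ are introduced here for illustration only.

```latex
% Assumed notation: T = actors, R = resources, M \subseteq T \times R = mapping edges,
% E \subseteq T \times T = channels between actors.
% a_r = 1 iff resource r is allocated; b_{t,r} = 1 iff actor t is bound to r;
% \ell_{r,r'} = 1 iff a link between r and r' is allocated (fixed to 0 if no link exists).
\begin{align}
  \sum_{r\,:\,(t,r)\in M} b_{t,r} &= 1
    && \forall t \in T \quad \text{(each actor bound exactly once)}\\
  b_{t,r} &\le a_r
    && \forall (t,r) \in M \quad \text{(bind only to allocated resources)}\\
  b_{t,r} + b_{t',r'} &\le 1 + \ell_{r,r'}
    && \forall (t,t') \in E,\ (t,r),(t',r') \in M,\ r \ne r'\\
  \ell_{r,r'} &\le a_r,\quad \ell_{r,r'} \le a_{r'}
    && \text{(links only between allocated resources)}\\
  a_r,\ b_{t,r},\ \ell_{r,r'} &\in \{0,1\}
\end{align}
```

A PB solver searches for any 0-1 assignment satisfying such constraints. Per the slide, the MOEA does not manipulate the assignment directly but optimizes the solver's decision strategy (such as the order in which variables are decided and their preferred values), so that different genotypes decode into different feasible implementations.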

A third approach based on evolutionary algorithms: SYMTA/S [R. Ernst et al.: A framework for modular analysis and exploration of heterogeneous embedded systems, Real-Time Systems, 2006, p. 124]


A Fixed Architecture Approach: Mapping to CELL
Martino Ruggiero, Luca Benini: Mapping task graphs to the CELL BE processor, 1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008

Partitioning into Allocation and Scheduling [Ruggiero, Benini, 2008]
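The idea on this slide is to split mapping into an allocation subproblem (which task runs on which processing element) and a scheduling subproblem (when each task runs, given that allocation). The sketch below only illustrates this decomposition with simple heuristics, a greedy load-balancing allocation followed by dependency-respecting list scheduling; it is not the optimization method of Ruggiero and Benini. The task graph, durations and number of PEs are hypothetical, and communication costs are ignored.

```python
# Hypothetical task graph: task -> (duration, predecessors). Communication costs are ignored.
TASKS = {
    "A": (4, []),
    "B": (3, ["A"]),
    "C": (2, ["A"]),
    "D": (5, ["B", "C"]),
}


def topo_order(tasks):
    """Order tasks so that every predecessor comes first (assumes a DAG)."""
    order, done = [], set()
    while len(order) < len(tasks):
        for t, (_, preds) in tasks.items():
            if t not in done and all(p in done for p in preds):
                order.append(t)
                done.add(t)
    return order


def allocate(tasks, n_pe):
    """Stage 1, allocation: greedily put each task on the least-loaded PE,
    longest tasks first."""
    load = [0] * n_pe
    alloc = {}
    for t in sorted(tasks, key=lambda t: -tasks[t][0]):
        pe = load.index(min(load))
        alloc[t] = pe
        load[pe] += tasks[t][0]
    return alloc


def schedule(tasks, alloc):
    """Stage 2, scheduling for a fixed allocation: in topological order, start
    each task once its predecessors have finished and its PE is free."""
    finish, pe_free = {}, {}
    for t in topo_order(tasks):
        dur, preds = tasks[t]
        pe = alloc[t]
        start = max([pe_free.get(pe, 0)] + [finish[p] for p in preds])
        finish[t] = start + dur
        pe_free[pe] = finish[t]
    return finish


alloc = allocate(TASKS, n_pe=2)
finish = schedule(TASKS, alloc)
print(alloc, "makespan:", max(finish.values()))
```

With two PEs the greedy allocation yields a makespan of 12 time units. Here the allocation is accepted as is; in an exact decomposition, the scheduling stage would instead report back to the allocation stage (e.g. by excluding allocations that cannot meet a deadline) until both agree.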


Daedalus Design Flow [E. Deprettere, U. Leiden]
Figure: starting from a sequential application, KPNgen performs automatic parallelization into a parallel application given as a Kahn Process Network specification. Sesame performs system-level design space exploration on high-level models (explore, modify, select instances), yielding a system-level specification in a common XML interface: platform, mapping and Kahn Process Network specifications. ESPAM performs system-level synthesis, using a library of IP cores (RTL-level models), into an RTL-level specification: synthesizable VHDL and C/C++ code for the processors. Xilinx Platform Studio (XPS) then produces the MP-SoC (multi-processor system on chip).
Ed Deprettere et al.: Toward Composable Multimedia MP-SoC Design, 1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008
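The intermediate representation in this flow is a Kahn Process Network: concurrent processes that communicate only through unbounded FIFO channels with blocking reads, which makes the output independent of how the processes are scheduled. A minimal sketch of a three-process KPN using Python threads and `queue.Queue` channels; the producer/square/consumer network and the `None` end-of-stream marker are illustrative assumptions, not KPNgen output.

```python
import queue
import threading


def producer(out_q, n):
    # Process 1: writes tokens 0..n-1 to its output channel, then an end marker.
    for i in range(n):
        out_q.put(i)
    out_q.put(None)


def square(in_q, out_q):
    # Process 2: blocking read, compute, write; no state shared except channels.
    while (tok := in_q.get()) is not None:
        out_q.put(tok * tok)
    out_q.put(None)


def consumer(in_q, result):
    # Process 3: collects tokens until the end marker arrives.
    while (tok := in_q.get()) is not None:
        result.append(tok)


def run_network(n):
    c1, c2 = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    result = []
    procs = [
        threading.Thread(target=producer, args=(c1, n)),
        threading.Thread(target=square, args=(c1, c2)),
        threading.Thread(target=consumer, args=(c2, result)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return result


print(run_network(5))  # [0, 1, 4, 9, 16]
```

However the operating system interleaves the three threads, the result is the same, which is what allows tools like Sesame to explore different mappings of the same network onto processors.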

JPEG/JPEG2000 Case Study [E. Deprettere, U. Leiden]
Figure: example architecture instances for a single-tile JPEG encoder (tasks Vin, DCT, Q, VLE, Vout; local memory sizes annotated per processor):
- 2 MicroBlaze processors (50KB): Vin, DCT on one processor; Q, VLE, Vout on the other
- 1 MicroBlaze, 1 HW DCT (36KB): Vin, Q, VLE, Vout on the MicroBlaze; DCT in hardware
- 6 MicroBlaze processors (120KB)
- 4 MicroBlaze, 2 HW DCT (68KB)

Sesame DSE Results: Single JPEG Encoder DSE [E. Deprettere, U. Leiden]

Auto-Parallelizing Compilers
The discipline of high-performance computing has done research on vectorizing compilers for more than 25 years, traditionally for Fortran. Such vectorizing compilers are usually inappropriate for multi-DSPs, since their assumptions about the memory model are unrealistic there:
- communication between processors via shared memory
- memory has only one single common address space
De facto, there is no auto-parallelizing compiler for multi-DSPs! Work of Franke and O'Boyle (Edinburgh). [Falk]
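Without one common address space, a parallelizing compiler for multi-DSPs must also decide where each array element lives and translate global indices into (processor, local offset) pairs. A minimal sketch, assuming a simple block partitioning of a one-dimensional array over `p` processors; it illustrates the problem only and is not the transformation used by Franke and O'Boyle.

```python
def block_size(n, p):
    """Elements per processor when n elements are split into p blocks
    (ceiling division; the last block may be shorter)."""
    return -(-n // p)


def owner_and_offset(i, n, p):
    """Map global index i to (processor id, offset in that processor's local memory)."""
    b = block_size(n, p)
    return i // b, i % b


def partitioned_sum(data, p):
    """Each 'processor' sums only its local block; the partial results are then
    combined, standing in for explicit communication between processors."""
    local = [[] for _ in range(p)]
    for i, x in enumerate(data):
        pe, off = owner_and_offset(i, len(data), p)
        assert off == len(local[pe])  # local blocks are filled contiguously
        local[pe].append(x)
    partial = [sum(block) for block in local]
    return sum(partial), partial


print(owner_and_offset(7, 10, 4))           # (2, 1)
print(partitioned_sum(list(range(10)), 4))  # (45, [3, 12, 21, 9])
```

Every access `a[i]` in the sequential code becomes an access to local memory `owner_and_offset(i, n, p)`, which is cheap only if the owner is the processor executing the access; otherwise explicit communication is needed.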

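Introduction of Memory Architecture-Aware Optimization
The MACC PMS (Processor/Memory/Switch) model:
- explicit memory architecture
- an API provides access to memory information from C code (MACC_System)
Figure: example platform with CPU1, CPU2 and CPU3, each with a scratchpad memory (SPM); L1 caches (L1$); main memories MM1, MM2 and MM3; an L2 cache (L2$); buses BUS1 and BUS2 connected by a bridge (BRI).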
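A PMS model describes a platform as a graph of processors, memories and switches (buses, bridges). As an illustration of the kind of query an explicit memory-architecture model enables, here is a minimal sketch that stores such a graph and lists the memories a given CPU can reach, with their hop distance. The `Platform` class, its methods and the exact connections (caches omitted) are hypothetical assumptions; they are not the MACC API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    kind: str  # PMS class: "processor", "memory" or "switch"
    links: list = field(default_factory=list)


class Platform:
    """Hypothetical PMS-style platform graph (not the MACC API)."""

    def __init__(self):
        self.parts = {}

    def add(self, name, kind):
        self.parts[name] = Component(name, kind)

    def connect(self, a, b):
        self.parts[a].links.append(b)
        self.parts[b].links.append(a)

    def reachable_memories(self, cpu):
        """Breadth-first search from a processor. Paths continue only through
        switches, so private memories of other processors are not reachable."""
        dist = {cpu: 0}
        todo = deque([cpu])
        result = {}
        while todo:
            cur = todo.popleft()
            for nxt in self.parts[cur].links:
                if nxt in dist:
                    continue
                dist[nxt] = dist[cur] + 1
                kind = self.parts[nxt].kind
                if kind == "memory":
                    result[nxt] = dist[nxt]
                elif kind == "switch":
                    todo.append(nxt)
        return result


# Hypothetical topology loosely following the example figure.
p = Platform()
for cpu in ("CPU1", "CPU2", "CPU3"):
    p.add(cpu, "processor")
    spm = "SPM" + cpu[-1]
    p.add(spm, "memory")
    p.connect(cpu, spm)
for sw in ("BUS1", "BRI", "BUS2"):
    p.add(sw, "switch")
for mem in ("MM1", "MM2", "MM3"):
    p.add(mem, "memory")
for a, b in [("CPU1", "BUS1"), ("CPU2", "BUS1"), ("CPU3", "BUS1"),
             ("BUS1", "MM1"), ("BUS1", "MM2"), ("BUS1", "BRI"),
             ("BRI", "BUS2"), ("BUS2", "MM3")]:
    p.connect(a, b)

print(p.reachable_memories("CPU1"))
```

A memory-aware optimizer can use such distances as a rough proxy for access cost, e.g. preferring the local SPM over MM3 behind the bridge.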

MACC Modeling Example via GUI

Toolflow Detailed View
Input: sequential C source code. MNEMEE toolflow within the MACC eco-system:
1. Optimization of dynamic data structures (dynamic data type optimizations)
2. Extraction of potential parallelism (map source code to task graphs)
3. Implementation of parallelism; placement of static data (MPSoC Parallelization Assistant (MPA); memory hierarchy (MH))
4. Placement of dynamic data (dynamic memory management optimizations)

Toolflow Detailed View (continued)
5. Perform mapping to processing elements: scenario-based mapping and memory-aware mapping (using the platform DB)
6. Transform the code to implement the mapping (RTLIB mapping)
7. Perform scratchpad memory optimizations for each processing element (PE)
Output: optimized source code.
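Step 7 places frequently accessed data into each PE's scratchpad memory. A common way to formulate static SPM allocation is as a 0/1 knapsack problem: choose the data objects whose total size fits into the SPM and whose estimated savings (energy or cycles) is maximal. A minimal dynamic-programming sketch; the object names, sizes and benefit values are hypothetical, and the actual Mnemee optimization may use a different formulation.

```python
def spm_allocation(objects, capacity):
    """0/1 knapsack by dynamic programming.
    objects: list of (name, size_bytes, benefit); capacity: SPM size in bytes.
    Returns (total benefit, sorted names of objects placed in the SPM)."""
    # best[c] = (benefit, chosen names) achievable with at most c bytes.
    best = [(0, [])] * (capacity + 1)
    for name, size, benefit in objects:
        # Iterate capacities downwards so that each object is used at most once.
        for c in range(capacity, size - 1, -1):
            cand = best[c - size][0] + benefit
            if cand > best[c][0]:
                best[c] = (cand, best[c - size][1] + [name])
    total, chosen = best[capacity]
    return total, sorted(chosen)


# Hypothetical data objects: (name, size in bytes, estimated benefit).
data_objects = [
    ("coeff_table", 512, 40),
    ("frame_buf", 2048, 90),
    ("stack", 256, 30),
    ("lut", 1024, 50),
]
print(spm_allocation(data_objects, 1536))  # (90, ['coeff_table', 'lut'])
```

`frame_buf` has the largest benefit but does not fit into 1536 bytes; the best fitting combination fills the SPM exactly. The table has one entry per byte of SPM, which is fine for scratchpad sizes in the kilobyte range.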

MAPS-TCT Framework [Leupers, Sheng, 2008]
Rainer Leupers, Weihua Sheng: MAPS: An Integrated Framework for MPSoC Application Parallelization, 1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008

Summary
- There is a clear trend toward multi-processor systems for embedded systems, with a large design space.
- Using such architectures effectively depends crucially on mapping tools.
- Mapping applications onto heterogeneous MP systems requires allocation (if the hardware is not fixed), binding of tasks to resources, and scheduling.
- Two criteria for classification: fixed vs. flexible architecture; auto-parallelizing vs. non-parallelizing.
- Introduction to the proposed Mnemee tool chain.
- Evolutionary algorithms are currently the best choice for design space exploration.