Cross-Platform Software Development for Heterogeneous Multicore Systems

1 Cross-Platform Software Development for Heterogeneous Multicore Systems
Dr.-Ing. Timo Stripf, Managing Director Technology

2 Outline
- Multicore Motivation
- Automatic Parallelization
- Interactive Parallelization
- Model-Based Development Workflow
- Hardware Accelerators

3 Motivation

4 Increasing consumer demands accelerate the use of multicore processors
- Demands: High Processing Power! Fast Response Time! Low Energy Consumption!
- Markets: Consumer Electronics, Automation, Automotive, Telecommunication

5 Processor Evolution
- Eras: GHz Era, Multicore Era, Manycore? Heterogeneous?
- Examples: Pentium 4 (Single Core), Athlon X2 (Dual Core), Embedded GPU, ARM Cortex-A35 (Quad Core), ZYNQ FPGA

6 Parallel hardware needs parallel programming
[Figure: performance over time across the GHz Era, the Multicore Era, and a possible manycore or heterogeneous era]
Based on Hans Pabst, 2011: Workshop on Programming of Heterogeneous Systems in Physics

7 Challenges with embedded multicore software development
- Difficult to predict performance
- High test and verification effort
- Required expertise on diverse target architectures
- Poor code reusability
Consequences: 25% more time! 3x more software developers! 4.5x more expensive!
Source: VDC Research, Next Generation Embedded Hardware Architectures: Driving Onset of Project Delays, Costs Overruns, and Software Development Changes

8 Software Parallelization

9 Automatic Parallelization as a Black Box
Sequential -> Automatic Parallelization -> Parallel

10 Automatic Parallelization as a Black Box
- We want a one-button solution
- Like C compilers

11 Automatic Parallelization Levels
- Algorithmic Level
- Code Transformation Level
- Task Level
- Communication Level
[Figure: an arrow labeled "Decision Impact" runs alongside the levels]

12 Parallelization on Algorithmic Level
Fast Fourier Transform (FFT)
[Figure: an N Point FFT decomposed into two N/2 Point FFTs]
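The decomposition on this slide can be sketched as a recursive radix-2 decimation-in-time FFT. This is only an illustration, not the presenter's code: the names `cplx` and `fft_radix2` are made up here, and the caller is assumed to supply a precomputed twiddle table with tw[k] = exp(-2*pi*i*k/N) for k < N/2. The two recursive calls work on disjoint inputs and outputs, which is what makes them candidates for separate cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type for this sketch. */
typedef struct { double re, im; } cplx;

static cplx cadd(cplx a, cplx b) { return (cplx){a.re + b.re, a.im + b.im}; }
static cplx csub(cplx a, cplx b) { return (cplx){a.re - b.re, a.im - b.im}; }
static cplx cmul(cplx a, cplx b)
{
    return (cplx){a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

/*
 * Radix-2 decimation-in-time FFT (n must be a power of two).
 *   in, stride    : n input samples, read as in[0], in[stride], ...
 *   out           : n output bins, stored contiguously
 *   tw, tw_stride : twiddle table, tw[k * tw_stride] = exp(-2*pi*i*k/n)
 *                   for k < n/2 (assumed to be precomputed by the caller)
 */
void fft_radix2(const cplx *in, size_t stride, cplx *out, size_t n,
                const cplx *tw, size_t tw_stride)
{
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) {
        out[0] = in[0];
        return;
    }
    size_t half = n / 2;

    /* Two independent N/2-point FFTs over even and odd samples.
     * They share no output data, so they could run on different cores. */
    fft_radix2(in, 2 * stride, out, half, tw, 2 * tw_stride);
    fft_radix2(in + stride, 2 * stride, out + half, half, tw, 2 * tw_stride);

    /* Butterfly stage that combines the two half-size results. */
    for (size_t k = 0; k < half; k++) {
        cplx e = out[k];
        cplx o = cmul(tw[k * tw_stride], out[k + half]);
        out[k] = cadd(e, o);
        out[k + half] = csub(e, o);
    }
}
```

For N = 4 the twiddle table is just {1, -i}, so a call looks like `fft_radix2(x, 1, X, 4, tw, 1)`. Only the butterfly stage needs the results of both halves, so it is the single synchronization point per level.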

13 Loop Transformation: Matrix Multiplication Example

Original loop nest:

    double c[10][10];
    for (int i4 = 0; i4 < 10; i4++) {
        for (int i3 = 0; i3 < 10; i3++) {
            double sum1 = 0.0;
            for (int i5 = 0; i5 < 10; i5++)
                sum1 += a[i5][i3] * b[i4][i5];
            c[i4][i3] = sum1;
        }
    }

After Variable Splitting, Loop Splitting, and Loop Fission:

    double c_0[5][10];   /* rows 0..4 of c */
    double c_1[5][10];   /* rows 5..9 of c */
    for (int i9 = 0; i9 < 5; i9++) {
        for (int i8 = 0; i8 < 10; i8++) {
            double sum2 = 0.0;
            for (int i10 = 0; i10 < 10; i10++)
                sum2 += a[i10][i8] * b[i9][i10];
            c_0[i9][i8] = sum2;
        }
    }
    for (int i4 = 5; i4 < 10; i4++) {
        for (int i3 = 0; i3 < 10; i3++) {
            double sum1 = 0.0;
            for (int i5 = 0; i5 < 10; i5++)
                sum1 += a[i5][i3] * b[i4][i5];
            c_1[i4 - 5][i3] = sum1;
        }
    }
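One way to gain confidence in such a transformation is to run the split version next to the original loop nest and compare the results. The following is a sketch along those lines, not part of the talk. The function names are hypothetical, and the two halves are folded into one helper with a row offset. Each call to `matmul_half` touches only its own output array, so the two calls could be mapped to two cores without synchronization between them.

```c
#include <assert.h>

#define N 10
#define HALF (N / 2)

/* Original loop nest, using the same index order as the slide. */
void matmul_seq(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i4 = 0; i4 < N; i4++)
        for (int i3 = 0; i3 < N; i3++) {
            double sum = 0.0;
            for (int i5 = 0; i5 < N; i5++)
                sum += a[i5][i3] * b[i4][i5];
            c[i4][i3] = sum;
        }
}

/* One half of the split version: computes result rows
 * [row0, row0 + HALF) into a private HALF x N array. */
void matmul_half(double a[N][N], double b[N][N], int row0,
                 double c_part[HALF][N])
{
    assert(row0 == 0 || row0 == HALF);
    for (int i = 0; i < HALF; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[k][j] * b[row0 + i][k];
            c_part[i][j] = sum;
        }
}

/* Returns 1 if the two halves together reproduce the sequential result. */
int split_matches_sequential(double a[N][N], double b[N][N])
{
    double c[N][N], c_0[HALF][N], c_1[HALF][N];

    matmul_seq(a, b, c);
    matmul_half(a, b, 0, c_0);      /* candidate task for core 0 */
    matmul_half(a, b, HALF, c_1);   /* candidate task for core 1 */

    for (int i = 0; i < HALF; i++)
        for (int j = 0; j < N; j++)
            if (c_0[i][j] != c[i][j] || c_1[i][j] != c[i + HALF][j])
                return 0;
    return 1;
}
```

Exact comparison is reasonable here because both versions accumulate each sum in the same order. A transformation that reorders floating-point additions would need a tolerance instead.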

14 Task Level
- Data flow / dependency analysis
- Identify independent code parts
- Perform mapping & scheduling
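The mapping and scheduling step can be illustrated with a greedy list scheduler over a small task graph. This is a simplified sketch, not the algorithm used by any particular tool. The type `task_t` and the function `list_schedule` are hypothetical, tasks are assumed to be given in topological order, costs are assumed to be known estimates, and communication cost between cores is ignored.

```c
#include <assert.h>

#define MAX_TASKS 16
#define MAX_CORES 8

/* Hypothetical task description for this sketch. */
typedef struct {
    int cost;               /* estimated execution time */
    int num_deps;           /* number of predecessor tasks */
    int deps[MAX_TASKS];    /* predecessor indices, each < own index */
} task_t;

/*
 * Greedy list scheduling: visit tasks in index order and place each one
 * on the core where it can start earliest. Writes the chosen core and
 * start time per task and returns the overall makespan.
 */
int list_schedule(const task_t *tasks, int n, int cores,
                  int core_of[], int start[])
{
    int core_free[MAX_CORES] = {0};   /* time at which each core is idle */
    int finish[MAX_TASKS] = {0};
    int makespan = 0;

    assert(n >= 0 && n <= MAX_TASKS);
    assert(cores >= 1 && cores <= MAX_CORES);

    for (int t = 0; t < n; t++) {
        /* A task is ready once all of its predecessors have finished. */
        int ready = 0;
        for (int d = 0; d < tasks[t].num_deps; d++) {
            int p = tasks[t].deps[d];
            assert(p >= 0 && p < t);
            if (finish[p] > ready)
                ready = finish[p];
        }

        /* Pick the core with the earliest possible start time. */
        int best = 0;
        int best_start = core_free[0] > ready ? core_free[0] : ready;
        for (int c = 1; c < cores; c++) {
            int s = core_free[c] > ready ? core_free[c] : ready;
            if (s < best_start) {
                best = c;
                best_start = s;
            }
        }

        core_of[t] = best;
        start[t] = best_start;
        finish[t] = best_start + tasks[t].cost;
        core_free[best] = finish[t];
        if (finish[t] > makespan)
            makespan = finish[t];
    }
    return makespan;
}
```

For a diamond-shaped graph (one producer, two independent workers, one consumer) with costs 2, 3, 4 and 1, this yields a makespan of 7 on two cores compared with 10 on one core, because the two workers overlap.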

15 Task Level Pipelining Loop

16 Communication Placement & Data Management
- Decide when to communicate
- Influences memory allocation per core?

17 Performance Estimation
Deep Learning Neural Network (20 layers)
[Figure: comparison without performance information and with performance information]

18 Interactive Parallelization
- Feedback
- Control

19 Software development with emmtrix: overview
Sequential -> Parallel, for the targets Multicore, FPGA, GPU

20 Interactive Parallelization starting from MATLAB
Flow: Development with MATLAB/Scilab -> Code Generator -> Sequential C Code -> Parallelization -> Parallel C Code
- Algorithmic Level: Different algorithm versions in MATLAB
- Code Transformation Level: Transformation selection in GUI
- Task Level: Automatic user-constrained algorithm
- Communication Level: Automatic algorithm

21 Example: Deep Learning Application
Dominating Kernel: 2D Convolution

22 Loop Transformations
- Apply variable splitting
- Loop splitting
- Loop fission

23 Parallel Schedule (8 Cores ARM Cortex-A53)

24 Model-Based Development Workflow

25 ARGO Project Overview
- Three-year project: 01/ /2018
- Motivation: Programming real-time applications for embedded heterogeneous multi-core systems is complex and expensive
- Project goal: Automate real-time software parallelization and code generation starting from high-level descriptions
- Project partners: [not preserved in this transcript]
- Funded by EU: 3.9 Million Euros
- Coordinator: Juergen Becker (KIT)
ARGO has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No ARGO.

26 Enhanced Ground Proximity Warning System (EGPWS)
- A flight system (a supervisory controller) that creates visual and aural warnings in order to avoid Controlled Flight into Terrain
- Since 1974, the FAA has required all large turbine and turbojet airplanes to install GPWS equipment
- Products: EGPWS from Honeywell, LANDMARK from L3, T2CAS from ACSS, TAWS from Universal Avionics

27 EGPWS Model

28 Xcos Model of Mode 1: Excessive Rate of Descent
- Models are described using graphs
- The limit altitudes (the reference being the radio altitude) are described as functions of other parameters like airspeed or rate of descent

29 Model with Scilab Scripting: Terrain Awareness
- Shuttle Radar Topography Mission (SRTM) 3 arc second (approx. 90 m) data as digital elevation model
- Two-phase collision processing
  - Broad phase: Uniform grids for spatial partitioning
  - Narrow phase: Vertical ray casting for collision detection
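The two-phase idea can be sketched in C as follows. This is one possible reading of the slide, not the Scilab model itself. All names are hypothetical and the 8 x 8 elevation grid is a toy stand-in for SRTM data. The sketch assumes that the broad phase uses a coarse uniform grid storing the maximum elevation per block, and that the narrow phase casts a vertical ray from the aircraft position down to the terrain cell below it.

```c
#include <assert.h>

#define DEM_W  8       /* elevation model width in cells (toy size) */
#define DEM_H  8       /* elevation model height in cells */
#define BLOCK  4       /* cells per coarse grid block edge */
#define CELL_M 90.0    /* cell edge length in metres (SRTM 3 arc second) */

/* Hypothetical terrain container for this sketch. */
typedef struct {
    double elev[DEM_H][DEM_W];                        /* height in metres */
    double block_max[DEM_H / BLOCK][DEM_W / BLOCK];   /* coarse uniform grid */
} terrain_t;

/* Precompute the maximum elevation of every coarse grid block. */
void terrain_build_grid(terrain_t *t)
{
    for (int by = 0; by < DEM_H / BLOCK; by++)
        for (int bx = 0; bx < DEM_W / BLOCK; bx++) {
            double m = t->elev[by * BLOCK][bx * BLOCK];
            for (int y = by * BLOCK; y < (by + 1) * BLOCK; y++)
                for (int x = bx * BLOCK; x < (bx + 1) * BLOCK; x++)
                    if (t->elev[y][x] > m)
                        m = t->elev[y][x];
            t->block_max[by][bx] = m;
        }
}

/* Returns 1 if the clearance below the aircraft is at most margin_m. */
int terrain_conflict(const terrain_t *t, double x_m, double y_m,
                     double alt_m, double margin_m)
{
    assert(margin_m >= 0.0);

    /* Positions outside the modelled area are not evaluated. */
    if (x_m < 0.0 || y_m < 0.0 ||
        x_m >= DEM_W * CELL_M || y_m >= DEM_H * CELL_M)
        return 0;

    int cx = (int)(x_m / CELL_M);
    int cy = (int)(y_m / CELL_M);

    /* Broad phase: if the aircraft clears the highest point of the whole
     * block by more than the margin, no cell in it can be a conflict. */
    if (alt_m - margin_m > t->block_max[cy / BLOCK][cx / BLOCK])
        return 0;

    /* Narrow phase: vertical ray from the aircraft to the cell below. */
    return alt_m - t->elev[cy][cx] <= margin_m;
}
```

The broad phase exists only to skip the per-cell lookup for most positions. With a single query per position the saving is small, but it grows when a whole predicted flight path is checked against a large elevation model.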

30 Classical Model-based Workflow
Parallelization for Real-time Applications!
Workflow elements: Plant Modeling, Code Generation, Controller Modeling, Software-in-the-Loop Testing, Model-in-the-Loop Testing, Hardware-in-the-Loop Testing, Unit Testing

31 Hardware-in-the-Loop Testing
Multicore Recore Architecture

32 Parallelization of EGPWS application

33 Addressing Heterogeneous Architectures

34 Challenges for Addressing Heterogeneous Architectures
Targets compared: Multicore, FPGA, GPU
- Programming Language: C, C++ (Multicore); VHDL, SystemVerilog (FPGA); OpenCL, CUDA (GPU)
- Data Types: Standard Integer, Standard Float (Multicore); Fixed Point (FPGA); Standard Integer, Standard Float (GPU)
- Parallelization: Coarse Grained, Fine Grained, Random, Loop
- Data Locality: Caches, Local memories, Streaming, Register, L2 Cache

35 Programming Language
[Figure: Development with MATLAB/Scilab -> Code Generator -> Sequential C Code -> Parallel Studio -> Parallel C Code, HLS C Code, VHDL Code, CUDA]

36 Supporting Hardware Accelerators
- Algorithmic Level: Use FPGA / GPU library; Fixed-point algorithms
- Code Transformation Level: HLS pragmas; HLS/GPU transformations
- Task Level: Hardware accelerator as special processor
- Communication Level: Heterogeneous communication
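To make the fixed-point and HLS pragma points concrete, here is a small sketch of a Q15 multiply-accumulate kernel. It is an illustration only, not code from the talk. The function name `fir_q15` and the tap count are hypothetical, and the `#pragma HLS PIPELINE` line assumes a Xilinx-style HLS tool; an ordinary C compiler ignores the unknown pragma, possibly with a warning.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 4   /* hypothetical kernel size for this sketch */

/*
 * Q15 fixed-point dot product: inputs are 16-bit values that represent
 * numbers in [-1, 1) scaled by 2^15. The result is rounded and saturated
 * back to Q15.
 */
int16_t fir_q15(const int16_t x[TAPS], const int16_t h[TAPS])
{
    /* 64-bit accumulator: a sum of several Q30 products can exceed 32 bits. */
    int64_t acc = 0;

    for (int i = 0; i < TAPS; i++) {
/* Assumed Xilinx-style HLS directive: start one loop iteration per clock. */
#pragma HLS PIPELINE II=1
        acc += (int32_t)x[i] * (int32_t)h[i];   /* Q15 * Q15 = Q30 */
    }

    acc += 1 << 14;   /* add half an output LSB for rounding */
    acc >>= 15;       /* Q30 -> Q15; assumes an arithmetic right shift */

    /* Saturate to the representable Q15 range. */
    if (acc > INT16_MAX)
        acc = INT16_MAX;
    if (acc < INT16_MIN)
        acc = INT16_MIN;
    return (int16_t)acc;
}
```

The same source can be compiled for a CPU core to check the arithmetic before it is handed to an HLS flow, which fits the single-source idea of the deck. The right shift of a negative value is implementation-defined in C, so the rounding of negative results should be checked on the target toolchain.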

37 FPGA Example

38 FPGA Example (2)

39 Benefits of Interactive Parallelization
- Code quality (reduce errors and test effort)
- Portability (single source)
- Transparency & control
- Productivity
Develop sequential, get parallel

40 Summary
- Multicore Motivation
- Automatic Parallelization
- Interactive Parallelization
- Model-Based Development Workflow
- Hardware Accelerators

41 Your emmtrix
Dr.-Ing. Timo Stripf
emmtrix Technologies GmbH
Engesserstraße, Karlsruhe, Germany
Phone:
Fax:
timo.stripf@emmtrix.com
Web:
