Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs
DREAM seminar
Mickaël Dardaillon, Research Intern with NOKIA Technologies
January 27th, 2015
Evolution of telecommunication protocols
What we know about 5G demands: higher capacity, lowest latency and more consistent experience.
[Figure: "Push & pull of technology". Labels: 1G, 2G, 3G, 4G; Text, Voice, Mail?, Multimedia, NextGen media, Monitoring & sensing, Real-time control, M2M, MTC; Audio 100 ms, Visual 10 ms, Tactile 1 ms; Flexibility for what is unknown today.]
4G LTE-Advanced: Downlink
1 frame (10 ms) = 10 sub-frames (1 ms each), numbered 0 to 9
1 sub-frame: 14 OFDM symbols over 2048 subcarriers (20 MHz), carrying control and data for User 1, User 2, User 3
MIMO: 4×2 antennas
LTE throughput: 1.4 Gbps
LTE-Advanced: 7 Gbps
Latency: 2 ms
Power budget: 500 mW
Magali SDR
LTE demonstrator [Clermidy et al., 09]
Power consumption: 231 mW
[Figure: Magali units: MOD (mod), OFDM (ofdm1 to ofdm4), LDPC (ldpc), TURBO (turbo), DEMOD (demod), WIFLEX (wiflex), DSP (dsp1 to dsp5), DMA (dma1 to dma5), ARM (arm), 8051 (8051).]
Problem statement
How should we program a Cell processor? Any way you want!
How to program and compile a telecommunication protocol for a heterogeneous MPSoC?
Outline
Context
Programming Model for SDR
  Dataflow Model of Computation
Dataflow Refinement and Buffer Verification
  Mapping and Scheduling
  Micro-Scheduling
Experimentations on Magali
  Code Generation
  Experimental Results
Perspectives
State of the Art in SDR Programming
Imperative Concurrent
  Platform                                  Language
  ExoCHI [Wang et al., 07]                  OpenMP + C
  BEAR [Derudder et al., 09]                Matlab + C
Dataflow
  Platform                                  Language
  Simulink
  LabView
  GNU Radio                                 Python + C
  RVC-CAL [Lucarz et al., 08]               XML + C
  DiplodocusDF [Gonzalez-Pina et al., 12]   UML
  MAPS [Castrillon et al., 13]              C like
Static Dataflow (SDF) [Lee et al., 87]
[Figure: SDF graph with actors Src, Decod 1 and Ctrl; edges annotated with fixed rates 10 and 1.]
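An SDF graph is consistent when its balance equations (for every edge, producer firings × production rate = consumer firings × consumption rate) have a positive integer solution, the repetition vector. The sketch below is one minimal way to compute it for a connected graph. The `Edge` struct, the function names and the example rates are illustrative assumptions and are not taken from the talk's tool flow.

```cpp
#include <cassert>
#include <vector>

// Hypothetical edge description: src produces `prod` tokens per firing,
// dst consumes `cons` tokens per firing.
struct Edge { int src, dst; long prod, cons; };

static long gcdOf(long a, long b) {
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

// Returns the smallest positive integer repetition vector of a connected
// SDF graph, or an empty vector if the graph is inconsistent or disconnected.
std::vector<long> repetitionVector(int actors, const std::vector<Edge>& edges) {
    assert(actors > 0);
    // Firing count of actor a is the fraction num[a] / den[a];
    // den[a] == 0 means "not assigned yet".
    std::vector<long> num(actors, 0), den(actors, 0);
    num[0] = den[0] = 1;  // seed: actor 0 fires once
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Edge& e : edges) {
            int from = -1, to = -1;
            long up = 0, down = 0;
            if (den[e.src] != 0 && den[e.dst] == 0) {
                from = e.src; to = e.dst; up = e.prod; down = e.cons;
            } else if (den[e.dst] != 0 && den[e.src] == 0) {
                from = e.dst; to = e.src; up = e.cons; down = e.prod;
            } else {
                continue;
            }
            // Balance equation: q[to] = q[from] * up / down.
            long n = num[from] * up, d = den[from] * down;
            long g = gcdOf(n, d);
            num[to] = n / g;
            den[to] = d / g;
            changed = true;
        }
    }
    long scale = 1;  // least common multiple of all denominators
    for (int a = 0; a < actors; ++a) {
        if (den[a] == 0) return {};  // actor not reachable from actor 0
        scale = scale / gcdOf(scale, den[a]) * den[a];
    }
    std::vector<long> q(actors);
    long common = 0;
    for (int a = 0; a < actors; ++a) {
        q[a] = num[a] * (scale / den[a]);
        common = gcdOf(common, q[a]);
    }
    for (long& v : q) v /= common;
    // Every edge must satisfy its balance equation.
    for (const Edge& e : edges)
        if (q[e.src] * e.prod != q[e.dst] * e.cons) return {};
    return q;
}
```

For instance, with the assumed chain Src -(100, 10)-> Decod -(3, 10)-> Sink, `repetitionVector(3, {{0, 1, 100, 10}, {1, 2, 3, 10}})` yields one Src firing, ten Decod firings and three Sink firings.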
Phase Approach with Static Dataflow
[Figure: a first SDF graph (Src 1, Decod 1, Ctrl), followed by one static SDF graph per parameter value (Src 2, Decod 2, Sink; rates 100 and 10; values 1, 2, 3, ...).]
Dynamic Dataflow (DDF) [Buck, 93]
[Figure: dataflow models placed between analysable (SDF) and expressive (KPN, DDF): SDF, MCDF, SADF, PiMM, SPDF, BPDF, KPN, DDF.]
Scenario Aware DataFlow (SADF) [Theelen et al., 06]
Mode Controlled DataFlow (MCDF) [Moreira et al., 12]
Schedulable Parametric DataFlow (SPDF) [Fradet et al., 12]
Parameterized and Interfaced dataflow Meta-Model (PiMM) [Desnos et al., 13]
Boolean Parametric DataFlow (BPDF) [Bebelis et al., 13]
Kahn Process Network (KPN) [Kahn, 74]
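Schedulable Parametric DataFlow (SPDF) [Fradet et al., 12]
Model of Computation
Analysis
Quasi-Static Scheduling
[Figure: SPDF graph with actors Src, Decod 1, Ctrl, Decod 2 and Sink; rates 10, 100 and 1; Ctrl sets the parameter p (set p[1]); the rates between Decod 2 and Sink are p and 10.]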
Front End Implementation
SDR Programming Model
  Propose SPDF for SDR
  C++ input format
Front End
  Based on LLVM framework
  Derived from SystemC analysis [Marquet et al., 10]
  Static graph structure
[Figure: front-end flow: PaDaF (C++) -> C++ Front End (Clang) -> LLVM IR -> Graph Construction -> Graph + LLVM IR.]
SPDF Mapping
[Figure: the SPDF graph mapped onto Magali units.]
Src -> dma1 (DMA)
Ctrl -> arm (ARM)
Decod 1, Decod 2 -> demod (DEMOD)
Sink -> dma2 (DMA)
SPDF Quasi-Static Scheduling [Fradet et al., 12]
[Figure: the mapped SPDF graph (Src on dma1, Ctrl on arm, Decod 1 and Decod 2 on demod, Sink on dma2).]
S(dma1) = (Src)
S(arm) = (Ctrl; set(p))
S(demod) = (Decod 1; get(p); (Decod 2)^10)
S(dma2) = (get(p); (Sink)^p)
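The quasi-static schedules above are fixed at compile time except for the repetition counts that depend on p. As an illustration only, the sketch below expands the four schedules into flat firing lists once a value of p is known; the function name `expandSchedule` and the string encoding of firings are assumptions made for this example, not the compiler's actual representation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Expands the per-unit quasi-static schedules of the example for a given
// parameter value p (hypothetical helper, for illustration).
std::map<std::string, std::vector<std::string>> expandSchedule(int p) {
    assert(p >= 0);
    std::map<std::string, std::vector<std::string>> s;
    // S(dma1) = (Src)
    s["dma1"] = {"Src"};
    // S(arm) = (Ctrl; set(p))
    s["arm"] = {"Ctrl", "set(p)"};
    // S(demod) = (Decod 1; get(p); (Decod 2)^10)
    s["demod"] = {"Decod 1", "get(p)"};
    s["demod"].insert(s["demod"].end(), std::size_t(10), std::string("Decod 2"));
    // S(dma2) = (get(p); (Sink)^p): the only count that depends on p
    s["dma2"] = {"get(p)"};
    s["dma2"].insert(s["dma2"].end(), static_cast<std::size_t>(p), std::string("Sink"));
    return s;
}
```

With p = 3, the list for dma2 is get(p) followed by three Sink firings, while the lists for dma1, arm and demod are the same for every p.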
SPDF Symbolic Execution
[Figure: timeline of the symbolic execution: Src on dma1; D1 then (D2)^10 on demod; Ctrl on arm; (Sink)^p on dma2.]
S(dma1) = (Src)
S(arm) = (Ctrl; set(p))
S(demod) = (Decod 1; get(p); (Decod 2)^10)
S(dma2) = (get(p); (Sink)^p)
SPDF Buffer Sizing
[Figure: the mapped SPDF graph annotated with buffer sizes [10], [100], [1] and [10*p_max].]
Problem: overestimates buffer size
e.g. Magali FFT size: 2048, buffer size: 16
SPDF Model Refinement
[Figure: the mapped SPDF graph annotated with refined buffer sizes [10], [10], [1] and [p_max].]

    void Src::compute() {
        [...]
        // One push of 10 control tokens on the first output
        out[1].push(ctrl, 10);
        // Ten separate pushes of 10 data tokens on the second output
        for (int i = 0; i < 10; i++)
            out[2].push(data[i], 10);
    }

Idea: model each individual data communication
Micro-Scheduling
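To make the effect of the refinement concrete, the sketch below compares the buffer annotations of the two slides: [10], [100], [1], [10*p_max] before refinement and [10], [10], [1], [p_max] after. The assignment of each size to a named edge and the struct and function names are assumptions made for this example.

```cpp
#include <cassert>

// Buffer sizes (in tokens) of the four edges of the example graph.
// The edge names are an assumed reading of the slide annotations.
struct BufferSizes {
    long srcToDecod1, srcToDecod2, decod1ToCtrl, decod2ToSink;
    long total() const {
        return srcToDecod1 + srcToDecod2 + decod1ToCtrl + decod2ToSink;
    }
};

// Sizes before refinement: each buffer holds everything produced on its
// edge during one iteration, with p at its maximum.
BufferSizes perIterationSizes(long pMax) {
    assert(pMax >= 1);
    return {10, 100, 1, 10 * pMax};
}

// Sizes after refinement: each buffer holds one individual communication.
BufferSizes refinedSizes(long pMax) {
    assert(pMax >= 1);
    return {10, 10, 1, pMax};
}
```

For an assumed p_max of 4, the totals are 151 tokens before refinement and 25 tokens after.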
Micro-Scheduling: an Example
[Figure: timeline of the execution: Src on dma1; D1 then (D2)^10 on demod; Ctrl on arm; (Sink)^p on dma2.]
µs(Src) = (push_{Src,D1}(10); (push_{Src,D2}(10))^10)
µs(D2) = (pop_{Src,D2}(10); push_{D2,Sink}(p))
µs(Sink) = ((pop_{D2,Sink}(1))^10)
Buffer Sizing Verification
How to verify buffer sizes using micro-schedules?
Proposed Verification Method
  Based on Model Checking
  Derived from buffer minimization [Geilen et al., 05]
[Figure: Model = Schedule + Buffer sizes + Micro-Schedule + Parameter values -> Model Checker (SPIN) -> Check for deadlocks.]
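The idea of checking buffer sizes against micro-schedules can be illustrated without a model checker. The sketch below is not the SPIN model used in the talk: it is a simplified simulation, for one fixed value of p, of blocking pushes and pops on bounded buffers, and it reports whether every unit reaches the end of its micro-schedule. The `Op` encoding, the function names, the omission of Ctrl and of the set(p)/get(p) exchanges, and the assumption that each buffer has a single producer and a single consumer are all simplifications made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One communication of a micro-schedule: push or pop `tokens` on `channel`.
struct Op { bool isPush; int channel; int tokens; };

// Runs the units until all micro-schedules finish (true) or no unit can
// make progress any more (false, i.e. a deadlock for these buffer sizes).
bool runsToCompletion(const std::vector<std::vector<Op>>& units,
                      const std::vector<int>& capacity) {
    std::vector<int> fill(capacity.size(), 0);
    std::vector<std::size_t> pc(units.size(), 0);
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < units.size(); ++i) {
            while (pc[i] < units[i].size()) {
                const Op& op = units[i][pc[i]];
                int& level = fill[op.channel];
                if (op.isPush && level + op.tokens <= capacity[op.channel])
                    level += op.tokens;   // enough free space: push
                else if (!op.isPush && level >= op.tokens)
                    level -= op.tokens;   // enough tokens: pop
                else
                    break;                // blocked, try the next unit
                ++pc[i];
                progress = true;
            }
        }
    }
    for (std::size_t i = 0; i < units.size(); ++i)
        if (pc[i] != units[i].size()) return false;
    return true;
}

// Micro-schedules of the example for a fixed p, one list per unit.
// Channels: 0 = Src->D1, 1 = Src->D2, 2 = D2->Sink.
std::vector<std::vector<Op>> exampleUnits(int p) {
    const int srcD1 = 0, srcD2 = 1, d2Sink = 2;
    std::vector<Op> dma1, demod, dma2;
    dma1.push_back({true, srcD1, 10});
    for (int i = 0; i < 10; ++i) dma1.push_back({true, srcD2, 10});
    demod.push_back({false, srcD1, 10});         // Decod 1
    for (int i = 0; i < 10; ++i) {               // (Decod 2)^10
        demod.push_back({false, srcD2, 10});
        demod.push_back({true, d2Sink, p});
    }
    for (int f = 0; f < p; ++f)                  // (Sink)^p
        for (int k = 0; k < 10; ++k) dma2.push_back({false, d2Sink, 1});
    return {dma1, demod, dma2};
}
```

With p = 3, capacities {10, 10, 3} let every unit finish, whereas shrinking the last buffer to 2 blocks the first push of Decod 2 forever. A model checker such as SPIN explores the interleavings of the model, whereas this sketch follows a single one.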
Micro-Scheduling Implementation
Micro-Scheduling
  SPDF model refinement
  Sequential communications
Buffer Verification
  Model checking
[Figure: compilation flow. Front End: PaDaF (C++) -> C++ Front End (Clang) -> LLVM IR -> Graph Construction -> Graph + LLVM IR. Back End: Mapping, Scheduling -> Buffer Verification (SPIN).]
Code Generation
[Figure: from Graph + LLVM IR, three generation steps (ARM code generation, communication code generation, control code generation) produce the control code (C) and the Magali code (ASM) for the platform units (OFDM, DEMOD, TURBO, DMA, ARM).]
Benchmarks using LTE
OFDM: compilation
[Figure: Src -> FFT -> Defram -> Sink, mapped on dma1, ofdm1 and dma3; rates 7168, 1024, 600 and 4200.]
Demodulation: communications
[Figure: Src, Demap, Word Deinter, Bit Deinter, Depunct, Turbo Decod and Sink, mapped on dma1 to dma4, demod and turbo; rates 1200, 900, 300, 1353 and 57.]
Benchmarks using LTE
Parametric Demodulation: parameter
[Figure: Src, Split, Demap, Word Deinter, Bit Deinter, Depunct, Turbo Decod, Control (set p[1]) and Sink, mapped on dma1 to dma4, demod, turbo and arm; several rates depend on the parameter p (300p).]
Results: Estimated Development Time
Compiler Development
  Front-End: 4 man-months
  Back-End: 8 man-months

                   Native                       PaDaF
  Application      C / ASM (#lines)  (hours)    C++ (#lines)  (hours)
  OFDM             150 / 200         40         60            1
  Demodulation     300 / 600         160        160           4
  Param. Demod.    500 / 800         480        260           8

Takeaway Message: Reduces development time
Results: Buffer Verification Time
Evaluation framework
  2.4 GHz Intel Core i5, 8 GB RAM, OS X 10.9.2
  SPIN Model Checker

  Application      States         Transitions    Exec. Time (s)
  OFDM             1.28 × 10^4    2.56 × 10^4    0.1
  Demodulation     2.12 × 10^6    1.07 × 10^7    9
  Param. Demod.    6.07 × 10^7    2.22 × 10^8    199

Takeaway Message: Reduces development time, improves verification
Results: Execution Time
Evaluation framework
  SystemC TLM based on 65 nm CMOS implementation
  ARM code run on QEMU Virtual Machine

  Application      Native (µs)    Generated (µs)
  OFDM             149            168 (+13%)
  Demodulation     180            283 (+57%)
  Param. Demod.    419            558 (+33%)

Takeaway Message: Reduces development time, improves verification
Execution Model
[Figure: OFDM graph Src -> FFT -> Defram -> Sink (rates 7168, 1024, 600, 4200), mapped on dma1, ofdm1 and dma3, with two execution timelines over arm, dma1, ofdm1 and dma3.]
Phase Approach: 25 µs, 37 µs, 16 µs, 21 µs
Distributed: 25 µs, 74 µs, 23 µs, 25 µs
Results: Execution Time
Evaluation framework
  SystemC TLM based on 65 nm CMOS implementation
  ARM code run on QEMU Virtual Machine

  Application      Native (µs)    Generated (µs)    Optimized (µs)
  OFDM             149            168 (+13%)        149 (+0%)
  Demodulation     180            283 (+57%)        180 (+0%)
  Param. Demod.    419            558 (+33%)        288 (-31%)

Takeaway Message: Reduces development time, improves verification, maintains performance
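The percentages in the table are relative differences with respect to the native execution time, rounded to the nearest integer. The short sketch below reproduces that arithmetic; the function name `overheadPercent` is an assumption made for this example.

```cpp
#include <cassert>
#include <cmath>

// Relative difference of `otherUs` with respect to `nativeUs`, in percent,
// rounded to the nearest integer (hypothetical helper).
long overheadPercent(double nativeUs, double otherUs) {
    assert(nativeUs > 0.0);
    return std::lround(100.0 * (otherUs - nativeUs) / nativeUs);
}
```

For the parametric demodulation, `overheadPercent(419, 558)` gives 33 for the generated code and `overheadPercent(419, 288)` gives -31 for the optimized code, matching the table.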
Back End Implementation
Magali Support
  Computation
  Communication
  Control
LTE Experimentation
  Performance close to native
  Buffer verification
  Central controller
[Figure: compilation flow. Front End: PaDaF (C++) -> C++ Front End (Clang) -> LLVM IR -> Graph Construction -> Graph + LLVM IR. Back End: Mapping, Scheduling -> Buffer Verification (SPIN) -> Code Generation -> MPSoC Code (ASM).]
Perspectives
On dataflow programming
  Compiler
  Runtime
On heterogeneous MPSoC
  Future of dedicated platforms
  Development on such platforms
Publications
  Survey: [Dardaillon et al., IWCMC 12]
  Compilation flow: [Dardaillon et al., CASES 14]
INSA-Lyon, CITI-Inria: Tanguy Risset, Kevin Marquet
CEA Grenoble: Jérôme Martin, Henri-Pierre Charles