Dataflow programming for heterogeneous computing systems

Size: px

Start display at page:

Download "Dataflow programming for heterogeneous computing systems"

Cleopatra Cross
5 years ago
Views:

Dataflow programming for heterogeneous computing systems Jeronimo Castrillon Cfaed Chair for Compiler Construction TU Dresden jeronimo.

1 Dataflow programming for heterogeneous computing systems Jeronimo Castrillon Cfaed Chair for Compiler Construction TU Dresden Tutorial: Algorithmic specification, tools and algorithms for programming heterogeneous platforms. PACT Conference. San Francisco, October 19 th 2015

2 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 2 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

3 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 3 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Peripherals Communication support HW queues Network Processor Packet DMA Tomorrow s heterogeneity: Emerging technologies

4 Heterogeneous computing systems Today s heterogeneity Desktop/HPC: GPGPU, GP + ACC Embedded: RISCs, DSPs, ASIPs, è Resources: different performance/energy characteristics MEM subsystem DMAs, semaphores PMU L1 A15 A15 L1 L2 L1 A15 A15 L1 VLIW DSP L1,L2 NoC Peripherals Communication support HW queues Network Processor Packet DMA Tomorrow s heterogeneity: Emerging technologies Heterogeneity beyond performance and energy Reliability/error tolerance Computation model 4 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Heterogeneity: Cfaed Vision German Excellence Cluster: Goal to explore new technologies for electronic information processing which overcome the limits of CMOS technology Information Processing

5 Heterogeneity: Cfaed Vision German Excellence Cluster: Goal to explore new technologies for electronic information processing which overcome the limits of CMOS technology Information Processing Systems research to handle heterogeneity Devices & Circuits Materials & Functions CMOS (industry focus) H HAEC: Highly Adaptive Energy-Efficient Computing A Silicon Nanowires B Carbon F Orchestration C Organic G Resilience D Biomolecular Assembly E Chemical Biology to inspire solutions I Biological Systems Material research for post-cmos technologies 5 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Heterogeneity: Example SiNW SiNW: Silicon Nanowires Multi-gate devices with less performance penalty Reconfigurable P/N functionality Possibilities

6 Heterogeneity: Example SiNW SiNW: Silicon Nanowires Multi-gate devices with less performance penalty Reconfigurable P/N functionality Possibilities New micro architectures New pipeline structures New field programmable devices [Trommer15] 6 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Heterogeneity: Example chemical processing

other components Fundamentally different way of

7 Heterogeneity: Example chemical processing Lab-on-Chips: for sensing, analysis and test Also for computing? Different kinds of transistors Oscillators and other components Fundamentally different way of computation [Voigt14] 7 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Heterogeneity: Example DNA origami DNA origami:

Use structures to build advance electronic devices

8 Heterogeneity: Example DNA origami DNA origami: Self-assembled 2D/3D structures made of DNA strands Use structures to build advance electronic devices Example: Plasmonic waveguides Courtesy: Thorsten Lars-Schmidt 8 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Processor Packet DMA Already difficult to program them today, what about tomorrow?

9 Consequence of heterogeneity MEM subsystem DMAs, semaphores PMU L1 A15 A15 L1 L2 L1 A15 A15 L1 VLIW DSP L1,L2 NoC Peripherals Communication support HW queues Network Processor Packet DMA Already difficult to program them today, what about tomorrow? Need for models and abstraction 9 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

10 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 10 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Models: Introduction Von Neumann model makes things complicated Sharing state Data races Control unit Inst. reg Prog. counter Memory & I/O Processing unit Reg.

11 Models: Introduction Von Neumann model makes things complicated Sharing state Data races Control unit Inst. reg Prog. counter Memory & I/O Processing unit Reg. file ALU Central Processing Unit (CPU) Task graphs: A simple parallel programming model Intel TBB,.NET Task parallel library (TPL), OpenMP Tasks, Runtime and data management, e.g., StarPU [Augonnet10] 11 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

12 Directed acyclic task graphs Very well studied, see for example [Kwok99] Difficult problem for heterogeneous systems RISC Interconnect DSP Processor Shared Memory 1 2 4? 1 3 Processor 2 Time Time 12 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

13 Dataflow models Also a graph representation: Nodes & edges are called actors & channels Implicit repetition, common in streaming, signal processing applications Communication: only through channels Multiple flavors of models: Rules that determine when an actor fires A graph models multiple possible executions: Processor cf. LabVIEW models Time 13 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

14 Dataflow models (2) Also a graph representation: Nodes & edges are called actors & channels Implicit repetition, common in streaming, signal processing applications Communication: only through channels Multiple flavors of models: Rules that determine when an actor fires A graph models multiple possible executions: Processor What now? Time 14 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

15 Dataflow models (3) Synchronous Dataflow (SDF): every actor has a fixed behavior 2 e a1 a2 a3 e1 e3 6 2 e2 1 a3 always writes 1 token to e4 Cyclo-Static Dataflow (CSDF): every actor has a set of fixed behaviors {1,0,0} e2 1 1 e1 {0,1,0} 1 a1 a2 a3 {0,2} e3 a1: writes 1 token, then 0, then 0 to e2 15 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

called processes e4 p1 p2 p3 e1 e3 p1: writes any amount of tokens to e2

16 Dynamic models and Kahn Process Networks Dynamic dataflow: set of firing rules per actor i1 a1 a2 a3 i2 Kahn process networks (KPN): nodes are now called processes e4 p1 p2 p3 e1 e3 p1: writes any amount of tokens to e2 at any time e2 16 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

17 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 17 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Analysis and synthesis MEM subsystem DMAs, semaphores PMU L1 A15 A15 L1 L2 Application Architecture model L1 A15 A15 L1 VLIW DSP L1,L2 NoC Peripherals Non-functional specification Communication

18 Analysis and synthesis MEM subsystem DMAs, semaphores PMU L1 A15 A15 L1 L2 Application Architecture model L1 A15 A15 L1 VLIW DSP L1,L2 NoC Peripherals Non-functional specification Communication support HW queues Network Processor Packet DMA cf. Silexica s tool flows Analysis Synthesis Code generation Property models (timing, energy, error, ) PNargs_ifft_r.ID = 6U; PNargs_ifft_r.PNchannel_freq_coef = fi PNargs_ifft_r.PNnum_freq_coef = 0U; PNargs_ifft_r.PNchannel_time_coef = si PNargs_ifft_r.channel = 1; sink_left = IPCllmrf_open(3, 1, 1); sink_right = IPCllmrf_open(7, 1, 1); PNargs_sink.ID = 7U; PNargs_sink.PNchannel_in_left = sink_l PNargs_sink.PNnum_in_left = 0U; PNargs_sink.PNchannel_in_right = sink_ PNargs_sink.PNnum_in_right = 0U; taskparams.arg0 = (xdc_uarg)&pnargs_s taskparams.priority = 1; ti_sysbios_knl_task_create((ti_sysbios_kn &taskparams, &eb); glob_proc_cnt++; hasprocess = 1; taskparams.arg0 = (xdc_uarg)&pnargs_f taskparams.priority = 1; ti_sysbios_knl_task_create((ti_sysbios_kn ft_templ, &taskparams, &eb); glob_proc_cnt++; hasprocess = 1; taskparams.arg0 = (xdc_uarg)&pnargs_i taskparams.priority = 1; ti_sysbios_knl_task_create((ti_sysbios_kn fft_templ, &taskparams, &eb); glob_proc_cnt++; hasprocess = 1; taskparams.arg0 = (xdc_uarg)&pnargs_s taskparams.priority = 1; 18 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

19 Example for SDFs Compute topology matrix, and solve system of equations Solution: repetition vector serve to unroll the graph [1 3 2] Perform mapping and scheduling on the resulting directed acyclic graph (DAG) 19 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

20 Example for KPNs: Static & dynamic analysis Need to understand process interactions Unrolled CFG for read(&c1); f3(...); write(&c3); f4(...); write(&c3); [Castrillon14] f1(...); write(&c2); read(&c2); f5(...); write(&c4); read(&c0); f2(...); write(&c1); read(&c3); f6(...); read(&c3); f7(...); read(&c4); R4 R3 R2 R4 R3 R2 R1 Unrolled CFG for f2 f1 f5 R1 f1 f1 f1 f2 f3 f1 f6 f7 f7 f7 f7 f4 f1 f5 f3 f6 f7 f7 f7 f7 f4 f5 f5 f5 f5... t2 t t t J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

21 KPN & DDF: Tracing Dynamic analysis based on execution traces [Castrillon10/13, Brunet15, Singh15] DAG representation for further analysis and synthesis 21 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

22 Models from functional specification Inspect functional specification of actors/processes (cf. 2 nd talk) Instrumentation, emulation, cost models, For timing, energy, errors, Example [SAMOS14] Profiling: Execution counts, branch stats 22 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Castrillon11] Algorithm library Plain code Plain code.

23 Models based on algorithmic descriptions When functional specification is not meant for synthesis Common/required in heterogeneous systems (special components) Need to match algorithms to hardware components [Castrillon10b, Castrillon11] Algorithm library Plain code Plain code Platform model + characterization of special components N: Algorithmic actors F: Existing implementation in target platform 23 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

24 Multiple-applications Use traces and mappings to reason about platform sharing 24 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

25 Multiple-applications (2) Quickly discard bad multi-application configurations by observing the platform utilization profiles 25 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

26 Outline Heterogeneous systems Dataflow models Analysis and synthesis Summary 26 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

27 Summary Need programming models (and HW/SW stacks) to handle heterogeneity Even more dramatic in the post-cmos era Dataflow models Natural way to describe some applications Amenable to analysis and synthesis for parallel execution Discuss different kinds of models and required analysis Need models of hardware for synthesis 27 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

28 Acknowledgements German Cluster of Excellence: Center for Advancing Electronics Dresden ( Collaborative research center (CRC): Highly Adaptive Energy-Efficient Computing (HAEC) CRC 912: Highly Adaptive Energy- Efficient Computing Silexica Software Solutions GmbH 28 J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

29 References [Trommer15] Trommer, et al. "Functionality-Enhanced Logic Gate Design Enabled by Symmetrical Reconfigurable Silicon Nanowire Transistors, IEEE Trans on in Nanotechnology, pp , July 2015 [Voigt14] Andreas Voigt, et al. Towards Computation with Microchemomechanical Systems, In International Journal of Foundations of Computer Science, World Scientific, volume 25, 2014 [Augonnet10] C. Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience 23.2 (2011), pp [Kwok99] Y.-K. Kwok, et al, Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors, ACM Comput. Surv., ACM Press, 1999, 31, [Castrillon14] J. Castrillon, et al, Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap, Springer, 2014 [Castrillon10] J. Castrillon, et al, Trace-based KPN composability analysis for mapping simultaneous applications to MPSoC platforms, DATE 10, pp , 2010 [Castrillon13] J. Castrillon, et al, MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs, IEEE Trans on Industrial Informatics, vol. 9, no. 1, pp , 2013 [Brunet15] Brunet, Simone C. Analysis and optimization of dynamic dataflow programs. Diss. ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE, 2015 [Singh15] Singh, Amit, et al. "Resource and Throughput Aware Execution Trace Analysis for Efficient Run-time Mapping on MPSoCs., TCAD 2015 [SAMOS14] J.F. Eusse, et al, "Pre-architectural performance estimation for ASIP design based on abstract processor models," Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 pp , 2014 [Castrillon10b] J. Castrillon, et al, Component-based waveform development: The nucleus tool flow for efficient and portable SDR, Wireless Innovation Conference and Product Exposition (SDR), 2010 [Castrillon11] J. Castrillon, et al, Component-based waveform development: The nucleus tool flow for efficient and portable software defined radio, Analog Integrated Circuits and Signal Processing, vol. 69, no. 2 3, pp , J. Castrillon. Dataflow programming. PACT Tutorial. Oct 2015

Programming Heterogeneous Embedded Systems for IoT

Programming Heterogeneous Embedded Systems for IoT Jeronimo Castrillon Chair for Compiler Construction TU Dresden jeronimo.castrillon@tu-dresden.de Get-together toward a sustainable collaboration in IoT