Application-Platform Mapping in Multiprocessor Systems-on-Chip

1 Application-Platform Mapping in Multiprocessor Systems-on-Chip Leandro Soares Indrusiak CREDES Kick-off Meeting Tallinn - June 2009

2 Application-Platform Mapping in MPSoC, L. S. Indrusiak. Outline: Motivation; Application-Platform Mapping; Multiprocessor System-on-Chip Platforms; Application Modelling; System Validation; Conclusions and Future Work.

3 Motivation: Application-platform mapping isn't really a problem when dealing with sequential software and uniprocessor hardware. [Figure: one application mapped onto one platform.]

4 Motivation: The mapping problem has been studied for decades by researchers in parallel and distributed computing. [Figure: several applications mapped onto platforms 1 to 4.]

5 Motivation: CMOS limits can be felt in many ways: number of transistors, clock frequency, power dissipation. Expectations of performance increase must be met somehow, and multiprocessing seems to be the best shot. [Figure: plot of transistors (x 10^3), clock frequency (MHz) and power (W).]

6 Motivation: Application-platform mapping is a critical issue when it comes to multiprocessor platforms? Really? [Figure: application and platform, with processors marked IDLE.]

7 Motivation: [Figure: number of cores per chip for unicore, homogeneous multicore and heterogeneous multicore chips, annotated "C/C++/Java???". Chips shown: PicoChip, AMBRIC, CISCO CSR1, NVIDIA G80, Larrabee, Opteron 4P, AMD Fusion, BCM 1480, Core2Quad, Xbox 360, Xeon, 4004, Power4, PA8800, Power, Opteron, Pentium, P2, P3, P4, Core2Duo, CoreDuo, RAW, Athlon, Niagara, RAZA XLR, Core, Itanium, Itanium2, Cavium, Cell.] Sources: Gordon ASPLOS 06, Kudlur PLDI.

8 Motivation: Application-platform mapping is a critical issue when it comes to multiprocessor platforms: one must be able to explore concurrency at the application level.

9 Motivation: Application-platform mapping is a critical issue when it comes to multiprocessor platforms: it can be done dynamically to improve performance or to increase dependability.

10 Application-Platform Mapping: The simplest formulation of the mapping problem resembles a graph isomorphism problem. The application is a graph $G = G(A, C)$, where $a_i \in A$ is an application task and $c_{i,j} \in C$ represents the communication from $a_i$ to $a_j$. The platform is a graph $G = G(B, D)$, where $b_i \in B$ is a processor and $d_{i,j} \in D$ represents the channel from $b_i$ to $b_j$. The objective is to map tasks onto processors such that tasks that communicate with each other lie on adjacent processors. There is no known polynomial-time solution for the graph isomorphism problem.
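
The formulation above can be sketched as an exhaustive search. This is only an illustration under stated assumptions: the function name `find_adjacent_mapping`, the task and processor labels, the 2x2 mesh and the choice to treat channels as bidirectional are all hypothetical and do not come from the slides. The search tries every placement, so it is only usable for tiny graphs.

```python
from itertools import permutations

def find_adjacent_mapping(tasks, comms, procs, channels):
    """Return a task -> processor dict that places every pair of
    communicating tasks on adjacent processors, or None if none exists."""
    # Assumption: channels are bidirectional.
    links = set(channels) | {(j, i) for (i, j) in channels}
    # One task per processor; try every injective placement.
    for placement in permutations(procs, len(tasks)):
        mapping = dict(zip(tasks, placement))
        if all((mapping[a], mapping[b]) in links for (a, b) in comms):
            return mapping
    return None

# Hypothetical example: a three-task pipeline on a 2x2 mesh.
tasks = ["a0", "a1", "a2"]
comms = [("a0", "a1"), ("a1", "a2")]
procs = ["b0", "b1", "b2", "b3"]
channels = [("b0", "b1"), ("b0", "b2"), ("b1", "b3"), ("b2", "b3")]

mapping = find_adjacent_mapping(tasks, comms, procs, channels)

# Three mutually communicating tasks cannot be placed: the mesh has no triangle.
triangle = find_adjacent_mapping(
    ["x", "y", "z"], [("x", "y"), ("y", "z"), ("x", "z")], procs, channels
)
```

The number of placements grows factorially with the number of processors, which is why practical mappers rely on heuristics rather than enumeration.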

11 Application-Platform Mapping: More sophisticated problem formulations may add information to the application and platform graphs, so that different objectives can be met. Platform: processing power of $b_i$; power consumption and latency of $d_{i,j}$. Application: computational cost and deadline of $a_i$; volume, maximum latency and required bandwidth of $c_{i,j}$. Objectives: reduce bandwidth requirements on the platform, reduce power consumption, balance thermal dissipation, etc.
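
One way to read such an annotated formulation is as a cost function over candidate mappings. The sketch below assumes a 2D mesh platform, processors identified by (x, y) coordinates, and a cost equal to communication volume times hop count. None of these choices is stated in the slides, and the names `manhattan_hops` and `communication_cost` are hypothetical.

```python
def manhattan_hops(p, q):
    """Hop count between two mesh positions, assuming minimal routing."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def communication_cost(mapping, volumes):
    """Sum of volume * hops over all communications c(i, j)."""
    return sum(
        vol * manhattan_hops(mapping[src], mapping[dst])
        for (src, dst), vol in volumes.items()
    )

# Hypothetical communication volumes between tasks.
volumes = {("a0", "a1"): 100, ("a1", "a2"): 10}

# Two candidate mappings onto a 2x2 mesh.
near = {"a0": (0, 0), "a1": (0, 1), "a2": (1, 1)}
far = {"a0": (0, 0), "a1": (1, 1), "a2": (0, 1)}

cost_near = communication_cost(near, volumes)  # 100*1 + 10*1
cost_far = communication_cost(far, volumes)    # 100*2 + 10*1
```

Other objectives from the slide (power, thermal balance, deadlines) would need their own terms; this sketch covers only the bandwidth-related one.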

12 Application-Platform Mapping: Platform models over-simplify the complex design space of on-chip multiprocessor platforms. All approaches disregard the temporal dimension: application and platform models must include further details on time and concurrency.

13 Multiprocessor System-on-Chip Platforms: Many architectural alternatives: homogeneous vs. heterogeneous processing; shared memory vs. distributed memory; on-chip interconnect (point-to-point, crossbar, on-chip bus, network-on-chip). [Figure source: Intel.]

14 Multiprocessor System-on-Chip Platforms: Cell Processor. It has nine processors: one Power Processor Element (P) and eight Synergistic Processing Elements (S). P has separate instruction and data L1 caches (32 KB each). Each S can only access its 256 KB of local storage (LS) and uses its memory flow controller (MFC) to perform non-blocking DMA operations to/from LS. Bandwidth (at 3.2 GHz): S <-> LS = 2x 25.6 GB/s; MFC <-> EIB = 2x 25.6 GB/s; MIC <-> EIB = 2x 25.6 GB/s; L1 <-> L2 = 2x 51.2 GB/s.

15 Multiprocessor System-on-Chip Platforms: ARM Cortex-A9 MPCore. It can have 1 to 4 Cortex-A9 cores with separate instruction and data L1 caches (32 KB each). The data caches of all cores are fully coherent. The interface to external components can be made cache-coherent. The on-chip interconnect is based on the AMBA standard.

16 Multiprocessor System-on-Chip Platforms: As the number of cores increases, on-chip communication becomes a major issue. Source: ITRS Roadmap.

17 Multiprocessor System-on-Chip Platforms: Current solution: reusability (standard processors, reusable IP cores, reusable on-chip interconnects). Networks-on-Chip (NoC): multi-hop, packet-based interconnect; scalable, high bandwidth; low power (shorter wires); regular, reusable. [Figure: application, middleware and OS layers over a reusable interconnect; grid of routers (R).]

18 Multiprocessor System-on-Chip Platforms: Networks-on-Chip design decisions: topology, switching, packetisation, arbitration, flow control, buffering, routing. [Figure: grid of routers (R).]
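
As an example of one such design decision, the sketch below implements dimension-ordered (XY) routing on a 2D mesh. The slide lists routing only as a category, so the choice of XY routing, the mesh topology and the name `xy_route` are assumptions made for illustration.

```python
def xy_route(src, dst):
    """Return the list of router coordinates visited from src to dst,
    travelling first along X and then along Y (deterministic XY routing)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # resolve the X offset first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then resolve the Y offset
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (2, 1))
hops = len(route) - 1
```

Because every packet turns at most once, from X to Y, this scheme avoids routing deadlock on a mesh, at the price of not adapting to congestion.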

19 Multiprocessor System-on-Chip Platforms: Networks-on-Chip for 3D architectures: all previous alternatives and more; may integrate dies produced with different technologies; asymmetry between horizontal and vertical channels; increased locality; thermal issues become critical; load balancing.

20 Multiprocessor System-on-Chip Platforms: Design decisions must be evaluated using abstract models of the platform: cycle-accurate, cycle-approximate, untimed/functional, structural/graph. Modeling trade-offs: accuracy, observability, speed.

21 Multiprocessor System-on-Chip Platforms: Cycle-accurate and cycle-approximate models: flit-level accuracy; observability of buffer occupation, latency, throughput, ...; graphical interface to experiment with topology, routing, buffering, ...

22 Multiprocessor System-on-Chip Platforms: Coding techniques to reduce bit transition activity over NoC interconnects: experimentation with alternative coding techniques aiming to reduce the power consumption of NoC interconnects; reductions of up to 60% in bit transitions were achieved.
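
The slide does not name the coding techniques that were evaluated. As a generic illustration of the idea, the sketch below uses bus-invert coding, a well-known scheme swapped in here for the purpose of the example: when more than half of the wires would toggle, the inverted word is sent together with an extra invert line. The function names and the word sequence are hypothetical, and the resulting counts say nothing about the 60% figure above.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def raw_transitions(words):
    """Bit transitions on the link when words are sent unencoded."""
    prev, total = 0, 0
    for w in words:
        total += hamming(prev, w)
        prev = w
    return total

def bus_invert_transitions(words, width):
    """Bit transitions with bus-invert coding, counting the invert line."""
    mask = (1 << width) - 1
    prev_data, prev_inv, total = 0, 0, 0
    for w in words:
        # Invert when more than half of the data wires would toggle.
        inv = 1 if hamming(prev_data, w) > width // 2 else 0
        sent = (w ^ mask) if inv else w
        total += hamming(prev_data, sent) + (inv ^ prev_inv)
        prev_data, prev_inv = sent, inv
    return total

words = [0x00, 0xFF, 0x00, 0xFF]   # worst case for an 8-bit link
raw = raw_transitions(words)
encoded = bus_invert_transitions(words, width=8)
```

The alternating pattern is a best case for this scheme; on typical traffic the saving is much smaller.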

23 Multiprocessor System-on-Chip Platforms: Simplified models. Packet-level accuracy, 1-position buffers: the complete packet is abstracted by its header and trailer; the header's way through the network is fully simulated and trailer latencies are calculated; less than 5% error for average latency (compared with a cycle-accurate HDL model). Fully analytical: latencies of packet delivery are calculated upon packet creation and refined until packet delivery, as a function of network occupation, route, buffer sizes, switching and arbitration techniques. Real-time analysis: static analysis of worst-case interference between network flows.
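
A minimal sketch of how a trailer latency might be calculated from a simulated header, assuming wormhole switching in which the body flits follow the header back to back without stalling. The slides do not give the actual formula, so the function `trailer_arrival`, its parameters and the example numbers are hypothetical.

```python
def trailer_arrival(header_arrival_cycle, flits, cycles_per_flit=1):
    """Estimate the cycle at which the last flit (trailer) arrives.

    header_arrival_cycle: cycle at which the simulated header reaches
        the destination.
    flits: total number of flits in the packet, header included.
    Assumption: the remaining flits are pipelined behind the header,
    one every cycles_per_flit cycles.
    """
    if flits < 1:
        raise ValueError("a packet has at least one flit")
    return header_arrival_cycle + (flits - 1) * cycles_per_flit

# Hypothetical packet: header simulated to arrive at cycle 12, 8 flits long.
arrival = trailer_arrival(12, flits=8)
```

Only the header needs to be simulated hop by hop under this assumption, which is where the speed-up over flit-level models would come from.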

31 Application Modeling: Compilers can only go so far in extracting parallelism from sequential code. Developers must have the means to specify the inherent concurrency of each particular application, avoiding pitfalls such as deadlocks or undesired non-determinism. [Figure: application and platform, with processors marked IDLE.]

32 Application Modeling: Many concurrent programming models are available: threads, concurrent processes with message passing, actors, streams/dataflow, CSP, timed automata, <add your favorite here>.

33 Application Modeling: The choice of which concurrent programming model should be adopted depends on the application domain, the familiarity of the designer/developer, and the availability of stable flows and tools. Models and their tools: threads (libraries in Java and C++); message passing (OpenMP); actors (Simulink, Ptolemy); streams/dataflow (StreamIt, CUDA); timed automata (UPPAAL).

34 Application Modeling: Application models are abstract representations of a program, which can be used both at design time and at runtime, and can be strongly influenced by the programming model. [Figure: periodic tasks A, B, C and D, with periods of 1 ms, 2 ms and 4 ms, and their job instances A1 to A4, B1 to B2, C1 and D1 to D4.]
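
The job instances in the figure look like periodic tasks unrolled over one hyperperiod. The sketch below reproduces that unrolling. The assignment of periods to tasks (A and D at 1 ms, B at 2 ms, C at 4 ms) is an assumption inferred from the number of instances shown, and the function name `unroll_jobs` is hypothetical.

```python
from functools import reduce
from math import gcd

def unroll_jobs(periods_ms):
    """Unroll periodic tasks into (job name, release time) pairs over
    one hyperperiod, i.e. the least common multiple of all periods."""
    hyper = reduce(lambda a, b: a * b // gcd(a, b), periods_ms.values())
    return {
        name: [(f"{name}{k + 1}", k * period) for k in range(hyper // period)]
        for name, period in periods_ms.items()
    }

# Assumed periods in milliseconds, inferred from the instance counts.
periods_ms = {"A": 1, "B": 2, "C": 4, "D": 1}
jobs = unroll_jobs(periods_ms)
job_names = {name: [job for job, _ in lst] for name, lst in jobs.items()}
```

A mapper working at runtime could use such a job list to decide where each instance executes, whereas a design-time mapper would typically work on the task graph itself.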

35 Application Modeling: Executable models: fast execution; rich set of modeling constructs, different abstraction levels; clearly defined execution semantics, so that the dynamic behavior of the application can be properly validated. Formal properties: models of concurrent computation; concurrency constraints through the definition of ordering relations; polymorphic type systems.

36 Concurrent Models of Computation: Actor orientation, as proposed by E. A. Lee (UC Berkeley), revisited fundamental concepts from Atkinson & Hewitt (MIT, 1977) and Gul Agha (MIT, 1986). Actor orientation basics: the execution semantics of a given system model is factored out from the individual components of that model; the execution semantics is associated with a well-defined Model of Computation (MoC); heterogeneous models can be created through the hierarchical composition of MoCs.
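
The idea of factoring execution semantics out of the components can be sketched in a few lines. This is not the Ptolemy II API: the classes `Actor`, `PipelineDirector` and `BroadcastDirector` are hypothetical and the two "semantics" are deliberately trivial. The point is only that the same actors behave differently depending on the director that executes them.

```python
class Actor:
    """A component that only knows how to react to one input token."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def fire(self, token):
        return self.fn(token)

class PipelineDirector:
    """Semantics 1: actors fire in sequence, each consuming the
    previous actor's output."""
    def run(self, actors, token):
        for actor in actors:
            token = actor.fire(token)
        return token

class BroadcastDirector:
    """Semantics 2: every actor fires on the same input token."""
    def run(self, actors, token):
        return [actor.fire(token) for actor in actors]

# The same two actors executed under two different directors.
actors = [Actor("double", lambda x: x * 2), Actor("inc", lambda x: x + 1)]
piped = PipelineDirector().run(actors, 3)
broadcast = BroadcastDirector().run(actors, 3)
```

Hierarchical composition would correspond to wrapping a director and its actors inside a composite actor, which this sketch leaves out.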

37 Actor-orientation: [Figure: hierarchical model annotated with the execution semantics of the top-level model and the execution semantics of the composite model.]

38 Application Modeling: Extending actor-orientation with types and explicit ordering. UML offers a suitable visual representation for the definition of polymorphic type systems (class diagrams) and ordering relations (sequence diagrams), but UML is not an executable specification language?

39 UML sequence diagrams within actors: Recall the formal definition of an MSC as a tuple $\langle P, E, C, l, m, < \rangle$, where $P$ is a finite set of processes; $E$ is a finite set of events; $C$ is a finite set of names for messages; $l: E \to T = \{\, p!q(a),\ p?q(a),\ p(a) \mid p \neq q \in P,\ a \in C \,\}$; $m: S \to R$ (matching each send event to its receive event); and $< \subseteq E \times E$ is an acyclic relation between events consisting of a total order on $E_p$ for every $p \in P$, and $s < r$ whenever $m(s) = r$.

40 UML sequence diagrams within actors: Recall the formal definition of MSC: the order of events (message occurrences) within a process (lifeline) is a total order; the reflexive-transitive closure of $<$ (denoted $<^*$) is a partial order on the complete set $E$. This is enough for the definition of an untimed model of computation. Different possibilities were explored and integrated as a library of directors in an extended version of Ptolemy II.
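
The ordering relation can be checked mechanically. The sketch below builds the relation from per-lifeline event orders plus send/receive pairs, and uses a topological sort to verify that it is acyclic and to produce one execution order consistent with it. The function name `msc_order`, the event labels and the data layout are hypothetical, and this is not how the Ptolemy II directors mentioned above are implemented.

```python
from collections import defaultdict, deque

def msc_order(lifelines, messages):
    """lifelines: {process: [events in lifeline order]}
    messages: [(send_event, receive_event)]
    Returns one linearisation of the events that respects the relation,
    or raises ValueError if the relation is cyclic."""
    events = [e for evs in lifelines.values() for e in evs]
    succ = defaultdict(set)
    for evs in lifelines.values():           # total order per lifeline
        for earlier, later in zip(evs, evs[1:]):
            succ[earlier].add(later)
    for send, receive in messages:           # s < r whenever m(s) = r
        succ[send].add(receive)

    indegree = {e: 0 for e in events}
    for e in events:
        for nxt in succ[e]:
            indegree[nxt] += 1

    ready = deque(e for e in events if indegree[e] == 0)
    order = []
    while ready:                             # Kahn's algorithm
        e = ready.popleft()
        order.append(e)
        for nxt in sorted(succ[e]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(events):
        raise ValueError("ordering relation is cyclic: not a valid MSC")
    return order

# Hypothetical MSC: p sends to q, then q replies to p.
lifelines = {"p": ["s1", "r2"], "q": ["r1", "s2"]}
messages = [("s1", "r1"), ("s2", "r2")]
order = msc_order(lifelines, messages)
```

Swapping the two events on lifeline p would make p wait for a reply to a message it has not yet sent, and the function would reject that diagram as cyclic.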

41 Case study: Application modeling based on behavioral patterns and polymorphic type systems: application functionality is described using actors; constraints on concurrent execution are described using UML.

42 System Validation: Joint validation of application and platform models. The application is back-annotated with communication costs and power consumption data from the execution of platform models. [Figure label: MAPR.]

43 System Validation: The mapping heuristic becomes a design decision: experiments show that in large multiprocessor platforms, the average communication latency of a given application can vary by two orders of magnitude according to the employed mapping heuristic. [Figure label: MAPR.]

44 Case study: Application and platform modeling on Ptolemy II with multiple MoCs: supports the evaluation of how different platforms execute a given application; supports the evaluation of different mapping heuristics. Synthetic example: autonomous vehicle.

45 Conclusions and Future Work: Extensions to the state of the art in system-level specification and validation: combination of formalism, simulation/execution and analytical solution. Extensions to application and platform models allow for richer mapping mechanisms: temporal constraints based on concurrent MoCs and explicit ordering; spatial constraints based on functional types.

46 Conclusions and Future Work: Further progress must be made in mapping to fully explore the information in application and platform models: time and concurrency; functional constraints. Different implementation styles must be explored, aiming to achieve more reliable platforms: static vs. dynamic mapping; centralised vs. distributed mapping.
