Reliable Dynamic Embedded Data Processing Systems

Size: px

Start display at page:

Download "Reliable Dynamic Embedded Data Processing Systems"

Tyler Wiggins
5 years ago
Views:

1 2 Embedded Data Processing Systems Reliable Dynamic Embedded Data Processing Systems sony Twan Basten thales Joint work with Marc Geilen, AmirHossein Ghamarian, Hamid Shojaei, Sander Stuijk, Bart Theelen, Yang Yang and others Funding: NWO PROMES EC FP6 Betsy, FP7 MNEMEE SenterNovem Octopus 3 sony Embedded Data Processing Systems asml océ Knowing is not understanding. Charles Kettering apple 4 Embedded Data Processing Systems philips 2

2 5 Trend: Dynamic Behavior 6 Multiprocessor systems / networking / concurrency Application convergence / application evolution Intra-application dynamism Dataflow analysis What is the optimal configuration? - given applications starting/stopping over time - given data-dependent execution times - given varying communication bandwidth - given battery status - Intra-application dynamism Inter-application dynamism 7 Dataflow Analysis: Challenges 8 Synchronous Data Flow Graphs (SDFGs [Lee 986]) predictability and efficiency are contradictory rate token actor channel execution time A, B,2 C,2

3 9 Scenario-aware MPEG-4 SP Decoder Model 0 (FSM-based Scenario-Aware DataFlow (SADF)) Control channel Fixed rate Kernel Data channel VLD d d d IDCT Parameterized rate a c c b e FD MC RC Detector... Tokens 3 x={30,40,50,60,70,80,99} Rate I P x P 0 VLD IDCT MC Execution time P 0 0 others 40 P 0 0 others 7 I, P 0 0 P P P P P P P I 350 Throughput analysis for SDF I P 99 P 99 a 0 0 b 0 x 0 RC P 0 0 P 30, P 40, P c 99 x P P 0 MEMOCODE 2006 d 0 e 99 x 0 P 70, P 80, P FD All 0 ACSD 2006 Synchronous Data Flow Graphs 2 SDF Throughput Analysis (SDFGs [Lee 986]) rate token actor channel execution time A, B,2 C,2 Self-timed execution: A Transient Phase A, C B A, C A Periodic Phase B B A B Throughput: average number of actor firings over time Iteration: smallest non-empty set of actor firings that does not change the token distribution (example: A:3, B:2, C:) Throughput can be calculated from the periodic phase Efficient implementation Consider one designated firing per iteration only to detect a recurrent state

4 3 Throughput Analysis: Our Result 4 Traditional methods State Space Dasdan Gupta Howard Young Tarjan Orlin MP3 dec Modem Storage-throughput trade-offs Sample Rate >800 >800 >800 Satellite >800 >800 >800 H.263 decoder >800 >800 >800 Runtimes of various throughput analysis methods (in seconds) IEEE T Comp Lessons to be Learned Repeated application of throughput analysis Causal dependencies potentially reducing performance 6 Trade-off: Storage vs. Throughput 2 α 3 β 2 A, B,2 C,2 Tokens on channels must be stored in memory. Separate memory for each channel. Storage distribution, for example, α,β, 4,2

5 7 Trade-off: Storage vs. Throughput put through 2 α 3 β 2 A, B,2 C, ,2 5,3 6,3 6,2 7, distribution size Find all minimal storage distributions for any possible throughput 8 Dependency Graph firings in one period sp - space tk - token B 0 2 α 3 β 2 A, B,2 C,2 sp(β) Storage distribution α,β 4,2 sp(α) tk(α) tk(β) C 0 A 2 tk(α) B sp(α) A 3 A tk(a) 4 dependency: an actor firing start may depend on an actor firing end cycles denote storage dependencies 9 Dependency Graph 20 Abstract Dependency Graph resolving storage dependencies may increase throughput firings in one period sp(β) tk(β) C 0 dependency: an actor firing start may depend on an actor firing end dependencies between actors A B C tk(α) tk(β) tk(a) sp(α) sp(β) tk(β) sp(β) C 0 abstract dependency graph contains all storage dependencies easily constructed from state-space exploration B 0 sp(α) A 2 tk(α) B B 0 sp(α) A 2 tk(α) B sp - space tk - token tk(α) sp(α) A 3 A tk(a) 4 cycles denote storage dependencies dependency graph may be very large tk(α) sp(α) A 3 A tk(a) 4

6 2 Experimental Results 22 Trade-off Analysis Example Satellite MP3 H.263 #actors #channels min throughput > 0 /7 0.8 /37 /2876 Start small, recursively resolve bottlenecks Fast in general, approximation possible Generalizable to other resources, computational models Distr. size Max throughput Approximation / /32 /0000 Distr. size increase 0buffers with 544 n times the minimal step size #Pareto points #Distributions Exec. time ms 7ms 2ms 53min Scenario-aware Throughput Analysis SDF throughput analysis considers worst-case actor execution times Scenarios Scenario-aware throughput analysis considers worst-case execution times per scenario, but it must consider scenario transitions Analyzing all scenario transitions separately can be avoided Separate scenario transitions with an invariant reference schedule ACM TECS 200

MP3 Result 26 Pointers SDF model: 9.28 Mcycles per frame SADF model: 5.83 Mcycles per frame Yang Yang: shared resources, trade-off analysis Sander Stuijk: SDF3 toolset, http://www.es.ele.tue.

7 MP3 Result 26 Pointers SDF model: 9.28 Mcycles per frame SADF model: 5.83 Mcycles per frame Yang Yang: shared resources, trade-off analysis Sander Stuijk: SDF3 toolset, Future: shared resources + scenarios + trade-off analysis L-L L-S S-L S-S M Run-time Application Management - Applications starting/stopping over time Inter-application dynamism - 6 procs, slow (ck) and fast clocks (ck2) - Minimize energy under time constraints Energy (3,ck2) (2,ck2) (,ck2) (9,ck2) (8,ck2) (7,ck2) Run-time adaptation (3,ck2) (2,ck2) Const (,ck2) (5,ck) (4,ck) (3,ck) (6,ck2) (5,ck) (4,ck) (3,ck) (6,ck2) (5,ck2) (4,ck2) JCM Completion time (0,ck) (9,ck) (3,ck2) (2,ck2) (,ck2) (8,ck) Gantt chart (5,ck) DAC 2009 (4,ck) (3,ck) Join Const Min enforce the same clock

8 29 Pareto Algebra 30 Pareto-Algebraic Characterization of run-time adaptation Goal: Compositional computation of trade-offs product - derive constrain abstract - minimize component trade-o off spaces adapt select and configure system trade-off space (at the time of a change) 3 Pareto-Algebraic Characterization of run-time adaptation 32 Complexity n components with p Pareto points each components product - derive constrain abstract - minimize select and configure X derive constrain abstract min component trade-o off spaces take a product and on-the-fly derive constrain abstract - minimize adapt O(p 2n ) system trade-off space (at the time of a change)

9 33 Complexity n components with p Pareto points each take a product and on-the-fly derive constrain abstract - minimize 34 Complexity Control keep n and p small Compositional reasoning, among others Approximate Pareto sets O(p 2n ) y/x = c 2 components X constrain derive-abstract min y = y/x = c 4 x = 35 Multi-dimensional Multiple-choice Knapsack Problem 36 A Parameterized Compositional Heuristic MMKP NP hard - One optimization objective (value) - Multiple resource dimensions with capacity constraints - Multiple independent applications - Multiple independent configurations per application Pick one configuration per application optimizing value within constraints Project all resource dimensions into one dimension Approximate Pareto set in 2-dimensional space y/x = c 2 y/x = c 4 y = x = product-constrain-derive-abstract-minimize

10 37 A Parameterized Compositional Heuristic Project all resource dimensions into one dimension Approximate Pareto set in 2-dimensional space parameter p: maintain (at most) p Pareto points Allows to budget analysis time analysis time c n p 2 (n number of applications; c platform constant) product-constrain-derive-abstract-minimize 38 par p time Controlling Time Budgets budgets T=ms t=5ms T=20ms t=00ms values I I2 I3 I4 I5 I6 MMKP benchmark: Always within budget Good values Similar results to the IMEC, SOC 2006, non-compositional state-of-the-art heuristic SimIt-ARM simulator 206 Mhz StrongARM 39 Compositionality 40 Conclusions time(ms) time(ms) Applications start at various times 5 at a time (a), at a time (b) IMEC values 2353 Pareto 3224 what about applications stopping? # of active applications values always at least as good as IMEC heuristic # of active applications Goal: reliable, resource efficient data processing systems Dynamic behavior complicates analysis Intra-application dynamism: scenario-aware dataflow analysis and design-space exploration Inter-application dynamism: Pareto-algebra-based run-time adaptation

11 4 Thank you! Questions? More info: SDF3 toolkit: Pareto calculator: An understanding of the natural world and what's in it is a source of not only a great curiosity but great fulfillment. David Attenborough

Reliable Embedded Multimedia Systems?

2 Overview Reliable Embedded Multimedia Systems? Twan Basten Joint work with Marc Geilen, AmirHossein Ghamarian, Hamid Shojaei, Sander Stuijk, Bart Theelen, and others Embedded Multi-media Analysis of