Dirk Tetzlaff Technical University of Berlin


1 Software Engineering for Embedded Systems
Intelligent Task Mapping for MPSoCs using Machine Learning
Dirk Tetzlaff, Technical University of Berlin
3rd Workshop on Mapping of Applications to MPSoCs, June 30th

2 Task Mapping for MPSoCs
- Optimal solving is NP-complete
  - Genetic/evolutionary algorithms: many iterations [Y09], [YH09]
  - Common heuristics: do not fit special MPSoCs
  - ILP models: computationally complex [YH08], [VM03]
- Requires information about runtime behavior
  - Static analyses: over-approximate
  - Profiling: strongly input-data dependent and expensive
- Use Machine Learning (ML)

3 Outline
- ML-based Compilation
- Intelligent Task Mapping
- Learning
- Task Graph Mapping
- Experiments
- Results
- Conclusions

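4 ML-based Compilation
[Figure: two-phase workflow. Training phase: code features are extracted from training programs, the programs are executed and profiled to obtain their behavior, and ML learns a function from features to behavior. Compile phase: code features are extracted from the program being compiled and the learned function yields behavior predictions.]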


6 Intelligent Task Mapping
- Use Machine Learning (ML)
  - Provides the compiler with knowledge of runtime behavior
  - Fast and precise heuristics
- 1) Learn unknown loop bounds -> reduce communication overhead
- 2) Learn execution times of tasks -> reduce power consumption
- 3) Learn best performing Processing Element (PE) -> treat heterogeneous MPSoCs

7 Code Features
- Unknown loop bounds
  - Structure of loop bounds, number of loop exit branches, size of referenced arrays, file I/O
- Execution times of tasks
  - Latency of the most probable path, fraction of control instructions, loop nesting, amount of interprocessor communication
- Best performing PE depends on architectural differences
  - Caches -> e.g. sizes of loop bodies
  - Functional units -> e.g. fraction of corresponding operations

8 Task Graph Mapping
- Execution time t_i for task T_i
- Interprocessor communication
  - Amount a(t_i, T_j)
  - Cost c(t_i, T_j)
- Runtime r(t_i, T_j)
  - sequential: t_i + t_j + a(t_i, T_j) * c(t_i, T_j)
  - parallel on different PEs: max(t_i, t_j) + a(t_i, T_j) * c(t_i, T_j)
  - parallel on same PE: t_i + t_j
- Latency-weighted list scheduling
- Map tasks to PEs with minimum penalty
[Figure: example task graph with tasks T_0 to T_4, execution times t_0 to t_4, and communication amounts such as a(t_3, T_4)]
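The runtime cases and the mapping step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scheduler used in the talk: the function names, the uniform `cost_per_unit`, the use of the longest remaining path as the latency weight, and "earliest finish time" as the penalty are all hypothetical choices. The sketch also assumes an acyclic task graph with strictly positive execution times.

```python
from functools import lru_cache


def pair_runtime(t_i, t_j, amount, cost, mode):
    """Runtime of two communicating tasks for the three cases on the slide."""
    if mode == "sequential":
        return t_i + t_j + amount * cost
    if mode == "parallel_different_pes":
        return max(t_i, t_j) + amount * cost
    if mode == "parallel_same_pe":
        return t_i + t_j
    raise ValueError(f"unknown mode: {mode}")


def list_schedule(exec_time, comm_amount, num_pes, cost_per_unit=1.0):
    """Greedy list scheduling (illustrative stand-in, see assumptions above).

    exec_time:   dict task -> predicted execution time (must be > 0)
    comm_amount: dict (src, dst) -> amount of data sent from src to dst
    Returns (placement, finish): task -> PE index, task -> finish time.
    """
    succs = {t: [] for t in exec_time}
    preds = {t: [] for t in exec_time}
    for src, dst in comm_amount:
        succs[src].append(dst)
        preds[dst].append(src)

    @lru_cache(maxsize=None)
    def latency_weight(task):
        # Longest remaining path from this task to a sink, including itself.
        tail = max((latency_weight(s) for s in succs[task]), default=0.0)
        return exec_time[task] + tail

    # With positive execution times, a predecessor always has a larger
    # weight than its successors, so this order respects dependencies.
    order = sorted(exec_time, key=latency_weight, reverse=True)

    pe_free = [0.0] * num_pes
    placement, finish = {}, {}
    for task in order:
        best = None
        for pe in range(num_pes):
            ready = 0.0
            for p in preds[task]:
                # Communication is only paid when the tasks sit on different PEs.
                comm = 0.0 if placement[p] == pe else comm_amount[(p, task)] * cost_per_unit
                ready = max(ready, finish[p] + comm)
            end = max(ready, pe_free[pe]) + exec_time[task]
            if best is None or end < best[0]:
                best = (end, pe)
        finish[task], placement[task] = best
        pe_free[best[1]] = best[0]
    return placement, finish
```

With two tasks that exchange a lot of data, for example `list_schedule({"A": 2, "B": 3}, {("A", "B"): 10}, 2)`, the sketch keeps both on the same PE, while independent tasks are spread across PEs.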

9 Intelligent Task Mapping
[Figure: behavior predictions guide the mapping of a program's tasks T_0 to T_4 onto the PEs PE_0 to PE_3 of an MPSoC over time]
- Benefits
  - Communication aware
  - Power efficient


10 Implementation
- Compiler framework: CoSy
  - Feature extraction
  - Static branch prediction
  - Path profiling
- Machine Learning: R Project
  - Predictor construction: supervised classification learning
  - Program classification [AG09]: hierarchical clustering to minimize inner-cluster error

11 Experiments
- Learning of unknown loop bounds
- 66 programs from the Ptrdist [A95], MiBench [GR01], and SPEC CPU{95,2000,2006} benchmark suites
- 7970 loops analysed
- 1-98 million iteration counts
- 115 loop features
- Loop iterations classified using truncated log: … -> class …, … -> class …, … million -> class 8
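As a rough illustration of classifying iteration counts by a truncated logarithm, the following sketch maps a count to the integer part of its logarithm. The base (10 here) and the absence of any class offset are assumptions; the slide's own base and class numbering are not recoverable from the text, so the class numbers produced here need not match the ones used in the experiments. The function name is hypothetical.

```python
def truncated_log_class(iterations, base=10):
    """Return floor(log_base(iterations)) using integer arithmetic only.

    Assumption: base 10 and no offset; the original slide may differ.
    """
    if iterations < 1:
        raise ValueError("iteration count must be at least 1")
    if base < 2:
        raise ValueError("base must be at least 2")
    cls = 0
    # Each integer division by the base removes one order of magnitude.
    while iterations >= base:
        iterations //= base
        cls += 1
    return cls
```

Counting orders of magnitude with integer division avoids floating-point rounding at exact powers of the base. Under the base-10 assumption, loops with 1 to 9 iterations fall into one class, 10 to 99 into the next, and so on, which turns a range spanning many orders of magnitude into a handful of labels for a classifier.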


12 Experimental Results
- Self evaluation (self)
- Validation without program classification (val)
- Validation with program classification (pc val)
[Figure: mean absolute error and correlation, each shown for self, val, and pc val]
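The two reported metrics can be computed from predicted and actual loop classes as sketched below. The slide does not say which correlation coefficient was used; Pearson correlation is assumed here, and the function names are hypothetical.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average distance, in classes, between prediction and ground truth."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def pearson_correlation(predicted, actual):
    """Pearson correlation coefficient (an assumed choice of correlation)."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    # Undefined if either sequence is constant (zero variance).
    return cov / math.sqrt(var_p * var_a)
```

A mean absolute error below 1 means that predictions are, on average, less than one class away from the correct class.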


14 Conclusions
- Compilation with knowledge of runtime behavior via ML
  - Unknown loop bounds
  - Execution times of tasks
  - Best performing PE
- Intelligent task mapping
  - Communication aware
  - Power efficient
- Experimental results
  - Precise prediction of runtime behavior (error < 1 class)

15 References
[YH08] H. Yang and S. Ha, ILP based data parallel multi-task mapping/scheduling technique for MPSoC, ISOCC 08
[VM03] G. Varatkar and R. Marculescu, Communication-aware task scheduling and voltage selection for total systems energy minimization, ICCAD 03
[Y09] M. Yoo, Real-time task scheduling by multiobjective genetic algorithm, Journal of Systems and Software, 2009
[YH09] H. Yang and S. Ha, Pipelined data parallel task mapping/scheduling technique for MPSoC, DATE 09
[AG09] L. Alvincz, S. Glesner, Breaking the curse of static analysis: making compilers intelligent via Machine Learning, Proc. of SMART'09, 2009
[A95] T. Austin, et al., The pointer-intensive benchmark suite, 1995, dist.html
[GR01] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, MiBench: A free, commercially representative embedded benchmark suite, Workshop on Workload Characterization
[RRG07] C. Roig, A. Ripoll, and F. Guirado, A new task graph model for mapping message passing applications, IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 12, 2007

16 Appendix

17 Estimating the Program Behavior
[Figure: spectrum from over-approximation to over-specialization, relating possible, considered, and realistic program behavior]
- Program analysis: considers all cases; safe, but imprecise
- Machine-learned heuristics: consider realistic cases; precise, but unsafe
- Profiling: considers only one case; (too) precise, but unsafe

18 Program Classification [AG09]
- One predictor for all kinds of programs?
- Better: group similar programs, one predictor per group
- Program classification (ML: unsupervised clustering)
  - Input: set of programs (from the suite), distance measure/distance matrix
  - Output: program classes
- Which programs are (dis)similar?
  - Similar programs should be able to explain each other's behavior
  - Define similarity based on mutual predictability
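To make the input/output contract of program classification concrete, here is a minimal agglomerative clustering sketch that turns a distance matrix into program classes. It is not the clustering procedure of [AG09]: average linkage, a fixed target number of classes, and a symmetric distance matrix are assumptions made for this illustration, and the function name is hypothetical.

```python
def cluster_programs(dist, num_classes):
    """Group program indices by repeatedly merging the two closest clusters.

    dist:        square matrix, dist[i][j] = distance between programs i and j
                 (assumed symmetric)
    num_classes: number of program classes to stop at (assumed to be given)
    """
    clusters = [[i] for i in range(len(dist))]

    def average_linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_classes:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda xy: average_linkage(clusters[xy[0]], clusters[xy[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return [sorted(c) for c in clusters]
```

For a suite where programs 0 and 1 predict each other well and programs 2 and 3 do likewise, asking for two classes yields `[[0, 1], [2, 3]]`, and one predictor can then be trained per class.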

19 Program Classification: Clustering [AG09]
- Mutual predictability:
  - train one predictor for each program p_i of the suite
  - apply each predictor to every program p_j, compare predicted and correct classes (mean deviation error)
- Result: distance matrix
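The construction of the distance matrix from mutual predictability can be sketched as follows. The predictor here is a deliberately simple 1-nearest-neighbor stand-in, not the supervised classifier built in R for the talk; the data layout (each program as a list of `(features, class)` pairs) and all function names are hypothetical.

```python
def train_nearest_neighbor(samples):
    """Stand-in predictor: returns the class of the closest training sample."""
    def predict(features):
        nearest = min(
            samples,
            key=lambda s: sum((x - y) ** 2 for x, y in zip(s[0], features)),
        )
        return nearest[1]
    return predict


def mean_deviation(predictor, samples):
    """Mean absolute difference between predicted and correct classes."""
    return sum(abs(predictor(f) - c) for f, c in samples) / len(samples)


def distance_matrix(programs, train=train_nearest_neighbor):
    """Entry [i][j] is the error of program i's predictor on program j."""
    predictors = [train(p) for p in programs]
    n = len(programs)
    return [
        [mean_deviation(predictors[i], programs[j]) for j in range(n)]
        for i in range(n)
    ]
```

The resulting matrix is not necessarily symmetric, since program i may predict program j better than the reverse. A clustering step that expects symmetric distances could, for instance, average entries [i][j] and [j][i] first; that choice is also an assumption.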

20 Combination of Predictors [AG09]
- n programs, n training sets, n predictors
- How to obtain one predictor?
  - merge the n training sets D_i into one, train a predictor
  - build a composite predictor: consult all predictors and vote
    - take majority vote (if not unique, take min/max)
    - take mean vote
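Both ways of obtaining a single predictor can be sketched briefly. The function names are hypothetical, predictors are modeled as plain callables from features to a class number, and returning the mean vote as an unrounded float is an assumption, since the slide does not say how a fractional mean maps back to a class.

```python
from collections import Counter


def merge_training_sets(training_sets):
    """Option 1: concatenate all D_i into one training set."""
    return [sample for d in training_sets for sample in d]


def majority_vote(votes, tie_break=min):
    """Most frequent class; ties are resolved with min or max."""
    counts = Counter(votes)
    top = max(counts.values())
    winners = [cls for cls, n in counts.items() if n == top]
    return tie_break(winners)


def mean_vote(votes):
    """Arithmetic mean of the predicted classes (left unrounded)."""
    return sum(votes) / len(votes)


def composite_predict(predictors, features, combine=majority_vote):
    """Option 2: consult every predictor and combine their votes."""
    return combine([predict(features) for predict in predictors])
```

Passing `tie_break=max` picks the larger class on a tie; which side is safer depends on whether over- or underestimating the predicted quantity is more costly for the mapping.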
