Declarative Machine Learning for Energy Efficient Compiler Optimisations

1 Year PhD Review
Craig Blackmore
Supervised by Dr. Oliver Ray and Dr. Kerstin Eder
28th October 2015

Introduction
Motivation: reduce execution time and energy consumption of imperative programs
Current focus: execution time of C programs for embedded architectures
Methodology: use machine learning to tune compiler parameters for a given program
Research hypothesis: a relational representation of source code allows Inductive Logic Programming (ILP) to learn more accurate models

Compiling C Programs with GCC
$ gcc program.c
$ time ./a.out
1.70s
$ gcc program.c -O3
$ time ./a.out
0.44s
$ gcc program.c -O3 -fno-guess-branch-probability
$ time ./a.out
0.27s
Configuration = set of compiler flags
How do we know which optimisations to use?

Random Iterative Compilation
14% average possible improvement vs O3 (up to 58% improvement)
[Figure: random iterative compilation results on ARM Cortex-M3, with the region better than O3 marked]
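
A minimal sketch of random iterative compilation, assuming each flag is independently switched on or off and that a hypothetical `measure(config)` callable compiles the program with that configuration and returns its execution time. The flag names and the toy cost function are illustrative only and are not taken from the experiments in these slides.

```python
import random

def random_config(flags, rng):
    """Pick a random configuration: each flag is independently on or off."""
    return frozenset(f for f in flags if rng.random() < 0.5)

def random_iterative_compilation(flags, measure, budget, seed=0):
    """Evaluate `budget` random configurations and keep the fastest one."""
    rng = random.Random(seed)
    best_config, best_time = None, float("inf")
    for _ in range(budget):
        config = random_config(flags, rng)
        elapsed = measure(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time

# Toy stand-in for "compile and time": hypothetical per-flag effects in seconds.
EFFECT = {"-fflag-a": -0.3, "-fflag-b": 0.2, "-fflag-c": -0.1}

def toy_measure(config):
    return 1.0 + sum(EFFECT[f] for f in config)

best, t = random_iterative_compilation(list(EFFECT), toy_measure, budget=50)
print(sorted(best), t)
```

In a real setting `measure` would invoke the compiler and time the resulting binary, which is what makes the search budget expensive.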

Challenges
Large search space
  Over 100 optimisation flags
  Over configurations
  Over years to exhaustive search
Complex interactions between flags
Good solutions are program/platform dependent
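
A worked arithmetic sketch of why exhaustive search is infeasible. It assumes exactly 100 independent on/off flags and a hypothetical cost of one second to compile and run each configuration. These are illustrative assumptions, not the figures from the slide.

```python
# Assumption: 100 flags, each either enabled or disabled.
n_flags = 100
configs = 2 ** n_flags  # roughly 1.27e30 distinct configurations

# Assumption: one second to compile and time a single configuration.
seconds_per_config = 1
seconds_per_year = 365 * 24 * 60 * 60

years = configs * seconds_per_config // seconds_per_year

print(f"{configs:.3e} configurations")
print(f"{years:.3e} years to try them all")
```

Even under this optimistic cost per configuration, the total runs to more than 10^22 years, which is why sampling or prediction is needed.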

Energy-time Correlation
Single-threaded benchmarks
Simple pipeline

Compiler Tuning Methods
[Figure: tuning methods plotted by best execution time against search time]
Standard optimisation levels: O1, O2, O3
Predictive methods: Milepost, ILP, Hybrid
Iterative compilation: Random Search
Exhaustive Search (infeasible)

Program Representation
int fac(int x) {
  int y = 1;
  while (x > 1) {
    y = y*x;
    x = x-1;
  }
  return y;
}
[Figure: control-flow graph of fac with basic blocks bb1 (int y = 1;), bb2 (if (x > 1)), bb3 (y = y*x; x = x-1;) and bb4 (return y;), joined by edges ed1 to ed4]
Prolog-encoded IR:
bb(bb1).
bb(bb2).
edge(ed1).
edge_src(ed1,bb1).
edge_dest(ed1,bb2).
...
Feature vector:
# basic blocks = 4
# edges = 4
# conditional branches = 1
...
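
A small sketch of how the relational facts can be flattened into the feature vector, showing what is lost along the way. Only the endpoints of ed1 appear on the slide. The endpoints used here for ed2 to ed4 are assumptions inferred from the shape of the loop, and the tuple encoding of facts is purely illustrative.

```python
from collections import Counter

# Relational facts for fac(), written as (predicate, *args) tuples.
# Only ed1's endpoints come from the slide; ed2..ed4 are assumed.
FACTS = [
    ("bb", "bb1"), ("bb", "bb2"), ("bb", "bb3"), ("bb", "bb4"),
    ("edge", "ed1"), ("edge", "ed2"), ("edge", "ed3"), ("edge", "ed4"),
    ("edge_src", "ed1", "bb1"), ("edge_dest", "ed1", "bb2"),
    ("edge_src", "ed2", "bb2"), ("edge_dest", "ed2", "bb3"),
    ("edge_src", "ed3", "bb2"), ("edge_dest", "ed3", "bb4"),
    ("edge_src", "ed4", "bb3"), ("edge_dest", "ed4", "bb2"),
]

def feature_vector(facts):
    """Collapse the facts into counts; the graph structure itself is discarded."""
    counts = Counter(fact[0] for fact in facts)
    out_degree = Counter(fact[2] for fact in facts if fact[0] == "edge_src")
    return {
        "basic_blocks": counts["bb"],
        "edges": counts["edge"],
        # A block with two outgoing edges ends in a conditional branch.
        "conditional_branches": sum(1 for d in out_degree.values() if d == 2),
    }

print(feature_vector(FACTS))
```

The counts match the feature vector on the slide, but two programs with quite different control flow could produce the same three numbers, which is the sense in which the vector is lossy.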

Milepost
[Figure: pipeline from 22 cbench programs through Intermediate Representation and Feature Vector (56 features), and through Random Iterative Compilation and Good configurations, into Machine Learning, producing Models (1NN, decision tree)]
Flatten program into lossy feature vector {ft1=3, ft2=7, ..., ft56=6}
Identify good configurations (within x% of the best config)
Output models: feature vector → predicted configuration
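
A minimal sketch of the two steps named on the slide: labelling configurations as good when they fall within a tolerance of the best, and predicting a configuration with a 1-nearest-neighbour lookup over feature vectors. The three-element vectors, configuration names and timings are hypothetical stand-ins for the 56 features and real measurements, and this is not Milepost's actual implementation.

```python
import math

def good_configs(times, tolerance):
    """Configurations whose time is within `tolerance` (0.05 = 5%) of the best."""
    best = min(times.values())
    return {c for c, t in times.items() if t <= best * (1 + tolerance)}

def predict_1nn(training, features):
    """Return the configuration attached to the nearest training feature vector."""
    nearest = min(training, key=lambda item: math.dist(item[0], features))
    return nearest[1]

# Hypothetical timings (seconds) for one training program.
times = {"cfgA": 1.00, "cfgB": 1.04, "cfgC": 1.30}
good = good_configs(times, 0.05)

# Hypothetical training set: (feature vector, a good configuration).
training = [((3, 7, 6), "cfgA"), ((40, 55, 9), "cfgC")]
predicted = predict_1nn(training, (4, 8, 6))

print(sorted(good), predicted)
```

Prediction here costs a single distance computation per training program, with no compilation of the new program under candidate configurations.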

Our Approach
[Figure: pipeline from 60 BEEBS programs through Intermediate Representation (background knowledge B), and through Random Iterative Compilation and Good/bad flags (examples E), into ILP, producing hypotheses H as a rule-based model]
Bristol/Embecosm Embedded Benchmark Suite (BEEBS): more diverse than cbench, ~3x as many programs
Background knowledge B:
edge_src(prog1,func1,bb1,ed1).
edge_dest(prog1,func1,bb2,ed1).
bb_stmt_f(prog1,func1,bb2,st4).
stmt_code(prog1,func1,st4,gimple_cond).
Examples E:
badflag(prog1,'-flag-x').
:- badflag(prog2,'-flag-x').
Hypotheses H:
badflag(Prog,'-flag-x') :- stmt_code(Prog,Func,Stmt,gimple_cond).
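
A rough sketch of what it means for a hypothesis to fit the examples: it should cover the positive example for prog1 and not the negative one for prog2. The `gimple_assign` fact for prog2 is a hypothetical addition so that the negative example has some background knowledge, and the tuple encoding stands in for the Prolog facts.

```python
# Background knowledge B as (predicate, prog, func, stmt, code) tuples.
BACKGROUND = {
    ("stmt_code", "prog1", "func1", "st4", "gimple_cond"),
    ("stmt_code", "prog2", "func1", "st1", "gimple_assign"),  # hypothetical fact
}

# Examples E: flags observed to be bad (positive) or not bad (negative).
POSITIVE = {("prog1", "-flag-x")}
NEGATIVE = {("prog2", "-flag-x")}

def hypothesis(prog, flag, background):
    """badflag(Prog,'-flag-x') :- stmt_code(Prog,Func,Stmt,gimple_cond)."""
    return flag == "-flag-x" and any(
        fact[0] == "stmt_code" and fact[1] == prog and fact[4] == "gimple_cond"
        for fact in background
    )

def coverage(hyp, positive, negative, background):
    """Count how many positive and negative examples the hypothesis covers."""
    covered_pos = sum(hyp(p, f, background) for p, f in positive)
    covered_neg = sum(hyp(p, f, background) for p, f in negative)
    return covered_pos, covered_neg

print(coverage(hypothesis, POSITIVE, NEGATIVE, BACKGROUND))
```

An ILP system searches over many candidate rules of this form and prefers those with high positive and low negative coverage; this sketch only scores one fixed rule.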

Results

Learned Hypotheses: Facts
19 flags always off (bad)
  5 disabled at O3
33 flags always on (good or indifferent)
  29 enabled at O3

Learned Hypotheses: Rules
badflag(P,'-fguess-branch-probability') :- stmt_code(P,F,S,gimple_cond), expr_ge2(P,F,real_type).
Disable -fguess-branch-probability if function F of program P has a conditional statement S and at least two statements with real number expressions
Compiler heuristic was wrong for functions with real number expressions

Learned Hypotheses: Rules
badflag(P,'-fschedule-insns2') :- expr_ge2(P,F,array_ref).
Disable -fschedule-insns2 if function F of program P contains at least two statements that reference arrays
New feature: no mention of arrays in feature vector
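
A sketch of applying the two learned rules to a new program, following the English readings given on the slides. The `stmt_expr` predicate and the sample facts are hypothetical, since the slides do not show how expression kinds are stored, and `expr_ge2` is implemented here simply as "at least two distinct statements in the function have an expression of the given kind".

```python
# Facts as (predicate, prog, func, stmt, value) tuples.
# "stmt_expr" is a hypothetical predicate linking a statement to an expression kind.
FACTS = {
    ("stmt_code", "p1", "main", "s1", "gimple_cond"),
    ("stmt_expr", "p1", "main", "s2", "real_type"),
    ("stmt_expr", "p1", "main", "s3", "real_type"),
    ("stmt_expr", "p1", "main", "s4", "array_ref"),
}

def expr_ge2(facts, prog, func, kind):
    """True if at least two statements of func contain an expression of `kind`."""
    stmts = {s for (pred, p, f, s, k) in facts
             if pred == "stmt_expr" and (p, f, k) == (prog, func, kind)}
    return len(stmts) >= 2

def has_conditional(facts, prog, func):
    return any(pred == "stmt_code" and (p, f, code) == (prog, func, "gimple_cond")
               for (pred, p, f, _stmt, code) in facts)

def flags_to_disable(facts, prog):
    """Apply both rules to every function of `prog`."""
    funcs = {f for (_pred, p, f, _stmt, _value) in facts if p == prog}
    disabled = set()
    for func in funcs:
        if has_conditional(facts, prog, func) and expr_ge2(facts, prog, func, "real_type"):
            disabled.add("-fguess-branch-probability")
        if expr_ge2(facts, prog, func, "array_ref"):
            disabled.add("-fschedule-insns2")
    return disabled

print(flags_to_disable(FACTS, "p1"))
```

In this example only the first rule fires, because the function has a conditional and two real-typed statements but just one array reference.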

Hybrid Approach
8% average improvement vs O3 (up to 50% improvement)
Predictions virtually instantaneous thanks to predictive model

Ideas for Improving ILP Approach
Improve flag examples
Predict whole configurations (retain dependencies)
Improve background knowledge

Compiler Tuning Methods
[Figure: tuning methods plotted by best execution time against search time, now including Combined Elimination]
Standard optimisation levels: O1, O2, O3
Predictive methods: Milepost, ILP, Hybrid
Iterative compilation: Combined Elimination, Random Search
Exhaustive Search (infeasible)

Combined Elimination (CE)
Introduced by Pan et al.
Hill-climbing & iterative compilation
Starts with all flags enabled
Prioritises removal of current worst flag
My evaluation
  3 times as many flags and benchmarks
  Compares to random iterative compilation
  Tested 3 initial configurations:
    CE = All flags on
    CE-O3 = O3
    CE-random = Best random configuration
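
A simplified sketch of a greedy elimination loop in the spirit of Combined Elimination as it is described above: start from a baseline with every flag on, measure the relative effect of switching each flag off, remove the most harmful flag first, then re-test the other harmful candidates against the updated baseline. The `measure` callable and the toy per-flag costs are hypothetical, and details of the procedure by Pan et al. or of the evaluation in these slides may differ.

```python
def combined_elimination(flags, measure):
    """Greedy flag elimination; returns (enabled flags, measured time)."""
    baseline = set(flags)
    t_base = measure(baseline)
    candidates = set(flags)
    while True:
        # Relative change in time when each candidate is switched off.
        rip = {f: (measure(baseline - {f}) - t_base) / t_base for f in candidates}
        # Flags whose removal speeds the program up, worst offender first.
        harmful = sorted((f for f in candidates if rip[f] < 0),
                         key=lambda f: (rip[f], f))
        if not harmful:
            return baseline, t_base
        worst = harmful[0]
        baseline = baseline - {worst}
        t_base = measure(baseline)
        candidates.discard(worst)
        # Re-test the remaining harmful flags against the new baseline.
        for f in harmful[1:]:
            t = measure(baseline - {f})
            if t < t_base:
                baseline = baseline - {f}
                t_base = t
                candidates.discard(f)

# Toy cost model: hypothetical time each enabled flag adds (negative = helps).
COST = {"a": 2, "b": -1, "c": 1, "d": 0}

def toy_measure(config):
    return 10 + sum(COST[f] for f in config)

result = combined_elimination(COST, toy_measure)
print(sorted(result[0]), result[1])
```

Starting from a different initial configuration, such as the flags enabled at O3, only changes the `flags` argument and which flags are initially on.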

Combined Elimination (CE)
17% average possible improvement vs O3
CE outperforms random after 108 configs

Downsampling Configuration Sets
Iterative compilation configs
Reduced set (e.g. 15 configs)
Small set of configs that cover a range of programs
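
The slides do not say how the reduced set is chosen. As one possible illustration, the sketch below greedily picks configurations so that every program has at least one fast option in the set. The timing table is hypothetical, and the code assumes every program has been timed under every configuration.

```python
def downsample(times, k):
    """Greedily choose up to k configs minimising the summed best time per program.

    `times` maps program -> {config: seconds}.
    """
    configs = sorted({c for per_prog in times.values() for c in per_prog})
    chosen = []

    def total(selection):
        # Each program uses whichever selected config suits it best.
        return sum(min(per_prog[c] for c in selection) for per_prog in times.values())

    while len(chosen) < min(k, len(configs)):
        remaining = [c for c in configs if c not in chosen]
        chosen.append(min(remaining, key=lambda c: total(chosen + [c])))
    return chosen

# Hypothetical timings for two programs under three configurations.
TIMES = {
    "progA": {"c1": 1.0, "c2": 2.0, "c3": 1.5},
    "progB": {"c1": 3.0, "c2": 1.0, "c3": 1.6},
}

print(downsample(TIMES, 2))
```

With a budget of one the compromise-free choice is c2, and adding c1 then covers progA as well, so two configurations suffice for both programs in this toy table.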

Ideas for Improving Background Knowledge
Continue developing extra rules to help generalise IR
Milepost IR very large (60 lines of code → 3000 facts)
Other IRs to investigate
  LLVM IR
  Constrained Horn Clauses (e.g. SeaHorn)

Publications
Blackmore, C., Ray, O., Eder, K.: A logic programming approach to predict effective compiler settings for embedded software. Theory and Practice of Logic Programming 15(4-5) (2015).
Blackmore, C., Ray, O., Kull, M., et al.: Reframing of Classification and Regression Tasks for Predicting the Effects of Compiler Settings on Multiple Embedded Systems. 2nd International Workshop on Learning over Multiple Contexts (2015).

Long-Term Plan
Compiler tuning for larger scale software
Function-level optimisation
Interesting energy cases
Improve ILP approach
Create tool for software engineers
Iterative and predictive approaches

Conclusions
ILP allows learning directly from IR
Relational IR more expressive than feature vector
Predictive approaches likely to be most beneficial at function-level
Approach applicable to any metric that can be measured AND is influenced by the compiler

Any questions?
