Compiler Optimizations and Auto-tuning. Amir H. Ashouri, Politecnico di Milano, 2014



2 Compilation
Compilation = Translation
One piece of code has around 10^80 different translations:
  Different platforms
  Different optimizations
  Different compilers

3 Compiler Transformation
-loop-tile
-loop-unroll
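As a minimal sketch of what these two transformations do to a loop nest, the functions below apply tiling and unrolling by hand. The function names, the transpose kernel, and the unroll factor of 4 are illustrative assumptions, not taken from the slides; a compiler would apply the same restructuring to its intermediate representation.

```python
def transpose_naive(a):
    """Reference loop nest: b[j][i] = a[i][j]."""
    n, m = len(a), len(a[0])
    b = [[0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            b[j][i] = a[i][j]
    return b


def transpose_tiled(a, tile=4):
    """Same computation with both loops tiled into tile x tile blocks."""
    n, m = len(a), len(a[0])
    b = [[0] * n for _ in range(m)]
    for ii in range(0, n, tile):                      # walk over tile origins
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):    # walk inside one tile
                for j in range(jj, min(jj + tile, m)):
                    b[j][i] = a[i][j]
    return b


def sum_unrolled(xs):
    """Reduction with the body unrolled by 4, plus a remainder loop."""
    total = 0
    n = len(xs)
    limit = n - n % 4
    for i in range(0, limit, 4):
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
    for i in range(limit, n):                         # leftover iterations
        total += xs[i]
    return total
```

Both variants compute the same result as the untransformed loops; only the iteration order (tiling) or the amount of work per iteration (unrolling) changes, which is what affects cache locality and loop overhead on real hardware.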

4 Problems
Too many factors involved
Big exploration space
Multi-objective, non-linear optimization: code size vs. power consumption vs. performance
Indefinite phase ordering of transformations
Application- and hardware-specific
Ambiguity in the effects of transformations

5 Challenges
Huge exploration space, even with only 24 compiler passes:
  2²⁴ ~ 16.8 million different combinations
  24! ~ 6.2 × 10²³ different permutations with phase ordering
  Sequence length of 10 -> 24¹⁰ ~ 6.3 × 10¹³ unique sequences with phase ordering
Multi-objective formulation
Pattern-less behaviors (enable/disable)
Miscellaneous external effects on optimizations (x86-64 platforms)
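The sizes of these search spaces can be checked with a few lines of arithmetic. The sketch below assumes that the length-10 case means sequences drawn from the 24 passes with repetition allowed, which gives 24^10; the constant names are illustrative.

```python
import math

NUM_PASSES = 24
SEQ_LEN = 10

subsets = 2 ** NUM_PASSES                # each pass either enabled or disabled
orderings = math.factorial(NUM_PASSES)   # every pass used exactly once, in any order
sequences = NUM_PASSES ** SEQ_LEN        # length-10 sequences, repeats allowed (assumption)

print(f"{subsets:,}")        # 16,777,216
print(f"{orderings:.2e}")    # 6.20e+23
print(f"{sequences:.2e}")    # 6.34e+13
```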

6 Compiler Transformation Types
Types:
  Local / global
  Machine-dependent / machine-independent
  Loop-based optimizations
  Machine-code optimizations
  Peephole optimizations
General techniques:
  Inlining
  Data/array padding

7 How to Tackle
Iterative compilation vs. static approach
Domain-based analysis
Embedded systems
Architecture-specific
Analytic model for compilation
Virtual execution (for basic blocks)
Co-design space explorations
Micro-architectural design space
Linear regressors, neural networks, spline functions
Re-targetable compilers
LLVM

8 Static vs. Iterative Compilation
Iterative compilation: many variants of the source program are generated, and the best one is selected by actually profiling these variants on the target system.
How to improve?
  Using machine learning approaches
  Heuristics
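The generate-profile-select loop described here can be sketched as a random search over pass subsets. Everything in this block is a stated assumption for illustration: the pass names, the `measure` function (a synthetic stand-in for compiling and timing a variant on the target), and the evaluation budget.

```python
import random

# Illustrative pass names, not a real compiler's flag list.
PASSES = ["inline", "unroll", "tile", "licm", "gvn", "vectorize"]


def measure(config):
    """Hypothetical stand-in for 'compile with these passes, run, and time it'.

    Returns a synthetic runtime; a real driver would invoke the compiler and
    profile the resulting binary on the target system.
    """
    runtime = 100.0
    if "unroll" in config:
        runtime -= 10.0
    if "tile" in config:
        runtime -= 15.0
    if "unroll" in config and "vectorize" in config:
        runtime -= 20.0   # passes can help each other
    if "inline" in config and "tile" in config:
        runtime += 12.0   # and can also hurt each other
    return runtime


def iterative_compilation(budget, seed=0):
    """Evaluate `budget` random variants and keep the fastest one."""
    rng = random.Random(seed)
    best_config = frozenset()
    best_time = measure(best_config)
    for _ in range(budget):
        config = frozenset(p for p in PASSES if rng.random() < 0.5)
        t = measure(config)
        if t < best_time:
            best_config, best_time = config, t
    return best_config, best_time


if __name__ == "__main__":
    config, runtime = iterative_compilation(budget=200)
    print(sorted(config), runtime)
```

Machine learning and heuristics enter by replacing the uniform random sampling with a model that proposes promising configurations, so that fewer variants need to be compiled and profiled.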

9 Methods
Machine learning techniques
Unsupervised methods (find hidden structure in unlabeled data):
  Genetic algorithms
  Principal component analysis
  K-means clustering
  Feature extraction
Supervised methods (infer a function from labeled training data):
  Decision trees
  Random forests

10 State of the Art (1)
Before 2000 {Bodin, O'Boyle - 1998}
Iterative Compilation in Non-linear Space
Using an iterative algorithm to investigate a non-linear transformation space
Applying different transformations, then evaluating the program features
Using an embedded system (VLIW TM1000) as the target architecture
Applying tiling, unrolling and padding
Expressing iterative compilation as a program optimization technique

11 State of the Art (2)
Early 2000 {O'Boyle et al}
Iterative compilation
Motivation: the introduction of more complex, modern architectures capable of out-of-order execution, with deep memory hierarchies and high issue width
Therefore, simplified machine models were no longer the answer
Iterative compilation:
  Source-to-source restructurer
  Towards cache exploitation
  Loop tiling, unroll-and-jam
  Using search algorithms inside the driver: genetic algorithm, simulated annealing, grid search, etc.

12 State of the Art (3)
Early 2000 {O'Boyle et al}
Combined selection of tile size and unroll factor using iterative compilation
Extended version of their previous work in 2000
Using loop tiling and unrolling to exploit locality and ILP
Comparisons and definitions for speedups, execution time improvements, etc.
Simultaneously selects tile sizes and unroll factors
Generates many transformed versions of the program and searches for the best one by compiling and timing each
Search algorithms; square and rectangular tiles
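Selecting tile size and unroll factor together, rather than one after the other, can be sketched as a grid search over the joint space. The candidate values and the `synthetic_cycles` cost model below are hypothetical; in the setting described above, the cost would come from compiling and timing each transformed version.

```python
from itertools import product

TILE_SIZES = [8, 16, 32, 64, 128]     # illustrative candidates
UNROLL_FACTORS = [1, 2, 4, 8]


def synthetic_cycles(tile, unroll, cache_elems=2048, regs=16):
    """Hypothetical cost model, not a measurement.

    Larger tiles and unroll factors reduce overhead until the tile no longer
    fits in cache or the unrolled body runs out of registers.
    """
    loop_overhead = 1000.0 / unroll
    tile_overhead = 4000.0 / tile
    cache_penalty = 500.0 if tile * tile > cache_elems else 0.0
    spill_penalty = 300.0 if unroll * 3 > regs else 0.0
    return loop_overhead + tile_overhead + cache_penalty + spill_penalty


def grid_search(cost):
    """Return the (tile, unroll) pair with the lowest cost over the full grid."""
    return min(product(TILE_SIZES, UNROLL_FACTORS), key=lambda tu: cost(*tu))


if __name__ == "__main__":
    print(grid_search(synthetic_cycles))   # (32, 4) under this cost model
```

An exhaustive grid is only feasible for small spaces like this one; the genetic and simulated-annealing drivers mentioned earlier trade completeness for fewer evaluations.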

13 State of the Art (4)
Early 2000 {G.G Fursin, O'Boyle et al}
Evaluating iterative compilation
Drawbacks of previous works: focused solely on compute-intensive kernels with a small optimization space
Novelty: demonstrating that iterative compilation can outperform static approaches
Significant reductions in execution time using array padding and unrolling

14 State of the Art (5)
Recent Years {G.G Fursin, O'Boyle et al - 2009}
Portable Compiler Optimization...
Portable optimizing compiler:
  Off-line learning of the model
  Maps a micro-architecture description plus hardware counters from a single run to the best compiler optimization passes (top 5%)
  Training dataset, fitting to the model, and deployment

15 State of the Art (6)
Recent Years {G.G Fursin, O'Boyle et al - 2010}
Workload characterization...
Characterizing some of the domain-specific compilers
Using supervised learning (decision trees)
Pinpointing the stress points and bottlenecks
Producing human-interpretable results

16 State of the Art (7)
Recent Years {Suresh Purini et al}
Finding Good Compiler Sequences covering...
Constructed a small set of good sequences such that, for every program class, there exists a near-optimal optimization sequence in the good sequence set
Using a down-sampling technique to reduce the large optimization space
Program characterization method
Iterative compilation, machine learning, and LLVM

17 State of the Art (8)
Recent Years {Ashouri A. H. et al 2013}
HW/SW co-explorations on VLIW

18 State of the Art (8)(2)
Recent Years
Selection flow:
  Random DoE exploration (30k)
  Pareto filtering (1k)
  K-means clustering (4 classes)
  Optimization strategy
  4 champion micro-architectures
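The Pareto filtering step in this flow keeps only the design points that no other point beats on every objective. The sketch below is a generic quadratic-time filter, assuming all objectives are minimized; the two objectives in the example (execution time and energy) and the sample points are hypothetical, not data from the study.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (execution time, energy) pairs for six configurations.
    designs = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 3.0), (5.0, 4.0), (6.0, 1.0)]
    print(pareto_front(designs))
    # [(1.0, 9.0), (2.0, 7.0), (4.0, 3.0), (6.0, 1.0)]
```

The surviving points would then be the input to the clustering step, which groups them into a small number of representative classes.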

19 State of the Art (7)(3)
Recent Years
Compiler analysis flow:
  Receives 4 customized VLIW architectures
  Uses random DoE
  Effect compilation: LLVM-Opt (C-2-C), VEX
  Roofline map with metrics
  Statistical analysis: Kruskal-Wallis
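The Kruskal-Wallis test named in this flow is a rank-based test for whether several groups of measurements come from the same distribution. As a minimal sketch, the function below computes its H statistic without the tie-correction factor; the function names are illustrative, and how the groups were formed in the study is not stated here. In practice a library routine such as `scipy.stats.kruskal` would normally be used, since it also returns a p-value.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic over a list of groups (no tie-correction factor applied)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    acc, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        acc += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * acc - 3 * (n + 1)
```

A large H relative to the chi-squared distribution with (number of groups - 1) degrees of freedom suggests that at least one group, for example the speedups observed with one compiler option enabled, differs from the others.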

20 State of the Art (8)(3)
Recent Years
Final clustering

21 State of the Art (8)(4)
Recent Years
GSM: achieved speed-up (w.r.t. LLVM-O1)

22 Recent Challenges & Future Works
Addressing:
  Parallel programming
  Multi-core era
  CPU/GPU computing
Proebsting's law: even if we assume that useful compiler optimization research began in the mid-1960s, the uniform performance improvement on integer-intensive codes due to compiler optimization is still only 3.6% per year. This lies in stark contrast to the 60% per year performance improvement we can expect from hardware due to Moore's law.
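To put the two growth rates side by side, the sketch below converts an annual improvement rate into a doubling time under simple compound growth. The helper name is illustrative; the rates are the ones quoted above.

```python
import math


def doubling_time_years(annual_rate):
    """Years needed to double performance at a fixed compound annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    print(round(doubling_time_years(0.036), 1))   # compiler optimization: about 19.6 years
    print(round(doubling_time_years(0.60), 1))    # hardware: about 1.5 years
```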

23 Recent Challenges & Future Works (2)
Courtesy of M. O'Boyle

24 Thank You!
Amir H. Ashouri


An Evaluation of Autotuning Techniques for the Compiler Optimization Problems Amir Hossein Ashouri, Gianluca Palermo and Cristina Silvano Politecnico di Milano, Milan, Italy {amirhossein.ashouri,ginaluca.palermo,cristina.silvano}@polimi.it