Predic've Modeling in a Polyhedral Op'miza'on Space

Size: px

Start display at page:

Download "Predic've Modeling in a Polyhedral Op'miza'on Space"

Garry Matthews
5 years ago
Views:

1 Predic've Modeling in a Polyhedral Op'miza'on Space Eunjung EJ Park 1, Louis- Noël Pouchet 2, John Cavazos 1, Albert Cohen 3, and P. Sadayappan 2 1 University of Delaware 2 The Ohio State University 3 ALCHEMY group, INRIA Saclay - Île- de- France April 5 th, 2011 IEEE/ACM Symposium on Code Genera'on and Op'miza'on Chamonix, France

2 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 1 Do you want good performance? Op'mizing loops is crucial! Difficulty on finding good loop op'miza'ons Complex interplay between hardware resources Conflicts between op'miza'on strategies Large compiler op'miza'on space Any Solu'ons? Using itera've compila'on Using polyhedral framework

3 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 2 Why polyhedral framework? Advantages over standard compiler framework Good expressiveness Complex op'miza'on sequences Loop 'ling of imperfectly nested loop Very large op'miza'on space Our Solu?on Let s take advantages of Polyhedral Framework and Itera?ve Compila?on!

4 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 3 Contribu?ons of this paper Power of three approaches Expressiveness from polyhedral framework Performance predic'on by machine learning Itera've compila'on Build predic'on model for polyhedral op'miza'on primi'ves Reduce number of evalua'ons, achieve good performance

5 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 4 Polyhedral Compiler Framework Input Program PoCC: source- to- source itera've and model- driven polyhedral compiler Extract Polyhedral Representa'on Build Legal Transforma'ons Source Code Genera'on Transformed Program by Polyhedral Op'miza'ons

on Space Encode in fixed length vector T Fusion/Distribu'on Op'miza'ons Handled in each step -

6 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 5 Polyhedral Op?miza?on Space Encode in fixed length vector T Fusion/Distribu'on Op'miza'ons Handled in each step - Fusion/Distribu'on/Code Mo'on Compute a schedule (Pluto) - Auto Paralleliza'on, Tileability Modify schedule - Vectoriza'on 864 possible points Final Search Space Individually Tile Loops Process all other op'miza'ons - Tiling - OpenMP Pragma, Unrolling

7 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 6 Finding Best Speedup is Hard Performance Performance Improvement Imp. over Baseline Actual Low density of best points in actual curve Op'miza'on sequences applied (sorted by actual speedup)

8 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 7 Predic?on Model Model Descrip'on Building Model Model Construc'on Model Usage

9 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 8 Speedup Predic?on Model Performance Counter" Characteristics" Optimizations" Predicted speedup of optimizations over a baseline"

program where i=1 to N do done Train a model on N- 1 programs

10 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 9 Building Model Leave- one- out cross valida'on for i- th program where i=1 to N do done Train a model on N- 1 programs excluding i- th program Test the model on the i- th program lek out

11 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 10 Model Construc?on Collect dynamic behavior for a program Backend Compiler 1 Program ICC Performance Counters for the program Underlying Architecture Performance Counters include total number of - accesses and misses in all levels of cache and TLB - stall cycles - vector instruc'ons - issued instruc'ons

12 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 11 Model Construc?on Now do this for N- 1 programs Backend Compiler N- 1 Programs ICC Performance Counters for N- 1 programs Underlying Architecture

13 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 12 Model Construc?on Run transformed programs, and get speedups One Program Polyhedral Compiler Baseline (- fast) Backend Compiler ICC Transformed Polyhedral Programs for the one Program Primi've Sequences(T) Speedup over baseline Primi've sequences and their speedup over baseline

14 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 13 Model Construc?on Now do this for N- 1 programs Baseline (- fast) Backend Compiler ICC N- 1 Programs Primi've sequences (T) Speedup over baseline Polyhedral Compiler Transformed Programs from each of N- 1 Programs Primi've sequences and their speedup over baseline

15 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 14 Model Construc?on Performance Counters for a program Primi've sequences and their speedup over baseline Linear Regression / Support Vector Machine (SVM) Generated model for a given machine

16 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 15 Model Construc?on Performance Counters for N- 1 programs Primi've sequences and their speedup over baseline for N- 1 programs Linear Regression / Support Vector Machine (SVM) Generated model for a given machine

4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 16 Using Model Nth Program (one lek out) Backend Compiler ICC - fast Performance Counters Primi've

17 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 16 Using Model Nth Program (one lek out) Backend Compiler ICC - fast Performance Counters Primi've Sequences Generated model for a given machine We can use predicted speedups in two ways - Non- itera've Fashion - Itera've Fashion Predicted speedup for each sequences

18 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 17 Experimental Configura?on Hardware configura'on Nehalem Intel Xeon E GHz, 2 sockets 4 cores, 16 H/W threads, L3 12MB R900/Dunnington Intel Xeon E GHz, 4 sockets 6 cores, 24 H/W threads, L3 12MB Sokware Configura'on Backend compiler: ICC Baseline: ICC with fast, single threaded (no auto- par) Machine learning framework: Weka v3.6.2 Linear Regression/SVM (SMOReg) Benchmark: PolyBench 2.0 (28 different kernels and applica'ons)

19 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 18 Experimental Analysis Performance of our model Our model versus random

20 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 19 Experiment 1: Performance of our model Shot = Evalua'ng op'miza'on sequence Non- itera've fashion (1- shot model) Itera've fashion 2- shot model 5- shot model Performance Counters Primi've Sequences Predicted speedup for each sequences

21 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 20 Experiment 1: Performance of our model Shot = Evalua'ng op'miza'on sequence Non- itera've fashion (1- shot model) Itera've fashion 2- shot model 5- shot model Sorted by Predicted Speedup top1

22 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 21 Experiment 1: Performance of our model Shot = Evalua'ng op'miza'on sequence Non- itera've fashion (1- shot model) Itera've fashion 2- shot model 5- shot model Sorted by Predicted Speedup top2

23 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 22 Experiment 1: Performance of our model Shot = Evalua'ng op'miza'on sequence Non- itera've fashion (1- shot model) Itera've fashion 2- shot model 5- shot model Sorted by Predicted Speedup top5

4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 23 Experiment 1: Performance of our model Summary With 5 itera'ons, we reached more than 80% of the space op'mal!

24 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 23 Experiment 1: Performance of our model Summary With 5 itera'ons, we reached more than 80% of the space op'mal! 1- shot 2- shot 5- shot LR Nehalem SVM 3.16x 3.27x 3.16x 3.43x 3.50x 4.68x R900/Dunnington LR SVM 4.91x 3.77x 4.92x 4.91x 5.82x 6.60x Space Best vs. Poly vs. ICC Nehalem R900/Dunnington SpaceBest 6.10x 8.04x Poly ICC 2.13x 1.98x 4.48x 3.38x

5 5 Performance Improvement on Nehalem LR SVM 21.54 (Baseline: ICC 11.

25 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 24 Experiment 1: Nehalem 1- Shot Speedup over Baseline Performance Improvement on Nehalem LR SVM (Baseline: ICC fast) Some benchmarks improve performance drama'cally 1/3 of benchmarks decrease performance! On Average, LR- 3.16x, SVM- 3.27x PolyBench

26 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 25 Experiment 1: Nehalem 2- Shot 15 Performance Improvement on Nehalem (Baseline: ICC fast) LR SVM Speedup over Baseline Not a big difference from 1- shot model, On Average, LR- 3.16x, SVM- 3.43x PolyBench

4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 26 Experiment 1: Nehalem 5- shot Speedup over Baseline 15 12.5 10 7.5 5 Performance Improvement on Nehalem 21.

27 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 26 Experiment 1: Nehalem 5- shot Speedup over Baseline Performance Improvement on Nehalem (Baseline: ICC fast) LR SVM 3 benchmarks s'll decrease performance in 5 itera'ons. Most benchmarks take benefit of using our model! On Average, LR- 3.50x, SVM- 4.68x PolyBench

28 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 27 Experiment 2: Our Model vs. random Random search discovers performance improvement 1- shot 2- shot 5- shot Nehalem Random Our Model 1.61x 3.27x 2.24x 3.43x 3.32x 4.68x R900/Dunnington Random Our Model 2.14x 4.91x 3.19x 4.92x 3.99x 6.60x However, our model performs significantly berer!

29 4/7/11 Predic've Modeling in a Polyhedral Op'miza'on Space 28 Conclusion Power of three approaches Expressiveness from polyhedral framework Performance predic'on by machine learning Itera've compila'on Build predic'on model for polyhedral op'miza'on primi'ves Reduce number of evalua'ons, achieve good performance More than 80% of search space op'mal performance in 5 itera'ons Machine- independent algorithm but machine- dependent result (trained specifically on the target machine) Performance portability is achieved

30 Predic've Modeling in a Polyhedral Op'miza'on Space Thank You!

Using Graph- Based Characteriza4on for Predic4ve Modeling of Vectorizable Loop Nests

Using Graph- Based Characteriza4on for Predic4ve Modeling of Vectorizable Loop Nests William Killian PhD Prelimary Exam Presenta4on Department of Computer and Informa4on Science CommiIee John Cavazos and