Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen and Nicolas Vasilache
ALCHEMY, INRIA Futurs / University of Paris-Sud XI
March 12, 2007
Fifth International Symposium on Code Generation and Optimization, San Jose, California
Outline

Context of this study:
- Focus on loop nest optimization for regular loops
- Automatic method for parallelism extraction / loop transformation
- Combine iterative methods with the power of the polyhedral model
- Solution independent of the compiler and the target machine

Our contribution:
- Search space construction
  - 1 point in the space corresponds to 1 distinct legal program version
  - suitable for various exploration methods
- Performance
  - 99% of the best speedup attained within 20 runs of a dedicated heuristic
  - wall-clock optimal transformation discoverable on small kernels
Scheduling in the Polyhedral Model: A Motivating Example

One-Dimensional Scheduling

Original schedule: $\theta_{S1} = i$, $\theta_{S2} = i$

    for (i = 0; i < n; ++i) {
      S1(i);
      for (j = 0; j < n; ++j)
        S2(i, j);
    }

- Specify the outermost loop only
- The initial outermost loop is i
Distribute loops: $\theta_{S1} = i$, $\theta_{S2} = i + n$

    for (i = 0; i < n; ++i)
      S1(i);
    for (i = n; i < 2*n; ++i)
      for (j = 0; j < n; ++j)
        S2(i - n, j);

- Specify the outermost loop only
- All instances of S1 are executed before the first S2 instance
Distribute loops + interchange loops for S2: $\theta_{S1} = i$, $\theta_{S2} = j + n$

    for (i = 0; i < n; ++i)
      S1(i);
    for (j = n; j < 2*n; ++j)
      for (i = 0; i < n; ++i)
        S2(i, j - n);

- Specify the outermost loop only
- The outermost loop for S2 becomes j
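As a quick sanity check of the schedule $\theta_{S1} = i$, $\theta_{S2} = j + n$, the sketch below runs the original nest and the distributed, interchanged nest side by side. It assumes S1 and S2 are the two statements of a matrix-vector product (`s[i] = 0` and `s[i] = s[i] + a[i][j] * x[j]`); that choice, the `MAXN` bound, the test data and all function names are illustrative assumptions, not part of the slides.

```c
#include <assert.h>

#define MAXN 64

/* Original nest: S1 is "s[i] = 0", S2 is "s[i] = s[i] + a[i][j] * x[j]". */
void matvec_original(int n, double a[MAXN][MAXN], const double x[MAXN],
                     double s[MAXN]) {
  for (int i = 0; i < n; ++i) {
    s[i] = 0.0;
    for (int j = 0; j < n; ++j)
      s[i] = s[i] + a[i][j] * x[j];
  }
}

/* Same statements, executed in the order theta_S1 = i, theta_S2 = j + n. */
void matvec_rescheduled(int n, double a[MAXN][MAXN], const double x[MAXN],
                        double s[MAXN]) {
  for (int i = 0; i < n; ++i)
    s[i] = 0.0;                               /* S1(i) */
  for (int j = n; j < 2 * n; ++j)
    for (int i = 0; i < n; ++i)
      s[i] = s[i] + a[i][j - n] * x[j - n];   /* S2(i, j - n) */
}

/* Returns 1 when both versions produce exactly the same vector. */
int schedules_agree(int n) {
  static double a[MAXN][MAXN];
  double x[MAXN], s_ref[MAXN], s_new[MAXN];
  assert(n >= 0 && n <= MAXN);
  /* Small integer values keep every floating-point operation exact. */
  for (int i = 0; i < n; ++i) {
    x[i] = (double)(i % 3 + 1);
    for (int j = 0; j < n; ++j)
      a[i][j] = (double)((i + 2 * j) % 5);
  }
  matvec_original(n, a, x, s_ref);
  matvec_rescheduled(n, a, x, s_new);
  for (int i = 0; i < n; ++i)
    if (s_ref[i] != s_new[i]) return 0;
  return 1;
}
```

Calling `schedules_agree(8)` compares the two versions on an 8x8 input. Agreement on test data is only evidence, not a proof of legality.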
Transformation   Description
reversal         Changes the direction in which a loop traverses its iteration range
skewing          Makes the bounds of a given loop depend on an outer loop counter
interchange      Exchanges two loops in a perfectly nested loop, a.k.a. permutation
peeling          Extracts one iteration of a given loop
shifting         Allows loops to be reordered
fusion           Fuses two loops, a.k.a. jamming
distribution     Splits a single loop nest into many, a.k.a. fission or splitting
A schedule is an affine function of the iteration vector and the parameters:

\[ \theta_{S1}(\vec{x}_{S1}) = t_{1,S1} \cdot i_{S1} + t_{2,S1} \cdot n + t_{3,S1} \cdot 1 \]
\[ \theta_{S2}(\vec{x}_{S2}) = t_{1,S2} \cdot i_{S2} + t_{2,S2} \cdot j_{S2} + t_{3,S2} \cdot n + t_{4,S2} \cdot 1 \]
Applied to a matrix-vector product:

    for (i = 0; i < n; ++i) {
      s[i] = 0;                          /* S1 */
      for (j = 0; j < n; ++j)
        s[i] = s[i] + a[i][j] * x[j];    /* S2 */
    }

For $-1 \le t \le 1$, there are $3^7 = 2187$ possible schedules (7 coefficients, 3 values each)
But only 129 of them are legal, distinct schedules
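The count of candidate schedules can be reproduced with a few lines of C. The sketch below stores the seven coefficients of the two schedules, evaluates them, and enumerates every coefficient vector in a given range. The type and function names are illustrative assumptions. It only counts candidates; it does not reproduce the legality filtering that yields the 129 figure.

```c
#include <assert.h>

/* theta_S1 = t1*i + t2*n + t3 */
typedef struct { long t1, t2, t3; } sched_s1;
/* theta_S2 = t1*i + t2*j + t3*n + t4 */
typedef struct { long t1, t2, t3, t4; } sched_s2;

long theta_s1(sched_s1 t, long i, long n) {
  return t.t1 * i + t.t2 * n + t.t3;
}

long theta_s2(sched_s2 t, long i, long j, long n) {
  return t.t1 * i + t.t2 * j + t.t3 * n + t.t4;
}

/* Counts all vectors of 7 coefficients with every entry in [lo, hi]. */
long count_candidates(int lo, int hi) {
  int t[7];
  long count = 0;
  assert(lo <= hi);
  for (int k = 0; k < 7; ++k) t[k] = lo;
  for (;;) {
    ++count;  /* t[0..2] would feed theta_S1, t[3..6] theta_S2 */
    int k = 0;
    while (k < 7 && t[k] == hi) t[k++] = lo;  /* odometer carry */
    if (k == 7) break;                        /* wrapped around: done */
    ++t[k];
  }
  return count;
}
```

With `lo = -1` and `hi = 1` the enumeration visits $3^7$ vectors. The distributed, interchanged schedule from the motivating example corresponds to `sched_s1 {1, 0, 0}` and `sched_s2 {0, 1, 1, 0}`.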
Scheduling in the Polyhedral Model: Overview

Our Objective

1. Search space construction
- Efficiently construct a space of all legal, distinct affine schedules

              matmult   locality   fir    h264   crout
  i-bounds    -1,1      -1,1       0,1    -1,1   -3,3
  c-bounds    -1,1      -1,1       0,3    0,4    -3,3
  #Sched      (values not preserved in this transcript)
  #Legal      (values not preserved in this transcript)

- Rely on the polyhedral model and Integer Linear Programming to guarantee completeness and correctness of the space properties
- The search space encompasses unique, distinct compositions of reversal, skewing, interchange, fusion, peeling, shifting and distribution

2. Search space exploration
- Perform an exhaustive scan to discover the wall-clock optimal schedule, and to gather evidence of the intricacy of the best transformation
- Build an efficient heuristic to accelerate the space traversal
Search Space Construction: Preliminaries

Polyhedral Representation of Programs

Static Control Parts: loops have affine control only

Iteration domain: represented as integer polyhedra

    for (i = 1; i <= n; ++i)
      for (j = 1; j <= n; ++j)
        if (i <= n - j + 2)
          s[i] = ...

\[ \mathcal{D}_{S1} = \left\{ (i, j) \;\middle|\; \begin{bmatrix} 1 & 0 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & -1 & 1 & 0 \\ -1 & -1 & 1 & 2 \end{bmatrix} \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \ge \vec{0} \right\} \]
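To make the "iteration domain as integer polyhedron" view concrete, the sketch below encodes the five affine constraints implied by the loop bounds and the guard, and checks that the integer points of that polyhedron are exactly the instances the loop nest executes. The constraint rows are derived here from the code on the slide; the array and function names are illustrative assumptions.

```c
#include <assert.h>

/* Each row r encodes r[0]*i + r[1]*j + r[2]*n + r[3] >= 0. */
static const int DOM_S1[5][4] = {
    { 1,  0, 0, -1},  /* i >= 1         */
    {-1,  0, 1,  0},  /* i <= n         */
    { 0,  1, 0, -1},  /* j >= 1         */
    { 0, -1, 1,  0},  /* j <= n         */
    {-1, -1, 1,  2},  /* i <= n - j + 2 */
};

/* Returns 1 when (i, j) satisfies every constraint of the domain. */
int in_domain(int i, int j, int n) {
  for (int r = 0; r < 5; ++r)
    if (DOM_S1[r][0] * i + DOM_S1[r][1] * j + DOM_S1[r][2] * n + DOM_S1[r][3] < 0)
      return 0;
  return 1;
}

/* Counts the statement instances by running the loop nest itself. */
int count_by_loops(int n) {
  int count = 0;
  for (int i = 1; i <= n; ++i)
    for (int j = 1; j <= n; ++j)
      if (i <= n - j + 2) ++count;
  return count;
}

/* Counts integer points of the polyhedron inside a box that contains it. */
int count_by_polyhedron(int n) {
  int count = 0;
  assert(n >= 0);
  for (int i = -1; i <= n + 1; ++i)
    for (int j = -1; j <= n + 1; ++j)
      if (in_domain(i, j, n)) ++count;
  return count;
}
```

For `n = 3` both counters skip only the point (3, 3), which violates the guard.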
Memory accesses: static references, represented as affine functions of $\vec{x}_S$ and $\vec{p}$

    for (i = 0; i < n; ++i) {
      s[i] = 0;
      for (j = 0; j < n; ++j)
        s[i] = s[i] + a[i][j] * x[j];
    }

With $\vec{x}_{S2} = (i, j)^T$:

\[ f_s(\vec{x}_{S2}) = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix} \qquad f_a(\vec{x}_{S2}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix} \qquad f_x(\vec{x}_{S2}) = \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix} \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix} \]
Data dependence between S1 and S2: a subset of the Cartesian product of $\mathcal{D}_{S1}$ and $\mathcal{D}_{S2}$ (exact analysis)

    for (i = 1; i <= 3; ++i) {
      s[i] = 0;               /* S1 */
      for (j = 1; j <= 3; ++j)
        s[i] = s[i] + 1;      /* S2 */
    }

\[ \mathcal{D}_{S1 \delta S2}: \quad i_{S1} - i_{S2} = 0, \quad 1 \le i_{S1} \le 3, \quad 1 \le i_{S2} \le 3, \quad 1 \le j_{S2} \le 3 \]

[Figure: S1 iterations and S2 iterations along i, with the dependence drawn between them]
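The dependence polyhedron of this small example can be enumerated directly. The sketch below counts its integer points and compares the result with a direct check of "same memory cell, S1 instance executed first". It generalizes the bound 3 to a parameter `n`; the function names and the box used for enumeration are illustrative assumptions.

```c
#include <assert.h>

/* Membership test: bounds of both statements plus the equality i_S1 = i_S2. */
int in_dependence_polyhedron(int n, int i_s1, int i_s2, int j_s2) {
  return 1 <= i_s1 && i_s1 <= n && 1 <= i_s2 && i_s2 <= n &&
         1 <= j_s2 && j_s2 <= n && i_s1 - i_s2 == 0;
}

/* Counts the integer points of the dependence polyhedron. */
int count_polyhedron_points(int n) {
  int count = 0;
  assert(n >= 0);
  for (int i1 = 0; i1 <= n + 1; ++i1)
    for (int i2 = 0; i2 <= n + 1; ++i2)
      for (int j2 = 0; j2 <= n + 1; ++j2)
        if (in_dependence_polyhedron(n, i1, i2, j2)) ++count;
  return count;
}

/* Counts dependent instance pairs by reasoning on the code directly. */
int count_conflicting_pairs(int n) {
  int count = 0;
  assert(n >= 0);
  for (int i1 = 1; i1 <= n; ++i1)
    for (int i2 = 1; i2 <= n; ++i2)
      for (int j2 = 1; j2 <= n; ++j2) {
        int same_cell = (i1 == i2);      /* S1 writes s[i1], S2 accesses s[i2] */
        int s1_runs_first = (i1 <= i2);  /* in iteration i, S1 precedes the j loop */
        if (same_cell && s1_runs_first) ++count;
      }
  return count;
}
```

For the slide's bounds (`n = 3`) each S1 instance is in dependence with the three S2 instances of the same `i`.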
Reduced dependence graph labeled by dependence polyhedra
Search Space Construction: Way to Go

Space Construction

Goal: go from the set of affine schedules to the set of legal, distinct schedules
Step: causality condition

Property (Causality condition for schedules). Given a dependence $R \delta S$, $\theta_R$ and $\theta_S$ are legal iff for each pair of instances in dependence:

\[ \theta_R(\vec{x}_R) < \theta_S(\vec{x}_S) \]

Equivalently:

\[ \Delta_{R,S} = \theta_S(\vec{x}_S) - \theta_R(\vec{x}_R) - 1 \ge 0 \]
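For a fixed problem size, the causality condition can be tested by brute force. The sketch below does so for the matrix-vector kernel of the motivating example, with the dependences listed by hand: S1(i) before every S2(i, j), and S2(i, j) before S2(i, j2) for j < j2. This is an assumption-laden illustration (hand-written dependences, one value of `n` at a time, hypothetical names). The deck's construction instead characterizes all legal schedules at once through the Farkas lemma.

```c
#include <assert.h>

typedef struct { long t1, t2, t3; } s1_coeffs;      /* theta_S1 = t1*i + t2*n + t3        */
typedef struct { long t1, t2, t3, t4; } s2_coeffs;  /* theta_S2 = t1*i + t2*j + t3*n + t4 */

static long theta_s1(s1_coeffs t, long i, long n) {
  return t.t1 * i + t.t2 * n + t.t3;
}

static long theta_s2(s2_coeffs t, long i, long j, long n) {
  return t.t1 * i + t.t2 * j + t.t3 * n + t.t4;
}

/* Returns 1 when every dependent pair is strictly ordered for this n. */
int is_causal_for(s1_coeffs a, s2_coeffs b, long n) {
  assert(n >= 1);
  for (long i = 0; i < n; ++i)
    for (long j = 0; j < n; ++j) {
      /* S1(i) must run strictly before S2(i, j). */
      if (!(theta_s1(a, i, n) < theta_s2(b, i, j, n))) return 0;
      /* S2(i, j) must run strictly before S2(i, j2) whenever j < j2. */
      for (long j2 = j + 1; j2 < n; ++j2)
        if (!(theta_s2(b, i, j, n) < theta_s2(b, i, j2, n))) return 0;
    }
  return 1;
}
```

With `a = {1, 0, 0}` and `b = {0, 1, 1, 0}` (that is, $\theta_{S1} = i$, $\theta_{S2} = j + n$) the check succeeds for `n = 4`; swapping the shift onto S1 (`a = {1, 1, 0}`, `b = {0, 1, 0, 0}`) makes it fail, since S2 would then run before the initialization.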
Step: Farkas lemma

Lemma (Affine form of Farkas lemma). Let $\mathcal{D}$ be a nonempty polyhedron defined by $A\vec{x} + \vec{b} \ge \vec{0}$. Then any affine function $f(\vec{x})$ is non-negative everywhere in $\mathcal{D}$ iff it is a positive affine combination:

\[ f(\vec{x}) = \lambda_0 + \vec{\lambda}^T (A\vec{x} + \vec{b}), \quad \text{with } \lambda_0 \ge 0 \text{ and } \vec{\lambda} \ge \vec{0}. \]

$\lambda_0$ and $\vec{\lambda}^T$ are called the Farkas multipliers.
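A one-variable worked example may help read the lemma. The polyhedron and the functions below are chosen here purely for illustration and do not come from the slides: take $\mathcal{D} = \{x \mid x - 1 \ge 0,\ 3 - x \ge 0\}$ and $f(x) = 2x - 1$, which is non-negative on $\mathcal{D}$.

```latex
\begin{align*}
f(x) = 2x - 1 &= \lambda_0 + \lambda_1 (x - 1) + \lambda_2 (3 - x) \\
\text{coefficient of } x:\quad 2 &= \lambda_1 - \lambda_2 \\
\text{constant term}:\quad -1 &= \lambda_0 - \lambda_1 + 3\lambda_2 \\
\text{one solution}:\quad \lambda_0 &= 1,\ \lambda_1 = 2,\ \lambda_2 = 0
  \quad\Rightarrow\quad 1 + 2(x - 1) = 2x - 1
\end{align*}
```

Conversely, $g(x) = x - 2$ is negative at $x = 1$, and the same identification gives $\lambda_0 = -1 - 2\lambda_2$, which cannot be non-negative when $\lambda_2 \ge 0$: no valid multipliers exist, as the lemma predicts.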
After applying the causality condition and the Farkas lemma: affine schedules are characterized by valid Farkas multipliers (many-to-one relation with the legal distinct schedules)
Step: identification

\[ \theta_S(\vec{x}_S) - \theta_R(\vec{x}_R) - 1 = \lambda_0 + \vec{\lambda}^T \left( D_{R,S} \begin{pmatrix} \vec{x}_R \\ \vec{x}_S \end{pmatrix} + \vec{d}_{R,S} \right) \ge 0 \]

Equating the coefficients of each term, for the dependence polyhedron $\mathcal{D}_{R \delta S}$:

\[ \begin{array}{rrcl}
i_R: & -t_{1,R} & = & \lambda_{D1,1} - \lambda_{D1,2} + \lambda_{D1,7} \\
i_S: & t_{1,S} & = & \lambda_{D1,3} - \lambda_{D1,4} - \lambda_{D1,7} \\
j_S: & t_{2,S} & = & \lambda_{D1,5} - \lambda_{D1,6} \\
n: & t_{3,S} - t_{2,R} & = & \lambda_{D1,2} + \lambda_{D1,4} + \lambda_{D1,6} \\
1: & t_{4,S} - t_{3,R} - 1 & = & \lambda_{D1,0}
\end{array} \]
Step: projection

- Solve the constraint system
- Use an (optimized) Fourier-Motzkin projection algorithm
  - Reduce redundancy
  - Detect implicit equalities
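The projection step eliminates the Farkas multipliers so that only constraints on the schedule coefficients remain. The sketch below shows one elimination step of plain, textbook Fourier-Motzkin on a tiny system. It is not the optimized variant mentioned on the slide: it performs no redundancy removal and no equality detection, and it computes the rational projection, which can over-approximate the set of integer points in general. `DIM`, `MAXROWS`, `row` and `fm_eliminate` are illustrative names.

```c
#include <assert.h>

#define DIM 2       /* number of variables in the toy system */
#define MAXROWS 64  /* capacity of the output array          */

/* One inequality: c[0]*x_0 + ... + c[DIM-1]*x_{DIM-1} + c[DIM] >= 0. */
typedef struct { long c[DIM + 1]; } row;

/* Eliminates variable k from in[0..n_in) and writes the projected system
   to out (capacity MAXROWS). Returns the number of rows written. */
int fm_eliminate(const row *in, int n_in, int k, row *out) {
  int n_out = 0;
  assert(0 <= k && k < DIM);
  /* Rows that do not involve x_k are kept unchanged. */
  for (int r = 0; r < n_in; ++r)
    if (in[r].c[k] == 0) {
      assert(n_out < MAXROWS);
      out[n_out++] = in[r];
    }
  /* Every (lower bound, upper bound) pair on x_k yields one new row. */
  for (int p = 0; p < n_in; ++p) {
    if (in[p].c[k] <= 0) continue;      /* p must be a lower bound on x_k */
    for (int q = 0; q < n_in; ++q) {
      if (in[q].c[k] >= 0) continue;    /* q must be an upper bound on x_k */
      assert(n_out < MAXROWS);
      /* Positive combination in which the x_k terms cancel. */
      for (int v = 0; v <= DIM; ++v)
        out[n_out].c[v] = -in[q].c[k] * in[p].c[v] + in[p].c[k] * in[q].c[v];
      ++n_out;
    }
  }
  return n_out;
}
```

As a usage example, take the system `x >= 0`, `y - x >= 0`, `4 - y >= 0`, stored as rows `{1, 0, 0}`, `{-1, 1, 0}`, `{0, -1, 4}`. Eliminating `y` (index 1) keeps `x >= 0` and combines the other two rows into `4 - x >= 0`.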
After identification and projection: the valid Farkas multipliers are reduced to valid transformation coefficients, which are in bijection with the legal distinct schedules

One point in the space corresponds to one set of legal schedules w.r.t. the dependence
Overview

Algorithm:
- Add the constraints obtained for each dependence
- Bound the space

Search space: a set of linear constraints on the schedule coefficients (i.e., a Z-polytope)

To each integral point in the space corresponds a distinct program version whose semantics is preserved

[Table: Benchmark, i-bounds, #Sched, #Legal and Time for matmult, locality, fir, h264 and crout; the numeric entries are truncated in this transcript]
Search Space Exploration: Framework for Iterative Optimization

Workflow

- Stages: Source code, Static analysis, Space construction, Space exploration, Kernel generation, Unit generation, Compilation, Run, Target code
- Intermediate results: polyhedral representation of the SCoP, bounded search space, C compilable code
- Iterative compilation and run of the base source code with the transformed SCoP
- Feedback from hardware counter(s)
- Tools: PIPLib and PolyLib (polyhedral computing libraries), CLooG (code generation)
Search Space Exploration: Exhaustive Scan

Performance Distribution [1/2]

[Figure: Performance distribution for matmult and locality. Two panels (matmult, locality) plotting cycles against the transformation identifier, with the original version marked.]
Performance Distribution [2/2]

[Figure: The effect of the compiler. Two panels for crout, (a) GCC -O3 and (b) ICC -fast, plotting cycles against the transformation identifier, with the original version marked.]
Performance Comparison

[Figure: Best Version vs Original]
Search Space Exploration: Heuristic Scan

Heuristic Scan

Propose a decoupling heuristic:
- The general form of the schedule is embedded in the iterator coefficients
- Decouple the schedule into iterator ($\vec{\imath}$), parameter ($\vec{p}$) and constant ($c$) coefficients:

\[ \theta_S(\vec{x}_S) = \begin{pmatrix} \vec{\imath} & \vec{p} & c \end{pmatrix} \begin{pmatrix} \vec{x}_S \\ \vec{n} \\ 1 \end{pmatrix} \]

- Parameter and constant coefficients can be seen as a refinement

Addressing scalability to larger SCoPs:
1. impose a static or dynamic limit on the number of runs (limit to the $\vec{\imath}$ part)
2. replace an exhaustive enumeration of the $\vec{\imath}$ combinations by a limited set of random draws in the $\vec{\imath}$ space
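One possible reading of the decoupling idea is a two-phase scan: first explore the iterator coefficients, completing each candidate with the first legal choice of the remaining coefficients, then keep the best iterator part and refine the parameter and constant coefficients. The sketch below follows that reading under a run budget. It is an interpretation, not the authors' implementation; the completion policy, the callbacks and every name (`schedule`, `decoupled_search`, `toy_legal`, `toy_cost`) are assumptions, and the toy callbacks merely stand in for a legality test and a measured run time.

```c
#include <assert.h>

#define N_ITER 3 /* iterator coefficients (the i part)              */
#define N_REST 4 /* parameter and constant coefficients (p, c part) */

typedef struct { int it[N_ITER]; int rest[N_REST]; } schedule;
typedef int (*legal_fn)(const schedule *);
typedef double (*cost_fn)(const schedule *);

/* Odometer step over [lo, hi]^n; returns 0 once every vector was visited. */
static int next_vector(int *v, int n, int lo, int hi) {
  int k = 0;
  while (k < n && v[k] == hi) v[k++] = lo;
  if (k == n) return 0;
  ++v[k];
  return 1;
}

static void fill(int *v, int n, int value) {
  for (int k = 0; k < n; ++k) v[k] = value;
}

/* Picks the first legal completion of s->it; returns 0 if there is none. */
static int complete(schedule *s, int lo, int hi, legal_fn legal) {
  fill(s->rest, N_REST, lo);
  do {
    if (legal(s)) return 1;
  } while (next_vector(s->rest, N_REST, lo, hi));
  return 0;
}

/* Returns 1 and stores the best schedule found within max_runs evaluations. */
int decoupled_search(int lo, int hi, legal_fn legal, cost_fn cost,
                     int max_runs, schedule *best) {
  schedule s;
  double best_cost = 0.0;
  int runs = 0, found = 0;
  assert(lo <= hi && max_runs > 0);

  /* Phase 1: scan the iterator coefficients, one run per vector. */
  fill(s.it, N_ITER, lo);
  do {
    if (runs >= max_runs) break;
    if (!complete(&s, lo, hi, legal)) continue;
    double c = cost(&s);
    ++runs;
    if (!found || c < best_cost) { best_cost = c; *best = s; found = 1; }
  } while (next_vector(s.it, N_ITER, lo, hi));
  if (!found) return 0;

  /* Phase 2: keep the best iterator part, refine the other coefficients. */
  s = *best;
  fill(s.rest, N_REST, lo);
  do {
    if (runs >= max_runs) break;
    if (!legal(&s)) continue;
    double c = cost(&s);
    ++runs;
    if (c < best_cost) { best_cost = c; *best = s; }
  } while (next_vector(s.rest, N_REST, lo, hi));
  return 1;
}

/* Hypothetical stand-ins for the legality test and the measured run time. */
int toy_legal(const schedule *s) { (void)s; return 1; }

double toy_cost(const schedule *s) {
  double c = 0.0;
  for (int k = 0; k < N_ITER; ++k) c += (s->it[k] - 1) * (s->it[k] - 1);
  for (int k = 0; k < N_REST; ++k) c += s->rest[k] * s->rest[k];
  return c;
}
```

With coefficients in [-1, 1], the two phases evaluate at most 27 + 81 candidates, instead of the 2187 an exhaustive scan of all seven coefficients would need. The second scalability option from the slide, random draws in the iterator space, would replace the odometer in phase 1.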
Results

[Figure: Comparison between random and decoupling heuristics. Three panels (locality, matmult, mvt) plotting the maximum speedup achieved (in %) against the number of runs, for the Decoupling and Random strategies.]

[Figure: Performance distributions for locality, matmult and matvecttransp, plotting cycles against the transformation identifier, with the original version marked.]
Conclusion

- Optimizing and/or enabling transformation framework on top of the compiler
- Encouraging speedups, fast heuristic convergence
- On small kernels, the optimal transformation can be discovered

Ongoing and future work:
- Couple with state-of-the-art feedback-directed iterative methods
- Part II: multidimensional schedules
- Integrate into the GCC GRAPHITE branch
Questions
A Transformation Example

Intricacy of the Transformed Code

Optimal transformation for locality, GCC 4 -O3, P4 Xeon

Statements: S1: B[j] = A[j]; S2: C[j] = A[j + N]

Original code:

    for (i = 0; i <= M; i++) {
      for (j = 0; j <= M; j++) {
        S1(i, j);
        S2(i, j);
      }
    }

Transformed code:

    for (c1 = -N; c1 <= min(-2, M-N); c1++)
      for (j = 0; j <= M; j++)
        S1(c1+N, j);
    for (c1 = -1; c1 <= M-N; c1++) {
      for (j = 0; j <= M; j++)
        S2(c1+1, j);
      for (j = 0; j <= M; j++)
        S1(c1+N, j);
    }
    for (c1 = max(M-N+1, -1); c1 <= M-1; c1++)
      for (j = 0; j <= M; j++)
        S2(c1+1, j);

19.4% speedup, without vectorization
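The three-part loop nest is hard to validate by eye, so a small instance-counting check can help. The sketch below runs the transformed nest with counters in place of the statement bodies and verifies that every S1(i, j) and S2(i, j) with 0 <= i, j <= M executes exactly once. It checks coverage only, not the preservation of dependences. It assumes N >= 1, since for N = 0 the second loop would index outside 0..M; that precondition, the `MAXM` bound and the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <string.h>

#define MAXM 32

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Returns 1 when the transformed nest executes each instance exactly once. */
int transformed_nest_is_complete(int M, int N) {
  static int s1[MAXM + 1][MAXM + 1], s2[MAXM + 1][MAXM + 1];
  assert(0 <= M && M <= MAXM && N >= 1);
  memset(s1, 0, sizeof s1);
  memset(s2, 0, sizeof s2);

  /* Prologue: S1 instances only. */
  for (int c1 = -N; c1 <= imin(-2, M - N); c1++)
    for (int j = 0; j <= M; j++)
      s1[c1 + N][j]++;
  /* Steady state: S2 and S1 instances interleaved. */
  for (int c1 = -1; c1 <= M - N; c1++) {
    for (int j = 0; j <= M; j++)
      s2[c1 + 1][j]++;
    for (int j = 0; j <= M; j++)
      s1[c1 + N][j]++;
  }
  /* Epilogue: S2 instances only. */
  for (int c1 = imax(M - N + 1, -1); c1 <= M - 1; c1++)
    for (int j = 0; j <= M; j++)
      s2[c1 + 1][j]++;

  for (int i = 0; i <= M; i++)
    for (int j = 0; j <= M; j++)
      if (s1[i][j] != 1 || s2[i][j] != 1) return 0;
  return 1;
}
```

Reading the loop bodies back, the nest corresponds to the one-dimensional schedule $\theta_{S1} = i - N$, $\theta_{S2} = i - 1$: S1(i, j) runs at date c1 = i - N and S2(i, j) at date c1 = i - 1.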