Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time


Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen and Nicolas Vasilache
ALCHEMY, INRIA Futurs / University of Paris-Sud XI
March 12, 2007
Fifth International Symposium on Code Generation and Optimization, San Jose, California

Outline

Context of this study:
- Focus on loop nest optimization for regular loops
- Automatic method for parallelism extraction / loop transformation
- Combine iterative methods with the power of the polyhedral model
- Solution independent of the compiler and the target machine

Our contribution:
- Search space construction
  - 1 point in the space corresponds to 1 distinct legal program version
  - Suitable for various exploration methods
- Performance
  - 99% of the best speedup attained within 20 runs of a dedicated heuristic
  - Wall-clock optimal transformation discoverable on small kernels

Scheduling in the Polyhedral Model: A Motivating Example

One-Dimensional Scheduling: original schedule

    for (i = 0; i < n; ++i) {
        S1(i);
        for (j = 0; j < n; ++j)
            S2(i,j);
    }

Schedule: $\theta_{S1} = i$, $\theta_{S2} = i$

The code generated from this schedule is the original loop nest, unchanged.

- Specify the outer-most loop only
- Initial outer-most loop is i

One-Dimensional Scheduling: distribute loops

Schedule: $\theta_{S1} = i$, $\theta_{S2} = i + n$

    for (i = 0; i < n; ++i)
        S1(i);
    for (i = n; i < 2*n; ++i)
        for (j = 0; j < n; ++j)
            S2(i-n,j);

- Specify the outer-most loop only
- All instances of S1 are executed before the first S2 instance

One-Dimensional Scheduling: distribute loops + interchange loops for S2

Schedule: $\theta_{S1} = i$, $\theta_{S2} = j + n$

    for (i = 0; i < n; ++i)
        S1(i);
    for (j = n; j < 2*n; ++j)
        for (i = 0; i < n; ++i)
            S2(i,j-n);

- Specify the outer-most loop only
- The outer-most loop for S2 becomes j
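The distribution and interchange steps can be checked on a concrete kernel. The sketch below assumes that S1 is `s[i] = 0` and S2 is `s[i] = s[i] + a[i][j]*x[j]` (the matrix-vector kernel used later in the talk). The function names and the fixed size `N` are illustrative and do not come from the slides.

```c
#define N 4  /* illustrative problem size (assumption) */

/* Original schedule: theta_S1 = i, theta_S2 = i. */
void matvec_original(double a[N][N], const double x[N], double s[N])
{
    for (int i = 0; i < N; ++i) {
        s[i] = 0.0;                          /* S1(i)   */
        for (int j = 0; j < N; ++j)
            s[i] = s[i] + a[i][j] * x[j];    /* S2(i,j) */
    }
}

/* theta_S1 = i, theta_S2 = j + n: distribution, then interchange for S2. */
void matvec_distributed_interchanged(double a[N][N], const double x[N], double s[N])
{
    for (int i = 0; i < N; ++i)
        s[i] = 0.0;                                  /* S1(i) */
    for (int j = N; j < 2 * N; ++j)                  /* j is shifted by n */
        for (int i = 0; i < N; ++i)
            s[i] = s[i] + a[i][j - N] * x[j - N];    /* S2(i,j-n) */
}

/* Returns 1 if both versions compute the same vector on a small input. */
int same_result(void)
{
    double a[N][N], x[N], s1[N], s2[N];
    for (int i = 0; i < N; ++i) {
        x[i] = i + 1;
        for (int j = 0; j < N; ++j)
            a[i][j] = i * N + j;
    }
    matvec_original(a, x, s1);
    matvec_distributed_interchanged(a, x, s2);
    for (int i = 0; i < N; ++i)
        if (s1[i] != s2[i])
            return 0;
    return 1;
}
```

Each `s[i]` accumulates its terms in the same order of `j` in both versions, so the two result vectors match exactly on this input.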

Transformation   Description
reversal         Changes the direction in which a loop traverses its iteration range
skewing          Makes the bounds of a given loop depend on an outer loop counter
interchange      Exchanges two loops in a perfectly nested loop, a.k.a. permutation
peeling          Extracts one iteration of a given loop
shifting         Allows loops to be reordered
fusion           Fuses two loops, a.k.a. jamming
distribution     Splits a single loop nest into many, a.k.a. fission or splitting


One-Dimensional Scheduling: affine schedules

    for (i = 0; i < n; ++i) {
        s[i] = 0;                            /* S1 */
        for (j = 0; j < n; ++j)
            s[i] = s[i] + a[i][j]*x[j];      /* S2 */
    }

A schedule is an affine function of the iteration vector and the parameters:

$\theta_{S1}(\vec{x}_{S1}) = t_{1,S1} \cdot i_{S1} + t_{2,S1} \cdot n + t_{3,S1} \cdot 1$
$\theta_{S2}(\vec{x}_{S2}) = t_{1,S2} \cdot i_{S2} + t_{2,S2} \cdot j_{S2} + t_{3,S2} \cdot n + t_{4,S2} \cdot 1$

- For $-1 \le t \le 1$, there are $3^7 = 2187$ possible schedules (3 coefficients for S1 plus 4 for S2, each taking one of 3 values)
- But only 129 legal distinct schedules


Scheduling in the Polyhedral Model: Overview

Our Objective

1. Search space construction
   - Efficiently construct a space of all legal, distinct affine schedules

                matmult   locality   fir   h264   crout
     i-bounds   -1,1      -1,1       0,1   -1,1   -3,3
     c-bounds   -1,1      -1,1       0,3   0,4    -3,3
     #Sched
     #Legal

   - Rely on the polyhedral model and Integer Linear Programming to guarantee completeness and correctness of the space properties
   - The search space will encompass unique, distinct compositions of reversal, skewing, interchange, fusion, peeling, shifting, distribution

2. Search space exploration
   - Perform an exhaustive scan to discover the wall-clock optimal schedule, and evidence of the intricacy of the best transformation
   - Build an efficient heuristic to accelerate the space traversal

Search Space Construction: Preliminaries

Polyhedral Representation of Programs

Static Control Parts: loops have affine control only

Iteration domain: represented as integer polyhedra

    for (i = 1; i <= n; ++i)
        for (j = 1; j <= n; ++j)
            if (i <= n-j+2)
                s[i] = ...

$D_{S1} = \begin{bmatrix} 1 & 0 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & -1 & 1 & 0 \\ -1 & -1 & 1 & 2 \end{bmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \ge \vec{0}$
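The iteration domain of the loop nest above can be tested point by point once its bounds and guard are written as rows of affine inequalities over (i, j, n, 1). The following is a minimal sketch of that encoding; the names `DOM_S1`, `in_domain` and `count_points` are illustrative, and the rows are derived from the loop bounds and the `if` condition.

```c
/* Rows applied to (i, j, n, 1); a point is in the domain iff every row is >= 0. */
static const int DOM_S1[5][4] = {
    { 1,  0, 0, -1},   /* i >= 1         */
    {-1,  0, 1,  0},   /* i <= n         */
    { 0,  1, 0, -1},   /* j >= 1         */
    { 0, -1, 1,  0},   /* j <= n         */
    {-1, -1, 1,  2},   /* i <= n - j + 2 */
};

/* Returns 1 if (i, j) belongs to the iteration domain for parameter n. */
int in_domain(int i, int j, int n)
{
    const int v[4] = {i, j, n, 1};
    for (int r = 0; r < 5; ++r) {
        int sum = 0;
        for (int c = 0; c < 4; ++c)
            sum += DOM_S1[r][c] * v[c];
        if (sum < 0)
            return 0;
    }
    return 1;
}

/* Counts the integer points of the domain by scanning a box that contains it. */
int count_points(int n)
{
    int count = 0;
    for (int i = 0; i <= n + 1; ++i)
        for (int j = 0; j <= n + 1; ++j)
            count += in_domain(i, j, n);
    return count;
}
```

For n = 3 the box 1..3 x 1..3 holds 9 points and only (3,3) violates the guard, so `count_points(3)` yields 8.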

Memory accesses: static references, represented as affine functions of $\vec{x}_S$ and $\vec{p}$

    for (i = 0; i < n; ++i) {
        s[i] = 0;
        for (j = 0; j < n; ++j)
            s[i] = s[i] + a[i][j]*x[j];
    }

With $\vec{x}_{S2} = (i, j)^T$:

$f_s(\vec{x}_{S2}) = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix}$

$f_a(\vec{x}_{S2}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix}$

$f_x(\vec{x}_{S2}) = \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix}$
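An affine access function is a matrix applied to the vector (i, j, n, 1). The sketch below evaluates the three references of S2 (`s[i]`, `a[i][j]`, `x[j]`) in that form; the matrices are read off the array subscripts in the code, and the names `F_S`, `F_A`, `F_X` and `apply_access` are illustrative.

```c
/* Access matrices for S2, applied to (i, j, n, 1). */
static const int F_S[1][4] = {{1, 0, 0, 0}};                 /* s[i]    */
static const int F_A[2][4] = {{1, 0, 0, 0}, {0, 1, 0, 0}};   /* a[i][j] */
static const int F_X[1][4] = {{0, 1, 0, 0}};                 /* x[j]    */

/* Computes out = f * (i, j, n, 1)^T for an access matrix with `rows` rows. */
void apply_access(const int (*f)[4], int rows, int i, int j, int n, int *out)
{
    const int v[4] = {i, j, n, 1};
    for (int r = 0; r < rows; ++r) {
        out[r] = 0;
        for (int c = 0; c < 4; ++c)
            out[r] += f[r][c] * v[c];
    }
}
```

For the instance S2(2,3) with n = 5, `F_A` gives the subscript pair (2, 3), `F_S` gives 2 and `F_X` gives 3.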

Data dependence between S1 and S2: a subset of the Cartesian product of $D_{S1}$ and $D_{S2}$ (exact analysis)

    for (i = 1; i <= 3; ++i) {
        s[i] = 0;
        for (j = 1; j <= 3; ++j)
            s[i] = s[i] + 1;
    }

$D_{S1 \delta S2}$ :
$\begin{bmatrix} 1 & -1 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} = 0, \qquad \begin{bmatrix} 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 3 \\ 0 & 1 & 0 & -1 \\ 0 & -1 & 0 & 3 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 3 \end{bmatrix} \cdot \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \ge \vec{0}$

[Figure: dependence between S1 iterations and S2 iterations]

Reduced dependence graph labeled by dependence polyhedra

Search Space Construction: Way to Go

Space construction: from Affine Schedules to Legal Distinct Schedules

Causality condition

Property (Causality condition for schedules). Given $R \delta S$, $\theta_R$ and $\theta_S$ are legal iff for each pair of instances in dependence:

$\theta_R(\vec{x}_R) < \theta_S(\vec{x}_S)$

Equivalently: $\Delta_{R,S} = \theta_S(\vec{x}_S) - \theta_R(\vec{x}_R) - 1 \ge 0$
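The causality condition can be sanity-checked by brute force for one fixed problem size. The sketch below is not the ILP-based construction of the talk: it assumes a single dependence from S1(i) to S2(i,j) for all 0 <= i, j < n (the matrix-vector kernel with 0-based loops) and a single value of n. The function name `is_causal` and the coefficient layout are illustrative.

```c
/* theta_S1(i)   = tR[0]*i + tR[1]*n + tR[2]
 * theta_S2(i,j) = tS[0]*i + tS[1]*j + tS[2]*n + tS[3]
 * Returns 1 if theta_S2 - theta_S1 - 1 >= 0 for every dependent pair
 * S1(i) -> S2(i,j) with 0 <= i, j < n, and 0 otherwise. */
int is_causal(const int tR[3], const int tS[4], int n)
{
    for (int i = 0; i < n; ++i) {
        int theta_r = tR[0] * i + tR[1] * n + tR[2];
        for (int j = 0; j < n; ++j) {
            int theta_s = tS[0] * i + tS[1] * j + tS[2] * n + tS[3];
            if (theta_s - theta_r - 1 < 0)   /* Delta_{R,S} >= 0 violated */
                return 0;
        }
    }
    return 1;
}
```

With theta_S1 = i, the schedules theta_S2 = i + n and theta_S2 = j + n pass this check, while theta_S2 = i - 1 runs S2 before the S1 instance it depends on and is rejected.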

Farkas Lemma

Lemma (Affine form of Farkas lemma). Let $D$ be a nonempty polyhedron defined by $A\vec{x} + \vec{b} \ge \vec{0}$. Then any affine function $f(\vec{x})$ is non-negative everywhere in $D$ iff it is a positive affine combination:

$f(\vec{x}) = \lambda_0 + \vec{\lambda}^T (A\vec{x} + \vec{b})$, with $\lambda_0 \ge 0$ and $\vec{\lambda} \ge \vec{0}$.

$\lambda_0$ and $\vec{\lambda}^T$ are called the Farkas multipliers.
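As a small worked instance of the lemma (an illustration that is not taken from the slides), take the one-dimensional domain $D = \{ i \mid i \ge 0,\; n - 1 - i \ge 0 \}$ with $n \ge 1$, and the function $f(i) = n - i$, which is non-negative on $D$. Matching the coefficients of $i$, $n$ and the constant gives the multipliers:

```latex
\begin{align*}
f(i) = n - i &= \lambda_0 + \lambda_1 \cdot i + \lambda_2 \cdot (n - 1 - i) \\
i &: \; -1 = \lambda_1 - \lambda_2 \\
n &: \; 1 = \lambda_2 \\
1 &: \; 0 = \lambda_0 - \lambda_2 \\
&\Rightarrow \; \lambda_0 = 1, \quad \lambda_1 = 0, \quad \lambda_2 = 1
\end{align*}
```

All three multipliers are non-negative, as the lemma requires. The same coefficient matching, applied to $\theta_S - \theta_R - 1$ over a dependence polyhedron, is what produces the constraints on the schedule coefficients.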

After applying the causality condition and the Farkas lemma:

Affine Schedules -> Valid Farkas Multipliers -> Legal Distinct Schedules

The correspondence between valid Farkas multipliers and legal distinct schedules is many to one.

Identification

$\theta_S(\vec{x}_S) - \theta_R(\vec{x}_R) - 1 = \lambda_0 + \vec{\lambda}^T \left( D_{R,S} \begin{pmatrix} \vec{x}_R \\ \vec{x}_S \end{pmatrix} + \vec{d}_{R,S} \right) \ge 0$

Identifying the coefficients of each term for the dependence $D_{R \delta S}$:

$i_R$ : $-t_{1,R} = \lambda_{D1,1} - \lambda_{D1,2} + \lambda_{D1,7}$
$i_S$ : $t_{1,S} = \lambda_{D1,3} - \lambda_{D1,4} - \lambda_{D1,7}$
$j_S$ : $t_{2,S} = \lambda_{D1,5} - \lambda_{D1,6}$
$n$ : $t_{3,S} - t_{2,R} = \lambda_{D1,2} + \lambda_{D1,4} + \lambda_{D1,6}$
$1$ : $t_{4,S} - t_{3,R} - 1 = \lambda_{D1,0}$

Projection

- Solve the constraint system
- Use an (optimized) Fourier-Motzkin projection algorithm
  - Reduce redundancy
  - Detect implicit equalities
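One step of Fourier-Motzkin projection can be sketched as follows. This is the textbook elimination of a single variable from a system of inequalities, without the optimizations mentioned above (no redundancy removal, no detection of implicit equalities). The row layout, the constant `NVARS` and the name `fm_eliminate` are assumptions made for the example.

```c
#define NVARS 3              /* number of variables (assumption for the sketch) */
#define NCOLS (NVARS + 1)    /* variable coefficients plus constant term */

/* Each row encodes  row[0]*x0 + ... + row[NVARS-1]*x_{NVARS-1} + row[NVARS] >= 0.
 * Eliminates variable `var` from the n_in rows of `in`, writes the projected
 * system to `out` and returns its number of rows, or -1 if more than
 * max_out rows would be needed. */
int fm_eliminate(int in[][NCOLS], int n_in, int var, int out[][NCOLS], int max_out)
{
    int n_out = 0;

    /* Rows that do not mention the variable are kept unchanged. */
    for (int r = 0; r < n_in; ++r) {
        if (in[r][var] != 0)
            continue;
        if (n_out == max_out)
            return -1;
        for (int c = 0; c < NCOLS; ++c)
            out[n_out][c] = in[r][c];
        ++n_out;
    }

    /* Combine every lower bound (positive coefficient) with every upper
     * bound (negative coefficient) so that the variable cancels out. */
    for (int p = 0; p < n_in; ++p) {
        if (in[p][var] <= 0)
            continue;
        for (int q = 0; q < n_in; ++q) {
            if (in[q][var] >= 0)
                continue;
            if (n_out == max_out)
                return -1;
            for (int c = 0; c < NCOLS; ++c)
                out[n_out][c] = -in[q][var] * in[p][c] + in[p][var] * in[q][c];
            ++n_out;
        }
    }
    return n_out;
}
```

For example, eliminating x0 from { x0 - x1 >= 0, -x0 + 3 >= 0, x1 - 1 >= 0 } leaves { x1 - 1 >= 0, -x1 + 3 >= 0 }, that is 1 <= x1 <= 3. An equality can be fed to this sketch as a pair of opposite inequalities.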

After identification and projection:

Affine Schedules -> Valid Farkas Multipliers -> Valid Transformation Coefficients -> Legal Distinct Schedules

The correspondence between valid transformation coefficients and legal distinct schedules is a bijection: one point in the space corresponds to one set of legal schedules w.r.t. the dependence.

Search Space Construction: Overview

Algorithm:
- Add constraints obtained for each dependence
- Bound the space

Search space: a set of linear constraints on the schedule coefficients (i.e. a Z-polytope). To each integral point in the space corresponds a distinct program version where the semantics is preserved.

Benchmark   i-bounds   #Sched   #Legal   Time
matmult     -1,1
locality    -1,1
fir         0,1
h264        -1,1
crout       -3,3

Search Space Exploration: Framework for Iterative Optimization

Workflow

Stages: Source Code -> Static Analysis -> Space Construction -> Space Exploration -> Kernel Generation -> Unit Generation -> Compilation -> Run -> Target Code

- Intermediate artifacts: SCoP representation, polyhedral representation of the SCoP, bounded search space, C compilable code
- Polyhedral computing libraries: PipLib, PolyLib
- Code generation: CLooG
- Iterative compilation and run of the base source code with the transformed SCoP
- Feedback from hardware counter(s)

Search Space Exploration: Exhaustive Scan

Performance Distribution [1/2]

[Figure: Performance distribution for matmult and locality. Two panels (matmult, locality) plot cycles against the transformation identifier, with the original program marked.]

Performance Distribution [2/2]

[Figure: The effect of the compiler. Two panels for crout, (a) GCC -O3 and (b) ICC -fast, plot cycles against the transformation identifier, with the original program marked.]

Performance Comparison

[Figure: Best Version vs Original]

Search Space Exploration: Heuristic Scan

Propose a decoupling heuristic:
- The general form of the schedule is embedded in the iterator coefficients
- Decouple the schedule: $\theta_S(\vec{x}_S) = (\vec{\imath} \;\; \vec{p} \;\; c) \cdot \begin{pmatrix} \vec{x}_S \\ \vec{n} \\ 1 \end{pmatrix}$
- Parameters and constant coefficients can be seen as a refinement

Addressing scalability to larger SCoPs:
1. Impose a static or dynamic limit on the number of runs (limit to the $\vec{\imath}$ part)
2. Replace an exhaustive enumeration of the $\vec{\imath}$ combinations by a limited set of random draws in the $\vec{\imath}$ space
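The decoupling idea can be sketched as a two-phase search: scan the iterator coefficients first, then refine the parameter and constant coefficients for the best iterator part. The code below is a simplified illustration under strong assumptions: one statement with two iterators, all coefficients in [-1, 1], no legality constraints (the real space only contains legal points), and a synthetic cost function `toy_cost` standing in for compiling and running each version. All names are hypothetical.

```c
#include <limits.h>

/* Hypothetical cost callback: cost (e.g. cycles) of the version with iterator
 * coefficients it[0..1], parameter coefficient p and constant c. */
typedef long (*cost_fn)(const int it[2], int p, int c);

/* Synthetic cost used only for illustration; its minimum is at (1, -1, 1, 0). */
long toy_cost(const int it[2], int p, int c)
{
    return (it[0] - 1) * (it[0] - 1) + (it[1] + 1) * (it[1] + 1)
         + (p - 1) * (p - 1) + c * c;
}

/* Phase 1 scans the iterator part with p = c = 0; phase 2 refines p and c for
 * the best iterator part. Writes {i0, i1, p, c} to best and returns the
 * number of cost evaluations (runs). */
int decoupled_search(cost_fn cost, int best[4])
{
    int runs = 0;
    long best_cost = LONG_MAX;
    int it[2];

    for (it[0] = -1; it[0] <= 1; ++it[0])
        for (it[1] = -1; it[1] <= 1; ++it[1]) {
            long cst = cost(it, 0, 0);
            ++runs;
            if (cst < best_cost) {
                best_cost = cst;
                best[0] = it[0]; best[1] = it[1]; best[2] = 0; best[3] = 0;
            }
        }

    it[0] = best[0];
    it[1] = best[1];
    for (int p = -1; p <= 1; ++p)
        for (int c = -1; c <= 1; ++c) {
            long cst = cost(it, p, c);
            ++runs;
            if (cst < best_cost) {
                best_cost = cst;
                best[2] = p; best[3] = c;
            }
        }
    return runs;
}
```

On this toy cost the search needs 18 runs instead of the 81 of an exhaustive scan over four coefficients. It reaches the global minimum here only because the toy cost is separable; on real kernels the decoupling is a heuristic.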

Heuristic Scan: Results

[Figure: Comparison between random and decoupling heuristics. Three panels (locality, matmult, mvt) plot the maximum speedup achieved (in %) against the number of runs, for the Decoupling and Random heuristics.]

[Figure: Performance distributions for locality, matmult and matvecttransp, plotting cycles against the transformation identifier, with the original program marked.]

Conclusion

- Optimizing and/or enabling transformation framework on top of the compiler
- Encouraging speedups, fast heuristic convergence
- On small kernels, the optimal transformation can be discovered

Ongoing and future work:
- Couple with state-of-the-art feedback-directed iterative methods
- Part II: multidimensional schedules
- Integrate into the GCC GRAPHITE branch

Questions

Questions: A Transformation Example

Intricacy of the Transformed Code

Optimal transformation for locality, GCC 4 -O3, P4 Xeon

S1: B[j] = A[j]
S2: C[j] = A[j + n]

Original code:

    for (i = 0; i <= m; i++) {
        for (j = 0; j <= m; j++) {
            S1(i,j);
            S2(i,j);
        }
    }

Transformed code:

    for (c1 = -n; c1 <= min(-2,m-n); c1++)
        for (j = 0; j <= m; j++)
            S1(c1+n,j);
    for (c1 = -1; c1 <= m-n; c1++) {
        for (j = 0; j <= m; j++)
            S2(c1+1,j);
        for (j = 0; j <= m; j++)
            S1(c1+n,j);
    }
    for (c1 = max(m-n+1,-1); c1 <= m-1; c1++)
        for (j = 0; j <= m; j++)
            S2(c1+1,j);

19.4% speedup, without vectorization
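Because the transformed loop nest is hard to read, a quick way to gain confidence in it is to count how often each statement instance runs. The sketch below replaces S1 and S2 with counters and checks that every instance of the original nest (0 <= i, j <= m) is executed exactly once. It assumes m >= 0 and n >= 1 and treats the parameter in `S1(c1+n,j)` as the same n as in the loop bounds; the function name `covers_exactly_once` is illustrative. It checks coverage only, not the ordering required by the dependences.

```c
#include <stdlib.h>

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Returns 1 if the transformed nest runs every S1(i,j) and S2(i,j),
 * 0 <= i, j <= m, exactly once; 0 otherwise or if m < 0 or n < 1;
 * -1 on allocation failure. */
int covers_exactly_once(int m, int n)
{
    if (m < 0 || n < 1)
        return 0;

    int size = (m + 1) * (m + 1);
    int *s1 = calloc((size_t)size, sizeof *s1);
    int *s2 = calloc((size_t)size, sizeof *s2);
    int ok = 1;
    if (!s1 || !s2) {
        free(s1);
        free(s2);
        return -1;
    }

    for (int c1 = -n; c1 <= imin(-2, m - n); c1++)
        for (int j = 0; j <= m; j++)
            s1[(c1 + n) * (m + 1) + j]++;        /* S1(c1+n,j) */
    for (int c1 = -1; c1 <= m - n; c1++) {
        for (int j = 0; j <= m; j++)
            s2[(c1 + 1) * (m + 1) + j]++;        /* S2(c1+1,j) */
        for (int j = 0; j <= m; j++)
            s1[(c1 + n) * (m + 1) + j]++;        /* S1(c1+n,j) */
    }
    for (int c1 = imax(m - n + 1, -1); c1 <= m - 1; c1++)
        for (int j = 0; j <= m; j++)
            s2[(c1 + 1) * (m + 1) + j]++;        /* S2(c1+1,j) */

    for (int k = 0; k < size; k++)
        if (s1[k] != 1 || s2[k] != 1)
            ok = 0;

    free(s1);
    free(s2);
    return ok;
}
```

The three loops correspond to a prologue that runs only S1, a steady state that runs both statements, and an epilogue that runs only S2; depending on m and n, the prologue or the steady state may be empty.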


More information

A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs

A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs Muthu Manikandan Baskaran Department of Computer Science and Engg. The Ohio State University baskaran@cse.ohiostate.edu J. Ramanujam

More information

A polyhedral loop transformation framework for parallelization and tuning

A polyhedral loop transformation framework for parallelization and tuning A polyhedral loop transformation framework for parallelization and tuning Ohio State University Uday Bondhugula, Muthu Baskaran, Albert Hartono, Sriram Krishnamoorthy, P. Sadayappan Argonne National Laboratory

More information

Offload acceleration of scientific calculations within.net assemblies

Offload acceleration of scientific calculations within.net assemblies Offload acceleration of scientific calculations within.net assemblies Lebedev A. 1, Khachumov V. 2 1 Rybinsk State Aviation Technical University, Rybinsk, Russia 2 Institute for Systems Analysis of Russian

More information

Polly Polyhedral Optimizations for LLVM

Polly Polyhedral Optimizations for LLVM Polly Polyhedral Optimizations for LLVM Tobias Grosser - Hongbin Zheng - Raghesh Aloor Andreas Simbürger - Armin Grösslinger - Louis-Noël Pouchet April 03, 2011 Polly - Polyhedral Optimizations for LLVM

More information

Page 1. Parallelization techniques. Dependence graph. Dependence Distance and Distance Vector

Page 1. Parallelization techniques. Dependence graph. Dependence Distance and Distance Vector Parallelization techniques The parallelization techniques for loops normally follow the following three steeps: 1. Perform a Data Dependence Test to detect potential parallelism. These tests may be performed

More information
