Program transformations and optimizations in the polyhedral framework
1 Program transformations and optimizations in the polyhedral framework. Louis-Noël Pouchet, Dept. of Computer Science, UCLA. May 14, 2013, 1st Polyhedral Spring School, St-Germain au Mont d'Or, France
2 : Overview UCLA 2
3 Overview & Motivation: Agenda for the Lecture
Disclaimer: this lecture is (mainly) for new Ph.D. students
  1 Program Representation in the Polyhedral Model
  2 Applying loop transformations in the model
  3 Exact data dependence representation
  4 Scheduling using Farkas-based methods
  5 Scheduling using approximate data dependences
If time permits...
4 Overview & Motivation: Running Example: DGEMM
Example (dgemm)
    /* C := alpha*A*B + beta*C */
    for (i = 0; i < ni; i++)
      for (j = 0; j < nj; j++)
        S1: C[i][j] *= beta;
    for (i = 0; i < ni; i++)
      for (j = 0; j < nj; j++)
        for (k = 0; k < nk; ++k)
          S2: C[i][j] += alpha * A[i][k] * B[k][j];
Loop transformation: permute(i,k,S2)
Execution time (in s) on this laptop, GCC 4.2, ni=nj=nk=512
[Table: rows "original" and "permute"; columns -O0, -O1, -O2, -O3 -vec; values missing]
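A minimal sketch of what permute(i,k,S2) could look like, assuming it means exchanging loops i and k of statement S2 (giving the order k, j, i). The function names and the flat row-major storage are assumptions made for this illustration, not part of the lecture. Both orders accumulate into each C[i][j] with k increasing, so the results should match.

```c
/* Statement S2 of dgemm with the original loop order (i, j, k).
   Matrices are flat row-major arrays (an assumption of this sketch). */
void dgemm_s2_ijk(int ni, int nj, int nk, double alpha,
                  const double *A, const double *B, double *C)
{
    for (int i = 0; i < ni; i++)
        for (int j = 0; j < nj; j++)
            for (int k = 0; k < nk; k++)
                C[i * nj + j] += alpha * A[i * nk + k] * B[k * nj + j];
}

/* Same statement after exchanging loops i and k: order (k, j, i). */
void dgemm_s2_kji(int ni, int nj, int nk, double alpha,
                  const double *A, const double *B, double *C)
{
    for (int k = 0; k < nk; k++)
        for (int j = 0; j < nj; j++)
            for (int i = 0; i < ni; i++)
                C[i * nj + j] += alpha * A[i * nk + k] * B[k * nj + j];
}

/* Returns 1 if both loop orders produce the same C on a small input. */
int permute_preserves_result(void)
{
    enum { NI = 3, NJ = 4, NK = 5 };
    double A[NI * NK], B[NK * NJ], C1[NI * NJ], C2[NI * NJ];

    for (int x = 0; x < NI * NK; x++) A[x] = (double)(x % 7);
    for (int x = 0; x < NK * NJ; x++) B[x] = (double)(x % 5);
    for (int x = 0; x < NI * NJ; x++) C1[x] = C2[x] = 1.0;

    dgemm_s2_ijk(NI, NJ, NK, 2.0, A, B, C1);
    dgemm_s2_kji(NI, NJ, NK, 2.0, A, B, C2);

    for (int x = 0; x < NI * NJ; x++)
        if (C1[x] != C2[x])
            return 0;
    return 1;
}
```

The permutation changes only the order in which statement instances run, which is why the timing differs while the result does not.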
5 Overview & Motivation: Running Example: fdtd-2d
Example (fdtd-2d)
    for (t = 0; t < tmax; t++) {
      for (j = 0; j < ny; j++)
        ey[0][j] = _edge_[t];
      for (i = 1; i < nx; i++)
        for (j = 0; j < ny; j++)
          ey[i][j] = ey[i][j] - 0.5*(hz[i][j]-hz[i-1][j]);
      for (i = 0; i < nx; i++)
        for (j = 1; j < ny; j++)
          ex[i][j] = ex[i][j] - 0.5*(hz[i][j]-hz[i][j-1]);
      for (i = 0; i < nx - 1; i++)
        for (j = 0; j < ny - 1; j++)
          hz[i][j] = hz[i][j] - 0.7*(ex[i][j+1] - ex[i][j] + ey[i+1][j] - ey[i][j]);
    }
Loop transformation: polyhedralopt(fdtd-2d)
Execution time (in s) on this laptop, GCC 4.2, 64x1024x1024
[Table: rows "original" and "polyhedralopt"; columns -O0, -O1, -O2, -O3 -vec; values missing]
6 Overview & Motivation: Manual transformations? Example (fdtd-2d): the same loop nest as on the previous slide. UCLA 6
7 Overview & Motivation: NO NO NO!!! Example (fdtd-2d tiled) for (c0 = 0; c0 <= (((ny + 2 * tmax + -3) * 32 < 0?((32 < 0?-((-(ny + 2 * tmax + -3) ) / 32) : -((-(ny + 2 * tmax + -3) ) / 32))) : (ny + 2 * tmax + -3) / 32)); ++c0) { #pragma omp parallel for private(c3, c4, c2, c5) for (c1 = (((c0 * 2 < 0?-(-c0 / 2) : ((2 < 0?(-c ) / -2 : (c ) / 2)))) > (((32 * c0 + -tmax + 1) * 32 < 0?-(-(32 * c0 + -tmax + 1) / 32) : ((32 < 0?(-(32 * c0 + -tmax + 1) ) / -32 : (32 * c0 + -tmax ) / 32))))?((c0 * 2 < 0?-(-c0 / 2) : ((2 < 0?(-c ) / -2 : (c ) / 2)))) : (((32 * c0 + -tmax + 1) * 32 < 0?-(-(32 * c0 + -tmax + 1) / 32) : ((32 < 0?(-(32 * c0 + -tmax + 1) ) / -32 : (32 * c0 + -tmax ) / 32))))); c1 <= (((((((ny + tmax + -2) * 32 < 0?((32 < 0?-((-(ny + tmax + -2) ) / 32) : -((-(ny + tmax + -2) ) / 32))) : (ny + tmax + -2) / 32)) < (((32 * c0 + ny + 30) * 64 < 0?((64 < 0?-((-(32 * c0 + ny + 30) ) / 64) : -((-(32 * c0 + ny + 30) ) / 64))) : (32 * c0 + ny + 30) / 64))?(((ny + tmax + -2) * 32 < 0?((32 < 0?-((-(ny + tmax + -2) ) / 32) : -((-(ny + tmax + -2) ) / 32))) : (ny + tmax + -2) / 32)) : (((32 * c0 + ny + 30) * 64 < 0?((64 < 0?-((-(32 * c0 + ny + 30) ) / 64) : -((-(32 * c0 + ny + 30) ) / 64))) : (32 * c0 + ny + 30) / 64)))) < c0?(((((ny + tmax + -2) * 32 < 0?((32 < 0?-((-(ny + tmax + -2) ) / 32) : -((-(ny + tmax + -2) ) / 32))) : (ny + tmax + -2) / 32)) < (((32 * c0 + ny + 30) * 64 < 0?((64 < 0?-((-(32 * c0 + ny + 30) ) / 64) : -((-(32 * c0 + ny + 30) ) / 64))) : (32 * c0 + ny + 30) / 64))?(((ny + tmax + -2) * 32 < 0?((32 < 0?-((-(ny + tmax + -2) ) / 32) : -((-(ny + tmax + -2) ) / 32))) : (ny + tmax + -2) / 32)) : (((32 * c0 + ny + 30) * 64 < 0?((64 < 0?-((-(32 * c0 + ny + 30) ) / 64) : -((-(32 * c0 + ny + 30) ) / 64))) : (32 * c0 + ny + 30) / 64)))) : c0)); ++c1) { for (c2 = c0 + -c1; c2 <= (((((tmax + nx + -2) * 32 < 0?((32 < 0?-((-(tmax + nx + -2) ) / 32) : -((-(tmax + nx + -2) ) / 32))) : (tmax + nx + -2) / 32)) < (((32 * c * c1 + nx + 30) * 32 < 0?((32 < 0?-((-(32 * c * c1 
+ nx + 30) ) / 32) : -((-(32 * c * c1 + nx + 30) ) / 32))) : (32 * c * c1 + nx + 30) / 32))?(((tmax + nx + -2) * 32 < 0?((32 < 0?-((-(tmax + nx + -2) ) / 32) : -((-(tmax + nx + -2) ) / 32))) : (tmax + nx + -2) / 32)) : (((32 * c * c1 + nx + 30) * 32 < 0?((32 < 0?-((-(32 * c * c1 + nx + 30) ) / 32) : -((-(32 * c * c1 + nx + 30) ) / 32))) : (32 * c * c1 + nx + 30) / 32)))); ++c2) { if (c0 == 2 * c1 && c0 == 2 * c2) { for (c3 = 16 * c0; c3 <= ((tmax + -1 < 16 * c0 + 30?tmax + -1 : 16 * c0 + 30)); ++c3) if (c0 % 2 == 0) (ey[0])[0] = (_edge_[c3]);... (200 more lines!) Performance gain: 2-6 on modern multicore platforms UCLA 6
8 Overview & Motivation: Loop Transformations in Production Compilers
Limitations of standard syntactic frameworks:
  Composition of transformations may be tedious: composability rules / applicability
  Parametric loop bounds and imperfectly nested loops are challenging (look at the examples!)
  Approximate dependence analysis: misses parallelization opportunities (among many others)
  (Very) conservative performance models
9 Overview & Motivation: Achievements of Polyhedral Compilation
The polyhedral model:
  Model/apply seamlessly arbitrary compositions of transformations
    Automatic handling of imperfectly nested, parametric loop structures
    Many loop transformations can be modeled
  Exact dependence analysis on a class of programs
    Unleash the power of automatic parallelization
    Aggressive multi-objective program restructuring (parallelism, SIMD, cache, etc.)
  Requires computationally expensive algorithms
    Usually NP-complete / exponential complexity
    Requires careful problem statement/representation
10 Overview & Motivation: Research around Polyhedral Compilation
"first generation"
  Provided the theoretical foundations
  Focused on high-level/abstract objectives
  Was questioned about how practical this model is
"second generation"
  Provided practical / "more scalable" algorithms
  Focused on concrete performance improvement
  Still often questioned about how practical this model is
Thanks for the legacy! ;-)
12 Overview & Motivation: Perspective on 25 Years of Computing
"first generation"
  Parallel machines were... rare
  Era of increased program performance for free
  Intel 486 DX2: 54 MIPS, 66MHz
"second generation"
  Parallel machines are ubiquitous (multi-core, SIMD, GPUs)
  Very difficult to get good performance
  Intel Core i7: MIPS, 3.33 GHz
More complex problems can be solved, but more complex objectives are needed anyway
13 Overview & Motivation: Disclaimer
This lecture is:
  Partial
  Biased (strongly!)
  Provocative at times (hopefully ;-) )
[Figure: screenshot of a Google Scholar search for "polyhedral model affine scheduling", about 3,520 results, taken on 5/14/13]
14 Overview & Motivation: Polyhedral Program Representation UCLA 12
15 Polyhedral Program Representation: Basics Example: DGEMM
Example (dgemm)
    /* C := alpha*A*B + beta*C */
    for (i = 0; i < ni; i++)
      for (j = 0; j < nj; j++) {
        S1: C[i][j] *= beta;
        for (k = 0; k < nk; ++k)
          S2: C[i][j] += alpha * A[i][k] * B[k][j];
      }
This code has:
  imperfectly nested loops
  multiple statements
  parametric loop bounds
16 Polyhedral Program Representation: Basics Granularity of Program Representation
DGEMM has:
  3 loops
    For loops in the code, while loops
    Control-flow graph analysis
  2 (syntactic) statements
    Input source code? Basic block? ASM instructions?
  S1 is executed $ni \times nj$ times
    dynamic instances of the statement
    Does not (necessarily) correspond to reality!
17 Polyhedral Program Representation: Basics Some Observations
Reasoning at the loop/statement level:
  Some loop transformations may be very difficult to implement
    How to fuse loops with different loop bounds?
    How to permute triangular loops?
    How to unroll-and-jam triangular loops?
    How to apply time-tiling?
    ...
  Statements may operate on the same array while being independent
18 Polyhedral Program Representation: Basics Some Motivations for Polyhedral Transformations
Known problem: scheduling of a task graph
Obvious limitations: the task graph is not finite / its size depends on the problem / effective code generation is almost impossible
Alternative approach: use loop transformations
  solves all the above limitations
  BUT the problem is to find a sequence that implements the order we want
  AND also how to apply/compose them
Desired features:
  ability to reason at the instance level (as for task graph scheduling)
  ability to easily apply/compose loop transformations
19 The Polyhedral Model: The Polyhedral Model UCLA 17
22 The Polyhedral Model: Overview Polyhedral Program Optimization: a Three-Stage Process
1 Analysis: from code to model
    Existing prototype tools:
      URUK, Omega, ChiLL...
      ISL, Loopo
      PolyOpt/PoCC (Clan-Candl-LetSee-Pluto-Ponos-Cloog-Polylib-PIPLib-ISL-FM-...)
    GCC GRAPHITE, LLVM Polly (now in mainstream)
    Reservoir Labs R-Stream, IBM XL/Poly
2 Transformation in the model
    Build and select a program transformation
3 Code generation: from model to code
    "Apply" the transformation in the model
    Regenerate syntactic (AST-based) code
23 The Polyhedral Model: Overview Motivating Example [1/2] Example for (i = 0; i <= 1; ++i) for (j = 0; j <= 2; ++j) A[i][j] = i * j; Program execution: 1: A[0][0] = 0 * 0; 2: A[0][1] = 0 * 1; 3: A[0][2] = 0 * 2; 4: A[1][0] = 1 * 0; 5: A[1][1] = 1 * 1; 6: A[1][2] = 1 * 2; UCLA 19
24 The Polyhedral Model: Overview Motivating Example [2/2]
A few observations:
  The statement is executed 6 times
  There are different values of i, j associated with these 6 instances
  There is an order on them (the execution order)
A rough analogy: polyhedral compilation is about (statically) scheduling tasks, where tasks are statement instances, or operations
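The observations above can be sketched in C: each instance is identified by its (i, j) values, and the execution order assigns it a rank from 1 to 6. The helper names `instance_rank` and `count_instances` are hypothetical, introduced only for this illustration.

```c
/* Rank (1-based) of instance (i, j) in the execution order of
   for (i = 0; i <= 1; ++i) for (j = 0; j <= 2; ++j) A[i][j] = i * j; */
int instance_rank(int i, int j)
{
    return i * 3 + j + 1;
}

/* Runs the loop nest and checks that the k-th executed instance has rank k.
   Returns the number of executed instances, or -1 on a mismatch. */
int count_instances(void)
{
    int A[2][3];
    int executed = 0;

    for (int i = 0; i <= 1; ++i)
        for (int j = 0; j <= 2; ++j) {
            A[i][j] = i * j;
            executed++;
            if (instance_rank(i, j) != executed)
                return -1;
        }
    (void)A; /* the array contents are not needed for the check */
    return executed;
}
```

The rank function plays the role of a (one-dimensional) schedule for this tiny example: changing it would describe a different execution order of the same six instances.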
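26 The Polyhedral Model: Overview Polyhedral Representation of Programs
Static Control Parts
  Loops have affine control only (over-approximation otherwise)
  Iteration domain: represented as integer polyhedra
    for (i=1; i<=n; ++i)
      for (j=1; j<=n; ++j)
        if (i<=n-j+2)
          s[i] = ...
$$\mathcal{D}_{S1} = \left\{ (i,j) \;\middle|\; \begin{bmatrix} 1 & 0 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & -1 & 1 & 0 \\ -1 & -1 & 1 & 2 \end{bmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \geq \vec{0} \right\}$$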
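As an illustration, the iteration domain of the guarded statement can be written as a conjunction of affine inequalities and checked against the loop nest itself. This is a sketch; the function names are hypothetical, and the constraints are simply the loop bounds and the guard rewritten in the form "affine expression >= 0".

```c
/* 1 if (i, j) belongs to D_S1 = { 1 <= i <= n, 1 <= j <= n, i <= n - j + 2 },
   with each constraint written as "affine expression >= 0". */
int in_domain_s1(int i, int j, int n)
{
    return  i - 1 >= 0
        && -i + n >= 0
        &&  j - 1 >= 0
        && -j + n >= 0
        && -i - j + n + 2 >= 0;
}

/* Number of dynamic instances, counted by executing the loop nest. */
long count_by_loops(int n)
{
    long count = 0;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= n; ++j)
            if (i <= n - j + 2)
                count++;
    return count;
}

/* Number of integer points of D_S1, counted by scanning a bounding box
   and testing the affine constraints. */
long count_by_constraints(int n)
{
    long count = 0;
    for (int i = 0; i <= n + 1; ++i)
        for (int j = 0; j <= n + 1; ++j)
            if (in_domain_s1(i, j, n))
                count++;
    return count;
}
```

Both counts agree for any n, since the polyhedron contains exactly one integer point per executed instance.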
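27 The Polyhedral Model: Overview Polyhedral Representation of Programs
Static Control Parts
  Loops have affine control only (over-approximation otherwise)
  Iteration domain: represented as integer polyhedra
  Memory accesses: static references, represented as affine functions of $\vec{x}_S$ and $\vec{p}$
    for (i=0; i<n; ++i) {
      s[i] = 0;
      for (j=0; j<n; ++j)
        s[i] = s[i]+a[i][j]*x[j];
    }
$$f_s(\vec{x}_{S2}) = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix} \qquad f_a(\vec{x}_{S2}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix} \qquad f_x(\vec{x}_{S2}) = \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix}$$
with $\vec{x}_{S2} = (i, j)^T$.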
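28 The Polyhedral Model: Overview Polyhedral Representation of Programs
Static Control Parts
  Loops have affine control only (over-approximation otherwise)
  Iteration domain: represented as integer polyhedra
  Memory accesses: static references, represented as affine functions of $\vec{x}_S$ and $\vec{p}$
  Data dependence between S1 and S2: a subset of the Cartesian product of $\mathcal{D}_{S1}$ and $\mathcal{D}_{S2}$ (exact analysis)
    for (i=1; i<=3; ++i) {
      s[i] = 0;
      for (j=1; j<=3; ++j)
        s[i] = s[i] + 1;
    }
$$\mathcal{D}_{S1 \delta S2} : \begin{bmatrix} 1 & -1 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} = 0, \qquad \begin{bmatrix} 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 3 \\ 0 & 1 & 0 & -1 \\ 0 & -1 & 0 & 3 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 3 \end{bmatrix} \cdot \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \geq \vec{0}$$
[Figure: dependence pairs plotted as S1 iterations against S2 iterations]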
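A small sketch of the dependence polyhedron for this example: a pair of instances is in the relation when S1(i_S1) and S2(i_S2, j_S2) touch the same cell s[i], that is when i_S1 = i_S2, and both instances lie in their iteration domains. The function names are hypothetical.

```c
/* 1 if instances S1(i1) and S2(i2, j2) are in the dependence polyhedron:
   same cell s[i] (i1 == i2), and both instances inside their domains. */
int in_dependence(int i1, int i2, int j2)
{
    return i1 == i2
        && 1 <= i1 && i1 <= 3
        && 1 <= i2 && i2 <= 3
        && 1 <= j2 && j2 <= 3;
}

/* Counts the integer points of the dependence polyhedron by scanning
   a box that strictly contains both iteration domains. */
int count_dependence_pairs(void)
{
    int count = 0;
    for (int i1 = 0; i1 <= 4; ++i1)
        for (int i2 = 0; i2 <= 4; ++i2)
            for (int j2 = 0; j2 <= 4; ++j2)
                count += in_dependence(i1, i2, j2);
    return count;
}
```

Each of the 3 instances of S1 is related to the 3 instances of S2 that share its value of i, which gives 9 pairs out of the 27 points of the Cartesian product.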
29 The Polyhedral Model: Overview Math Corner UCLA 22
30 The Polyhedral Model: Math Corner The Affine Qualifier
Definition (Affine function): A function $f : K^m \to K^n$ is affine if there exists a vector $\vec{b} \in K^n$ and a matrix $A \in K^{n \times m}$ such that: $\forall \vec{x} \in K^m,\ f(\vec{x}) = A\vec{x} + \vec{b}$
Definition (Affine half-space): An affine half-space of $K^m$ (affine constraint) is defined as the set of points: $\{\vec{x} \in K^m \mid \vec{a} \cdot \vec{x} \leq b\}$
31 The Polyhedral Model: Math Corner Polyhedron (Implicit Representation)
Definition (Polyhedron): A set $P \subseteq K^m$ is a polyhedron if there exists a system of a finite number of inequalities $A\vec{x} \leq \vec{b}$ such that: $P = \{\vec{x} \in K^m \mid A\vec{x} \leq \vec{b}\}$
Equivalently, it is the intersection of finitely many half-spaces.
Definition (Polytope): A polytope is a bounded polyhedron.
32 The Polyhedral Model: Math Corner Integer Polyhedron Definition (Z-polyhedron) It is a polyhedron where all its extreme points are integer valued Definition (Integer hull) The integer hull of a rational polyhedron P is the largest set of integer points such that each of these points is in P. For the moment, we will "say" an integer polyhedron is a polyhedron of integer points (language abuse) UCLA 25
33 The Polyhedral Model: Math Corner Rational and Integer Polytopes UCLA 26
34 The Polyhedral Model: Math Corner Another View of Polyhedra
The dual representation models a polyhedron as a combination of lines L and rays R (forming the polyhedral cone) and vertices V (forming the polytope)
Definition (Dual representation): $P : \{\vec{x} \in \mathbb{Q}^n \mid \vec{x} = L\vec{\lambda} + R\vec{\mu} + V\vec{\nu},\ \vec{\mu} \geq 0,\ \vec{\nu} \geq 0,\ \sum_i \nu_i = 1\}$
Definition (Face): A face F of P is the intersection of P with a supporting hyperplane of P. We have: $\dim(F) \leq \dim(P)$
Definition (Facet): A facet F of P is a face of P such that: $\dim(F) = \dim(P) - 1$
35 The Polyhedral Model: Math Corner Some Equivalence Properties
Theorem (Fundamental Theorem on Polyhedral Decomposition): If P is a polyhedron, then it can be decomposed as a polytope V plus a polyhedral cone L.
Theorem (Equivalence of Representations): Every polyhedron has both an implicit and a dual representation
  Chernikova's algorithm can compute the dual representation from the implicit one
  The dual representation is heavily used in polyhedral compilation
  Some works operate on the constraint-based representation (Pluto)
36 The Polyhedral Model: Math Corner Parametric Polyhedra
Definition (Parametric polyhedron): Given $\vec{n}$ the vector of symbolic parameters, P is a parametric polyhedron if it is defined by: $P = \{\vec{x} \in K^m \mid A\vec{x} \leq B\vec{n} + \vec{b}\}$
  Requires adapting theory and tools to parameters
  Can become nasty: case distinctions (QUAST)
  Reflects nicely the program context
37 The Polyhedral Model: Math Corner Some Useful Algorithms All extended to parametric polyhedra: Compute the facets of a polytope: PolyLib [Wilde et al] Compute the volume of a polytope (number of points): Barvinok [Claus/Verdoolaege] Scan a polytope (code generation): CLooG [Quillere/Bastoul] Find the lexicographic minimum: PIP [Feautrier] UCLA 30
38 The Polyhedral Model: Math Corner Operations on Polyhedra
Definition (Intersection): The intersection of two convex sets $P_1$ and $P_2$ is a convex set $P$: $P = \{\vec{x} \in K^m \mid \vec{x} \in P_1 \wedge \vec{x} \in P_2\}$
Definition (Union): The union of two convex sets $P_1$ and $P_2$ is a set $P$: $P = \{\vec{x} \in K^m \mid \vec{x} \in P_1 \vee \vec{x} \in P_2\}$
The union of two convex sets may not be a convex set
39 The Polyhedral Model: Math Corner Lattices
Definition (Lattice): A subset L of $\mathbb{Q}^n$ is a lattice if it is generated by integral combinations of finitely many vectors $a_1, a_2, \dots, a_n$ ($a_i \in \mathbb{Q}^n$). If the $a_i$ vectors have integral coordinates, L is an integer lattice.
Definition (Z-polyhedron): A Z-polyhedron is the intersection of a polyhedron and an affine integral full dimensional lattice.
40 The Polyhedral Model: Math Corner Pictured Example
Example of a Z-polyhedron:
  $Q_1 = \{i, j \mid 0 \leq i \leq 5,\ 0 \leq 3j \leq 20\}$
  $L_1 = \{2i + 1,\ 3j + 5 \mid i, j \in \mathbb{Z}\}$
  $Z_1 = Q_1 \cap L_1$
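The pictured Z-polyhedron can be enumerated with a few lines of C. This is a sketch under the reading that the lattice L1 contains the points whose first coordinate has the form 2a + 1 and whose second coordinate has the form 3b + 5, for integers a and b; the helper names are hypothetical.

```c
/* Mathematical (non-negative) remainder of a modulo m, for m > 0. */
static int mod(int a, int m)
{
    int r = a % m;
    return r < 0 ? r + m : r;
}

/* Polyhedron Q1 = { (i, j) | 0 <= i <= 5, 0 <= 3j <= 20 }. */
int in_q1(int i, int j)
{
    return 0 <= i && i <= 5 && 0 <= 3 * j && 3 * j <= 20;
}

/* Lattice L1: first coordinate of the form 2a + 1, second of the form 3b + 5. */
int in_l1(int i, int j)
{
    return mod(i - 1, 2) == 0 && mod(j - 5, 3) == 0;
}

/* Counts the points of Z1 = Q1 intersected with L1 by scanning a box
   slightly larger than Q1. */
int count_z1(void)
{
    int count = 0;
    for (int i = -1; i <= 6; ++i)
        for (int j = -1; j <= 7; ++j)
            if (in_q1(i, j) && in_l1(i, j))
                count++;
    return count;
}
```

Under these assumptions the points of Z1 are those with i in {1, 3, 5} and j in {2, 5}, whereas Q1 alone contains 6 x 7 = 42 integer points.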
41 The Polyhedral Model: Math Corner Quick Facts on Z-polyhedra Iteration domains are in fact Z-polyhedra with unit lattice Intersection of Z-polyhedra is not convex in general Union is complex to compute Can count points, can optimize, can scan Implementation available for most operations in PolyLib UCLA 34
42 The Polyhedral Model: Math Corner Image and Preimage
Definition (Image): The image of a polyhedron $P \subseteq \mathbb{Z}^n$ by an affine function $f : \mathbb{Z}^n \to \mathbb{Z}^m$ is a Z-polyhedron $P'$: $P' = \{f(\vec{x}) \in \mathbb{Z}^m \mid \vec{x} \in P\}$
Definition (Preimage): The preimage of a polyhedron $P \subseteq \mathbb{Z}^m$ by an affine function $f : \mathbb{Z}^n \to \mathbb{Z}^m$ is a Z-polyhedron $P'$: $P' = \{\vec{x} \in \mathbb{Z}^n \mid f(\vec{x}) \in P\}$
We have $\mathrm{Image}(f^{-1}, P) = \mathrm{Preimage}(f, P)$ if f is invertible.
43 The Polyhedral Model: Math Corner Relation Between Image, Preimage and Z-polyhedra
  The image of a Z-polyhedron by a unimodular function is a Z-polyhedron
  The preimage of a Z-polyhedron by an affine function is a Z-polyhedron
  The image of a polyhedron by an affine invertible function is a Z-polyhedron
  The image by a non-invertible function is not a Z-polyhedron
44 The Polyhedral Model: Math Corner Program Transformations UCLA 37
45 The Polyhedral Model: Modeling Programs What Can Be Modeled? Exact vs. approximate representation: Exact representation of iteration domains Static control flow Affine loop bounds (includes min/max/integer division) Affine conditionals (conjunction/disjunction) Approximate representation of iteration domains Use affine over-approximations, predicate statement executions Full-function support UCLA 38
46 Program Transformation: Key Intuition Programs are represented with geometric shapes Transforming a program is about modifying those shapes rotation, skewing, stretching,... But we need here to assume some order to scan points UCLA 39
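47 Program Transformation: Affine Transformations Interchange Transformation
The transformation matrix is the identity with a permutation of two rows.
(a) original polyhedron: $\begin{bmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{bmatrix} \begin{pmatrix} i \\ j \end{pmatrix} + \begin{pmatrix} -1 \\ 2 \\ -1 \\ 3 \end{pmatrix} \geq \vec{0}$
(b) transformation function: $\begin{pmatrix} i' \\ j' \end{pmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$
(c) target polyhedron: $\begin{bmatrix} 0 & 1 \\ 0 & -1 \\ 1 & 0 \\ -1 & 0 \end{bmatrix} \begin{pmatrix} i' \\ j' \end{pmatrix} + \begin{pmatrix} -1 \\ 2 \\ -1 \\ 3 \end{pmatrix} \geq \vec{0}$
$A\vec{x} + \vec{a} \geq \vec{0}$ and $\vec{y} = T\vec{x}$ give $(AT^{-1})\vec{y} + \vec{a} \geq \vec{0}$
    do i = 1, 2          do i' = 1, 3
      do j = 1, 3   =>     do j' = 1, 2
        S(i,j)               S(i=j', j=i')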
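48 Program Transformation: Affine Transformations Reversal Transformation
The transformation matrix is the identity with one diagonal element replaced by -1.
(a) original polyhedron: $\begin{bmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{bmatrix} \begin{pmatrix} i \\ j \end{pmatrix} + \begin{pmatrix} -1 \\ 2 \\ -1 \\ 3 \end{pmatrix} \geq \vec{0}$
(b) transformation function: $\begin{pmatrix} i' \\ j' \end{pmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$
(c) target polyhedron: $\begin{bmatrix} -1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & -1 \end{bmatrix} \begin{pmatrix} i' \\ j' \end{pmatrix} + \begin{pmatrix} -1 \\ 2 \\ -1 \\ 3 \end{pmatrix} \geq \vec{0}$
$A\vec{x} + \vec{a} \geq \vec{0}$ and $\vec{y} = T\vec{x}$ give $(AT^{-1})\vec{y} + \vec{a} \geq \vec{0}$
    do i = 1, 2          do i' = -1, -2, -1
      do j = 1, 3   =>     do j' = 1, 3
        S(i,j)               S(i=-i', j=j')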
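49 Program Transformation: Affine Transformations Compound Transformation
The transformation matrix is the composition of an interchange and a reversal.
(a) original polyhedron: $\begin{bmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{bmatrix} \begin{pmatrix} i \\ j \end{pmatrix} + \begin{pmatrix} -1 \\ 2 \\ -1 \\ 3 \end{pmatrix} \geq \vec{0}$
(b) transformation function: $\begin{pmatrix} i' \\ j' \end{pmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$
(c) target polyhedron: $\begin{bmatrix} 0 & 1 \\ 0 & -1 \\ -1 & 0 \\ 1 & 0 \end{bmatrix} \begin{pmatrix} i' \\ j' \end{pmatrix} + \begin{pmatrix} -1 \\ 2 \\ -1 \\ 3 \end{pmatrix} \geq \vec{0}$
$A\vec{x} + \vec{a} \geq \vec{0}$ and $\vec{y} = T\vec{x}$ give $(AT^{-1})\vec{y} + \vec{a} \geq \vec{0}$
    do i = 1, 2          do i' = -1, -3, -1
      do j = 1, 3   =>     do j' = 1, 2
        S(i,j)               S(i=j', j=-i')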
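The rule "if A x + a >= 0 and y = T x, then (A T^-1) y + a >= 0" can be checked numerically on the 2 x 3 example domain. The sketch below assumes the interchange matrix [[0, 1], [1, 0]] and, for the compound transformation, the matrix [[0, -1], [1, 0]] (reversal applied after interchange); all identifiers are hypothetical.

```c
/* Original domain 1 <= i <= 2, 1 <= j <= 3 written as A x + a >= 0. */
static const int A[4][2] = { {1, 0}, {-1, 0}, {0, 1}, {0, -1} };
static const int a[4]    = { -1, 2, -1, 3 };

/* Transformation matrices and their inverses (assumed for this sketch). */
const int T_INTERCHANGE[2][2]     = { {0, 1}, {1, 0} };
const int T_INTERCHANGE_INV[2][2] = { {0, 1}, {1, 0} };
const int T_COMPOUND[2][2]        = { {0, -1}, {1, 0} };
const int T_COMPOUND_INV[2][2]    = { {0, 1}, {-1, 0} };

/* 1 if y = (y0, y1) satisfies (A T^-1) y + a >= 0, evaluated as
   A (T^-1 y) + a >= 0. */
int in_target(const int Tinv[2][2], int y0, int y1)
{
    int x0 = Tinv[0][0] * y0 + Tinv[0][1] * y1;
    int x1 = Tinv[1][0] * y0 + Tinv[1][1] * y1;

    for (int r = 0; r < 4; r++)
        if (A[r][0] * x0 + A[r][1] * x1 + a[r] < 0)
            return 0;
    return 1;
}

/* Maps every point of the original domain through T and counts how many
   images fall inside the target polyhedron. */
int count_mapped_points(const int T[2][2], const int Tinv[2][2])
{
    int inside = 0;
    for (int i = 1; i <= 2; i++)
        for (int j = 1; j <= 3; j++) {
            int y0 = T[0][0] * i + T[0][1] * j;
            int y1 = T[1][0] * i + T[1][1] * j;
            inside += in_target(Tinv, y0, y1);
        }
    return inside;
}
```

For a unimodular T, all 6 original points land in the target polyhedron and no other integer point does, which is why the number of dynamic instances is preserved.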
51 Program Transformation: So, What is This Matrix? We know how to generate code for arbitrary matrices with integer coefficients Arbitrary number of rows (but fixed number of columns) Arbitrary value for the coefficients Through code generation, the number of dynamic instances is preserved But this is not true for the transformed polyhedra! Some classification: The matrix is unimodular The matrix is full rank and invertible The matrix is full rank The matrix has only integral coefficients The matrix has rational coefficients UCLA 41
52 Program Transformation: A Reverse History Perspective 1 CLooG: arbitrary matrix 2 Affine Mappings 3 Unimodular framework 4 SARE 5 SURE UCLA 42
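53 Program Transformation: Program Transformations Original Schedule
Original code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        S1: C[i][j] = 0;
        for (k = 0; k < n; ++k)
          S2: C[i][j] += A[i][k] * B[k][j];
      }
$$\Theta^{S1} \cdot \vec{x}_{S1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \qquad \Theta^{S2} \cdot \vec{x}_{S2} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ k \\ n \\ 1 \end{pmatrix}$$
Generated code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        C[i][j] = 0;
        for (k = 0; k < n; ++k)
          C[i][j] += A[i][k] * B[k][j];
      }
Represent Static Control Parts (control flow and dependences must be statically computable)
Use a code generator (e.g. CLooG) to generate C code from the polyhedral representation (provided iteration domains + schedules)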
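56 Program Transformation: Program Transformations Distribute loops
Original code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        S1: C[i][j] = 0;
        for (k = 0; k < n; ++k)
          S2: C[i][j] += A[i][k] * B[k][j];
      }
$$\Theta^{S1} \cdot \vec{x}_{S1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \qquad \Theta^{S2} \cdot \vec{x}_{S2} = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ k \\ n \\ 1 \end{pmatrix}$$
Generated code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j)
        C[i][j] = 0;
    for (i = n; i < 2*n; ++i)
      for (j = 0; j < n; ++j)
        for (k = 0; k < n; ++k)
          C[i-n][j] += A[i-n][k] * B[k][j];
All instances of S1 are executed before the first S2 instance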
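57 Program Transformation: Program Transformations Distribute loops + Interchange loops for S2
Original code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        S1: C[i][j] = 0;
        for (k = 0; k < n; ++k)
          S2: C[i][j] += A[i][k] * B[k][j];
      }
$$\Theta^{S1} \cdot \vec{x}_{S1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \qquad \Theta^{S2} \cdot \vec{x}_{S2} = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ k \\ n \\ 1 \end{pmatrix}$$
Generated code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j)
        C[i][j] = 0;
    for (k = n; k < 2*n; ++k)
      for (j = 0; j < n; ++j)
        for (i = 0; i < n; ++i)
          C[i][j] += A[i][k-n] * B[k-n][j];
The outer-most loop for S2 becomes k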
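58 Program Transformation: Program Transformations Illegal schedule
Original code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        S1: C[i][j] = 0;
        for (k = 0; k < n; ++k)
          S2: C[i][j] += A[i][k] * B[k][j];
      }
$$\Theta^{S1} \cdot \vec{x}_{S1} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \qquad \Theta^{S2} \cdot \vec{x}_{S2} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ k \\ n \\ 1 \end{pmatrix}$$
Generated code:
    for (k = 0; k < n; ++k)
      for (j = 0; j < n; ++j)
        for (i = 0; i < n; ++i)
          C[i][j] += A[i][k] * B[k][j];
    for (i = n; i < 2*n; ++i)
      for (j = 0; j < n; ++j)
        C[i-n][j] = 0;
All instances of S1 are executed after the last S2 instance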
59 Program Transformation: Program Transformations A legal schedule
Original code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        S1: C[i][j] = 0;
        for (k = 0; k < n; ++k)
          S2: C[i][j] += A[i][k] * B[k][j];
      }
$\Theta^{S1} \cdot \vec{x}_{S1} = [\dots] \cdot (i\ j\ n\ 1)^T \qquad \Theta^{S2} \cdot \vec{x}_{S2} = [\dots] \cdot (i\ j\ k\ n\ 1)^T$
Generated code:
    for (i = n; i < 2*n; ++i)
      for (j = 0; j < n; ++j)
        C[i-n][j] = 0;
    for (k = n+1; k <= 2*n; ++k)
      for (j = 0; j < n; ++j)
        for (i = 0; i < n; ++i)
          C[i][j] += A[i][k-n-1] * B[k-n-1][j];
Delay the S2 instances
Constraints must be expressed between $\Theta^{S1}$ and $\Theta^{S2}$
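One way to see the constraints between the two schedules is to check, for every dependent pair of instances, that the source gets a lexicographically smaller timestamp than the target. The sketch below does this by brute force for a small n. It assumes two schedules read off the earlier slides: the distributed and interchanged one (S1 at (i, j), S2 at (k+n, j, i)) and the illegal one (S1 at (i+n, j), S2 at (k, j, i)). Timestamps of S1 are padded with a trailing 0, and all names are hypothetical.

```c
/* A 3-dimensional logical date; 2-dimensional schedules are padded with 0. */
typedef struct { int t[3]; } stamp;

/* 1 if a is strictly before b in lexicographic order. */
int lex_less(stamp a, stamp b)
{
    for (int d = 0; d < 3; d++) {
        if (a.t[d] < b.t[d]) return 1;
        if (a.t[d] > b.t[d]) return 0;
    }
    return 0;
}

/* Distribution + interchange: S1 -> (i, j), S2 -> (k + n, j, i). */
stamp s1_distributed(int i, int j, int n)
{ (void)n; stamp s = {{ i, j, 0 }}; return s; }
stamp s2_distributed(int i, int j, int k, int n)
{ stamp s = {{ k + n, j, i }}; return s; }

/* Illegal schedule: S1 -> (i + n, j), S2 -> (k, j, i). */
stamp s1_illegal(int i, int j, int n)
{ stamp s = {{ i + n, j, 0 }}; return s; }
stamp s2_illegal(int i, int j, int k, int n)
{ (void)n; stamp s = {{ k, j, i }}; return s; }

typedef stamp (*sched_s1)(int i, int j, int n);
typedef stamp (*sched_s2)(int i, int j, int k, int n);

/* Counts violated dependences on C[i][j]:
   S1(i,j) must precede every S2(i,j,k), and S2(i,j,k) must precede S2(i,j,k+1). */
int count_violations(sched_s1 th1, sched_s2 th2, int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                if (!lex_less(th1(i, j, n), th2(i, j, k, n)))
                    bad++;
                if (k + 1 < n && !lex_less(th2(i, j, k, n), th2(i, j, k + 1, n)))
                    bad++;
            }
    return bad;
}
```

A scheduling algorithm does not enumerate instances like this; it expresses the same precedence conditions as affine constraints on the coefficients of the two schedules, which is what makes the approach work for parametric n.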
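60 Program Transformation: Program Transformations Implicit fine-grain parallelism
Original code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        S1: C[i][j] = 0;
        for (k = 0; k < n; ++k)
          S2: C[i][j] += A[i][k] * B[k][j];
      }
$$\Theta^{S1} \cdot \vec{x}_{S1} = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix} \qquad \Theta^{S2} \cdot \vec{x}_{S2} = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} i \\ j \\ k \\ n \\ 1 \end{pmatrix}$$
Generated code:
    for (i = 0; i < n; ++i)
      pfor (j = 0; j < n; ++j)
        C[i][j] = 0;
    for (k = n; k < 2*n; ++k)
      pfor (j = 0; j < n; ++j)
        pfor (i = 0; i < n; ++i)
          C[i][j] += A[i][k-n] * B[k-n][j];
Fewer (linear) rows than the loop depth: the remaining dimensions are parallel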
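62 Program Transformation: Program Transformations Representing a schedule
Original code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        S1: C[i][j] = 0;
        for (k = 0; k < n; ++k)
          S2: C[i][j] += A[i][k] * B[k][j];
      }
$\Theta^{S1} \cdot \vec{x}_{S1} = [\dots] \cdot (i\ j\ n\ 1)^T \qquad \Theta^{S2} \cdot \vec{x}_{S2} = [\dots] \cdot (i\ j\ k\ n\ 1)^T$
Generated code:
    for (i = n; i < 2*n; ++i)
      for (j = 0; j < n; ++j)
        C[i-n][j] = 0;
    for (k = n+1; k <= 2*n; ++k)
      for (j = 0; j < n; ++j)
        for (i = 0; i < n; ++i)
          C[i][j] += A[i][k-n-1] * B[k-n-1][j];
$\Theta \cdot \vec{x} = [\dots] \cdot (i\ j\ \ i\ j\ k\ \ n\ n\ \ 1\ 1)^T$, with the columns grouped into iterator coefficients ($\vec{\imath}$), parameter coefficients ($\vec{p}$) and constant coefficients ($c$)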
63 Program Transformation: Program Transformations Representing a schedule
Original code:
    for (i = 0; i < n; ++i)
      for (j = 0; j < n; ++j) {
        S1: C[i][j] = 0;
        for (k = 0; k < n; ++k)
          S2: C[i][j] += A[i][k] * B[k][j];
      }
$\Theta^{S1} \cdot \vec{x}_{S1} = [\dots] \cdot (i\ j\ n\ 1)^T \qquad \Theta^{S2} \cdot \vec{x}_{S2} = [\dots] \cdot (i\ j\ k\ n\ 1)^T$
Generated code:
    for (i = n; i < 2*n; ++i)
      for (j = 0; j < n; ++j)
        C[i-n][j] = 0;
    for (k = n+1; k <= 2*n; ++k)
      for (j = 0; j < n; ++j)
        for (i = 0; i < n; ++i)
          C[i][j] += A[i][k-n-1] * B[k-n-1][j];
Column groups of $\Theta$: iterator coefficients ($\vec{\imath}$), parameter coefficients ($\vec{p}$), constant coefficients ($c$)
Transformation: Description
  reversal: Changes the direction in which a loop traverses its iteration range
  skewing: Makes the bounds of a given loop depend on an outer loop counter
  interchange: Exchanges two loops in a perfectly nested loop, a.k.a. permutation
  fusion: Fuses two loops, a.k.a. jamming
  distribution: Splits a single loop nest into many, a.k.a. fission or splitting
  peeling: Extracts one iteration of a given loop
  shifting: Allows reordering loops
64 Program Transformation: Fusion in the Polyhedral Model 0 N for (i = 0; i <= N; ++i) { Blue(i); Red(i); } Perfectly aligned fusion UCLA 44
65 Program Transformation: Fusion in the Polyhedral Model 0 1 N N+1 Blue(0); for (i = 1; i <= N; ++i) { Blue(i); Red(i-1); } Red(N); Fusion with shift of 1 Not all instances are fused UCLA 44
66 Program Transformation: Fusion in the Polyhedral Model 0 P N N+P for (i = 0; i < P; ++i) Blue(i); for (i = P; i <= N; ++i) { Blue(i); Red(i-P); } for (i = N+1; i <= N+P; ++i) Red(i-P); Fusion with parametric shift of P Automatic generation of prolog/epilog code UCLA 44
67 Program Transformation: Fusion in the Polyhedral Model 0 P N N+P for (i = 0; i < P; ++i) Blue(i); for (i = P; i <= N; ++i) { Blue(i); Red(i-P); } for (i = N+1; i <= N+P; ++i) Red(i-P); Many other transformations may be required to enable fusion: interchange, skewing, etc. UCLA 44
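The shifted fusion can be exercised on a small instance by recording the order in which Blue and Red instances run. This sketch assumes 0 <= P <= N + 1 (for larger P the prolog loop as written would run Blue beyond N) and uses a hypothetical trace structure to log the calls.

```c
enum { MAX_EVENTS = 256 };

typedef struct { char stmt; int iter; } event;   /* stmt is 'B' (Blue) or 'R' (Red) */
typedef struct { event e[MAX_EVENTS]; int len; } trace;

static void record(trace *t, char stmt, int iter)
{
    if (t->len < MAX_EVENTS)
        t->e[t->len++] = (event){ stmt, iter };
}

/* Fusion with a parametric shift of P, as on the slide (assumes 0 <= P <= N + 1). */
void fused_with_shift(int N, int P, trace *t)
{
    for (int i = 0; i < P; ++i)              /* prolog: Blue only */
        record(t, 'B', i);
    for (int i = P; i <= N; ++i) {           /* fused core */
        record(t, 'B', i);
        record(t, 'R', i - P);
    }
    for (int i = N + 1; i <= N + P; ++i)     /* epilog: Red only */
        record(t, 'R', i - P);
}

/* Position of an event in the trace, or -1 if it never ran. */
int position(const trace *t, char stmt, int iter)
{
    for (int p = 0; p < t->len; p++)
        if (t->e[p].stmt == stmt && t->e[p].iter == iter)
            return p;
    return -1;
}

/* 1 if every Blue(i) and Red(i), 0 <= i <= N, runs exactly once and
   Red(i) runs after Blue(i + P) whenever Blue(i + P) exists. */
int shifted_fusion_ok(int N, int P)
{
    trace t = { .len = 0 };

    if (2 * (N + 1) > MAX_EVENTS)
        return 0;
    fused_with_shift(N, P, &t);
    if (t.len != 2 * (N + 1))
        return 0;
    for (int i = 0; i <= N; i++) {
        int b = position(&t, 'B', i);
        int r = position(&t, 'R', i);
        if (b < 0 || r < 0)
            return 0;
        if (i + P <= N && r < position(&t, 'B', i + P))
            return 0;
    }
    return 1;
}
```

With P = 0 the prolog and epilog are empty and the code degenerates to the perfectly aligned fusion; with P = N + 1 the fused core is empty and the two loops are fully distributed.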
68 Program Transformation: Scheduling Matrix
Definition (Affine multidimensional schedule): Given a statement S, an affine schedule $\Theta^S$ of dimension m is an affine form on the d outer loop iterators $\vec{x}_S$ and the p global parameters $\vec{n}$. $\Theta^S \in \mathbb{Z}^{m \times (d+p+1)}$ can be written as:
$$\Theta^S(\vec{x}_S) = \begin{pmatrix} \theta_{1,1} & \dots & \theta_{1,d+p+1} \\ \vdots & & \vdots \\ \theta_{m,1} & \dots & \theta_{m,d+p+1} \end{pmatrix} \cdot \begin{pmatrix} \vec{x}_S \\ \vec{n} \\ 1 \end{pmatrix}$$
$\Theta^S_k$ denotes the kth row of $\Theta^S$.
Definition (Bounded affine multidimensional schedule): $\Theta^S$ is a bounded schedule if $\theta^S_{i,j} \in [x, y]$ with $x, y \in \mathbb{Z}$
69 Program Transformation: Another Representation
One can separate the coefficients of $\Theta$ into:
  1 The iterator coefficients
  2 The parameter coefficients
  3 The constant coefficients
One can also enforce the schedule dimension to be 2d+1.
  A $d \times d$ square matrix for the linear part: represents compositions of interchange/skewing/slowing
  A $d \times p$ matrix for the parametric part (p being the number of parameters): represents (parametric) shift
  A vector of size $d + 1$ for the scalar offsets: represents statement interleaving
See URUK for instance
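70 Program Transformation: Computing the 2d+1 Identity Schedule
$\theta^{S1}(\vec{x}_{S1}) = (0, i, 0)$
$\theta^{S2}(\vec{x}_{S2}) = (0, i, 1, j, 0)$
$\theta^{S3}$ ...
[Figure: syntax tree of a loop nest over i, j and k with statements S1 to S6; the edge labels 0, 1, 2 give the scalar components of each statement's 2d+1 schedule]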
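A short sketch of how 2d+1 timestamps order statement instances: odd positions hold loop iterators, even positions hold scalar offsets that interleave statements, and instances are compared lexicographically. The two schedules are the ones given above; padding the shorter timestamp with zeros and the helper names are assumptions of this sketch.

```c
#define STAMP_LEN 5

/* Lexicographic comparison of two timestamps of the same length:
   negative if a comes first, positive if b comes first, 0 if equal. */
int lex_cmp(const int *a, const int *b, int len)
{
    for (int d = 0; d < len; d++) {
        if (a[d] < b[d]) return -1;
        if (a[d] > b[d]) return 1;
    }
    return 0;
}

/* 1 if instance S1(i1) is scheduled strictly before instance S2(i2, j2),
   using theta_S1 = (0, i, 0) and theta_S2 = (0, i, 1, j, 0). */
int s1_before_s2(int i1, int i2, int j2)
{
    int t1[STAMP_LEN] = { 0, i1, 0, 0, 0 };   /* (0, i, 0) padded with zeros */
    int t2[STAMP_LEN] = { 0, i2, 1, j2, 0 };
    return lex_cmp(t1, t2, STAMP_LEN) < 0;
}
```

Within the same iteration of the i loop, the scalar component 0 of S1 against 1 of S2 places S1 first, regardless of j.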
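71 Program Transformation: Transformation Catalogue [1/2]
... examples updating the data layout and array access functions. This table is not complete (e.g., it lacks index-set splitting and data-layout transformations), but it demonstrates the expressiveness of the unified representation.
Syntax: Effect
  UNIMODULAR(P,U): $\forall S \in Scop \mid P \sqsubseteq \beta^S,\ A^S \leftarrow U.A^S;\ \Gamma^S \leftarrow U.\Gamma^S$
  SHIFT(P,M): $\forall S \in Scop \mid P \sqsubseteq \beta^S,\ \Gamma^S \leftarrow \Gamma^S + M$
  CUTDOM(P,c): $\forall S \in Scop \mid P \sqsubseteq \beta^S,\ \Lambda^S \leftarrow \mathrm{AddRow}(\Lambda^S, 0, c/\gcd(c_1, \dots, c_{d^S + d^S_{lv} + d_{gp} + 1}))$
  EXTEND(P,l,c): $\forall S \in Scop \mid P \sqsubseteq \beta^S,\ d^S \leftarrow d^S + 1;\ \Lambda^S \leftarrow \mathrm{AddCol}(\Lambda^S, c, 0);\ \beta^S \leftarrow \mathrm{AddRow}(\beta^S, l, 0);\ A^S \leftarrow \mathrm{AddRow}(\mathrm{AddCol}(A^S, c, 0), l, 1_l);\ \Gamma^S \leftarrow \mathrm{AddRow}(\Gamma^S, l, 0);\ \forall (A,F) \in L^S_{hs} \cup R^S_{hs},\ F \leftarrow \mathrm{AddRow}(F, l, 0)$
  ADDLOCALVAR(P): $\forall S \in Scop \mid P \sqsubseteq \beta^S,\ d^S_{lv} \leftarrow d^S_{lv} + 1;\ \Lambda^S \leftarrow \mathrm{AddCol}(\Lambda^S, d^S + 1, 0);\ \forall (A,F) \in L^S_{hs} \cup R^S_{hs},\ F \leftarrow \mathrm{AddCol}(F, d^S + 1, 0)$
  PRIVATIZE(A,l): $\forall S \in Scop,\ \forall (A,F) \in L^S_{hs} \cup R^S_{hs},\ F \leftarrow \mathrm{AddRow}(F, l, 1_l)$
  CONTRACT(A,l): $\forall S \in Scop,\ \forall (A,F) \in L^S_{hs} \cup R^S_{hs},\ F \leftarrow \mathrm{RemRow}(F, l)$
  FUSION(P,o): $b = \max\{\beta^S_{\dim(P)+1} \mid (P,o) \sqsubseteq \beta^S\} + 1$; Move((P,o+1),(P,o+1),b); Move(P,(P,o+1),-1)
  FISSION(P,o,b): Move(P,(P,o,b),1); Move((P,o+1),(P,o+1),-b)
  MOTION(P,T): if $\dim(P) + 1 = \dim(T)$ then $b = \max\{\beta^S_{\dim(P)} \mid P \sqsubseteq \beta^S\} + 1$ else $b = 1$; Move(pfx(T, dim(T)-1), T, b); $\forall S \in Scop \mid P \sqsubseteq \beta^S,\ \beta^S \leftarrow \beta^S + T - \mathrm{pfx}(P, \dim(T))$; Move(P,P,-1)
Fig. 18. Some classical transformation primitives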
72 Program Transformation: Transformation Catalogue [2/2]
Syntax: Effect (Comments)
- INTERCHANGE(P,o): $\forall S \in \mathcal{S}_{cop} \mid P \sqsubseteq \beta^S$, $U = I_{d^S} - 1_{o,o} - 1_{o+1,o+1} + 1_{o,o+1} + 1_{o+1,o}$; UNIMODULAR($\beta^S$,U) (swap rows o and o+1)
- SKEW(P,l,c,s): $\forall S \in \mathcal{S}_{cop} \mid P \sqsubseteq \beta^S$, $U = I_{d^S} + s \cdot 1_{l,c}$; UNIMODULAR($\beta^S$,U) (add the skew factor)
- REVERSE(P,o): $\forall S \in \mathcal{S}_{cop} \mid P \sqsubseteq \beta^S$, $U = I_{d^S} - 2 \cdot 1_{o,o}$; UNIMODULAR($\beta^S$,U) (put a -1 in (o,o))
- STRIPMINE(P,k): $\forall S \in \mathcal{S}_{cop} \mid P \sqsubseteq \beta^S$, $c = \dim(P)$; EXTEND($\beta^S$,c,c) (insert intermediate loop); $u = d^S + d^S_{lv} + d_{gp} + 1$ (constant column); CUTDOM($\beta^S$, $-k \cdot 1_c + (A^S_{c+1},\Gamma^S_{c+1})$) ($k \cdot i_c \le i_{c+1}$); CUTDOM($\beta^S$, $k \cdot 1_c - (A^S_{c+1},\Gamma^S_{c+1}) + (k-1) 1_u$) ($i_{c+1} \le k \cdot i_c + k - 1$)
- TILE(P,o,k1,k2): $\forall S \in \mathcal{S}_{cop} \mid (P,o) \sqsubseteq \beta^S$, STRIPMINE((P,o),k2) (strip outer loop); STRIPMINE(P,k1) (strip inner loop); INTERCHANGE((P,0),dim(P)) (interchange)
Fig. 19. Composition of transformation primitives [Girbal, Vasilache et al.]
73 Program Transformation: Some Final Observations
Some classical pitfalls:
- The number of rows of $\Theta$ does not correspond to actual parallel levels
- Scalar rows vs. linear rows
- Linear independence
- Parametric shift for domains without parametric bound
- ...
74 Program Transformation: Data Dependences
75 Data Dependences: Purpose of Dependence Analysis
- Not all program transformations preserve the semantics
- Semantics is preserved if the dependences are preserved
- In standard frameworks, it usually means reordering statements/loops
  - Statements containing dependent references should not be executed in a different order
  - Granularity: usually a reference to an array
- In the polyhedral framework, it means reordering statement instances
  - Statement instances containing dependent references should not be executed in a different order
  - Granularity: a reference to an array cell
76 Data Dependences: Example
Exercise: Compute the set of cells of A accessed
Example
    for (i = 0; i < N; ++i)
      for (j = i; j < N; ++j)
        A[2*i + 3][4*j] = i * j;
- Iteration domain: $\mathcal{D}_S : \{(i,j) \mid 0 \le i < N,\ i \le j < N\}$
- Function: $f_A : \{(2i + 3, 4j) \mid i,j \in \mathbb{Z}\}$
- Image($f_A$, $\mathcal{D}_S$) is the set of cells of A accessed (a Z-polyhedron):
  - Polyhedron: $Q : \{(i,j) \mid 3 \le i < 2N + 2,\ 0 \le j < 4N\}$
  - Lattice: $L : \{(2i + 3, 4j) \mid i,j \in \mathbb{Z}\}$
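The exercise can be checked by brute force for a fixed N. The sketch below enumerates the iteration domain, applies the access function, and verifies that every accessed cell lies in Q and on the lattice L. The names `in_Q`, `in_L` and `count_accessed_cells` are hypothetical, and the check is one-directional: it confirms that membership in $Q \cap L$ is necessary for a cell to be accessed, nothing more.

```c
#include <stdbool.h>

/* Membership in the polyhedron Q from the slide, for a cell (a, b). */
static bool in_Q(int a, int b, int n)
{
    return 3 <= a && a < 2 * n + 2 && 0 <= b && b < 4 * n;
}

/* Membership in the lattice L: a = 2i + 3 and b = 4j for integers i, j. */
static bool in_L(int a, int b)
{
    return (a - 3) % 2 == 0 && b % 4 == 0;
}

/* Returns the number of cells of A written by the loop nest for size n,
 * or -1 if an accessed cell falls outside Q or off the lattice L. */
int count_accessed_cells(int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            int a = 2 * i + 3;
            int b = 4 * j;
            if (!in_Q(a, b, n) || !in_L(a, b))
                return -1;
            ++count;  /* f_A is injective: each instance hits a distinct cell */
        }
    }
    return count;
}
```

Since the access function is injective, the count equals the number of points of the triangular domain, N(N+1)/2.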
77 Data Dependences: Data Dependence
Definition (Bernstein conditions): Given two references, there exists a dependence between them if the three following conditions hold:
- they reference the same array (cell)
- at least one of these accesses is a write
- the two associated statements are executed
Three categories of dependences:
- RAW (Read-After-Write, aka flow): first reference writes, second reads
- WAR (Write-After-Read, aka anti): first reference reads, second writes
- WAW (Write-After-Write, aka output): both references write
Another kind: RAR (Read-After-Read dependences), used for locality/reuse computations
78 Data Dependences: Some Terminology on Dependence Relations
We categorize dependence relations into three kinds:
- Uniform dependences: the distance between two dependent iterations is a constant
  ex: $i \to i + 1$; $(i,j) \to (i + 1, j + 1)$
- Non-uniform dependences: the distance between two dependent iterations varies during the execution
  ex: $i \to i + j$; $i \to 2i$
- Parametric dependences: at least one parameter is involved in the dependence relation
  ex: $i \to i + N$; $i + N \to j + M$
79 Data Dependences: Dependence Polyhedra Dependence Polyhedron [1/3]
Principle: model all pairs of instances in dependence
Definition (Dependence of statement instances): A statement $S$ depends on a statement $R$ (written $R \to S$) if there exist an operation $S(\vec{x}_S)$, an operation $R(\vec{x}_R)$ and a memory location $m$ such that:
1. $S(\vec{x}_S)$ and $R(\vec{x}_R)$ refer to the same memory location $m$, and at least one of them writes to that location,
2. $\vec{x}_S$ and $\vec{x}_R$ belong to the iteration domains of $S$ and $R$ respectively,
3. in the original sequential order, $R(\vec{x}_R)$ is executed before $S(\vec{x}_S)$.
80 Data Dependences: Dependence Polyhedra Dependence Polyhedron [2/3]
1. Same memory location: equality of the subscript functions of a pair of references to the same array: $F_S \vec{x}_S + \vec{a}_S = F_R \vec{x}_R + \vec{a}_R$.
2. Iteration domains: both $S$ and $R$ iteration domains can be described using affine inequalities, respectively $A_S \vec{x}_S + \vec{c}_S \ge \vec{0}$ and $A_R \vec{x}_R + \vec{c}_R \ge \vec{0}$.
3. Precedence order: each case corresponds to a common loop depth, and is called a dependence level. For each dependence level $l$, the precedence constraints are the equality of the loop index variables at depths less than $l$: $x_{R,i} = x_{S,i}$ for $i < l$, and $x_{R,l} < x_{S,l}$ if $l$ is less than the common nesting loop level. Otherwise, there is no additional constraint and the dependence only exists if $R$ is textually before $S$. Such constraints can be written using linear inequalities: $P_{l,S} \vec{x}_S - P_{l,R} \vec{x}_R + \vec{b} \ge \vec{0}$.
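81 Data Dependences: Dependence Polyhedra Dependence Polyhedron [3/3]
The dependence polyhedron for $R \to S$ at a given level $l$ and for a given pair of references $f_R, f_S$ is described as [Feautrier/Bastoul]:
$$\mathcal{D}_{R,S,f_R,f_S,l} : D \begin{pmatrix} \vec{x}_S \\ \vec{x}_R \end{pmatrix} + \vec{d} = \begin{bmatrix} F_S & -F_R \\ A_S & 0 \\ 0 & A_R \\ P_S & -P_R \end{bmatrix} \begin{pmatrix} \vec{x}_S \\ \vec{x}_R \end{pmatrix} + \begin{pmatrix} \vec{a}_S - \vec{a}_R \\ \vec{c}_S \\ \vec{c}_R \\ \vec{b} \end{pmatrix} \begin{matrix} = \vec{0} \\ \ge \vec{0} \end{matrix}$$
(the first block of rows is an equality, the remaining blocks are inequalities)
A few properties:
- We can always build the dependence polyhedron for a given pair of affine array accesses (it is convex)
- It is exact if the iteration domain and the access functions are also exact
- It is over-approximated if the iteration domain or the access function is an approximation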
82 Data Dependences: Dependence Polyhedra Polyhedral Dependence Graph
Definition (Polyhedral Dependence Graph): The Polyhedral Dependence Graph is a multi-graph with one vertex per syntactic program statement $S_i$, with edges $S_i \to S_j$ for each dependence polyhedron $\mathcal{D}_{S_i,S_j}$.
83 Data Dependences: Dependence Polyhedra Scheduling
84 Affine Scheduling: Affine Scheduling
Definition (Affine schedule): Given a statement $S$, a $p$-dimensional affine schedule $\Theta^S$ is an affine form on the outer loop iterators $\vec{x}_S$ and the global parameters $\vec{n}$. It is written:
$$\Theta^S(\vec{x}_S) = T_S \begin{pmatrix} \vec{x}_S \\ \vec{n} \\ 1 \end{pmatrix}, \quad T_S \in \mathbb{K}^{p \times (\dim(\vec{x}_S) + \dim(\vec{n}) + 1)}$$
- A schedule assigns a timestamp to each executed instance of a statement
- If $T$ is a vector, then $\Theta$ is a one-dimensional schedule
- If $T$ is a matrix, then $\Theta$ is a multidimensional schedule
Question: does it translate to sequential loops?
85 Affine Scheduling: Legal Program Transformation
Definition (Precedence condition): Given $\Theta^R$ a schedule for the instances of $R$, $\Theta^S$ a schedule for the instances of $S$. $\Theta^R$ and $\Theta^S$ are legal schedules if $\forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S}$:
$$\Theta^R(\vec{x}_R) \prec \Theta^S(\vec{x}_S)$$
$\prec$ denotes the lexicographic ordering:
$(a_1,\dots,a_n) \prec (b_1,\dots,b_m)$ iff $\exists i$, $1 \le i \le \min(n,m)$ s.t. $(a_1,\dots,a_{i-1}) = (b_1,\dots,b_{i-1})$ and $a_i < b_i$
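The lexicographic order on timestamps translates directly into code. The following is a small sketch of the definition above; the name `lex_less` is an assumption made for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true iff timestamp a (length n) strictly precedes timestamp b
 * (length m) lexicographically: some index i < min(n, m) has equal
 * prefixes before it and a[i] < b[i]. */
bool lex_less(const int *a, size_t n, const int *b, size_t m)
{
    size_t len = n < m ? n : m;
    for (size_t i = 0; i < len; ++i) {
        if (a[i] < b[i])
            return true;
        if (a[i] > b[i])
            return false;
    }
    return false;  /* equal on the common prefix: no index qualifies */
}
```

Note that, following the definition, two timestamps that agree on their common prefix are not ordered, whatever their lengths.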
86 Affine Scheduling: Scheduling in the Polyhedral Model
Constraints:
- The schedule must respect the precedence condition, for all dependent instances
- Dependence constraints can be turned into constraints on the solution set
Scheduling:
- Among all possibilities, one has to be picked
- Finding the optimal solution requires considering all legal possible schedules
Question: is this always true?
87 Affine Scheduling: One-Dimensional Affine Schedules
For now, we focus on 1-d schedules
Example
    for (i = 1; i < N; ++i)
      A[i] = A[i - 1] + A[i] + A[i + 1];
- Simple program: 1 loop, 1 polyhedral statement
- 2 dependences:
  - RAW: A[i] $\to$ A[i - 1]
  - WAR: A[i + 1] $\to$ A[i]
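89 Affine Scheduling: Checking the Legality of a Schedule
Exercise: given the dependence polyhedra, check if a schedule is legal
[Dependence polyhedra $\mathcal{D}_1$ and $\mathcal{D}_2$, given as constraint matrices over $(i_S, i'_S, n, 1)$]
Candidate schedules: 1. $\Theta = i$  2. $\Theta = -i$
Solution: check for the emptiness of the polyhedron
$$\mathcal{P} : \begin{bmatrix} \mathcal{D} \\ i_S \succeq i'_S \end{bmatrix} \cdot \begin{pmatrix} i_S \\ i'_S \\ n \\ 1 \end{pmatrix}$$
where $i_S \succeq i'_S$ gets the consumer instances scheduled after the producer ones.
For $\Theta = -i$, it is $-i_S \succeq -i'_S$, which is non-empty.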
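For a fixed problem size, the same legality question can be answered by enumeration instead of a polyhedral emptiness test. The sketch below assumes the one-loop example `A[i] = A[i - 1] + A[i] + A[i + 1]`, where both the RAW and the WAR dependence go from iteration i to iteration i + 1, and a schedule of the form $\Theta(i) = coef \cdot i + cst$. The name `is_legal_1d` is hypothetical.

```c
#include <stdbool.h>

/* Brute-force legality check of Theta(i) = coef * i + cst for the loop
 * for (i = 1; i < n; ++i) A[i] = A[i-1] + A[i] + A[i+1];
 * Every dependence links iteration i to iteration i + 1, so the
 * schedule is legal iff Theta(i) < Theta(i + 1) for all such pairs. */
bool is_legal_1d(int coef, int cst, int n)
{
    for (int i = 1; i + 1 < n; ++i) {
        int source = coef * i + cst;
        int target = coef * (i + 1) + cst;
        if (source >= target)
            return false;  /* dependent pair not executed in order */
    }
    return true;
}
```

Under these assumptions, $\Theta = i$ passes the check and $\Theta = -i$ fails it as soon as the loop has two iterations.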
90 Affine Scheduling: A (Naive) Scheduling Approach
- Pick a schedule for the program statements
- Check if it respects all dependences
This is called filtering
Limitations:
- How to use this in combination with an objective function?
- The density of legal 1-d affine schedules is low:
              matmult  locality  fir   h264  crout
    i-bounds  1,1      1,1       0,1   1,1   3,3
    c-bounds  1,1      1,1       0,3   0,4   3,3
    #Sched
    #Legal
91 Affine Scheduling: Objectives for a Good Scheduling Algorithm
- Build a legal schedule!
- Embed some properties in this legal schedule:
  - latency: minimize the time of the last iteration
  - delay: minimize the time between the first and last iteration
  - parallelism / placement
  - permutability (for tiling)
  - ...
A possible "simple" two-step approach:
- Find the solution set of all legal affine schedules
- Find an ILP/PIP formulation for the objective function(s)
92 Affine Scheduling: The Precedence Constraint (Again!)
Precedence constraint adapted to 1-d schedules:
Definition (Causality condition for schedules): Given $\mathcal{D}_{R,S}$, $\Theta^R$ and $\Theta^S$ are legal iff for each pair of instances in dependence:
$$\Theta^R(\vec{x}_R) < \Theta^S(\vec{x}_S)$$
Equivalently: $\Delta_{R,S} = \Theta^S(\vec{x}_S) - \Theta^R(\vec{x}_R) - 1 \ge 0$
- All functions $\Delta_{R,S}$ which are non-negative over the dependence polyhedron represent legal schedules
- For the instances which are not in dependence, we don't care
- First step: how to get all non-negative functions over a polyhedron?
93 Affine Scheduling: Affine Form of the Farkas Lemma
Lemma (Affine form of Farkas lemma): Let $\mathcal{D}$ be a nonempty polyhedron defined by $A\vec{x} + \vec{b} \ge \vec{0}$. Then any affine function $f(\vec{x})$ is non-negative everywhere in $\mathcal{D}$ iff it is a positive affine combination:
$$f(\vec{x}) = \lambda_0 + \vec{\lambda}^T (A\vec{x} + \vec{b}), \text{ with } \lambda_0 \ge 0 \text{ and } \vec{\lambda} \ge \vec{0}$$
$\lambda_0$ and $\vec{\lambda}^T$ are called the Farkas multipliers.
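94 Affine Scheduling: The Farkas Lemma: Example
- Function: $f(x) = ax + b$
- Domain of $x$: $\{x \mid 1 \le x \le 3\}$, i.e. $x - 1 \ge 0$, $-x + 3 \ge 0$
- Farkas lemma: $f(x) \ge 0 \Leftrightarrow f(x) = \lambda_0 + \lambda_1 (x - 1) + \lambda_2 (-x + 3)$
The system to solve:
$$\begin{cases} \lambda_1 - \lambda_2 = a \\ \lambda_0 - \lambda_1 + 3\lambda_2 = b \\ \lambda_0 \ge 0,\ \lambda_1 \ge 0,\ \lambda_2 \ge 0 \end{cases}$$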
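For this one-variable example the system can be solved by hand, which the sketch below turns into code. From $\lambda_1 - \lambda_2 = a$ we get $\lambda_0 = a + b - 2\lambda_2$, so taking the smallest admissible $\lambda_2 = \max(0, -a)$ maximizes $\lambda_0$; if that value is still negative, no valid multipliers exist. The function name `farkas_multipliers` is an assumption, and integer coefficients are used only to keep the example short.

```c
#include <stdbool.h>

/* For f(x) = a*x + b on the domain 1 <= x <= 3, fills
 * lam = { lambda0, lambda1, lambda2 } with non-negative values such that
 * f(x) = lambda0 + lambda1 * (x - 1) + lambda2 * (3 - x).
 * Returns false when no such multipliers exist, i.e. when f is
 * negative somewhere on the domain. */
bool farkas_multipliers(int a, int b, int lam[3])
{
    if (a >= 0) {
        lam[1] = a;    /* lambda1 - lambda2 = a with lambda2 = 0 */
        lam[2] = 0;
    } else {
        lam[1] = 0;    /* lambda1 - lambda2 = a with lambda1 = 0 */
        lam[2] = -a;
    }
    lam[0] = b + lam[1] - 3 * lam[2];   /* from lambda0 - lambda1 + 3*lambda2 = b */
    return lam[0] >= 0;
}
```

For instance, f(x) = 3 - x yields the multipliers (0, 0, 1), while f(x) = x - 2 is rejected because f(1) < 0.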
99 Affine Scheduling: One-dimensional Schedules Example: Semantics Preservation (1-D)
[Diagram: within the set of affine schedules, the valid Farkas multipliers map many-to-one onto the legal distinct schedules. Steps so far: causality condition, Farkas lemma]
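101 Affine Scheduling: One-dimensional Schedules Example: Semantics Preservation (1-D)
Steps: causality condition, Farkas lemma, identification
$$\Theta^S(\vec{x}_S) - \Theta^R(\vec{x}_R) - 1 = \lambda_0 + \vec{\lambda}^T \left( D_{R,S} \begin{pmatrix} \vec{x}_R \\ \vec{x}_S \end{pmatrix} + \vec{d}_{R,S} \right) \ge 0$$
Identification for $\mathcal{D}_{R \delta S}$:
$$\begin{cases} i_R : & -t_{1R} = \lambda_{D_1,1} - \lambda_{D_1,2} + \lambda_{D_1,3} - \lambda_{D_1,4} \\ i_S : & t_{1S} = -\lambda_{D_1,1} + \lambda_{D_1,2} + \lambda_{D_1,5} - \lambda_{D_1,6} \\ j_S : & t_{2S} = \lambda_{D_1,7} - \lambda_{D_1,8} \\ n : & t_{3S} - t_{2R} = \lambda_{D_1,4} + \lambda_{D_1,6} + \lambda_{D_1,8} \\ 1 : & t_{4S} - t_{3R} - 1 = \lambda_{D_1,0} \end{cases}$$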
102 Affine Scheduling: One-dimensional Schedules Example: Semantics Preservation (1-D)
Steps: causality condition, Farkas lemma, identification, projection
- Solve the constraint system
- Use a (purpose-optimized) Fourier-Motzkin projection algorithm
  - Reduce redundancy
  - Detect implicit equalities
104 Affine Scheduling: One-dimensional Schedules Example: Semantics Preservation (1-D)
[Diagram: the valid Farkas multipliers are projected onto the valid transformation coefficients, which are in bijection with the legal distinct schedules. Steps: causality condition, Farkas lemma, identification, projection]
One point in the space $\Leftrightarrow$ one set of legal schedules w.r.t. the dependences
105 Affine Scheduling: One-dimensional Schedules Scheduling Algorithm for Multiple Dependences
Algorithm:
- Compute the schedule constraints for each dependence
- Intersect all sets of constraints
- Output is a convex solution set of all legal one-dimensional schedules
Properties:
- Computation is fast, but requires eliminating variables in a system of inequalities: projection
- Can be computed as soon as the dependence polyhedra are known
106 Affine Scheduling: One-dimensional Schedules Objective Function
Idea: bound the latency of the schedule and minimize this bound
Theorem (Schedule latency bound): If all domains are bounded, and if there exists at least one 1-d schedule $\Theta$, then there exists at least one affine form in the structure parameters:
$$L = \vec{u}.\vec{n} + w$$
such that: $\forall \vec{x}_R$, $L - \Theta^R(\vec{x}_R) \ge 0$
- Objective function: $\min\{\vec{u}, w \mid \vec{u}.\vec{n} + w - \Theta \ge 0\}$
- Subject to: $\Theta$ is a legal schedule, and $\theta_i \ge 0$
- In many cases, it is equivalent to taking the lexicosmallest point in the polytope of non-negative legal schedules
107 Affine Scheduling: One-dimensional Schedules Example
$\min\{\vec{u}, w \mid \vec{u}.\vec{n} + w - \Theta \ge 0\}$: $\Theta^R = 0$, $\Theta^S = k + 1$
Example
    parfor (i = 0; i < N; ++i)
      parfor (j = 0; j < N; ++j)
        C[i][j] = 0;
    for (k = 1; k < N + 1; ++k)
      parfor (i = 0; i < N; ++i)
        parfor (j = 0; j < N; ++j)
          C[i][j] += A[i][k-1] + B[k-1][j];
108 Affine Scheduling: One-dimensional Schedules Limitations of One-dimensional Schedules
- Not all programs have a legal one-dimensional schedule
- Question: does this program have a 1-d schedule?
Example
    for (i = 1; i < N - 1; ++i)
      for (j = 1; j < N - 1; ++j)
        A[i][j] = A[i-1][j-1] + A[i+1][j] + A[i][j+1];
- Not all compositions of transformations are possible:
  - Interchange in inner loops
  - Fusion / distribution of inner loops
109 Affine Scheduling: One-dimensional Schedules Multidimensional Scheduling
110 Affine Scheduling: Multidimensional Scheduling Multidimensional Scheduling
- Some programs do not have a legal 1-d schedule
- This means that it is not possible to enforce the precedence condition for all dependences
Example
    for (i = 0; i < N; ++i)
      for (j = 0; j < N; ++j)
        s += s;
- Intuition: multidimensional time means nested time loops
- The precedence constraint needs to be adapted to multidimensional time
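111 Affine Scheduling: Multidimensional Scheduling Dependence Satisfaction
Definition (Strong dependence satisfaction): Given $\mathcal{D}_{R,S}$, the dependence is strongly satisfied at schedule level $k$ if
$$\forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S},\ \Theta^S_k(\vec{x}_S) - \Theta^R_k(\vec{x}_R) \ge 1$$
Definition (Weak dependence satisfaction): Given $\mathcal{D}_{R,S}$, the dependence is weakly satisfied at dimension $k$ if
$$\forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S},\ \Theta^S_k(\vec{x}_S) - \Theta^R_k(\vec{x}_R) \ge 0$$
$$\exists \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S},\ \Theta^S_k(\vec{x}_S) = \Theta^R_k(\vec{x}_R)$$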
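The two definitions can be read as a classification of one schedule row with respect to one dependence. The sketch below assumes that the differences $\Theta^S_k(\vec{x}_S) - \Theta^R_k(\vec{x}_R)$ have already been enumerated for every pair of the dependence polyhedron (for some fixed parameter values) and stored in an array. The names `dep_status` and `classify_satisfaction` are hypothetical.

```c
#include <stddef.h>

typedef enum { DEP_VIOLATED, DEP_WEAK, DEP_STRONG } dep_status;

/* delta[i] holds Theta^S_k(x_S) - Theta^R_k(x_R) for the i-th pair of
 * dependent instances. The row strongly satisfies the dependence if all
 * differences are >= 1, weakly satisfies it if all are >= 0 and at least
 * one is 0, and violates it otherwise. An empty dependence is reported
 * as strongly satisfied (vacuous truth). */
dep_status classify_satisfaction(const int *delta, size_t count)
{
    dep_status status = DEP_STRONG;
    for (size_t i = 0; i < count; ++i) {
        if (delta[i] < 0)
            return DEP_VIOLATED;
        if (delta[i] == 0)
            status = DEP_WEAK;
    }
    return status;
}
```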
112 Affine Scheduling: Multidimensional Scheduling Program Legality and Existence Results
- All dependences must be strongly satisfied for the program to be correct
- Once a dependence is strongly satisfied at level $k$, it does not contribute to the constraints of level $k + i$
- Unlike with 1-d schedules, it is always possible to build a legal multidimensional schedule for a SCoP [Feautrier]
Theorem (Existence of an affine schedule): Every static control program has a multidimensional affine schedule
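113 Affine Scheduling: Multidimensional Scheduling Reformulation of the Precedence Condition
- We introduce the variable $\delta_1^{\mathcal{D}_{R,S}}$ to model the dependence satisfaction
- Considering the first row of the scheduling matrices, to preserve the precedence relation we have:
$$\forall \mathcal{D}_{R,S},\ \forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S},\ \Theta^S_1(\vec{x}_S) - \Theta^R_1(\vec{x}_R) \ge \delta_1^{\mathcal{D}_{R,S}}, \quad \delta_1^{\mathcal{D}_{R,S}} \in \{0,1\}$$
Lemma (Semantics-preserving affine schedules): Given a set of affine schedules $\Theta^R, \Theta^S \dots$ of dimension $m$, the program semantics is preserved if:
$$\forall \mathcal{D}_{R,S},\ \exists p \in \{1,\dots,m\},\ \delta_p^{\mathcal{D}_{R,S}} = 1$$
$$\forall j < p,\ \delta_j^{\mathcal{D}_{R,S}} = 0$$
$$\forall j \le p,\ \forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S},\ \Theta^S_j(\vec{x}_S) - \Theta^R_j(\vec{x}_R) \ge \delta_j^{\mathcal{D}_{R,S}}$$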
114 Affine Scheduling: Multidimensional Scheduling Space of All Affine Schedules
Objective:
- Design an ILP which operates on all scheduling coefficients
- Easier optimality reasoning: the space contains all schedules (hence necessarily the optimal one)
- Examples: maximal fusion, maximal coarse-grain parallelism, best locality, etc.
Idea:
- Combine all coefficients of all rows of the scheduling function into a single solution set
- Find a convex encoding for the lexicopositivity of dependence satisfaction
  - A dependence must be weakly satisfied until it is strongly satisfied
  - Once it is strongly satisfied, it must not constrain subsequent levels
115 Affine Scheduling: Multidimensional Scheduling Schedule Lower Bound
Idea:
- Bound the schedule latency with a lower bound that does not prevent finding all solutions
- Intuitively:
  - $\Theta^S(\vec{x}_S) - \Theta^R(\vec{x}_R) \ge \delta$ if the dependence has not been strongly satisfied
  - $\Theta^S(\vec{x}_S) - \Theta^R(\vec{x}_R) \ge -\infty$ if it has
Lemma (Schedule lower bound): Given $\Theta^R_k$, $\Theta^S_k$ such that each coefficient value is bounded in $[x,y]$. Then there exists $K \in \mathbb{Z}$ such that:
$$\max\left(\Theta^S_k(\vec{x}_S) - \Theta^R_k(\vec{x}_R)\right) > -K.\vec{n} - K$$
116 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Space of Semantics-Preserving Affine Schedules
[Diagram: the set of all unique semantics-preserving affine multidimensional schedules, nested inside the set of all unique bounded affine multidimensional schedules]
1 point $\leftrightarrow$ 1 unique semantically equivalent program (up to affine iteration reordering)
117 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Semantics Preservation
Definition (Causality condition): Given $\Theta^R$ a schedule for the instances of $R$, $\Theta^S$ a schedule for the instances of $S$. $\Theta^R$ and $\Theta^S$ preserve the dependence $\mathcal{D}_{R,S}$ if $\forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S}$:
$$\Theta^R(\vec{x}_R) \prec \Theta^S(\vec{x}_S)$$
$\prec$ denotes the lexicographic ordering:
$(a_1,\dots,a_n) \prec (b_1,\dots,b_m)$ iff $\exists i$, $1 \le i \le \min(n,m)$ s.t. $(a_1,\dots,a_{i-1}) = (b_1,\dots,b_{i-1})$ and $a_i < b_i$
121 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Lexico-positivity of Dependence Satisfaction
- $\Theta^R(\vec{x}_R) \prec \Theta^S(\vec{x}_S)$ is equivalently written $\Theta^S(\vec{x}_S) - \Theta^R(\vec{x}_R) \succ \vec{0}$
- Considering the row $p$ of the scheduling matrices:
$$\Theta^S_p(\vec{x}_S) - \Theta^R_p(\vec{x}_R) \ge \delta_p$$
  - $\delta_p \ge 1$ implies no constraints on $\delta_k$, $k > p$
  - $\delta_p \ge 0$ is required if $\nexists k < p$, $\delta_k \ge 1$
- Schedule lower bound:
Lemma (Schedule lower bound): Given $\Theta^R_k$, $\Theta^S_k$ such that each coefficient value is bounded in $[x,y]$. Then there exists $K \in \mathbb{Z}$ such that:
$$\forall \vec{x}_R, \vec{x}_S,\ \Theta^S_k(\vec{x}_S) - \Theta^R_k(\vec{x}_R) > -K.\vec{n} - K$$
127 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Convex Form of All Bounded Affine Schedules
Lemma (Convex form of semantics-preserving affine schedules): Given a set of affine schedules $\Theta^R, \Theta^S \dots$ of dimension $m$, the program semantics is preserved if the three following conditions hold:
$$(i) \quad \forall \mathcal{D}_{R,S},\ \delta_p^{\mathcal{D}_{R,S}} \in \{0,1\}$$
$$(ii) \quad \forall \mathcal{D}_{R,S},\ \sum_{p=1}^{m} \delta_p^{\mathcal{D}_{R,S}} = 1$$
$$(iii) \quad \forall \mathcal{D}_{R,S},\ \forall p \in \{1,\dots,m\},\ \forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S},\ \Theta^S_p(\vec{x}_S) - \Theta^R_p(\vec{x}_R) - \delta_p^{\mathcal{D}_{R,S}} + \sum_{k=1}^{p-1} \delta_k^{\mathcal{D}_{R,S}}.(K.\vec{n} + K) \ge 0$$
- Use Farkas lemma to build all non-negative functions over a polyhedron (here, the dependence polyhedra) [Feautrier,92]
- Bounded coefficients required [Vasilache,07]
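To see how the three conditions interact, the following sketch checks them numerically for a single dependence. It assumes the minimum of $\Theta^S_p - \Theta^R_p$ over the dependence polyhedron is already known for each row, and that `bound` holds the value of $K.\vec{n} + K$ for the parameter values considered. All names are hypothetical, and rows are 0-based in the code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Checks conditions (i) to (iii) of the convex form for one dependence.
 * delta[p]    : satisfaction variable of row p
 * min_diff[p] : minimum of Theta^S_p - Theta^R_p over the dependence
 * bound       : value of K.n + K for the chosen parameter values */
bool convex_form_holds(const int *delta, const long *min_diff,
                       size_t m, long bound)
{
    long satisfied_before = 0;   /* sum of delta[k] for k < p */
    for (size_t p = 0; p < m; ++p) {
        if (delta[p] != 0 && delta[p] != 1)
            return false;                                        /* (i) */
        if (min_diff[p] - delta[p] + satisfied_before * bound < 0)
            return false;                                        /* (iii) */
        satisfied_before += delta[p];
    }
    return satisfied_before == 1;                                /* (ii) */
}
```

Once a row has $\delta = 1$, the `bound` term relaxes every later row, so a negative difference at a later level no longer invalidates the schedule.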
128 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Maximal Fine-Grain Parallelism
Objectives:
- Have as few dimensions as possible carrying a dependence
- For dimension $k \in p..1$:
$$\min \sum_{\mathcal{D}_{R,S}} \delta_k^{\mathcal{D}_{R,S}}$$
- We use lexicographic optimization
129 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Key Observations
Is all of this really necessary?
- We have encoded one objective per row of $\Theta$
- Question: do we need to solve this large ILP/LP?
130 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Feautrier's Greedy Algorithm
Main idea:
1. Start at row 1 of $\Theta$
2. Build the set of legal one-dimensional schedules
3. Maximize the number of dependences strongly solved ($\max \sum \delta_i$)
4. Remove strongly solved dependences from P
5. Move to the next row and go to step 2, until no dependence remains
This is a row-by-row decomposition of the scheduling problem
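The row-by-row structure can be sketched on a deliberately simplified setting. The code below is not Feautrier's algorithm as such: it assumes a single statement of depth 2 with uniform dependences given as distance vectors, and it replaces the ILP over Farkas multipliers by a brute-force search over row coefficients in {-1, 0, 1}. The names `greedy_schedule`, `DIM`, `MAX_DEPS` and `MAX_ROWS` are introduced for this illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define DIM 2        /* loop depth, fixed at 2 to keep the sketch short */
#define MAX_DEPS 64
#define MAX_ROWS 8

/* Greedy row-by-row scheduling: each row must weakly satisfy all remaining
 * dependences and is chosen to strongly satisfy as many as possible.
 * Returns the number of rows written to theta, or -1 on failure. */
int greedy_schedule(int deps[][DIM], size_t ndeps, int theta[][DIM])
{
    bool solved[MAX_DEPS] = { false };
    size_t remaining = ndeps;
    int rows = 0;

    if (ndeps > MAX_DEPS)
        return -1;
    while (remaining > 0 && rows < MAX_ROWS) {
        int best[DIM] = { 0, 0 };
        int best_count = 0;
        for (int a = -1; a <= 1; ++a) {
            for (int b = -1; b <= 1; ++b) {
                int count = 0;
                bool legal = true;
                for (size_t i = 0; i < ndeps && legal; ++i) {
                    if (solved[i])
                        continue;           /* already removed from the problem */
                    int dist = a * deps[i][0] + b * deps[i][1];
                    if (dist < 0)
                        legal = false;      /* not even weakly satisfied */
                    else if (dist >= 1)
                        ++count;            /* strongly satisfied */
                }
                if (legal && count > best_count) {
                    best_count = count;
                    best[0] = a;
                    best[1] = b;
                }
            }
        }
        if (best_count == 0)
            return -1;                      /* no progress within the bounds */
        for (size_t i = 0; i < ndeps; ++i) {
            if (!solved[i] &&
                best[0] * deps[i][0] + best[1] * deps[i][1] >= 1) {
                solved[i] = true;           /* step 4: remove the dependence */
                --remaining;
            }
        }
        theta[rows][0] = best[0];
        theta[rows][1] = best[1];
        ++rows;
    }
    return remaining == 0 ? rows : -1;
}
```

With the distance vectors (1, 0) and (0, 1), a single row (1, 1) strongly satisfies both dependences; with (1, -1) and (0, 1), no single row in the searched range does, and two rows are produced.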
131 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Key Properties of Feautrier's Algorithm
- It terminates
- It finds "optimal" fine-grain parallelism
- Granularity of dependence satisfaction: all-or-nothing
Example
    for (i = 0; i < 2 * N; ++i)
      A[i] = A[2 * N - i];
132 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Key Observations
Is all of this really necessary?
- Question: do we need to consider all statements at once?
- Insight: the PDG gives structural information about dependences
- Decomposition of the PDG into strongly-connected components
133 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Feautrier's Scheduler
134 Affine Scheduling: Space of Semantics-Preserving Affine Schedules More Observations
- Some problems may be decomposed without loss of optimality
- The PDG gives extra information about further problem decomposition
Still, is all of this really necessary?
- Question: can we use additional knowledge about dependences?
- Uniform, non-uniform and parametric dependences
135 Affine Scheduling: Space of Semantics-Preserving Affine Schedules Cost Functions
136 Scheduling for Performance: Objectives for Good Scheduling
Fine-grain parallelism is nice, but...
- It has little connection with modern SIMD parallelism
- No information about the quality of the generated code
- Ignores all the important performance objectives:
  - Data locality / TLB / Cache consideration
  - Multi-core parallelism (sync-free, with barrier)
  - SIMD vectorization
Question: how to find a FAST schedule for a modern processor?
137 Scheduling for Performance: Performance Distribution for 1-D Schedules [1/2]
[Figure: Performance distribution for matmult and locality. Two plots of cycles versus transformation identifier, with the original program marked]
138 Scheduling for Performance: Performance Distribution for 1-D Schedules [2/2]
[Figure: The effect of the compiler. Two plots for crout of cycles versus transformation identifier, with the original program marked: (a) GCC -O3, (b) ICC -fast]
139 Scheduling for Performance: Quantitative Analysis: The Hypothesis
Extremely large generated spaces (> … points): we must leverage static and dynamic characteristics to build traversal mechanisms
Hypothesis:
- It is possible to statically order the impact on performance of transformation coefficients, that is, to decompose the search space into subspaces where the performance variation is maximal or reduced
- The first rows of $\Theta$ have more impact on performance than the last ones
140 Scheduling for Performance: Observations on the Performance Distribution
[Figure: Performance distribution - 8x8 DCT. Performance improvement (best, average, worst) versus point index for the first schedule row]
    for (i = 0; i < M; i++)
      for (j = 0; j < M; j++) {
        tmp[i][j] = 0.0;
        for (k = 0; k < M; k++)
          tmp[i][j] += block[i][k] * cos1[j][k];
      }
    for (i = 0; i < M; i++)
      for (j = 0; j < M; j++) {
        sum2 = 0.0;
        for (k = 0; k < M; k++)
          sum2 += cos1[i][k] * tmp[k][j];
        block[i][j] = ROUND(sum2);
      }
- Extensive study of 8x8 Discrete Cosine Transform (UTDSP)
- Search space analyzed: … = … different legal program versions
143 Scheduling for Performance: Observations on the Performance Distribution
[Figure: Performance distribution - 8x8 DCT (best, average, worst versus point index for the first schedule row), next to the sorted performance distribution versus point index of the second schedule dimension, first one fixed]
- Take one specific value for the first row
- Try the possible values for the second row
- Very low proportion of best points: < 0.02%
145 Scheduling for Performance: Observations on the Performance Distribution
[Figure: Performance distribution - 8x8 DCT (best, average, worst versus point index for the first schedule row), with regions of large and small performance variation highlighted]
- Performance variation is large for good values of the first row
- It is usually reduced for bad values of the first row
146 Scheduling for Performance: Scanning The Space of Program Versions
The search space:
- Performance variation suggests partitioning the space: $\vec{\imath} > \vec{p} > c$
- Non-uniform distribution of performance
- No clear analytical property of the optimization function
Build dedicated heuristic and genetic operators aware of these static and dynamic characteristics
147 Scheduling for Performance: The Quest for Good Objective Functions
- For data locality, loop tiling is key
  - But what is the cost of tiling?
  - Is tiling the only criterion?
- For coarse-grain parallelism, doall parallelization is key
  - But what is the cost of parallelization?
148 Scheduling for Performance: Dependence Distance Minimization
Idea: minimize the delay between instances accessing the same data
Formulation in the polyhedral model:
- Expression of the delay through a parametric form
- Use all dependences (including RAR)
Definition (Dependence distance minimization):
$$\vec{u}_k.\vec{n} + w_k \ge \Theta^S(\vec{x}_S) - \Theta^R(\vec{x}_R) \quad \forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S} \quad (1)$$
$$\vec{u}_k \in \mathbb{N}^p,\ w_k \in \mathbb{N}$$
149 Scheduling for Performance: Key Observations
- Minimizing $d = \vec{u}_k.\vec{n} + w_k$ minimizes the dependence distance
- When $d = 0$ then $\Theta^R_k(\vec{x}_R) = \Theta^S_k(\vec{x}_S)$:
$$0 \le \Theta^S_k(\vec{x}_S) - \Theta^R_k(\vec{x}_R) \le 0$$
- $d$ gives an indication of the communication volume between hyperplanes
150 Scheduling for Performance: Tiling An Overview of Tiling
- Tiling: partition the computation into atomic blocks
- Early work in the late 80's
- Motivation: data locality improvement + parallelization
151 Scheduling for Performance: Tiling An Overview of Tiling
Tiling the iteration space:
- It must be valid (dependence analysis required)
- It may require pre-transformation
- Unimodular transformation framework limitations
- Supported in current compilers, but limited applicability
- Challenges: imperfectly nested loops, parametric loops, pre-transformations, tile shape, ...
Tile size selection:
- Critical for locality concerns: determines the footprint
- Empirical search of the best size (problem + machine specific)
- Parametric tiling makes the generated code valid for any tile size
152 Scheduling for Performance: The Tiling Hyperplane Method Tiling in the Polyhedral Model
- Tiling partitions the computation into blocks
- Note that we consider only rectangular tiling here
- For tiling to be legal, such a partitioning must be legal
153 Scheduling for Performance: The Tiling Hyperplane Method Key Ideas of the Tiling Hyperplane Algorithm
"Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences" [Bondhugula et al, CC'08 & PLDI'08]
- Compute a set of transformations to make loops tilable
  - Try to minimize synchronizations
  - Try to maximize locality (maximal fusion)
- Result is a set of permutable loops, if possible
  - Strip-mining / tiling can be applied
  - Tiles may be sync-free parallel or pipeline parallel
- Algorithm always terminates (possibly by splitting loops/statements)
154 Scheduling for Performance: The Tiling Hyperplane Method Legality of Tiling
Theorem (Legality of Tiling): Given $\Theta^R_k$, $\Theta^S_k$ two one-dimensional schedules. They are valid tiling hyperplanes if
$$\forall \mathcal{D}_{R,S},\ \forall \langle \vec{x}_R, \vec{x}_S \rangle \in \mathcal{D}_{R,S},\ \Theta^S_k(\vec{x}_S) - \Theta^R_k(\vec{x}_R) \ge 0$$
- For a schedule to be a legal tiling hyperplane, all communications must go forward: Forward Communication Only [Griebl]
- All dependences must be considered at each level, including the previously strongly satisfied ones
- Equivalence between loop permutability and loop tilability
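For uniform dependences of a single statement, the legality condition reduces to a dot product between the hyperplane and each distance vector, and the largest such product bounds the dependence distance along the hyperplane. The sketch below works under that simplifying assumption; the function name `max_distance_if_legal` and the flat storage of the distance vectors are choices made for this example.

```c
#include <stddef.h>

/* h is a candidate hyperplane (dim coefficients); deps stores ndeps
 * distance vectors of dim entries each, back to back.
 * Returns the largest distance h.d over all dependences when every
 * distance is non-negative (forward communication only), or -1 when
 * some dependence goes backward, i.e. h is not a valid tiling hyperplane. */
int max_distance_if_legal(const int *h, const int *deps,
                          size_t ndeps, size_t dim)
{
    int worst = 0;
    for (size_t i = 0; i < ndeps; ++i) {
        int dist = 0;
        for (size_t c = 0; c < dim; ++c)
            dist += h[c] * deps[i * dim + c];
        if (dist < 0)
            return -1;
        if (dist > worst)
            worst = dist;
    }
    return worst;
}
```

With the distance vectors (1, -1), (1, 0) and (1, 1) of a single-statement 1-D stencil in (t, i), the hyperplane (1, 0) is legal with maximal distance 1, (1, 1) is legal with maximal distance 2, and (0, 1) is rejected.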
155 Scheduling for Performance: The Tiling Hyperplane Method Greedy Algorithm for Tiling Hyperplane Computation
1 Start from the outermost level, find the set of FCO schedules
2 Select one which minimizes the distance between dependent iterations
3 Mark dependences strongly satisfied by this schedule, but do not remove them
4 Formulate the problem for the next level (FCO), adding orthogonality constraints (linear independence)
5 Solve again, etc.
Special treatment when no permutable band can be found: splitting
A few properties:
Result is a set of permutable/tilable outer loops, when possible
It exhibits coarse-grain parallelism
Maximal fusion achieved to improve locality
UCLA 109
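The greedy structure of the steps above can be illustrated with a toy C sketch. It is not the actual algorithm: the real method formulates each level as an integer linear program, whereas this sketch swaps in a brute-force enumeration over small integer coefficients. It also assumes a 2-D space with uniform dependences, and uses the largest h . d as a stand-in for the distance cost. All names (`Vec2`, `max_delta`, `greedy_hyperplanes`, `range`) are hypothetical.

```c
#include <assert.h>
#include <limits.h>

typedef struct { int t, i; } Vec2;

/* Largest h . d over all dependences, or -1 if some h . d is
   negative (the hyperplane then violates FCO). */
int max_delta(Vec2 h, const Vec2 *deps, int ndeps) {
    int worst = 0;
    for (int k = 0; k < ndeps; k++) {
        int d = h.t * deps[k].t + h.i * deps[k].i;
        if (d < 0) return -1;
        if (d > worst) worst = d;
    }
    return worst;
}

/* Finds up to two hyperplanes, outermost first. Coefficients are
   searched in [-range, range]. Returns how many were found. */
int greedy_hyperplanes(const Vec2 *deps, int ndeps, int range,
                       Vec2 out[2]) {
    assert(range > 0 && ndeps >= 0);
    int found = 0;
    for (int level = 0; level < 2; level++) {
        int best_cost = INT_MAX;
        Vec2 best = {0, 0};
        for (int t = -range; t <= range; t++)
            for (int i = -range; i <= range; i++) {
                Vec2 h = {t, i};
                if (t == 0 && i == 0) continue;   /* no null row */
                /* Linear independence from the previous level. */
                if (level == 1 && out[0].t * i - out[0].i * t == 0)
                    continue;
                /* All dependences are checked at every level. */
                int cost = max_delta(h, deps, ndeps);
                if (cost >= 0 && cost < best_cost) {
                    best_cost = cost;
                    best = h;
                }
            }
        if (best_cost == INT_MAX) break;          /* no FCO row */
        out[found++] = best;
    }
    return found;
}
```

On the Jacobi-like distances (1,-1), (1,0), (1,1) the first level selects (1,0), and the second level selects a skewed hyperplane with a nonzero space coefficient. The early `break` marks the point where the real algorithm would fall back to splitting.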
156 Scheduling for Performance: The Tiling Hyperplane Method Example: 1D-Jacobi
1-D Jacobi (imperfectly nested)
for (t=1; t<m; t++) {
  for (i=2; i<n-1; i++) {
    S: b[i] = 0.333*(a[i-1]+a[i]+a[i+1]);
  }
  for (j=2; j<n-1; j++) {
    T: a[j] = b[j];
  }
}
UCLA 110
157 Scheduling for Performance: The Tiling Hyperplane Method Example: 1D-Jacobi 1-D Jacobi (imperfectly nested) The resulting transformation is equivalent to a constant shift of one for T relative to S, fusion (j and i are named the same as a result), and skewing the fused i loop with respect to the t loop by a factor of two. The (1,0) hyperplane has the least communication: no dependence crosses more than one hyperplane instance along it. UCLA 111
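The shift, fusion and skew described here can be written as a pair of affine schedules. The form below is a reconstruction read off the subscripts of the transformed code for this example (S runs at t1 = 2*t0 + i, T at t1 = 2*t0 + j + 1); the explicit formulas and the per-dependence check are not on the slide and should be taken as a worked illustration.

```latex
\Theta^S(t, i) = (t,\; 2t + i), \qquad \Theta^T(t, j) = (t,\; 2t + j + 1)

% Forward-communication check on the second (skewed) dimension:
% S(t,i) -> T(t,i) (flow on b):
%   (2t + i + 1) - (2t + i) = 1 \geq 0
% T(t,j) -> S(t+1,i), i \in \{j-1, j, j+1\} (flow on a):
%   (2(t+1) + i) - (2t + j + 1) = 1 + i - j \in \{0, 1, 2\}
% S(t,i) -> T(t,j), j \in \{i-1, i, i+1\} (anti on a):
%   (2t + j + 1) - (2t + i) = 1 + j - i \in \{0, 1, 2\}
```

Every difference is non-negative, so both dimensions satisfy the legality condition and the (t, 2t+i) band is permutable, hence tilable.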
158 Scheduling for Performance: The Tiling Hyperplane Method Example: 1D-Jacobi Transforming S [Figure: iteration space of S, axes (t, i) before and (t', i') after the transformation] UCLA 112
159 Scheduling for Performance: The Tiling Hyperplane Method Example: 1D-Jacobi Transforming T [Figure: iteration space of T, axes (t, j) before and (t', j') after the transformation] UCLA 113
160 Scheduling for Performance: The Tiling Hyperplane Method Example: 1D-Jacobi Interleaving S and T [Figure: transformed iteration spaces of S and T, axes (t', i') and (t', j')] UCLA 114
161 Scheduling for Performance: The Tiling Hyperplane Method Example: 1D-Jacobi Interleaving S and T [Figure: interleaved iterations of S and T along t] UCLA 115
162 Scheduling for Performance: The Tiling Hyperplane Method Example: 1D-Jacobi
1-D Jacobi (imperfectly nested) transformed code
for (t0=0;t0<=m-1;t0++) {
  S': b[2]=0.333*(a[2-1]+a[2]+a[2+1]);
  for (t1=2*t0+3;t1<=2*t0+n-2;t1++) {
    S: b[-2*t0+t1]=0.333*(a[-2*t0+t1-1]+a[-2*t0+t1]+a[-2*t0+t1+1]);
    T: a[-2*t0+t1-1]=b[-2*t0+t1-1];
  }
  T': a[n-2]=b[n-2];
}
UCLA 116
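One way to convince oneself that the transformed code computes the same values is to run both versions side by side. The sketch below wraps the two loop nests in C functions; the function names and the `steps` parameter are assumptions introduced here. In particular, `steps` replaces the time-loop bounds of both slides so that the two versions execute the same number of time iterations.

```c
#include <assert.h>

/* Original imperfectly nested Jacobi, run for `steps` time steps.
   Arrays a and b must hold at least n elements. */
void jacobi_original(int steps, int n, double *a, double *b) {
    assert(n >= 4);
    for (int t = 0; t < steps; t++) {
        for (int i = 2; i < n - 1; i++)
            b[i] = 0.333 * (a[i - 1] + a[i] + a[i + 1]);   /* S */
        for (int j = 2; j < n - 1; j++)
            a[j] = b[j];                                   /* T */
    }
}

/* Shifted, fused and skewed version: inside the t1 loop, S at
   point i is followed by T at point i - 1. The first S and the
   last T are peeled out of the fused loop. */
void jacobi_transformed(int steps, int n, double *a, double *b) {
    assert(n >= 4);
    for (int t0 = 0; t0 < steps; t0++) {
        b[2] = 0.333 * (a[2 - 1] + a[2] + a[2 + 1]);       /* S' */
        for (int t1 = 2 * t0 + 3; t1 <= 2 * t0 + n - 2; t1++) {
            int i = t1 - 2 * t0;          /* undo the skew */
            b[i] = 0.333 * (a[i - 1] + a[i] + a[i + 1]);   /* S */
            a[i - 1] = b[i - 1];                           /* T */
        }
        a[n - 2] = b[n - 2];                               /* T' */
    }
}
```

Calling both functions on copies of the same input and comparing `a` and `b` element by element should show no difference: each `a[i-1]` is overwritten only after the last S instance of the same time step that reads it has executed.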