Legal and impossible dependences

Similar documents
Linear Loop Transformations for Locality Enhancement

Null space basis: mxz. zxz I

Loop Transformations, Dependences, and Parallelization

Tiling: A Data Locality Optimizing Algorithm

Systems of Inequalities

The Polyhedral Model (Transformations)

Review. Loop Fusion Example

Tiling: A Data Locality Optimizing Algorithm

LARGE SCALE LINEAR AND INTEGER OPTIMIZATION: A UNIFIED APPROACH

Chapter 4 Concepts from Geometry

CS671 Parallel Programming in the Many-Core Era

Data Dependences and Parallelization

CS 293S Parallelism and Dependence Theory

Lesson 17. Geometry and Algebra of Corner Points

Principle of Polyhedral model for loop optimization. cschen 陳鍾樞

Computing and Informatics, Vol. 36, 2017, , doi: /cai

The Challenges of Non-linear Parameters and Variables in Automatic Loop Parallelisation

Discrete Optimization 2010 Lecture 5 Min-Cost Flows & Total Unimodularity

Computing the Integer Points of a Polyhedron

A Crash Course in Compilers for Parallel Computing. Mary Hall Fall, L2: Transforms, Reuse, Locality

6.189 IAP Lecture 11. Parallelizing Compilers. Prof. Saman Amarasinghe, MIT IAP 2007 MIT

Investigating Mixed-Integer Hulls using a MIP-Solver

Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Linear Equations in Linear Algebra

Catalan Numbers. Table 1: Balanced Parentheses

Linear Programming in Small Dimensions

Integer Programming Theory

Transforming Imperfectly Nested Loops

Control flow graphs and loop optimizations. Thursday, October 24, 13

Linear programming and duality theory

Autotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT

Performing Matrix Operations on the TI-83/84

CS 372: Computational Geometry Lecture 10 Linear Programming in Fixed Dimension

Dual-fitting analysis of Greedy for Set Cover

Lecture Notes on Loop Transformations for Cache Optimization

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Data-centric Transformations for Locality Enhancement

Polyhedral Compilation Foundations

4 Integer Linear Programming (ILP)

Loop Transformations! Part II!

THEORY OF LINEAR AND INTEGER PROGRAMMING

The Polyhedral Model (Dependences and Transformations)

A Loop Transformation Theory and an Algorithm to Maximize Parallelism

Computational Methods CMSC/AMSC/MAPL 460. Vectors, Matrices, Linear Systems, LU Decomposition, Ramani Duraiswami, Dept. of Computer Science

A Layout-Conscious Iteration Space Transformation Technique

DM545 Linear and Integer Programming. Lecture 2. The Simplex Method. Marco Chiarandini

Communication-Minimal Tiling of Uniform Dependence Loops

POLYHEDRAL GEOMETRY. Convex functions and sets. Mathematical Programming Niels Lauritzen Recall that a subset C R n is convex if

Lecture 2 Convex Sets

Tiling: A Data Locality Optimizing Algorithm

Outline. Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Polyhedral Operations. Algorithms needed for automation. Logistics

Numerical Linear Algebra

15.082J and 6.855J. Lagrangian Relaxation 2 Algorithms Application to LPs

Mathematical and Algorithmic Foundations Linear Programming and Matchings

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #16 Loops: Matrix Using Nested for Loop

1 Inference for Boolean theories

Lecture 9 Basic Parallelization

Lecture 9 Basic Parallelization

Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Unconstrained Optimization

60 2 Convex sets. {x a T x b} {x ã T x b}

LARP / 2018 ACK : 1. Linear Algebra and Its Applications - Gilbert Strang 2. Autar Kaw, Transforming Numerical Methods Education for STEM Graduates

Compiling for Advanced Architectures

1.5 Part - 2 Inverse Relations and Inverse Functions

Polytopes Course Notes

Week 7 Convex Hulls in 3D

Matching and Planarity

Gelfand-Tsetlin Polytopes

Tilings of the Euclidean plane

Probabilistic Graphical Models

Fourier-Motzkin and Farkas Questions (HW10)

Mathematical Programming and Research Methods (Part II)

The Simplex Algorithm

Math 1B03/1ZC3 - Tutorial 3. Jan. 24th/28th, 2014

Module 18: Loop Optimizations Lecture 35: Amdahl s Law. The Lecture Contains: Amdahl s Law. Induction Variable Substitution.

x = 12 x = 12 1x = 16

Linear Optimization and Extensions: Theory and Algorithms

IN TELECOM and real-time multimedia applications including

Some Advanced Topics in Linear Programming

Mapping Multi-Dimensional Signals into Hierarchical Memory Organizations

Chapter 15 Introduction to Linear Programming

Code Generation Methods for Tiling Transformations

Module 18: Loop Optimizations Lecture 36: Cycle Shrinking. The Lecture Contains: Cycle Shrinking. Cycle Shrinking in Distance Varying Loops

Rubber bands. Chapter Rubber band representation

EULER S FORMULA AND THE FIVE COLOR THEOREM

Ranking Functions. Linear-Constraint Loops

Contents of Lecture 11

Package lintools. July 30, 2018

Cluster algebras and infinite associahedra

Optimization of Design. Lecturer:Dung-An Wang Lecture 8

Linear Programming. Course review MS-E2140. v. 1.1

A constructive solution to the juggling problem in systolic array synthesis

R n a T i x = b i} is a Hyperplane.

MATH 890 HOMEWORK 2 DAVID MEREDITH

On the Dimension of the Bivariate Spline Space S 1 3( )

LECTURES 3 and 4: Flows and Matchings

Three applications of Euler s formula. Chapter 10

LECTURE 13: SOLUTION METHODS FOR CONSTRAINED OPTIMIZATION. 1. Primal approach 2. Penalty and barrier methods 3. Dual approach 4. Primal-dual approach

Transcription:

Transformations and Dependences 1

operations, column Fourier-Motzkin elimination us use these tools to determine (i) legality of permutation and Let generation of transformed code. (ii) Recall: Polyhedral algebra tools for determining emptiness of convex polyhedra enumerating integers in such a polyhedron. Central ideas: reduction of matrices to echelon form by unimodular 2

Dependence relation

Legal and impossible dependences

Distance vectors

Distance vectors

Computing distance vectors

Distance vectors: 2D loop nest

Direction vectors

Direction vectors for nested loops

Example

Example (contd.)

Example (contd.)

Dependence matrix

Loop permutation can be modeled as a linear transformation on iteration space: J V 0 1 I = U 1 0 J V DO I = 1, N DO U = 1, N DO J = I,N DO V = 1,U X(I,J) = 5 I X(V,U) = 5 U Permutation of loops in n-loop nest: nxn permutation matrix P P I = U Questions: (1) How do we generate new loop bounds? (2) How do we modify the loop body? (3) How do we know when loop interchange is legal? 4

J V I U DO I = 1, N 0 1 I = U DO U = 1, N DO J = I,N 1 0 J V DO V = 1,U...... I1 J1 I2 J2 T I1 J1 T I2 J2 Dependence distance = I2 - I1 J2 - J1 Distance between iterations = I2 T - T I1 = T J2 J1 I2 - I1 J2 - J1 = J2 - J1 I2 - I1 Check for legality: interchange positions in distance/direction vector check for lex +ve If transformation P is legal and original dependence matrix is D, new dependence matrix is T*D. 47

Transformation matrix: T Dependence matrix of transformed program: T:D Correctness of general permutation matrix: D Dependence in which each column is a distance/direction vector Matrix Legality: T:D 0 48

Examples: DO I = 1,N DO J = 1,N X(I,J) = X(I-1,J-1)... Distance vector = (1,1) => permutation is legal Dependence vector of transformed program = (1,1) DO I = 1,N DO J = 1,N X(I,J) = X(I-1,J+1)... Distance vector = (1,-1) => permutation is not legal 49

Transformation: U = T I for loop permutation, T is a permutation matrix Note: inverse is integer matrix => So bounds on U can be written as A T ;1 U b Fourier-Motzkin elimination on this system of Perform to obtain bounds on U. inequalities Code Generation for Transformed Loop Nest Two problems: (1) Loop bounds (2) Change of variables in body (1) New bounds: Original bounds: A I b where A is in echelon form (2) Change of variables: I = T ;1 U Replace old variables by new using this formula 5

Example: J V I U 0 1 I = U 1 0 J V DO I = 1, N DO U = 1, N DO J = I,N DO V = 1,U X(I,J) = 5 X(V,U) = 5 Fourier-Motzkin elimination -1 0 I < -1 1 0 J N 1-1 0 0 1 N -1 0 1 0 1-1 0 1 0 1 1 0 U V < -1 N 0 N 6

-1 0 1 0 1-1 0 1 0 1 1 0 U V < -1 N 0 N 0-1 0 1-1 1 1 0 U V < -1 N 0 N Projecting out V from system gives 1 < U < N Bounds for V are 1 < V < min(u,n) These are loop bounds given by FM elimination. With a little extra work, we can simplify the upper bound of V to U. 7

maxs in lower bounds and mins in upper bounds. with No need for pattern matching etc for triangular bounds and the Key points: Loop bounds determination in transformed code is mechanical. Polyhedral algebra technology can handle very general bounds like. 8

for permutations applies to other loop transformations that Theory be modeled as linear transformations: skewing, reversal,scaling. can Dependence matrix of transformed program: T:D Transformation matrix: T (a non-singular matrix) matrix: D Dependence in which each column is a distance/direction vector Matrix Legality: T:D 0 Small complication with code generation if scaling is included. 3

between two iterations Dependence iterations touch the same location => J 5 4 3 2 1 00 11 01 00 11 01 01 00 11 00 11 01 1 2 3 4 5 00 11 01 I Motivating example: Wavefront code DO I = 1,N DO J = 1,N X(I,J) = X(I-1,J+1)... Dependence matrix = 1-1 => potential for exploiting data reuse! iteration points between executions of dependent iterations. N we schedule dependent iterations closer together? Can 8

now, focus only on reuse opportunities between dependent For iterations. iterations that read from the same location: input dependences spatial locality This exploits only one kind of temporal locality. There are other kinds of locality that are important: Both are easy to add to basic model as we will see later. 9

J 5 4 3 2 1 Exploiting temporal locality in wavefront 01 00 11 01 00 11 01 00 11 01 00 11 01 00 11 01 00 11 1 2 3 4 5 01 00 11 01 00 11 I J 5 4 3 2 1 00 11 00 11 01 01 00 11 00 11 01 01 1 2 3 4 5 01 01 01 01 I Permutation is illegal! Tiling is illegal! We have studied two transformations: permutation and tiling. Permutation and tiling are both illegal. 10

Height Reduction 11

Transformation is legal. Dependent iterations are scheduled close together, so good for J H One solution: schedule iterations along 45 degree planes! I G Note: locality. Can we view this in terms of loop transformation? Loop skewing followed by loop reversal. 12

Loop Skewing: a linear loop transformation J V 1 0 1 1 I J = U V Skewing of inner loop by outer loop: I 1 0 k 1 U (k is some fixed integer) Skewing of inner loop by an outer loop: always legal New dependence vectors: compute T*D 1 In this example, D = T*D = -1 1 0 This skewing has changed dependence vector but it has not brought dependent iterations closer together... 13

Skewing outer loop by inner loop J V I U 1 1 0 1 I J = U V Outer loop skewing: 1 k 0 1 Skewing of outer loop by inner loop: not necessarily legal In this example, D = 1 T*D = 0 incorrect -1-1 Dependent iterations are closer together (good) but program is illegal (bad). How do we fix this?? 14

5-5 Loop Reversal:a linear loop transformation 0 I U 0 U = [-1][I] DO I = 1, N DO U = -N,-1 X(I) = I+2 X(-U) = -U +2 Transformation matrix = [-1] Another example: 2-D loop, reverse inner loop U V = 1 0 0-1 I J Legality of loop reversal: Apply transformation matrix to all dependences verify lex +ve Code generation: easy 15

Need for composite transformations J V 1 1 0 1 I J = U V I U Transformation: skewing followed by reversal In final program, dependent iterations are close together! Composition of linear transformations = another linear transformation! Composite transformation matrix is H 1 0 0-1 U V = G H G 1 0 0-1 1 1 * = 0 1 1 1 0-1 How do we synthesize this composite transformation?? 16

Transformation matrices for permutation/reversal/skewing are unimodular. Any composition of these transformations can be represented a unimodular matrix. by Any unimodular matrix can be decomposed into product of matrices. permutation/reversal/skewing Legality of composite transformation T : check that T:D 0. T 3 (T 2 (T 1 D)) = (T 3 T 2 T 1 ) D.) (Proof: Code generation algorithm: New bounds: compute from A T ;1 U b Some facts about permutation/reversal/skewing Original bounds: A I b Transformation: U = T I 17

composite transformations using matrix-based Synthesizing approaches Rather than reason about sequences of transformations, we can about the single matrix that represents the composite reason transformation. Enabling abstraction: dependence matrix 18

Height reduction: move reuse into inner loops! 1. Dependence vector is ;1 Prefer not to have dependent iterations in dierent outer loop iterations.! 0 So dependence vector in transformed program should look like.??!! 1 0 So T = ;1??! 1 This says rst row of T is orthogonal to. ;1 So rst row of T can be chosen to be (1 1). What about second row? 19

s should be linearly independent of rst row of T (non-singular matrix). T D should be a lexicographically positive vector. Determinant of matrix should be +1 or -1 (unimodularity). Second row of T (call it s) should satisfy the following properties:! 1 1 One choice: (0 ; 1), giving T =. 0 ;1 20

How do we choose the rst few rows of the transformation 1. to push reuses down? matrix How do we ll in the rest of the matrix to get unimodular 2. matrix? General questions 21

Linear loop transformations to enable tiling 50

is legal if loops are fully permutable (all permutations of Tiling are legal). loops Can we always convert a perfectly nested loop into a fully loop nest? permutable When we can, how do we do it? J In general, tiling is not legal. 5 4 3 2 1 01 00 11 00 11 01 00 11 01 01 00 11 00 11 01 01 00 11 00 11 01 00 11 01 01 00 11 00 11 01 D = 1-1 1 2 3 4 5 I Tiling is illegal! Tiling is legal if all entries in dependence matrix are non-negative. 51

If all dependence vectors are distance vectors, we can Theorem: entire loop nest into a fully permutable loop nest. convert @ 1 ;1 A. matrix of transformed program must have all positive Dependence entries. rst row of transformation can be (1 0). So row of transformation (m 1) (for any m > 0). Second idea: skew inner loops by outer loops suciently to make General negative entries non-negative. all Example: wavefront 0 1 Dependence matrix is 52

to make rst row with negative entries into row Transformation non-negative entries with p2... p1......... -m -n............ p3... -k...... row a row b first row with negative entries (a) for each negative entry in the first row with negative entries, find the first positive number in the corresponding column assume the rows for these positive entries are a,b etc as shown above (b) skew the row with negative entries by appropriate multiples of rows a,b... For our example, multiple of row a = ceiling(n/p2) multiple of row b = ceiling(max(m/p1,k/p3)) Transformation: I 0 0..0 ceiling(n/p2) 0 0 ceiling(max(m/p1,k/p3))0...0 I 53

all entries in dependence matrix are non-negative, done. If Otherwise, Apply algorithm on previous slide to rst row with 1. entries. non-negative Generate new dependence matrix. 2. If no negative entries, done. 3. General algorithm for making loop nest fully permutable: 4. Otherwise, go step (1). 54

as nice as height reduction solution, but it will work ne for Not enhancement except at tile boundaries (but boundary locality J 5 4 3 2 1 00 11 00 11 01 00 11 1 2 3 4 5 00 11 00 11 00 11 00 11 00 11 00 11 I 1 0 1 1 J 5 4 3 2 1 00 11 00 11 00 11 01 00 11 01 00 11 00 11 00 11 01 00 11 00 11 00 11 00 11 00 11 01 00 11 00 11 00 11 00 11 1 2 3 4 5 I Result of tiling transformed wavefront Original loop Tiled fully permutable loop Tiling generates a 4-deep loop nest. points small compared to number of interior points). 55