Loop Transformations, Dependences, and Parallelization

Similar documents
The Polyhedral Model (Transformations)

The Polyhedral Model (Dependences and Transformations)

Review. Loop Fusion Example

Legal and impossible dependences

CS671 Parallel Programming in the Many-Core Era

Tiling: A Data Locality Optimizing Algorithm

Polyhedral Operations. Algorithms needed for automation. Logistics


Loop Transformations, Dependences, and Parallelization

Lecture 9 Basic Parallelization

Lecture 9 Basic Parallelization

CS 293S Parallelism and Dependence Theory

LLVM passes and Intro to Loop Transformation Frameworks

6.189 IAP Lecture 11. Parallelizing Compilers. Prof. Saman Amarasinghe, MIT IAP 2007 MIT

Lecture 11 Loop Transformations for Parallelism and Locality

Tiling: A Data Locality Optimizing Algorithm

Linear Loop Transformations for Locality Enhancement

Plan for Today. Concepts. Next Time. Some slides are from Calvin Lin s grad compiler slides. CS553 Lecture 2 Optimizations and LLVM 1

A Crash Course in Compilers for Parallel Computing. Mary Hall Fall, L2: Transforms, Reuse, Locality

Null space basis: mxz. zxz I

Tiling: A Data Locality Optimizing Algorithm

Affine and Unimodular Transformations for Non-Uniform Nested Loops

Data Dependence Analysis

} Evaluate the following expressions: 1. int x = 5 / 2 + 2; 2. int x = / 2; 3. int x = 5 / ; 4. double x = 5 / 2.

Contents of Lecture 11

Compiling for Advanced Architectures

Control flow graphs and loop optimizations. Thursday, October 24, 13

Computing and Informatics, Vol. 36, 2017, , doi: /cai

Fourier-Motzkin and Farkas Questions (HW10)

Outline. L5: Writing Correct Programs, cont. Is this CUDA code correct? Administrative 2/11/09. Next assignment (a homework) given out on Monday

Dependence Analysis. Hwansoo Han

Polyhedral Compilation Foundations

UMIACS-TR December, CS-TR-3192 Revised April, William Pugh. Dept. of Computer Science. Univ. of Maryland, College Park, MD 20742

Transformations Techniques for extracting Parallelism in Non-Uniform Nested Loops

Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Data Dependences and Parallelization

CS4620/5620. Professor: Kavita Bala. Cornell CS4620/5620 Fall 2012 Lecture Kavita Bala 1 (with previous instructors James/Marschner)

Outline. Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities

LARGE SCALE LINEAR AND INTEGER OPTIMIZATION: A UNIFIED APPROACH

Revisiting Loop Transformations with X10 Clocks. Tomofumi Yuki Inria / LIP / ENS Lyon X10 Workshop 2015

Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Loop Tiling for Parallelism

Array Dependence Analysis as Integer Constraints. Array Dependence Analysis Example. Array Dependence Analysis as Integer Constraints, cont

We will focus on data dependencies: when an operand is written at some point and read at a later point. Example:!

Programming for Electrical and Computer Engineers. Pointers and Arrays

Computing the Integer Points of a Polyhedron

Lecture 3.4: Recursive Algorithms

Using Static Single Assignment Form

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

Loop Transformations! Part II!

HY425 Lecture 09: Software to exploit ILP

Roofline Model (Will be using this in HW2)

Module 16: Data Flow Analysis in Presence of Procedure Calls Lecture 32: Iteration. The Lecture Contains: Iteration Space.

HY425 Lecture 09: Software to exploit ILP

Louis-Noël Pouchet

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013

Data Structures and Algorithms CSE 465

Polyèdres et compilation

Autotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT

Increasing Parallelism of Loops with the Loop Distribution Technique

Theory of Integers. CS389L: Automated Logical Reasoning. Lecture 13: The Omega Test. Overview of Techniques. Geometric Description

CS 372: Computational Geometry Lecture 10 Linear Programming in Fixed Dimension

Module 18: Loop Optimizations Lecture 35: Amdahl s Law. The Lecture Contains: Amdahl s Law. Induction Variable Substitution.

CS4961 Parallel Programming. Lecture 2: Introduction to Parallel Algorithms 8/31/10. Mary Hall August 26, Homework 1, cont.

An Overview to. Polyhedral Model. Fangzhou Jiao

Lecture 5: Multidimensional Arrays. Wednesday, 11 February 2009

Lecture 21. Software Pipelining & Prefetching. I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining

Announcements. CS243: Discrete Structures. Strong Induction and Recursively Defined Structures. Review. Example (review) Example (review), cont.

Loop Nest Optimizer of GCC. Sebastian Pop. Avgust, 2006

CS4230 Parallel Programming. Lecture 12: More Task Parallelism 10/5/12

CS4230 Parallel Programming. Lecture 7: Loop Scheduling cont., and Data Dependences 9/6/12. Administrative. Mary Hall.

Lecture 14. No in-class files today. Homework 7 (due on Wednesday) and Project 3 (due in 10 days) posted. Questions?

Lecture Notes on Loop Transformations for Cache Optimization

CS560 Lecture Parallel Architecture 1

Linear Programming in Small Dimensions

Advanced Compiler Construction Theory And Practice

Mapping Multi-Dimensional Signals into Hierarchical Memory Organizations

Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations

MATH 890 HOMEWORK 2 DAVID MEREDITH

4/8/17. Admin. Assignment 5 BINARY. David Kauchak CS 52 Spring 2017

Exploring Parallelism At Different Levels

Module 18: Loop Optimizations Lecture 36: Cycle Shrinking. The Lecture Contains: Cycle Shrinking. Cycle Shrinking in Distance Varying Loops

CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 4: Convex Sets. Instructor: Shaddin Dughmi

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont)

CS302 Topic: Algorithm Analysis. Thursday, Sept. 22, 2005

EE/CSCI 451: Parallel and Distributed Computation

CS101 Lecture 04: Binary Arithmetic

We propose the following strategy for applying unimodular transformations to non-perfectly nested loops. This strategy amounts to reducing the problem

Sets and set operations

The Simplex Algorithm

Coarse-Grained Parallelism

Efficient Polynomial-Time Nested Loop Fusion with Full Parallelism

Communication-Minimal Tiling of Uniform Dependence Loops

CS 177. Lists and Matrices. Week 8

Abstractions and small languages in synthesis CS294: Program Synthesis for Everyone

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

Solving the Live-out Iterator Problem, Part I

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)

Polyhedral Compilation Foundations

SEARCHING, SORTING, AND ASYMPTOTIC COMPLEXITY. Lecture 11 CS2110 Spring 2016

Transcription:

Loop Transformations, Dependences, and Parallelization Announcements HW3 is due Wednesday February 15th Today HW3 intro Unimodular framework rehash with edits Skewing Smith-Waterman (the fix is in!), composing transformations Unimodular transformation framework Affine transformations? (NOPE) Why unimodular matrix? Usage in other research and extensions Loop reversal Testing transformation legality Converting C++ code to iteration space representation Unimodular Transformation Framework 1 Iteration Space Representation Original code do i = 1,6 do j = 1,5 A(i,j) = A(i-1,j+1)+1 j i Represent the iteration space As an intersection of inequalities The iteration space is the integer tuples within the intersection Bounds: Unimodular Transformation Framework 2

Lexicographical Order as Schedule Iteration point Integer tuple with dimensionality d Lexicographical Order First order the iteration points by i_0, then i_1, and finally i_d. (i 0,i 1,..., i d 1 ) (i 0,i 1,..., i d 1 ) (i 0 <j 0 ) (i 0 = j 0 i 1 <j 1 )...(i 0 = j 0 i 1 = j 1...i d 1 = j d 1 ) Unimodular Transformation Framework 3 Frameworks for Loop Transformations Loop Transformations as functions Unimodular Loop Transformations [Banerjee 90],[Wolf & Lam 91] can represent loop permutation, loop reversal, and loop skewing unimodular linear mapping (determinant of matrix is + or - 1) T is a matrix, i and i are iteration vectors example limitations only perfectly nested loops all statements are transformed the same Unimodular Transformation Framework 4

Loop Skewing Original code do i = 1,6 do j = 1,5 A(i,j) = A(i-1,j+1)+1 j i Distance vector: (1, -1) Skewing: j Unimodular Transformation Framework i 5 Transforming the Dependences and Array Accesses Original code do i = 1,6 do j = 1,5 A(i,j) = A(i-1,j+1)+1 Dependence vector: j i New Array Accesses: j i Unimodular Transformation Framework 6

Transforming the Loop Bounds Original code do i = 1,6 do j = 1,5 A(i,j) = A(i-1,j+1)+1 Bounds: j i Transformed code do i = 1,6 do j = 1+i,5+i A(i,j -i ) = A(i -1,j -i +1)+1 j Unimodular Transformation Framework i 7 Revisiting (smithwaterman.c) and really parallelizing it! for (i=1;i<=a[0];i++) {! for (j=1;j<=b[0];j++) {! diag = h[i-1][j-1] + sim[a[i]][b[j]];! down = h[i-1][j] + DELTA;! right = h[i][j-1] + DELTA;!! Let x=i+j and y=i. for (x=2; x<=a[0]+b[0]; x++) {! #omp parallel for private(y) shared(a,b)! for (y=max(1, x-b[0]); y<=min(a[0],x-1); y++) {! diag = h[y-1][x-y-1] + sim[a[x]][b[x-y]];! down = h[y-1][x-y] + DELTA;! right = h[y][x-y-1] + DELTA;!! Unimodular Transformation Framework 8

Unimodular Framework Does it have an affine component? Actually no. That moves us to the polyhedral model. Why a unimodular matrix? Has an inverse and the inverse is unimodular. Preserves the volume of a polytope. Transformations we can automate Loop permutation, skewing, and reversal Any combination of the above Unimodular Transformation Framework 9 Loop Reversal Idea Change the direction of loop iteration (i.e., From low-to-high indices to high-to-low indices or vice versa) Benefits Could improve cache performance Enables other transformations Example do i = 6,1,-1 A(i) = B(i) + C(i) do i = 1,6 A(i) = B(i) + C(i) CS553 Lecture Loop Transformations 10

Loop Reversal and Distance Vectors Impact Reversal of loop i negates the i th entry of all distance vectors associated with the loop What about direction vectors? When is reversal legal? When the loop being reversed does not carry a dependence (i.e., When the transformed distance vectors remain legal) Example do i = 1,5 do j = 1,6 A(i,j) = A(i-1,j-1)+1 Dependence: Distance Vector: Transformed Distance Vector: Flow (1,1) (1,-1) legal CS553 Lecture Loop Transformations 11 Transformation Legality Recall A dependence vector is legal if it is lexicographically non-negative. Applying the transformation function to each dependence vector produces a dependence vector for the new iteration space. When is a transformation legal assuming a lexicographical schedule? What about parallelism? Unimodular Transformation Framework 12

Converting C loops to iteration space representation Analyses needed Loop analysis Loop bounds from AST or control-flow graph Induction variable detection Pointer analysis Do pointers point at same or overlapping memory? Note that in C can cast a pointer to an integer and back and can do pointer arithmetic. In general requires whole program analysis. Dependence analysis Is this even possible? Current tools make the optimistic pointer assumption We need programming models that simplify or remove the need for such analyses Unimodular Transformation Framework 13 Concepts smithwaterman.c example Projecting out a variable from bounds (prelim to Fourier Motzkin elimination) Unimodular transformation framework represents loop permutation, loop reversal, and loop skewing provides mathematical framework for... testing transformation legality, transforming array accesses and loop bounds, and combining transformations Compiler analyses needed in C to obtain an iteration space representation References [Banerjee90] Uptal Banerjee, Unimodular transformations of double loops, In Advances in Languages and Compilers for Parallel Computing, 1990. [Wolf & Lam 91] Wolf and Lam, A Data Locality Optimizing Algorithm, In Programming Languages Design and Implementation, 1991. Unimodular Transformation Framework 14

Next Time Reading No reading assignments next week Homework HW3 is due Wednesday 2/15/12 Lecture Polyhedral framework Unimodular Transformation Framework 15