Data Dependences and Parallelization

Similar documents
Lecture 9 Basic Parallelization

Lecture 9 Basic Parallelization

6.189 IAP Lecture 11. Parallelizing Compilers. Prof. Saman Amarasinghe, MIT IAP 2007 MIT

Compiling for Advanced Architectures

Outline. Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities


We will focus on data dependencies: when an operand is written at some point and read at a later point. Example:!

The Polyhedral Model (Dependences and Transformations)

Dependence Analysis. Hwansoo Han

Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

CS/COE 1501 cs.pitt.edu/~bill/1501/ More Math

CS671 Parallel Programming in the Many-Core Era

Legal and impossible dependences

COP 4516: Math for Programming Contest Notes

Module 17: Loops Lecture 34: Symbolic Analysis. The Lecture Contains: Symbolic Analysis. Example: Triangular Lower Limits. Multiple Loop Limits

Outline. L5: Writing Correct Programs, cont. Is this CUDA code correct? Administrative 2/11/09. Next assignment (a homework) given out on Monday

Recursion. CSCI 112: Programming in C

Euclid's Algorithm. MA/CSSE 473 Day 06. Student Questions Odd Pie Fight Euclid's algorithm (if there is time) extended Euclid's algorithm

Cuts from Proofs: A Complete and Practical Technique for Solving Linear Inequalities over Integers

1 Elementary number theory

Systems of Inequalities

Univ. of Illinois Due Wednesday, Sept 13, 2017 Prof. Allen

Loops / Repetition Statements

Topic 10 Part 2 [474 marks]

ELEMENTARY NUMBER THEORY AND METHODS OF PROOF

BEng (Hons) Telecommunications. Examinations for / Semester 2

Math Introduction to Advanced Mathematics

Solutions to the Second Midterm Exam

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as

Theory of Integers. CS389L: Automated Logical Reasoning. Lecture 13: The Omega Test. Overview of Techniques. Geometric Description

CS 293S Parallelism and Dependence Theory

MAT 685: C++ for Mathematicians

Data Dependence Analysis

Conditionals !

Array Dependence Analysis as Integer Constraints. Array Dependence Analysis Example. Array Dependence Analysis as Integer Constraints, cont

UNIT-II NUMBER THEORY

Pick s Theorem and Lattice Point Geometry

Introduction to Programming in C Department of Computer Science and Engineering\ Lecture No. #02 Introduction: GCD

Introduction to Analysis of Algorithms

Assertions & Verification & Example Loop Invariants Example Exam Questions

1 Elementary number theory

Discrete Mathematics Lecture 4. Harper Langston New York University

Mathematics. Jaehyun Park. CS 97SI Stanford University. June 29, 2015

THEORY OF LINEAR AND INTEGER PROGRAMMING

Lecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code)

Assertions & Verification Example Exam Questions

CS 97SI: INTRODUCTION TO PROGRAMMING CONTESTS. Jaehyun Park

Loop Transformations, Dependences, and Parallelization

Admin ENCRYPTION. Admin. Encryption 10/29/15. Assignment 6. 4 more assignments: Midterm next Thursday. What is it and why do we need it?

Recognizing regular tree languages with static information

On Demand Parametric Array Dataflow Analysis

Integers and Mathematical Induction

The Challenges of Non-linear Parameters and Variables in Automatic Loop Parallelisation

UCT Algorithm Circle: Number Theory

CS4230 Parallel Programming. Lecture 7: Loop Scheduling cont., and Data Dependences 9/6/12. Administrative. Mary Hall.

Affine and Unimodular Transformations for Non-Uniform Nested Loops

Exploring Parallelism At Different Levels

HY425 Lecture 09: Software to exploit ILP

Lecture Notes on Cache Iteration & Data Dependencies

HY425 Lecture 09: Software to exploit ILP

Primal Dual Schema Approach to the Labeling Problem with Applications to TSP

Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas

Advanced Compiler Construction Theory And Practice

EE 4683/5683: COMPUTER ARCHITECTURE

The complement of PATH is in NL

Writing Parallel Programs; Cost Model.

Loops / Repetition Statements

Data Structures and Algorithms Key to Homework 1

Automatic Translation of Fortran Programs to Vector Form. Randy Allen and Ken Kennedy

Chapter 2: Complexity Analysis

A Crash Course in Compilers for Parallel Computing. Mary Hall Fall, L2: Transforms, Reuse, Locality

UMIACS-TR December, CS-TR-3192 Revised April, William Pugh. Dept. of Computer Science. Univ. of Maryland, College Park, MD 20742

Profiling Dependence Vectors for Loop Parallelization

Finite Math - J-term Homework. Section Inverse of a Square Matrix

AC64/AT64 DESIGN & ANALYSIS OF ALGORITHMS DEC 2014

Computing and Informatics, Vol. 36, 2017, , doi: /cai

Planning and Optimization

Basic Algorithms for Periodic-Linear Inequalities and Integer Polyhedra

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

Technical Section. Lab 4 while loops and for loops. A. while Loops or for loops

Reading 8 : Recursion

Page 1. Parallelization techniques. Dependence graph. Dependence Distance and Distance Vector

CS4961 Parallel Programming. Lecture 2: Introduction to Parallel Algorithms 8/31/10. Mary Hall August 26, Homework 1, cont.

IA-64 Compiler Technology

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Chapter 4. Number Theory. 4.1 Factors and multiples

Algorithmic number theory Cryptographic hardness assumptions. Table of contents

The Citizen s Guide to Dynamic Programming

16.216: ECE Application Programming Spring 2015 Exam 2 Solution

Contents of Lecture 11

Review. Loop Fusion Example

APCS Semester #1 Final Exam Practice Problems

(Refer Slide Time: 00:26)

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms

CSCE 110: Programming I

Integer Programming for Array Subscript Analysis

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

The Polyhedral Model (Transformations)

Optimizing Aggregate Array Computations in Loops

A Note on Scheduling Parallel Unit Jobs on Hypercubes

Transcription:

Data Dependences and Parallelization 1

Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 2

Motivation DOALL loops: loops whose iterations can execute in parallel for i = 11, 20 a[i] = a[i] + 3 New abstraction needed Abstraction used in data flow analysis is inadequate Information of all instances of a statement is combined 3

Examples for i = 11, 20 a[i] = a[i] + 3 for i = 11, 20 a[i] = a[i-1] + 3 Parallel Parallel? 4

Examples for i = 11, 20 a[i] = a[i] + 3 for i = 11, 20 a[i] = a[i-1] + 3 for i = 11, 20 a[i] = a[i-10] + 3 Parallel Not parallel Parallel? 5

Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 6

Data Dependence of Scalar True dependence a = = a Anti-dependence = a a = Variables Output dependence a = a = Input dependence = a = a 7

Array Accesses in a Loop for i= 2, 5 a[i] = a[i] + 3 read a[2] a[3] a[4] a[5] write a[2] a[3] a[4] a[5] 8

Array Anti-dependence for i= 2, 5 a[i-2] = a[i] + 3 read a[2] a[3] a[4] a[5] write a[0] a[1] a[2] a[3] 9

Array True-dependence for i= 2, 5 a[i] = a[i-2] + 3 read a[0] a[1] a[2] a[3] write a[2] a[3] a[4] a[5] 10

Dynamic Data Dependence Let o and o be two (dynamic) operations Data dependence exists from o to o, iff either o or o is a write operation o and o may refer to the same location o executes before o 11

Static Data Dependence Let a and a be two static array accesses (not necessarily distinct) Data dependence exists from a to a, iff either a or a is a write operation There exists a dynamic instance of a (o) and a dynamic instance of a (o ) such that o and o may refer to the same location o executes before o 12

Recognizing DOALL Loops Find data dependences in loop Definition: a dependence is loop-carried if it crosses an iteration boundary If there are no loop-carried dependences then loop is parallelizable 13

Compute Dependence for i= 2, 5 a[i-2] = a[i] + 3 There is a dependence between a[i] and a[i-2] if There exist two iterations i r and i w within the loop bounds such that iterations i r and i w read and write the same array element, respectively There exist i r, i w, 2 i r, i w 5, i r = i w -2 14

Compute Dependence for i= 2, 5 a[i-2] = a[i] + 3 There is a dependence between a[i-2] and a[i-2] if There exist two iterations i v and i w within the loop bounds such that iterations i v and i w write the same array element, respectively There exist i v, i w, 2 i v, i w 5, i v -2= i w -2 15

for i= 2, 5 a[i-2] = a[i] + 3 Parallelization Is there a loop-carried dependence between a[i] and a[i-2]? Is there a loop-carried dependence between a[i-2] and a[i-2]? 16

Agenda Introduction Single Loops Nested Loops Data Dependence Analysis 17

Nested Loops Which loop(s) are parallel? for i1 = 0, 5 for i2 = 0, 3 a[i1,i2] = a[i1-2,i2-1] + 3 18

Iteration Space An abstraction for loops for i1 = 0, 5 for i2 = 0, 3 a[i1,i2] = 3 Iteration is represented as coordinates in iteration space. i2 i1 19

Execution Order Sequential execution order of iterations: Lexicographic order [0,0], [0,1], [0,3], [1,0], [1,1], [1,3], [2,0] Let I = (i 1,i 2, i n ). I is lexicographically less than I, I<I, iff there exists k such that (i 1, i k-1 ) = (i 1, i k-1 ) and i k < i k i 1 i 2 20

Parallelism for Nested Loops Is there a data dependence between a[i1,i2] and a[i1-2,i2-1]? There exist i1 r, i2 r, i1 w, i2 w, such that 0 i1 r, i1 w 5, 0 i2 r, i2 w 3, i1 r - 2 = i1 w i2 r - 1 = i2 w 21

Loop-carried Dependence If there are no loop-carried dependences, then loop is parallelizable. Dependence carried by outer loop: i1 r i1 w Dependence carried by inner loop: i1 r = i1 w i2 r i2 w This can naturally be extended to dependence carried by loop level k. 22

Nested Loops Which loop carries the dependence? for i1 = 0, 5 i2 for i2 = 0, 3 a[i1,i2] = a[i1-2,i2-1] + 3 i1 23

Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 24

Solving Data Dependence Problems Memory disambiguation is un-decidable at compile-time. read(n) for i = 0, 3 a[i] = a[n] + 3 25

Domain of Data Dependence Analysis Only use loop bounds and array indices which are integer linear functions of variables. for i1 = 1, n for i2 = 2*i1, 100 a[i1+2*i2+3][4*i1+2*i2][i1*i1] = = a[1][2*i1+1][i2] + 3 26

Equations There is a data dependence, if There exist i1 r, i2 r, i1 w, i2 w, such that 0 i1 r, i1 w n, 2*i1 r i2 r 100, 2*i1 w i2 w 100, i1 w + 2*i2 w +3 = 1, 4*i1 w + 2*i2 w = 2*i1 r + 1 Note: ignoring non-affine relations for i1 = 1, n for i2 = 2*i1, 100 a[i1+2*i2+3][4*i1+2*i2][i1*i1] = = a[1][2*i1+1][i2] + 3 27

Solutions There is a data dependence, if There exist i1 r, i2 r, i1 w, i2 w, such that 0 i1 r, i1 w n, 2*i1 w i2 w 100, 2*i1 w i2 w 100, i1 w + 2*i2 w +3 = 1, 4*i1 w + 2*i2 w - 1 = i1 r + 1 No solution No data dependence Solution there may be a dependence 28

Form of Data Dependence Analysis Data dependence problems originally contains equalities and equalities Eliminate inequalities in the problem statement: Replace a b with two sub-problems: a>b or a<b We get inti, Ai b, A i 1 1 2 b 2 29

Form of Data Dependence Analysis Eliminate equalities in the problem statement: Replace a =b with two sub-problems: a b and b a We get int i, Ai b Integer programming is NP-complete, i.e. Expensive 30

Techniques: Inexact Tests Examples: GCD test, Banerjee s test 2 outcomes No no dependence Don t know assume there is a solution dependence Extra data dependence constraints Sacrifice parallelism for compiler efficiency 31

GCD Test Is there any dependence? for i = 1, 100 a[2*i] = = a[2*i+1] + 3 Solve a linear Diophantine equation 2*i w = 2*i r + 1 32

GCD The greatest common divisor (GCD) of integers a 1, a 2,, a n, denoted gcd(a 1, a 2,, a n ), is the largest integer that evenly divides all these integers. Theorem: The linear Diophantine equation a1x1 a2x2... anxn c has an integer solution x 1, x 2,, x n iff gcd(a 1, a 2,, a n ) divides c 33

Examples Example 1: gcd(2,-2) = 2. No solutions 2x 2x2 1 1 Example 2: gcd(24,36,54) = 6. Many solutions 24x 36y 54z 30 34

Multiple Equalities x 2 y z 0 3x 2 y z 5 Equation 1: gcd(1,-2,1) = 1. Many solutions Equation 2: gcd(3,2,1) = 1. Many solutions Is there any solution satisfying both equations? 35

The Euclidean Algorithm Assume a and b are positive integers, and a > b. Let c be the remainder of a/b. If c=0, then gcd(a,b) = b. Otherwise, gcd(a,b) = gcd(b,c). gcd(a 1, a 2,, a n ) = gcd(gcd(a 1, a 2 ), a 3, a n ) 36

Exact Analysis Most memory disambiguations are simple integer programs. Approach: Solve exactly yes, or no solution Solve exactly with Fourier-Motzkin + branch and bound Omega package from University of Maryland 37

Incremental Analysis Use a series of simple tests to solve simple programs (based on properties of inequalities rather than array access patterns) Solve exactly with Fourier-Motzkin + branch and bound Memoization Many identical integer programs solved for each program Save the results so it need not be recomputed 38

State of the Art Multiprocessors need large outer parallel loops Many inter-procedural optimizations are needed Interprocedural scalar optimizations Dependence Privatization Reduction recognition Interprocedural array analysis Array section analysis 39

Summary DOALL loops Iteration Space Data dependence analysis 40