Module 16: Data Flow Analysis in Presence of Procedure Calls. Lecture 32: Iteration.


The Lecture Contains:

- Iteration Space
- Iteration Vector
- Normalized Iteration Vector
- Dependence Distance
- Direction Vector
- Loop Carried Dependence Relations
- Dependence Level
- Iteration Vector - Triangular Space
- Non Tightly Nested Loops
- Code Sinking
- Data Dependence With Conditionals
- Conditionals in Loops

Iteration Space

- A concept associated with loops: the iteration space contains one point for each iteration of the loop.
- If a statement in one iteration depends on a statement in another iteration, the dependence is represented by an edge from the source iteration to the target iteration; the resulting graph is called the iteration space dependence graph.

    for i = 2 to 9 do
        x[i] = ...
        ... = ... x[i-1] ...

- Drawbacks: the space requirement may be too large, and the compiler cannot always determine the number of iterations.

Iteration Vector

Each iteration is assigned a vector iv = (I1, I2, ..., In), where Ik is the value of the loop index variable of the kth nested loop at that iteration.

    for i = 3 to 7 do
        for j = 6 to 2 step -2 do
            a[i,j] = a[i,j+2] + 1
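To make iteration vectors concrete, the nest above can be enumerated directly. The following is a minimal C sketch of that enumeration (the printout format is purely illustrative):

    #include <stdio.h>

    int main(void) {
        /* Enumerate the iteration vectors (i, j) of the nest
           for i = 3 to 7; for j = 6 down to 2 step -2. */
        for (int i = 3; i <= 7; i++) {
            for (int j = 6; j >= 2; j -= 2) {
                printf("iteration vector (%d, %d)\n", i, j);
            }
        }
        return 0;
    }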

Normalized Iteration Vector

Iterations are labeled 0, 1, 2, ...

Advantages:
- Later iterations always have a larger vector than earlier iterations: if iv1 < iv2 (lexicographically), then iteration iv1 is always executed before iteration iv2.
- The vector of the next iteration is always one more than that of the current iteration.

Dependence Distance

The vector difference between the iteration vectors of the source and the target iterations.

Direction Vector

- Frequently used but less precise.
- It is an ordering vector relating the source and the target iteration vectors.
- For some optimizations, direction information is sufficient.
- Sometimes the distance is not fixed but the direction is:

    for i = 1 to 10 do
        A[2*i] = B[i] + 1
        C[i] = A[i]

The distance varies from 1 to 5, so no single distance vector describes the dependence; the direction, however, is constant (<).

Loop Carried Dependence Relations

    for i = 1 to n do
        for j = 1 to n do
            A[i,j] = ...
            ... = A[i,j]
            B[i,j+1] = ...
            ... = B[i,j]
            C[i+1,j] = ...
            ... = C[i,j+1]
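The varying distance in the example above can be checked mechanically: a flow dependence runs from the write A[2*i1] to the read A[i2] exactly when 2*i1 = i2. A minimal C sketch of that check (bounds taken from the loop above):

    #include <stdio.h>

    int main(void) {
        /* For: for i = 1 to 10 { A[2*i] = ...; ... = A[i]; }
           a flow dependence runs from iteration i1 (write A[2*i1])
           to iteration i2 (read A[i2]) whenever 2*i1 == i2. */
        for (int i1 = 1; i1 <= 10; i1++) {
            for (int i2 = 1; i2 <= 10; i2++) {
                if (2 * i1 == i2) {
                    printf("source i=%d -> target i=%d, distance %d, direction <\n",
                           i1, i2, i2 - i1);
                }
            }
        }
        /* Prints distances 1 through 5; the direction is always <. */
        return 0;
    }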

- For the statements involving A[i,j]: distance vector (0,0), direction vector (=,=); the dependence is loop independent.
- For the statements involving B[i,j]: distance vector (0,1), direction vector (=,<); the dependence is carried by the inner loop.
- For the statements involving C[i,j]: distance vector (1,-1), direction vector (<,>); the dependence is carried by the outer loop.

Dependence Level

The loop nest level that carries the data dependence relation. Therefore:
- for C[i,j] the level is 1
- for B[i,j] the level is 2
- for A[i,j] the level is ∞ (by convention, the level of a loop independent dependence)

Iteration Vector - Triangular Space

Advantages of the normalized iteration vector:
- Later iterations have a larger iteration vector.
- Adjacent iterations differ by only one.

However, neither of these requires the first iteration to have vector (0, 0, ..., 0). Consider the following program:

    for i = 1 to 7 do
        for j = i to 7 do
            A[i+1,j] = A[i,j] + 1

With normalized iteration vectors, the dependence distance vector is (1, -1). A more natural way to depict such a loop is to assign iteration vectors that preserve the triangular shape of the iteration space.

Semi-normalized iteration vector: the distance between successive iterations is 1, but the lower limit is not zero.
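One way to see the two labelings side by side is to print both vectors for every iteration of the triangular nest. A minimal C sketch, assuming the usual 0-based normalization (here the semi-normalized vector coincides with the raw index vector, since both steps are already 1):

    #include <stdio.h>

    int main(void) {
        /* Triangular nest: for i = 1 to 7; for j = i to 7.
           Normalized vectors relabel every loop from 0 with step 1;
           semi-normalized vectors keep the actual non-zero lower
           limits, preserving the triangular shape of the space. */
        for (int i = 1; i <= 7; i++) {
            for (int j = i; j <= 7; j++) {
                printf("semi-normalized (%d,%d)  normalized (%d,%d)\n",
                       i, j, i - 1, j - i);
            }
        }
        return 0;
    }

Under the normalized labeling, the dependence from A[i,j] to A[i+1,j] has distance (1,-1), as stated above; under the semi-normalized labeling it is (1,0), which matches the triangular picture.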

Non Tightly Nested Loops

A nested loop where the outer loop contains additional statements outside the inner loop. For example:

    for i = 1 to n do
        B[i] = B[i] / A[i,i]
        for j = i+1 to n do
            B[j] = B[j] - A[i,j]*B[i]

Code Sinking

Sink the outer loop body into the inner loop by adding conditionals:

    for i = 1 to n do
        for j = i to n do
            if (j = i) B[i] = B[i] / A[i,i]
            if (j > i) B[j] = B[j] - A[i,j]*B[i]
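Since code sinking only guards the sunk statement, both versions perform the same operations in the same order. A minimal C sketch that runs both on the same data (the 3x3 system and 0-based indexing are illustrative):

    #include <stdio.h>

    #define N 3

    /* Non-tightly nested original. */
    static void original(double A[N][N], double B[N]) {
        for (int i = 0; i < N; i++) {
            B[i] = B[i] / A[i][i];
            for (int j = i + 1; j < N; j++)
                B[j] = B[j] - A[i][j] * B[i];
        }
    }

    /* After code sinking: a single perfect nest with guards. */
    static void sunk(double A[N][N], double B[N]) {
        for (int i = 0; i < N; i++)
            for (int j = i; j < N; j++) {
                if (j == i) B[i] = B[i] / A[i][i];
                if (j > i)  B[j] = B[j] - A[i][j] * B[i];
            }
    }

    int main(void) {
        double A[N][N] = {{2, 1, 1}, {0, 3, 1}, {0, 0, 4}};
        double B1[N] = {4, 6, 8}, B2[N] = {4, 6, 8};
        original(A, B1);
        sunk(A, B2);
        for (int k = 0; k < N; k++)   /* the two columns agree */
            printf("B1[%d]=%g  B2[%d]=%g\n", k, B1[k], k, B2[k]);
        return 0;
    }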

Data Dependence With Conditionals

- There must be a path from the definition to the use.
- In the presence of conditionals, the path cannot be determined at compile time; any path may be taken at runtime.
- Data dependence cannot occur between statements on alternate paths.

1. X = 1
2. Y = 2
3. if Y < T then
4.     X = 2
5. else
6.     Y = X
7. endif
8. Z = X + Y

Find the data dependences for X. X is defined at 1 and 4, and used at 6 and 8. Therefore the definition at 1 reaches the uses at 6 and 8, and the definition at 4 reaches the use at 8; there is no dependence from 4 to 6, since statements 4 and 6 lie on alternate paths.

Conditionals in Loops

- Statements that cannot be executed on the same iteration cannot be involved in a loop-independent dependence.
- Conditionals do not affect loop carried dependence.

1. for i = 2 to 9 do
2.     if a(i) > 0 then
3.         a(i) = b(i-1) + 1
4.     else
5.         b(i) = a(i) * 2
6.     endif
7. enddo

Note: statements 3 and 5 lie on alternate paths of the same iteration, so there is no loop-independent dependence between them; the definition of b(i) in statement 5 can still reach the use of b(i-1) in statement 3 on the next iteration, giving a loop carried dependence.
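The loop carried dependence through b can be made visible by logging the reads and writes; a minimal C sketch with illustrative input data chosen so that both branches execute:

    #include <stdio.h>

    int main(void) {
        /* Illustrative data: alternate the sign of a(i) so that
           both branches of the conditional are exercised. */
        int a[10], b[10];
        for (int k = 0; k < 10; k++) { a[k] = (k % 2) ? 1 : -1; b[k] = 0; }

        for (int i = 2; i <= 9; i++) {
            if (a[i] > 0) {
                /* Reads b[i-1], which the else branch may have written
                   on iteration i-1: a loop carried flow dependence
                   from statement 5 to statement 3. */
                a[i] = b[i - 1] + 1;
                printf("iter %d: read  b[%d] = %d\n", i, i - 1, b[i - 1]);
            } else {
                b[i] = a[i] * 2;
                printf("iter %d: wrote b[%d] = %d\n", i, i, b[i]);
            }
        }
        return 0;
    }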