Loop Transformations, Dependences, and Parallelization

Announcements
  - Midterm is Friday from 3-4:15 in this room

Today
  - Semester long project
  - Data dependence recap
  - Parallelism and storage tradeoff
  - Scalar expansion example
  - Skewing Smith-Waterman
  - Automating transformations like skewing
      - Iteration space representation
      - Transformation representation
      - Applying the transformation to the iteration space
      - Generating code for the new iteration space

CS 553 Intro to Automating Loop Transformations

Semester Long Project Posted Online

Main Idea (find a program analysis and/or transformation tool)
  - Demonstrate usage of the tool to the rest of the class (10 minutes, 2-page tutorial)
  - Find 10+ related papers and describe the research problem space
  - Describe the space of solutions presented in the papers
  - Evaluate the tool on a benchmark. How well does it solve the problem? What are some limitations?
  - Present your findings to the rest of the class

Requirements
  - Project proposal due next Friday, October 17th
  - In-class demos and 2-page tutorials due Monday, November 17th
  - Final report due Friday, December 12th
  - In-class presentations Wednesday, December 17th, 4:10-6:10pm

Parallelism and Storage Usage Tradeoff
  - False dependences limit parallelism
  - Removing false dependences requires more memory/storage
  - Obtaining performance requires finding an effective tradeoff

Loop-Carried, Storage-Related Dependences

Problem
  - Loop-carried dependences inhibit parallelism
  - Scalar references result in loop-carried dependences

Example
    do i = 1,6
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

Can this loop be parallelized? What kind of dependences are these?
  No. The loop-carried dependences on t are anti dependences (and output dependences, since every iteration writes t).

Convention for these slides: arrays start with upper case letters, scalars do not.

Removing False Dependences with Scalar Expansion

Idea
  - Eliminate false dependences by introducing extra storage

Example
    do i = 1,6
      T(i) = A(i) + B(i)
      C(i) = T(i) + 1/T(i)
    enddo
    t = T(6)

Can this loop be parallelized? Disadvantages?
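The scalar expansion above can be sketched in C as follows. This is a minimal illustration, not code from the course: the function names `scalar_version` and `expanded_version` and the macro `N` are assumptions, and the arrays are 0-based (so the slide's `T(6)` becomes `T[N - 1]`).

```c
#include <stddef.h>

#define N 6  /* trip count from the slide (hypothetical macro name) */

/* Original form: every iteration writes and reads the same scalar t,
   which creates loop-carried anti and output dependences. */
double scalar_version(const double A[N], const double B[N], double C[N]) {
    double t = 0.0;
    for (size_t i = 0; i < N; i++) {
        t = A[i] + B[i];
        C[i] = t + 1.0 / t;
    }
    return t;  /* t is live after the loop */
}

/* Expanded form: iteration i owns T[i], so no storage is shared between
   iterations. The cost is N extra doubles of storage. */
double expanded_version(const double A[N], const double B[N], double C[N]) {
    double T[N];
    for (size_t i = 0; i < N; i++) {
        T[i] = A[i] + B[i];
        C[i] = T[i] + 1.0 / T[i];
    }
    return T[N - 1];  /* copy-out of the last value, like t = T(6) */
}
```

Both functions perform the same floating-point operations per element, so they should produce identical `C` arrays and the same final value of `t`; only the expanded form leaves the iterations free of shared scalar storage.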

Scalar Expansion Details

Restrictions
  - The loop must be a countable loop, i.e., the loop trip count must be independent of the body of the loop
  - The expanded scalar must have no upward exposed uses in the loop

    do i = 1,6
      print(t)
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

  - Nested loops may require much more storage
  - When the scalar is live after the loop, we must move the correct array value into the scalar
  - Privatization is another approach that is similar, one scalar per thread

Automating Loop Transformations with Frameworks

Currently
  - Frameworks are used in compilers to
      - abstract loops, memory accesses, and data dependences in loops
      - specify the effect of a sequence of loop transformations on the loop, its memory accesses, and its data dependences
      - generate code from the transformed loop
  - Loop transformations affect the schedule of the loop

Future
  - How can framework technology be exposed in the programming model?

Frameworks we will discuss this semester
  - Unimodular
  - Polyhedral
  - Presburger
  - Sparse Polyhedral

Protein String Matching Example (smithwaterman.c)

    for (i=1;i<=a[0];i++) {
      for (j=1;j<=b[0];j++) {
        diag  = h[i-1][j-1] + sim[a[i]][b[j]];
        down  = h[i-1][j] + DELTA;
        right = h[i][j-1] + DELTA;
        max=max3(diag,down,right);
        if (max <= 0) {
          h[i][j]=0; xtraceback[i][j]=-1; ytraceback[i][j]=-1;
        } else if (max == diag) {
          h[i][j]=diag; xtraceback[i][j]=i-1; ytraceback[i][j]=j-1;
        } else if (max == down) {
          h[i][j]=down; xtraceback[i][j]=i-1; ytraceback[i][j]=j;
        } else {
          h[i][j]=right; xtraceback[i][j]=i; ytraceback[i][j]=j-1;
        }
        if (max > Max){
          Max=max; xmax=i; ymax=j;
        }
      }} // end for loops

Skewing (smithwaterman.c)

    // Let j' = i+j and i' = i.
    for (i'=1;i'<=a[0];i'++) {
      for (j'=i'+1;j'<=i'+b[0];j'++) {
        diag  = h[i'-1][j'-i'-1] + sim[a[i']][b[j'-i']];
        down  = h[i'-1][j'-i'] + DELTA;
        right = h[i'][j'-i'-1] + DELTA;
        max=max3(diag,down,right);
        if (max <= 0) {
          h[i'][j'-i']=0; xtraceback[i'][j'-i']=-1; ytraceback[i'][j'-i']=-1;
        } else if (max == diag) {
          h[i'][j'-i']=diag; xtraceback[i'][j'-i']=i'-1;
          ytraceback[i'][j'-i']=j'-i'-1;
        } else if (max == down) {
          h[i'][j'-i']=down; xtraceback[i'][j'-i']=i'-1;
          ytraceback[i'][j'-i']=j'-i';
        } else {
          h[i'][j'-i']=right; xtraceback[i'][j'-i']=i';
          ytraceback[i'][j'-i']=j'-i'-1;
        }
        if (max > Max){
          Max=max; xmax=i'; ymax=j'-i';
        }
      }} // end for loops

Iteration Space Representation

Original code
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

[Figure: iteration space plotted on axes i and j]

Represent the iteration space
  - As an intersection of inequalities
  - The iteration space is the integer tuples within the intersection

Bounds:
    1 <= i <= 6
    1 <= j <= 5
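As a small illustration of "integer tuples within an intersection of inequalities", the sketch below tests membership against the slide's bounds and counts the points by scanning a bounding box. The function names `in_iteration_space` and `count_iteration_points` are hypothetical, chosen only for this example.

```c
#include <stdbool.h>

/* Membership test: a point is in the iteration space when it satisfies
   every inequality, here 1 <= i <= 6 and 1 <= j <= 5. */
bool in_iteration_space(int i, int j) {
    return 1 <= i && i <= 6 && 1 <= j && j <= 5;
}

/* Count the integer points of the square [lo,hi] x [lo,hi] that lie in
   the iteration space. */
int count_iteration_points(int lo, int hi) {
    int count = 0;
    for (int i = lo; i <= hi; i++)
        for (int j = lo; j <= hi; j++)
            if (in_iteration_space(i, j))
                count++;
    return count;
}
```

Scanning any box that covers the bounds finds 6 x 5 = 30 points, which matches the number of iterations the original loop nest executes.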

Lexicographical Order as Schedule

Iteration point
  - Integer tuple with dimensionality d: $(i_0, i_1, \ldots, i_{d-1})$

Lexicographical Order
  - First order the iteration points by $i_0$, then $i_1$, and finally $i_{d-1}$.

\[
(i_0, i_1, \ldots, i_{d-1}) \prec (j_0, j_1, \ldots, j_{d-1})
\iff
(i_0 < j_0) \lor (i_0 = j_0 \land i_1 < j_1) \lor \cdots \lor
(i_0 = j_0 \land i_1 = j_1 \land \cdots \land i_{d-2} = j_{d-2} \land i_{d-1} < j_{d-1})
\]
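The ordering can be sketched as a comparison function in C. This assumes the strict form of the order (equal tuples are not ordered before each other); the name `lex_less` is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when tuple a comes strictly before tuple b in
   lexicographical order. Both tuples have d components. */
bool lex_less(const int *a, const int *b, size_t d) {
    for (size_t k = 0; k < d; k++) {
        if (a[k] < b[k]) return true;   /* first differing component decides */
        if (a[k] > b[k]) return false;
    }
    return false;                       /* all components equal */
}
```

For a loop nest executed in its written order, iteration (1,5) runs before (2,1), and `lex_less` agrees: the first components differ and 1 < 2.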

Frameworks for Loop Transformations

Loop Transformations as functions
    $\vec{i}\,' = f(\vec{i})$

Unimodular Loop Transformations [Banerjee 90], [Wolf & Lam 91]
  - can represent loop permutation, loop reversal, and loop skewing
  - unimodular linear mapping (determinant of matrix is + or - 1)
      $\vec{i}\,' = T\,\vec{i}$
  - T is a matrix, $\vec{i}$ and $\vec{i}\,'$ are iteration vectors
  - example
      \[
      \begin{bmatrix} i' \\ j' \end{bmatrix} =
      \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
      \begin{bmatrix} i \\ j \end{bmatrix}
      \]
  - limitations
      - only perfectly nested loops
      - all statements are transformed the same
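A minimal sketch of the unimodular mapping for a 2-deep loop nest follows. The types `Mat2` and `Vec2` and the function names are assumptions made for illustration, not part of any framework named on the slide.

```c
#include <stdbool.h>

typedef struct { int m[2][2]; } Mat2;  /* transformation matrix T */
typedef struct { int v[2]; } Vec2;     /* iteration vector (i, j) */

/* Determinant of a 2x2 integer matrix. */
int det2(Mat2 t) {
    return t.m[0][0] * t.m[1][1] - t.m[0][1] * t.m[1][0];
}

/* A matrix is unimodular when its determinant is +1 or -1. */
bool is_unimodular(Mat2 t) {
    int d = det2(t);
    return d == 1 || d == -1;
}

/* Compute the new iteration vector i' = T i. */
Vec2 apply_transform(Mat2 t, Vec2 x) {
    Vec2 y;
    y.v[0] = t.m[0][0] * x.v[0] + t.m[0][1] * x.v[1];
    y.v[1] = t.m[1][0] * x.v[0] + t.m[1][1] * x.v[1];
    return y;
}
```

With the skewing matrix from the example, iteration (2,3) maps to (2,5), that is, i' = i and j' = i + j. Loop interchange, with rows (0,1) and (1,0), has determinant -1 and is also unimodular, whereas a scaling such as rows (2,0) and (0,1) is not.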

Loop Skewing

Original code
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

Distance vector: (1, -1)

Skewing: i' = i, j' = i + j

[Figure: iteration space on axes i and j, before and after skewing]

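Transforming the Dependences and Array Accesses

Original code
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

Dependence vector:
\[
T\,\vec{d} =
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \end{bmatrix} =
\begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

New Array Accesses (using $\vec{i} = T^{-1}\,\vec{i}\,'$):
\[
A(i, j)
= A\!\left(\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}\right)
= A\!\left(\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} i' \\ j' \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}\right)
= A(i',\, j' - i')
\]
\[
A(i-1, j+1)
= A\!\left(\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} -1 \\ 1 \end{bmatrix}\right)
= A\!\left(\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} i' \\ j' \end{bmatrix} + \begin{bmatrix} -1 \\ 1 \end{bmatrix}\right)
= A(i' - 1,\, j' - i' + 1)
\]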
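Rewriting the array accesses relies on the inverse of T, which is again an integer matrix because T is unimodular. The sketch below computes that inverse for the 2x2 case and uses it to recover the subscripts of the read A(i-1, j+1) from a new iteration (i', j'). The names `invert_unimodular2` and `read_access` are hypothetical, and `ip`, `jp` stand in for i' and j'.

```c
/* Integer inverse of a 2x2 unimodular matrix.
   Returns 0 on success, -1 when the determinant is not +1 or -1. */
int invert_unimodular2(int t[2][2], int inv[2][2]) {
    int det = t[0][0] * t[1][1] - t[0][1] * t[1][0];
    if (det != 1 && det != -1)
        return -1;
    /* inverse = adjugate / det; dividing by +-1 equals multiplying by it */
    inv[0][0] =  t[1][1] * det;
    inv[0][1] = -t[0][1] * det;
    inv[1][0] = -t[1][0] * det;
    inv[1][1] =  t[0][0] * det;
    return 0;
}

/* Subscripts of the read A(i-1, j+1), expressed from the new iteration
   (ip, jp): first map back to (i, j), then add the offset (-1, +1). */
void read_access(int inv[2][2], int ip, int jp, int out[2]) {
    int i = inv[0][0] * ip + inv[0][1] * jp;
    int j = inv[1][0] * ip + inv[1][1] * jp;
    out[0] = i - 1;
    out[1] = j + 1;
}
```

For the skewing matrix the inverse has rows (1,0) and (-1,1), so new iteration (3,7) reads element (2,5), in agreement with A(i'-1, j'-i'+1).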

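Transforming the Loop Bounds

Original code
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

Bounds (substituting i = i' and j = j' - i' into the original bounds):
    1 <= i' <= 6
    1 <= j' - i' <= 5, that is, 1 + i' <= j' <= 5 + i'

Transformed code
    do i' = 1,6
      do j' = 1+i',5+i'
        A(i',j'-i') = A(i'-1,j'-i'+1)+1
      enddo
    enddo

[Figure: transformed iteration space on axes i' and j']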
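One way to gain confidence in the transformed bounds and subscripts is to run both loop nests on the same data and compare the results. The sketch below does that in C; the `Grid` type, the halo layout (row 0 and column 6 are only read), and the function names are assumptions made for this example, with `ip` and `jp` standing in for i' and j'.

```c
#include <string.h>

#define NI 6
#define NJ 5

/* Row 0 and column NJ+1 form a halo that is read but never written. */
typedef int Grid[NI + 1][NJ + 2];

/* Original loop nest. */
void run_original(Grid A) {
    for (int i = 1; i <= NI; i++)
        for (int j = 1; j <= NJ; j++)
            A[i][j] = A[i - 1][j + 1] + 1;
}

/* Skewed loop nest with i' = i and j' = i + j. */
void run_skewed(Grid A) {
    for (int ip = 1; ip <= NI; ip++)
        for (int jp = 1 + ip; jp <= NJ + ip; jp++)
            A[ip][jp - ip] = A[ip - 1][jp - ip + 1] + 1;
}

/* Returns 1 when both loop nests leave identical array contents. */
int skewing_preserves_result(void) {
    Grid a, b;
    for (int i = 0; i <= NI; i++)
        for (int j = 0; j <= NJ + 1; j++)
            a[i][j] = b[i][j] = 10 * i + j;
    run_original(a);
    run_skewed(b);
    return memcmp(a, b, sizeof(Grid)) == 0;
}
```

Skewing with i' as the outer loop visits the iterations in the same order as the original nest, so the comparison is expected to succeed; the value of the transformation lies in the new shape of the iteration space and the new dependence vector.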

Revisiting (smithwaterman.c)

    for (i=1;i<=a[0];i++) {
      for (j=1;j<=b[0];j++) {
        diag  = h[i-1][j-1] + sim[a[i]][b[j]];
        down  = h[i-1][j] + DELTA;
        right = h[i][j-1] + DELTA;
        ...

Let j' = i+j and i' = i.

    for (i'=1;i'<=a[0];i'++) {
      for (j'=i'+1;j'<=i'+b[0];j'++) {
        diag  = h[i'-1][j'-i'-1] + sim[a[i']][b[j'-i']];
        down  = h[i'-1][j'-i'] + DELTA;
        right = h[i'][j'-i'-1] + DELTA;
        ...

Transformation Legality

Recall
  - A dependence vector is legal if it is lexicographically non-negative.
  - Applying the transformation function to each dependence vector produces a dependence vector for the new iteration space.

When is a transformation legal assuming a lexicographical schedule?

What about parallelism?
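The legality test recalled above can be sketched for 2-deep loop nests as follows. This is one possible reading of the slide's questions, not the course's answer: a transformation is treated as legal when every transformed dependence vector is lexicographically non-negative, and the inner loop is treated as parallel when no transformed dependence is carried by it. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* out = T * d: the dependence vector in the new iteration space. */
void transform_dependence(int t[2][2], const int d[2], int out[2]) {
    out[0] = t[0][0] * d[0] + t[0][1] * d[1];
    out[1] = t[1][0] * d[0] + t[1][1] * d[1];
}

/* Lexicographically non-negative: first nonzero component is positive,
   or the vector is all zeros. */
bool lex_nonnegative(const int d[2]) {
    if (d[0] != 0) return d[0] > 0;
    return d[1] >= 0;
}

/* Legal when every transformed dependence stays lexicographically
   non-negative. */
bool transformation_is_legal(int t[2][2], int deps[][2], size_t n) {
    for (size_t k = 0; k < n; k++) {
        int nd[2];
        transform_dependence(t, deps[k], nd);
        if (!lex_nonnegative(nd)) return false;
    }
    return true;
}

/* The inner loop carries a dependence when the outer distance is zero
   and the inner distance is not. */
bool inner_loop_is_parallel(int t[2][2], int deps[][2], size_t n) {
    for (size_t k = 0; k < n; k++) {
        int nd[2];
        transform_dependence(t, deps[k], nd);
        if (nd[0] == 0 && nd[1] != 0) return false;
    }
    return true;
}
```

For the distance vector (1,-1) from the earlier example, skewing gives (1,0), which is legal, while loop interchange gives (-1,1), which is not. After skewing, the only dependence is carried by the outer loop, so the inner loop iterations are independent of each other.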

Converting C loops to iteration space representation

Analyses needed
  - Loop analysis
      - Loop bounds from AST or control-flow graph
      - Induction variable detection
  - Pointer analysis
      - Do pointers point at the same or overlapping memory?
      - Note that in C one can cast a pointer to an integer and back and can do pointer arithmetic.
      - In general requires whole program analysis.
  - Dependence analysis

Is this even possible?
  - Current tools make the optimistic pointer assumption
  - We need programming models that simplify or remove the need for such analyses

Concepts
  - Parallelism and memory usage tradeoff
  - Transformation frameworks
      - Representing the iteration space
      - Representing transformations
      - Applying transformations to the iteration space, dependences, and array accesses
      - Testing the legality of a transformation
  - Compiler analyses needed in C to obtain an iteration space representation

References
  [Banerjee 90] Utpal Banerjee, Unimodular transformations of double loops, In Advances in Languages and Compilers for Parallel Computing, 1990.
  [Wolf & Lam 91] Wolf and Lam, A Data Locality Optimizing Algorithm, In Programming Language Design and Implementation, 1991.

Next Time

Homework
  - Study for the midterm by doing example problems.

Lecture
  - Midterm review
  - After midterm: Using the unimodular framework to represent other loop transformations