Loop Transformations for Parallelism & Locality

Loop Transformations for Parallelism & Locality
CS553 Lecture: Loop Transformations

Last week
  Data dependences and loops
  Loop transformations
    Parallelization
    Loop interchange

Today
  Scalar expansion for removing false dependences
  Loop interchange
  Loop transformations and transformation frameworks
    Loop permutation
    Loop reversal
    Loop skewing
    Loop fusion

Review
  Distance vectors
    Concisely represent dependences in loops (i.e., in iteration spaces)
    Dictate what transformations are legal, e.g., permutation and parallelization
  Legality
    A dependence vector is legal when it is lexicographically nonnegative
  Loop-carried dependence
    A dependence D = (d_1, ..., d_n) is carried at loop level i if d_i is the first nonzero element of D

Scalar Expansion: Motivation
  Problem
    Loop-carried dependences inhibit parallelism
    Scalar references result in loop-carried dependences

    Loop body (for each iteration i):
      t = A(i) + B(i)
      C(i) = t + 1/t

  Can this loop be parallelized? No.
  What kind of dependences are these? Anti dependences.

Scalar Expansion
  Eliminate false dependences by introducing extra storage

    Loop body (for each iteration i):
      T(i) = A(i) + B(i)
      C(i) = T(i) + 1/T(i)

  Can this loop be parallelized?
  Disadvantages?

  Convention for these slides: arrays start with upper case letters, scalars do not
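The effect of scalar expansion can be checked with a small sketch. The Python below is not from the slides: the function names, the sample data, and the use of an explicit iteration order to stand in for parallel execution are all assumptions made for illustration. It shows that once t is replaced by T(i), the iterations no longer share storage, so any execution order gives the same C.

```python
def with_scalar(A, B):
    """Original loop: every iteration reuses the single scalar t."""
    C = [0.0] * len(A)
    for i in range(len(A)):
        t = A[i] + B[i]
        C[i] = t + 1 / t
    return C


def with_expansion(A, B, order):
    """Expanded loop: T[i] belongs to iteration i only.

    `order` is a hypothetical stand-in for a parallel schedule: it lists
    the iterations in the order they happen to run.
    """
    n = len(A)
    T = [0.0] * n  # extra storage introduced by scalar expansion
    C = [0.0] * n
    for i in order:
        T[i] = A[i] + B[i]
        C[i] = T[i] + 1 / T[i]
    return C


A = [1.0, 2.0, 3.0, 4.0]
B = [0.5, 0.5, 0.5, 0.5]

# Running the expanded loop backwards gives the same answer as the original.
same = with_expansion(A, B, [3, 2, 1, 0]) == with_scalar(A, B)
```

The cost is visible in the sketch as well: T needs one element per iteration, where the original needed a single scalar.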

Scalar Expansion Details
  Restrictions
    The loop must be a countable loop, i.e., the loop trip count must be independent of the body of the loop
    The expanded scalar must have no upward exposed uses in the loop

      Loop body (for each iteration i):
        print(t)
        t = A(i) + B(i)
        C(i) = t + 1/t

    Nested loops may require much more storage
    When the scalar is live after the loop, we must move the correct array value into the scalar

Loop Permutation
  Swap the order of two loops to increase parallelism, to improve spatial locality, or to enable other transformations
  Also known as loop interchange

    do i = 1,n
      do j = 1,n
        x = A(2,j)
      enddo
    enddo
    This access strides through a row of A

    do j = 1,n
      do i = 1,n
        x = A(2,j)
      enddo
    enddo
    This code is invariant with respect to the inner loop, yielding better locality

Loop Interchange (cont)

    do i = 1,n
      do j = 1,n
        x = A(i,j)
      enddo
    enddo
    This array has stride n access

    do j = 1,n
      do i = 1,n
        x = A(i,j)
      enddo
    enddo
    This array now has stride 1 access
    (Assuming column-major order for Fortran)

Legality of Loop Interchange
  Case analysis of the direction vectors
    (=,=) The dependence is loop independent, so it is unaffected by interchange
    (=,<) The dependence is carried by the j loop. After interchange the dependence will be (<,=), so the dependence will still be carried by the j loop, and the dependence relations do not change.
    (<,=) The dependence is carried by the i loop. After interchange the dependence will be (=,<), so the dependence will still be carried by the i loop, and the dependence relations do not change.

Legality of Loop Interchange (cont)
  Case analysis of the direction vectors (cont.)
    (<,<) The dependence distance is positive in both dimensions. After interchange it will still be positive in both dimensions, so the dependence relations do not change.
    (<,>) The dependence is carried by the outer loop. After interchange the dependence will be (>,<), which changes the dependences and results in an illegal direction vector, so interchange is illegal.
    (>,*) (=,>) Such direction vectors are not possible for the original loop.

Loop Interchange
  Consider the (<,>) case

  Before
    do i = 1,n
      do j = 1,n
        C(i,j) = C(i+1,j-1)
      enddo
    enddo

    (1,1)  C(1,1) = C(2,0)
    (1,2)  C(1,2) = C(2,1)
    ...
    (2,1)  C(2,1) = C(3,0)
    C(2,1) is read at (1,2) before it is written at (2,1): anti dependence (δa)

  After
    do j = 1,n
      do i = 1,n
        C(i,j) = C(i+1,j-1)
      enddo
    enddo

    (1,1)  C(1,1) = C(2,0)
    (2,1)  C(2,1) = C(3,0)
    ...
    (1,2)  C(1,2) = C(2,1)
    C(2,1) is now written at (2,1) before it is read at (1,2): flow dependence (δf)

Frameworks for Loop Transformations
  Unimodular Loop Transformations [Banerjee 90], [Wolf & Lam 91]
    Can represent loop permutation, loop reversal, and loop skewing
    Unimodular linear mapping (determinant of the matrix is +1 or -1)
    T i = i', where T is a matrix and i and i' are iteration vectors
    A transformation is legal if the transformed dependence vectors remain lexicographically positive
  Limitations
    Only perfectly nested loops
    All statements are transformed the same

Legality of Loop Interchange, Reprise
  Reduced case analysis of the direction vectors
    (=,=) The dependence is loop independent, so it is unaffected by interchange
    (=,<) The dependence is carried by the j loop. After interchange the dependence will be (<,=), so the dependence will still be carried by the j loop, and the dependence relations do not change.
    (<,>) The dependence is carried by the outer loop. After interchange the dependence will be (>,<), which changes the dependences and results in an illegal direction vector, so interchange is illegal.
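The legality rule of the unimodular framework (every transformed dependence vector must remain lexicographically positive) is mechanical enough to sketch in a few lines of Python. The helper names and the matrix constants below are assumptions chosen for this illustration, not part of the slides; the matrices are the usual 2x2 forms for interchange, inner-loop reversal, and inner-loop skewing.

```python
def lex_positive(d):
    """True if the first nonzero entry of d is positive."""
    for x in d:
        if x != 0:
            return x > 0
    return False  # all zeros: loop independent, not a carried dependence


def transform(T, d):
    """Matrix-vector product T d for a distance vector d."""
    return tuple(sum(T[r][c] * d[c] for c in range(len(d)))
                 for r in range(len(T)))


def is_legal(T, deps):
    """Legal if every loop-carried distance vector stays lexicographically
    positive; loop-independent (all-zero) vectors are unaffected and skipped."""
    return all(lex_positive(transform(T, d)) for d in deps if any(d))


INTERCHANGE = [[0, 1], [1, 0]]     # (i, j) -> (j, i)
REVERSE_INNER = [[1, 0], [0, -1]]  # (i, j) -> (i, -j)
SKEW_INNER = [[1, 0], [1, 1]]      # (i, j) -> (i, i + j)

# The (<,>) case: distance (1,-1) becomes (-1,1), so interchange is illegal.
bad = is_legal(INTERCHANGE, [(1, -1)])

# The (=,<) and (<,<) cases survive interchange.
ok = is_legal(INTERCHANGE, [(0, 1), (1, 1)])

# Skewing first turns (1,-1) into (1,0), after which interchange is legal.
skewed = transform(SKEW_INNER, (1, -1))
ok_after_skew = is_legal(INTERCHANGE, [skewed])
```

Because each transformation is a matrix, combining transformations amounts to applying them one after another to the distance vectors, which is what the last two lines do.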

Loop Reversal
  Change the direction of loop iteration (i.e., from low-to-high indices to high-to-low indices or vice versa)
  Benefits
    Improved cache performance
    Enables other transformations (coming soon)

    do i = 1,6
      A(i) = B(i) + C(i)
    enddo

    do i = 6,1,-1
      A(i) = B(i) + C(i)
    enddo

Loop Reversal and Distance Vectors
  Impact
    Reversal of loop i negates the i-th entry of all distance vectors associated with the loop
    What about direction vectors?
  When is reversal legal?
    When the loop being reversed does not carry a dependence (i.e., when the transformed distance vectors remain legal)

    do i = 1,5
      do j = 1,6
        A(i,j) = A(i-1,j-1)+1
      enddo
    enddo

    Dependence: Flow
    Distance vector: (1,1)
    Transformed distance vector (j loop reversed): (1,-1), legal

Loop Reversal Legality
  Loop reversal will change the direction of the dependence relation
  Is the following legal?

    do i = 1,6
      A(i) = A(i-1)
    enddo
    Dependence: Flow
    Distance vector: (1)

    do i = 6,1,-1
      A(i) = A(i-1)
    enddo
    Dependence: Anti, distance vector (1)
    The original flow dependence would have distance vector (-1), which is not lexicographically positive, so this reversal is not legal

Loop Skewing
  Original code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

  Distance vector: (1,-1)
  Can we permute the original loop?
  Skewing: i' = i, j' = i + j

Transforming the Dependences and Array Accesses
  Original code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

  Dependence vector: (1,-1) becomes (1,0) under the skew i' = i, j' = i + j
  New array accesses: A(i',j'-i') = A(i'-1,j'-i'+1)+1

Transforming the Loop Bounds
  Original code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

  Bounds: 1 <= i' <= 6 and 1+i' <= j' <= 5+i'

  Transformed code

    do i' = 1,6
      do j' = 1+i',5+i'
        A(i',j'-i') = A(i'-1,j'-i'+1)+1
      enddo
    enddo

Loop Fusion
  Combine multiple loop nests into one

    do i = 1,n
      A(i) = A(i-1)
    enddo
    do i = 1,n
      B(i) = A(i)/2
    enddo

    do i = 1,n
      A(i) = A(i-1)
      B(i) = A(i)/2
    enddo

  Pros
    May improve data locality
    Reduces loop overhead
    Enables array contraction (opposite of scalar expansion)
    May enable better instruction scheduling
  Cons
    May hurt data locality
    May hurt icache performance

Legality of Loop Fusion
  Basic Conditions
    Both loops must have same structure
      Same loop depth
      Same loop bounds
      Same iteration directions
    Can we relax any of these restrictions?
    Dependences must be preserved, e.g., flow dependences must not become anti dependences

    do i = 1,n
      body1
    enddo
    do i = 1,n
      body2
    enddo
    All cross-loop dependences flow from body1 to body2

    do i = 1,n
      body1
      body2
    enddo
    Ensure that fusion does not introduce dependences from body2 to body1
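Returning to the skewing example from the Transforming the Loop Bounds slide, the transformed loop nest can be checked against the original by simply running both. The Python sketch below is an illustration under assumptions: the function names are made up, the array is a zero-initialized 7 x 7 grid so that every A(i-1,j+1) reference is in range, and the bounds of the final skewed-then-interchanged nest were derived here by hand rather than taken from the slides.

```python
def fresh():
    # 7 x 7 zeros: indices 0..6 cover A(i-1, j+1) for i in 1..6, j in 1..5
    return [[0] * 7 for _ in range(7)]


def original():
    A = fresh()
    for i in range(1, 7):
        for j in range(1, 6):
            A[i][j] = A[i - 1][j + 1] + 1
    return A


def interchanged_only():
    # Interchange without skewing: distance (1,-1) becomes (-1,1), illegal.
    A = fresh()
    for j in range(1, 6):
        for i in range(1, 7):
            A[i][j] = A[i - 1][j + 1] + 1
    return A


def skewed():
    # i' = i, j' = i + j, with bounds 1+i' <= j' <= 5+i'
    A = fresh()
    for ip in range(1, 7):
        for jp in range(1 + ip, 5 + ip + 1):
            A[ip][jp - ip] = A[ip - 1][jp - ip + 1] + 1
    return A


def skewed_then_interchanged():
    # Assumed bounds: j' runs over 2..11, and i' is clipped so that
    # 1 <= i' <= 6 and 1 <= j' - i' <= 5 both hold.
    A = fresh()
    for jp in range(2, 12):
        for ip in range(max(1, jp - 5), min(6, jp - 1) + 1):
            A[ip][jp - ip] = A[ip - 1][jp - ip + 1] + 1
    return A


skew_matches = skewed() == original()
skew_interchange_matches = skewed_then_interchanged() == original()
plain_interchange_matches = interchanged_only() == original()
```

Under these assumptions the skewed nest and the skewed-then-interchanged nest reproduce the original array, while the plain interchange does not, which matches the distance-vector reasoning: (1,-1) cannot be permuted directly, but (1,0) can.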

Loop Fusion
  What are the dependences?

    do i = 1,n
      A(i) = B(i) + 1
    enddo
    do i = 1,n
      C(i) = A(i)/2        δf on A from the first loop
    enddo
    do i = 1,n
      D(i) = 1/C(i+1)
    enddo

  What are the dependences?

    do i = 1,n
      A(i) = B(i) + 1
      C(i) = A(i)/2        δf on A
      D(i) = 1/C(i+1)      δa on C
    enddo

  Fusion changes the dependence between the statements that define C(i) and D(i), so fusion is illegal
  Is there some transformation that will enable fusion of these loops?

Loop Fusion (cont)
  Loop reversal is legal for the original loops
    Does not change the direction of any dependence in the original code
    Will reverse the direction in the fused loop: δa will become δf

    do i = n,1,-1
      A(i) = B(i) + 1
    enddo
    do i = n,1,-1
      C(i) = A(i)/2        δf on A from the first loop
    enddo
    do i = n,1,-1
      D(i) = 1/C(i+1)
    enddo

    do i = n,1,-1
      A(i) = B(i) + 1
      C(i) = A(i)/2        δf on A
      D(i) = 1/C(i+1)
    enddo

  After reversal and fusion all original dependences are preserved

Concepts
  Using direction and distance vectors
  Transformation legality (from previous)
    Must respect data dependences
    Scalar expansion as a technique to remove anti and output dependences
  Transformations
    What is the benefit?
    What do they enable?
    When are they legal?
  Unimodular transformation framework
    Represents loop permutation, loop reversal, and loop skewing
    Provides a mathematical framework for testing transformation legality, transforming array accesses and loop bounds, and combining transformations

Next Time
  Lecture
    More loop transformations
    An even cooler transformation framework
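The fusion example above can be replayed on concrete data to see the anti dependence bite. The sketch below is an assumption-laden illustration rather than the lecture's code: the function name run, the mode strings, the choice B(i) = i, and the initial "old" value 1.0 for every element of C (including C(n+1)) are all made up for the demonstration.

```python
def run(n, mode):
    """Execute the three statements for i in 1..n and return D(1..n).

    mode is one of "unfused", "fused", "fused_reversed" (hypothetical names).
    """
    B = [float(i) for i in range(n + 2)]  # assumed input data
    A = [0.0] * (n + 2)
    C = [1.0] * (n + 2)                   # "old" values of C, incl. C(n+1)
    D = [0.0] * (n + 2)

    if mode == "unfused":
        for i in range(1, n + 1):
            A[i] = B[i] + 1
        for i in range(1, n + 1):
            C[i] = A[i] / 2
        for i in range(1, n + 1):
            D[i] = 1 / C[i + 1]           # reads C values written by loop 2
    elif mode in ("fused", "fused_reversed"):
        if mode == "fused":
            indices = range(1, n + 1)
        else:
            indices = range(n, 0, -1)     # do i = n,1,-1
        for i in indices:
            A[i] = B[i] + 1
            C[i] = A[i] / 2
            D[i] = 1 / C[i + 1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return D[1:n + 1]


reference = run(4, "unfused")
forward_fused = run(4, "fused")            # reads stale C(i+1): wrong
reversed_fused = run(4, "fused_reversed")  # C(i+1) already updated: right
```

In the forward fused loop, D(i) reads C(i+1) before iteration i+1 has written it, so it sees the old value; iterating from n down to 1 restores the original write-then-read order.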