Advanced Compiler Construction Theory And Practice


Introduction to Loop Dependence and Optimizations
DragonStar 2014, Qing Yi

A Little About Myself
Qing Yi: Ph.D., Rice University, USA. Associate Professor, University of Colorado at Colorado Springs.
Research interests:
- Compiler construction, software productivity
- Program analysis and optimization for high-performance computing
- Parallel programming
Overall goal: develop tools to improve the productivity and efficiency of programming
- Optimizing compilers for high-performance computing
- Programmable optimization and tuning of scientific codes

General Information
Reference books:
- Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Randy Allen and Ken Kennedy. Morgan Kaufmann Publishers Inc.
- Book chapter: Optimizing and Tuning Scientific Codes. Qing Yi. In Scalable Computing and Communications: Theory and Practice. http://www.cs.uccs.edu/~qyi/papers/bookchapter11.pdf
- Structured Parallel Programming: Patterns for Efficient Computation. Michael McCool, James Reinders, and Arch Robison. Morgan Kaufmann, 2012. http://parallelbook.com/sites/parallelbook.com/files/sc13_20131117_Intel_McCool_Robison_Reinders_Hebenstreit.pdf
Course materials:
- http://www.cs.uccs.edu/~qyi/classes/dragonstar
- The POET project web site: http://www.cs.uccs.edu/~qyi/poet/index.php

High Performance Computing
Applications must efficiently manage architectural components:
- Parallel processing: multiple execution units (pipelined), vector operations (SIMD), multi-core, multi-tasking, multi-threading, and message-passing
- Data management: registers, the cache hierarchy, shared and distributed memory
- Combinations of the above
What are the compilation challenges?

Optimizing For High Performance
Goal: eliminate inefficiencies in programs.
Eliminating redundancy: if an operation has already been evaluated, don't do it again
- Especially if the operation is inside a loop or part of a recursive evaluation
- All optimizing compilers apply redundancy elimination, e.g., loop invariant code motion, value numbering, global redundancy elimination
Resource management: reorder operations and/or data to better map to the target machine
- Reordering of computation (operations): parallelization, vectorization, pipelining, VLIW, memory reuse; instruction scheduling and loop transformations
- Reorganization of data: register allocation, regrouping of arrays and data structures

Optimizing For Modern Architectures
Key: reorder operations to better manage resources
- Parallelization and vectorization
- Memory hierarchy management
- Instruction and task/thread scheduling
- Interprocedural (whole-program) optimizations
Most compilers focus on optimizing loops. Why?
- This is where the application spends most of its computing time
- What about recursive function/procedure calls? Extremely important, but often left unoptimized

Compiler Technologies
Source-level optimizations
- Most architectural issues can be dealt with by restructuring the program source: vectorization, parallelization, data locality enhancement
- Challenges: determining when optimizations are legal; selecting optimizations based on profitability
Assembly-level optimizations
- Some issues must be dealt with at a lower level: prefetch insertion, instruction scheduling
All require some understanding of the ways that instructions and statements depend on one another (share data)

Syllabus
Dependence theory and practice
- Automatic detection of parallelism
- Types of dependences; testing for dependence
Memory hierarchy management
- Locality enhancement; data layout management
- Loop interchange, blocking, unroll&jam, unrolling, ...
Loop parallelization
- More loop optimizations: OMP parallelization, skewing, fusion, ...
- Private and reduction variables
Pattern-driven optimization
- Structured parallelization patterns
- Pattern-driven composition of optimizations
Programmable optimization and tuning
- Using POET to write your own optimizations

Dependence-based Optimization
Bernstein's conditions: it is safe to run two tasks R1 and R2 in parallel if none of the following holds (illustrated in the C sketch below):
1. R1 writes into a memory location that R2 reads
2. R2 writes into a memory location that R1 reads
3. Both R1 and R2 write to the same memory location
There is a dependence between two statements if
- they might access the same location,
- there is a path from one to the other, and
- one of the accesses is a write.
Dependence can be used for
- Automatic parallelization
- Memory hierarchy management (registers and caches)
- Scheduling of instructions and tasks
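
A minimal C sketch (mine, not from the slides) of Bernstein's conditions, using two loops as the tasks R1 and R2:

  #include <stdio.h>

  int main(void) {
      double a[100], b[100], c[100], d[100];
      for (int i = 0; i < 100; i++) { a[i] = i; b[i] = 2.0 * i; }

      /* R1 and R2 touch disjoint locations: none of the three
         conditions holds, so the two loops may run in parallel. */
      for (int i = 0; i < 100; i++) c[i] = a[i] + 1.0;   /* R1 */
      for (int i = 0; i < 100; i++) d[i] = b[i] + 1.0;   /* R2 */

      /* Here R2 reads c[], which R1 writes: condition 1 holds,
         so the two loops must execute in this order. */
      for (int i = 0; i < 100; i++) c[i] = a[i] * 2.0;   /* R1 */
      for (int i = 0; i < 100; i++) d[i] = c[i] + b[i];  /* R2 */

      printf("%f\n", d[99]);
      return 0;
  }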

Dependence - Static Program Analysis
Program analysis supports software development and maintenance:
- Compilation: identify errors without running the program; smart development environments (check simple errors as you type)
- Optimization: must not change program meaning; improve performance and efficiency of resource utilization; code revision/refactoring for reusability and maintainability
- Program correctness: is the program safe/correct? Program verification: is the implementation safe and secure? Program integration: are there any communication errors?
In contrast, if the program needs to be run to gather the information, the analysis is called dynamic program analysis.

Data Dependences
There is a data dependence from statement S1 to S2 if
1. both statements access the same memory location,
2. at least one of them stores into it, and
3. there is a feasible run-time execution path from S1 to S2.
Classification of data dependences:
- True dependence (Read After Write hazard): S2 depends on S1, denoted S1 δ S2
- Anti dependence (Write After Read hazard): S2 depends on S1, denoted S1 δ^-1 S2
- Output dependence (Write After Write hazard): S2 depends on S1, denoted S1 δ^0 S2
Simple example of data dependence:
  S1  PI = 3.14
  S2  R = 5.0
  S3  AREA = PI * R ** 2
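
In the example above, S3 reads PI written by S1 and R written by S2, so S1 δ S3 and S2 δ S3 are both true dependences (S1 and S2 themselves are independent and could be swapped). All three hazard kinds fit in a minimal C sketch (mine, not from the slides):

  #include <stdio.h>

  int main(void) {
      float a, d;
      a = 1.0f;      /* S1 */
      d = a * 2.0f;  /* S2 reads a written by S1:  S1 δ S2    (true/RAW)   */
      a = 3.0f;      /* S3 writes a read by S2:    S2 δ^-1 S3 (anti/WAR)   */
      a = d + 2.0f;  /* S4 rewrites a after S3:    S3 δ^0 S4  (output/WAW) */
      printf("%f %f\n", a, d);
      return 0;
  }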

Transformations
A reordering transformation
- changes the execution order of the code without adding or deleting any operations.
Properties of reordering transformations:
- A reordering transformation does not eliminate dependences, but it can change the ordering (relative source and sink) of a dependence.
- If a dependence is reversed by a reordering transformation, the transformed program may behave incorrectly (see the example below).
A reordering transformation is safe if it preserves the relative direction (i.e., the source and sink) of each dependence.
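
A minimal C sketch (mine, not from the slides) of why reversing a dependence is unsafe: swapping the source and sink of a true dependence changes the value the read observes.

  #include <stdio.h>

  int main(void) {
      int a = 0, b;

      /* Original order: S1 writes a, then S2 reads it (S1 δ S2). */
      a = 1;      /* S1 */
      b = a + 1;  /* S2 */
      printf("%d\n", b);  /* prints 2 */

      /* Reordered: the read now precedes the write, so the
         dependence is reversed and S2 sees a stale value. */
      a = 0;
      b = a + 1;  /* S2 first */
      a = 1;      /* S1 second */
      printf("%d\n", b);  /* prints 1, not 2: the reordering was unsafe */
      return 0;
  }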

Dependence in Loops
      DO I = 1, N
  S1    A(I+1) = A(I) + B(I)

      DO I = 1, N
  S1    A(I+2) = A(I) + B(I)

In both cases, statement S1 depends on itself. However, there is a significant difference (see the C sketch below): we need to distinguish different iterations of loops.
- The iteration number of a loop is equal to the value of the loop index (loop induction variable).
- Example:
      DO I = 0, 10, 2
  S1    <some statement>
What about nested loops?
- We need to consider the nesting level of each loop.
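
The difference matters for parallelism. In a C rendering (mine, not from the slides), the distance-1 loop forms one serial chain, while the distance-2 loop splits into two independent chains (even and odd iterations) that could run in parallel with each other:

  #include <stdio.h>
  #define N 10

  int main(void) {
      double a[N + 2], b[N];
      for (int i = 0; i < N + 2; i++) a[i] = i;
      for (int i = 0; i < N; i++) b[i] = 1.0;

      /* Distance 1: every iteration reads what the previous one wrote. */
      for (int i = 0; i < N; i++) a[i + 1] = a[i] + b[i];

      /* Distance 2: iteration i reads what iteration i-2 wrote, so
         even-i and odd-i iterations form two independent serial chains. */
      for (int i = 0; i < N; i++) a[i + 2] = a[i] + b[i];

      printf("%f\n", a[N + 1]);
      return 0;
  }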

Iteration Vectors
Given a nest of n loops, an iteration vector i is
- a vector of integers (i1, i2, ..., in), where ik (1 <= k <= n) is the iteration number of the loop at nesting level k.
Example:
      DO I = 1, 2
        DO J = 1, 2
  S1      <some statement>
- The iteration vector (2, 1) denotes the instance of S1 executed during the 2nd iteration of the I loop and the 1st iteration of the J loop.

Loop Iteration Space
For each loop nest, its iteration space is
- the set of all possible iteration vectors for a statement.
Example:
      DO I = 1, 2
        DO J = 1, 2
  S1      <some statement>
The iteration space for S1 is { (1,1), (1,2), (2,1), (2,2) }.

Ordering of Iterations
Within a single loop iteration space, we can impose a lexicographic ordering among its iteration vectors.
Iteration i precedes iteration j, denoted i < j, iff
- iteration i is evaluated before j; that is,
- for some nesting level k, i[1:k-1] = j[1:k-1] and ik < jk.
Example: (1,1) < (1,2) < (2,1) < (2,2)
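
The definition translates directly into code. A minimal C sketch (mine, not from the slides) of the lexicographic test:

  #include <stdio.h>

  /* Returns 1 iff iteration vector i precedes j: for some level k,
     i[0..k-1] equals j[0..k-1] and i[k] < j[k] (0-indexed here). */
  int precedes(const int *i, const int *j, int n) {
      for (int k = 0; k < n; k++) {
          if (i[k] < j[k]) return 1;
          if (i[k] > j[k]) return 0;
      }
      return 0;  /* equal vectors do not strictly precede each other */
  }

  int main(void) {
      int a[2] = {1, 2}, b[2] = {2, 1};
      printf("%d %d\n", precedes(a, b, 2), precedes(b, a, 2));  /* 1 0 */
      return 0;
  }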

Loop Dependence
There exists a dependence from statement S1 to statement S2 in a common nest of loops if and only if there exist two iteration vectors i and j for the nest such that
1. i < j, or i = j and there is a path from S1 to S2 in the body of the loop;
2. statement S1 accesses memory location M on iteration i and statement S2 accesses location M on iteration j; and
3. one of these accesses is a write.

Distance and Direction Vectors
Consider a dependence in a nest of n loops, where
- statement S1 on iteration i is the source of the dependence, and
- statement S2 on iteration j is the sink of the dependence.
The distance vector d(i,j) is a vector of length n such that d(i,j)k = jk - ik.
The direction vector D(i,j) is a vector of length n such that (Definition 2.10 in the book)
  D(i,j)k = "<" if d(i,j)k > 0
            "=" if d(i,j)k = 0
            ">" if d(i,j)k < 0
What is the dependence distance/direction vector here? (Worked answer below.)
      DO I = 1, N
        DO J = 1, M
          DO K = 1, L
  S1        A(I+1, J, K-1) = A(I, J, K) + 10
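
Worked answer (not spelled out on the slide): the write A(I+1, J, K-1) at iteration i = (I1, J1, K1) and the read A(I, J, K) at iteration j = (I2, J2, K2) touch the same element when I1 + 1 = I2, J1 = J2, and K1 - 1 = K2. Hence d(i,j) = (1, 0, -1) and D(i,j) = (<, =, >): a true dependence carried by the outermost (I) loop.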

Loop-carried and Loop-independent Dependences
If statement S2 on iteration j of a loop depends on statement S1 on iteration i, the dependence is
- loop-carried if S1 and S2 execute on different iterations: i ≠ j; equivalently, d(i,j) > 0, i.e., D(i,j) contains a "<" as its leftmost non-"=" component
- loop-independent if S1 and S2 execute on the same iteration: i = j; equivalently, d(i,j) = 0, i.e., D(i,j) contains only "=" components. NOTE: there must be a path from S1 to S2 within the same iteration.
Example (worked answer below):
      DO I = 1, N
  S1    A(I+1) = F(I) + A(I)
  S2    F(I) = A(I+1)
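
Worked answer (not spelled out on the slide): A(I+1) written by S1 in iteration I is read by S1 as A(I) in iteration I+1, a loop-carried true dependence of distance 1; the same A(I+1) is read by S2 in the same iteration, a loop-independent true dependence (S1 precedes S2 in the body); and F(I), read by S1 and then written by S2 in the same iteration, gives a loop-independent anti dependence S1 δ^-1 S2.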

Level of Loop Dependence
The level of a loop-carried dependence is the index of the leftmost non-"=" component of D(i,j).
- A level-k dependence from S1 to S2 is denoted S1 δ_k S2.
- A loop-independent dependence from S1 to S2 is denoted S1 δ_∞ S2.
Example (worked answer below):
      DO I = 1, 10
        DO J = 1, 10
          DO K = 1, 10
  S1        A(I, J, K+1) = A(I, J, K)
  S2        F(I, J, K) = A(I, J, K+1)
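
Worked answer (not spelled out on the slide): S1's write of A(I, J, K+1) reaches S1's read of A(I, J, K) one K-iteration later, so S1 δ_3 S1 with distance vector (0, 0, 1) and direction vector (=, =, <), a level-3 carried dependence. S1's write is also read by S2 as A(I, J, K+1) in the same iteration, a loop-independent dependence S1 δ_∞ S2.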

Loop Optimizations
A loop optimization may
- change the order in which the iteration space of each statement is traversed,
- without altering the direction of any original dependence.
Example loop transformations:
- Change the nesting order of loops
- Fuse multiple loops into one, or split one loop into multiple loops
- Change the enumeration order of each loop
- And more
Any loop reordering transformation that (1) does not alter the relative nesting order of loops and (2) preserves the iteration order of the level-k loop preserves all level-k dependences. (A loop-interchange sketch follows below.)
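
A minimal C sketch (mine, not from the slides) of one such transformation, loop interchange. The statement's dependence on itself has direction (=, <) (carried by the j loop); interchange maps it to (<, =), which is still lexicographically positive, so no dependence is reversed and the interchange is safe:

  #include <stdio.h>
  #define N 8

  int main(void) {
      double a[N][N + 1] = {{0}};

      /* Original nesting: dependence direction (=, <). */
      for (int i = 0; i < N; i++)
          for (int j = 0; j < N; j++)
              a[i][j + 1] = a[i][j] + 1.0;

      /* Interchanged nesting: direction becomes (<, =); the result
         is unchanged because each row still fills left to right. */
      for (int j = 0; j < N; j++)
          for (int i = 0; i < N; i++)
              a[i][j + 1] = a[i][j] + 1.0;

      printf("%f\n", a[N - 1][N]);  /* 8.0 either way */
      return 0;
  }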

Dependence Testing
      DO i1 = L1, U1, S1
        DO i2 = L2, U2, S2
          ...
          DO in = Ln, Un, Sn
  S1        A(f1(i1,...,in), ..., fm(i1,...,in)) = ...
  S2        ... = A(g1(i1,...,in), ..., gm(i1,...,in))
A dependence exists from S1 to S2 iff there exist iteration vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) such that
1. x is lexicographically less than or equal to y, and
2. the system of Diophantine equations fi(x) = gi(y) for all 1 <= i <= m, i.e., fi(x1,...,xn) - gi(y1,...,yn) = 0 for all 1 <= i <= m, has an integer solution.
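
A classic conservative test over such systems is the GCD test: a linear Diophantine equation a1*x1 + ... + an*xn = c has an integer solution iff gcd(a1, ..., an) divides c. A minimal C sketch (mine, not from the slides):

  #include <stdio.h>

  int gcd(int a, int b) {
      if (a < 0) a = -a;
      if (b < 0) b = -b;
      while (b != 0) { int t = a % b; a = b; b = t; }
      return a;
  }

  /* Returns 1 if a dependence is possible, 0 if it is ruled out. */
  int gcd_test(const int *a, int n, int c) {
      int g = 0;
      for (int k = 0; k < n; k++) g = gcd(g, a[k]);
      if (g == 0) return c == 0;  /* all coefficients are zero */
      return c % g == 0;
  }

  int main(void) {
      /* Subscript pair A(2*x) vs A(2*y + 1) gives 2*x - 2*y = 1:
         gcd(2, 2) = 2 does not divide 1, so no dependence exists. */
      int coeff[2] = {2, -2};
      printf("%d\n", gcd_test(coeff, 2, 1));  /* prints 0 */
      return 0;
  }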

Example
      DO I = 1, 10
        DO J = 1, 10
          DO K = 1, 10
  S1        A(I, J, K+1) = A(I, J, K)
  S2        F(I, J, K) = A(I, J, K+1)
To determine the dependence between A(I,J,K+1) at iteration vector (I1,J1,K1) and A(I,J,K) at iteration vector (I2,J2,K2), solve the system of equations
  I1 = I2;  J1 = J2;  K1 + 1 = K2;  1 <= I1, I2, J1, J2, K1, K2 <= 10
- The distance vector is (I2-I1, J2-J1, K2-K1) = (0, 0, 1)
- The direction vector is (=, =, <)
- The dependence is from A(I,J,K+1) to A(I,J,K), and it is a true dependence

The Delta Notation
Goal: compute the iteration distance between the source and sink of a dependence.
      DO I = 1, N
        A(I + 1) = A(I) + B
- Denote the iterations at the source and sink by I0 and I0 + ΔI.
- Forming an equality gives: I0 + 1 = I0 + ΔI
- Solving it gives: ΔI = 1
If a loop index does not appear in a subscript, its distance is *
- * means the union of all three directions <, >, =
      DO I = 1, 100
        DO J = 1, 100
          A(I+1) = A(I) + B(J)
- The direction vector for the dependence is (<, *)

Complexity of Testing
Finding integer solutions to a system of Diophantine equations is NP-complete.
- Most methods consider only linear subscript expressions.
Conservative testing:
- Try to prove the absence of solutions for the dependence equations.
- Conservative, but never incorrect.
Categorizing subscript testing equations:
- ZIV if it contains no loop index variable
- SIV if it contains exactly one loop index variable
- MIV if it contains more than one loop index variable
Example: A(5, I+1, J) = A(1, I, K) + C
- 5 = 1 is ZIV; I1 + 1 = I2 is SIV; J1 = K2 is MIV
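
For SIV subscripts the test is exact and cheap. A minimal C sketch (mine, not from the slides) of the strong-SIV case, subscript pairs of the form i + c1 versus i + c2 in a loop running from lo to hi: the dependence distance is c1 - c2, and a dependence exists iff that distance fits within the loop bounds.

  #include <stdio.h>
  #include <stdlib.h>

  /* Strong-SIV test: returns 1 and sets *dist if a dependence exists. */
  int strong_siv(int c1, int c2, int lo, int hi, int *dist) {
      *dist = c1 - c2;
      return abs(*dist) <= hi - lo;
  }

  int main(void) {
      int d;
      if (strong_siv(1, 0, 1, 10, &d))             /* A(I+1) vs A(I) */
          printf("dependence, distance %d\n", d);  /* distance 1 */
      return 0;
  }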

Summary
Introducing data dependence:
- What is the meaning of "S2 depends on S1"?
- What is the meaning of S1 δ S2, S1 δ^-1 S2, S1 δ^0 S2?
- What is the safety constraint of reordering transformations?
Loop dependence:
- What is the meaning of iteration vector (3,5,7)?
- What is the iteration space of a loop nest?
- What is the meaning of iteration vectors i < j?
- What is the distance/direction vector of a loop dependence?
- What is the relation between dependence distance and direction?
- What is the safety constraint of loop reordering transformations?
Level of loop dependence and transformations:
- What is the meaning of loop-carried/loop-independent dependences?
- What is the level of a loop dependence or loop transformation?
- What is the safety constraint of loop parallelization?
Dependence testing theory