ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

Similar documents
ECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment

Lecture 3 Overview of the LLVM Compiler

INTRODUCTION TO LLVM Bo Wang SA 2016 Fall

15-411: LLVM. Jan Hoffmann. Substantial portions courtesy of Deby Katz

LLVM & LLVM Bitcode Introduction

Lecture 3 Overview of the LLVM Compiler

Intermediate representation

Lecture 2 Overview of the LLVM Compiler

Lecture 23 CIS 341: COMPILERS

Reuse Optimization. LLVM Compiler Infrastructure. Local Value Numbering. Local Value Numbering (cont)

4/1/15 LLVM AND SSA. Low-Level Virtual Machine (LLVM) LLVM Compiler Infrastructure. LL: A Subset of LLVM. Basic Blocks

Plan for Today. Concepts. Next Time. Some slides are from Calvin Lin s grad compiler slides. CS553 Lecture 2 Optimizations and LLVM 1

Introduction to LLVM compiler framework

CSE Section 10 - Dataflow and Single Static Assignment - Solutions

Intermediate Representation (IR)

The LLVM Compiler Framework and Infrastructure

CS 426 Fall Machine Problem 4. Machine Problem 4. CS 426 Compiler Construction Fall Semester 2017

LLVM: A Compilation Framework. Transformation

Computing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

Intermediate Representations Part II

Introduction to LLVM compiler framework

SSA Construction. Daniel Grund & Sebastian Hack. CC Winter Term 09/10. Saarland University

Compiler Construction 2009/2010 SSA Static Single Assignment Form

Static Single Assignment (SSA) Form

Lecture 25 CIS 341: COMPILERS

Topic I (d): Static Single Assignment Form (SSA)

Using Static Single Assignment Form

Introduction to Machine-Independent Optimizations - 6

Goals of Program Optimization (1 of 2)

LLVM Tutorial. John Criswell University of Rochester

8. Static Single Assignment Form. Marcus Denker

EECS 583 Class 8 Classic Optimization

Lecture Compiler Middle-End

CS153: Compilers Lecture 23: Static Single Assignment Form

An example of optimization in LLVM. Compiler construction Step 1: Naive translation to LLVM. Step 2: Translating to SSA form (opt -mem2reg)

Intermediate Representations

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Advanced Compiler Construction

Compiler Design. Fall Data-Flow Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

6. Intermediate Representation!

What If. Static Single Assignment Form. Phi Functions. What does φ(v i,v j ) Mean?

Compilers and Code Optimization EDOARDO FUSELLA

Control-Flow Analysis

6. Intermediate Representation

Induction Variable Identification (cont)

LLVM code generation and implementation of nested functions for the SimpliC language

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

Introduction to LLVM. UG3 Compiling Techniques Autumn 2018

BEAMJIT, a Maze of Twisty Little Traces

CS577 Modern Language Processors. Spring 2018 Lecture Optimization

This test is not formatted for your answers. Submit your answers via to:

Compiler Construction: LLVMlite

Intermediate Representations & Symbol Tables

Topic 9: Control Flow

Introduction to Machine-Independent Optimizations - 4

LLVM and IR Construction

Advanced Compilers CMPSCI 710 Spring 2003 Computing SSA

Building a Compiler with. JoeQ. Outline of this lecture. Building a compiler: what pieces we need? AKA, how to solve Homework 2

Tour of common optimizations

The View from 35,000 Feet

Intermediate Representations

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Lecture 3 Local Optimizations, Intro to SSA

CONTROL FLOW ANALYSIS. The slides adapted from Vikram Adve

GCC Internals Control and data flow support

A Practical and Fast Iterative Algorithm for φ-function Computation Using DJ Graphs

A Sparse Algorithm for Predicated Global Value Numbering

Compilers and Code Optimization EDOARDO FUSELLA

Intermediate Representations

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

OptiCode: Machine Code Deobfuscation for Malware Analysis

Compiler Construction 2016/2017 Loop Optimizations

LLVM Compiler System. The LLVM Compiler Framework and Infrastructure. Tutorial Overview. Three primary LLVM components

Accelerating Ruby with LLVM

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

Usually, target code is semantically equivalent to source code, but not always!

Simone Campanoni Loop transformations

The LLVM Compiler Framework and Infrastructure

Overview of the Class

A Brief Introduction to Using LLVM. Nick Sumner

Compiler Construction 2010/2011 Loop Optimizations

An Architectural Overview of GCC

CS 406/534 Compiler Construction Putting It All Together

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Intermediate Code & Local Optimizations

Generalizing loop-invariant code motion in a real-world compiler

Example. Example. Constant Propagation in SSA

Why Global Dataflow Analysis?

Tutorial: Building a backend in 24 hours. Anton Korobeynikov

Hello I am Zheng, Hongbin today I will introduce our LLVM based HLS framework, which is built upon the Target Independent Code Generator.

ECE 5775 (Fall 17) High-Level Digital Design Automation. More Pipelining

Calvin Lin The University of Texas at Austin

Targeting LLVM IR. LLVM IR, code emission, assignment 4

Combining Analyses, Combining Optimizations - Summary

Code Merge. Flow Analysis. bookkeeping

CSC D70: Compiler Optimization

Using GCC as a Research Compiler

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Transcription:

ECE 5775 (Fall 17) High-Level Digital Design Automation Static Single Assignment

Announcements HW 1 released (due Friday) Student-led discussions on Tuesday 9/26 Sign up on Piazza: 3 students / group Meet with the instructor 3 days before the talk 1

Agenda More on dominance relation Dataflow analysis static single assignment (SSA) Definition PHI node (F-node) placement Code optimizations with SSA LLVM overview 2

Revisiting CFG and Dominators What becomes a leader statement of a basic block? Does a node strictly (properly) dominate itself? Does a predecessor of a node B always dominate B? Suppose A that dominates all of B s predecessors. Does A always dominate B? 3

Dominator Tree A node (basic block) N in CFG may have multiple dominators, but only one of them will be closest to N and be dominated by all other dominators of N A dominator tree is a useful way to represent the dominance relation The entry node s is the root Each node in the tree is the immediate dominator of its children Each node d dominates only its descendants in the tree 4

Example: Dominator Tree CFG entry Dominator Tree 2 1 3 1 2 3 4 5 6 7 8 9 10 4 5 6 7 8 9 10 5

Static Single Assignment Static single assignment (SSA) form is a restricted IR where Each variable definition has a unique name Each variable use refers to a single definition SSA simplifies data flow analysis & many compiler optimizations Eliminates artificial dependences (on scalars) Write-after-write Write-after-read 6

SSA within a Basic Block Assign each variable definition a unique name Update the uses accordingly x 0 5 x = read() x = x * 5 x = x + 1 y = x * 9 x 0 = read() x 1 = x 0 * 5 x 2 = x 1 + 1 y = x 2 * 9 x 1 x 2 + 1 9 Original code SSA form y Corresponding data flow graph 7

SSA with Control Flow Consider a situation where two control-flow paths merge e.g., due to an if-then-else statement or a loop x = read() if (x > 0) y = 5 else y = 10 x = y x = read() if (x > 0) y = 5 y = 10 x = y x 0 = read() if (x 0 > 0) y 0 = 5 y 1 = 10 x 1 = y should this be y 0 or y 1? 8

Introducing ϕ-node Inserts special join functions (called ϕ-nodes or PHI nodes) at points where different control flow paths converge if (x 0 > 0) y 0 = 5 y 1 = 10 Note: ϕ is not an executable function! To generate executable code from this form, appropriate copy statements need to be generated in the predecessors (in other words, reversing the SSA process for code generation) y 2 = ϕ(y 0, y 1 ) x 1 = y 2 9

SSA in a Loop Insert ϕ-nodes in the loop header block x = 0 i = 1 while (i<10) { x = x+i i = i+1 } x = x + 1 i = i + 1 x = 0 i = 1 if (i < 10) Outside x 0 = 0 i 0 = 1 x 1 = ϕ(x 0, x 2 ) i 1 = ϕ(i 0, i 2 ) if (i 1 < 10) x 2 = x 1 + 1 i 2 = i 1 + 1 Outside 10

ϕ-node Placement x 0 = 0 i 0 = 1 x 1 = ϕ(x 0, x 2 ) i 1 = ϕ(i 0, i 2 ) if (i 1 < 10) When and where to add ϕ-nodes? If two control paths A à C and B à C converge at a node C, and both A and B contain assignments to variable X, then ϕ-node for X must be placed at C We often call C a join node or convergence point Can be generalized to more than two converging control paths x 2 = x 1 + 1 i 2 = i 1 + 1 Outside Objective: Minimize the number of ϕ-nodes Need to compute dominance frontier sets 11

Dominance Frontier A basic block F is in the dominance frontier set of basic block B iff B does NOT strictly dominate F B dominates some predecessor(s) of F Intuitively, the nodes in the dominance frontier set of B are almost strictly dominated by B Useful for efficiently computing the SSA form 12

Dominance Frontier and SSA B0 B1 B3 does not dominate B7, but dominates one of its predecessors (i.e., B6) B2 B4 B3 B5 Hence B7 is in the dominance frontier set of B3 B7 B6 Area dominated by B3 B7 is the destination of some edge(s) leaving an area dominated by B3 For each variable definition in B3, a ϕ node is needed in B7 13

Example: Dominance Frontiers CFG Dominance frontiers (DF) B2 B0 B1 B4 B7 B3 B6 B5 B 0 1 2 3 4 5 6 7 DF 1 7 7 6 6 7 1 14

Dominance Frontiers and Dominator Tree CFG Dominance frontiers (DF) Dominator tree B2 B0 B1 B4 B7 B3 B6 B5 B 0 1 2 3 4 5 6 7 DF 1 7 7 6 6 7 1 B2 B0 B1 B4 B7 B3 B5 B6 Algorithm to compute DF set foreach convergence point 1 X in CFG foreach predecessor, P, of X in CFG Run up to Q=IDOM(X) in the dominator tree, adding X to DF(Y) for each Y between [P, Q) 1 convergence point is a node (basic block) with more than one predecessors 15

Iterative ϕ-node Insertion B DF 0-1 1 2 7 3 7 4 6 5 6 6 7 7 1 B2 B0 B1 b = c = d = B7 B4 a = b = c = i = i = B3 d = B6 a = d = b = B5 a = ϕ( ) b = ϕ( ) c = ϕ( ) d = ϕ( ) i = ϕ( ) c = a = ϕ( ) b = ϕ( ) c = ϕ( ) d = ϕ( ) c = ϕ( ) d = ϕ( ) a is defined in 0, 3 need ϕ in 7, then a defined in 7 need ϕ in 1 b is defined in 0, 2, 6 need ϕ in 7 then b defined in 7 need ϕ in 1 c is defined in 0,2,5 need ϕ in 6,7 then c defined in 7 need ϕ in 1 d is defined in 2,3,4 need ϕ in 6,7 then d defined in 7 need ϕ in 1 i is defined in 7 need ϕ in 1 16

SSA Applications SSA form simplifies data flow analysis and many code transformations Primarily due to explicit & simplified (sparse) def-use chains Here we show two simple examples Dead code elimination Loop induction variable detection 17

Dead Code in CDFG Dead code is either Unreachable code Definitions never used B1 x = a + b y = c + d Dead statements? B2 x = a b B4 y = c d z = x + y B5 z = z + 1 B6 f(x, y) 18

Dead Code Elimination with SSA B1 x 1 = a + b y 1 = c + d B2 B4 x 2 = a b B3 z 2 = z 1 + 1 x 3 = ϕ(x 1, x 2 ) y 2 = c d z 1 = x 3 + y 2 x 2 = a b x 1 = a + b x 3 = ϕ(x 1, x 2 ) y 2 = c d B5 f(x 3, y 2 ) f(x 3, y 2 ) Iteratively remove unused definitions: 1. Remove y1, z2 (and B4) à 2. Remove z1 19

Loop Induction Variables An induction variable is a variable that Gets increased or decreased by a fixed amount (loop invariant) on every iteration of a loop (basic induction variable) i = i + c or is an affine function of another induction variable (mutual induction variable) j = a * i + b 20

Identifying Basic Loop Induction Variable Find basic loop induction variable(s) 1. Inspect back edges in the loop 2. Each back edge points to a ϕ node in the loop header, which may indicate a basic induction variable 3. ϕ is a function of an initialized variable and a definition in the form of i + c (i.e., increment operation) i 2 = i 1 + 1 i 1 = ϕ(0, i 2 ) if (i 1 < 10) Outside 21

LLVM Compiler Infrastructure The LLVM Project is a collection of modular and reusable compiler and toolchain technologies... 22

What is LLVM? Formerly Low Level Virtual Machine Brainchild of Chris Lattner and Vikram Adve back in 2000 2012 ACM Software System Award The core of LLVM is the SSA-base IR Language independent, target independent, easy to use RISC-like virtual instructions, unlimited registers, exception handling, etc. Many high-quality libraries (components) with clean interfaces Optimizations, analyses, modular code generator, profiling, link time optimization, ARM/X86/PPC/SPARC code generator Tools built from the libraries C/C++/ObjC compiler, modular optimizer, linker, debugger, LLVM JIT 23

What is a Compiler Compiler is a tool that inspects and manipulates a representation of programs Intentionally a very board definition Examples: Traditional C compiler (cc), Java JIT compiler (hotspot), system assembler (as), system linker (ld), IDEs, refactoring tools, GCC is a compiler LLVM is not a compiler source: http://llvm.org 24

What is a Compiler Infrastructure Provides modular & reusable components for building compilers Components are ideally language/target independent Allows choice of the right component for the job Allows components to be shared across different compilers Improvements made to one compiler benefits the others LLVM is a compiler infrastructure source: http://llvm.org 25

The LLVM C/C++ Compiler Clang is a C, C++, and Objective-C front end (sub-project of LLVM) Converts source to LLVM IR GCC compatibility C ObjC C++ Clang front end.bc LLVM optimizer.bc LLVM code generator arm mips x86 Distinguishing features:.bc files contain LLVM IR/bitcode, not machine code Executable can be bytecode (JIT d) or machine code 26

Events at Compile Time C/C++ file Clang LLVM.o file C to LLVM Frontend Compile-time Optimizer LLVM IR Parser LLVM Verifier LLVM Analysis & Optimization Passes LLVM.bc File Writer Dead Global Elimination, IP Constant Propagation, Dead Argument Elimination, Inlining, Reassociation, LICM, Loop Opts, Memory Promotion, Dead Store Elimination, ADCE, source: http://llvm.org 27

LLVM Program Structure Module contains Functions/GlobalVariables Module is unit of compilation/analysis/optimization Function contains BasicBlocks/Arguments Functions roughly correspond to functions in C BasicBlock contains list of instructions Each block ends in a control flow instruction Instruction is opcode + vector of operands All operands have types Instruction result is typed 28

LLVM Flow Analysis LLVM IR is in SSA form use-def and def-use chains are always available All objects have user/use info, even functions Control flow graph (CFG) is always available Exposed as BasicBlock predecessor/successor lists Many generic graph algorithms usable with the CFG Higher-level info implemented as passes CallGraph, Dominators, LoopInfo, source: http://llvm.org 29

LLVM Instruction Set Overview High-level information exposed in the code Explicit dataflow through SSA form Explicit control-flow graph Explicit language-independent type-information Explicit typed pointer arithmetic Preserve array subscript and structure indexing for (i=0; i<n; ++i) foo(a[i], &P); loop: %i.1 = phi i5 [ 0, %bb0 ], [ %i.2, %loop ] %AiAddr = getelementptr float* %A, i32 %i.1 call void %foo(float %AiAddr, %pair* %P) %i.2 = add i5 %i.1, 1 %tmp = icmp eq i5 %i.1, 16 br i1 %tmp, label %loop, label %outloop source: http://llvm.org 30

Arbitrary Precision Integers LLVM allows arbitrary fixed width integers since version 2.0 i2, i13, i128, i1024, etc. Essential for hardware synthesis An 11b multiplier is significantly cheaper/faster than a 16b implementation Can leverage other LLVM analyses/optimizations to perform bitwidth minimization 31

Summary Importance of compilers Essential component of SoC software development flow Essential component of high-level synthesis A good intermediate representation (IR) enables efficient and effective analysis and optimization Dominance relation helps effective CFG analysis SSA form facilitates efficient IR-level optimization LLVM: a modular and powerful open-source compiler infrastructure with a wide range of applications 32

Before Next Class Sign up for paper discussions on Piazza Next lecture: Scheduling algorithms 33

Acknowledgements These slides contain/adapt materials developed by Prof. Scott Mahlke (UMich) Dr. Chris Lattner (Apple Inc.) 34