Lecture Compiler Middle-End


Lecture 16-18: Compiler Middle-End
Jianwen Zhu
Electrical and Computer Engineering, University of Toronto
Jianwen Zhu 2009 - P. 1

What We Have Done
A lot!
Compiler frontend
  Defining the language
  Generating the scanner and parser
  Generating the parse tree (AST)
  Performing semantic analysis
    Type analysis

Compiler Middle End
Intermediate Representation (IR)
  Language-independent data structure to capture the program
Code generation
  From AST to IR
Data flow analysis
  Infer program information from the IR
Optimization
  Code improvement, IR -> IR
  Covered by another course

Intermediate Representation

A Problem
A modern compiler handles multiple languages
  gcc originally stood for "GNU C Compiler"; it now stands for "GNU Compiler Collection"
Assume M languages and N processors
  Engineering effort is needed for each language/processor pair
  M * N effort

A Solution
M + N effort!
[Figure: Lang 1, Lang 2, and Lang 3 are parsed into Tree 1, Tree 2, and Tree 3, which all map to a single Intermediate Representation (IR); the IR is then translated to X86, MIPS, ARM, and AVR]

TinyC
Program state is defined by a set of variables
Actions are defined by statements
Simplifying assumptions
  All statements are in a single procedure, and there are no procedure calls
  Only the int and bool primitive types are supported
  Only one-dimensional arrays are supported
  Pointers are not supported

TinyC Example: Dot Product

    int A[100], B[100];
    int sum, i;
    sum = i = 0;
    while( i < 100 ) {
        sum = sum + A[i] * B[i];
        i = i + 1;
    }

TinyC Syntax
Defined by a set of production rules in Backus-Naur Form (BNF)
Key constructs
  Declarations: define scalar or array variables
  Statements: assignment or control flow statements
  Expressions: transformations of scalar values, which are either primitive constants or program variable values

TinyC in One Page

    program:
        declaration* statement*

    statement:
        variable '=' expression ';',
        'if' '(' expression ')' statement ( 'else' statement )*,
        'while' '(' expression ')' statement,
        'break' ';',
        '{' declaration* statement* '}'

    declaration:
        type identifier ['=' expression] ';',
        type identifier '[' expression ']' ';'

    type:
        'int', 'bool'

    expression:
        '-' expression,
        '!' expression,
        expression '+' expression,
        expression '-' expression,
        expression '*' expression,
        expression '/' expression,
        expression '^' expression,
        expression '>>' expression,
        expression '<<' expression,
        expression '&' expression,
        expression '|' expression,
        expression '=' expression,
        expression '!=' expression,
        expression '<' expression,
        expression '<=' expression,
        expression '>' expression,
        expression '>=' expression,
        '(' expression ')',
        integer,
        identifier,
        'TRUE',
        'FALSE',
        identifier '[' expression ']'

Notations
A data type T corresponds to a set T; in particular, the integer type Z corresponds to the set Z
A linked list or array whose elements are of type T corresponds to the power set of T (the set of all subsets of T), denoted T[]
A record with a field a of type A and a field b of type B corresponds to a set of named tuples, denoted <a: A, b: B>
A graph R whose nodes are of type A corresponds to a relation R: A x A
A hash table or dictionary F that maps a value of type A to a value of type B corresponds to a function F: A --> B

TinyIR
Why IR? To decouple optimization algorithms from input languages and target architectures
Definition: a TinyIR is a tuple <O, S, V, B> with the following elements:
  A set O = {lds, sts, lda, sta, ba, br, cnst, +, -, *, /, <<, >>, ...} of operation codes, which corresponds to the set of all virtual instruction types
  A set S of symbols, which corresponds to the scalar and array variables
  A set V: <opcode: O, src1: V, src2: V, symb: S ∪ B ∪ Z> of virtual instructions, which corresponds to the expressions and control transfers in the program
  A set B: V[] of basic blocks, each containing a sequence of virtual instructions
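The tuple definition above maps fairly directly onto record types. The following is a minimal Python sketch, not the course's implementation: the class names `Symbol`, `Instr`, and `BasicBlock`, the `emit` helper, and the inclusion of `bt` and `<` in the opcode set (taken from the dot-product example rather than from the listed set O) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Opcodes from the definition of O. "bt" and "<" are an assumption: they do
# not appear in the listed set but are used by the dot-product example.
OPCODES = {"lds", "sts", "lda", "sta", "ba", "br", "bt", "cnst",
           "+", "-", "*", "/", "<<", ">>", "<"}


@dataclass(frozen=True)
class Symbol:
    """An element of S: a scalar (size is None) or an array of `size` cells."""
    name: str
    size: Optional[int] = None


@dataclass
class Instr:
    """An element of V: <opcode, src1, src2, symb>."""
    opcode: str
    src1: Optional["Instr"] = None
    src2: Optional["Instr"] = None
    # symb is a symbol, a branch-target block, or an integer constant.
    symb: Union[Symbol, "BasicBlock", int, None] = None

    def __post_init__(self):
        if self.opcode not in OPCODES:
            raise ValueError(f"unknown opcode: {self.opcode}")


@dataclass
class BasicBlock:
    """An element of B: a labelled sequence of virtual instructions."""
    label: str
    instrs: List[Instr] = field(default_factory=list)

    def emit(self, opcode, src1=None, src2=None, symb=None):
        """Append a new instruction and return it so later ones can use it."""
        instr = Instr(opcode, src1, src2, symb)
        self.instrs.append(instr)
        return instr


# Build the two instructions for "sum = 0".
sum_sym = Symbol("sum")
b1 = BasicBlock("B1")
zero = b1.emit("cnst", symb=0)
store = b1.emit("sts", src1=zero, symb=sum_sym)
```

Operands refer to earlier instructions by object reference here, which plays the role of the parenthesized instruction numbers in the textual listings.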

From TinyC to TinyIR
Constructs in TinyC have equivalent representations in TinyIR
  Declarations correspond to symbols
  Statements and expressions correspond to virtual instructions
  Virtual instructions are grouped into basic blocks

Dot Product in TinyIR

    scalar sum;
    scalar i;
    array A[100];
    array B[100];

    B1: (0)  cnst 0
        (1)  sts (0), sum
        (2)  sts (0), i
    B2: (3)  lds i
        (4)  lda (3), A
        (5)  lda (3), B
        (6)  * (4) (5)
        (7)  lds sum
        (8)  + (6) (7)
        (9)  sts (8), sum
        (10) cnst 1
        (11) + (3) (10)
        (12) sts (11), i
        (13) cnst 100
        (14) < (11) (13)
        (15) bt (14), B2
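One way to check a listing like this is to execute it. The sketch below is a tiny Python interpreter for just the opcodes the dot-product listing uses. The tuple encoding `(id, opcode, operands...)`, the function name `run_tinyir`, and the assumed semantics (`lds`/`lda` load, `sts` stores, `bt` branches when its operand is true, blocks fall through in listing order) are illustrative assumptions, not part of the lecture.

```python
import operator

# Binary opcodes handled by this sketch.
BINARY_OPS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "<": operator.lt}

# The listing above, encoded as label -> [(id, opcode, operands...)].
# Integer operands are instruction ids, except for cnst, which takes a literal.
DOT_PRODUCT = {
    "B1": [(0, "cnst", 0), (1, "sts", 0, "sum"), (2, "sts", 0, "i")],
    "B2": [(3, "lds", "i"), (4, "lda", 3, "A"), (5, "lda", 3, "B"),
           (6, "*", 4, 5), (7, "lds", "sum"), (8, "+", 6, 7),
           (9, "sts", 8, "sum"), (10, "cnst", 1), (11, "+", 3, 10),
           (12, "sts", 11, "i"), (13, "cnst", 100), (14, "<", 11, 13),
           (15, "bt", 14, "B2")],
}


def run_tinyir(blocks, scalars, arrays):
    """Run the blocks in listing order and return the final scalar values."""
    order = list(blocks)
    values = {}                       # instruction id -> computed value
    index = 0
    while index < len(order):
        next_index = index + 1        # fall through unless a branch is taken
        for instr_id, op, *args in blocks[order[index]]:
            if op == "cnst":
                values[instr_id] = args[0]
            elif op == "lds":
                values[instr_id] = scalars[args[0]]
            elif op == "sts":
                scalars[args[1]] = values[args[0]]
            elif op == "lda":
                values[instr_id] = arrays[args[1]][values[args[0]]]
            elif op in BINARY_OPS:
                values[instr_id] = BINARY_OPS[op](values[args[0]],
                                                  values[args[1]])
            elif op == "bt":
                if values[args[0]]:
                    next_index = order.index(args[1])
                    break
            else:
                raise ValueError(f"unsupported opcode: {op}")
        index = next_index
    return scalars


A = list(range(100))
B = [2] * 100
result = run_tinyir(DOT_PRODUCT, {}, {"A": A, "B": B})
```

Note that the listing tests the loop condition at the bottom of B2, so the body always runs at least once; that is safe here because i starts at 0.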

Code Generation

Assignment Statement

    sum = 0

    (0) cnst 0          RHS expression
    (1) sts (0), sum    Store instruction; sum is the symbol for the scalar variable

If Statement

    if( c ) { stmt1; } else { stmt2; }

        (10) c          Condition evaluation
        (11) bt L1      Branch instruction
        (12) stmt2      Fall-through else branch
        (20) ba L2      Jump to merge point
    L1: (30) stmt1      Then branch
    L2:                 Merge point

While Statement (Layout 1)

    while( c ) { stmt; }

    L1: (10) !c         Loop entry; condition evaluation
        (11) bt L2
        (12) stmt       Loop body
        (20) ba L1      Loop back
    L2:                 Loop exit

While Statement (Layout 2)

    while( c ) { stmt; }

        (1)  ba L3      Loop entry
    L1: (10) stmt       Loop body
    L3: (20) c          Condition evaluation
        (21) bt L1      Loop back
    L2:                 Loop exit
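The if and while layouts can be produced by a small recursive generator. The Python sketch below is an illustration under assumptions: the tuple-based AST (`("stmt", text)`, `("if", cond, then, else)`, `("while", cond, body)`), the `Emitter` class, and the textual form `bt cond, label` are all hypothetical, and conditions and simple statements are kept as opaque strings. It emits the if layout and the second while layout.

```python
import itertools


class Emitter:
    """Collects emitted lines and hands out fresh labels L1, L2, ..."""

    def __init__(self):
        self.lines = []
        self._counter = itertools.count(1)

    def new_label(self):
        return f"L{next(self._counter)}"

    def emit(self, text):
        self.lines.append(text)

    def label(self, name):
        self.lines.append(f"{name}:")


def gen(stmt, out):
    """Emit code for one statement of the hypothetical tuple AST."""
    kind = stmt[0]
    if kind == "stmt":                      # opaque straight-line statement
        out.emit(stmt[1])
    elif kind == "if":
        _, cond, then_stmt, else_stmt = stmt
        l_then, l_merge = out.new_label(), out.new_label()
        out.emit(f"bt {cond}, {l_then}")    # branch to the then part
        gen(else_stmt, out)                 # fall-through else branch
        out.emit(f"ba {l_merge}")           # jump to merge point
        out.label(l_then)
        gen(then_stmt, out)
        out.label(l_merge)
    elif kind == "while":                   # layout 2: test at the bottom
        _, cond, body = stmt
        l_body, l_test = out.new_label(), out.new_label()
        out.emit(f"ba {l_test}")            # loop entry jumps to the test
        out.label(l_body)
        gen(body, out)
        out.label(l_test)
        out.emit(f"bt {cond}, {l_body}")    # loop back while cond holds
    else:
        raise ValueError(f"unknown statement kind: {kind}")


demo = Emitter()
gen(("while", "c", ("if", "d", ("stmt", "stmt1"), ("stmt", "stmt2"))), demo)
print("\n".join(demo.lines))
```

The bottom-test layout executes one branch per iteration (the conditional `bt`), whereas the top-test layout executes two (the conditional exit test and the unconditional `ba` back to the top), which is a common reason to prefer it.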

Data Flow Analysis

Data Flow Analysis
A framework for proving facts about programs
Reasons about lots of little facts
Little or no interaction between facts
  Works best on properties about how the program computes
Based on all paths through the program
  Including infeasible paths

Available Expressions
An expression e is available at program point p if
  e is computed on every path to p, and
  the value of e has not changed since the last time e was computed on each of those paths
Optimization
  If an expression is available, it need not be recomputed
  (at least, if it's still in a register somewhere)

Data Flow Facts
Is expression e available?
Facts:
  a + b is available
  a * b is available
  a + 1 is available

Gen and Kill
What is the effect of each statement on the set of facts?

    Stmt          Gen      Kill
    x := a + b    a + b
    y := a * b    a * b

Computing Available Expressions
[Figure: control-flow graph annotated with the available-expression set at each program point: {a + b}, {a + b, a * b}, {a + b, a * b}, {a + b}, {a + b}, {a + b}, Ø, {a + b}]

Terminology
A join point is a program point where two branches meet
Available expressions is a forward must problem
  Forward = data flows from In to Out
  Must = at a join point, the property must hold on all paths that are joined

Data Flow Equations
Let s be a statement
  succ(s) = { immediate successor statements of s }
  pred(s) = { immediate predecessor statements of s }
  In(s) = program point just before executing s
  Out(s) = program point just after executing s
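For available expressions, the standard forward must equations are In(s) = the intersection of Out(s') over all s' in pred(s), and Out(s) = Gen(s) ∪ (In(s) - Kill(s)). The Python sketch below iterates them to a fixed point. The control-flow graph is a hypothetical loop built from the statements and facts listed earlier (the original figure is not recoverable), and the statement `a := a + 1`, which kills every expression containing a, is an assumption added to make the example interesting.

```python
# The three facts tracked by the analysis.
UNIVERSE = {"a + b", "a * b", "a + 1"}

# node -> (Gen, Kill, successors). Hypothetical program:
#   s1: x := a + b
#   s2: y := a * b
#   s3: while y > a
#   s4:     a := a + 1      (assumed; kills every expression using a)
#   s5:     x := a + b
#   s6: exit
LOOP_CFG = {
    "s1": ({"a + b"}, set(), ["s2"]),
    "s2": ({"a * b"}, set(), ["s3"]),
    "s3": (set(), set(), ["s4", "s6"]),
    "s4": (set(), {"a + b", "a * b", "a + 1"}, ["s5"]),
    "s5": ({"a + b"}, set(), ["s3"]),
    "s6": (set(), set(), []),
}


def available_expressions(cfg, entry, universe):
    """Forward must analysis; assumes every non-entry node has a predecessor."""
    preds = {n: [] for n in cfg}
    for n, (_, _, succs) in cfg.items():
        for s in succs:
            preds[s].append(n)

    # Must analysis: start from "everything available" and shrink.
    in_sets = {n: set(universe) for n in cfg}
    out_sets = {n: set(universe) for n in cfg}
    in_sets[entry] = set()            # nothing is available on entry

    changed = True
    while changed:
        changed = False
        for n, (gen, kill, _) in cfg.items():
            if n != entry:
                in_sets[n] = set.intersection(*(out_sets[p] for p in preds[n]))
            new_out = gen | (in_sets[n] - kill)
            if new_out != out_sets[n]:
                out_sets[n] = new_out
                changed = True
    return in_sets, out_sets


avail_in, avail_out = available_expressions(LOOP_CFG, "s1", UNIVERSE)
```

At the loop test s3, only `a + b` survives the intersection: `a * b` is available along the path from s2 but not along the back edge from s5.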

Liveness Analysis
A variable v is live at program point p if
  v will be used on some execution path originating from p
  before v is overwritten
Optimization
  If a variable is not live, there is no need to keep it in a register
  If a variable is dead at an assignment, the assignment can be eliminated

Data Flow Equations
Available expressions is a forward must analysis
  Data flow propagates in the same direction as CFG edges
  An expression is available only if it is available on all paths
Liveness is a backward may problem
  To know whether a variable is live, we need to look at future uses
  A variable is live if it is used on some path

  $\mathrm{Out}(s) = \bigcup_{s' \in \mathrm{succ}(s)} \mathrm{In}(s')$
  $\mathrm{In}(s) = \mathrm{Gen}(s) \cup (\mathrm{Out}(s) - \mathrm{Kill}(s))$

Gen and Kill
What is the effect of each statement on the set of facts?

    Stmt          Gen     Kill
    x := a + b    a, b    x
    y := a * b    a, b    y
    y > a         a, y

Computing Live Variables
[Figure: control-flow graph annotated with the live-variable set at each program point: {a, b}, {x, a, b}, {x, y, a} / {x, y, a, b}, {y, a, b}, {x}, {y, a, b}, {x, y, a} / {x, y, a, b}]
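The liveness equations can be solved by the same kind of fixed-point iteration, this time walking against the CFG edges and using union at join points. The Python sketch below uses the Gen and Kill sets from the table for liveness; the control-flow graph, the statement `a := a + 1`, and the final use of x after the loop are assumptions made for illustration, since the original figure is not recoverable.

```python
# node -> (Gen = variables used, Kill = variables defined, successors).
# Hypothetical program:
#   s1: x := a + b
#   s2: y := a * b
#   s3: while y > a
#   s4:     a := a + 1      (assumed)
#   s5:     x := a + b
#   s6: use of x            (assumed, so that x is live at loop exit)
LIVE_CFG = {
    "s1": ({"a", "b"}, {"x"}, ["s2"]),
    "s2": ({"a", "b"}, {"y"}, ["s3"]),
    "s3": ({"a", "y"}, set(), ["s4", "s6"]),
    "s4": ({"a"}, {"a"}, ["s5"]),
    "s5": ({"a", "b"}, {"x"}, ["s3"]),
    "s6": ({"x"}, set(), []),
}


def live_variables(cfg):
    """Backward may analysis: Out(s) = union of In over succ(s)."""
    live_in = {n: set() for n in cfg}     # may analysis: start empty and grow
    live_out = {n: set() for n in cfg}

    changed = True
    while changed:
        changed = False
        # Visiting nodes in reverse listing order converges faster for a
        # backward problem, but any order reaches the same fixed point.
        for n in reversed(list(cfg)):
            gen, kill, succs = cfg[n]
            out = set().union(*(live_in[s] for s in succs))
            new_in = gen | (out - kill)
            if out != live_out[n] or new_in != live_in[n]:
                live_out[n], live_in[n] = out, new_in
                changed = True
    return live_in, live_out


live_in, live_out = live_variables(LIVE_CFG)
```

In this example x is not live on entry to s5, so under the dead-assignment rule stated earlier, an assignment to x placed just before s5 could be eliminated.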

Very Busy Expressions
An expression e is very busy at point p if
  on every path from p, e is evaluated before the value of e is changed
Optimization
  Can hoist the computation of a very busy expression
What kind of problem?
  Forward or backward? May or must?
  Answer: backward must

Reaching Definitions
A definition of a variable v is an assignment to v
A definition of variable v reaches point p if
  there is a path from the definition to p with no intervening assignment to v
Also called def-use information
What kind of problem?
  Forward or backward? May or must?
  Answer: forward may
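As a forward may problem, reaching definitions uses union over predecessors, with Gen(s) being the definition made at s and Kill(s) being every other definition of the same variable. The Python sketch below names each definition by the label of its statement. The program, the function name, and the input encoding (a map from label to defined variable plus a successor map) are illustrative assumptions.

```python
# label -> variable defined at that statement (None if it defines nothing).
# Hypothetical program:
#   s1: x := a + b
#   s2: y := a * b
#   s3: while y > a
#   s4:     a := a + 1
#   s5:     x := a + b
#   s6: exit
DEFINES = {"s1": "x", "s2": "y", "s3": None, "s4": "a", "s5": "x", "s6": None}
SUCCS = {"s1": ["s2"], "s2": ["s3"], "s3": ["s4", "s6"],
         "s4": ["s5"], "s5": ["s3"], "s6": []}


def reaching_definitions(defines, succs):
    """Forward may analysis over definitions named by statement label."""
    # All definitions of each variable, used to build Kill sets.
    defs_of = {}
    for label, var in defines.items():
        if var is not None:
            defs_of.setdefault(var, set()).add(label)

    preds = {n: [] for n in defines}
    for n, targets in succs.items():
        for t in targets:
            preds[t].append(n)

    reach_in = {n: set() for n in defines}
    reach_out = {n: set() for n in defines}
    changed = True
    while changed:
        changed = False
        for n, var in defines.items():
            reach_in[n] = set().union(*(reach_out[p] for p in preds[n]))
            if var is None:
                new_out = set(reach_in[n])
            else:
                # Gen = {n}; Kill = the other definitions of the same variable.
                new_out = {n} | (reach_in[n] - defs_of[var])
            if new_out != reach_out[n]:
                reach_out[n] = new_out
                changed = True
    return reach_in, reach_out


reach_in, reach_out = reaching_definitions(DEFINES, SUCCS)
```

Both definitions of x (s1 and s5) reach the loop test s3, one from before the loop and one around the back edge, which is exactly the "may" behaviour: a definition reaches a point if it arrives along at least one path.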

Space of Data Flow Analyses

              Forward                  Backward
    May       Reaching definitions     Live variables
    Must      Available expressions    Very busy expressions

Most data flow analyses can be classified this way
  A few don't fit: bidirectional analysis
Lots of literature on data flow analysis
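Because the four analyses differ only in direction and in the operator used at join points, one solver parameterized by those two choices can cover the whole table. The Python sketch below is one possible way to write it, not a definitive framework: the function name `solve`, its parameters, and the diamond-shaped example used to compute very busy expressions are all assumptions.

```python
def solve(succs, gen, kill, universe, forward, must):
    """Generic gen/kill solver.

    Returns (before, after). For a forward analysis these are (In, Out);
    for a backward analysis they are (Out, In).
    """
    nodes = list(succs)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succs[n]:
            preds[s].append(n)

    sources = preds if forward else succs       # where facts flow in from
    meet = set.intersection if must else set.union
    # Must analyses start from the full set and shrink; may analyses grow.
    after = {n: set(universe) if must else set() for n in nodes}
    before = {n: set() for n in nodes}

    changed = True
    while changed:
        changed = False
        for n in nodes:
            incoming = [after[s] for s in sources[n]]
            # Boundary nodes (no sources) start with no facts.
            before[n] = meet(*incoming) if incoming else set()
            new_after = gen[n] | (before[n] - kill[n])
            if new_after != after[n]:
                after[n] = new_after
                changed = True
    return before, after


# Very busy expressions (backward must) on a hypothetical diamond:
#   n1: if (c)
#   n2:     x := a + b
#   n3: else y := a + b
#   n4: exit
DIAMOND = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n4"], "n4": []}
GEN = {"n1": set(), "n2": {"a + b"}, "n3": {"a + b"}, "n4": set()}
KILL = {n: set() for n in DIAMOND}

vb_out, vb_in = solve(DIAMOND, GEN, KILL, {"a + b"},
                      forward=False, must=True)
```

Here `a + b` is very busy at n1 because both branches evaluate it, so its computation could be hoisted above the branch. If only one branch evaluated it, the intersection at n1 would be empty.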