Overview of a Compiler

Similar documents
Overview of a Compiler

Overview of a Compiler

The View from 35,000 Feet

Code Merge. Flow Analysis. bookkeeping

High-level View of a Compiler

Compiling Techniques

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Introduction to Compiler

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Compilers and Interpreters

CS 132 Compiler Construction

Compiler Construction

Goals for course What is a compiler and briefly how does it work? Review everything you should know from 330 and before

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS 352: Compilers: Principles and Practice

Compilers. Lecture 2 Overview. (original slides by Sam

Front End. Hwansoo Han

Introduction to Parsing

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

CS 406/534 Compiler Construction Putting It All Together

Intermediate Representations

Introduction to Compilers

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Compiler Construction LECTURE # 1

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

CST-402(T): Language Processors

Intermediate Representations & Symbol Tables

Lexical Analysis. Introduction

Instruction Selection: Preliminaries. Comp 412

Intermediate Representations

Intermediate Representations

What do Compilers Produce?

Compiler Code Generation COMP360

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Formats of Translated Programs

The SGI Pro64 Compiler Infrastructure - A Tutorial

CS5363 Final Review. cs5363 1

Principles of Compiler Design

Compiler Construction 1. Introduction. Oscar Nierstrasz

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

SYLLABUS UNIT - I UNIT - II UNIT - III UNIT - IV CHAPTER - 1 : INTRODUCTION CHAPTER - 4 : SYNTAX AX-DIRECTED TRANSLATION TION CHAPTER - 7 : STORA

Crafting a Compiler with C (II) Compiler V. S. Interpreter

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

Part 5 Program Analysis Principles and Techniques

Introduction to Parsing. Comp 412

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Undergraduate Compilers in a Day

Compiler Construction 1. Introduction. Oscar Nierstrasz

Just-In-Time Compilers & Runtime Optimizers

EECS 6083 Intro to Parsing Context Free Grammars

Intermediate Representations

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Compilers and Code Optimization EDOARDO FUSELLA

Appendix A The DL Language

CJT^jL rafting Cm ompiler

Parsing II Top-down parsing. Comp 412

CS 406/534 Compiler Construction Parsing Part I

Goals of Program Optimization (1 of 2)

Agenda. CSE P 501 Compilers. Big Picture. Compiler Organization. Intermediate Representations. IR for Code Generation. CSE P 501 Au05 N-1

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

CSE 401/M501 Compilers

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction

Time : 1 Hour Max Marks : 30

Intermediate Code & Local Optimizations

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

CS 415 Midterm Exam Spring 2002

Intermediate Code Generation (ICG)

Office Hours: Mon/Wed 3:30-4:30 GDC Office Hours: Tue 3:30-4:30 Thu 3:30-4:30 GDC 5.

Usually, target code is semantically equivalent to source code, but not always!

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Compiler Design (40-414)

6. Intermediate Representation

Semantic Analysis. Lecture 9. February 7, 2018

CS415 Compilers. Intermediate Represeation & Code Generation

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Introduction. No Optimization. Basic Optimizations. Normal Optimizations. Advanced Optimizations. Inter-Procedural Optimizations

Pioneering Compiler Design

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Compiler Optimization

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

Group B Assignment 9. Code generation using DAG. Title of Assignment: Problem Definition: Code generation using DAG / labeled tree.

The structure of a compiler

Intermediate Representation (IR)

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

Compiling Regular Expressions COMP360

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CprE 488 Embedded Systems Design. Lecture 6 Software Optimization

Academic Formalities. CS Modern Compilers: Theory and Practise. Images of the day. What, When and Why of Compilers

Topic I (d): Static Single Assignment Form (SSA)

Transcription:

Overview of a Compiler Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies of these materials for their personal use.

High-level View of a Compiler Source code Compiler Machine code Errors Implications: Must recognize legal (and illegal) programs Must generate correct code Must manage storage of all variables (and code) Must agree with OS & linker on format for object code Big step up from assembly language use higher level notations 2

Traditional Two-pass Compiler Source code Front Back Machine code Errors Implications Use an intermediate representation () Front end maps legal source code into Back end maps into target machine code Admits multiple front ends & multiple passes (better code) Typically, front end is O(n) or O(n log n), while back end is NPC 3

A Common Fallacy Fortran Front end Back end Target 1 Scheme Java Front end Front end Back end Target 2 Smalltalk Front end Back end Target 3 Can we build n x m compilers with n+m components? Must encode all language specific knowledge in each front end Must encode all features in a single Must encode all target specific knowledge in each back end Limited success in systems with very low-level s 4

The Front Source code Scanner tokens Parser Errors Responsibilities Recognize legal (& illegal) programs Report errors in a useful way Produce & preliminary storage map Shape the code for the back end Much of front end construction can be automated 5

The Front Source code Scanner tokens Parser Errors Scanner Maps character stream into words the basic unit of syntax Produces pairs a word & its part of speech x = x + y ; becomes <id,x> = <id,x> + <id,y> ; word lexeme, part of speech token type In casual speech, we call the pair a token Typical tokens include number, identifier, +,, new, while, if Scanner eliminates white space (including comments) Speed is important 6

The Front Source code Scanner tokens Parser Parser Recognizes context-free syntax & reports errors Guides context-sensitive ( semantic ) analysis (type checking) Builds for source program Hand-coded parsers are fairly easy to build Errors Most books advocate using automatic parser generators 7

The Front Context-free syntax is specified with a grammar SheepNoise SheepNoise baa baa This grammar defines the set of noises that a sheep makes under normal circumstances It is written in a variant of Backus Naur Form (BNF) Formally, a grammar G = (S,N,T,P) S is the start symbol N is a set of non-terminal symbols T is a set of terminal symbols or words P is a set of productions or rewrite rules (P : N N T ) 8

The Front Context-free syntax can be put to better use 1. goal expr 2. expr expr op term 3. term 4. term number 5. id 6. op + 7. - S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = { 1, 2, 3, 4, 5, 6, 7} This grammar defines simple expressions with addition & subtraction over number and id This grammar, like many, falls in a class called context-free grammars, abbreviated CFG 9

Given a CFG, we can derive sentences by repeated substitution Production Result goal 1 expr 2 expr op term 5 expr op y 7 expr - y 2 expr op term - y 4 expr op 2 - y 6 expr + 2 - y 3 term + 2 - y 5 x + 2 - y The Front To recognize a valid sentence in some CFG, we reverse this process and build up a parse 10

The Front A parse can be represented by a tree (parse tree or syntax tree) x + 2 - y goal expr expr op term expr op term - <id,y> term <id,x> + <number,2> This contains a lot of unneeded information. 1. goal expr 2. expr expr op term 3. term 4. term number 5. id 6. op + 7. - 11

The Front Compilers often use an abstract syntax tree - <id,x> + <number,2> <id,y> The AST summarizes grammatical structure, without including detail about the derivation This is much more concise ASTs are one kind of intermediate representation () 12

The Back Instruction Selection Register Allocation Instruction Scheduling Machine code Errors Responsibilities Translate into target machine code Choose instructions to implement each operation Decide which value to keep in registers Ensure conformance with system interfaces Automation has been less successful in the back end 13

The Back Instruction Selection Register Allocation Instruction Scheduling Machine code Instruction Selection Produce fast, compact code Take advantage of target features such as addressing modes Usually viewed as a pattern matching problem ad hoc methods, pattern matching, dynamic programming This was the problem of the future in 1978 Spurred by transition from PDP-11 to VAX-11 Orthogonality of RISC simplified this problem Errors 14

The Back Instruction Selection Register Allocation Instruction Scheduling Machine code Register Allocation Errors Have each value in a register when it is used Manage a limited set of resources Can change instruction choices & insert LOADs & STOREs Optimal allocation is NP-Complete (1 or k registers) Compilers approximate solutions to NP-Complete problems 15

The Back Instruction Selection Register Allocation Instruction Scheduling Machine code Errors Instruction Scheduling Avoid hardware stalls and interlocks Use all functional units productively Can increase lifetime of variables (changing the allocation) Optimal scheduling is NP-Complete in nearly all cases Heuristic techniques are well developed 16

Traditional Three-pass Compiler Source Code Front Middle Back Machine code Errors Code Improvement (or Optimization) Analyzes and rewrites (or transforms) Primary goal is to reduce running time of the compiled code May also improve space, power consumption, Must preserve meaning of the code Measured by values of named variables 17

The Optimizer (or Middle ) Opt 1 Opt 2 Opt 3 Opt n... Errors Modern optimizers are structured as a series of passes Typical Transformations Discover & propagate some constant value Move a computation to a less frequently executed place Specialize some computation based on context Discover a redundant computation & remove it Remove useless or unreachable code Encode an idiom in some particularly efficient form 18

Example Ø Optimization of Subscript Expressions in Fortran Address(A(I,J)) = address(a(0,0)) + J * (column size) + I Does the user realize a multiplication is generated here? 19

Example Ø Optimization of Subscript Expressions in Fortran Address(A(I,J)) = address(a(0,0)) + J * (column size) + I Does the user realize a multiplication is generated here? DO I = 1, M A(I,J) = A(I,J) + C ENDDO 20

Example Ø Optimization of Subscript Expressions in Fortran Address(A(I,J)) = address(a(0,0)) + J * (column size) + I Does the user realize a multiplication is generated here? DO I = 1, M A(I,J) = A(I,J) + C ENDDO compute addr(a(0,j)) DO I = 1, M add 1 to get addr(a(i,j)) A(I,J) = A(I,J) + C ENDDO 21

Modern Restructuring Compiler Source Code Front HL AST Restructurer HL AST Gen Opt + Back Machine code Errors Typical Restructuring Transformations: Blocking for Memory Hierarchy and Register Reuse Vectorization Parallelization All based on dependence Also full and partial inlining 22

Role of the Run-Time System Memory management services Allocate In the heap or in an activation record (stack frame) Deallocate Collect garbage Run-time type checking Error processing Interface to the operating system Input and output Support of parallelism Parallel Thread initiation Communication and Synchronization 23

Classic Compilers 1957: The FORTRAN Automatic Coding System Front Index Optimiz n Code Merge bookkeeping Flow Analysis Register Allocation Final Assembly Front Middle Back Six passes in a fixed order Generated good code Assumed unlimited index registers Code motion out of loops, with ifs and gotos Did flow analysis & register allocation 24

Classic Compilers 1969: IBM s FORTRAN H Compiler Scan & Parse Build CFG & DOM Find Busy Vars CSE Loop Inv Code Mot n Copy Elim. OSR Re - assoc (consts) Reg. Alloc. Final Assy. Front Middle Back Used low-level (quads), identified loops with dominators Focused on optimizing loops ( inside out order) Passes are familiar today Simple front end, simple back end for IBM 370 25

Classic Compilers 1975: BLISS-11 compiler (Wulf et al., CMU) Register allocation Lex- Syn- Flo Delay TLA Rank Pack Code Final Front Middle Back The great compiler for the PDP-11 Seven passes in a fixed order Focused on code shape & instruction selection LexSynFlo did preliminary flow analysis Final included a grab-bag of peephole optimizations Basis for early VAX & Tartan Labs compilers 26

1980: IBM s PL.8 Compiler Classic Compilers Front Middle Back Many passes, 1 front end, several back ends Collection of 10 or more passes Repeat some passes and analyses Represent complex operations at 2 levels Below machine-level Dead code elimination Global cse Code motion Constant folding Strength reduction Value numbering Dead store elimination Code straightening Trap elimination Algebraic reassociation * 27

1980: IBM s PL.8 Compiler Classic Compilers Front Middle Back Many passes, 1 front end, several back ends Collection of 10 or more passes Repeat some passes and analyses Represent complex operations at 2 levels Below machine-level Multi-level has become common wisdom * 28

1986: HP s PA-RISC Compiler Classic Compilers Front Middle Back Several front ends, an optimizer, and a back end Four fixed-order choices for optimization (9 passes) Coloring allocator, instruction scheduler, peephole optimizer 29

1999: The SUIF Compiler System Classic Compilers Fortran 77 C/Fortran C & C++ Alpha Java x86 Front Middle Back Another classically-built compiler 3 front ends, 3 back ends 18 passes, configurable order Two-level (High SUIF, Low SUIF) Intended as research infrastructure 30

1999: The SUIF Compiler System Classic Compilers Fortran 77 C/Fortran C & C++ Alpha Java x86 Front Middle Back Another classically-built compiler 3 front ends, 3 back ends 18 passes, configurable order Two-level (High SUIF, Low SUIF) Intended as research infrastructure SSA construction Dead code elimination Partial redundancy elimination Constant propagation Global value numbering Strength reduction Reassociation Instruction scheduling Register allocation 31

1999: The SUIF Compiler System Classic Compilers Fortran 77 C/Fortran C & C++ Alpha Java Front Middle Another classically-built compiler 3 front ends, 3 back ends 18 passes, configurable order Two-level (High SUIF, Low SUIF) Intended as research infrastructure x86 Back Data dependence analysis Scalar & array privatization Reduction recognition Pointer analysis Affine loop transformations Blocking Capturing object definitions Virtual function call elimination Garbage collection 32

Classic Compilers 2000: The SGI Pro64 Compiler (now Open64 from Intel) Fortran C & C++ Interpr. Anal. & Optim n Loop Nest Optim n Global Optim n Code Gen. Java Front Open source optimizing compiler for IA 64 3 front ends, 1 back end Five-levels of Middle Gradual lowering of abstraction level Back Interprocedural Classic Analysis Inlining (user & library code) Cloning (constants & locality) Dead function elimination Dead variable elimination 33

Classic Compilers 2000: The SGI Pro64 Compiler (now Open64 from Intel) Fortran C & C++ Interpr. Anal. & Optim n Loop Nest Optim n Global Optim n Code Gen. Java Front Open source optimizing compiler for IA 64 3 front ends, 1 back end Five-levels of Middle Gradual lowering of abstraction level Back Loop Nest Optimization Dependence Analysis Parallelization Loop transformations (fission, fusion, interchange, peeling, tiling, unroll & jam) Array privatization 34

Classic Compilers 2000: The SGI Pro64 Compiler (now Open64 from Intel) Fortran C & C++ Interpr. Anal. & Optim n Loop Nest Optim n Global Optim n Code Gen. Java Front Open source optimizing compiler for IA 64 3 front ends, 1 back end Five-levels of Middle Gradual lowering of abstraction level Back Global Optimization SSA-based analysis & opt n Constant propagation, PRE, OSR+LFTR, DVNT, DCE (also used by other phases) 35

Classic Compilers 2000: The SGI Pro64 Compiler (now Open64 from Intel) Fortran C & C++ Interpr. Anal. & Optim n Loop Nest Optim n Global Optim n Code Gen. Java Front Open source optimizing compiler for IA 64 3 front ends, 1 back end Five-levels of Middle Gradual lowering of abstraction level Back Code Generation If conversion & predication Code motion Scheduling (inc. sw pipelining) Allocation Peephole optimization 36

Classic Compilers Even a 2000 JIT fits the mold, albeit with fewer passes bytecode native code Middle Back Java Environment Front end tasks are handled elsewhere Few (if any) optimizations Avoid expensive analysis Emphasis on generating native code Compilation must be profitable 37

A Modern Compiler 2014: Clang (LLVM) Compiler System Front C C++ Preprocessor + Scanner Parser + Semantic Analysis AST Addt'l Semantic Analysis AST Lowering LLVM Obj-C Clang C Language Front for LLVM compiler infrastructure (default C/C++ compiler on Mac OS X and FreeBSD) Parser is recursive descent (top down), and most semantic analysis is completed at parse time Lowering process converts the higher-level (AST) into the lower-level (LLVM bitcode ) 38

A Modern Compiler 2014: Clang (LLVM) Compiler System Optimizer LLVM Basic Local Opt. Call Graph Opt. Loop Opt. Vectorize Opt. LLVM LLVM includes ~100 optimization passes this is (approximately) the default grouping of passes with the -O2 compiler flag Some passes are repeated multiple times (like simplify CFG) Basic local memory promotion, dead arguments, combine instructions Call graph Remove dead functions, inline functions, etc. Loops Loop invariant code, loop unrolling, loop deletion Vectorization Take advantage of SIMD processors 39

A Modern Compiler 2014: Clang (LLVM) Compiler System Back LLVM Instr. Select. Instr. Sched. SSA Opt. Reg. Alloc. Func. Prolog/ Epilog Late Opt. Machine Code Back end is designed to be retargetable to new platforms without many changes to the earlier code, but some (like function ABI) requires changes to Clang codebase SSA LLVM uses static single assignment (covered later in the course) Some optimizations still happen in the back end 40

Overview of a Compiler s Tasks Basic Translation from High-level to Instruction level Structure of a (Classical) Compiler Traditional Three Phase Structure Classical Compilers Static vs. Dynamic Summary 41