
Review Pat Morin COMP 3002

What is a Compiler? A compiler translates from a source language S to a target language T while preserving the meaning of the input.

Structure of a Compiler: program text → syntactic analyzer → intermediate representation → code generator → machine code.

Compiler Structure: Front End. The syntactic analyzer is a tokenizer that turns the program text into a token stream, followed by a parser that produces the intermediate representation.

Compiler Structure: Back End. The code generator applies machine-independent optimizations to the intermediate representation, then performs basic code generation, then machine-dependent optimizations, producing machine code.

Tokenizing. The first step in compilation takes the input (a character stream) and converts it into a token stream. Tokens have attributes. Technology: convert regular expressions into NFAs and then convert those into DFAs.

Regular Expressions. Concatenation, alternation, Kleene closure, and parenthesization. Regular definitions: multi-line regular expressions. Exercise: write regular definitions for: all strings of lowercase letters that contain the five vowels in order; all strings of lowercase letters in which the letters are in ascending lexicographic order; comments, consisting of a string surrounded by /* and */ without any intervening */. (A sketch of one possible answer to the last item follows.)
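A minimal Python sketch of one possible answer to the comments exercise (my own answer, not the official solution): the body may contain any character except a "*/" sequence, which the second alternative enforces.

    import re

    # Body: either a non-star character, or a run of stars not followed by '/'
    comment = re.compile(r'/\*([^*]|\*+[^*/])*\*+/')

    print(bool(comment.fullmatch("/* a comment */")))      # True
    print(bool(comment.fullmatch("/* nested */ not */")))  # False: '*/' inside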

NFAs. A finite collection of states; edges are labelled with letters; one start state and (w.l.o.g.) one accepting state. Exercise: convert these to NFAs: a|b, (a|b)c, (a|b)*c, (a|b)*a(a|b)(a|b). (A simulation sketch for the last one follows.)
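A minimal sketch (not from the slides; the encoding is my own) of an NFA for the last regex, (a|b)*a(a|b)(a|b), as a transition table, together with the usual set-of-states simulation:

    # Accepts strings over {a, b} whose third-from-last symbol is 'a'
    NFA = {
        (0, 'a'): {0, 1},   # nondeterministic: loop, or guess "third from last"
        (0, 'b'): {0},
        (1, 'a'): {2}, (1, 'b'): {2},
        (2, 'a'): {3}, (2, 'b'): {3},   # state 3 is the accepting state
    }

    def nfa_accepts(s, start=0, accept=3):
        states = {start}
        for ch in s:
            states = {t for q in states for t in NFA.get((q, ch), set())}
        return accept in states

    print(nfa_accepts("aab"))   # True: third-from-last symbol is 'a'
    print(nfa_accepts("abaa"))  # False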

DFAs. Like NFAs, but all the outgoing edges of any node have distinct labels. Any NFA can be converted to an equivalent DFA. Exercises: convert the example NFAs to DFAs (the slide's state diagrams are omitted here; a subset-construction sketch follows).
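A minimal subset-construction sketch (helper names are my own): DFA states are frozensets of NFA states. For brevity this assumes an ε-free NFA; with ε-moves you would take ε-closures first.

    NFA = {   # (a|b)*a(a|b)(a|b), as in the sketch above
        (0, 'a'): {0, 1}, (0, 'b'): {0},
        (1, 'a'): {2}, (1, 'b'): {2},
        (2, 'a'): {3}, (2, 'b'): {3},
    }

    def subset_construction(nfa, start, accepting, alphabet):
        start_set = frozenset({start})
        dfa, seen, worklist = {}, {start_set}, [start_set]
        while worklist:
            S = worklist.pop()
            for ch in alphabet:
                T = frozenset(t for q in S for t in nfa.get((q, ch), set()))
                dfa[(S, ch)] = T
                if T not in seen:
                    seen.add(T)
                    worklist.append(T)
        return dfa, {S for S in seen if S & accepting}

    dfa, finals = subset_construction(NFA, 0, {3}, 'ab')
    print(len({S for (S, _) in dfa}))   # 8 DFA states are reachable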

Parsing. Purpose: convert a token stream into a parse tree. Example: x = x + 1 becomes the token stream <id, "x"> <assign> <id, "x"> <plus> <number, "1"> (the slide's parse-tree diagram is omitted here).

Context-Free Grammars. Context-free grammars have terminals (tokens), non-terminals, sentential forms, and sentences. Derivations. Derive id + id * id with this grammar: E → E + T | T, T → T * F | F, F → ( E ) | id.

Derivations. Leftmost (rightmost) derivations: always expand the leftmost (rightmost) non-terminal. Derivations and parse trees: internal nodes correspond to non-terminals; leaves correspond to terminals. Ambiguity: a string has more than one derivation, which can result in different parse trees.

Derivations and Parse Trees. Example (a rightmost derivation with the ambiguous grammar E → E + E | E * E | ( E ) | id): E ⇒ E + E ⇒ E + id ⇒ E * E + id ⇒ E * id + id ⇒ ( E ) * id + id ⇒ ( E + E ) * id + id ⇒ ( id + E ) * id + id ⇒ ( id + id ) * id + id. (The slide's parse-tree diagram is omitted here.)

Left-Recursion. Left-recursion makes parsing difficult. Immediate left recursion: A → Aα | β. Rewrite as A → βA' and A' → αA' | ε. More complicated left recursion: A ⇒+ Aα, i.e. A derives Aα in one or more steps. (A sketch of the rewrite follows.)
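A sketch of the standard rewrite in Python (the representation is my own: each production is a list of symbol strings, and [] stands for ε):

    def remove_immediate_left_recursion(nt, productions):
        rec  = [p[1:] for p in productions if p and p[0] == nt]  # the alphas
        base = [p for p in productions if not p or p[0] != nt]   # the betas
        if not rec:
            return {nt: productions}
        new = nt + "'"
        return {
            nt:  [beta + [new] for beta in base],
            new: [alpha + [new] for alpha in rec] + [[]],   # [] is epsilon
        }

    # E -> E + T | T   becomes   E -> T E',  E' -> + T E' | eps
    print(remove_immediate_left_recursion('E', [['E', '+', 'T'], ['T']]))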

Left Factoring. Makes a grammar suitable for top-down parsing. For each non-terminal A, find the longest prefix α common to two or more alternatives. Replace A → αβ1 | ... | αβn with A → αA' and A' → β1 | ... | βn. Repeat until no two alternatives have a common prefix.

Exercise. rexpr → rexpr + rterm | rterm; rterm → rterm rfactor | rfactor; rfactor → rfactor * | rprimary; rprimary → a | b. Exercise: remove the left recursion, then left-factor.

First and Follow. FIRST(X): the set of terminals that begin strings that can be derived from X. FOLLOW(X): the set of terminals that can appear immediately to the right of X in some sentential form. Be able to: compute FIRST and FOLLOW for a small example.

FIRST and FOLLOW Example. E → T E', E' → + T E' | ε, T → F T', T' → * F T', | ε, F → ( E ) | id. FIRST(F) = FIRST(T) = FIRST(E) = { (, id }. FIRST(E') = { +, ε }. FIRST(T') = { *, ε }. FOLLOW(E) = FOLLOW(E') = { ), $ }. FOLLOW(T) = FOLLOW(T') = { +, ), $ }. FOLLOW(F) = { +, *, ), $ }. (A sketch that computes these sets follows.)
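A fixpoint sketch in Python (my own encoding, not from the slides) that computes FIRST and FOLLOW for the slide's grammar; 'eps' stands for ε and '$' is the end marker:

    G = {
        'E':  [['T', "E'"]],
        "E'": [['+', 'T', "E'"], []],
        'T':  [['F', "T'"]],
        "T'": [['*', 'F', "T'"], []],
        'F':  [['(', 'E', ')'], ['id']],
    }
    NT = set(G)
    FIRST = {X: set() for X in NT}

    def first_of(seq):                      # FIRST of a symbol sequence
        out = set()
        for sym in seq:
            f = FIRST[sym] if sym in NT else {sym}
            out |= f - {'eps'}
            if 'eps' not in f:
                return out
        return out | {'eps'}                # every symbol can vanish

    changed = True
    while changed:                          # iterate to a fixpoint
        changed = False
        for A, prods in G.items():
            for p in prods:
                f = first_of(p)
                if not f <= FIRST[A]:
                    FIRST[A] |= f
                    changed = True

    FOLLOW = {X: set() for X in NT}
    FOLLOW['E'].add('$')                    # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for A, prods in G.items():
            for p in prods:
                for i, sym in enumerate(p):
                    if sym not in NT:
                        continue
                    f = first_of(p[i+1:])
                    add = (f - {'eps'}) | (FOLLOW[A] if 'eps' in f else set())
                    if not add <= FOLLOW[sym]:
                        FOLLOW[sym] |= add
                        changed = True

    print(FIRST['E'], FOLLOW["T'"])   # {'(', 'id'} and {'+', ')', '$'}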

LL(1) Grammars. Left-to-right parsers producing a leftmost derivation, looking ahead by at most 1 symbol. Grammar G is LL(1) iff for every two productions of the form A → α | β: FIRST(α) and FIRST(β) are disjoint, and if ε is in FIRST(β) then FIRST(α) and FOLLOW(A) are disjoint (and vice versa).

LL(1) Parser. LL(1) parsers are driven by a table: (non-terminal, next token) => expansion. Be able to: fill in a table given the FIRST and FOLLOW sets; use a table to parse an input string. (A table-driven sketch follows.)
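A minimal table-driven parse loop in Python (my own sketch; the table entries are the slide grammar's productions, with [] for an ε-expansion):

    TABLE = {
        ('E', 'id'): ['T', "E'"], ('E', '('): ['T', "E'"],
        ("E'", '+'): ['+', 'T', "E'"], ("E'", ')'): [], ("E'", '$'): [],
        ('T', 'id'): ['F', "T'"], ('T', '('): ['F', "T'"],
        ("T'", '+'): [], ("T'", '*'): ['*', 'F', "T'"],
        ("T'", ')'): [], ("T'", '$'): [],
        ('F', 'id'): ['id'], ('F', '('): ['(', 'E', ')'],
    }
    NONTERMINALS = {'E', "E'", 'T', "T'", 'F'}

    def ll1_parse(tokens):
        tokens = tokens + ['$']
        stack, i = ['$', 'E'], 0
        while stack:
            top = stack.pop()
            if top in NONTERMINALS:
                prod = TABLE.get((top, tokens[i]))
                if prod is None:            # no table entry: syntax error
                    return False
                stack.extend(reversed(prod))  # push the right-hand side
            else:                           # terminal: must match the input
                if top != tokens[i]:
                    return False
                i += 1
        return i == len(tokens)

    print(ll1_parse(['id', '+', 'id', '*', 'id']))  # True
    print(ll1_parse(['id', '+', '*']))              # False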

Bottom-up Parsing. Shift-reduce parsing. Won't be covered on the exam.

Type-Checking. Type checking is done by a bottom-up traversal of the parse tree: for each type of node, define what type it evaluates to given the types of its children. Some extra types may be introduced (an error type, an unknown type); these can be used for error recovery. Environments are used for keeping track of the types of variables. Static lexical scoping. Exercise: pick a parse tree and assign types to its nodes.

Parse DAGs. Parse DAGs are like parse trees, but common subexpressions get merged. Exercise: construct the parse DAGs for (x+y)-((x+y)*(x-y)) and ((x1-x2)*(x1-x2))+((y1-y2)*(y1-y2)). (A hashing sketch follows.)
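A hashing sketch in Python (my own encoding) for building a parse DAG: identical (op, left, right) triples map to the same node, so common subexpressions are shared automatically.

    nodes, memo = [], {}

    def node(op, left=None, right=None):
        key = (op, left, right)
        if key not in memo:               # first time we see this triple
            memo[key] = len(nodes)
            nodes.append(key)
        return memo[key]                  # otherwise reuse the old node

    # (x+y) - ((x+y) * (x-y)) from the exercise
    x, y = node('x'), node('y')
    s = node('+', x, y)                                   # built once here...
    root = node('-', s, node('*', s, node('-', x, y)))    # ...shared here
    print(len(nodes), nodes[root])        # 6 nodes; root is ('-', 2, 4)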

Intermediate Code Generation. Two kinds of intermediate code: stack-machine code and 3-address instructions (3AI). For 3AI: assign temporary variables to the internal nodes of the parse DAG and output the instructions in reverse topological order. For stack machines: just like in assignment 3. Recipes for control structures. (An emitter sketch for 3AI follows.)
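A Python sketch (my own) of the 3AI recipe, reusing the (op, left, right) node representation from the previous sketch: children are emitted before their parents via a post-order walk, and each internal node gets one temporary.

    def emit_3ai(nodes, root):
        names, code = {}, []
        def walk(i):
            if i in names:                  # shared node: already emitted
                return names[i]
            op, l, r = nodes[i]
            if l is None:                   # a leaf: just a variable name
                names[i] = op
            else:
                a, b = walk(l), walk(r)
                names[i] = f"t{len(code)}"
                code.append(f"{names[i]} = {a} {op} {b}")
            return names[i]
        walk(root)
        return code

    # DAG for (x+y) - ((x+y) * (x-y))
    nodes = [('x', None, None), ('y', None, None),
             ('+', 0, 1), ('-', 0, 1), ('*', 2, 3), ('-', 2, 4)]
    for line in emit_3ai(nodes, 5):
        print(line)
    # t0 = x + y
    # t1 = x - y
    # t2 = t0 * t1
    # t3 = t0 - t2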

Example. Exercise: generate 3AI and stack-machine code for the parse tree of a + a * (b - c) + (b - c) * d (the slide's tree diagram is omitted here).

Scope and Code Generation. The interaction between static lexical scope and the machine stack: frame pointers, parent frame pointers, the frame pointer array. Object scope: inheritance, virtual methods, dispatch tables. Be able to: illustrate the state of the stack, fp, and fpp for a function call; illustrate the memory layout of an object in an OOP language.

Basic Blocks. Blocks of code that always execute from beginning to end. Be able to: given a program, compute the basic blocks (a leader-based sketch follows). Next-use information: a lazy algorithm for code generation and register usage based on next-use information. Be able to: compute next-use information for all the variables in a basic block; illustrate a register allocation based on next-use information. The dangers of pointers and arrays.
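A leader-based sketch in Python (my own encoding: instructions are (op, target) pairs, where target is an instruction index for jumps and None otherwise): instruction 0 is a leader, every jump target is a leader, and every instruction after a jump is a leader.

    def basic_blocks(code):
        leaders = {0}
        for i, (op, target) in enumerate(code):
            if op in ('goto', 'if'):
                leaders.add(target)          # the jump target starts a block
                if i + 1 < len(code):
                    leaders.add(i + 1)       # so does the fall-through point
        cuts = sorted(leaders) + [len(code)]
        return [list(range(cuts[k], cuts[k+1])) for k in range(len(cuts)-1)]

    prog = [('assign', None), ('if', 0), ('assign', None), ('goto', 1)]
    print(basic_blocks(prog))   # [[0], [1], [2, 3]]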

Basic Blocks as DAGs. Applications: dead-code elimination, algebraic identities, associativity, etc. Be able to: given a basic block, compute its DAG representation; reassemble a basic block from its DAG (be careful with pointers and arrays).

Peephole Optimization. Different kinds of peephole optimizations: redundant loads/stores, unreachable code, flow-of-control optimizations (shortcuts), algebraic simplifications and reduction in strength. (A sketch of one such rule follows.)
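A Python sketch (my own) of one peephole rule: a store immediately followed by a load of the same location into the same register is redundant, assuming the value is still in the register and nothing aliases the location through pointers.

    def peephole(code):
        out = []
        for instr in code:
            if (out and instr[0] == 'load' and out[-1][0] == 'store'
                    and instr[1:] == out[-1][1:]):
                continue                    # drop the redundant load
            out.append(instr)
        return out

    print(peephole([('store', 'r1', 'a'), ('load', 'r1', 'a'), ('add', 'r1')]))
    # [('store', 'r1', 'a'), ('add', 'r1')]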

The Control Flow Graph. Indicates which basic blocks may succeed other basic blocks during execution. Be able to: compute a control-flow graph; choose register variables based on the control-flow graph; eliminate unreachable code; find no-longer-used variables; compute the transitive closure.

Register Allocation by Graph Coloring. The interference graph: nodes are variables, and two nodes are adjacent if the variables are active simultaneously. Color the graph with the minimum number of colors. Inductive graph coloring algorithm: delete a vertex of lowest degree, recurse, then reinsert the vertex and color it with the lowest available color. Be able to: illustrate the inductive coloring algorithm. (A sketch follows.)
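A Python sketch (my own) of the inductive coloring algorithm from the slide: repeatedly remove a minimum-degree vertex, then reinsert the vertices in reverse order, giving each the lowest color unused by its already-colored neighbours.

    def color(graph):                    # graph: {vertex: set(neighbours)}
        g = {v: set(ns) for v, ns in graph.items()}
        order = []
        while g:
            v = min(g, key=lambda u: len(g[u]))   # lowest-degree vertex
            order.append(v)
            for u in g[v]:
                g[u].discard(v)                   # delete v from the graph
            del g[v]
        colors = {}
        for v in reversed(order):                 # reinsert in reverse
            used = {colors[u] for u in graph[v] if u in colors}
            colors[v] = next(c for c in range(len(graph)) if c not in used)
        return colors

    # x and y interfere, y and z interfere; x and z can share a register
    print(color({'x': {'y'}, 'y': {'x', 'z'}, 'z': {'y'}}))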

Ershov Numbers. Computed by a bottom-up traversal of the parse tree; they represent the minimum number of registers required to avoid loads and stores. Dynamic programming extension: reorderings of children, different instructions. Be able to: compute Ershov numbers; compute dynamic programming costs. (A sketch follows.)
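A bottom-up Python sketch (my own encoding, on (op, left, right) trees): a leaf needs one register; an internal node needs the max of its children's numbers if they differ, and one more if they are equal.

    def ershov(t):
        op, l, r = t
        if l is None:                    # a leaf
            return 1
        a, b = ershov(l), ershov(r)
        return max(a, b) if a != b else a + 1

    leaf = lambda name: (name, None, None)
    t = ('+', ('-', leaf('a'), leaf('b')), ('*', leaf('c'), leaf('d')))
    print(ershov(t))   # 3: both subtrees need 2 registers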

Data-Flow Analysis. Define in and out for each line of code / for each basic block. Define transfer functions: out[B] = f(B, in[B]) and in[B] = f(out[B1], ..., out[Bk]), where B1, ..., Bk are the predecessors of B. Sometimes the analysis works backwards. Example applications: reaching definitions, undefined variables, live-variable analysis. Be able to: apply the iterative algorithm for solving the data-flow equations. (A backward example follows.)
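An iterative Python sketch (my own) of live-variable analysis, a backward problem: in[B] = use[B] ∪ (out[B] − def[B]) and out[B] is the union of in[S] over successors S. Iterating in any order converges; visiting blocks in reverse would just converge faster.

    def liveness(blocks, succ):
        # blocks: {name: (use_set, def_set)}; succ: {name: [successor names]}
        live_in = {b: set() for b in blocks}
        live_out = {b: set() for b in blocks}
        changed = True
        while changed:                      # iterate to a fixpoint
            changed = False
            for b, (use, defs) in blocks.items():
                out = set().union(*(live_in[s] for s in succ[b])) if succ[b] else set()
                inn = use | (out - defs)
                if inn != live_in[b] or out != live_out[b]:
                    live_in[b], live_out[b] = inn, out
                    changed = True
        return live_in, live_out

    # B1 defines x; B2 uses x and defines y; B3 uses y
    blocks = {'B1': (set(), {'x'}), 'B2': ({'x'}, {'y'}), 'B3': ({'y'}, set())}
    succ = {'B1': ['B2'], 'B2': ['B3'], 'B3': []}
    print(liveness(blocks, succ))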

The GNU Compiler Collection. History and background: started in 1985; open source. Compilation steps: input language → parse tree → GENERIC → GIMPLE → RTL → machine language. Be able to: recognize a picture of Richard Stallman; know the difference between GENERIC and GIMPLE.

Want to Build a Compiler? Cross-compiling, bootstrapping, self-compiling, T-diagrams. Be able to: understand T-diagrams; solve a cross-compilation problem. (The slide's T-diagram, with slots S, I, T, showing a Java-to-JVM compiler written in C and a C-to-i386 compiler running on i386, is omitted here.)


Want to Write a Compiler? A compiler has 3 main parameters. Source language (S): what kind of input does the compiler take? C, C++, Java, Python, ... Implementation language (I): what language is the compiler written in? C, Java, i386, x86_64, ... Target language (T): what is the compiler's target language? i386, x86_64, PPC, MIPS, ... (The slide's T-diagram, mapping source code in S to compiled code in T, is omitted here.)

Source Language Issues. Complexity: is a completely handwritten compiler feasible? Stability: is the language definition still changing? Novelty: do there already exist compilers for this language? Complicated or still-changing languages promote the use of compiler generation tools.

Target Language Issues. Novelty: is this a new architecture? Are there similar architectures/instruction sets? Available tools: is there an assembler for this language? Are there other compilers for this language?

Performance Criteria. Speed: does it have to be a fast compiler? Does it have to be a small compiler? Does it have to generate fast code? Portability: should the compiler run on many different architectures (rehostability)? Should it generate code for many different architectures (retargetability)?

Possible Workarounds. When the source language is new, rewrite an existing front end and reuse the back end (code generation) of the compiler. When the target architecture is new, rewrite an existing back end, retargeting an existing compiler to the new architecture. What happens when both the source language and the target language are new? Write a compiler from scratch? Do we have other options?

Composing Compilers. Compilers can be composed and used to compile each other. Example: we have written a Java-to-JVM compiler in C and we want to make it run on two different platforms, i386 and x86_64; both platforms have C compilers. (The slide's T-diagrams for the two platforms are omitted here.)

Example. Assignment 3 is a PRM-to-JVM compiler written in Java; assignment 4 is a JVM-to-JVM' optimizer written in Java. (The slide's T-diagrams, showing fib.prm compiled to a.j and then optimized to a.j', are omitted here.)

Example. Show how: to take your PRM compiler and make it faster; to take your Jasmin optimizer and make it faster. (The slide's T-diagrams are omitted here.)

Bootstrapping by Cross-Compiling. Sometimes the source and implementation language are the same, e.g. a C compiler written in C. In this case, cross-compiling can be useful. (The slide's numbered T-diagrams for x64 and i386 are omitted here.)

Bootstrapping Cont'd. Bootstrapping by reduced functionality: implement, in machine language (or assembler), a simplified compiler for a subset of the source language, with no optimizations; then write a compiler for the full language in the reduced language. (The slide's T-diagrams are omitted here.)

Bootstrapping for Self-Improvement. If we are writing a good optimizing compiler with I = S, then we can compile the compiler with itself, and we get a fast compiler. gcc does this (several times). (The slide's T-diagrams are omitted here.)

Summary. When writing a compiler there are several techniques we can use to leverage existing technology: reusing front ends or back ends, cross-compiling, starting from reduced functionality, and self-compiling.