PSD3A Principles of Compiler Design, Unit I-V



UNIT I - SYLLABUS: Compiler, Assembler, Language Processing System, Phases of Compiler, Lexical Analyser, Finite Automata, NFA, DFA, Compiler Tools

Compiler - A compiler is computer software that transforms code written in one programming language (the source language) into an equivalent program in another language (the target language).

Compiler Architecture

Assembler

Language-Processing System

Compiler Vs Interpreter

Phases of Compiler

Lexical Analyser www.csd.uwo.ca/~moreno/cs447/lectures/introduction.html/node10.html

Finite Automata - A finite automaton (FA), also called a finite state machine (FSM), is an abstract model of a computing entity that decides whether to accept or reject a string. Every regular expression can be represented as an FA and vice versa. There are two types of FAs: Non-deterministic (NFA): may have more than one alternative action for the same input symbol. Deterministic (DFA): has at most one action for a given state and input symbol.

Scanner Generator - The main components of scanner generation (e.g., Lex): convert a regular expression to a non-deterministic finite automaton (NFA) by the Thompson construction; convert the NFA to a deterministic finite automaton (DFA) by the subset construction; minimize the DFA to reduce the number of states; generate a program in C or some other language to simulate the minimized DFA. Pipeline: RE -> NFA -> DFA -> minimized DFA -> DFA simulation program.

Non-deterministic Finite Automata (NFA) - An NFA is a 5-tuple (S, Σ, δ, s0, F): S, a finite set of states; Σ, the input alphabet; δ, the transition function, mapping a (state, symbol) pair to a set of states; s0 ∈ S, the start state; F ⊆ S, a set of final or accepting states. Non-deterministic: a state and symbol pair can be mapped to a set of states. Finite: the number of states is finite.

Transition Diagram - An FA can be represented by a transition diagram. Corresponding to the FA definition, a transition diagram has: states, represented by circles; an alphabet (Σ), represented by labels on edges; transitions, represented by labeled directed edges between states, where the label is the input symbol; one start state, marked by an incoming arrow; and one or more final states, represented by double circles. Example: a transition diagram to recognize (a|b)*abb.
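The (a|b)*abb example can also be simulated with a small table-driven sketch. The transition table below is the standard DFA for this expression; Python is used only for illustration.

```python
# Transition table for the DFA recognizing (a|b)*abb.
# Keys are (state, symbol) pairs; state 3 is the only accepting state.
DFA = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 1, (2, 'b'): 3,
    (3, 'a'): 1, (3, 'b'): 0,
}
START, ACCEPT = 0, {3}

def accepts(s):
    state = START
    for ch in s:
        state = DFA.get((state, ch))
        if state is None:          # symbol not in the alphabet: reject
            return False
    return state in ACCEPT         # accept iff we end in a final state
```

Since the machine is deterministic, each input symbol triggers exactly one table lookup, so the scan is linear in the length of the string.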

Compiler Tools

UNIT II - SYLLABUS: Context-Free Grammar, Parse Tree, Leftmost & Rightmost Derivation, Derivation Trees, Ambiguous Grammar, Parser, Types of Parser, Shift-Reduce Parsing

Context-Free Grammars - Languages generated by context-free grammars are context-free languages. Context-free grammars are more expressive than finite automata: if a language L is accepted by a finite automaton, then L can be generated by a context-free grammar. Definition: a context-free grammar is a 4-tuple (Σ, NT, R, S), where Σ is an alphabet (each character in Σ is called a terminal); NT is a set of nonterminals; R, the set of rules, is a subset of NT × (Σ ∪ NT)*; and S, the start symbol, is one of the symbols in NT.

Parse Tree - A parse tree of a derivation is a tree in which: each internal node is labeled with a nonterminal; and if a rule A -> A1 A2 ... An occurs in the derivation, then A is a parent node of the nodes labeled A1, A2, ..., An.

Leftmost & Rightmost Derivations - A leftmost derivation of a sentential form is one in which rules transforming the leftmost nonterminal are always applied. A rightmost derivation of a sentential form is one in which rules transforming the rightmost nonterminal are always applied. Example grammar: S -> AB; A -> ε | aAb | AA; B -> b | bc | Bc | bB.

Derivation Trees - Example: for the grammar S -> AB; A -> ε | aAb | AA; B -> b | bc | Bc | bB, the string w = aabb (derivable from A) has more than one derivation tree, for instance one starting with A -> aAb and another starting with A -> AA.

Ambiguity & Disambiguation - Given an ambiguous grammar, we would like an equivalent unambiguous grammar. An unambiguous grammar tells you more about the structure of a given derivation, simplifies inductive proofs on derivations, and can lead to more efficient parsing algorithms. In programming languages, we want to impose a canonical structure on derivations, e.g., for 1+2*3.

Role of Parser

Types of Parser https://www.tutorialspoint.com/compiler_design/compiler_design_types_of_parsing.htm

Bottom-Up Parsing

Top-Down Parsing

Shift-Reduce Parsing - Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar symbols and an input buffer holds the rest of the tokens to be parsed. Shift: push the next input token onto the top of the stack. Reduce: the right end of the string to be reduced must be at the top of the stack; locate the left end of that string within the stack and decide which nonterminal replaces it. Accept: announce successful completion of parsing. Error: discover a syntax error and call an error-recovery routine.
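These actions can be sketched as a toy shift-reduce loop. The grammar E -> E + T | T, T -> id is an illustrative choice, not from the slides; reductions are tried greedily in rule order, which happens to find the correct handle for this particular grammar (a real parser would use an LR table instead).

```python
RULES = [(['E', '+', 'T'], 'E'),   # E -> E + T
         (['T'], 'E'),             # E -> T
         (['id'], 'T')]            # T -> id

def parse(tokens):
    stack, trace, tokens = [], [], list(tokens)
    while True:
        for body, head in RULES:                       # try to reduce first
            if len(stack) >= len(body) and stack[-len(body):] == body:
                stack[-len(body):] = [head]            # handle is on top of stack
                trace.append(('reduce', f"{head} -> {' '.join(body)}"))
                break
        else:
            if tokens:                                 # no reduction: shift
                stack.append(tokens.pop(0))
                trace.append(('shift', stack[-1]))
            else:
                break                                  # no move left
    return stack == ['E'], trace                       # accept iff only E remains
```

Running parse(['id', '+', 'id']) shifts and reduces until the stack holds a single E, which corresponds to the Accept action above; leftover symbols correspond to Error.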


Predictive Parser - A predictive parser is a recursive-descent parser that can predict which production is to be used to replace the input string, so it does not suffer from backtracking. To accomplish this, the predictive parser uses a lookahead pointer, which points to the next input symbols. To make the parser backtracking-free, the predictive parser puts constraints on the grammar and accepts only the class of grammars known as LL(k) grammars.
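A minimal recursive-descent sketch of such a predictive parser, for the illustrative LL(1) grammar E -> T E2, E2 -> + T E2 | ε, T -> id (the grammar and the names are assumptions chosen for illustration):

```python
def parse(tokens):
    toks = list(tokens) + ['$']   # '$' marks end of input
    pos = 0

    def eat(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected}, got {toks[pos]}")
        pos += 1

    def E():                      # E -> T E2
        T(); E2()

    def E2():
        if toks[pos] == '+':      # one-token lookahead picks the production
            eat('+'); T(); E2()   # E2 -> + T E2
        # otherwise E2 -> epsilon; no backtracking is ever needed

    def T():                      # T -> id
        eat('id')

    E()
    eat('$')                      # all input must be consumed
    return True
```

Each nonterminal becomes one function, and the single-token lookahead (toks[pos]) is the "lookahead pointer" mentioned above.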

UNIT III - SYLLABUS: Variants of Syntax Trees, Three-Address Code, Types and Declarations, Translation of Expressions, Type Checking, Control Flow, Backpatching

Intermediate Code - Intermediate code is the interface between the front end and the back end of a compiler. Ideally, the details of the source language are confined to the front end and the details of the target machine to the back end, so that m source languages and n target machines need only m front ends and n back ends rather than m*n full compilers. In this unit we study intermediate representations, static type checking and intermediate code generation. Pipeline: Parser -> Static Checker -> Intermediate Code Generator -> Code Generator, where the first three form the front end and the code generator the back end.

Variants of Syntax Trees - It is sometimes beneficial to create a DAG instead of a tree for expressions. This way we can easily show the common subexpressions and then use that knowledge during code generation. Example: in a+a*(b-c)+(b-c)*d, the leaf a and the subexpression (b-c) are each represented by a single shared node.

Value-Number Method for Constructing DAGs - Nodes are stored in an array, and each node is referred to by its value number (its index). Example: for i = i + 10, the node (+, id i, num 10) is built once. Algorithm: search the array for a node M with label op, left child l and right child r; if there is such a node, return the value number of M; if not, create in the array a new node N with label op, left child l and right child r, and return its value number. A hash table may be used to make the search fast.
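The algorithm above can be sketched as follows, with a Python list standing in for the node array and a dict for the hash table (the representation is an assumption for illustration):

```python
nodes = []    # node array: ('leaf', name) or (op, left_vn, right_vn)
index = {}    # hash table: node signature -> value number

def value_number(sig):
    if sig not in index:          # no node M found: create a new node N
        nodes.append(sig)
        index[sig] = len(nodes) - 1
    return index[sig]             # value number of the (shared) node

def leaf(name):
    return value_number(('leaf', name))

def node(op, left, right):
    return value_number((op, left, right))

# i = i + 10: building the same expression twice yields the same node
n1 = node('+', leaf('i'), leaf('10'))
n2 = node('+', leaf('i'), leaf('10'))   # found in the array and reused
```

Because identical signatures map to one value number, common subexpressions collapse into shared DAG nodes automatically.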

Three-Address Code - In three-address code there is at most one operator on the right side of an instruction. Example, for a + a*(b-c) + (b-c)*d: t1 = b - c; t2 = a * t1; t3 = a + t2; t4 = t1 * d; t5 = t3 + t4. www.geeksforgeeks.org/intermediate-code-generation-in-compiler-design

Forms of Three-Address Instructions - x = y op z; x = op y; x = y; goto L; if x goto L and ifFalse x goto L; if x relop y goto L; procedure calls using param x, call p,n and y = call p,n; x = y[i] and x[i] = y; x = &y, x = *y and *x = y.

Data Structures for Three-Address Code - Quadruples: each instruction has four fields: op, arg1, arg2 and result. Triples: the result field is dropped; temporaries are not used, and operands instead refer to the positions of earlier instructions. Indirect triples: in addition to the triples, we keep a list of pointers to triples, so instructions can be reordered without renumbering references.
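For the three-address code t1 = b - c, ..., t5 = t3 + t4 given earlier, the quadruple and triple forms look like this. The field layouts follow the descriptions above; the tiny interpreter is only there to make the example checkable and is not part of the representation.

```python
# Quadruples: (op, arg1, arg2, result)
quads = [
    ('-', 'b', 'c', 't1'),
    ('*', 'a', 't1', 't2'),
    ('+', 'a', 't2', 't3'),
    ('*', 't1', 'd', 't4'),
    ('+', 't3', 't4', 't5'),
]

# Triples: the result field is dropped; (n,) denotes a reference
# to the value produced by instruction n.
triples = [
    ('-', 'b', 'c'),      # (0)
    ('*', 'a', (0,)),     # (1)
    ('+', 'a', (1,)),     # (2)
    ('*', (0,), 'd'),     # (3)
    ('+', (2,), (3,)),    # (4)
]

def run(quads, env):
    """Evaluate a quadruple list over an initial variable environment."""
    ops = {'+': lambda x, y: x + y,
           '-': lambda x, y: x - y,
           '*': lambda x, y: x * y}
    for op, a1, a2, res in quads:
        env[res] = ops[op](env[a1], env[a2])
    return env
```

With a = 2, b = 5, c = 3, d = 4, running the quadruples gives t5 = 2 + 2*(5-3) + (5-3)*4 = 14, matching the source expression a + a*(b-c) + (b-c)*d.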

Type Expressions - Example: int[2][3] is the type expression array(2, array(3, integer)). A basic type is a type expression; a type name is a type expression; a type expression can be formed by applying the array type constructor to a number and a type expression; a record is a data structure with named fields; a type expression can be formed by using the function type constructor -> for function types; and type expressions may contain variables whose values are type expressions.

Backpatching - Previous schemes for Boolean expressions insert symbolic labels for jumps, so a separate pass is needed to set them to appropriate addresses. We can use a technique named backpatching to avoid this. We assume instructions are saved into an array, and labels are indices into that array. For nonterminal B we use two attributes, B.truelist and B.falselist, together with the functions makelist(i), merge(p1, p2) and backpatch(p, i).
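A minimal sketch of the three helper functions, assuming instructions are stored in an array and unresolved jump targets are written as '_' (both are assumptions made for illustration):

```python
code = []    # the instruction array; labels are indices into it

def emit(instr):
    """Append an instruction and return its index."""
    code.append(instr)
    return len(code) - 1

def makelist(i):
    """A new list containing only instruction index i."""
    return [i]

def merge(p1, p2):
    """Concatenate two lists of unresolved jumps."""
    return p1 + p2

def backpatch(p, i):
    """Fill target label i into every jump listed in p."""
    for j in p:
        code[j] = code[j].replace('_', str(i))

# Emit a jump pair with holes, then patch the targets later.
truelist = makelist(emit('if x goto _'))
falselist = makelist(emit('goto _'))
backpatch(truelist, 100)
backpatch(falselist, 104)
```

In a real translator, B.truelist and B.falselist would hold such lists while the expression is parsed, and backpatch would be called once the jump targets become known.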

Type Equivalence - Two type expressions are equivalent if: they are the same basic type; or they are formed by applying the same constructor to structurally equivalent types; or one is a type name that denotes the other.
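Structural equivalence (the second rule) can be checked recursively. The tuple encoding of type expressions below is an assumption for illustration, matching the array(2, array(3, integer)) example given earlier:

```python
def same_type(t1, t2):
    # basic types (and array bounds) are compared directly by name/value
    if not (isinstance(t1, tuple) and isinstance(t2, tuple)):
        return t1 == t2
    # constructed types: same constructor applied to equivalent parts
    return len(t1) == len(t2) and all(same_type(a, b) for a, b in zip(t1, t2))

# int[2][3] as a type expression: array(2, array(3, integer))
int_2_3 = ('array', 2, ('array', 3, 'integer'))
```

Name equivalence (the third rule) would additionally require a table mapping type names to the expressions they denote; that lookup is omitted here.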

Three-Address Code for Expressions

Addressing Array Elements - Layouts for a two-dimensional array.

Control Flow - Boolean expressions are often used to alter the flow of control or to compute logical values. Short-circuit code translates a Boolean expression into jumps, without necessarily evaluating the whole expression.

UNIT IV - SYLLABUS: Optimization Rules, Basic Blocks, Control Flow Graph (CFG), Loops, Local Optimizations, Peephole Optimization

Levels of Optimization - Local: inside a basic block. Global (intraprocedural): across basic blocks, requiring whole-procedure analysis. Interprocedural: across procedures, requiring whole-program analysis.

Basic Blocks - A basic block is a maximal sequence of consecutive three-address instructions with the following properties: the flow of control can only enter the basic block through the first instruction, and control will leave the block without halting or branching, except possibly at the last instruction. Basic blocks become the nodes of a flow graph, with edges indicating the order. https://www.youtube.com/watch?v=bc3yshc5rh0

Example - The intermediate code:
1) i = 1
2) j = 1
3) t1 = 10 * i
4) t2 = t1 + j
5) t3 = 8 * t2
6) t4 = t3 - 88
7) a[t4] = 0.0
8) j = j + 1
9) if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)
corresponds to the source:
for i from 1 to 10 do
  for j from 1 to 10 do
    a[i,j] = 0.0
for i from 1 to 10 do
  a[i,i] = 1.0

Identifying Basic Blocks - Input: a sequence of instructions instr(i). Output: a list of basic blocks. Method: identify the leaders, the first instructions of basic blocks; then iterate, adding subsequent instructions to the current basic block until another leader is reached.

Identifying Leaders - Rules for finding leaders in code: the first instruction in the code is a leader; any instruction that is the target of a (conditional or unconditional) jump is a leader; any instruction that immediately follows a (conditional or unconditional) jump is a leader.

Basic Block Partition Algorithm
leaders = {1}                       // start of program
for i = 1 to n                      // all instructions
  if instr(i) is a branch
    leaders = leaders U targets of instr(i) U {i+1}
worklist = leaders
while worklist not empty
  x = first instruction in worklist
  worklist = worklist - {x}
  block(x) = {x}
  for (i = x + 1; i <= n && i not in leaders; i++)
    block(x) = block(x) U {i}
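The leader rules and the partition loop can be sketched as follows, assuming each instruction is a (kind, target) pair with kind 'plain' or 'branch', and 0-based indices (both representation choices are assumptions):

```python
def find_leaders(instrs):
    leaders = {0}                           # rule 1: first instruction
    for i, (kind, target) in enumerate(instrs):
        if kind == 'branch':
            leaders.add(target)             # rule 2: branch target
            if i + 1 < len(instrs):
                leaders.add(i + 1)          # rule 3: instr after a branch
    return sorted(leaders)

def partition(instrs):
    """Split the instruction sequence into basic blocks at each leader."""
    leaders = find_leaders(instrs)
    bounds = leaders + [len(instrs)]        # each block runs leader..next leader
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]
```

Applied to the 17-instruction example above (branches at instructions 9, 11 and 17, i.e. 0-based indices 8, 10 and 16), this yields the six blocks with leaders 1, 2, 3, 10, 12 and 13 in the slide's 1-based numbering.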

Control-Flow Edges - Basic blocks are the nodes. Edges: add a directed edge from B1 to B2 if there is a branch from the last statement of B1 to the first statement of B2 (B2 is a leader), or if B2 immediately follows B1 in program order and B1 does not end with an unconditional branch (goto). Definition of predecessor and successor: B1 is a predecessor of B2, and B2 is a successor of B1.

Control-Flow Edge Algorithm
Input: block(i), a sequence of basic blocks
Output: a CFG whose nodes are basic blocks
for i = 1 to the number of blocks
  x = last instruction of block(i)
  if instr(x) is a branch
    for each target y of instr(x), create edge (i -> y)
  if instr(x) is not an unconditional branch, create edge (i -> i+1)
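A sketch of this edge-building loop, assuming blocks are lists of 0-based instruction indices and each instruction is a (kind, target) pair with kind 'plain', 'cbranch' (conditional) or 'branch' (unconditional goto); the representation is an assumption:

```python
def cfg_edges(blocks, instrs):
    # map each leader (first instruction of a block) to its block index,
    # so branch targets can be resolved to blocks
    first = {b[0]: k for k, b in enumerate(blocks)}
    edges = set()
    for k, b in enumerate(blocks):
        kind, target = instrs[b[-1]]            # last instruction of block k
        if kind in ('branch', 'cbranch'):
            edges.add((k, first[target]))       # edge to the branch target
        if kind != 'branch' and k + 1 < len(blocks):
            edges.add((k, k + 1))               # fall-through edge
    return sorted(edges)
```

Note that an unconditional 'branch' suppresses the fall-through edge, exactly as the rule above requires, while a conditional branch produces both edges.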

Loops - Loops come from while, do-while, for and goto statements. Loop definition: a set of nodes L in a CFG is a loop if 1. there is a node called the loop entry, and no other node in L has a predecessor outside L; and 2. every node in L has a nonempty path (within L) to the entry of L. Loop examples: {B3}, {B6}, {B2, B3, B4}.

Peephole Optimization - Simple compilers do not perform machine-independent code improvement; they generate naive code. It is possible to examine a small window of the target code and optimize it: sub-optimal sequences of instructions that match an optimization pattern are transformed into optimal sequences. This technique is known as peephole optimization, and it usually works by sliding a window of several instructions (a peephole) over the code.

Peephole Optimization - Goals: improve performance, reduce memory footprint, reduce code size. Method: 1. examine short sequences of target instructions; 2. replace each sequence by a more efficient one. Common transformations: redundant-instruction elimination, algebraic simplifications, flow-of-control optimizations, and use of machine idioms.
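As one concrete instance of redundant-instruction elimination, the sketch below slides a two-instruction window and drops a load that immediately re-reads a just-stored value. The (op, src, dst) tuple form is an invented notation, and in practice the rewrite is only safe when the second instruction is not a branch target.

```python
def peephole(instrs):
    """Drop a MOV that immediately reloads the value just stored."""
    out, i = [], 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (nxt is not None
                and cur[0] == nxt[0] == 'MOV'
                and nxt[1] == cur[2]      # reload reads what was stored...
                and nxt[2] == cur[1]):    # ...back into the same register
            out.append(cur)               # keep the store, drop the reload
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```

For example, the pair MOV R0,a / MOV a,R0 collapses to the single store MOV R0,a, since R0 already holds the value of a after the first instruction.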

Peephole Optimization: Common Techniques

UNIT V - SYLLABUS: Code Generation, Code Generation Algorithm, Function getreg, DAG, Types of Error, Phrase-Level Recovery

Code Generation and Instruction Selection - Pipeline: input -> Front End -> Intermediate Code Generator -> Code Generator -> output, with every phase consulting the symbol table. Requirements: the output code must be correct, the output code must be of high quality, and the code generator should run efficiently.

Issues in the Design of a Code Generator - Input: an intermediate representation with a symbol table; we assume the input has been validated by the front end. Target programs: absolute machine language (fast for small programs); relocatable machine code (requires a linker and loader); assembly code (requires an assembler, linker, and loader). www.geeksforgeeks.org/intermediate-code-generation-in-compiler-design

Instruction Selection - Issues in instruction selection: uniformity, completeness, and instruction speed. Register allocation: instructions with register operands are faster; store long-lifetime values and counters in registers; use registers for temporary locations; some machines require even-odd register pairs for certain instructions. Evaluation order also affects the quality of the generated code.

Target Machine - Byte-addressable with 4 bytes per word. It has n registers R0, R1, ..., Rn-1. Two-address instructions of the form: opcode source, destination, with the usual opcodes like MOV, ADD, SUB, etc. Addressing modes:
MODE | FORM | ADDRESS
absolute | M | M
register | R | R
indexed | c(R) | c + contents(R)
indirect register | *R | contents(R)
indirect indexed | *c(R) | contents(c + contents(R))
literal | #c | c

Code Generator - Consider each statement in turn, remembering whether an operand is already in a register. Register descriptor: keeps track of what is currently in each register; initially all registers are empty. Address descriptor: keeps track of the locations where the current value of a name can be found at run time; the location might be a register, a stack slot, a memory address, or a set of those.

Code Generation Algorithm - For each statement x = y op z: 1. Invoke the function getreg to determine the location L where x must be stored; usually L is a register. 2. Consult the address descriptor of y to determine y', preferring a register for y'; if the value of y is not already in L, generate MOV y', L. 3. Generate op z', L, again preferring a register for z'. 4. Update the address descriptor of x to indicate x is in L; if L is a register, update its descriptor to indicate that it contains x, and remove x from all other register descriptors. 5. If the current values of y and/or z have no next use, are dead on exit from the block, and are in registers, change the register descriptors to indicate that those registers no longer contain y and/or z.

Function getreg - 1. If y is in a register that holds no other values, and y is not live and has no next use after x = y op z, then return the register of y as L. 2. Failing (1), return an empty register if one exists. 3. Failing (2), if x has a next use in the block, or op requires a register, then take an occupied register R, store its contents into memory M (by MOV R, M) and use R. 4. Otherwise, select the memory location of x as L.

Example:
Stmt | Code | Register descriptor | Address descriptor
t1 = a - b | MOV a, R0; SUB b, R0 | R0 contains t1 | t1 in R0
t2 = a - c | MOV a, R1; SUB c, R1 | R0 contains t1, R1 contains t2 | t1 in R0, t2 in R1
t3 = t1 + t2 | ADD R1, R0 | R0 contains t3, R1 contains t2 | t3 in R0, t2 in R1
d = t3 + t2 | ADD R1, R0; MOV R0, d | R0 contains d | d in R0 and memory

DAG Representation of Basic Blocks - DAGs are useful data structures for implementing transformations on basic blocks: a DAG gives a picture of how the value computed by a statement is used in subsequent statements, and is a good way of determining common subexpressions. A DAG for a basic block has the following labels on its nodes: leaves are labeled by unique identifiers, either variable names or constants; interior nodes are labeled by an operator symbol; and nodes are optionally given a sequence of identifiers as additional labels.

DAG Representation: Example
1. t1 := 4 * i
2. t2 := a[t1]
3. t3 := 4 * i
4. t4 := b[t3]
5. t5 := t2 * t4
6. t6 := prod + t5
7. prod := t6
8. t7 := i + 1
9. i := t7
10. if i <= 20 goto (1)
In the DAG for this block, the two computations of 4 * i share a single node (labeled t1, t3), and the names prod and i are attached to the nodes for t6 and t7 respectively.

Code Generation from DAG - Before simplification:
S1 = 4 * i
S2 = addr(a) - 4
S3 = S2[S1]
S4 = 4 * i
S5 = addr(b) - 4
S6 = S5[S4]
S7 = S3 * S6
S8 = prod + S7
prod = S8
S9 = i + 1
i = S9
if i <= 20 goto (1)
After eliminating the common subexpression 4 * i and the copy statements:
S1 = 4 * i
S2 = addr(a) - 4
S3 = S2[S1]
S5 = addr(b) - 4
S6 = S5[S1]
S7 = S3 * S6
prod = prod + S7
i = i + 1
if i <= 20 goto (1)

Types of Error - There are mainly four types of error: Lexical errors, such as misspelling an identifier, keyword or operator. Syntactic errors, such as an arithmetic expression with unbalanced parentheses. Semantic errors, such as an operator applied to an incompatible operand. Logical errors, such as an infinitely recursive call.

Phrase-Level Recovery - On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. For example, in the case of a missing semicolon, it will report the error, insert the ';' and continue. Global Correction - We would like the compiler to make as few changes as possible in processing an incorrect input string. There are algorithms for choosing a minimal set of changes to obtain a globally least-cost correction.