Lecture Compiler Construction

Similar documents
PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Context-free grammars

Compiler Construction: Parsing

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Syntax-Directed Translation

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CS308 Compiler Principles Lexical Analyzer Li Jiang

Syntax Analysis Part I

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Compilers. 5. Attributed Grammars. Laszlo Böszörmenyi Compilers Attributed Grammars - 1

LR Parsing Techniques

Concepts Introduced in Chapter 4

PART 4 - SYNTAX DIRECTED TRANSLATION. F. Wotawa TU Graz) Compiler Construction Summer term / 309

UNIT-III BOTTOM-UP PARSING

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Compiler Construction 2016/2017 Syntax Analysis

[Lexical Analysis] Bikash Balami

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Syntax-Directed Translation Part I

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Module 6 Lexical Phase - RE to DFA

VIVA QUESTIONS WITH ANSWERS

Formal Languages and Compilers Lecture VI: Lexical Analysis

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

[Syntax Directed Translation] Bikash Balami

Lexical Analyzer Scanner

Lexical Analyzer Scanner

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Principles of Programming Languages

Syntax-Directed Translation. Concepts Introduced in Chapter 5. Syntax-Directed Definitions

S Y N T A X A N A L Y S I S LR

CS606- compiler instruction Solved MCQS From Midterm Papers

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Wednesday, August 31, Parsers

Zhizheng Zhang. Southeast University

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Question Bank. 10CS63:Compiler Design

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Bottom-Up Parsing. Lecture 11-12

Chapter 4. Lexical and Syntax Analysis

Principles of Programming Languages

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

3. Parsing. Oscar Nierstrasz

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

VETRI VINAYAHA COLLEGE OF ENGINEERING AND TECHNOLOGY

Principles of Compiler Design Presented by, R.Venkadeshan,M.Tech-IT, Lecturer /CSE Dept, Chettinad College of Engineering &Technology

Monday, September 13, Parsers

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

LR Parsing LALR Parser Generators


4. Lexical and Syntax Analysis

Introduction to Syntax Analysis

Syntax-Directed Translation

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

Compilers. Bottom-up Parsing. (original slides by Sam

UNIT III & IV. Bottom up parsing

Lexical Analysis (ASU Ch 3, Fig 3.1)

Part 5 Program Analysis Principles and Techniques

Chapter 3 Lexical Analysis

Introduction to Syntax Analysis. The Second Phase of Front-End

A programming language requires two major definitions A simple one pass compiler

Lecture 7: Deterministic Bottom-Up Parsing

Lecture 8: Deterministic Bottom-Up Parsing

MODULE 14 SLR PARSER LR(0) ITEMS

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

LR Parsing Techniques

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

UNIT II LEXICAL ANALYSIS

Compiler course. Chapter 3 Lexical Analysis

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

General Overview of Compiler

4. Lexical and Syntax Analysis

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Syntax-Directed Translation

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)

Lexical and Syntax Analysis. Bottom-Up Parsing

Bottom-Up Parsing. Lecture 11-12

Unit 13. Compiler Design

2068 (I) Attempt all questions.

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

LR Parsing LALR Parser Generators

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

PART 4 - SYNTAX DIRECTED TRANSLATION. F. Wotawa TU Graz) Compiler Construction Summer term / 264

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMPILER DESIGN. For COMPUTER SCIENCE

COMPILER DESIGN - QUICK GUIDE COMPILER DESIGN - OVERVIEW

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Transcription:

Lecture Compiler Construction Franz Wotawa wotawa@ist.tugraz.at Institute for Software Technology Technische Universität Graz Inffeldgasse 16b/2, A-8010 Graz, Austria Summer term 2017 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 1 / 309

Compilers are everywhere Programming languages like Java, C#, C, C++, Pascal, Modula, SML, Lisp, VHDL, Basic,.. Graphical languages also need compilers (HTML, L A T E X,...) Means for communication between computers like XML Natural languages Compilers translate a sentence written in one language to another language. F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 2 / 309

Example HTML <!DOCTYPE HTML PUBLIC "...- > <html> <head> <meta content=text/html;..."http-equiv=content-type- > <title>hello World</title> </head> <body> <h1>hello World!</h1> </body> </html> F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 3 / 309

Another example (Source: en.wikipedia.org/wiki/java bytecode) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 4 / 309

Compiler vs. Interpreter Compiler: Translates from one into another language where programs are directly executed (C,C++,Pascal,...). Interpreter: Executes a program directly without converting it (BASIC, batch languages,..). Exact boundary difficult to define nowadays (Byte code interpreter vs. CPU, which executes machine code statements. Front end (analysis phase) of compilers is always needed! F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 5 / 309

Organizational issues Lecture (Vorlesung) 2h Practical exercises (Übung) 1h Examples from the Compiler Construction theory Hands on part: Development of a compiler More information soon (via email and/or the webpage) Office hours: Franz Wotawa: Tuesday, 13:00 14:00 Roxane Koitz: Monday, 13:00 14:00 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 6 / 309

Lecture dates Monday, 6.3.2017, 16:00-18:00 (preliminary discussion) Tuesday, 7.3.2017, 16:00-17:30 (lexical analysis) Monday, 20.3.2017, 16:00-19:00 (parsing) Tuesday, 21.3.2017, 16:00-17:30 (attributed grammars) Monday, 27.3.2017, 16:00-19:00 (type checking) Tuesday, 28.3.2017, 16:00-17:00 (runtime environment) Monday, 3.4.2017, 16:00-19:00 (code generation) Tuesday, 4.4.2017, 16:00-17:30 (code optimization, Q&A) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 7 / 309

Exam Written form only; no papers, books etc. allowed! Content: DFA, NFA, LL(1), SLR(1), Attributed Grammars, Type Checking, Code Generation, Code Optimization, etc.) 0. date: Monday, 15.5.2017, 18:00-19:00, i13, (1 groups, 1 hour) 1. date: Monday, 19.6.2017, 18:00-20:00, i13, (2 groups, each 1 hour) 2. date: in October 2017, will be announced soon F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 8 / 309

Literature, etc. Aho, Seti, Ullman, Compilers Principles, Techniques, and Tools, Addison-Wesley, 1985, ISBN 0-201-10088-6. Tremblay, Sorenson, The Theory and Practice of Compiler Writing, McGraw Hill, 1985, ISBN 0-07-065161-2. Copy of slides but no lecture notes Compiler construction webpage: http://www.ist.tugraz.at/cb16.html F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 9 / 309

PART 1 - INTRODUCTION F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 10 / 309

What is a compiler? Program Source language Target language Error reports if source code contains errors Source program Compiler Target program Error messages F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 11 / 309

Implications of definition There are (very) many compilers Thousands of source languages (e.g. Pascal, Modula, C, C++, Java,... ) Thousands of target languages (other high-level programming languages, machine code) Classification: single-pass, multi-pass, debugging, or optimizing But: basics of creating a compiler are mostly consistent F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 12 / 309

Concept of compilation Analysis phase Split source program into tokens Generate intermediate code (intermediate representation of source program) Synthesis phase Generate target program based on intermediate code Enables reuse of program parts for different source and target languages F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 13 / 309

Concept of compilation Analysis Operations used within program are stored in hierarchical structure Syntax Tree Other tools which perform an analysis: Structure editors Pretty printers Static checkers Interpreters Web browser F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 14 / 309

Language-processing systems skeletal source program preprocessor source program compiler target assembly language assembler relocatable machine code relocatable machine code loader/link-editor absolute machine code library, relocatable object files F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 15 / 309

Components of a compiler source code lexical analyser syntax analyser semantic analyser symbol-table manager intermediate code generator error handler code optimizer code generator target program F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 16 / 309

Lexical analysis, scanning Syntax tree Example: pos := init + rate * 60 Tokens: 1 Identifier pos 2 Assignment symbol := 3 Identifier init 4 Plusoperator 5 Identifier rate 6 Multiplication operator 7 Constant (number) 60 := pos + init * rate 60 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 17 / 309

Syntax analysis, parsing Grammatical analysis Token rules of grammar Rules of grammar (example) 1 Each identifier is an expression 2 Each number is an expression 3 Assuming that ex 1 and ex 2 are expressions, then ex 1 + ex 2 and ex 1 ex 2 are expressions as well 4 A term: identifier := Expression is a statement Parse tree F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 18 / 309

Example - Parse tree assignment statement identifier := expression pos expression + expression identifier expression * expression init identifier number rate 60 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 19 / 309

Semantic analysis Check for semantic errors Type checking Identification of operators and operands Necessary for code generation Example: conversion from int to real statement pos := init + rate * 60 becomes pos := init + rate * int2real(60) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 20 / 309

Intermediate code generation E.g.: three-address code Example: 1. tmp1 := int2real(60) 2. tmp2 := id3 * tmp1 3. tmp3 := id2 + tmp2 4. id1 := tmp3 using the symbol table pos (id1) real... init (id2) real... rate (id3) real... F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 21 / 309

Code optimizer Goal: faster machine code Example: 1. tmp1 := int2real(60) 2. tmp2 := id3 * tmp1 3. tmp3 := id2 + tmp2 4. id1 := tmp3 1. tmp1 := id3 * 60.0 2. id1 := id2 + tmp1 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 22 / 309

Code generator Generation of actual target code (Machine code, Assembler) Example: 1. MOVF id3, R2 2. MULF #60.0, R2 3. MOVF id2, R1 4. ADDF R2, R1 5. MOVF R1, id1 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 23 / 309

PART 2 - LEXICAL ANALYSIS F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 24 / 309

Goals of this section Specification and implementation of lexical analysers Easiest method: 1 Create diagram which describes structure of tokens 2 Manually translate diagram into a program Regular expressions in finite state machines F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 25 / 309

Tasks source program lexical analyser token get next token parser symbol table Filter comments, white spaces,.. Establish relations between error messages of compiler and source code (e.g. via line number) Preprocessor functions (e.g. in C) Create copy of source program containing error messages F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 26 / 309

Why separate lexical analysis and parsing? 1 Simpler design: e.g. removal of comments and white spaces enables use of simpler parser design 2 Improve efficiency: large portion of time is invested into reading of programs and conversion into tokens 3 Enhance portability of compilers 4 ( There are tools for both phases ) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 27 / 309

Tokens, patterns, lexemes Pattern: Rule describing a set of strings (a token) Token: Set of strings described by a pattern Lexem: Sequence of characters which are matched by a pattern of a token Token Lexem Patterns (verbal) if if if relation >, <,... < or > or... id pi, test cases letter followed by letters, digits, or num 2.7, 0, 12e-4 any numeric constant literal segmentation fault any characters between and F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 28 / 309

Tokens Terminals in the grammar (checked by the parser) In programming languages: keywords, operators, identifiers, constants, strings, brackets Language conventions: 1 Tokens have to occur in specific places within the code 2 Tokens are separated by white spaces Example (Fortran): DO 5 I = 1.25 is interpreted as DO5I = 1.25 3 Keywords are reserved and must not be used as identifiers 1. is (currently) not used 2. is accepted 3. is reasonable (and simplyfies compiler construction) IF THEN THEN THEN = ELSE; ELSE ELSE = THEN; F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 29 / 309

Token attributes Store lexeme for further processing Store pointers to entries in symbol table Line number or position of token within source code U = 2 * R * PI < id, pointer to U > < assign op > < num, integer value 2 > < mult op > < id, pointer to R > < mult op > < id, pointer to PI > F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 30 / 309

Lexical errors fi ( a == 1)... fi is interpreted as undefined identifier String sequences which cannot be matched to a token e.g.: 2.3e*4 instead of 2.3e+4. Error-correcting: 1 Panic Mode: Removal of faulty characters until a token can be matched 2 Remove a faulty character 3 Include a new character 4 Replace a character 5 Change order of neighboring characters Some errors (like the first example) may be only recognizable by the parser F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 31 / 309

Specification of tokens Strings over an alphabet: Alphabet... finite set of symbols example: {0, 1} Binary alphabet String (word, sentence)... finite sequence of symbols of the alphabet ɛ... empty string Language: Set of all strings over an alphabet Operators: Concatenation xy is a string whereby y is attached to the end of x x i is defined as: x 0 = ɛ, i > 0 x i = x i 1 x Prefix: computer comp Suffix: computer uter Substring: computer put Subsequence: computer opt F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 32 / 309

Operations on languages Union L M = {x x L x M} Concatenation LM = {xy x L y M} Kleen Closure L = i=0 Li Positive Closure L + = i=1 Li Example: L = {A,..., Z}, M = {0, 1,..., 9} L M Set of letters and numbers LM Strings containing a letter followed by a number L 4 Strings which are made up of four letters L Strings made up of letters including ɛ M + Number strings containing at least one number F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 33 / 309

Regular expressions Example: Pascal identifier letter ( letter digit ) Regular expression r defines a language over an alphabet Σ using the following rules: 1 The regular expression ɛ describes {ɛ} 2 a Σ: The regular Expression a describes {a} 3 r, s are regular expressions (languages L(r), L(s)): 1 (r) (s) describes L(r) L(s) 2 (r)(s) describes L(r)L(s) 3 (r) describes (L(r)) Regular sets 4 (r)+ describes (L(r)) + F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 34 / 309

Examples Assuming Σ = {a, b} 1 a b describes the language {a, b} 2 (a b)(a b) describes {aa, ab, ba, bb} 3 a describes {ɛ, a, aa, aaa, aaaa,...} 4 (a b) describes all strings which contain no or multiple a s and b s Not all languages can be described by regular expressions e.g. {wcw w is a string made up of a s and b s} Regular expressions may only describe words containing either a fixed amount of repetitions or an infinite amount of repetitions F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 35 / 309

Regular definitions A regular definition is a sequence of the form: d 1 r 1... d n r n whereby d i is a distinct name and r i is a regular expression over Σ {d 1,..., d n } Example: letter A... Z a... z digit 0 1... 9 id letter(letter digit) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 36 / 309

Token detection Goal: Identify lexeme of token in input buffer Output: Token-attribute-pair Regular Expression Token Attribute Value white space - - if if - then then - id id pointer to table entry num num pointer to table entry < relop LT > relop GT......... F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 37 / 309

Transition diagrams Actions performed by lexical analyser (after get next token of the parser) States connected by directed edges Edges are labeled: Labels contain symbols expected in input buffer in order to to progress to next state Label other denotes all symbols not used by any other edge originating from the same node One start state End states after matching a token Deterministic (not necessarily) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 38 / 309

Example Regular expression letter (letter digit) can be described by following transition diagram: letter or digit start letter other 1 2 3 return(gettoken(),install_id()) Assume the input buffer contains: Pos 1 2 3 4... P I =... and the current-symbol pointer is pointing to position 1 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 39 / 309

Tokenmatching - Example 1 Current symbol = P, position = 1 Transition from state 1 to state 2 and read new symbol 2 Current symbol = I, position = 2 Remain in state 2 and read new symbol 3 Current symbol = SPACE, position = 3 Transition from state 2 to state 3 4 State 3 is an end state I 2 letter or digit start 1 letter 2 other 3 P 1 3 return(gettoken(),install_id()) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 40 / 309

Example Implementing a Lexer directly Pseudo code 1 lexem = ; 2 if (next char letter) then 3 lexem = lexem + next char; 4 get next char(); 5 while (next char letter digit) do 6 lexem = lexem + next char; 7 get next char(); 8 return IDENTIFIER(lexem); F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 41 / 309

Generalization Finite automata Nondeterministic finite automaton (NFA) (S, Σ, move, s 0, F ) 1 Set of states S 2 Set of input symbols Σ 3 Transition relation move : S (Σ {ɛ}) 2 S, relating state-symbol-pairs to states 4 Initial state s 0 S 5 Set of end states F S NFA can be visualized in form of directed graph (transition graph) Distinctions from transition diagram: Nondeterministic ɛ-moves allowed F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 42 / 309

NFA Example NFA for aa bb (language, accepting NFA): ({0, 1, 2, 3, 4}, {a, b}, move, 0, {2, 4}) with move: state symbol new state 0 ɛ {1,3} 1 a {2} 2 a {2} 3 b {4} 4 b {4} F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 43 / 309

Deterministic finite automaton Deterministic finite automaton (DFA) is a special case of a NFA whereby: 1 No state has an ɛ-move 2 For each state s there is at most one edge which is marked with an input symbol a DFAs contain exactly one transition for each input Each entry in transition table contains exactly one state State Input Symbols a b 0 1 2 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 44 / 309

DFA simulation Input: Input string x, DFA D Output: Yes if D accepts x, No otherwise s := s 0 c := nextchar while c eof do s := move(s, c) if s == then return No c := nextchar end if s F then return Yes else return No end F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 45 / 309

DFA Example DFA for aa bb : ({0, 1, 2}, {a, b}, move, 0, {1, 2}) whereby move: state symbol new state 0 a 1 0 b 2 1 a 1 2 b 2 a 1 a 0 b b 2 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 46 / 309

Conversion NFA DFA Subset construction algorithm Idea: 1 Each DFA state corresponds to a set of NFA states 2 After reading a 1 a 2... a n the DFA is in a state which ist equivalent to a subset of the NFA. This subset corresponds to the path of the NFA when reading a 1 a 2... a n. Number of DFA states is exponential in the nunber of NFA states (worst case) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 47 / 309

Subset construction Input: NFA N = (S, Σ, move, s 0, F ) Output: DFA D, accepting the same language as N Initially, ɛ-closure(s 0 ) is the only state in Dstates and it is unmarked while there is an unmarked state T in Dstates do mark T for each input symbol a do U := ɛ-closure(move(t,a)) if U Dstates then Add U as unmarked state to Dstates end Dtran[T, a] := U end end F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 48 / 309

ɛ-closure, move ɛ-closure(s): Set of NFA states which are accessible from a state s via ɛ-moves ɛ-closure(t ): Set of NFA states which are accessible from a state s T via ɛ-moves move(t, a): Set of NFA states which are accessible from a state s T via a move relation of the input symbol a F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 49 / 309

Calculation of ɛ-closure Input: NFA N, set of states T Output: ɛ-closure Push all states in T onto stack Initialize ɛ-closure(t ) to T while stack do Pop t, the top element of stack for each state u with an edge from t to u labeled ɛ do if u ɛ-closure(t ) then Add u to ɛ-closure(t ) Push u onto stack end end end F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 50 / 309

Example NFA DFA ε ε a 2 3 ε ε 0 1 ε a b b 6 7 8 9 10 ε ε b 4 5 ε F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 51 / 309

Thompson s Construction Input: Regular expression r over an alphabet Σ Output: NFA N, which accepts L(r) ɛ ε i f st i f N(s) ε N(t) a a Σ i f s i ε ε f N(s) ε s t i ε N(s) ε f [s] ε ε N(t) whereby [s] = (s ɛ) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 52 / 309

NFA simulation Input: NFA N, input string x Output: Yes if N accepts x, No otherwise S := ɛ-closure({s 0 }) a := nextchar while a eof do S := ɛ-closure(move(s,a)) a := nextchar end if S F then return Yes else return No end F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 53 / 309

Summary Check whether an input string x is contained in a language defined by the regular expression r: 1 Construct NFA (Thompson s construction) and apply simulation algorithm for NFAs 2 Construct NFA (Thompson s construction), convert NFA to DFA and apply simulation algorithm for DFAs Time-space tradeoffs Automaton Space Time NFA O( r ) O( r x ) DFA O(2 r ) O( x ) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 54 / 309

Regular expression DFA Direct conversion Notation: NFA state s is important s has at least one non-ɛ-move to a successor Extended regular expression (Augmented regular expression) (r)# Illustration of extended expressions via syntax trees (cat-node, or-node, star-node) Each leaf node is labeled either with a symbol (of the alphabet) or with ɛ Each leaf node ( ɛ) has a designated number (position) Remark: NFA-DFA-conversion only via important states = positions F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 55 / 309

Syntax tree Example # 6 * a 3 b 4 b 5 a b 1 2 Syntax tree for (a b) abb# F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 56 / 309

Algorithm - Idea Approach: 1 Create syntax tree for extended regular expression 2 Calculate four functions: nullable, f irstpos, lastpos, f ollowpos 3 Construct DFA using f ollowpos DFA states correspond to sets of positions (important NFA states) Position i, f ollowpos(i) are positions j for which there exists an input string... cd... where i corresponds to c and j corresponds to d F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 57 / 309

f ollowpos Example Syntax tree for (a b) abb# followpos(1) = {1, 2, 3} Explanation: Suppose we see an a. This symbol could belong either to (a b) or to the following a. If we see a b then this symbol has to belong to (a b). Thus positions 1,2,3 are contained in followpos(1). Informal definition of functions (node n, string s) firstpos... positions, which match the first symbol of s lastpos... positions, which match the last symbol of s nullable... True, if node n can create a language containing the empty string, false otherwise F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 58 / 309

Rules for nullable, f irstpos, lastpos Node n nullable f irstpos [lastpos] Leaf n labeled true [ ] ɛ Leaf n labeled false {i} [{i}] position i c 1 c 2 nullable(c 1 ) nullable(c 2 ) firstpos(c 1 ) firstpos(c 2 ) [lastpos(c 1 ) lastpos(c 2 )] c 1 c 2 nullable(c 1 ) nullable(c 2 ) if nullable(c 1 ) then firstpos(c 1 ) firstpos(c 2 ) else firstpos(c 1 ) if nullable(c 2) then lastpos(c 1 ) lastpos(c 2 ) else lastpos(c 2 ) c 1 true firstpos(c 1 ) [lastpos(c 1 )] F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 59 / 309

f irstpos, f ollowpos Example {1,2,3} {6} {1,2,3} {5} {6} # {6} 6 {1,2,3} {4} {5} b {5} 5 {1,2,3} {3} {4} b {4} 4 {1,2} * {1,2} {3} a {3} 3 {1,2} {1,2} {1} a {1} {2} b {2} 1 2 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 60 / 309

Calculation f ollowpos 1 If n is a cat-node ( ), c 1 is the left, c 2 is the right child and i is a position in lastpos(c 1 ), then all positions of firstpos(c 2 ) are contained in f ollowpos(i) 2 If n is a star-node ( ) and i is contained in lastpos(n), then all positions of f irstpos(c) are contained in f ollowpos(i) Node followpos 1 {1,2,3} 2 {1,2,3} 3 {4} 4 {5} 5 {6} 6-1 2 3 4 5 6 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 61 / 309

f ollowpos-graph NFA A NFA without ɛ-moves can be created based on a f ollowpos-graph: 1 Mark all positions in firstpos of the root node of the syntax tree as initial states 2 Label each directed edge (i, j) with the symbol of position i 3 Mark position which belongs to # as end state f ollowpos-graph can be converted to DFA (using subset construction) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 62 / 309

DFA-algorithm Input: Regular expression r Output: DFA D, accepting the language L(r) Initially, the only unmarked state in Dstates is f irstpos(root) while there is an unmarked state T Dstates do Mark T for each input symbol a do Let U be the set of positions that are in followpos(p) for some position p T where the symbol of p is a. if U and U Dstates then Add U as unmarked state to Dstates end DT ran(t, a) = U end end F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 63 / 309

PART 3 - SYNTAX ANALYSIS F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 64 / 309

Goals Definition of the syntax of a programming language using context free grammars Methods for parsing of programs determine whether a program is syntactically correct Advantages (of grammars): Precise, easily comprehensible language definition Automatic construction of parsers Declaration of the structure of a programming language (important for translation and error detection) Easy language extensions and modifications F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 65 / 309

Tasks source program lexical analyser token get next token parser parse tree rest of the front end intermediate representation symbol table Parser types: Universal parsers (inefficient) Top-down-parser Bottom-up-parser Only subclasses of grammars (LL, LR) Collect token informations Type checking Immediate code generation F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 66 / 309

Syntax error handling Error types: Lexical errors (spelling of a keyword) Syntactic errors (closing bracket is missing) Semantic errors (operand is incompatible to operator) Logic Errors (infinite loop) Tasks: Exact error description Error recovery consecutive errors should be detectable Error correction should not slow down the processing of correct programs F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 67 / 309

Problems during error handling Spurious Errors: Consecutive errors created by error recovery Example: Compiler issues error-recovery resulting in the removal of the declaration of pi Error during semantic analysis: pi undefined Error is detected late in the process error message does not point to the correct position within the code Too many error messages are issued F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 68 / 309

Error-recovery Panic mode: Skip symbols until input can by synchronized to a token Phrase-level recovery: Local error corrections, e.g. replacement of, by a ; Error productions: Extension of grammar to handle common errors Global correction: Minimal correction of program in order to find a matching derivation (cost intensive) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 69 / 309

Grammars Grammar A grammar is a 4-tupel G = (V N, V T, S, Φ) whereby: V N Set of nonterminal symbols V T Set of terminal symbols S V N Start symbol Φ : (V N V T ) V N (V N V T ) (V N V T ) Set of production rules (rewriting rules) (α, β) is represented as α β Example: ({S, A, Z}, {a, b, 1, 2}, S, {S AZ, A a, A b, Z ɛ, Z 1, Z 2, Z ZZ}) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 70 / 309

Derivations in grammars Direct derivation σ, ψ (V T V N ). σ can be directly derived from ψ (in one step; ψ σ), if there are two strings φ 1, φ 2, so that σ = φ 1 βφ 2 and ψ = φ 1 αφ 2 and α β Φ. Example: ψ σ Rule used φ 1 φ 2 S A Z S AZ ɛ ɛ az a1 Z 1 a ɛ AZZ A2Z Z 2 A Z F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 71 / 309

Derivation Production: A string ψ produces σ (ψ + σ), if there are strings φ 0,..., φ n (n > 0), so that ψ = φ 0 φ 1, φ 1 φ 2,..., φ n 1 φ n = σ. Example: S AZ AZZ A2Z a2z a21 Reflexive, transitive closure: ψ σ ψ + σ or ψ = σ Accepted language: A grammar G accepts the following language L(G) = {σ S σ, σ (V T ) } F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 72 / 309

Parse trees Example: E E + E E E id 2 derivations (and parse trees) for id+id*id E E E + E E * E id E * E E + E id id id id id Grammar is ambiguous F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 73 / 309

Classification of grammars Chomsky (restriction of production rules α β) Unrestricted Grammar: no restrictions Context-Sensitive Grammar: α β Context-Free Grammar: α β and α V N Regular Grammar: α β, α V N and β is in the form of: ab or a whereby a V T and B V N F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 74 / 309

Grammar examples Regular grammar: (a b) abb A 0 aa 0 ba 0 aa 1 A 1 ba 2 A 2 ba 3 A 3 ɛ Context-sensitive grammars: L 1 = {wcw w (a b) } But L 1 = {wcw R w (a b) } is context-free L 2 = {a n b m c n d m n 1, m 1} L 3 = {a n b n c n n 1} F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 75 / 309

Conversions Remove ambiguities stmnt if expr then stmnt if expr then stmnt else stmnt other 2 parse trees for if E 1 then if E 2 then S 1 else S 2. smtn smtn if expr then smtn E1 if expr then smtn else smtn if expr then smtn else smtn E1 S2 if expr then smtn E2 S1 S2 E2 S1 Prefer left tree Associate each else with the closest preceding then F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 76 / 309

Removing left recursions A grammar is left-recursive if there is a nonterminal A and a production A + Aα Top-Down-Parsing can t handle left recursions Example: convert A Aα β to: A βa 1 A 1 αa 1 ɛ F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 77 / 309

Algorithm to eliminate left recursions Input: Grammar G without cycles and ɛ-productions Output: Grammar without left recursions Arrange the nonterminals in some order A 1, A 2,..., A n for i := 1 to n do for j := 1 to i 1 do Replace each production A i A j γ by the productions A i δ 1 γ... δ k γ, where A j δ 1... δ k are all the current A j -production end Eliminate the immediate left recursion among the A i -productions end F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 78 / 309

Left factoring Important for predictive parsing Elimination of alternative productions stmnt if expr then stmnt else stmnt Example: if expr then stmnt Solution: For each nonterminal A find the longest prefix α for two or more alternative productions If α ɛ then replace all A-productions A αβ 1 αβ 2... αβ n γ (γ does not start with α) with: A αa 1 γ A 1 β 1 β 2... β n Apply transformation until no prefixes α ɛ can be found F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 79 / 309

Top-down-parsing Idea: Construct parse tree for a given input, starting at root node Recursive-descent parsing (with backtracking) Example: S cad A ab a Matching of cad c S A (1) Predictive parsing (without backtracking, special case of recursive-descent parsing) Left-recursive grammars can lead to infinite loops! d c a S A (2) b d c S A a (3) d F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 80 / 309

Predictive parsers Recursive-descent parser without backtracking Possible if production which needs to be used is obvious for each input symbol Transition diagrams 1 Remove left recursions 2 Left factoring 3 For each nonterminal A: 1 Create a initial state and an end state 2 For each production A X 1X 2... X n create a path leading from the initial state to the end state while labeling the edges X 1,..., X n F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 81 / 309

Predictive parsers (II) Processing: 1 Start at the initial state of the current start symbol 2 Suppose we are currently in the state s which has an edge whose label contains a terminal a and leads to the state t. If the next input symbol is a then go to state t and read a new input symbol. 3 Suppose the edge (from s) is marked by a nonterminal A. In that case go to the initial state of A (without reading a new input symbol). If we reach the end state of A then go to state t which is succeeding s. 4 If the edge is marked by ɛ then go directly to t without reading the input. Easily implemented by recursive procedures F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 82 / 309

Example - Predictive parser E() E1() E ide 1 (E) E 1 ope ɛ if nexttoken=id then getnexttoken E1() if nexttoken=( then getnexttoken E() if nexttoken=) then accept if nexttoken=op then getnexttoken E() else return E: E1: id 0 1 ( 2 op E 0 1 2 ε E1 E ) 3 4 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 83 / 309

Non-recursive predictive parser INPUT a + b $ STACK X Y Z $ Predictive Parsing Program OUTPUT Parsing Table M Input buffer: String to be parsed (terminated by a $) Stack: Initialized with the start symbol and contains nonterminals which are not derivated yet (terminated by a $) Parsing table M(A, a), A is a nonterminal, a a terminal or $ F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 84 / 309

Top-down parsing with stack Mode of operation: X is top element of stack, a the current input symbol 1 X is a terminal: If X = a = $, then the input was matched. If X = a $, pop X off the stack and read next input symbol. Otherwise an error occured. 2 X ist a nonterminal: Fetch entry of M(X, a). If this entry is an error skip to error recovery. Otherwise the entry is a production of the form X UV W. Replace X on the stack with W V U (afterward U is the top most element on the stack). F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 85 / 309

Example Grammar E id E 1 (E) E 1 op E ɛ Parsing table M(X, a) NONTERMINAL id op ( ) $ E E id E 1 E (E) E 1 E 1 op E E 1 ɛ E 1 ɛ Derivation of id op id. F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 86 / 309

Example (II) STACK INPUT OUTPUT $ E id op id $ $ E 1 id id op id $ E id E 1 $ E 1 op id $ $ E op op id $ E 1 op E $ E id $ $ E 1 id id $ E id E 1 $ E 1 $ $ $ E 1 ɛ F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 87 / 309

FIRST and FOLLOW Used when calculating parse table F IRST (α) Set of terminals, which can be derived from α (α string of grammar symbols) F OLLOW (A) Set of terminals which occur directly on the right side next to the nonterminal A in a derivation If A is the right most element of a derivation, then $ is contained in F OLLOW (A) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 88 / 309

Calculation of FIRST F IRST (X) for a grammar symbol X 1 X is a terminal: F IRST (X) = {X} 2 X ɛ is a production: Add ɛ to F IRST (X) 3 X is a nonterminal and X Y 1 Y 2... Y k is a production a is in F IRST (X) if: 1 An i exists; a is in F IRST (Y i) and ɛ is in every set F IRST (Y 1)... F IRST (Y i 1) 2 a = ɛ and ɛ is in every set F IRST (Y 1)... F IRST (Y k ) F IRST (X 1 X 2... X n ): Each non-ɛ symbol of F IRST (X 1 ) is in the result If ɛ F IRST (X 1 ), then each non-ɛ symbol of F IRST (X 2 ) is in the result and so on Is ɛ in every F IRST -set, then it it also is contained in the result F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 89 / 309

Calculation of FOLLOW In order to calculate F OLLOW (A) of a nonterminal A use following rules: 1 Add $ to F OLLOW (S), whereby S is the initial symbol 2 For each production A αbβ, add all elements of F IRST (β) except ɛ to F OLLOW (B) 3 For each production A αb and A αbβ with ɛ F IRST (β), add each element of F OLLOW (A) to F OLLOW (B) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 90 / 309

Example Grammar: FIRST sets: FOLLOW sets: E id E 1 (E) E 1 op E ɛ F IRST (E) = {id, (} F IRST (E 1 ) = {op, ɛ} F OLLOW (E) = F OLLOW (E 1 ) = {$, )} F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 91 / 309

Construction of parsing tables Input: Grammar G Output: Parsing table M 1 For each production A α do Steps 2 and 3. 2 For each terminal a in F IRST (α), add A α to M(A, a). 3 If ɛ is in F IRST (α), add A α to M(A, b) for each terminal b in F OLLOW (A). If ɛ is in F IRST (α) and $ is in F OLLOW (A), add A α to M(A, $) 4 Make each undefined entry of M be error. Example: See table of last example grammar F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 92 / 309

LL(1) Grammars Parsing table construction can be used with arbitrary grammars Multiple elements per entry may occur LL(1) Grammar: Grammar whose parsing table contains no multiple entries L... Scanning the Input from LEFT to right L... Producing the LEFTMOST derivation 1... Using 1 input symbol lookahead F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 93 / 309

Properties of LL(1) No ambiguous or left-recursive grammar is LL(1) G ist LL(1) For each two different productions A α β it is neccessary that: 1 No strings may be derived from both α and β which start with the same terminal a 2 At most one of the productions α or β may be derivable to ɛ 3 If β ɛ, then α may not derive any string which starts with an element in F OLLOW (A) Multiple entries in the parsing table can occasionally be removed by hand (without changing the language recognized by the automaton) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 94 / 309

Error-recovery in predictive parsing Heuristics in panic-mode error recovery: 1 Initially, all symbols in F OLLOW (A) can be used for synchronization: Skip all tokens until an element in F OLLOW (A) is read and remove A from the stack. 2 If F OLLOW sets don t suffice: Use hierarchical structure of program constructs. E.g. use keywords occurring at the beginning of a statement as addition to the synchronization set. 3 F IRST (A) can be used as well: If an element in F IRST (A) is read, continue parsing at A. 4 If a terminal which can t be matched is at the top of the stack, remove it. F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 95 / 309

Bottom-up parsing Shift-reduce parsing Reduction of an input towards the start symbol of the grammar Reduction step: Replace a substring, which matches the right side of a production with the left side of that same production Example: S aabe A Abc b B d abbcde aabcde aade aabe S F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 96 / 309

Handles Substring, which matches the right side of a production and leads to a valid derivation (rightmost derivation) Example (ambiguous grammar): E E + E E E E E (E) E id Rightmost derivation of id + id * id: Right-Sentential Form Handle Reducing Production id + id * id id E id id + id * E id E id id + E * E E E E E E id + E id E id E + E E + E E E + E E F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 97 / 309

Stack implementation Initially: Stack Input $ w$ Shift n 0 symbols from input onto stack until a handle can be found Reduce handle (replace handle with left side of production) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 98 / 309

Example shift-reduce parsing Stack Input Action (1) $ id + id * id $ shift (2) $ id + id * id $ reduce by E id (3) $ E + id * id $ shift (4) $ E+ id * id $ shift (5) $ E+ id * id $ reduce by E id (6) $ E + E * id $ shift (7) $ E + E id $ shift (8) $ E + E id $ reduce by E id (9) $ E + E E $ reduce by E E E (10) $ E + E $ reduce by E E + E (11) $ E $ accept F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 99 / 309

Viable prefixes, conflicts Viable prefix: Right sentential forms which can occur within the stack of a shift-reduce parser Conflicts: (Ambiguous grammars) stmt if expr then stmt if expr then stmt else stmt other Configuration: Stack Input... if expr then stmt else... No unambiguous handle (shift-reduce conflict) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 100 / 309

LR parser LR(k) parsing L... Left-to-right scanning R... Rightmost derivation in reverse Advantages: Can be used for (nearly) every programming language construct Most generic backtrack-free shift-reduce parsing method Class of LR-grammars is greater than those of LL-grammars LR-parsers identify errors as early as possible Disadvantage: LR-parser is hard to build manually F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 101 / 309

LR-parsing algorithm INPUT a... a... 1 i a n $ STACK s m Xm s m-1 Xm-1... LR Parsing Program OUTPUT s 0 action goto Stack stores s 0 X 1 s 1 X 2 s 2... X m s m (X i grammar, s i state) Parsing table = action- and goto-table s m current state, a i current input symbol action[s m, a i ] {shift, reduce, accept, error} goto[s m, a i ] transition function of a DFA F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 102 / 309

LR-parsing mode of operation Configuration (s 0 X 1 s 1... X m s m, a i a i+1... a n ) Next step (move) is determined by reading of a i Dependent on action[s m, a i ]: 1 action[s m, a i ] = shift s New configuration: (s 0 X 1 s 1... X m s m a i s, a i+1... a n ) 2 action[s m, a i ] = reduce A β New configuration: (s 0 X 1 s 1... X m r s m r As, a i a i+1... a n ) whereby s = goto[s m r, A], r length of β 3 action[s m, a i ] = accept 4 action[s m, a i ] = error F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 103 / 309

Example ) E E + T ) E T ) T T F ) T F ) F (E) ) F id State action goto id + * ( ) $ E T F 0 s5 s4 1 2 3 1 s6 acc 2 r2 s7 r2 r2 3 r4 r4 r4 r4 4 s5 s4 8 2 3 5 r6 r6 r6 r6 6 s5 s4 9 3 7 s5 s4 10 8 s6 s11 9 r1 s7 r1 r1 10 r3 r3 r3 r3 11 r5 r5 r5 r5 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 104 / 309

Construction of SLR parsing tables LR(0)-items: Production with dot at one position of the right side Example: Production A XY Z has 4 items: A.XY Z, A X.Y Z, A XY.Z and A XY Z. Exception: Produktion A ɛ only has the item: A. Augmented grammar: Grammar with new start symbol S and production S S. Functions: closure and goto closure(i) (I... set of items) 1 All I are within closure 2 If A α.bβ is part of closure and B γ is a production, then add B.γ to closure F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 105 / 309

Construction, goto goto(i, X) with I as set of items and X a symbol of the grammar goto = closure of set of all items A αx.β for all A α.xβ in I Example: I = {E E., E E. + T } goto(i, +) = {E E +.T, T.T F, T.F, F.(E), F.id} Sets-of-items construction (Construction of all LR(0)-items) items(g ) I 0 = closure({s.s}) C = {I 0 } repeat for each set of items I C and each grammar symbol X such that goto(i, X) is not empty and not in C do Add goto(i, X) to C until no more sets of items can be added to C F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 106 / 309

SLR parsing table Input: Augmented grammar G Output: SLR parsing table 1 Calculate C = {I 0, I 1,..., I n }, the set of LR(0)-items of G 2 State i is created by I i as follows: 1 If A α.aβ is in I i and goto(i i, a) = I j, then action(i, a) = shift j (a is a terminal symbol) 2 If A α. is in I i, then action[i, a] = reduce A α for all a F OLLOW (A) A S 3 If S S. is in I i, then action[i, $] = accept 3 For all nonterminal symbols A: goto[i, A] = j if goto(i i, A) = I j 4 Every other table entry is set to error 5 Initial state is determined by the item set with S.S F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 107 / 309

SLR(1), conflicts, error handling If we recieve a table without multiple entries using the SLR-parsing-table-algorithm then the grammar is SLR(1) Otherwise the algorithm fails and an algorithm for extended languages (like LR) needs to be utilized generally results in increased processing requirements Shift/reduce-conflicts can be partially resolved The process usually involves the determination of operator binding strength and associativity Error handling can be directly incorporated into the parsing table F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 108 / 309

PART 4 - SYNTAX DIRECTED TRANSLATION F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 109 / 309

Setting Translation of context-free languages Information attributes of grammar symbols Values of attributes are defined by semantic rules 2 possibilities: Syntax directed definitions (high-level spec) Translation schemes (implementation details) Evaluation: (1) Parse input, (2) Generate parse tree, (3) Evaluate parse tree F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 110 / 309

Syntax directed definitions Generalization of context-fee grammars Each grammar symbol has a set of attributes Synthesized vs. inherited attributes Attribute: string, number, type, memory location,... Value of attribute is defined by semantic rules Synthesized: Value of child node in parse tree Inherited: Value of parent node in parse tree Semantic rules define dependencies between attributes Dependency graph defines calculation order of semantic rules Semantic rules can have side effects F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 111 / 309

Form of a syntax directed definition Grammar production: A α Associated semantic rule: b := f(c 1,..., c k ) f is a function Synthesized: b is a synthesized attribute of A and c 1,..., c k are grammar symbols of the production Inherited: b is an inherited attribute of a grammar symbol on the right side of the production and c 1,..., c k are grammar symbols of the production b depends on c 1,..., c k F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 112 / 309

Example Calculator -program: val is a synthesized attribute for nonterminals E, T and F Production L En E E 1 +T E T T T 1 *F T F F (E) F digit Semantic Rule print(e.val) E.val := E 1.val + T.val E.val := T.val T.val := T 1.val F.val T.val := F.val F.val := E.val F.val := digit.lexval F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 113 / 309

S-attributed grammar Attributed grammar exclusively using synthesized attributes Example-evaluation: 3*5+4n (annotated parse tree) L E.val=19 n E.val=15 + T.val=4 T.val=15 F.val=4 T.val=3 * F.val=5 digit.lexval=4 F.val=3 digit.lexval=5 digit.lexval=3 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 114 / 309

Inherited attributes Definition of dependencies of program language constructs and their context Example: (type checking) Production D T L T int T real L L 1, id L id Semantic Rule L.in := T.type T.type := integer T.type := real L 1.in := L.in addtype(id.entry, L.in) addtype(id.entry, L.in) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 115 / 309

Inherited attributes Annotated parse tree real id 1, id 2, id 3 D T.type=real L.in=real real L.in=real, id 3 L.in=real, id 2 id 1 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 116 / 309

Dependency graphs Show dependencies between attributes Each rule is represented in the form b := f(c 1,..., c k ) Nodes correspond to attributes; edges to dependencies Definition: for each node n in the parse tree do for each attribute a of the grammar symbol at n do construct a node in the dependency graph for a for each node n in the parse tree do for each semantic rule b := f(c 1,..., c k ) associated with the production used at n do for i := 1 to k do construct an edge from the node for c i to the node for b F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 117 / 309

Dependency graph Example D T type 4 5 in L 6 real in 7 L 8, id 3 3 entry in 9 L, 10 id 2 2 entry id 1 1 entry F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 118 / 309

Topological sort Arrangement of m 1,..., m k nodes in a directed, acyclic graph where edges point from smaller nodes to bigger nodes If m i m j is an edge, then the node m i is smaller than the node m j Important for order in which the attributes are calculated Example (cont.): 1 a 4 := real 2 a 5 := a 4 3 addtype(id 3.entry, a 5 ) 4 a 7 := a 5 5 addtype(id 2.entry, a 7 ) 6 a 9 := a 7 7 addtype(id 1.entry, a 9 ) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 119 / 309

Example - syntax trees Abstract syntax tree = simplified form of a parse tree Operators and keywords are supplied to intermediate nodes by leaf nodes Productions with only one element can collapse Examples: if-then-else + B S S 1 2 * 3 5 4 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 120 / 309

Syntax trees Expressions Functions (return value: pointer to new node): mknode(op, left, right): node label op, 2 child nodes left, right mkleaf(id, entry): leaf id, entry in symbol table entry mkleaf(num, val): leaf num, value val Syntax directed definition: Production E E 1 + T E E 1 T E T T (E) T id T num Semantic Rule E.nptr := mknode( +, E 1.nptr, T.nptr) E.nptr := mknode(, E 1.nptr, T.nptr) E.nptr := T.nptr T.nptr := E.nptr T.nptr := mkleaf(id, id.entry) T.nptr := mkleaf(num, num.val) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 121 / 309

Syntax trees Expressions (ex.) Syntax tree for a-4+c E nptr E nptr + T nptr E - T nptr id + T nptr num id - id to entry for c id num 4 to entry for a F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 122 / 309

Evaluation of S-attributed definitions Attributed definition exclusively using synthesized attributes Evaluation using bottom-up parser (LR-parser) Idea: store attribute information on stack State Val Semantic rule:...... A.a := f(x.x, Y.y, Z.z) X X.x Production: A XY Z Y Y.y Before XY Z is reduced to A, value top Z Z.z of Z.z stored in val[top], Y.y stored...... in val[top 1], X.x in val[top 2] F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 123 / 309

Example - S-attributed evaluation Calculator -example: Production Code Fragment L En print(val[top 1]) E E 1 + T val[ntop] := val[top 2] + val[top] E T T T 1 F val[ntop] := val[top 2] val[top] T F F (E) val[ntop] := val[top 1] F digit Code executed before reduction ntop = top r + 1, after reduction: top := ntop F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 124 / 309

Result for 3*5+4n Input state val Production used 3*5+4n *5+4n 3 3 *5+4n F 3 F digit *5+4n T 3 T F 5+4n T * 3 +4n T * 5 3 5 +4n T * F 3 5 F digit +4n T 15 T T F +4n E 15 E T 4n E + 15 n E + 4 15 4 n E + F 15 4 F digit n E + T 15 4 T F n E 19 E E + T E n 19 L 19 L En F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 125 / 309

L-attributed definitions Definition: A syntax directed definition is L-attributed if each inherited attribute of X j, 1 j n, on the right side of A X 1,..., X n is only dependent on: 1 the attributes X 1,..., X j 1 to the left of X j and 2 the inherited attributes of A Each S-attributed grammar is a L-attributed grammar Evaluation using depth-first order procedure df visit(n : node) for each child m of n, from left to right do evaluate inherited attributes of m df visit(m) end evaluate synthesized attributes of n end F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 126 / 309

Translation schemes Translation scheme = context-free language with attributes for grammar symbols and semantic actions which are placed on the right side of a production between grammar symbols and are confined within {} Example: T T 1 F {T.val := T 1.val F.val} If only synthesized attributes are used, the action is always placed at the end of the right side of a production Note: Actions may not access attributes which are not calculated yet (limits positions of semantic actions) F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 127 / 309

Translation schemes (cont.) If both inherited and synthesized attributes are used the following needs to be taken into consideration: 1 An inherited attribute of a symbol on the right side of a production has to be calculated in an action which is positioned to the left of the symbol 2 An action may not reference a synthesized attribute belonging to a symbol which is positioned to the right of the action 3 A synthesized attribute of a nonterminal on the left side can only be calculated if all referenced attributes have already been calculated actions like these are usually placed at the end of the right side F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 128 / 309