Topic 3: Syntax Analysis I

Compiler Design, Prof. Hanjun Kim, CoreLab (Compiler Research Lab), POSTECH

The Front End
  Compiler pipeline (front end first, then back end):
    Source Program -> Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> IR Code Generation -> Intermediate Representation -> IR Optimization -> Target Code Generation -> Target Code Optimization -> Target Program
  Lexical Analysis: break the input into tokens (think words and punctuation).
  Syntax Analysis: parse the phrase structure (think document, paragraphs, sentences).
  Semantic Analysis: calculate meaning.

Parser in the Front-End
  Source -> Lexer -> Stream of Tokens -> Parser -> Abstract Syntax Tree -> FE IR
  Parser functions:
    Verify that the token stream is valid.
    If it is not valid, report the syntax error and recover.
    Build the Abstract Syntax Tree (AST).

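Analogy to English
  Parsing: understanding sentence structure (checking grammar).
  Ex: This line is a longer sentence
    Word        Category     Role
    This        article      subject
    line        noun         subject
    is          verb         verb
    a           article      complement
    longer      adjective    complement
    sentence    noun         complement
  The subject, the verb, and the complement together form a sentence.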

Syntax Analysis (Parsing)
  A process that verifies that the token stream is valid, i.e. checks the grammar of the programming language.
  Ex: if a < b then c = 1 else c = 2
    Tokens:    IF  ID LT ID  THEN  ID ASSIGN NUM  ELSE  ID ASSIGN NUM
    Grouping:  IF  expression  THEN  statement  ELSE  statement
    Result:    IF-THEN-ELSE statement

Syntax Analysis (Parsing)
  Every programming language has a set of rules that describe the syntax of well-formed programs.
  Syntax analysis (parsing) is the process that determines whether a source program satisfies these rules.
  Why do we need a parser in addition to a lexer? Some program constructs have a recursive structure:
    digits = [0-9]+
    expr = {digits} | "(" {expr} "+" {expr} ")"
  Examples: 28, (28+301), ((28+301)+9)
  Finite automata cannot recognize recursive constructs.

Limitation of Finite Automata
  Finite automata cannot recognize recursive constructs: a machine with N states cannot remember a parenthesis-nesting depth greater than N.
  Can an FA check the correctness of (( ))? If so, can the same FA also check ((( )))?
  The question is whether an FA can remember how deeply it is nested. With a fixed number of states, it cannot do so for arbitrary depth.
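To make the limitation concrete, here is a minimal Python sketch (Python is used for all added examples here; the slides' own snippets are in ML). The function name `balanced` is hypothetical. The point is that the check relies on a counter that can grow without bound, which is exactly what a machine with a fixed number of states does not have.

```python
def balanced(text: str) -> bool:
    """Return True if every '(' in text is closed by a matching ')'."""
    depth = 0  # unbounded counter: the memory a finite automaton lacks
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared with no open '(' to match
                return False
    return depth == 0  # every '(' must have been closed


print(balanced("(())"), balanced("((()))"), balanced("())"))
```

Any fixed bound on `depth` would turn this into a finite-state check, and some deeper nesting would then be misjudged.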

We need a more powerful formalism: Context-Free Grammar

Context-Free Grammar
  Regular expressions describe the lexical structure of tokens:
    Regular Expressions -> Lexer Generator -> Lexer
  Context-free grammars describe the syntactic structure of programs:
    Context-Free Grammar -> Parser Generator -> Parser

Analogy
                           Lexical Analysis    Syntax Analysis
    Input                  ASCII characters    Tokens
    Output of each rule    Token               Source program
    Output                 Set of tokens       Set of source programs

Context-Free Grammars
  A context-free grammar consists of a set of productions of the form
    symbol -> symbol symbol ... symbol
  Symbol types:
    Terminal: a token type.
    Non-terminal: a symbol that appears on the left-hand side of some production.
  Left-Hand Side (LHS): a single non-terminal.
  Right-Hand Side (RHS): a sequence of terminals and non-terminals.
  Start symbol: a special non-terminal that stands for a whole program accepted by the grammar.
  Each production specifies how terminals and non-terminals may be combined to form a substring in the language.
  Recursion is easy to specify:
    stmt -> IF exp THEN stmt ELSE stmt

End-of-File Marker
  The parser must also recognize the End-of-File (EOF).
  The EOF marker in the grammar is $.
  Introduce a new start symbol S' and the production S' -> S$.

Derivation
  Derivation (execution of parsing):
    1. Begin with the start symbol.
    2. While a non-terminal exists, replace any non-terminal with the RHS of one of its productions.
  Multiple derivations exist for a given sentence:
    Left-most derivation: replace the left-most non-terminal in each step.
    Right-most derivation: replace the right-most non-terminal in each step.

Example
  Terminals:
    SEMI ;    ID    NUM    ASSIGN :=    LPAREN (    RPAREN )    PLUS +    PRINT print    COMMA ,
  Non-terminals:
    stmt: statement
    expr: expression
    expr_list: expression list
  Rules:
    stmt -> stmt ; stmt
    stmt -> ID := expr
    stmt -> PRINT ( expr_list )
    expr -> ID
    expr -> NUM
    expr -> expr + expr
    expr -> ( stmt , expr )
    expr_list -> expr
    expr_list -> expr_list , expr

Example (the same rules written with token names)
  Terminals:
    SEMI ;    ID    NUM    ASSIGN :=    LPAREN (    RPAREN )    PLUS +    PRINT print    COMMA ,
  Non-terminals:
    stmt: statement
    expr: expression
    expr_list: expression list
  Rules:
    stmt -> stmt SEMI stmt
    stmt -> ID ASSIGN expr
    stmt -> PRINT LPAREN expr_list RPAREN
    expr -> ID
    expr -> NUM
    expr -> expr PLUS expr
    expr -> LPAREN stmt COMMA expr RPAREN
    expr_list -> expr
    expr_list -> expr_list COMMA expr

Example: Left-most Derivation
  Input: a := 12; print(23)
  Result of lexical analysis: ID ASSIGN NUM SEMI PRINT LPAREN NUM RPAREN
  Left-most derivation:
    1. stmt
    2. stmt SEMI stmt
    3. ID ASSIGN expr SEMI stmt
    4. ID ASSIGN NUM SEMI stmt
    5. ID ASSIGN NUM SEMI PRINT LPAREN expr_list RPAREN
    6. ID ASSIGN NUM SEMI PRINT LPAREN expr RPAREN
    7. ID ASSIGN NUM SEMI PRINT LPAREN NUM RPAREN
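The seven steps above can be replayed mechanically. The following Python sketch is only an illustration: `leftmost_step` and `STEPS` are hypothetical names, and the list of productions is transcribed by hand from the derivation rather than chosen by a parser. Each step checks that the production really applies to the left-most non-terminal.

```python
NONTERMINALS = {"stmt", "expr", "expr_list"}


def leftmost_step(form, lhs, rhs):
    """Replace the left-most non-terminal in form (which must be lhs) by rhs."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            if sym != lhs:
                raise ValueError(f"left-most non-terminal is {sym}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to expand")


# Productions applied in steps 2 to 7 of the derivation, in order.
STEPS = [
    ("stmt", ["stmt", "SEMI", "stmt"]),
    ("stmt", ["ID", "ASSIGN", "expr"]),
    ("expr", ["NUM"]),
    ("stmt", ["PRINT", "LPAREN", "expr_list", "RPAREN"]),
    ("expr_list", ["expr"]),
    ("expr", ["NUM"]),
]

form = ["stmt"]
for lhs, rhs in STEPS:
    form = leftmost_step(form, lhs, rhs)
    print(" ".join(form))
```

A right-most derivation would use the same productions in a different order and scan `form` from the right instead.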

Example: Right-most Derivation
  Input: a := 12; print(23)
  Result of lexical analysis: ID ASSIGN NUM SEMI PRINT LPAREN NUM RPAREN
  Right-most derivation:
    1. stmt
    2. stmt SEMI stmt
    3. stmt SEMI PRINT LPAREN expr_list RPAREN
    4. stmt SEMI PRINT LPAREN expr RPAREN
    5. stmt SEMI PRINT LPAREN NUM RPAREN
    6. ID ASSIGN expr SEMI PRINT LPAREN NUM RPAREN
    7. ID ASSIGN NUM SEMI PRINT LPAREN NUM RPAREN

Parsing Tree
  Graphical representation of a derivation.
  Each internal node is labeled with a non-terminal.
  Each leaf node is labeled with a terminal.
  Parsing tree of the example ID ASSIGN NUM SEMI PRINT LPAREN NUM RPAREN (children are indented under their parent):
    stmt
      stmt
        ID
        ASSIGN
        expr
          NUM
      SEMI
      stmt
        PRINT
        LPAREN
        expr_list
          expr
            NUM
        RPAREN

Inefficiency in Parsing Tree
  Concrete parse tree: each internal node is labeled with a non-terminal, and its children are labeled with the symbols in the RHS of the production.
  Concrete parse trees are inconvenient to use. Punctuation is needed to specify structure when writing code, but the tree already describes the program structure.
  Make trees simple: remove tokens that carry no additional information.

Inefficiency in Parsing Tree
  Grammar:
    P -> ( S )
    S -> S ; S
    S -> ID := E
    E -> ID
    E -> NUM
    E -> E + E
    E -> E - E
    E -> E * E
    E -> E / E
  Concrete parse tree for ( a := 4 ; b := 5 ):
    P
      (
      S
        S
          ID(a)
          :=
          E
            NUM(4)
        ;
        S
          ID(b)
          :=
          E
            NUM(5)
      )
  Do we need "(", ")" or ";" in the tree?

Abstract Syntax Tree
  Solution: generate an abstract parse tree (abstract syntax tree, AST).
  An AST is similar to a concrete parse tree, except that redundant tokens are left out:
    CompoundStm
      AssignStm
        ID(a)
        NUM(4)
      AssignStm
        ID(b)
        NUM(5)

Abstract Syntax Tree Example
  Grammar:
    P -> ( S )
    S -> S ; S
    S -> ID := E
    E -> ID
    E -> NUM
    E -> E + E
    E -> E - E
    E -> E * E
    E -> E / E
  How can you describe the abstract syntax tree structure? With ML datatypes (stm and exp are declared together with "and" because stm refers to exp):
    type id = string
    datatype binop = PLUS | MINUS | TIMES | DIV
    datatype stm = CompoundStm of stm * stm
                 | AssignStm of id * exp
         and exp = IDExp of id
                 | NUMExp of int
                 | OpExp of exp * binop * exp
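For readers more comfortable with Python than ML, the same tree shapes can be sketched with dataclasses. This is an assumed translation, not part of the slides: the class and field names (`IdExp`, `target`, `first`, and so on) are hypothetical, and the operator is stored as a plain string rather than a separate `binop` type.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class IdExp:
    name: str


@dataclass
class NumExp:
    value: int


@dataclass
class OpExp:
    left: "Exp"
    op: str  # one of "+", "-", "*", "/"
    right: "Exp"


Exp = Union[IdExp, NumExp, OpExp]


@dataclass
class AssignStm:
    target: str
    value: Exp


@dataclass
class CompoundStm:
    first: "Stm"
    second: "Stm"


Stm = Union[AssignStm, CompoundStm]

# AST for ( a := 4 ; b := 5 ): no parentheses, ":=" or ";" nodes are needed.
program = CompoundStm(AssignStm("a", NumExp(4)), AssignStm("b", NumExp(5)))
print(program)
```

The punctuation tokens disappear because the node types themselves record which construct was parsed.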

Ambiguous Grammars
  A grammar is ambiguous if it can derive a string of tokens with two or more different parsing trees.
  Example:
    expr -> NUM
    expr -> expr + expr
    expr -> expr * expr
  Consider 4 + 5 * 6: is this 34 or 54?
  Parse tree 1, meaning (4 + 5) * 6 = 54:
    expr
      expr
        expr
          NUM(4)
        +
        expr
          NUM(5)
      *
      expr
        NUM(6)
  Parse tree 2, meaning 4 + (5 * 6) = 34:
    expr
      expr
        NUM(4)
      +
      expr
        expr
          NUM(5)
        *
        expr
          NUM(6)

Ambiguous Grammars
  Problem: the compiler uses the parse tree to interpret the meaning of parsed expressions.
    Different parse trees may have different meanings, resulting in different interpreted results.
    For example, does 4 + 5 * 6 equal 34 or 54?
  Solution: rewrite the grammar to eliminate the ambiguity.
    Operators have a relative precedence: * binds tighter than +.
    Operators with the same precedence must be resolved by associativity.
    Some operators are left-associative; others are right-associative.

Ambiguous Grammars
  Non-terminals:
    expr: expression
    term: term (add)
    fact: factor (mult)
  Rules:
    expr -> expr + term
    expr -> term
    term -> term * fact
    term -> fact
    fact -> NUM
  The only parse tree for 4 + 5 * 6:
    expr
      expr
        term
          fact
            NUM(4)
      +
      term
        term
          fact
            NUM(5)
        *
        fact
          NUM(6)
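The layered grammar can be turned into a small evaluator to confirm that it yields 34 for 4 + 5 * 6. The sketch below is an illustration under assumptions: the function name `evaluate` and the token format (a Python list mixing ints and operator strings) are hypothetical, and the left-recursive rules `expr -> expr + term` and `term -> term * fact` are implemented as loops, which accepts the same strings and keeps both operators left-associative.

```python
def evaluate(tokens):
    """Evaluate a token list such as [4, '+', 5, '*', 6] with * binding tighter than +."""
    pos = 0

    def fact():
        # fact -> NUM
        nonlocal pos
        if pos >= len(tokens) or not isinstance(tokens[pos], int):
            raise SyntaxError(f"expected NUM at position {pos}")
        value = tokens[pos]
        pos += 1
        return value

    def term():
        # term -> term * fact | fact, written as a loop over '*'
        nonlocal pos
        value = fact()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            value *= fact()
        return value

    def expr():
        # expr -> expr + term | term, written as a loop over '+'
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            value += term()
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return result


print(evaluate([4, "+", 5, "*", 6]))
```

Because `expr` only ever sees complete `term` values, the multiplication is always grouped before the addition.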

How to analyze the syntax of a program?

Back to analogy
  How do you recognize an English sentence?
  Prediction-based approach: predictive parsing (LL parsing).
    If you see a subject, you expect a verb to follow.
    If you see a verb at the beginning of a sentence, you know the sentence is a question.
  Bottom-up approach: bottom-up parsing (LR parsing, shift-reduce parsing).
    Read a sentence, and then figure out its structure.

Recursive Descent Parsing
  1. LL(k) Parsing

Recursive Descent Parsing
  One recursive function for each non-terminal.
  Each production becomes a clause in that function.
  Also known as predictive parsing, top-down parsing, or LL(1) parsing.
  LL(1): Left-to-right parse, Leftmost derivation, 1 symbol of lookahead.

Example
  Grammar:
    Non-terminals: S, E, L
    Terminals: IF (if), THEN (then), ELSE (else), BEGIN (begin), END (end), SEMI (;), NUM, EQ (=)
    S -> if E then S else S
    S -> begin S L
    L -> end
    L -> ; S L
    E -> num = num
  Parser (the three functions are mutually recursive, so they are joined with "and"):
    datatype token = EOF | IF | THEN | ELSE | BEGIN | END | SEMI | NUM | EQ
    val tok = ref (gettoken())
    fun advance() = tok := gettoken()
    fun eat(t) = if (!tok = t) then advance() else error()
    fun S() = case !tok of
          IF    => (eat(IF); E(); eat(THEN); S(); eat(ELSE); S())
        | BEGIN => (eat(BEGIN); S(); L())
        | _     => error()
    and L() = case !tok of
          END  => (eat(END))
        | SEMI => (eat(SEMI); S(); L())
        | _    => error()
    and E() = (eat(NUM); eat(EQ); eat(NUM))
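The same parser can be sketched in Python. One caveat: as written on the slide, every production for S contains another S, so no finite token string can be derived. To have something to accept, the sketch adds a production `S -> print E` with a PRINT token; that production is an assumption, not part of the slide (it matches the textbook grammar this example appears to be based on). The names `Parser` and `accepts` are hypothetical.

```python
class Parser:
    """Recursive descent: one method per non-terminal, one branch per production."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]
        self.pos = 0

    @property
    def tok(self):
        return self.tokens[self.pos]

    def eat(self, expected):
        if self.tok != expected:
            raise SyntaxError(f"expected {expected}, found {self.tok}")
        self.pos += 1

    def S(self):
        if self.tok == "IF":        # S -> if E then S else S
            self.eat("IF"); self.E(); self.eat("THEN")
            self.S(); self.eat("ELSE"); self.S()
        elif self.tok == "BEGIN":   # S -> begin S L
            self.eat("BEGIN"); self.S(); self.L()
        elif self.tok == "PRINT":   # S -> print E  (assumed extra production)
            self.eat("PRINT"); self.E()
        else:
            raise SyntaxError(f"unexpected {self.tok} at start of statement")

    def L(self):
        if self.tok == "END":       # L -> end
            self.eat("END")
        elif self.tok == "SEMI":    # L -> ; S L
            self.eat("SEMI"); self.S(); self.L()
        else:
            raise SyntaxError(f"expected END or SEMI, found {self.tok}")

    def E(self):                    # E -> num = num
        self.eat("NUM"); self.eat("EQ"); self.eat("NUM")


def accepts(tokens):
    """Return True if the whole token list is one statement S."""
    parser = Parser(tokens)
    try:
        parser.S()
        parser.eat("EOF")
    except SyntaxError:
        return False
    return True


print(accepts("BEGIN PRINT NUM EQ NUM SEMI PRINT NUM EQ NUM END".split()))
```

Each method decides which production to use by looking only at the current token, which is what makes the parser predictive.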

Formal Techniques
  Before building a parser, we need to compute three properties:
  Nullable
    For each γ that is the RHS of a production, γ is nullable if γ can derive the empty string (ε).
  First(γ)
    For each γ that is the RHS of a production, first(γ) is the set of all terminal symbols that can begin a string derived from γ.
    Ex: S -> if E then S else S, so if ∈ First(S)
  Follow(X)
    For each non-terminal X in the grammar, follow(X) is the set of all terminal symbols that can immediately follow X in a derivation.
    Ex: S -> if E then S else S, so then ∈ Follow(E)

Computation of Nullable
  A string γ is nullable if every symbol S ∈ γ is nullable; for each symbol S, check whether S can derive ε.
  Example grammar:
    Z -> X Y Z    Y -> c    X -> a
    Z -> d        Y -> ε    X -> b Y e
  Iterating until nothing changes:
         Initial    Iteration 1    Iteration 2
    X    No         No             No
    Y    No         Yes            Yes
    Z    No         No             No

Computation of First
  If T is a terminal symbol, then first(T) = {T}.
  If X is a non-terminal and X -> Y1 Y2 Y3 ... Yn, then
    first(Y1) ⊆ first(X)
    first(Y2) ⊆ first(X) if Y1 is nullable
    first(Y3) ⊆ first(X) if Y1, Y2 are nullable
    ...
    first(Yn) ⊆ first(X) if Y1, Y2, ..., Yn-1 are nullable

Computation of Follow
  Let X, Y be non-terminals, and let γ, γ1, γ2 be strings of terminals and non-terminals.
  If the grammar includes the production X -> γ Y:
    follow(X) ⊆ follow(Y)
  If the grammar includes the production X -> γ1 Y γ2:
    first(γ2) ⊆ follow(Y)
    follow(X) ⊆ follow(Y) if γ2 is nullable
  Apply these rules iteratively, until nothing changes, to compute the nullable, first, and follow sets for each non-terminal in the grammar.

Example
  Grammar:
    Z -> X Y Z    Y -> c    X -> a
    Z -> d        Y -> ε    X -> b Y e
  Initial:
         nullable    first      follow
    X    No
    Y    No
    Z    No
  Iteration 1:
         nullable    first      follow
    X    No          a,b
    Y    Yes         c
    Z    No          d
  Iteration 2:
         nullable    first      follow
    X    No          a,b
    Y    Yes         c
    Z    No          d,a,b
  Iteration 3:
         nullable    first      follow
    X    No          a,b        c,d,a,b
    Y    Yes         c          e,d,a,b
    Z    No          d,a,b
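The iteration can be written as a single fixed-point loop. The sketch below applies the nullable, first, and follow rules to every production until no set grows. It is an illustration: the function name `analyze` and the grammar encoding (a list of `(lhs, rhs)` pairs, with `[]` for ε) are assumptions, and because all three properties are updated in the same pass, the intermediate states need not match the iteration tables above; only the final result does.

```python
GRAMMAR = [
    ("Z", ["X", "Y", "Z"]),
    ("Z", ["d"]),
    ("Y", ["c"]),
    ("Y", []),               # Y -> ε
    ("X", ["a"]),
    ("X", ["b", "Y", "e"]),
]


def analyze(grammar):
    """Return (nullable, first, follow) for the non-terminals of grammar."""
    nonterminals = {lhs for lhs, _ in grammar}
    nullable = set()
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}

    def first_of(sym):
        return first[sym] if sym in nonterminals else {sym}

    def all_nullable(symbols):
        return all(s in nullable for s in symbols)

    def grow(target, extra):
        """Add extra to target in place; report whether target changed."""
        before = len(target)
        target |= extra
        return len(target) != before

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in nullable and all_nullable(rhs):
                nullable.add(lhs)
                changed = True
            for i, sym in enumerate(rhs):
                # first(Yi) ⊆ first(X) if Y1 .. Yi-1 are all nullable
                if all_nullable(rhs[:i]):
                    changed |= grow(first[lhs], first_of(sym))
                if sym not in nonterminals:
                    continue
                # follow(X) ⊆ follow(Y) if everything after Y is nullable
                if all_nullable(rhs[i + 1:]):
                    changed |= grow(follow[sym], follow[lhs])
                # first(rhs[j]) ⊆ follow(Y) if everything between is nullable
                for j in range(i + 1, len(rhs)):
                    if all_nullable(rhs[i + 1:j]):
                        changed |= grow(follow[sym], first_of(rhs[j]))
    return nullable, first, follow


NULLABLE, FIRST, FOLLOW = analyze(GRAMMAR)
print(NULLABLE, FIRST, FOLLOW)
```

The loop terminates because every set can only grow and there are finitely many terminals to add.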

Example
  Grammar:
    Z -> X Y Z    Y -> c    X -> a
    Z -> d        Y -> ε    X -> b Y e
         nullable    first      follow
    X    No          a,b        c,d,a,b
    Y    Yes         c          e,d,a,b
    Z    No          d,a,b
  Build the predictive parsing table from the nullable, first, and follow sets:
         a             b             c         d         e
    X    X -> a        X -> b Y e
    Y    Y -> ε        Y -> ε        Y -> c    Y -> ε    Y -> ε
    Z    Z -> X Y Z    Z -> X Y Z              Z -> d
  Enter S -> γ in row S, column T, for each T ∈ first(γ).
  If γ is nullable, also enter S -> γ in row S, column T, for each T ∈ follow(S).
  The entry in row S, column T tells the parser which clause to execute if the current function is S and the next token is T.
  Blank entries are syntax errors.
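The two table-filling rules translate directly into code. In this Python sketch the nullable, first, and follow sets are copied from the slide rather than computed, and `first_of_string` and `build_table` are hypothetical helper names. Each cell holds a list of productions, so a cell with more than one entry would signal that the grammar is not LL(1).

```python
GRAMMAR = [
    ("Z", ["X", "Y", "Z"]),
    ("Z", ["d"]),
    ("Y", ["c"]),
    ("Y", []),               # Y -> ε
    ("X", ["a"]),
    ("X", ["b", "Y", "e"]),
]
# Sets copied from the slide.
NULLABLE = {"Y"}
FIRST = {"X": {"a", "b"}, "Y": {"c"}, "Z": {"a", "b", "d"}}
FOLLOW = {"X": {"a", "b", "c", "d"}, "Y": {"a", "b", "d", "e"}, "Z": set()}


def first_of_string(symbols):
    """Return (first set, nullable flag) for a string of grammar symbols."""
    result = set()
    for sym in symbols:
        if sym not in FIRST:          # sym is a terminal
            result.add(sym)
            return result, False
        result |= FIRST[sym]
        if sym not in NULLABLE:
            return result, False
    return result, True               # every symbol was nullable


def build_table(grammar):
    """Map (non-terminal, terminal) to the list of productions for that cell."""
    table = {}
    for lhs, rhs in grammar:
        lookahead, rhs_nullable = first_of_string(rhs)
        if rhs_nullable:              # second rule: also use follow(lhs)
            lookahead |= FOLLOW[lhs]
        for terminal in lookahead:
            table.setdefault((lhs, terminal), []).append((lhs, rhs))
    return table


TABLE = build_table(GRAMMAR)
for (row, col), entries in sorted(TABLE.items()):
    print(row, col, entries)
```

For this grammar every filled cell contains exactly one production, matching the table above.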

Another Example
  Grammar:
    S' -> S$
    S -> IF E THEN A ELSE A
    S -> IF E THEN A
    S -> E
    E -> E + T
    E -> T
    T -> NUM
    A -> ID = NUM
  Table to fill in:
          nullable    first    follow
    S'
    S
    E
    T
    A

Another Example
  Grammar:
    S' -> S$
    S -> IF E THEN A ELSE A
    S -> IF E THEN A
    S -> E
    E -> E + T
    E -> T
    T -> NUM
    A -> ID = NUM
  Result:
          nullable    first      follow
    S'    No          IF, NUM
    S     No          IF, NUM    $
    E     No          NUM        $, THEN, +
    T     No          NUM        $, THEN, +
    A     No          ID         $, ELSE

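Another Example
  Grammar:
    S' -> S$
    S -> IF E THEN A ELSE A
    S -> IF E THEN A
    S -> E
    E -> E + T
    E -> T
    T -> NUM
    A -> ID = NUM
  Predictive parsing table (columns IF, THEN, ELSE, +, NUM, ID, =, $; only non-blank entries are listed):
    S', IF:   S' -> S$
    S', NUM:  S' -> S$
    S,  IF:   S -> IF E THEN A
              S -> IF E THEN A ELSE A
    S,  NUM:  S -> E
    E,  NUM:  E -> E + T
              E -> T
    T,  NUM:  T -> NUM
    A,  ID:   A -> ID = NUM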

Left-Recursion Problem
  E -> E + T
  E -> T
  First(E + T) = First(T), so in function E(), if the next token is NUM, the parser cannot choose between the two productions and gets stuck.
  A left-recursive grammar cannot be LL(1).
  Solution: rewrite the grammar so that it is right-recursive:
    E -> T E'
    E' -> + T E'
    E' -> ε
  Rule: the productions
    X -> X γ
    X -> α
  become
    X -> α X'
    X' -> γ X'
    X' -> ε
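The rewriting rule is mechanical, so it can be sketched as a small function. The Python below is an illustration under assumptions: the name `remove_left_recursion` is hypothetical, each RHS is a list of symbols with `[]` standing for ε, and only immediate left recursion (a production whose RHS starts with its own LHS) is handled.

```python
def remove_left_recursion(nt, productions):
    """Rewrite nt -> nt γ | α into nt -> α nt' and nt' -> γ nt' | ε.

    productions is the list of right-hand sides for nt.
    Returns a dict mapping each non-terminal to its new right-hand sides.
    """
    new_nt = nt + "'"
    # The γ parts: what follows the leading nt in each left-recursive production.
    gammas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # The α parts: productions that do not start with nt.
    alphas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not gammas:
        return {nt: productions}      # nothing to rewrite
    return {
        nt: [alpha + [new_nt] for alpha in alphas],
        new_nt: [gamma + [new_nt] for gamma in gammas] + [[]],
    }


print(remove_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applied to `E -> E + T | T`, this gives `E -> T E'` and `E' -> + T E' | ε`, the same grammar as on the slide.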

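Left-Factoring
  S -> IF E THEN A
  S -> IF E THEN A ELSE A
  Two productions begin with the same symbols, so
    first(IF E THEN A) = first(IF E THEN A ELSE A)
  and one token of lookahead cannot tell them apart.
  Solution: left-factoring. Keep the common prefix and move the differing tails into a new non-terminal V:
    S -> IF E THEN A V
    V -> ε
    V -> ELSE A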
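Left-factoring two productions amounts to splitting off their longest common prefix. The sketch below handles exactly that two-production case; `common_prefix` and `left_factor` are hypothetical names, and the new non-terminal's name is passed in by the caller.

```python
def common_prefix(a, b):
    """Return the longest list of symbols that both a and b start with."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(nt, first_rhs, second_rhs, new_nt):
    """Factor the shared prefix of two right-hand sides of nt into new_nt."""
    prefix = common_prefix(first_rhs, second_rhs)
    if not prefix:
        return {nt: [first_rhs, second_rhs]}   # nothing in common
    return {
        nt: [prefix + [new_nt]],
        # The tails that differ; an empty list stands for ε.
        new_nt: [first_rhs[len(prefix):], second_rhs[len(prefix):]],
    }


print(left_factor(
    "S",
    ["IF", "E", "THEN", "A"],
    ["IF", "E", "THEN", "A", "ELSE", "A"],
    "V",
))
```

For the two IF productions this yields `S -> IF E THEN A V` with `V -> ε | ELSE A`.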

Modified Example
  Grammar:
    S' -> S$
    S -> E
    S -> IF E THEN A V
    V -> ELSE A
    V -> ε
    E -> T E'
    E' -> ε
    E' -> + T E'
    T -> NUM
    A -> ID = NUM
  Table to fill in:
          nullable    first    follow
    S'
    S
    V
    E
    E'
    T
    A

Modified Example
  Grammar:
    S' -> S$
    S -> E
    S -> IF E THEN A V
    V -> ELSE A
    V -> ε
    E -> T E'
    E' -> ε
    E' -> + T E'
    T -> NUM
    A -> ID = NUM
  Result:
          nullable    first      follow
    S'    No          IF, NUM
    S     No          IF, NUM    $
    V     Yes         ELSE       $
    E     No          NUM        $, THEN
    E'    Yes         +          $, THEN
    T     No          NUM        $, THEN, +
    A     No          ID         $, ELSE

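Modified Example
  Grammar:
    S' -> S$
    S -> E
    S -> IF E THEN A V
    V -> ELSE A
    V -> ε
    E -> T E'
    E' -> ε
    E' -> + T E'
    T -> NUM
    A -> ID = NUM
  Predictive parsing table (columns IF, THEN, ELSE, +, NUM, ID, =, $; only non-blank entries are listed):
    S', IF:    S' -> S$
    S', NUM:   S' -> S$
    S,  IF:    S -> IF E THEN A V
    S,  NUM:   S -> E
    V,  ELSE:  V -> ELSE A
    V,  $:     V -> ε
    E,  NUM:   E -> T E'
    E', THEN:  E' -> ε
    E', +:     E' -> + T E'
    E', $:     E' -> ε
    T,  NUM:   T -> NUM
    A,  ID:    A -> ID = NUM
  Every cell now holds at most one production, so the modified grammar is LL(1).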
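With one production per cell, the table can drive a parser directly. The sketch below is a table-driven variant with an explicit stack, not the recursive-function style used earlier in the slides; the table entries are transcribed from the slide, and the names `TABLE` and `parse` are hypothetical. The RHS is pushed in reverse so that its left-most symbol is processed first.

```python
# (non-terminal, lookahead) -> right-hand side; [] stands for ε.
TABLE = {
    ("S'", "IF"): ["S", "$"],
    ("S'", "NUM"): ["S", "$"],
    ("S", "IF"): ["IF", "E", "THEN", "A", "V"],
    ("S", "NUM"): ["E"],
    ("V", "ELSE"): ["ELSE", "A"],
    ("V", "$"): [],
    ("E", "NUM"): ["T", "E'"],
    ("E'", "THEN"): [],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "NUM"): ["NUM"],
    ("A", "ID"): ["ID", "=", "NUM"],
}
NONTERMINALS = {lhs for lhs, _ in TABLE}


def parse(tokens):
    """Return True if tokens (given without the trailing $) are accepted."""
    tokens = list(tokens) + ["$"]
    stack = ["S'"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:            # blank table entry: syntax error
                return False
            stack.extend(reversed(rhs))
        else:
            if top != look:            # terminal on the stack must match input
                return False
            pos += 1
    return pos == len(tokens)


print(parse("IF NUM + NUM THEN ID = NUM ELSE ID = NUM".split()))
```

The same input would be rejected by the earlier table for the unmodified grammar at the first cell with two candidate productions.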