Topic 5: Syntax Analysis III


Compiler Design
Prof. Hanjun Kim, CoreLab (Compiler Research Lab), POSTECH

The Front End

Compiler pipeline (front-end phases first, back-end phases last):
    Source Program -> Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> IR Code Generation -> Intermediate Representation -> IR Optimization -> Target Code Generation -> Target Code Optimization -> Target Program

Front-end phases:
    Lexical Analysis: break the input into tokens (think words, punctuation)
    Syntax Analysis: parse the phrase structure (think document, paragraphs, sentences)
    Semantic Analysis: calculate meaning

Parser in the Front-End

    Source -> Lexer -> Stream of Tokens -> Parser -> Abstract Syntax Tree -> rest of the front end -> IR

Parser functions:
    Verify that the token stream is valid
    If it is not valid, report a syntax error and recover
    Build the Abstract Syntax Tree (AST)

Parsing Power

[Figure: hierarchy of grammar classes. Labels: Unambiguous Grammar, Ambiguous Grammar, LL(0), LL(1), LL(k), LR(0), SLR, LALR(1), LR(1), LR(k).]

Real-world Parser Generators

Real-world Parser Generators

    Context-Free Grammar -> Parser Generator -> Parser
    Stream of Tokens -> Parser -> Abstract Syntax Tree

Parser generators
    yacc, bison: LALR parser generators for C
    ml-yacc: an LALR parser generator for ML
Parser generator specification
    Input: a context-free grammar (a set of rules) specifying a parser
    Outputs:
        A parser in the target language
        A description of the state machine
    Rules: each rule consists of a pattern and an action
        The pattern is a context-free grammar production
        The action is a fragment of ordinary target code
    Example:
        exp: exp PLUS exp (exp1 + exp2)

Parser Generator Example: Bison

    /* Declarations */
    %{
    #include <math.h>
    #include <stdio.h>
    #define YYSTYPE double   /* assumed: semantic values are doubles, as %.10g expects */
    %}
    %token NUM
    %left '-' '+'
    %left '*'
    %left NEG     /* negation--unary minus */
    %%
    /* Rules */
    line: '\n'
        | exp '\n'           { printf ("\t%.10g\n", $1); }
        ;
    exp:  NUM                { $$ = $1; }
        | exp '+' exp        { $$ = $1 + $3; }
        | exp '-' exp        { $$ = $1 - $3; }
        | exp '*' exp        { $$ = $1 * $3; }
        | '-' exp %prec NEG  { $$ = -$2; }
        ;
    %%
    /* User code (yylex and yyerror must also be supplied) */
    int main () { return yyparse (); }

Parser Generator Example: ML-YACC

User Declaration:
    structure A = struct
      type id = S.symbol
      datatype binop = PLUS | MINUS | TIMES | DIV
      datatype stm = CompoundStm of stm * stm
                   | AssignStm of id * exp
      and exp = IDExp of id
              | NUMExp of int
              | OpExp of exp * binop * exp
    end
    %%

YACC Definition:
    %term INT of int | ID of string | PLUS | MINUS
        | LPAREN | RPAREN | SEMICOLON | ASSIGN
    %nonterm exp of A.exp | stm of A.stm | prog of A.stm
    %%

Rules:
    prog: LPAREN stm RPAREN   (stm)
    stm:  stm SEMICOLON stm   (A.CompoundStm(stm1, stm2))
    stm:  ID ASSIGN exp       (A.AssignStm(S.symbol(ID), exp))
    exp:  INT                 (A.NUMExp(INT))
    exp:  ID                  (A.IDExp(S.symbol(ID)))
    exp:  exp PLUS exp        (A.OpExp(exp1, A.PLUS, exp2))
    exp:  exp MINUS exp       (A.OpExp(exp1, A.MINUS, exp2))

Parser Generator Example: ML-YACC

User Declaration
    Defines various values that are available to the rules section.
YACC Definition
    Declares terminal and non-terminal symbols and their attributes:
        %term IF | THEN | ELSE | NUM of int
        %nonterm prog | stmt | exp
    Declares precedences for terminals, which help resolve shift-reduce conflicts.
    Specifies the type of the current input file position (%pos int).
    Optionally specifies the end-of-parse symbol (%eop EOF).
    Optionally specifies the start symbol (%start prog); otherwise the LHS non-terminal of the first rule is taken as the start symbol.
Rules
    Specify the productions of the grammar and the semantic actions associated with them:
        symbol_0 : symbol_1 symbol_2 ... symbol_n (semantic action)

Parser Generator Example: ML-YACC

Positions
    In order to report semantic errors, each AST node needs to be annotated with the source file position of its characters.
    X<n>: the attribute of the nth occurrence of X in the rule (for example, stm1)
    X<n>left: the left-end position of the token corresponding to X (for example, stm1left)
    X<n>right: the right-end position of the token corresponding to X (for example, stm1right)
Example:
    stm: stm SEMICOLON stm (A.PosStm(stm1left, A.CompoundStm(stm1, stm2)))

AST Example

    structure A = struct
      type id = S.symbol
      datatype binop = PLUS | MINUS | TIMES | DIV
      datatype stm = CompoundStm of stm * stm
                   | AssignStm of id * exp
                   | PosStm of int * stm
      and exp = IDExp of id
              | NUMExp of int
              | OpExp of exp * binop * exp
              | PosExp of int * exp
    end
    %%
    %term INT of int | ID of string | PLUS | MINUS
        | LPAREN | RPAREN | SEMICOLON | ASSIGN
    %nonterm exp of A.exp | stm of A.stm | prog of A.stm
    %%
    prog: LPAREN stm RPAREN   (stm)
    stm:  stm SEMICOLON stm   (A.PosStm(stm1left, A.CompoundStm(stm1, stm2)))
    stm:  ID ASSIGN exp       (A.PosStm(IDleft, A.AssignStm(S.symbol(ID), exp)))
    exp:  INT                 (A.PosExp(INTleft, A.NUMExp(INT)))
    exp:  ID                  (A.PosExp(IDleft, A.IDExp(S.symbol(ID))))

AST Example

Input program:
    (a := 5 ; b := a + 1)

Abstract syntax:
    PosStm[ int = 1,
      stm = CompoundStm[
        stm = PosStm[ int = 2,
          stm = AssignStm[
            ID = PosExp[int = 2, exp = IDExp(S.symbol("a"))],
            exp = PosExp[int = 7, exp = NUMExp(5)] ] ],
        stm = PosStm[ int = 11,
          stm = AssignStm[
            ID = PosExp[int = 11, exp = IDExp(S.symbol("b"))],
            exp = PosExp[ int = 16,
              exp = OpExp[
                exp = PosExp[int = 16, exp = IDExp(S.symbol("a"))],
                binop = PLUS,
                exp = PosExp[int = 20, exp = NUMExp(1)] ] ] ] ] ] ]
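The position values in the tree (2, 7, 11, 16, 20) are 1-based character offsets of the tokens in the input. A minimal Python sketch of a lexer that records such left-end positions is given here; it is not part of the original slides, and the token regexes and the name `tokenize` are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Token kinds mirror the terminals used on the slides; the regexes are assumptions.
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r":="),
    ("SEMICOLON", r";"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text: str) -> List[Tuple[str, str, int]]:
    """Return (kind, lexeme, left) triples; left is a 1-based character position."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos + 1}")
        if m.lastgroup != "SKIP":          # whitespace carries no token
            tokens.append((m.lastgroup, m.group(), m.start() + 1))
        pos = m.end()
    return tokens


tokens = tokenize("(a := 5 ; b := a + 1)")
# Left-end positions of the identifiers and numbers, as used by IDleft / INTleft.
leaves = [(lexeme, left) for kind, lexeme, left in tokens if kind in ("ID", "INT")]
```

A parser action such as `A.PosExp(IDleft, ...)` would then simply copy the `left` value of the matching token into the AST node.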

YACC & Ambiguous Grammars

A grammar is ambiguous if it can derive a string of tokens with two or more different parse trees.
Consider 4+5*6. It has two parse trees:
    Tree 1: *( +(NUM(4), NUM(5)), NUM(6) )     i.e. (4+5)*6
    Tree 2: +( NUM(4), *(NUM(5), NUM(6)) )     i.e. 4+(5*6)
We prefer to bind * tighter than +, so we want Tree 2.
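The ambiguity can be made concrete with a small Python sketch that enumerates every parse tree of a token list under the grammar E -> E + E | E * E | NUM and evaluates each one. This is an illustration only, not part of the original slides; the function names `parses` and `evaluate` are hypothetical.

```python
from typing import List, Tuple, Union

Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def parses(tokens: List[Union[int, str]]) -> List[Tree]:
    """All parse trees of tokens under E -> E + E | E * E | NUM."""
    if len(tokens) == 1:
        return [tokens[0]] if isinstance(tokens[0], int) else []
    trees = []
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1, 2):
        op = tokens[i]
        if op not in ("+", "*"):
            continue
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                trees.append((op, left, right))
    return trees


def evaluate(tree: Tree) -> int:
    """Compute the value of one parse tree."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs * rhs


trees = parses([4, "+", 5, "*", 6])
values = [evaluate(t) for t in trees]   # one value per parse tree
```

For 4+5*6 the two trees evaluate to different numbers, which is why the parser must be told which one to build.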

YACC & Ambiguous Grammars

Similarly, consider 4+5+6. It has two parse trees:
    Tree 1: +( +(NUM(4), NUM(5)), NUM(6) )     i.e. (4+5)+6
    Tree 2: +( NUM(4), +(NUM(5), NUM(6)) )     i.e. 4+(5+6)
We prefer to bind the left + first, so we want Tree 1.

YACC & Ambiguous Grammars

YACC will report shift-reduce conflicts.
    4+5*6
        When exp + exp is on top of the stack, the parser gets * as the current token.
        The parser can reduce by the + rule or shift.
        Prefer shift.
    4+5+6
        When exp + exp is on top of the stack, the parser gets + as the current token.
        The parser can reduce by the + rule or shift.
        Prefer reduce.

Directives

Three solutions:
    1. Let YACC complain, but check that its choice (shift) is correct.
    2. Rewrite the grammar to eliminate the ambiguity.
    3. Keep the grammar, but add precedence directives that enable conflicts to be resolved.
       Use %left, %right, %nonassoc.
For this grammar:
    %left PLUS MINUS
    %left MULT DIV
    PLUS and MINUS are left associative and bind equally tightly.
    MULT and DIV are left associative and bind equally tightly.
    MULT and DIV bind tighter than PLUS and MINUS.

Directives

Given directives, YACC assigns a precedence to each terminal and each rule.
    The precedence of a terminal is based on the order in which the associativity directives are specified (later directives bind tighter).
    The precedence of a rule is the precedence of its right-most terminal.
        Ex: precedence(exp PLUS exp) = precedence(PLUS)
Given a shift-reduce conflict, YACC compares the precedence of the rule to be reduced with that of the terminal to be shifted:
    prec(terminal) > prec(rule): shift
    prec(rule) > prec(terminal): reduce
    prec(terminal) = prec(rule):
        assoc(terminal) = left: reduce
        assoc(terminal) = right: shift
        assoc(terminal) = nonassoc: report error

Precedence Example

    Input : 4 + 5 * 6
    Stack : 4 + 5
    Action: prec(*) > prec(+) -> shift

    Input : 4 * 5 + 6
    Stack : 4 * 5
    Action: prec(*) > prec(+) -> reduce

    Input : 4 + 5 + 6
    Stack : 4 + 5
    Action: assoc(+) = left -> reduce
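The resolution rule can be written down as a short Python sketch. It is a simplified model for illustration, not yacc's actual implementation; the names `build_precedence`, `rule_precedence` and `resolve` are hypothetical, and falling back to shift when no precedence is known is the usual yacc default.

```python
from typing import Dict, List, Optional, Sequence, Tuple

PrecTable = Dict[str, Tuple[int, str]]


def build_precedence(directives: List[Tuple[str, List[str]]]) -> PrecTable:
    """Map each terminal to (level, assoc); later directives get higher levels."""
    table: PrecTable = {}
    for level, (assoc, terminals) in enumerate(directives, start=1):
        for terminal in terminals:
            table[terminal] = (level, assoc)
    return table


def rule_precedence(rhs: Sequence[str], table: PrecTable) -> Optional[int]:
    """Precedence of a rule: that of its right-most terminal with a declared precedence."""
    for symbol in reversed(rhs):
        if symbol in table:
            return table[symbol][0]
    return None


def resolve(rhs: Sequence[str], lookahead: str, table: PrecTable) -> str:
    """Decide a shift-reduce conflict between reducing rhs and shifting lookahead."""
    rule_prec = rule_precedence(rhs, table)
    if rule_prec is None or lookahead not in table:
        return "shift"                      # no precedence information: default to shift
    term_prec, assoc = table[lookahead]
    if term_prec > rule_prec:
        return "shift"
    if rule_prec > term_prec:
        return "reduce"
    # Equal precedence: associativity of the terminal decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]


# %left PLUS MINUS
# %left MULT DIV
table = build_precedence([("left", ["PLUS", "MINUS"]), ("left", ["MULT", "DIV"])])
```

Calling `resolve(["exp", "PLUS", "exp"], "MULT", table)` corresponds to the first case of the example (stack 4 + 5, next token *).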

Default Behavior

What if directives are not specified?
    shift-reduce: report the conflict, shift by default
    reduce-reduce: report the conflict, reduce by the rule that occurs first
What to do:
    shift-reduce: acceptable in well-defined cases (dangling else)
    reduce-reduce: unacceptable; rewrite the grammar

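%prec directive

Commonly used for the unary minus problem.
    %left PLUS MINUS
    %left MULT DIV
Consider -4*6. We prefer unary minus (-) to bind tighter, but the precedence of MINUS is lower than that of MULT, so with only these directives the parser builds -(4*6), not (-4)*6.
Solution: declare a pseudo-terminal UMINUS with the highest precedence and attach it to the unary minus rule with %prec.
    %term NUM | PLUS | MINUS | MULT | DIV | UMINUS
    %left PLUS MINUS
    %left MULT DIV
    %left UMINUS
    exp: MINUS exp %prec UMINUS ()
       | exp PLUS exp ()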

A parser can support semantic actions. Why does a compiler separate semantic actions from parsing?

Parsing with semantic action

Grammar:
    E -> E + E
    E -> E - E
    E -> E * E
    E -> NUM
    E -> -NUM

ML-Yacc specification:
    %%
    %term INT of int | PLUS | MINUS | TIMES | UMINUS | EOF
    %nonterm exp of int
    %start exp
    %eop EOF
    %left PLUS MINUS
    %left TIMES
    %left UMINUS
    %%
    exp: INT                       (INT)
    exp: exp PLUS exp              (exp1 + exp2)
    exp: exp MINUS exp             (exp1 - exp2)
    exp: exp TIMES exp             (exp1 * exp2)
    exp: MINUS exp %prec UMINUS    (~exp)

The three %left lines declare the precedence (lowest first) and the left associativity of the operators.

Parsing with semantic action

Grammar:
    E -> E + E
    E -> E - E
    E -> E * E
    E -> NUM
    E -> -NUM

Input program: 1 + 2 * 3

    Stack                          Input          Action
                                   1 + 2 * 3 $    shift
    NUM(1)                         + 2 * 3 $      reduce
    E(1)                           + 2 * 3 $      shift
    E(1) PLUS                      2 * 3 $        shift
    E(1) PLUS NUM(2)               * 3 $          reduce
    E(1) PLUS E(2)                 * 3 $          shift
    E(1) PLUS E(2) TIMES           3 $            shift
    E(1) PLUS E(2) TIMES NUM(3)    $              reduce
    E(1) PLUS E(2) TIMES E(3)      $              reduce
    E(1) PLUS E(6)                 $              reduce
    E(7)                           $              accept
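The trace can be reproduced with a small shift-reduce evaluator in Python. This is a hedged sketch rather than what ML-Yacc generates: it has no LR state machine, it treats each number as an already-reduced E, it handles only the binary operators, and the name `shift_reduce_eval` and the `PREC` table are assumptions made for the illustration.

```python
from typing import List, Tuple, Union

# Precedence levels; all operators are left associative, as in the %left directives.
PREC = {"+": 1, "-": 1, "*": 2}


def shift_reduce_eval(tokens: List[Union[int, str]]) -> Tuple[int, List[str]]:
    """Evaluate tokens such as [1, '+', 2, '*', 3]; return (value, trace of actions)."""
    stack: List[Union[int, str]] = []   # ints are E values, strings are operators
    trace: List[str] = []
    rest = list(tokens) + ["$"]

    def can_reduce() -> bool:
        # The top of the stack looks like E op E.
        return (len(stack) >= 3 and isinstance(stack[-1], int)
                and stack[-2] in PREC and isinstance(stack[-3], int))

    while True:
        look = rest[0]
        # Reduce at end of input, or when the rule's precedence is not lower than
        # the lookahead's (equal precedence reduces because of left associativity).
        if can_reduce() and (look == "$" or (look in PREC and PREC[look] <= PREC[stack[-2]])):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            if op == "+":
                value = left + right
            elif op == "-":
                value = left - right
            else:
                value = left * right
            stack.append(value)          # the semantic action runs at reduce time
            trace.append(f"reduce E {op} E -> E({value})")
        elif look == "$":
            break
        else:
            stack.append(rest.pop(0))
            trace.append(f"shift {look}")
    if len(stack) != 1 or not isinstance(stack[0], int):
        raise SyntaxError("malformed expression")
    return stack[0], trace


value, trace = shift_reduce_eval([1, "+", 2, "*", 3])
```

The semantic value is computed as a side effect of each reduction, which is exactly what the actions `(exp1 + exp2)` and `(exp1 * exp2)` do in the specification.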

Parsing with semantic action

Disadvantages of a parser with semantic actions:
    The file becomes too large and difficult to manage.
    The program must be processed in the order in which it is parsed.
    It is impossible to do global/inter-procedural optimization.
Alternative: separate parsing from the remaining compiler phases.

Context-Free Grammars are more powerful than Regular Expressions

Context-Free Grammar & REs

CFGs are more powerful than REs:
    Any language that can be generated using regular expressions can be generated by a context-free grammar.
    There are languages that can be generated by a context-free grammar but cannot be generated by any regular expression.
Examples:
    Matching parentheses
    Nested comments

Proof

    1. Given a RE R, we can generate a CFG G such that L(R) == L(G).
    2. We can define a grammar G for which there is no FA F such that L(F) == L(G).

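Proof 1

Build grammar rules for a non-terminal RE by induction on the structure of the regular expression.
Base cases:
    Symbol (a):    RE -> a
    Epsilon (ε):   RE -> ε
Inductive cases (M and N stand for the non-terminals of the subexpressions):
    Alternation (M | N):     RE -> M      RE -> N
    Concatenation (M N):     RE -> M N
    Repetition (M*):         RE -> M RE   RE -> ε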
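The induction translates directly into a recursive function. The following Python sketch is an illustration under assumptions: the regular-expression AST classes (`Sym`, `Eps`, `Alt`, `Cat`, `Star`), the fresh non-terminal names `R0`, `R1`, ... and the function `to_cfg` are all hypothetical names chosen here.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple, Union


@dataclass
class Sym:
    ch: str


@dataclass
class Eps:
    pass


@dataclass
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass
class Cat:
    left: "Regex"
    right: "Regex"


@dataclass
class Star:
    inner: "Regex"


Regex = Union[Sym, Eps, Alt, Cat, Star]
Production = Tuple[str, Tuple[str, ...]]   # (lhs, rhs); an empty rhs means epsilon


def to_cfg(regex: Regex) -> Tuple[str, List[Production]]:
    """Return (start symbol, productions) of a CFG generating the same language."""
    productions: List[Production] = []
    fresh = count()

    def build(r: Regex) -> str:
        nt = f"R{next(fresh)}"                      # one fresh non-terminal per subexpression
        if isinstance(r, Sym):
            productions.append((nt, (r.ch,)))       # RE -> a
        elif isinstance(r, Eps):
            productions.append((nt, ()))            # RE -> epsilon
        elif isinstance(r, Alt):
            m, n = build(r.left), build(r.right)
            productions.append((nt, (m,)))          # RE -> M
            productions.append((nt, (n,)))          # RE -> N
        elif isinstance(r, Cat):
            m, n = build(r.left), build(r.right)
            productions.append((nt, (m, n)))        # RE -> M N
        elif isinstance(r, Star):
            m = build(r.inner)
            productions.append((nt, (m, nt)))       # RE -> M RE
            productions.append((nt, ()))            # RE -> epsilon
        else:
            raise TypeError(f"not a regular expression: {r!r}")
        return nt

    return build(regex), productions


# (a | b)*
start, productions = to_cfg(Star(Alt(Sym("a"), Sym("b"))))
```

Each case adds a constant number of productions, so the grammar is linear in the size of the regular expression.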

Proof 2

Grammar for nested parentheses:
    S -> ( S )
    S -> ε
Argument:
    FAs have a FINITE number of states, N.
    The FA must remember the number of ('s in order to generate the matching )'s.
    At or before N+1 ('s, the FA will revisit a state, which then represents two different counts of )'s.
    Both counts must now be accepted, but one of them is invalid.
Representations:
    Regular, finite-state grammars: FAs
    Context-free grammars: Push-Down Automata
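The counting argument can be illustrated with two Python recognizers for the language of S -> ( S ) | ε. The first uses an unbounded counter, which plays the role of a push-down stack. The second lets the counter saturate at a fixed bound, mimicking one particular machine with finitely many states; it is only an example of how such a machine confuses two counts, not a proof about every FA. Both function names are hypothetical.

```python
def accepts_nested(s: str) -> bool:
    """Recognize n opening parentheses followed by n closing ones, with an unbounded counter."""
    depth = 0
    i = 0
    while i < len(s) and s[i] == "(":
        depth += 1
        i += 1
    while i < len(s) and s[i] == ")":
        if depth == 0:
            return False          # more ')' than '('
        depth -= 1
        i += 1
    return i == len(s) and depth == 0


def accepts_saturating(s: str, max_count: int) -> bool:
    """Same scan, but the counter cannot exceed max_count (finitely many states)."""
    depth = 0
    i = 0
    while i < len(s) and s[i] == "(":
        depth = min(depth + 1, max_count)   # counts above max_count share one state
        i += 1
    while i < len(s) and s[i] == ")":
        if depth == 0:
            return False
        depth -= 1
        i += 1
    return i == len(s) and depth == 0


unbalanced = "(" * 4 + ")" * 3   # four '(' but only three ')'
balanced = "(" * 4 + ")" * 4
```

With `max_count = 3`, the bounded recognizer cannot distinguish three from four opening parentheses, so it accepts `unbalanced` and rejects `balanced`, while `accepts_nested` gets both right.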

Application of a Lexer and Parser

Applications

    Compiler & Interpreter
    Pattern matching
        Searching for an exact word (ex. "compiler")
        Find and replace with a rule (ex. [a-z][a-z0-9]*)
    Rendering
        Rendering a web page of HTML + content
        Rendering an image
        Printing a document
    Natural language processing
        Translation
        Understanding Korean particles
    Data analysis
        Analyzing XML files
        Big data analysis