Defining syntax using CFGs

Roadmap
Last time: defined context-free grammars (CFGs).
This time: CFGs for specifying a language's syntax; language membership; list grammars; resolving ambiguity.

CFG Review
G = (N, Σ, P, S), and ⇒⁺ means "derives in 1 or more steps". A CFG generates a string by applying productions until no nonterminals remain.
Example: nested parens
    N = { Q }
    Σ = { (, ) }
    P = { Q → ( Q ) | ε }
    S = Q

Formal Definition of a CFG's Language
Let G = (N, Σ, P, S) be a CFG. Then L(G) = { w | S ⇒⁺ w }, where S is the start nonterminal of G and w is a sequence consisting only of terminal symbols, or ε.
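The definition of L(G) can be made concrete by enumerating derivations. The sketch below does a breadth-first search over sentential forms of the nested-parens grammar Q → ( Q ) | ε and collects every terminal string up to a length bound. The dictionary encoding of the grammar and the names GRAMMAR, START and language are assumptions made here for illustration. The pruning rule is enough for this grammar to terminate; it is not a general-purpose membership algorithm.

```python
from collections import deque

# Grammar for nested parentheses: Q -> ( Q ) | ε
# Nonterminals are the dict keys; every other symbol is a terminal.
GRAMMAR = {"Q": [["(", "Q", ")"], []]}  # [] encodes the ε production
START = "Q"

def language(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal in the sentential form, if any.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            found.add("".join(form))  # only terminals remain: a word of L(G)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            # Prune forms whose terminals alone already exceed the bound.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda w: (len(w), w))

print(language(GRAMMAR, START, 6))
```

Every string returned is a member of L(G) because it was reached from the start nonterminal by applying productions until no nonterminals remained.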

A CFG Defines a Language
CFG productions define the syntax of a language:
    1. Prog  → begin Stmts end
    2. Stmts → Stmts semicolon Stmt
    3.       | Stmt
    4. Stmt  → id assign Expr
    5. Expr  → id
    6.       | Expr plus id
We call this notation BNF (for "Backus-Naur Form") or extended BNF.
HTTP grammar using BNF: http://www.w3.org/protocols/rfc2616/rfc2616-sec2.html

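List Grammars
Useful to repeat a structure arbitrarily often:
    Stmts → Stmts semicolon Stmt
          | Stmt
The list skews left: in the parse tree, each Stmts node has the rest of the list as its left child, followed by a semicolon and a single Stmt.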

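List Grammars
Useful to repeat a structure arbitrarily often:
    Stmts → Stmt semicolon Stmts
          | Stmt
The list skews right: in the parse tree, each Stmts node has a single Stmt and a semicolon on the left, and the rest of the list as its right child.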

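List Grammars
What if we allowed both skews?
    Stmts → Stmts semicolon Stmts
          | Stmt
Then the same list of statements has several parse trees, because the semicolons can be grouped in more than one way.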
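How many parse trees does the "both skews" rule allow? Writing L for the list nonterminal (a name assumed here), the rule L → L ; L | Stmt lets any semicolon sit at the root, so the count for n statements satisfies a simple recurrence. The function name count_trees is an assumption made for this sketch.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n):
    """Number of parse trees for a list of n statements under L -> L ; L | Stmt."""
    if n == 1:
        return 1  # only L -> Stmt applies
    # L -> L ; L: the root semicolon splits the list into k and n - k statements.
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))

for n in range(1, 6):
    print(n, count_trees(n))
```

A purely left-skewed or purely right-skewed list rule admits exactly one tree for every n, which is why those forms are preferred.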

Derivation Order
Leftmost derivation: always expand the leftmost nonterminal.
Rightmost derivation: always expand the rightmost nonterminal.
    1. Prog  → begin Stmts end
    2. Stmts → Stmts semicolon Stmt
    3.       | Stmt
    4. Stmt  → id assign Expr
    5. Expr  → id
    6.       | Expr plus id
Example: Prog ⇒ begin Stmts end ⇒ begin Stmts semicolon Stmt end. From this point, a leftmost derivation expands Stmts next, and a rightmost derivation expands Stmt next.
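The two derivation orders can be compared by replaying one parse tree in both orders. In the sketch below a tree node is a (name, children) tuple and a terminal is a plain string; this encoding, the helper names show, derivation and stmt, and the nonterminal names Stmts and Expr are assumptions made here for illustration.

```python
def show(form):
    """Render a sentential form; unexpanded nodes print as their nonterminal name."""
    return " ".join(x if isinstance(x, str) else x[0] for x in form)

def derivation(tree, leftmost=True):
    """List the sentential forms of the leftmost (or rightmost) derivation of tree."""
    form = [tree]
    steps = [show(form)]
    while True:
        pending = [i for i, x in enumerate(form) if not isinstance(x, str)]
        if not pending:
            return steps  # only terminals remain
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]  # apply the production recorded at this node
        steps.append(show(form))

def stmt():
    # Stmt -> id assign Expr, Expr -> id
    return ("Stmt", ["id", "assign", ("Expr", ["id"])])

# Parse tree for: begin id assign id semicolon id assign id end
TREE = ("Prog", ["begin",
                 ("Stmts", [("Stmts", [stmt()]), "semicolon", stmt()]),
                 "end"])

left = derivation(TREE, leftmost=True)
right = derivation(TREE, leftmost=False)
for l, r in zip(left, right):
    print(f"{l:45} | {r}")
```

Both orders use the same productions and end at the same string; only the order in which nonterminals are expanded differs.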

Ambiguity
Even with a fixed derivation order, it is possible to derive the same string in multiple ways. A grammar G is ambiguous if some string w has
    more than 1 leftmost derivation, or
    more than 1 rightmost derivation, or
    more than 1 parse tree.
These three conditions are equivalent.

Exercise
Give a grammar G and a word w that has more than 1 leftmost derivation in G.

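Example: Ambiguous Grammars
    Expr → intlit
         | Expr minus Expr
         | Expr times Expr
         | lparen Expr rparen
Derive the string 4 - 7 * 3 (assume tokenization).
Parse tree 1: minus at the root, with operands 4 and (7 times 3).
Parse tree 2: times at the root, with operands (4 minus 7) and 3.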

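Why is Ambiguity Bad?
Eventually, we'll be using CFGs as the basis for our parser, and parsing is much easier when there is no ambiguity in the grammar.
The parse tree may also mismatch user understanding! For 4 - 7 * 3, operator precedence says times binds tighter than minus, yet the ambiguous grammar permits both the tree rooted at minus, which reads as 4 - (7 * 3), and the tree rooted at times, which reads as (4 - 7) * 3.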
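To see concretely why the choice of tree matters, the two parse trees for 4 - 7 * 3 can be evaluated bottom-up. This is a minimal sketch; the tuple encoding of trees and the names OPS, evaluate, tree1 and tree2 are assumptions made here for illustration.

```python
import operator

OPS = {"minus": operator.sub, "times": operator.mul}

def evaluate(tree):
    """Evaluate a tree whose leaves are ints and whose inner nodes are (op, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

tree1 = ("minus", 4, ("times", 7, 3))  # minus at the root: 4 - (7 * 3)
tree2 = ("times", ("minus", 4, 7), 3)  # times at the root: (4 - 7) * 3

print(evaluate(tree1))  # -17
print(evaluate(tree2))  # -9
```

The same token sequence yields two different values, so a compiler built on the ambiguous grammar could compute a result the programmer did not intend.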

Resolving Grammar Ambiguity: Precedence
    Expr → intlit
         | Expr minus Expr
         | Expr times Expr
         | lparen Expr rparen
Intuitive problem: the nonterminals are the same for both operators.
To fix precedence: use 1 nonterminal per precedence level, and parse the lowest level first.

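Resolving Grammar Ambiguity: Precedence
Lowest precedence level first, 1 nonterminal per precedence level:
    Expr   → Expr minus Expr | Term
    Term   → Term times Term | Factor
    Factor → intlit | lparen Expr rparen
Derive the string 4 - 7 * 3. The parse tree starts with Expr → Expr minus Expr. The left Expr derives Term → Factor → 4. The right Expr derives Term → Term times Term, whose two Terms derive Factor → 7 and Factor → 3.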

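Resolving Grammar Ambiguity: Precedence
Fixed grammar:
    Expr   → Expr minus Expr | Term
    Term   → Term times Term | Factor
    Factor → intlit | lparen Expr rparen
Derive the string 4 - 7 * 3. Let's try to re-build the wrong parse tree, with times at the root: Expr → Term → Term times Term. The right Term derives Factor → 3, but the left Term would have to derive 4 - 7. We'll never be able to derive minus from Term without parens.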

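Did we fix all ambiguity?
Fixed grammar:
    Expr   → Expr minus Expr | Term
    Term   → Term times Term | Factor
    Factor → intlit | lparen Expr rparen
Derive the string 4 - 7 - 3.
NO! In Expr → Expr minus Expr, either operand can derive the nested minus, so the subtrees could have been swapped: the grammar allows both (4 - 7) - 3 and 4 - (7 - 3).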

Where we are so far
Precedence: we want correct behavior on 4 - 7 * 9. Use a new nonterminal for each precedence level.
Associativity: we want correct behavior on 4 - 7 - 9. Minus should be left associative: a - b - c = (a - b) - c.
Problem: the recursion in a rule like Expr → Expr minus Expr.

Definition: Recursion in Grammars
A grammar is recursive in (nonterminal) X if X ⇒⁺ αXγ for non-empty strings of symbols α and γ.
A grammar is left-recursive in X if X ⇒⁺ Xγ for a non-empty string of symbols γ.
A grammar is right-recursive in X if X ⇒⁺ αX for a non-empty string of symbols α.
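Left and right recursion can be checked mechanically. The sketch below computes, for each nonterminal, which nonterminals can appear at the left (or right) edge of a form derived from it, and reports X as recursive on that side when X reaches itself. It assumes the grammar has no ε-productions and no cycles of unit productions, so that the remaining string γ (or α) is non-empty. The function names, the dictionary encoding, and the nonterminal and terminal names in the sample grammars (Expr, Term, Factor, intlit) are assumptions made for illustration.

```python
def edge_closure(grammar, pick):
    """Map each nonterminal X to the nonterminals that can occur at one edge
    (pick=0: leftmost, pick=-1: rightmost) of a form derived from X."""
    reach = {x: {rhs[pick] for rhs in rules if rhs and rhs[pick] in grammar}
             for x, rules in grammar.items()}
    changed = True
    while changed:  # iterate to a fixed point (transitive closure)
        changed = False
        for x in reach:
            extra = set().union(*(reach[y] for y in reach[x])) - reach[x]
            if extra:
                reach[x] |= extra
                changed = True
    return reach

def left_recursive(grammar):
    return {x for x, r in edge_closure(grammar, 0).items() if x in r}

def right_recursive(grammar):
    return {x for x, r in edge_closure(grammar, -1).items() if x in r}

AMBIGUOUS = {"Expr": [["intlit"], ["Expr", "minus", "Expr"],
                      ["Expr", "times", "Expr"], ["lparen", "Expr", "rparen"]]}
LAYERED = {"Expr": [["Expr", "minus", "Term"], ["Term"]],
           "Term": [["Term", "times", "Factor"], ["Factor"]],
           "Factor": [["intlit"], ["lparen", "Expr", "rparen"]]}

print(left_recursive(AMBIGUOUS), right_recursive(AMBIGUOUS))
print(left_recursive(LAYERED), right_recursive(LAYERED))
```

A nonterminal that is both left- and right-recursive through the same operator rule, as Expr is in the first grammar, is the pattern that leaves associativity undetermined.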

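Resolving Grammar Ambiguity: Associativity
Recognize left-associative operators with left-recursive productions, and right-associative operators with right-recursive productions.
    Expr   → Expr minus Term | Term
    Term   → Term times Factor | Factor
    Factor → intlit | lparen Expr rparen
Example: 4 - 7 - 9. Writing E, T, F for Expr, Term, Factor, the root of the parse tree is E → E - T. The left E expands again by E → E - T, with its E deriving T → F → 4 and its T deriving F → 7. The root's right T derives F → 9.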

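Resolving Grammar Ambiguity: Associativity
    Expr   → Expr minus Term | Term
    Term   → Term times Factor | Factor
    Factor → intlit | lparen Expr rparen
Example: 4 - 7 - 9. Let's try to re-build the wrong parse tree again: E → E - T at the root, with the left E deriving T → F → 4. The right T would then have to derive 7 - 9. We'll never be able to derive minus from T without parens.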
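A small hand-written parser shows the effect of one nonterminal per precedence level with left-recursive rules for minus and times. This is a sketch under stated assumptions: the grammar is taken to be Expr → Expr minus Term | Term, Term → Term times Factor | Factor, Factor → intlit | lparen Expr rparen, and each left-recursive rule is realized as a loop that folds operands to the left, a common way to implement such rules by hand rather than a technique taken from the slides. The tokenizer and all function names are hypothetical.

```python
import re

def tokenize(text):
    # Integers and the symbols - * ( ) become tokens; other characters are skipped.
    return re.findall(r"\d+|[-*()]", text)

def parse_expr(toks, pos=0):
    # Expr -> Expr minus Term | Term, realized as Term (minus Term)*
    tree, pos = parse_term(toks, pos)
    while pos < len(toks) and toks[pos] == "-":
        right, pos = parse_term(toks, pos + 1)
        tree = ("minus", tree, right)  # fold to the left: left associative
    return tree, pos

def parse_term(toks, pos):
    # Term -> Term times Factor | Factor, realized as Factor (times Factor)*
    tree, pos = parse_factor(toks, pos)
    while pos < len(toks) and toks[pos] == "*":
        right, pos = parse_factor(toks, pos + 1)
        tree = ("times", tree, right)
    return tree, pos

def parse_factor(toks, pos):
    # Factor -> intlit | lparen Expr rparen
    if pos < len(toks) and toks[pos] == "(":
        tree, pos = parse_expr(toks, pos + 1)
        if pos >= len(toks) or toks[pos] != ")":
            raise SyntaxError("expected )")
        return tree, pos + 1
    if pos < len(toks) and toks[pos].isdigit():
        return int(toks[pos]), pos + 1
    raise SyntaxError(f"unexpected token at position {pos}")

def parse(text):
    toks = tokenize(text)
    tree, pos = parse_expr(toks)
    if pos != len(toks):
        raise SyntaxError("trailing input")
    return tree

print(parse("4-7*3"))
print(parse("4-7-9"))
```

Each input now has exactly one tree: times nests below minus, and repeated minus groups to the left, while parentheses still allow the other grouping to be requested explicitly.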

Example
Language of Boolean expressions:
    bexp → TRUE
    bexp → FALSE
    bexp → bexp OR bexp
    bexp → bexp AND bexp
    bexp → NOT bexp
    bexp → LPAREN bexp RPAREN
Add nonterminals so that OR has lowest precedence, then AND, then NOT. Then change the grammar to reflect the fact that both AND and OR are left associative. Draw a parse tree for the expression: true AND NOT true.

Another ambiguous example
    Stmt → if Cond then Stmt
         | if Cond then Stmt else Stmt
Consider this word in this grammar: if a then if b then s else s2
How would you derive it?
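The ambiguity of the if-then-else grammar can be exhibited by enumerating every parse of the example word. The sketch below assumes that a condition is a single token and that any token other than if, then and else (such as s or s2) is an atomic statement; these assumptions, the tuple encoding of trees, and the name parses are made here for illustration.

```python
def parses(toks, i=0):
    """Yield (tree, next_index) for every way to read a Stmt starting at toks[i]."""
    if i + 2 < len(toks) and toks[i] == "if" and toks[i + 2] == "then":
        cond = toks[i + 1]
        for then_tree, j in parses(toks, i + 3):
            # Stmt -> if Cond then Stmt
            yield ("if", cond, then_tree), j
            # Stmt -> if Cond then Stmt else Stmt
            if j < len(toks) and toks[j] == "else":
                for else_tree, k in parses(toks, j + 1):
                    yield ("if", cond, then_tree, else_tree), k
    elif i < len(toks) and toks[i] not in ("if", "then", "else"):
        yield toks[i], i + 1  # atomic statement

WORD = "if a then if b then s else s2".split()
trees = [tree for tree, end in parses(WORD) if end == len(WORD)]
for tree in trees:
    print(tree)
```

Two complete trees come out: in one the else belongs to the outer if, in the other to the inner if, so the word has more than one parse tree and the grammar is ambiguous.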

Summary
To understand how a parser works, we start by understanding context-free grammars, which are used to define the language recognized by the parser.
Key terms: terminal symbol; nonterminal symbol; grammar rule (or production); derivation (leftmost derivation, rightmost derivation); parse (or derivation) tree; the language defined by a grammar; ambiguous grammar.