Principles of Programming Languages COMP251: Syntax and Grammars

Similar documents
Principles of Programming Languages COMP251: Syntax and Grammars

EECS 6083 Intro to Parsing Context Free Grammars

Syntax Analysis Check syntax and construct abstract syntax tree

CMPT 755 Compilers. Anoop Sarkar.

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Habanero Extreme Scale Software Research Project

Dr. D.M. Akbar Hussain

Context-Free Grammars

Defining syntax using CFGs

Introduction to Parsing

COP 3402 Systems Software Syntax Analysis (Parser)

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

3. Context-free grammars & parsing

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Syntax. In Text: Chapter 3

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Part 5 Program Analysis Principles and Techniques

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

CS 314 Principles of Programming Languages

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CPS 506 Comparative Programming Languages. Syntax Specification

Describing Syntax and Semantics

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Chapter 3. Describing Syntax and Semantics

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CSCI312 Principles of Programming Languages!

CMSC 330: Organization of Programming Languages

Chapter 3. Describing Syntax and Semantics ISBN

CSE 3302 Programming Languages Lecture 2: Syntax

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CMSC 330: Organization of Programming Languages. Context Free Grammars

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Parsing II Top-down parsing. Comp 412

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CMSC 330: Organization of Programming Languages. Context Free Grammars

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Syntax. A. Bellaachia Page: 1

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

A programming language requires two major definitions A simple one pass compiler

Compiler Design Concepts. Syntax Analysis

CSCE 314 Programming Languages

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

More Assigned Reading and Exercises on Syntax (for Exam 2)

Chapter 3. Describing Syntax and Semantics

This book is licensed under a Creative Commons Attribution 3.0 License

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Homework & Announcements

Programming Language Definition. Regular Expressions

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Part 3. Syntax analysis. Syntax analysis 96

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

3. Parsing. Oscar Nierstrasz

A Simple Syntax-Directed Translator

Context-free grammars (CFG s)

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Chapter 3. Describing Syntax and Semantics ISBN

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Related Course Objec6ves

Introduction to Syntax Analysis. The Second Phase of Front-End

Syntax and Semantics

Introduction to Syntax Analysis

Syntax Intro and Overview. Syntax

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

Chapter 3. Topics. Languages. Formal Definition of Languages. BNF and Context-Free Grammars. Grammar 2/4/2019

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Chapter 3. Describing Syntax and Semantics

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

CS 406/534 Compiler Construction Parsing Part I

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Lexical and Syntax Analysis. Top-Down Parsing

CSE 12 Abstract Syntax Trees

22c:111 Programming Language Concepts. Fall Syntax III

Chapter 3. Syntax - the form or structure of the expressions, statements, and program units

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Lecture 10 Parsing 10.1

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Lexical and Syntax Analysis

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Introduction to Parsing Ambiguity and Syntax Errors

Topic 3: Syntax Analysis I

Introduction to Parsing Ambiguity and Syntax Errors

ICOM 4036 Spring 2004

Transcription:

Principles of Programming Languages COMP251: Syntax and Grammars Prof. Dekai Wu Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China Fall 2006

Part II Grammar

Grammar: Motivation What do the following sentences really mean? I saw a small kid on the beach with a binocular. What is the final value of x? x = 15 if (x > 20) then if (x > 30) then x = 8 else x = 9 Ambiguity in semantics is often caused by ambiguous grammar of the language.

A Formal Description: Example 7 1. < real-number > ::= < integer-part >. < fraction > 2. < integer-part > ::= <empty> < digit-sequence > 3. < fraction > ::= < digit-sequence > 4. < digit-sequence > ::= < digit > < digit >< digit-sequence > 5. < digit > ::= 0 1 2 3 4 5 6 7 8 9 This is the context-free grammar of real numbers written in the Backus-Naur Form.

Context Free Grammar (CFG) A context-free grammar has 4 components: 1 A set of tokens or terminals: atomic symbols of the language. English : a, b, c,..., z Reals : 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,. 2 A set of nonterminals: variables denoting language constructs. English : < Noun >, < Verb >, < Adjective >,... Reals : < real-number >, < integer-part >, < fraction >, < digit-sequence >, < digit >

Context Free Grammar.. 3 A set of rules called productions: for generating expressions of the language. nonterminal ::= a string of terminals and nonterminals English : < Sentence > ::= < Noun > < Verb > < Noun > Reals : < integer-part > ::= <empty> < digit-sequence > Notice that CFGs allow only a single non-terminal on the left-hand side of any production rules. 4 A nonterminal chosen as the start symbol: represents the main construct of the language. English : < Sentence > Reals : < real-number > The set of strings that can be generated by a CFG makes up a context-free language.

Backus-Naur Form (BNF) One way to write context-free grammar. Terminals appear as they are. Nonterminals are enclosed by < and >. e.g.: < real-number >, < digit >. The special empty string is written as <empty>. Productions with a common nonterminal may be abbreviated using the special or symbol. e.g. X ::= W1, X ::= W2,..., X ::= Wn may be abbreviated as X ::= W1 W2 Wn

Top-Down Parsing: Example 8 A parser checks to see if a given expression or program can be derived from a given grammar. Check if.5 is a valid real number by finding from the CFG of Example 6 a leftmost derivation of.5 : < real-number > => < integer-part >. < fraction > [Production 1] => <empty>. < fraction > [Production 2] =>.< fraction > [By definition] =>.< digit-sequence > [Production 3] =>.< digit > [Production 4] =>.5 [Production 5]

Bottom-Up Parsing: Example 9 Check if.5 is a valid real number by finding from the CFG of Example 6 a rightmost derivation of.5 in reverse:.5 = <empty>.5 [By definition] => < integer-part >.5 [Production 2] => < integer-part >. < digit > [Production 5] => < integer-part >. < digit-sequence > [Production 4] => < integer-part >. < fraction > [Production 3] => < real-number > [Production 1]

Parse Tree: Example 10 [Real Numbers] A parse tree of.5 generated by the CFG of Example 6. <real-number> / \ <integer-part>. <fraction> <empty> <digit-sequence> <digit> 5

Parse Tree A parse tree shows how a string is generated by a CFG the concrete syntax in a tree representation. Root = start symbol. Leaf nodes = terminals or <empty>. Non-leaf nodes = nonterminals For any subtree, the root is the left-side nonterminal of some production, while its children, if read from left to right, make up the right side of the production. The leaf nodes, read from left to right, make up a string of the language defined by the CFG.

Example 11: CFG/BNF [Expression] < Expr > ::= < Expr >< Op >< Expr > < Expr > ::= (< Expr >) < Expr > ::= < Id > < Op > ::= + - * / = < Id > ::= a b c 1. Terminals: a, b, c, +, -, *, /, =, (, ) 2. Nonterminals: Expr, Op, Id 3. Start symbol: Expr

Parse Tree : Example 12 [Expression] A parse tree of a + b c generated by the CFG of Example 10: <Expr> / \ <Expr> <Op> <Expr> / \ <Expr> <Op> <Expr> - <Id> <Id> + <Id> c a b Question: What is the difference between a parse tree and an abstract syntax tree?

Ambiguous Grammar: Example 13 A grammar is (syntactically) ambiguous if some string in its language is generated by more than one parse tree. <Expr> / \ <Expr> <Op> <Expr> / \ <Id> + <Expr> <Op> <Expr> a <Id> - <Id> b c Solution: Rewrite the grammar to make it unambiguous.

Handle Left Associativity: Example 14 CFG of Example 10 cannot handle a + b c correctly. Add a left recursive production. < Expr > ::= < Expr >< Op >< Term > < Expr > ::= < Term > < Term > ::= (< Expr >) < Id > < Op > ::= + - * / = < Id > ::= a b c

Handle Left Associativity.. Now there is only one parse tree for a + b c : <Expr> / \ / \ <Expr> <Op> <Term> / \ <Expr> <Op> <Term> - <Id> <Term> + <Id> c <Id> b a

Handling Right Associativity: Example 15 CFG of Example 10 cannot handle a = b = c correctly. Add a right recursive production. < Assign > ::= < Expr > = < Assign > < Assign > ::= < Expr > < Expr > ::= < Expr >< Op >< Term > < Term > < Term > ::= (< Expr >) < Id > < Op > ::= + - * / < Id > ::= a b c Question: this grammar will accept strings like a + b = c - d. Try to correct it.

Handling Right Associativity.. Now there is only one parse tree for a = b = c : <Assign> / \ / \ <Expr> = <Assign> / \ <Term> <Expr> = <Assign> <Id> <Term> <Expr> a <Id> <Term> b <Id> c

Handling Precedence: Example 16 CFG of Example 10 cannot handle a + b c correctly. Add one nonterminal (plus appropriate productions) for each precedence level. < Assign > ::= < Expr > = < Assign > < Expr > < Expr > ::= < Expr > + < Term > < Expr > ::= < Expr > < Term > < Term > < Term > ::= < Term > < Factor > < Term > ::= < Term > / < Factor > < Factor > < Factor > ::= (< Expr >) < Id > < Id > ::= a b c

Handling Precedence.. Now there is only one parse tree for a + b c : <Assign> <Expr> / \ / \ <Expr> + <Term> / \ <Term> <Term> * <Factor> <Factor> <Factor> <Id> <Id> <Id> c a b

Tips on Handling Precedence/Associativity left associativity left-recursive production right associativity right-recursive production n levels of precedence divide the operators into n groups write productions for each group of operators start with operators with the lowest precedence In all cases, introduce new non-terminals whenever necessary. In general, one needs a new non-terminal for each new group of operators of different associativity and different precedence.

Dangling-Else: Example 17 Consider the following grammar: < S > ::= if < E > then < S > < S > ::= if < E > then < S > else < S > How many parse trees can you find for the statement: if E 1 then if E 2 then S 1 else S 2

Dangling-Else.. S E 1 if then S E 2 S 1 if then else S 2 S E 1 if then S else S 2 if E 2 then S 1

Dangling-Else... Ambiguity is often a property of a grammar, not of a language. Solution: matching an else with the nearest unmatched if. i.e. the first case.

More CFG Examples 1 < S > ::= < A >< B >< C > < A > ::= a< A > a < B > ::= b< B > b < C > ::= c< C > c 2 < S > ::= < A > a < B > b < A > ::= < A > b b < B > ::= a< B > a 3 <stmts> ::= <empty> <stmt> ; <stmts> <stmt> ::= <id> := <expr> if <expr> then <stmt> if <expr> then <stmt> else <stmt> while <expr> do <stmt> begin <stmts> end

Non-Context Free Grammars: Examples 1 < S > ::= < B >< A >< C > < C >< A >< B > b< A > ::= c< A >< B > < B > c< A > ::= b< A >< C > < C > < B > ::= b < C > ::= c L = { (cb) n, b(cb) n, (bc) n, c(bc) n }. 2 L = { wcw w is a string of a s or b s }. This language abstracts the problem of checking that an identifier is declared before its use in a program. The first w = declaration of the identifier, and the second w = its use in the program.

Summary on Grammar Context-free grammar (CFG) is commonly used to specify most of the syntax of a programming language. However, most programming languages are not CFL! CFG is commonly written in Backus-Naur Form (BNF). CFG = (Terminals, Nonterminals, Productions, Start Symbol) A program is valid if we may construct a parse tree, or a derivation from the grammar. Associativity and precedence of operations are part of the design of a CFG. Avoid ambiguous grammars by rewriting them or imposing parsing rules.