Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)


We can analyze the syntax of a computer program on two levels:
1. Lexical level
2. Syntactic level

The lexical analyzer collects characters into tokens. The syntactic analyzer determines the syntactic structure and checks whether a given program is syntactically correct.

Derivation

A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols).

Grammar:
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const

Derivation:
<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
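The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the grammar is stored as a Python dictionary; the names `GRAMMAR` and `leftmost_derivation` are illustrative and not part of the slides. At each step it rewrites the leftmost nonterminal using the alternative selected by an index.

```python
# Grammar from the slide: nonterminal -> list of alternatives.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(start, choices):
    """Apply one rule per step to the leftmost nonterminal.

    `choices` gives, for each step, the index of the alternative to use.
    Returns every sentential form, from the start symbol to the sentence.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost symbol that is still a nonterminal.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

The list of choices encodes which alternative is picked at each step; a different list yields a different sentence of the same language.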

Parse Tree: a hierarchical representation of a derivation.

For example, x + y + z is parsed with the grammar:
<exp> -> <exp> <binary> <exp> | <identifier> | <literal> | <unary> <exp>
<binary> -> '<' | '>' | '+' | '-'
<unary> -> '-'

[Figure: parse tree for x + y + z, with <identifier> leaves x, y and z]

An Ambiguous Expression Grammar

A grammar is ambiguous if some string it generates has more than one parse tree.

<E> -> <E> + <E> | <E> - <E> | <E> * <E> | <E> / <E> | id

There are two parse trees for the expression id + id * id:

Tree 1: (id + id) * id        Tree 2: id + (id * id)

          E                             E
        / | \                         / | \
       E  *  E                       E  +  E
     / | \   |                       |   / | \
    E  +  E  id                      id E  *  E
    |     |                             |     |
    id    id                            id    id

Ambiguity is a problem: the compiler chooses the code to be generated for a statement by examining its parse tree. If a statement has more than one parse tree, its meaning cannot be determined uniquely.

Another example:

<expr> -> <expr> <op> <expr> | const
<op> -> / | -

There are two parse trees for the expression const - const / const.

[Figure: two parse trees, one grouping the expression as (const - const) / const and the other as const - (const / const)]

Exercise:

<Exp> -> <Num> | <Exp> + <Exp> | <Exp> * <Exp>
<Num> -> 2 | 3 | 5

Find two parse trees for the string 2 + 3 * 5.

[Figure: parse tree for 2 + 3 * 5 with * at the root, grouping the string as (2 + 3) * 5]

To avoid ambiguity, the grammar must generate only one possible structure for each string in the language. The ambiguity can be eliminated by imposing the precedence of one operator over the other.
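To make the ambiguity concrete, the sketch below enumerates every parse tree that the exercise grammar allows for a token list. It is an illustration only: the function name `parse_trees` and the tuple representation of trees are assumptions, not part of the slides.

```python
def parse_trees(tokens):
    """Return every parse tree of `tokens` under
    <Exp> -> <Num> | <Exp> + <Exp> | <Exp> * <Exp>,  <Num> -> 2 | 3 | 5.

    A tree is either a number token or a tuple (left, operator, right).
    """
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] in {"2", "3", "5"} else []
    trees = []
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in {"+", "*"}:
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


trees = parse_trees("2 + 3 * 5".split())
for tree in trees:
    print(tree)
```

For 2 + 3 * 5 the function finds exactly two trees, one with + at the root and one with * at the root, which is what makes the grammar ambiguous.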

Imposing the precedence of one operator over the other

We say that op1 has precedence over op2 if an expression of the form e1 op1 e2 op2 e3 is interpreted only as (e1 op1 e2) op2 e3. In other words, op1 binds tighter than op2.

In terms of derivation trees, interpreting e1 op1 e2 op2 e3 as (e1 op1 e2) op2 e3 means that op1 must be introduced at a level strictly lower than op2. To modify the grammar so that it generates only this kind of tree, one possible solution is to introduce a new syntactic category that produces expressions of the form e1 op e2, and to force an order between op1 and op2.

<Exp> -> Num | <Exp> + <Exp> | <Exp> * <Exp>
Num -> 2 | 3 | 5

We can eliminate the precedence ambiguity from this grammar by introducing a new syntactic category Term that produces the expressions of the form <Exp> * <Exp>:

<Exp> -> <Exp> + <Exp> | <Term>
<Term> -> <Term> * <Term> | Num
Num -> 2 | 3 | 5

This modification corresponds to assigning * a higher priority than +.

In the new grammar there is only one parse tree for the string 2 + 3 * 5:

<Exp> -> <Exp> + <Exp> | <Term>
<Term> -> <Term> * <Term> | Num
Num -> 2 | 3 | 5

            Exp
          /  |  \
       Exp   +   Exp
        |         |
       Term      Term
        |       /  |  \
       Num   Term  *  Term
        |      |        |
        2     Num      Num
               |        |
               3        5

Associativity

The previous grammar is unambiguous regarding the precedence of * and +, but it still has ambiguities of another kind. It allows two different derivation trees for the string 2 + 3 + 5, one corresponding to the structure (2 + 3) + 5 and one corresponding to the structure 2 + (3 + 5).

<expr> -> <expr> <op> <expr> | const
<op> -> +

There are two parse trees for the expression const + const + const.

[Figure: two parse trees, one grouping the expression as (const + const) + const and the other as const + (const + const)]

This kind of ambiguity does not cause problems here, because + is associative: both groupings produce the same value.

However, an operator might not be associative. This is the case, for instance, for the - operator and the ^ (exponentiation) operator: (5 - 3) - 2 and 5 - (3 - 2) have different values, as do (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2).

To eliminate this kind of ambiguity, we must establish whether the operator is left-associative or right-associative.

Left-associative: e1 op e2 op e3 is interpreted as (e1 op e2) op e3 (op associates to the left).
Right-associative: e1 op e2 op e3 is interpreted as e1 op (e2 op e3) (op associates to the right).

We can impose left-associativity (resp. right-associativity) by using a left-recursive (resp. right-recursive) production for op.

<Exp> -> <Exp> + <Exp> | <Term>
<Term> -> <Term> * <Term> | <Num>

is changed again to

<Exp> -> <Exp> + <Term> | <Term>
<Term> -> <Term> * Num | Num

This grammar is now unambiguous.
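The final grammar above can be turned into a small hand-written parser. The sketch below is one possible way to do this, not something taken from the slides: since a recursive function cannot call itself first on a left-recursive production, each left-recursive rule is implemented as a loop, a common technique that still builds left-associative trees. The function names and the tuple representation are assumptions, and Num is taken to be any decimal integer.

```python
def parse_exp(tokens):
    """Parse <Exp> -> <Exp> + <Term> | <Term>; return (tree, remaining tokens)."""
    tree, rest = parse_term(tokens)
    # Each further "+ <Term>" wraps the tree built so far: left-associative.
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        tree = (tree, "+", right)
    return tree, rest


def parse_term(tokens):
    """Parse <Term> -> <Term> * Num | Num; return (tree, remaining tokens)."""
    tree, rest = parse_num(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_num(rest[1:])
        tree = (tree, "*", right)
    return tree, rest


def parse_num(tokens):
    """Parse a single Num token (assumed here to be a decimal integer)."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError(f"expected a number, got {tokens[:1]}")
    return int(tokens[0]), tokens[1:]


def parse(text):
    """Parse a whitespace-separated expression and return its single tree."""
    tree, rest = parse_exp(text.split())
    if rest:
        raise SyntaxError(f"unexpected input: {rest}")
    return tree


print(parse("2 + 3 * 5"))
print(parse("2 + 3 + 5"))
```

Because <Term> is parsed below <Exp>, the * operator ends up deeper in the tree than +, and repeated + operators nest to the left.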

Another Example

<Exp> -> Num | <Exp> - <Exp>

This grammar is ambiguous since it allows both interpretations (5 - 3) - 2 and 5 - (3 - 2). To impose left-associativity:

<Exp> -> Num | <Exp> - Num

Consider the following grammar:

<Exp> -> Num | <Exp> ^ <Exp>

This grammar is ambiguous since it allows both interpretations (5 ^ 3) ^ 2 and 5 ^ (3 ^ 2). To impose right-associativity:

<Exp> -> Num | Num ^ <Exp>
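The effect of the two recursive productions on the computed value can be sketched directly. In this illustration (the function names are assumptions), each function mirrors the shape of one production: the left-recursive rule peels off the last operand, and the right-recursive rule peels off the first.

```python
def eval_minus(nums):
    """<Exp> -> Num | <Exp> - Num: the last Num is subtracted from the rest."""
    if len(nums) == 1:
        return nums[0]
    return eval_minus(nums[:-1]) - nums[-1]


def eval_power(nums):
    """<Exp> -> Num | Num ^ <Exp>: the first Num is raised to the rest."""
    if len(nums) == 1:
        return nums[0]
    return nums[0] ** eval_power(nums[1:])


print(eval_minus([5, 3, 2]))  # evaluated as (5 - 3) - 2
print(eval_power([5, 3, 2]))  # evaluated as 5 ^ (3 ^ 2)
```

The first call yields 0 rather than 4, and the second yields 5 to the 9th power rather than 125 squared, matching the left-associative and right-associative readings respectively.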

Generally, we can eliminate ambiguity by revising the grammar.

Grammar:
<E> -> <E> + <E> | <E> * <E> | ( <E> ) | id

There are two parse trees for the expression id + id * id.

It is possible to write a grammar for arithmetic expressions that
1. is unambiguous,
2. enforces the precedence of * and / over + and -,
3. enforces left associativity.

Removal of Ambiguity

Grammar:
<E> -> <E> + <E> | <E> * <E> | ( <E> ) | id

1. Enforce higher precedence for *:
<E> -> <E> + <E> | <T>
<T> -> <T> * <T> | id | ( <E> )

2. Eliminate right-recursion for <E> -> <E> + <E> and <T> -> <T> * <T>:
<E> -> <E> + <T> | <T>
<T> -> <T> * id | <T> * ( <E> ) | id | ( <E> )

Example: position := initial + rate * 60

Lexical Analysis

Group the characters into tokens as follows:

Lexeme      Token
position    identifier
:=          assignment symbol
initial     identifier
+           plus sign
rate        identifier
*           multiplication sign
60          literal

Next, we want to determine that this is a structurally correct statement. This is the main province of syntax analysis, or parsing. The result of parsing is a parse tree.
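The grouping of characters into tokens can be sketched with regular expressions. The token categories below follow the slide; the regular expressions themselves, and the names `TOKEN_SPEC` and `tokenize`, are assumptions made for this illustration.

```python
import re

# Token categories follow the slide; the patterns are assumptions.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("literal", r"[0-9]+"),
    ("assignment symbol", r":="),
    ("plus sign", r"\+"),
    ("multiplication sign", r"\*"),
]


def tokenize(text):
    """Group the characters of `text` into (lexeme, category) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token
            continue
        for category, pattern in TOKEN_SPEC:
            match = re.compile(pattern).match(text, pos)
            if match:
                tokens.append((match.group(), category))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
    return tokens


for lexeme, category in tokenize("position := initial + rate * 60"):
    print(f"{lexeme:10} {category}")
```

The output lists the same seven tokens as the table above; a parser would then consume this token list instead of raw characters.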

<assignment statement> -> <identifier> <assignment symbol> <expression>
<expression> -> <expression> <plus sign> <expression>
<expression> -> <expression> <minus sign> <expression>
<expression> -> <expression> <multiplication sign> <expression>
<expression> -> <identifier> | <literal>

[Figure: parse tree for the assignment statement position := initial + rate * 60]

Exercise 1. A grammar of binary numbers

Write a grammar of the language whose elements are all and only those unsigned binary numbers that contain at least three consecutive digits 1. The language includes, for example, 111, 00001111111010 and 1111110, but not 0011000101011 or 1010101010.

Answer 1. A grammar of binary numbers

<string> -> <term> | <mix> <term> | <term> <mix> | <mix> <term> <mix>
<mix> -> <bit> <mix> | <bit>
<bit> -> 0 | 1
<term> -> 1 1 1

or

<string> -> <term> | <bit> <string> | <string> <bit>
<bit> -> 0 | 1
<term> -> 1 1 1
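The second answer grammar can be checked against the examples in the exercise. The following sketch is a direct, memoized recognizer for that grammar; the function name `is_string` is an assumption, and the comparison with a plain substring search is added only as a sanity check.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def is_string(s):
    """Recognize <string> -> <term> | <bit> <string> | <string> <bit>,
    with <bit> -> 0 | 1 and <term> -> 1 1 1."""
    if s == "111":  # <term>
        return True
    if len(s) <= 3 or set(s) - {"0", "1"}:
        return False  # too short to contain <term>, or not a binary number
    # Strip one <bit> from the left or from the right and try again.
    return is_string(s[1:]) or is_string(s[:-1])


examples = ["111", "00001111111010", "1111110", "0011000101011", "1010101010"]
for s in examples:
    print(s, is_string(s), "111" in s)
```

For every example the grammar-based answer agrees with the substring test, accepting the first three strings and rejecting the last two.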

Exercise 2. Parse trees

Consider the following grammar with three terminal symbols: a, b, g.

<S> -> a <A> | <B> g
<A> -> <C> g | b <A> | a <A>
<B> -> <C> b | <C> <B>
<C> -> a

The start symbol is <S>. Consider the string a b a g. Show the leftmost derivation of a b a g. Next, draw a parse tree for the string a b a g.

Answer 2. Parse trees

<S> => a <A>
    => a b <A>
    => a b <C> g
    => a b a g

        <S>
       /   \
      a    <A>
          /   \
         b    <A>
             /   \
           <C>    g
            |
            a

Exercise:

(1) Describe, in English, the language defined by the following grammar:

<S> -> <A> <B> <C>
<A> -> a <A> | a
<B> -> b <B> | b
<C> -> c <C> | c

(2) Write a grammar for the language consisting of strings that have n copies of the letter a followed by the same number of copies of the letter b, where n > 0. For example, the strings ab and aaaabbbb are in the language, but a and abb are not.

(3) Convert the following BNF to EBNF:

<assign> -> <id> = <expr>
<id> -> A | B | C
<expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>

Answer:

(1) One or more a's followed by one or more b's followed by one or more c's.

(2) <S> -> a <S> b | a b

(3) <assign> -> <id> = <expr>
    <id> -> A | B | C
    <expr> -> <expr> (+ | *) <expr> | ( <expr> ) | <id>
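The grammar given as the answer to (2) can be checked with a short recognizer. This is a minimal sketch, with the function name `matches_s` chosen for illustration; it peels one a from the front and one b from the back at each step, mirroring the production <S> -> a <S> b.

```python
def matches_s(text):
    """Recognize <S> -> a <S> b | a b."""
    if text == "ab":  # base production: a b
        return True
    if len(text) > 2 and text[0] == "a" and text[-1] == "b":
        return matches_s(text[1:-1])  # recursive production: a <S> b
    return False


for s in ["ab", "aaaabbbb", "a", "abb"]:
    print(s, matches_s(s))
```

The strings ab and aaaabbbb are accepted, while a and abb are rejected, in line with the examples in the exercise.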