Chapter 3. Describing Syntax and Semantics


Chapter 3. Topics

The terminology of syntax; a syntax description method: context-free grammars (Backus-Naur Form); derivations; parse trees; ambiguity; operator precedence and associativity; Extended Backus-Naur Form; attribute grammars, used to describe both syntax and static semantics; formal methods of describing semantics: operational semantics, axiomatic semantics, and denotational semantics.

Languages

A language, whether natural (such as English) or artificial (such as Java), is a set of sentences. A language has rules (a grammar) that describe its sentences, and each sentence has a meaning (its semantics).

Syntax: the form or structure of the expressions, statements, and program units.
Semantics: the meaning of the expressions, statements, and program units.

Syntax and semantics together provide a language's definition. Users of a language definition include other language designers, implementers, and programmers (the users of the language).

The General Problem of Describing Syntax (Terminology)

A sentence (statement) is a string of characters over some alphabet. A language is a set of sentences. A lexeme is the lowest-level syntactic unit of a language. A token is a category of lexemes.

Ex) index = 2 * count + 17;

    Lexeme    Token
    index     identifier
    =         equal_sign
    2         int_literal
    *         mult_op
    count     identifier
    +         plus_op
    17        int_literal
    ;         semicolon

(A small tokenizer sketch for this example appears at the end of this section.)

Formal Definition of Languages

In general, languages can be formally defined in two distinct ways: by recognition and by generation.

Recognizers: a recognition device reads input strings over the alphabet of the language L and decides whether each input string belongs to the language. Example: the syntax analysis part of a compiler (a detailed discussion of syntax analysis appears in Chapter 4).

Generators: a device that generates the sentences of a language. One can determine whether a particular sentence is syntactically correct by comparing it to the structure of the generator.

BNF and Context-Free Grammars

Context-free grammars were developed by Noam Chomsky in the mid-1950s. They are useful for describing the syntax of programming languages; the syntax of whole programming languages, with minor exceptions, can be described by context-free grammars. Backus-Naur Form (1959) was invented by John Backus to describe the syntax of Algol 58; BNF is nearly identical to context-free grammars.

Grammar

A grammar G is defined as a quadruple G = (V, T, S, P), where V is a finite set of variables (nonterminals), T is a finite set of terminals, S is the start variable (S ∈ V), and P is a finite set of production rules. The production rules are of the form X → Y, where X ∈ (V ∪ T)+ and Y ∈ (V ∪ T)*.
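To make the lexeme/token distinction concrete, here is a minimal tokenizer sketch in Python for the example statement above. The token names follow the table; the function name tokenize and the regular expressions are illustrative choices of this sketch, not part of any particular compiler.

    import re

    # Token categories follow the example table above (illustrative names).
    TOKEN_SPEC = [
        ("int_literal", r"\d+"),
        ("identifier",  r"[A-Za-z_]\w*"),
        ("equal_sign",  r"="),
        ("mult_op",     r"\*"),
        ("plus_op",     r"\+"),
        ("semicolon",   r";"),
        ("skip",        r"\s+"),
    ]

    def tokenize(source: str):
        """Split a source string into (lexeme, token) pairs."""
        pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
        for m in re.finditer(pattern, source):
            if m.lastgroup != "skip":
                yield m.group(), m.lastgroup

    if __name__ == "__main__":
        for lexeme, token in tokenize("index = 2 * count + 17;"):
            print(f"{lexeme:8} {token}")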

Grammar

A grammar G = (V, T, S, P) is said to be context-free if all productions in P have the form X → Y, where X ∈ V and Y ∈ (V ∪ T)*. A grammar G = (V, T, S, P) is said to be context-sensitive if all productions in P have the form X → Y, where X, Y ∈ (V ∪ T)+ and |X| ≤ |Y|.

Let G = (V, T, S, P) be a grammar. The language generated by G is denoted by

    L(G) = { w | w ∈ T*, S ⇒* w }

Ex) The grammar G = ({S}, {a, b}, S, P) with production rules

    P: S → aSa | bSb | λ

generates the language L(G) = { w w^R | w ∈ {a, b}* }, the set of even-length palindromes over {a, b}.
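A quick way to get a feel for L(G) is to enumerate sentences while bounding the number of rule applications. The following Python sketch (illustrative, not from the text) generates strings from S → aSa | bSb | λ and checks that each one is of the form w w^R, i.e., a palindrome.

    def generate(depth: int):
        """Generate all strings derivable from S -> aSa | bSb | lambda
        using at most `depth` rule applications (illustrative sketch)."""
        if depth == 0:
            return {""}                      # only S -> lambda remains
        shorter = generate(depth - 1)
        result = {""}                        # S -> lambda
        for w in shorter:
            result.add("a" + w + "a")        # S -> aSa
            result.add("b" + w + "b")        # S -> bSb
        return result

    if __name__ == "__main__":
        for w in sorted(generate(3), key=len):
            assert w == w[::-1]              # every sentence is w wR, a palindrome
            print(repr(w))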

Ex) L ={a n b n n0} Sol) The grammar G = {S, {a, b}, S, P) with production rules P: S asb Ex) L = {ww R w {a, b}*} is context-free language? Sol) The grammar G = {S, {a, b}, S, P) with production rules P P: S asa bsb Ex) L ={a n b m n m} is context-free language? Sol) The grammar G = ({S, A, B, C}, {a, b}, S, P) with production rules P S AC CB A aa a B bb b C acb 13 14 (Leftmost and Rightmost derivation) (Leftmost and Rightmost derivation) Definition 5.2) Leftmost derivation in each step of derivation, the leftmost variable in the sentential form is replaced. Rightmost derivation in each step of derivation, the rightmost variable in the sentential form is replaced. Ex) G = ({A,B,S}, {a, b}, S, P) with production rule S AB A aaa B Bb Lets derive a string aab. Leftmost derivation: S AB aaab aab aabb aab Rightmost derivation: S AB ABb Ab aaab aab 15 16 (Derivation Tree) (Derivation Tree) Definition ) For context-free grammar G = (V, T, S, P). An ordered tree is a derivation tree for G if and only if it has the following properties. The root is labeled S. Every leaf has a label from T { } Every interior vertex has a label from V If a vertex has label A V, and its children are labeled x 1,x 2, x n then there must be a production rule; A x 1 x 2 x n A leaf labeled has no siblings, since applied production rule A A derivation tree with every leaf is not labeled with T {} is called partial derivation tree. The strings obtained by reading the leaves of derivation tree from left to right is said to be the yield of tree. Ex) G = ({A,B,S}, {a, b}, S, P) with production rule S AB A aaa B Bb S AB aaab aab aabb aab 17 18 3

S AB aaab aab aabb aab S A B a a B B b (Derivation Tree) Theorem ) Let G = (V, T, S, P) be a context-free grammar. Then for every w L(G), there exist a derivation tree of G whose yield is w. Also, if t G is any partial derivation tree for G whose root is labeled S, then the yield of t G is a sentential form of G. 19 20 (Parsing and Ambiguity) Parsing A process finding a sequence of productions by which we can decide any w is in L(G) or not. Top-down parsing Start from starting symbol to the string with scanning left to right through the string corresponds to a left-most derivation. Bottom-up parsing Start from the sting to starting symbol with scanning left to right through the string corresponds to a right-most derivation in reverse order. (Parsing and Ambiguity) Ex) E E + T E T T T T * F T/F F F (E) a Parse for a + (a * a) Top-down parsing with leftmost derivation E E + T T + T F + T a + T a + F a + (E) a + (T) a + (T * F) a + (F * F) a + (a * F) a + (a * a) Bottom-up parsing with reverse of rightmost derivation a + (a * a) F + (a * a) T + (a * a) E + (a * a) E + (F * a) E + (T * a) E + (T * F) E + (T) E + (T) E + (E) E + F E + T E 21 22 (Parsing and Ambiguity) (Parsing and Ambiguity) E E + T Top-down parsing Bottom-up parsing E T F T F E F ( E ) T a T E T * F T T F a F F F a a + ( a * a ) 23 24 4

(Parsing and Ambiguity) Definition) A grammar G is said ambiguous if there exists w L(G) which has more than one distinct derivation (parse) tree. Ex) Consider the grammar G = (V, T, E, P) with V={E, I} T = {a, b, c, +, *, (, )} And production rules E I E + E E * E (E) I a b c (Parsing and Ambiguity) Definition ) Lets L is a context-free language, If every context-free grammar G L = L(G) is ambiguous, then we call L is inherently ambiguous. 25 26 Syntactic Ambiguity Syntactic Ambiguity Ambiguity in Grammars and Languages Ex) Show following grammar is ambiguous. A grammar G is said ambiguous if there exists w L(G) which has more than one distinct parse tree (multiple meanings). E E + E E E E * E E / E (E) D D 0 1 2 3 4 5 6 7 8 9 27 28 Syntactic Ambiguity BNF Fundamentals For string 2 + 4 * 3 E E + E D E * E 2 D D 4 3 E E * E + E D D 2 4 E D 3 In BNF, abstractions (nonterminal variables) are used to represent classes of syntactic structures-- they act like syntactic variables (also called nonterminal symbols, or just terminals) Terminals are lexemes or tokens A production rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals (variables) X -> Y X a nonterminal variable Y a string of terminals and/or nonterminals 29 30 5

BNF Fundamentals Backus-Naur form (BNF) it is also context-free grammar In BNF, nonterminals (variables) are enclosed in triangular brackets (<>). Terminal symbols are written without any special marking. Empty string is written as <empty>. ->(or ::=) is interpreted as can be derived to and is interpreted as or at production rule. Ex) BNF production rules for real number <real-num> -> <fraction>. <fraction> <fraction>: -> <digit> <digit> <fraction> <digit> -> 0 1 2 3 4 5 6 7 8 9 BNF Fundamentals Nonterminal symbols can have two or more distinct definitions, representing two or more possible syntactic forms in the language Ex) <if_stmt> if ( <logic_expr> ) <stmt> if ( <logic_expr> ) <stmt> else <stmt> Syntactic lists are described using recursion <ident_list> ident ident, <ident_list> A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols) 31 32 Derivations Every string of symbols in a derivation is a sentential form A sentence is a sentential form that has only terminal symbols A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded A derivation may be neither leftmost nor rightmost An Example Grammar 1 <program> <stmts> <stmts> <stmt> <stmt> ; <stmts> <stmt> <var> = <expr> <var> a b c d <expr> <term> + <term> <term> - <term> <term> <var> const Derivation for a sentense a = b + const from the grammar <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const 33 34 An Example Grammar 2 <program> begin <stmt_list> end <stmt_list> <stmt> <stmt> ; <stmt_list> <stmt> <var> = <expression> <var> A B C <expression> <var> + <var> <var> <var> <var> The derivation of sentence begin A = B + C ; B = C end from the grammar 2 <program> => begin <stmt_list> end => begin <stmt> ; <stmt_list> end => begin <var> = <expression> ; <stmt_list> end => begin A = <expression> ; <stmt_list> end => begin A = <var> + <var> ; <stmt_list> end => begin A = B + <var> ; <stmt_list> end => begin A = B + C ; <stmt_list> end => begin A = B + C ; <stmt> end => begin A = B + C ; <var> = <expression> end => begin A = B + C ; B = <expression> end => begin A = B + C ; B = <var> end => begin A = B + C ; B = C end Ex. Grammar for Assignment Statement <assign> = <expr> A B C <expr> + <expr> * <expr> ( <expr> ) The left most derivation for sentence A = B * ( A + C ) <assign> => = <expr> => A = <expr> => A = * <expr> => A = B * <expr> => A = B * ( <expr> ) => A = B * ( + <expr> ) => A = B * ( A + <expr> ) => A = B * ( A + ) => A = B * ( A + C ) 35 36 6

Parse Tree Grammars naturally describe the hierarchical syntactic structure (Parse Tree) of the sentences of the languages they define. Ex) A= B * (A + C) Ambiguous Grammar A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees Ex) A = B * (A + C) with previous production rules Simple Assignment Grammar <assign> = <expr> A B C <expr> + <expr> * <expr> ( <expr> ) Parse tree for A= B * (A + C) 37 38 Ambiguous Grammar Operator Precedence Syntactic ambiguity of language structures is a problem because compilers often base the semantics of those structures on their syntactic form. If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely. Some parsing algorithms can be based on ambiguous grammars. When such a parser encounters an ambiguous construct, it uses non-grammatical information provided by the designer to construct the correct parse tree. In many cases, an ambiguous grammar can be rewritten to be unambiguous but still generate the desired language. When an expression includes more than one different operators, one semantic issue is the order of evaluation of operators. (ex. X + Y * Z) By assigning different precedence levels to operators,this semantic question can be answered. Higher precedence is assigned to *, / than +, - Higher precedence operators evaluate before lower precedence operators. Between same precedence operators, evaluate left to right. A grammar can describe a certain syntactic structure so that part of the meaning of the structure can be determined from its parse tree (evaluate from the bottom to up in parse tree). 39 40 Operator Precedence Operator Precedence Some grammar is not ambiguous but might not build proper parse tree for proper evaluation based on the precedence. Unambiguous Grammar for Expressions without considering precedence. <assign> = <expr> A B C <expr> + <expr> * <expr> ( <expr> ) Ex) check following two expressions A + B * C A * B + C Unambiguous grammar without considering precedence <assign> = <expr> A B C <expr> + <expr> * <expr> ( <expr> ) A = A + B * C <assign> => = <expr> => A = <expr> => A = + <expr> => A = A + <expr> => A = A + * <expr> => A = A + B * <expr> => A = A + B * => A = A + B * C <assign> = <expr> A + <expr> A B * <expr> B * C evaluate first C 41 42 7

Operator Precedence Operator Precedence Unambiguous grammar without considering precedence <assign> = <expr> A B C <expr> + <expr> * <expr> ( <expr> ) A = A * B + C <assign> => = <expr> => A = <expr> => A = * <expr> => A = A * <expr> => A = A * + <expr> => A = A * B + <expr> => A = A * B + => A = A * B + C B + C evaluate first <assign> = <expr> A * <expr> A B + <expr> C Unambiguous Grammar for Expressions with considering precedence <assign> = <expr> A B C <expr> <expr> + <term> <term> <term> <term> * <factor> <factor> <factor> ( <expr> ) Ex) check following two expressions A = B + C * A A = B * C + A 43 44 Operator Precedence Operator Precedence Unambiguous grammar without considering precedence <assign> = <expr> A B C <expr> <expr> + <term> <term> <term> <term> * <factor> <factor> <assign> <factor> ( <expr> ) Unambiguous grammar without considering precedence <assign> = <expr> A B C <expr> <expr> + <term> <term> <term> <term> * <factor> <factor> <assign> <factor> ( <expr> ) A = B + C * A <assign> => = <expr> => A = <expr> = <expr> A <expr> + <term> A = B * C + A <assign> => = <expr> => A = <expr> = <expr> A <expr> + <term> => A = <expr> + <term> => A = <term> + <term> <term> <term> * <factor> => A = <expr> + <term> => A = <term> + <term> <term> <factor> => A = <factor> + <term> => A = + <term> => A = B + <term> => A = B + <term> * <factor> => A = B + <factor> * <factor> => A = B + * <factor> => A = B + C * <factor> <factor> B <factor> C A => A = <term> * <factor> + <term> => A = <factor> * <factor> + <term> => A = * <factor> + <term> => A = B * <factor> + <term> => A = B * + <term> => A = B * C + <term> => A = B * C + <factor> <term> <factor> * <factor> C A => A = B + C * => A = B + C * A => A = B * C + => A = B + C * A B 45 46 Associativity of Operators Associativity of Operators The rule of associativity When an expression includes two operators that have the same precedence, which should be evaluate first. Ex) A + B + C A * B * C When a grammar rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive. This left recursion specifies left associativity. From the previous grammar <assign> = <expr> A B C <expr> <expr> + <term> <term> <term> <term> * <factor> <factor> <factor> ( <expr> ) Unfortunately, left recursion disallows the use of some important syntax analysis algorithms. When such algorithms are to be used, the grammar must be modified to remove the left recursion. Fortunately, left associativity can be enforced by the compiler, even though the grammar does not dictate it. The exponentiation operator is right associative. To indicate right associativity, right recursion can be used. <factor> <exp> ** <factor> <exp> <exp> ( <expr> ) 47 48 8

An Unambiguous Grammar for if-then-else <stmt> <if_stmt> <if_stmt> if <logic_expr> then <stmt> if <logic_expr> then <stmt> else <stmt> An Unambiguous Grammar for if-then-else <stmt> <if_stmt> <if_stmt> if <logic_expr> then <stmt> if <logic_expr> then <stmt> else <stmt> Ex) if <logic_expr> then if <logic_expr> then <stmt> else <stmt> <stmt> => <if_stmt> => if <logic_expr> then <stmt> else <stmt> => if <logic_expr> then <if_stmt> else <stmt> => if <logic_expr> then if <logic_expr> then <stmt> else <stmt> Ex) if <logic_expr> then if <logic_expr> then <stmt> else <stmt> <stmt> => <if_stmt> => if <logic_expr> then <stmt> => if <logic_expr> then <if_stmt> => if <logic_expr> then if <logic_expr> then <stmt> else <stmt> 49 50 An Unambiguous Grammar for if-then-else The rule for if constructs in many languages is that an else clause is matched with the nearest previous unmatched then. Therefore, there cannot be an if statement without an else between a then and its matching else. So, statements must be distinguished between those that are matched and those that are unmatched, where unmatched statements are else-less ifs and all other statements are matched. An Unambiguous Grammar for if-then-else <stmt> <matched> <unmatched> <matched> if <logic_expr> then <matched> else <matched> any non-if statement <unmatched> if <logic_expr> then <stmt> if <logic_expr> then <matched> else <unmatched> Ex) if <logic_expr> then if <logic_expr> then <stmt> else <stmt> <stmt> => <unmatched> => if <logic_expr> then <stmt> => if <logic_expr> then <matched> => if <logic_expr> then if <logic_expr> then <matched> else <matched> => if <logic_expr> then if <logic_expr> then <stmt> else <stmt> 51 52 An Unambiguous Grammar for if-then-else if <logic_expr> then if <logic_expr> then <stmt> else <stmt> <stmt> <unmatched> if <logic_expr> then <stmt> <matched> if <logic_expr> then <matched> else <matched> <stmt> <stmt> Extended BNF (EBNF) The extensions do not enhance the descriptive power of BNF; they only increase its readability and writability. Three extensions are commonly included in the various versions of EBNF. 1. The first of these denotes an optional part of an RHS, which is delimited by brackets Ex) BNF <if_stmt> if (<expression>) <statement> if (<expression>) <statement> else <statement> Can be expressed with EBSF <if_stmt> if (<expression>) <statement> [else <statement>] 53 54 9

Extended BNF (EBNF) The extensions do not enhance the descriptive power of BNF; they only increase its readability and writability. Three extensions are commonly included in the various versions of EBNF. 2. The Second common extension deals with multiplechoice options. Ex) BNF <term> <term> * <factor> <term> / <factor> <term> % <factor> Can be expressed with EBNF <term> <term> (* / %) <factor> Extended BNF (EBNF) The extensions do not enhance the descriptive power of BNF; they only increase its readability and writability. Three extensions are commonly included in the various versions of EBNF. 3. The third extension is the use of braces in an RHS to indicate that the enclosed part can be repeated indefinitely or left out altogether. Ex) lists of identifiers separated by commas can be <ident_list> <identifier> {, <identifier>} The brackets, braces, and parentheses in the EBNF extensions are metasymbols, which means they are notational tools and not terminal symbols in thesyntactic entities they help describe. 55 56 Extended BNF (EBNF) Extended BNF (EBNF) BNF: EBNF: <expr> <expr> + <term> <expr> - <term> <term> <term> <term> * <factor> <term> / <factor> <factor> <factor> <exp> ** <factor> <exp> <exp> (<expr>) id <expr> <term> {(+ -) <term>} <term> <factor> {(* /) <factor>} <factor> <exp> { ** <exp>} <exp> (<expr>) id The BNF rule <expr> <expr> + <term> specifies the + operator to be left associative. However the EBNF version <expr> <term> {+ <term>} does not imply the direction of associativity. This problem is overcome in a syntax analyzer based on an EBNF grammar for expressions by designing the syntax analysis process to enforce the correct associativity 57 58 Extended BNF (EBNF) Attribute Grammars Some versions of EBNF allow a numeric superscript to be attached to the right brace to indicate an upper limit to the number of times the enclosed part can be repeated. Some versions use a plus (+) superscript to indicate one or more repetitions. Ex) <compound> begin <stmt> {<stmt>} end equivalent to <compound> begin {<stmt>} + end There are some characteristics of the structure of programming languages that are difficult to describe with BNF, and some that are not even possible. An attribute grammar is a device used to describe more of the structure of a programming language than can be described with a contextfree grammar. An attribute grammar is an extension to a context-free grammar 59 60 10

Attribute Grammars (Static ) The static semantics of a language is only indirectly related to the meaning of programs during execution; rather, it has to do with the legal forms of programs (syntax rather than semantics). Ex) The common rule that all variables must be declared before they are referenced. Because of the problems of describing static semantics with BNF, a variety of more powerful mechanisms has been devised for that task. One such mechanism, attribute grammars, was designed by Knuth (1968) to describe both the syntax and the static semantics of programs. Attribute grammars are a formal approach both to describing and checking the correctness of the static semantics rules of a program Attribute Grammars (Basic Concept) Attribute grammars are context-free grammars with attributes, attribute computation functions, and predicate functions. Attributes associated with grammar symbols (terminal and nonterminal symbols) Attribute computation functions (semantic functions)- associated with grammar rules. They are used to specify how attribute values are computed. Predicate functions state the static semantic rules of the language, are associated with grammar rules. 61 62 Attribute Grammars (Formal Definition) Attribute Grammars (Additional Features of Attribute Grammars: I) An attribute grammar is a context-free grammar with the following additions: For each grammar symbol X there is a set A(X) of attribute values Each rule has a set of functions that define certain attributes of the nonterminals in the rule Each rule has a (possibly empty) set of predicates to check for attribute consistency I. Associated with each grammar symbol X is a set of attributes A(X). The set A(X) consists of two disjoint sets synthesized attributes S(X) and inherited attributes I(X). Synthesized attributes are used to pass semantic information up a parse tree. inherited attributes used to pass semantic information down and across a parse tree 63 64 Attribute Grammars (Additional Features of Attribute Grammars: II) Attribute Grammars (Additional Features of Attribute Grammars: III) II. Associated with each grammar rule is a set of semantic functions and a possibly empty set of predicate functions over the attributes of the symbols in the grammar rule. Ex) For a rule X 0 X 1... X n, The synthesized attributes of X 0 are computed with semantic functions of the form S(X 0 ) = f(a(x 1 ),..., A(X n )). So the value of a synthesized attribute on a parse tree node depends only on the values of the attributes on that node s children nodes. Inherited attributes of symbols X j, 1 j n (in the rule above), are computed with a semantic function of the form I(X j ) = f(a(x 0 ),, A(X n )). So the value of an inherited attribute on a parse tree node depends on the attribute values of that node s parent node and those of its sibling nodes. III. A predicate function has the form of a Boolean expression on the union of the attribute set {A(X 0 ),, A(X n )} and a set of literal attribute values. The only derivations allowed with an attribute grammar are those in which every predicate associated with every nonterminal is true. A false predicate function value indicates a violation of the syntax or static semantics rules of the language. A parse tree of an attribute grammar is the parse tree based on its underlying BNF grammar, with a possibly empty set of attribute values attached to each node. If all the attribute values in a parse tree have been computed, the tree is said to be fully attributed. 65 66 11

Attribute Grammars Example #1 An attribute grammar that describes the rule that the name on the end of an Ada procedure must match the procedure s name (It cannot be stated in BNF). // Example - a procedure that prints 2 lines: procedure demo is procedure printlines is begin put("------------------"); put("------------------"); end printlines; begin printlines; -- do something printlines; end demo; Syntax rule: <proc_def> procedure <proc_name>[1] is begin <proc_body> end <proc_name>[2]; Predicate: <proc_name>[1]string == <proc_name>[2].string Attribute Grammars Example #1 (continue) Syntax rule: <proc_def> procedure <proc_name>[1] is begin <proc_body> end <proc_name>[2]; Predicate: <proc_name>[1].string == <proc_name>[2].string In this example, the predicate rule states that the name string attribute of the <proc_name> nonterminal in the subprogram header must match the name string attribute of the <proc_name> nonterminal following the end of the subprogram. The string attribute of <proc_name>, denoted by <proc_name>.string, is the actual string of characters that were found immediately following the reserved word procedure by the compiler. Notice that when there is more than one occurrence of a nonterminal in a syntax rule in an attribute grammar, the non-terminals are subscripted with brackets ([])to distinguish them. 67 68 Attribute Grammars Example #2 Attribute Grammars Example #2 (continue) Following example illustrates how an attribute grammar can be used to check the type rules of a simple assignment statement. The Syntax and static semantics of these assignment statements are: Syntax: The only variable names are A, B, and C. The right side of the assignments can be either a variable or an expression in the form of a variable added to another variable. <assign> <var> = <expr> <expr> <var> + <var> <var> <var> A B C : The variable can be one of two types: int or real When there are two variables on the right side of an assignment, they need not be the same type. The type of expression when the operand types are not the same is always real. When they are the same, the expression type is that of the operands. The type of the left side of the assignment must match the type of the right side. So the types of operands in the right side can be mixed, but the assignment is valid only if the target and the value resulting from evaluating the right side have the same type. <real> = <real>+<int> ok <int> = <int>+<int> ok <real>=<real><real> ok <real>=<int>+ <real> ok <real>=<int> + <int> not ok <int> = <real>+<int> or <int> = <int>+<real> not ok 69 70 Attribute Grammars Example #2 (continue) Attribute Grammars Example #2 (continue) The attributes for the nonterminals in the example attribute grammar are described in the following paragraphs: actual_type A synthesized attribute associated with the nonterminals <var> and <expr>. It is used to store the actual type, int or real, of a variable or expression. In the case of a variable, the actual type is intrinsic. In the case of an expression, it is determined from the actual types of the child node or children nodes of the <expr> nonterminal. expected_type An inherited attribute associated with the nonterminal <expr>. It is used to store the type, either int or real, that is expected for the expression, as determined by the type of the variable on the left side of the assignment statement. 71 An Attribute Grammar for Simple Assignment Statements 1. Syntax rule: <assign> <var> = <expr> Semantic rule: <expr>.expected_type <var>.actual_type 2. 
Syntax rule: <expr> <var>[2] + <var>[3] Semantic rule: <expr>.actual_type if (<var>[2].actual_type = int) and (<var>[3].actual_type = int) else real end if Predicate: <expr>.actual_type == <expr>.expected_type 3. Syntax rule: <expr> <var> Semantic rule: <expr>.actual_type <var>.actual_type Predicate: <expr>.actual_type == <expr>.expected_type 4. Syntax rule: <var> A B C Semantic rule: <var>.actual_type look-up(<var>.string) then int The look-up function looks up a given variable name in the symbol table and returns the variable s type 72 12
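The attribute computation and the predicate can be read as a small type checker. The sketch below (Python, illustrative; the symbol table contents are assumptions of the example) computes the synthesized actual_type, the inherited expected_type, and the predicate for a single assignment.

    # Sketch of the attribute computation above: actual_type flows up from the
    # <var> nodes, expected_type flows down from the assignment target, and the
    # predicate checks that they agree. The declarations below are assumed.

    SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}

    def look_up(name):
        return SYMBOL_TABLE[name]                          # rule 4: <var>.actual_type

    def check_assign(target, operands):
        """Type-check 'target = operands[0]' or 'target = operands[0] + operands[1]'."""
        expected_type = look_up(target)                    # rule 1: inherited attribute
        if len(operands) == 1:                             # rule 3: <expr> -> <var>
            actual_type = look_up(operands[0])
        else:                                              # rule 2: <expr> -> <var> + <var>
            t2, t3 = look_up(operands[0]), look_up(operands[1])
            actual_type = "int" if (t2 == "int" and t3 == "int") else "real"
        return actual_type == expected_type                # the predicate

    if __name__ == "__main__":
        print(check_assign("A", ["A", "B"]))   # real = real + int  -> True
        print(check_assign("B", ["B", "C"]))   # int  = int + int   -> True
        print(check_assign("B", ["A", "C"]))   # int  = real + int  -> False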

Attribute Grammars Example #2 (continued)

(Figure: a parse tree for A = A + B, with <assign> at the root, <var> as the target, and <expr> deriving <var>[2] + <var>[3].)

Computing Attribute Values with Example #2

Now consider the process of computing the attribute values of a parse tree, which is sometimes called decorating the parse tree. If all attributes were inherited, this could proceed in a completely top-down order, from the root to the leaves. If all the attributes were synthesized, it could proceed in a completely bottom-up order, from the leaves to the root. Because our grammar has both synthesized and inherited attributes, the evaluation process cannot proceed in any single direction.

The following is an evaluation of the attributes for A = A + B, in an order in which it is possible to compute them:

1. <var>.actual_type <- look-up(A)                                    (Rule 4)
2. <expr>.expected_type <- <var>.actual_type                          (Rule 1)
3. <var>[2].actual_type <- look-up(A)                                 (Rule 4)
   <var>[3].actual_type <- look-up(B)                                 (Rule 4)
4. <expr>.actual_type <- either int or real                           (Rule 2)
5. <expr>.expected_type == <expr>.actual_type is either TRUE or FALSE (Rule 2)

(Figure: the flow of attribute values in the example parse tree for A = A + B; solid lines are used for the parse tree, dashed lines show attribute flow in the tree.)

(Figures: a fully attributed parse tree for A = A + B with A defined as a real and B defined as an int, where expected_type == actual_type; and a fully attributed parse tree with A defined as an int and B defined as a real, where expected_type != actual_type.)

Evaluation of Attribute Grammars

Checking the static semantic rules of a language is an essential part of all compilers. One of the main difficulties in using an attribute grammar to describe all of the syntax and static semantics of a real contemporary programming language is the size and complexity of the attribute grammar, and the attribute values on a large parse tree are costly to evaluate. Less formal attribute grammars are commonly used by compiler writers.

Semantics

Semantics is the meaning of the expressions, statements, and program units of programming languages. There is no single widely acceptable notation or formalism for describing semantics. There are several needs for a methodology and notation for semantics:
- Programmers need to know what statements mean.
- Compiler writers must know exactly what language constructs do.
- Programs written in the language could be proven correct without testing.
- Compiler generators would be possible.
- Designers could detect ambiguities and inconsistencies.

Software developers and compiler designers typically determine the semantics of programming languages by reading English explanations in language manuals. Because such explanations are often imprecise and incomplete, this approach is clearly unsatisfactory. Due to the lack of complete semantics specifications of programming languages, programs are rarely proven correct without testing, and commercial compilers are never generated automatically from language descriptions.

Operational Semantics

Operational semantics describes the meaning of a statement or program by specifying the effects of running it on a machine. The effects on the machine are viewed as a sequence of machine state changes (the collection of values in its storage). An obvious operational semantics description is given by executing a compiled version of the program on a computer. Most programmers, while learning a language, write a small test program to determine the meaning of some programming language construct.

Problems with operational semantics: the individual steps in the execution of machine language, and the resulting changes to the state of the machine, are too small and too numerous; and the storage of a real computer is too large and complex.

There are different levels of use of operational semantics. At the highest level, the interest is in the final result of the execution of a complete program (natural operational semantics). At the lowest level, operational semantics can be used to determine the precise meaning of a program through an examination of the complete sequence of state changes that occur when the program is executed (structural operational semantics).

The Basic Process of Operational Semantics

Design an appropriate intermediate language, where the primary characteristic of the language is clarity: every construct of the intermediate language must have an obvious and unambiguous meaning. If the semantics description is to be used for natural operational semantics, a virtual machine (an interpreter) must be constructed for the intermediate language.

The basic process of operational semantics is not unusual; in fact, the concept is frequently used in programming textbooks and programming language reference manuals. Example with a C construct:

    C statement:
        for (expr1; expr2; expr3) {
            ...
        }

    Meaning:
        expr1;
        loop: if expr2 == 0 goto out
              ...
              expr3;
              goto loop
        out:  ...

The human reader of such a description is the virtual computer and is assumed to be able to execute the instructions in the definition correctly and to recognize the effects of the execution.

Evaluation and Summary of Operational Semantics

Evaluation of operational semantics: good if used informally (language manuals, etc.); extremely complex if used formally (e.g., VDL, which was used to describe the semantics of PL/I).

Uses of operational semantics:
- Language manuals and textbooks
- Teaching programming languages

Two different levels of use of operational semantics:
- Natural operational semantics
- Structural operational semantics

Denotational Semantics

Denotational semantics is the most widely known formal method for describing the meaning of programs; it is based on recursive function theory. The process of building a denotational specification for a language is to define a mathematical object for each language entity and to define a function that maps instances of the language entities onto instances of the corresponding mathematical objects. The difficulty with this method lies in creating the objects and the mapping functions. The method is named denotational because the mathematical objects denote the meaning of their corresponding syntactic entities.

The mapping functions of a denotational semantics specification have a domain and a range. The domain is the collection of values that are parameters to the function; the range is the collection of objects to which the parameters are mapped. The domain is called the syntactic domain, because it is syntactic structures that are mapped; the range is called the semantic domain, because each result of the function has a meaning. In denotational semantics, programming language constructs are mapped to mathematical objects, either sets or, more often, functions.

Denotational Semantics: Example: Binary Numbers

The syntax of binary numbers can be described by the following grammar rules (quotation marks indicate that the digits are characters, not mathematical digits; the example binary number '110' is a string consisting of the characters '1', '1', and '0'):

    <bin_num> -> '0'
    <bin_num> -> '1'
    <bin_num> -> <bin_num> '0'
    <bin_num> -> <bin_num> '1'

(Figure: a parse tree for the example binary number '110', with nested <bin_num> nodes whose leaves read '1', '1', '0'.)

To describe the meaning of binary numbers using denotational semantics, we associate the actual meaning (a decimal number) with each rule that has a single terminal symbol as its RHS; in this example, decimal numbers must be associated with the first two grammar rules. The other two grammar rules are, in a sense, computational rules, because they combine a terminal symbol, to which an object can be associated, with a nonterminal, which can be expected to represent some construct.

The semantic function M_bin is defined as a recursive function. It maps the syntactic objects, as described by the grammar rules above, to objects in N, the set of nonnegative decimal numbers:

    M_bin('0') = 0
    M_bin('1') = 1
    M_bin(<bin_num> '0') = 2 * M_bin(<bin_num>)
    M_bin(<bin_num> '1') = 2 * M_bin(<bin_num>) + 1

The meanings, or denoted objects (decimal numbers), can be attached to the nodes of the parse tree shown above, yielding a decorated tree. For example:

    M_bin('110')  = M_bin('11' '0') = 2 * M_bin('11')
                  = 2 * (2 * M_bin('1') + 1)
                  = 2 * (2 * 1 + 1) = 2 * 3 = 6

    M_bin('1010') = M_bin('101' '0') = 2 * M_bin('101')
                  = 2 * (2 * M_bin('10') + 1)
                  = 2 * (2 * (2 * M_bin('1')) + 1)
                  = 2 * (2 * (2 * 1) + 1) = 2 * 5 = 10

(Figures: the parse trees for '110' and '1010' with the denoted value attached to each <bin_num> node: 1, 3, 6 for '110'; 1, 2, 5, 10 for '1010'.)
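The recursive definition of M_bin translates almost directly into code; a minimal Python sketch (illustrative):

    def m_bin(s: str) -> int:
        """Denotational meaning of a binary numeral, following M_bin above."""
        if s == "0":
            return 0
        if s == "1":
            return 1
        prefix, last = s[:-1], s[-1]          # <bin_num> '0'  or  <bin_num> '1'
        if last == "0":
            return 2 * m_bin(prefix)
        return 2 * m_bin(prefix) + 1

    if __name__ == "__main__":
        print(m_bin("110"))    # 6
        print(m_bin("1010"))   # 10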

Denotational Semantics: Example: Decimal Numbers

We can extend this idea to decimal numbers. The syntax of decimal numbers can be described by the following grammar rules:

    <dec_num> -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
               | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')

The denotational mapping function M_dec is defined as follows:

    M_dec('0') = 0, M_dec('1') = 1, ..., M_dec('9') = 9
    M_dec(<dec_num> '0') = 10 * M_dec(<dec_num>)
    M_dec(<dec_num> '1') = 10 * M_dec(<dec_num>) + 1
    ...
    M_dec(<dec_num> '9') = 10 * M_dec(<dec_num>) + 9

Denotational Semantics: The State of a Program

The denotational semantics of a program can be defined in terms of state changes in an ideal computer. Denotational semantics is defined in terms of only the values of all of the program's variables, and state changes are defined by mathematical functions. Let the state s of a program be represented as a set of ordered pairs:

    s = {<i1, v1>, <i2, v2>, ..., <in, vn>}

where each i is the name of a variable and the v's are the current values of those variables. Any of the v's can have the special value undef, which indicates that its associated variable is currently undefined.

Let VARMAP be a function of two parameters, a variable name and the program state, that returns the current value of the variable:

    VARMAP(ij, s) = vj

Most semantics mapping functions for programs and program constructs map states to states; these state changes are used to define the meanings of programs and program constructs.

Denotational Semantics: Expressions

The following BNF description of expressions is based on these assumptions: there are no side effects; the only operators are + and *, and an expression can have at most one operator; the only operands are scalar integer variables and integer literals; there are no parentheses; and the value of an expression is an integer.

    <expr>        -> <dec_num> | <var> | <binary_expr>
    <binary_expr> -> <left_expr> <operator> <right_expr>
    <left_expr>   -> <dec_num> | <var>
    <right_expr>  -> <dec_num> | <var>
    <operator>    -> + | *
    <dec_num>     -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
                   | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')

The only error we consider in expressions is a variable having an undefined value. Let Z be the set of integers and let error be the error value; then Z ∪ {error} is the semantic domain for the denotational specification of these expressions. The mapping function for a given expression E and state s follows. The = symbol is used to define mathematical functions; the implication symbol, =>, connects the form of an operand with its associated case in the case (or switch) construct; and dot notation is used to refer to the child nodes of a node (for example, <binary_expr>.<left_expr> refers to the left child node of <binary_expr>).

    M_e(<expr>, s) = case <expr> of
        <dec_num>     => M_dec(<dec_num>, s)
        <var>         => if VARMAP(<var>, s) == undef
                         then error
                         else VARMAP(<var>, s)
        <binary_expr> => if (M_e(<binary_expr>.<left_expr>, s) == undef OR
                             M_e(<binary_expr>.<right_expr>, s) == undef)
                         then error
                         else if (<binary_expr>.<operator> == '+')
                         then M_e(<binary_expr>.<left_expr>, s) + M_e(<binary_expr>.<right_expr>, s)
                         else M_e(<binary_expr>.<left_expr>, s) * M_e(<binary_expr>.<right_expr>, s)
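An executable reading of the state, VARMAP, and M_e definitions follows (a Python sketch; representing the state as a dictionary and expressions as nested tuples are choices of this sketch, not of the text, and the error check in the binary case stands in for the undef test above).

    UNDEF = object()   # the special value undef
    ERROR = "error"    # the error value

    def varmap(name, state):
        """VARMAP(i, s): current value of variable i in state s."""
        return state.get(name, UNDEF)

    def m_e(expr, state):
        """M_e(<expr>, s) for literals, variables, and one-operator expressions."""
        if isinstance(expr, int):                      # <dec_num>
            return expr
        if isinstance(expr, str):                      # <var>
            v = varmap(expr, state)
            return ERROR if v is UNDEF else v
        left, op, right = expr                         # <binary_expr>
        lv, rv = m_e(left, state), m_e(right, state)
        if lv == ERROR or rv == ERROR:
            return ERROR
        return lv + rv if op == "+" else lv * rv

    if __name__ == "__main__":
        s = {"x": 3, "y": 7}
        print(m_e(("x", "+", 2), s))    # 5
        print(m_e(("x", "*", "y"), s))  # 21
        print(m_e(("z", "+", 1), s))    # error (z is undefined)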

Denotational Semantics: Assignment Statements

An assignment statement is an expression evaluation plus the setting of the target variable to the expression's value. In this case, the meaning function maps a state to a state:

    M_a(x = E, s) = if M_e(E, s) == error
                    then error
                    else s' = {<i1, v1'>, <i2, v2'>, ..., <in, vn'>},
                         where for j = 1, 2, ..., n,
                             if ij == x then vj' = M_e(E, s)
                             else vj' = VARMAP(ij, s)

Note that the comparison ij == x is of names, not values.

Denotational Semantics: Logical Pretest Loops

To expedite the discussion, we assume that there are two other existing mapping functions: M_sl, which maps statement lists and states to states, and M_b, which maps Boolean expressions to Boolean values (or error).

    M_l(while B do L, s) = if M_b(B, s) == undef
                           then error
                           else if M_b(B, s) == false
                           then s
                           else if M_sl(L, s) == error
                           then error
                           else M_l(while B do L, M_sl(L, s))

Evaluation of Denotational Semantics

Denotational semantics can be used to prove the correctness of programs, provides a rigorous way to think about programs, can be an aid to language design, and has been used in compiler generation systems. Because of its complexity, however, it is of little use to language users.

Axiomatic Semantics

Axiomatic semantics is based on formal logic (predicate calculus). Its original purpose was formal program verification (checking the correctness of a program). Axioms or inference rules are defined for each statement type in the language (to allow transformations of logic expressions into more formal logic expressions). The logic expressions are called assertions (or predicates).
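The assignment and loop meaning functions can be sketched the same way. In the Python sketch below (illustrative), the expression meaning function, M_b, and M_sl are assumed stand-ins supplied by the caller, and error propagation mirrors the definitions above.

    def m_a(x, expr, state, m_e):
        """M_a(x = E, s): a new state with x bound to the value of E."""
        v = m_e(expr, state)
        if v == "error":
            return "error"
        new_state = dict(state)       # every other variable keeps VARMAP(i, s)
        new_state[x] = v
        return new_state

    def m_l(m_b, m_sl, state):
        """M_l(while B do L, s), written with the same case analysis as above."""
        b = m_b(state)
        if b == "error":
            return "error"
        if b is False:
            return state
        s2 = m_sl(state)
        if s2 == "error":
            return "error"
        return m_l(m_b, m_sl, s2)     # recursive reference to the loop's own meaning

    if __name__ == "__main__":
        # assignment: x = x + 1 in state {x: 3}, with a tiny expression evaluator
        tiny_m_e = lambda expr, s: expr(s)          # expressions as functions of the state
        print(m_a("x", lambda s: s["x"] + 1, {"x": 3}, tiny_m_e))   # {'x': 4}

        # while x > 0 do x = x - 1, starting from {x: 3}
        m_b = lambda s: s["x"] > 0
        m_sl = lambda s: {**s, "x": s["x"] - 1}
        print(m_l(m_b, m_sl, {"x": 3}))             # {'x': 0}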