CPS 506 Comparative Programming Languages Syntax Specification
Compiling Process Steps
  Program → Lexical Analysis
    Convert characters into a stream of tokens
  Lexical Analysis → Syntactic Analysis
    Send the tokens on to build an abstract representation, or parse tree
Compiling Process Steps (cont.)
  Syntactic Analysis → Semantic Analysis
    Analyze the parse tree for semantic consistency and convert it to run efficiently on the target architecture (optimization)
  Semantic Analysis → Machine Code
    Convert the abstract representation to executable machine code using code generation
Meta-Language
  Formal Methods and Language Processing
  A meta-language is a language used to define other languages
  BNF (Backus-Naur Form)
    A set of rewriting rules ρ
    A set of terminal symbols Σ
    A set of non-terminal symbols Ν
    A start symbol S ∈ Ν
  ρ : Α → ω, where Α ∈ Ν and ω ∈ (Ν ∪ Σ)*
    Left-hand side: a non-terminal symbol
    Right-hand side: a sequence of terminal and non-terminal symbols
BNF (cont.)
  The words in Ν: grammatical categories such as Identifier, Expression, Loop, Program, ...
  S: the principal grammatical category
  Symbols in Σ: the basic alphabet
  Example 1:
    binaryDigit → 0
    binaryDigit → 1
    or, in one rule: binaryDigit → 0 | 1
  Example 2:
    Integer → Digit | Integer Digit
    Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
BNF (cont.)
  Parse tree for 281
    Integer
      Integer
        Integer
          Digit
            2
        Digit
          8
      Digit
        1
  Derivation
    Integer ⇒ Integer Digit
            ⇒ Integer Digit Digit
            ⇒ Digit Digit Digit
            ⇒ 2 Digit Digit
            ⇒ 28 Digit
            ⇒ 281
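The derivation above can be reproduced mechanically. The following is a minimal sketch, assuming the grammar Integer → Digit | Integer Digit from the previous slide; the function name derive_integer is hypothetical and chosen only for illustration.

```python
def derive_integer(digits: str) -> list[str]:
    """Return the sentential forms of a leftmost derivation of `digits`."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("string is not derivable from Integer")
    n = len(digits)
    forms = ["Integer"]
    # Apply Integer -> Integer Digit once for every digit after the first.
    for k in range(1, n):
        forms.append("Integer" + " Digit" * k)
    # End the recursion with Integer -> Digit.
    forms.append(" ".join(["Digit"] * n))
    # Replace each Digit, left to right, with its terminal symbol.
    for i in range(1, n + 1):
        forms.append(" ".join([digits[:i]] + ["Digit"] * (n - i)))
    return forms


for form in derive_integer("281"):
    print(form)
```

Each printed line is one step of the derivation, so the output can be compared directly against the steps listed on the slide.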
BNF (cont.)
  Lexeme: the lowest-level syntactic unit
  Tokens: the set of all grammatical categories that define strings of non-blank characters (lexical syntax)
    Identifier (variable names, function names, ...)
    Literal (integer and decimal numbers, ...)
    Operator (+, -, *, /, ...)
    Separator (; . ( ) { } ...)
    Keyword (int, if, for, where, ...)
BNF (cont.)
  Example: token classes in a small program
    // comments
    void main ( ) { float p; p = 3.14 ; }
  Comment: // comments
  Keyword: void, float
  Identifier: main, p
  Literal: 3.14
  Operator: =
  Separator: ( ) { } ;
Regular Expressions
  An alternative to BNF for defining a language's lexical rules
    x         : a character
    abc       : a literal string
    A | B     : A or B
    A B       : concatenation of A and B
    A*        : zero or more occurrences of A
    A+        : one or more occurrences of A
    A?        : zero or one occurrence of A
    [a-zA-Z]  : any alphabetic character
    [0-9]     : any digit
    .         : any single character
  Example
    Integer    : [0-9]+
    Identifier : [a-zA-Z][a-zA-Z0-9]*
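As an illustration of regular expressions driving lexical analysis, here is a minimal sketch of a tokenizer built on Python's standard re module. The token classes follow the categories named earlier in the deck; the keyword set, the decimal-literal pattern and the names TOKEN_RE and tokenize are assumptions made for this example, not part of the slides.

```python
import re

# Assumed keyword subset; a real language defines its own list.
KEYWORDS = {"int", "float", "void", "if", "for"}

# One named group per token class; earlier alternatives win on a tie.
TOKEN_RE = re.compile(
    r"""
    (?P<Comment>//[^\n]*)
  | (?P<Literal>[0-9]+(?:\.[0-9]+)?)
  | (?P<Identifier>[a-zA-Z][a-zA-Z0-9]*)
  | (?P<Operator>[+\-*/=])
  | (?P<Separator>[;.(){}])
  | (?P<Skip>\s+)
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Convert a character string into a list of (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "Skip":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "Identifier" and text in KEYWORDS:
            kind = "Keyword"  # keywords are identifiers reserved by the language
        tokens.append((kind, text))
    return tokens


for kind, text in tokenize("// comments\nvoid main ( ) { float p; p = 3.14 ; }"):
    print(f"{kind:10} {text}")
```

Classifying keywords after matching the identifier pattern keeps the regular expression small; listing each keyword as its own alternative would work as well.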
Syntactic Analysis
  Primary tool: BNF
  Input: tokens from lexical analysis
  Output: parse tree
  Syntactic categories
    Program
    Declaration
    Assignment
    Expression
    Loop
    Function definition
Syntactic Analysis (cont.)
  Example
    ArithmeticExpression → Term | ArithmeticExpression + Term | ArithmeticExpression - Term
    Term → Factor | Term * Factor | Term / Factor
    Factor → Identifier | Literal | ( ArithmeticExpression )
Syntactic Analysis (cont.)
  Example: parse tree for 2 * a - 3
    ArithmeticExpression
      ArithmeticExpression
        Term
          Term
            Factor
              Literal
                Integer
                  2
          *
          Factor
            Identifier
              Letter
                a
      -
      Term
        Factor
          Literal
            Integer
              3
Syntactic Analysis (cont.)
  BNF limitations
    Declaration of identifiers?
    Initial value of identifiers?
  In statically typed languages
    The type system is used for the first problem
    Detection happens at compile time or at run time
Ambiguous Grammar
  A grammar is ambiguous if some string can be parsed into two or more different trees
  Example
    Exp → Identifier | Literal | Exp - Exp
    Input: A - B - C
    Output:
      1- A - (B - C)
      2- (A - B) - C
  Another example is the dangling else
  Resolving ambiguity
    Using BNF rules
    Using extra-grammatical rules
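The ambiguity can be made concrete by enumerating every parse tree. The sketch below assumes the rule Exp → Identifier | Exp - Exp (the Literal alternative is left out for brevity); the function names and the sample values for A, B and C are hypothetical.

```python
def parses(operands):
    """Enumerate all parse trees for `operands` joined by '-'.

    A tree is either an identifier (str) or a tuple (left, '-', right).
    """
    if len(operands) == 1:
        return [operands[0]]
    trees = []
    # Exp -> Exp - Exp: any '-' may be the root, so try every split point.
    for split in range(1, len(operands)):
        for left in parses(operands[:split]):
            for right in parses(operands[split:]):
                trees.append((left, "-", right))
    return trees


def show(tree):
    """Render a tree with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    left, op, right = tree
    return f"({show(left)} {op} {show(right)})"


def evaluate(tree, env):
    """Compute the value a tree denotes under the variable bindings in `env`."""
    if isinstance(tree, str):
        return env[tree]
    left, _, right = tree
    return evaluate(left, env) - evaluate(right, env)


env = {"A": 5, "B": 3, "C": 1}  # sample values, chosen only for illustration
for tree in parses(["A", "B", "C"]):
    print(show(tree), "=", evaluate(tree, env))
```

With these sample values the two trees evaluate to different numbers, which is why an ambiguous grammar is a practical problem and not only a formal one.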
Operator Precedence
  Grammar without precedence
    <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
    A = B + C * A  is parsed as  A = B + (C * A)
    A = B * C + A  is parsed as  A = B * (C + A)
  Solution
    <expr> → <expr> + <term> | <term>
    <term> → <term> * <factor> | <factor>
    <factor> → ( <expr> ) | <id>
    A = B + C * A  is parsed as  A = B + (C * A)
    A = B * C + A  is parsed as  A = (B * C) + A
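The groupings claimed for the two grammars can be checked with a short sketch. It handles only the right-hand side of the assignment, assumes well-formed input made of identifiers, + and * without parentheses, and uses hypothetical function names.

```python
def group_without_precedence(tokens):
    """Grouping under <expr> -> <id> + <expr> | <id> * <expr> | <id>.

    The operand to the right of every operator is the whole remaining
    expression, so + and * are treated alike.
    """
    head, rest = tokens[0], tokens[1:]
    if not rest:
        return head
    return f"({head} {rest[0]} {group_without_precedence(rest[1:])})"


def fold_left(items, op):
    """Combine items left to right: ((a op b) op c)."""
    grouped = items[0]
    for item in items[1:]:
        grouped = f"({grouped} {op} {item})"
    return grouped


def group_with_precedence(tokens):
    """Grouping under <expr> -> <expr> + <term> | <term>,
    <term> -> <term> * <factor> | <factor>, <factor> -> <id>."""
    terms, current = [], []
    for token in tokens:
        if token == "+":          # a '+' closes the current <term>
            terms.append(current)
            current = []
        elif token != "*":        # identifiers accumulate inside a <term>
            current.append(token)
    terms.append(current)
    return fold_left([fold_left(term, "*") for term in terms], "+")


for source in ("B + C * A", "B * C + A"):
    tokens = source.split()
    print(source, "|", group_without_precedence(tokens), "|", group_with_precedence(tokens))
```

Only the second input is grouped differently by the two functions, matching the case the slide singles out.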
Associativity of Operators
  A + B + C,  A * B * C,  A / B / C
  Left associativity
    Left recursive: in a grammar rule, the LHS also appears at the beginning of its RHS
      <expr> → <expr> + <term> | <term>
      A + B + C  is parsed as  (A + B) + C
  Right associativity
    Right recursive: in a grammar rule, the LHS also appears at the end of its RHS
      <factor> → <exp> ** <factor> | <exp>
      <exp> → ( <expr> ) | <id>
      A + B ** C  is parsed as  A + (B ** C)
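A small sketch of what left and right associativity mean for grouping. The helper names are hypothetical; the last two lines rely on Python's own rules, where / is left associative and ** is right associative.

```python
from functools import reduce


def group_left(operands, op):
    """Left associativity: fold from the left, ((A op B) op C)."""
    return reduce(lambda acc, x: f"({acc} {op} {x})", operands)


def group_right(operands, op):
    """Right associativity: fold from the right, (A op (B op C))."""
    return reduce(lambda acc, x: f"({x} {op} {acc})", reversed(operands))


print(group_left(["A", "B", "C"], "+"))    # grouping produced by a left-recursive rule
print(group_right(["A", "B", "C"], "**"))  # grouping produced by a right-recursive rule

# Associativity changes results for non-associative operators:
print(8 / 4 / 2, (8 / 4) / 2, 8 / (4 / 2))
print(2 ** 3 ** 2, 2 ** (3 ** 2), (2 ** 3) ** 2)
```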
Extended BNF (EBNF)
  Optional part of an RHS: [ ]
    <if_stmt> → if ( <expr> ) <statement> [ else <statement> ]
  Repetition (or recursion) part of an RHS: { }
    <id_list> → <id> {, <id_list> }
  Multiple-choice option of an RHS: ( | )
    <term> → <term> ( * | / | % ) <factor>
  Optional use of * and +
    <id_list> → <id> {, <id_list> }*
    <integer> → {0-9}+
Extended BNF (EBNF) (cont.)
  The "opt" subscript
    ConditionalStatement → if ( Expr ) Statement { else Statement }opt
  Syntax Diagram
    [Diagram: Term, built from Factor with the operators * and /]
Case Study
  A BNF or EBNF for one grammar, such as Expression, different Literals, or the if Statement, in Java, C, C++, or Pascal
  BNF or EBNF for floating-point numbers in Java, C, C++
  BNF or EBNF for loop statements in one language
Abstract Syntax
  Consider the following code:
    Pascal
      while i < 10 do begin
        i := i + 1;
      end;
    C or Java
      while (i < 10) {
        i = i + 1;
      }
  Although the syntax is different, the two loops are essentially equivalent
  Abstract syntax is a way to show only the essential elements of a language
Abstract Syntax (cont.)
  General form
    Abstract Syntax Class = list of essential components
  Example
    Loop = Expression test; Statement body
  (diagram labels: Member, Element)
  A Java class for the abstract syntax of Loop
    class Loop extends Statement {
      Expression test;
      Statement body;
    }
Abstract Syntax (cont.)
  More examples
    Assignment = Variable target; Expression source
  (diagram labels: Member, Element)
  A Java class for the abstract syntax of Assignment
    class Assignment extends Statement {
      Variable target;
      Expression source;
    }
Abstract Syntax Tree
  A tree that shows the abstract syntax of a program fragment
  Example
    x = 2;      (C or Java)
    x := 2;     (Pascal)
    Assignment = Variable target; Expression source
    Statement
      Assignment
        Variable
          x
        Expression
          Value
            2
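To illustrate that two concrete syntaxes can share one abstract syntax, here is a minimal sketch using Python dataclasses. The class names mirror the abstract syntax Assignment = Variable target; Expression source. The regular expression, the restriction of the source to an integer literal and the function name parse_assignment are assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Value:
    number: int


@dataclass(frozen=True)
class Assignment:
    target: Variable
    source: Value  # this sketch only supports an integer literal as the source


# Accepts both the C/Java form (=) and the Pascal form (:=).
ASSIGN_RE = re.compile(r"\s*([a-zA-Z][a-zA-Z0-9]*)\s*(:=|=)\s*([0-9]+)\s*;\s*")


def parse_assignment(text):
    """Build the abstract syntax tree for a simple assignment statement."""
    match = ASSIGN_RE.fullmatch(text)
    if match is None:
        raise SyntaxError(f"not a simple assignment: {text!r}")
    name, _operator, number = match.groups()
    # The concrete operator is discarded: it is not an essential element.
    return Assignment(Variable(name), Value(int(number)))


print(parse_assignment("x = 2;"))
print(parse_assignment("x := 2;"))
```

Both calls print the same tree, since the only difference between the inputs is the assignment symbol, which the abstract syntax does not record.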
Recursive Descent Parser
  A top-down parser that verifies the syntax of a stream of text, reading from left to right
  It contains several recursive methods, each of which implements one rule of the grammar
  More details and parsing algorithms are covered in the Compiler course
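A minimal sketch of a recursive descent parser for the arithmetic expression grammar used earlier in the deck. One method implements each rule. Because a method that begins by calling itself would never terminate, the left-recursive rules are rewritten here as loops (the EBNF repetition form), which is an adaptation for this sketch and not something the slides prescribe. The class and function names are hypothetical, and the tree is represented as nested tuples (operator, left, right).

```python
import re


def split_tokens(text):
    # Simplification: characters outside the token patterns are silently skipped.
    return re.findall(r"[0-9]+|[a-zA-Z][a-zA-Z0-9]*|[-+*/()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = split_tokens(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # ArithmeticExpression -> Term { (+ | -) Term }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # grows to the left: left associative
        return node

    def term(self):
        # Term -> Factor { (* | /) Factor }
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Factor -> Identifier | Literal | ( ArithmeticExpression )
        token = self.advance()
        if token == "(":
            node = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token


def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("2 * a - 3"))
```

The call structure mirrors the grammar: expression calls term, term calls factor, and factor calls expression again for a parenthesized sub-expression, which is where the recursion comes from.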
Exercises
  1. Modify the following grammar to add a unary minus operator that has higher precedence than either + or *.
       <assign> → <id> = <expr>
       <id> → A | B | C
       <expr> → <expr> + <term> | <term>
       <term> → <term> * <factor> | <factor>
       <factor> → ( <expr> ) | <id>
Exercises
  2. Consider the following grammar:
       <S> → <A> a <B> b
       <A> → <A> b | b
       <B> → a <B> | a
     Which of the following sentences are in the language generated by this grammar?
       1. baab
       2. bbbab
       3. bbaaaaa
       4. bbaab
Exercises
  3. Convert the following EBNF to BNF:
       S → A { ba }
       A → a [b] a
  4. Using the grammar in question 1, add the ++ and -- unary operators of Java.
  5. Using the grammar in question 1, show a parse tree and a leftmost derivation for each of the following statements:
       a) A = (A + B) * C
       b) A = B * (C * (A + B))
Exercises
  6. Rewrite the BNF in question 1 to give + precedence over *, and force + to be right associative.
  7. Using BNF, write a grammar for the language consisting of strings a^n b^n, where n > 0, such as ab, aabb, ... Can you write this using regular expressions?