Syntax-directed translation is a grammar-oriented compiling technique.
Programming languages:
Syntax: what do its programs look like?
Semantics: what do its programs mean?

A simple syntax-directed translator
Character stream -> Lexical analyzer -> token stream -> Syntax-directed translator -> intermediate representation
The lexical analyzer converts the stream of input characters into a stream of tokens that becomes the input to the following phase.
The syntax-directed translator is a combination of a syntax analyzer and an intermediate code generator.
Syntax definitions
Context-free grammar (CFG): a notation used for specifying the syntax of a language (BNF).
A CFG has four components:
A set of tokens, known as terminals
A set of non-terminals
A set of production rules
A start symbol

Example:
<list> ::= <list> + <digit>
<list> ::= <list> - <digit>
<list> ::= <digit>
<digit> ::= 0 | 1 | 2 | ... | 9
Show that 9-5+2 is a list.
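One way to show that 9-5+2 is a list is to write out a leftmost derivation from <list>. The sketch below does this mechanically for any string of single digits separated by + or -. The function name leftmost_derivation is an assumption made for illustration, not something from the slides.

```python
import re


def leftmost_derivation(text):
    """Return the sentential forms of a leftmost derivation of `text`
    from <list>, or None if `text` is not generated by the grammar."""
    tokens = "".join(text.split())            # drop white space
    if not re.fullmatch(r"[0-9]([+-][0-9])*", tokens):
        return None
    digits = tokens[0::2]                     # characters at even positions
    ops = tokens[1::2]                        # characters at odd positions

    steps = ["<list>"]
    tail = ""
    # Apply <list> ::= <list> op <digit> once per operator.
    # The last operator in the input is introduced first.
    for op in reversed(ops):
        tail = f" {op} <digit>" + tail
        steps.append("<list>" + tail)
    # Apply <list> ::= <digit> to the remaining <list>.
    form = "<digit>" + tail
    steps.append(form)
    # Replace each <digit> by its terminal, left to right.
    for d in digits:
        form = form.replace("<digit>", d, 1)
        steps.append(form)
    return steps


for step in leftmost_derivation("9-5+2"):
    print(step)
```

The last line printed is the input string itself, which is what it means for the start symbol to derive it.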
Parse Tree
A parse tree shows how the start symbol of a grammar derives a string in the language.
Example: Consider the following CFG:
Syntax Tree
Left and right recursive productions

Ambiguity
A grammar is said to be ambiguous if it produces more than one parse tree for some sentence.
Ambiguity is acceptable in spoken languages.
Ambiguous programming languages are useless unless the ambiguity can be resolved.
The above is an ambiguous grammar.
Note that for operators like - and /, these two parse trees would evaluate differently.
Even for associative operators like + and *, the two parse trees can evaluate differently because of possible overflow.

Associativity of operators
e.g. 9+5+2 is equivalent to (9+5)+2
We say that the operator + associates to the left because an operand with a plus sign on both sides of it is taken by the operator to its left.
Assignment in C, a = b = c, is right associative.
The four basic arithmetic operators are left associative. That is why the <list> grammar grows to the left (and the same goes for its parse tree).
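The effect of grouping can be checked directly. The sketch below evaluates the operands 9, 5, 2 with the operator grouped to the left and to the right; fold_left and fold_right are hypothetical helper names introduced for this example.

```python
from functools import reduce
import operator


def fold_left(op, operands):
    """Group to the left: ((a op b) op c) ..."""
    return reduce(op, operands)


def fold_right(op, operands):
    """Group to the right: (a op (b op c)) ..."""
    result = operands[-1]
    for x in reversed(operands[:-1]):
        result = op(x, result)
    return result


operands = [9, 5, 2]

# Subtraction: the two groupings disagree.
print(fold_left(operator.sub, operands))    # (9 - 5) - 2
print(fold_right(operator.sub, operands))   # 9 - (5 - 2)

# Addition: the two groupings agree (Python integers do not overflow).
print(fold_left(operator.add, operands))    # (9 + 5) + 2
print(fold_right(operator.add, operands))   # 9 + (5 + 2)
```

Python integers have arbitrary precision, so this sketch cannot reproduce the overflow case mentioned above; that case needs fixed-width arithmetic.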
Precedence of operators
Consider the expression 9+5*2. There are two possible interpretations: (9+5)*2 or 9+(5*2).
We need to know the precedence of operators when more than one kind of operator is present (the associativity of + and * does not resolve this ambiguity).
9+5*2 means 9+(5*2), because * has higher precedence than +.
9*5+2 means (9*5)+2.
Syntax of expressions
A grammar for arithmetic expressions can be constructed from a table showing the associativity and precedence of operators.
Left associative: + -   (lower precedence)
Left associative: * /   (higher precedence)
We create <exp> and <term> for the two levels of precedence, and an extra non-terminal <factor> for generating the basic units of expressions (digits and parenthesized expressions).
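The two precedence levels can be sketched as a small evaluator with one function per non-terminal. The productions in the comments are reconstructed from the description in these slides, so treat them as an assumption. The left-recursive rules are written as loops here, a common implementation choice rather than something the slides prescribe, and the function name evaluate is hypothetical.

```python
def evaluate(text):
    """Evaluate an expression over single digits, + - * / and parentheses."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    # <exp> ::= <exp> + <term> | <exp> - <term> | <term>
    def exp():
        value = term()
        while peek() in ("+", "-"):          # left associative
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # <term> ::= <term> * <factor> | <term> / <factor> | <factor>
    def term():
        value = factor()
        while peek() in ("*", "/"):          # left associative, binds tighter
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    # <factor> ::= <digit> | ( <exp> )
    def factor():
        tok = peek()
        if tok is not None and tok in "0123456789":
            return int(advance())
        if tok == "(":
            advance()
            value = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    result = exp()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("9+5*2"))     # * is applied before +
print(evaluate("(9+5)*2"))   # parentheses override precedence
```

Because term() is called from inside exp(), every * and / is grouped before the surrounding + or - is applied, which is how the grammar encodes precedence.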
This grammar describes an expression as a list of terms separated by + or - signs, and a term as a list of factors separated by * or / signs. A factor is a digit or a parenthesized expression.

Exercises
1) From the above grammar, derive the following expressions (and draw the parse tree):
9 * 7 * ( 5 ( 3 + 2 )
( 7 * ( 5 2 + 4 ) / 6 )
2) Write grammar rules for arithmetic expressions that include the operators *, /, -, +, ^, %.
Syntax of statements
The following grammar defines a statement in Pascal:

Lexical Analysis
Removal of white space and comments.
Recognizing the tokens of the source program.
Let num be a token representing an integer. When a sequence of digits appears in the input, the lexical analyzer passes num to the parser.
Regular grammars are used to describe the tokens of programming languages, and are written as regular expressions.
<id> ::= <letter> ( <letter> | <digit> )*
<int> ::= <digit> <digit>*
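The two token definitions above map directly onto regular expressions. The following is a minimal tokenizer sketch: it discards white space, reports identifiers as id, and reports digit sequences as num with their integer value. The group names ws and op, the function name tokenize, and the catch-all rule for other characters are assumptions made for this example; comment removal is left out.

```python
import re

TOKEN_RE = re.compile(
    r"""
    (?P<ws>\s+)                      # white space, discarded
  | (?P<id>[A-Za-z][A-Za-z0-9]*)     # letter followed by letters or digits
  | (?P<num>[0-9]+)                  # one or more digits
  | (?P<op>.)                        # any other single character
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Return a list of (token, attribute) pairs for `source`."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup           # name of the alternative that matched
        if kind == "ws":
            continue
        value = int(m.group()) if kind == "num" else m.group()
        tokens.append((kind, value))
    return tokens


print(tokenize("total = total + 25"))
```

The parser only needs the first component of each pair; the second component (the lexeme or the numeric value) is carried along as an attribute.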
The translator needs to know that the lexeme count forms the first two instances of the token id, and that the lexeme increment forms the third instance of id.
The data structure used to keep track of this is the symbol table.

Recognizing identifiers & keywords
A grammar of a language often treats an identifier as a token (id) and a keyword as a token (kw).
A mechanism is needed to distinguish between keywords and identifiers.
Each identifier is first checked against the keyword list to decide whether it is a kw or an id.
Symbol Table
A symbol table is a data structure that is generally used to store information about various source language constructs.
The symbol table interfaces with the other phases through the following operations:
insert(s, t): returns the index of a new entry for string s, token t.
lookup(s): returns the index of the entry for string s, or 0 if s is not in the table.

Handling keywords
e.g. Consider a token opr with lexemes div and mod. We initialize the symbol table using the following calls:
insert("div", opr);
insert("mod", opr);
A later call lookup("div") finds the entry whose token is opr, so div cannot be used as an identifier.
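The insert and lookup operations, together with the keyword initialization, can be sketched as follows. Index 0 is left unused so that lookup can return 0 for a missing string, as described above. The class name SymbolTable and the helper methods token_of and classify are assumptions added for illustration.

```python
class SymbolTable:
    def __init__(self):
        self.entries = [None]        # slot 0 is reserved for "not found"
        self.index_of = {}           # maps a string to its entry index

    def insert(self, s, t):
        """Add an entry for string s with token t and return its index."""
        self.entries.append((s, t))
        self.index_of[s] = len(self.entries) - 1
        return self.index_of[s]

    def lookup(self, s):
        """Return the index of the entry for s, or 0 if s is not present."""
        return self.index_of.get(s, 0)

    def token_of(self, index):
        """Return the token stored at a valid (non-zero) index."""
        return self.entries[index][1]

    def classify(self, lexeme):
        """Return the token for a lexeme, inserting it as an id if new."""
        index = self.lookup(lexeme)
        if index == 0:
            index = self.insert(lexeme, "id")
        return self.token_of(index)


table = SymbolTable()
table.insert("div", "opr")           # keywords are entered before scanning
table.insert("mod", "opr")

print(table.classify("div"))         # already present as opr
print(table.classify("count"))       # new lexeme, entered as id
```

Because the reserved words are inserted before any source text is scanned, the same lookup serves both to recognize keywords and to find identifiers seen earlier.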
Parsing
Parsing is the process of determining whether a string of tokens can be generated by a grammar.
Parsing methods: top-down parsing and bottom-up parsing. The names refer to the order in which the parse tree is constructed.
Top Down Parsing
One-symbol lookahead
Usually, top-down parsing can be implemented during a single left-to-right scan of the input string.
The current token being scanned in the input is referred to as the lookahead symbol.
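A minimal sketch of a top-down parser driven by one lookahead symbol is given below, using the <list> grammar from the syntax definitions slide. A top-down parser cannot follow a left-recursive rule directly, so the repetition is written as a loop; that rewrite and the names ListParser, match, digit and parse_list are assumptions made for this example.

```python
class ListParser:
    """Top-down parser for <list>, deciding each step from one lookahead symbol."""

    def __init__(self, text):
        self.tokens = [c for c in text if not c.isspace()]
        self.pos = 0
        self.lookahead = self.tokens[0] if self.tokens else None

    def match(self, expected):
        """Consume the lookahead if it equals `expected`, then advance."""
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1
        if self.pos < len(self.tokens):
            self.lookahead = self.tokens[self.pos]
        else:
            self.lookahead = None    # end of input

    def digit(self):
        if self.lookahead is not None and self.lookahead in "0123456789":
            d = self.lookahead
            self.match(d)
            return d
        raise SyntaxError(f"expected a digit, found {self.lookahead!r}")

    def parse_list(self):
        """Return the parse as nested tuples that lean to the left."""
        tree = self.digit()
        while self.lookahead in ("+", "-"):   # lookahead selects the production
            op = self.lookahead
            self.match(op)
            tree = (tree, op, self.digit())
        if self.lookahead is not None:
            raise SyntaxError(f"unexpected {self.lookahead!r}")
        return tree


print(ListParser("9-5+2").parse_list())
```

The input is read once from left to right and never revisited; each decision depends only on the current lookahead.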
The following grammar generates a subset of the types of Pascal.
Bottom-up parsing
Used to parse arithmetic expressions.
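The slides do not say which bottom-up method is meant. As one illustration, here is a minimal shift-reduce sketch for the <list> grammar: it shifts input symbols onto a stack and replaces the right-hand side of a production on top of the stack by its left-hand side, working from the leaves toward the start symbol. The function name shift_reduce and the trace format are assumptions.

```python
DIGITS = set("0123456789")
OPS = {"+", "-"}


def shift_reduce(text):
    """Return (accepted, trace) for a bottom-up parse of `text` as a <list>."""
    tokens = [c for c in text if not c.isspace()]
    stack, trace, pos = [], [], 0
    while True:
        if stack and stack[-1] in DIGITS:
            # <digit> ::= 0 | 1 | ... | 9
            trace.append(f"reduce <digit> ::= {stack[-1]}")
            stack[-1] = "<digit>"
        elif (len(stack) >= 3 and stack[-3] == "<list>"
              and stack[-2] in OPS and stack[-1] == "<digit>"):
            # <list> ::= <list> + <digit> | <list> - <digit>
            trace.append(f"reduce <list> ::= <list> {stack[-2]} <digit>")
            del stack[-3:]
            stack.append("<list>")
        elif stack == ["<digit>"]:
            # <list> ::= <digit>, only for the first digit of the input
            trace.append("reduce <list> ::= <digit>")
            stack[-1] = "<list>"
        elif pos < len(tokens):
            trace.append(f"shift {tokens[pos]}")
            stack.append(tokens[pos])
            pos += 1
        else:
            break                    # nothing left to shift or reduce
    return stack == ["<list>"], trace


accepted, trace = shift_reduce("9-5+2")
print(accepted)
for step in trace:
    print(step)
```

Read from bottom to top, the reductions in the trace are the steps of a derivation of the input, which is why the method is called bottom-up.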