Attributes can be added to the grammar symbols, and program fragments can be added to the grammar as semantic actions, to form a syntax-directed translation scheme. Some attributes may be set by the lexical analysis, and some may be computed in the semantic actions. Examples of attributes: values of evaluated subtrees, type information, source file coordinates. By injecting the corresponding code fragments into the parser implementation, the semantic actions can be executed during the parse. This is known as syntax-directed translation.
Example from the book (Section 2.3): turn infix arithmetic expressions into their postfix equivalents.
Postfix notation for arithmetic expressions puts the operator after the operands instead of between them (which is called infix notation). With postfix notation, no parentheses are needed. This is a good example since postfix notation is similar to the stack machine code that you will generate in the first lab assignment.
The postfix notation of a single constant num is defined as just that constant. The postfix notation of an infix expression of the form (E) is defined as the postfix notation of E. The postfix notation of an infix expression of the form E1 op E2, where op is some binary operator, is defined as the postfix notation of E1 followed by the postfix notation of E2 followed by op. Note that this definition is not concerned with the precedence or associativity of operators. It assumes that the intended order in which the operators are applied is already reflected in the parse tree for the expression, and the same application order will be used in the resulting postfix expression.
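The three cases of the definition map directly onto a recursive function over the parse tree. The following is a minimal sketch in Python; the tuple encoding of the tree and the function name postfix are assumptions made for illustration and do not come from the slides or the book.

```python
# Parse trees as nested tuples (an assumed encoding, for illustration only):
#   ("num", 9)             a constant
#   ("paren", E)           a parenthesized expression (E)
#   ("op", "-", E1, E2)    a binary expression E1 op E2

def postfix(tree):
    """Return the postfix notation of a parse tree as a string."""
    kind = tree[0]
    if kind == "num":
        # Case 1: a constant is its own postfix notation.
        return str(tree[1])
    if kind == "paren":
        # Case 2: (E) has the same postfix notation as E.
        return postfix(tree[1])
    if kind == "op":
        # Case 3: postfix(E1), then postfix(E2), then the operator.
        _, op, left, right = tree
        return f"{postfix(left)} {postfix(right)} {op}"
    raise ValueError(f"unknown node kind: {kind!r}")

# Parse tree for (9 - 5) + 2
tree = ("op", "+", ("paren", ("op", "-", ("num", 9), ("num", 5))), ("num", 2))
print(postfix(tree))  # 9 5 - 2 +
```

Note that the function never looks at precedence or associativity: the shape of the tree alone decides the order of the operators in the output.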
The translation scheme is based on the old expression grammar (we will take care of the left recursion later). The code fragments should be executed by the parser as soon as the production has been identified. Semantic actions can also be placed in the middle of production bodies. We assume that the scanner has attached a value attribute to the num tokens (the book uses a nonterminal). Note that attributes can also be attached to nonterminals, and that the semantic actions may change the attributes to propagate information to different parts of the parse tree.
The semantic actions can be seen as grammar symbols. If they are inserted as leaves in the parse tree, they are executed in the order given by a depth-first, left-to-right traversal of the tree.
Since we treat the semantic actions as grammar symbols, they can be included in the left recursion elimination. Here is the same translation scheme, with left recursion removed using the simple procedure shown before. Note that since the semantic actions now appear in the middle of the production bodies, they should be executed as soon as the parser has processed the symbols to their left in the body. Getting this wrong is a common mistake in the first lab assignment. Make sure that, for example, the expression 3 - 2 - 1 is translated as (3 - 2) - 1 and not as 3 - (2 - 1)!
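To make the placement of the actions concrete, here is a small recursive-descent sketch in Python. The grammar in the comments (only + and - over integer constants), the tokenizer, and all function names are assumptions chosen for illustration; the scheme on the slide may differ in its details. The point to notice is that the operator is emitted right after the term to its left in the body has been parsed, before the recursive call to rest.

```python
import re

# Assumed translation scheme, with left recursion removed:
#   expr -> term rest
#   rest -> + term { print('+') } rest
#         | - term { print('-') } rest
#         | (empty)
#   term -> num { print(num.value) }

def tokenize(text):
    """Return the integer and operator tokens; other characters are skipped."""
    return re.findall(r"\d+|[-+]", text)

def to_postfix(text):
    tokens = tokenize(text)
    pos = 0
    out = []  # the semantic actions append here instead of printing

    def term():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("number expected")
        out.append(tokens[pos])  # action: { print(num.value) }
        pos += 1

    def rest():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            term()
            out.append(op)  # action in the MIDDLE of the body, before rest
            rest()
        # otherwise: the empty production, nothing to do

    term()
    rest()
    if pos != len(tokens):
        raise SyntaxError("unexpected token")
    return " ".join(out)

print(to_postfix("3 - 2 - 1"))  # 3 2 - 1 -
```

Moving the line out.append(op) to after the recursive call rest() would produce 3 2 1 - - instead, which is the postfix form of 3 - (2 - 1).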
Syntax-directed definitions are similar to syntax-directed translation schemes, but more abstract and declarative. A syntax-directed definition extends the grammar in two ways: it attaches attributes to grammar symbols (terminals and nonterminals), and it attaches semantic rules to productions, which define the attributes. In contrast to semantic actions, no evaluation order is specified; it is instead implied by the dependencies in the definition. It is common to add subscripts to grammar symbols that occur several times in the same production, to be able to distinguish them in the semantic rules. The table shows a syntax-directed definition for the infix-to-postfix translation. The operator used in the semantic rules means string concatenation.
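As a rough illustration of how such a definition can be evaluated, the sketch below stores one semantic rule per production and computes a string attribute t for every node. The Node class, the production strings, the attribute name t, and the choice of + and - as operators are assumptions for this example. Because each rule only reads attributes of the node's children, a post-order traversal is one valid evaluation order; the rules themselves do not mention any order.

```python
class Node:
    """A parse tree node labelled with the production that was applied."""
    def __init__(self, production, children=(), lexeme=None):
        self.production = production
        self.children = list(children)
        self.lexeme = lexeme  # only set for "E -> num"
        self.t = None         # the attribute defined by the semantic rules

# One semantic rule per production; Python's + on strings is the concatenation.
RULES = {
    "E -> E1 + E2": lambda n: n.children[0].t + n.children[1].t + "+",
    "E -> E1 - E2": lambda n: n.children[0].t + n.children[1].t + "-",
    "E -> ( E1 )":  lambda n: n.children[0].t,
    "E -> num":     lambda n: n.lexeme,
}

def evaluate(node):
    """Compute node.t after the t attributes of all children are known."""
    for child in node.children:
        evaluate(child)
    node.t = RULES[node.production](node)
    return node.t

def num(digit):
    return Node("E -> num", lexeme=digit)

# Parse tree for 9 - 5 + 2, grouped as (9 - 5) + 2 by the grammar.
tree = Node("E -> E1 + E2", [Node("E -> E1 - E2", [num("9"), num("5")]), num("2")])
print(evaluate(tree))  # 95-2+
```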
Note that the lexical analysis should not assume that the input program is syntactically correct. For instance, the following regular expression, used to distinguish the keyword if from identifiers starting with the letters if, is problematic: if[ \t\n]*( The problem is that it assumes that the next token is a left parenthesis. This holds if the program is syntactically correct, but that might not be the case.
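The effect can be tried out with Python's re module. In the sketch below, FRAGILE_IF is the pattern from the text with the parenthesis escaped, as Python's regex syntax requires. ROBUST_IF is one possible alternative and an assumption of this example, not something taken from the slides: it accepts if whenever the following character cannot continue an identifier.

```python
import re

# The pattern from the text (parenthesis escaped for Python's re syntax).
FRAGILE_IF = re.compile(r"if[ \t\n]*\(")

# Assumed alternative: "if" not followed by an identifier character.
ROBUST_IF = re.compile(r"if(?![A-Za-z0-9_])")

def is_keyword_if(pattern, text):
    """True if the pattern recognizes the keyword at the start of text."""
    return pattern.match(text) is not None

well_formed = "if (x) y = 1;"
malformed = "if x) y = 1;"   # the left parenthesis is missing
identifier = "iffy = 1;"

print(is_keyword_if(FRAGILE_IF, well_formed))  # True
print(is_keyword_if(FRAGILE_IF, malformed))    # False
print(is_keyword_if(ROBUST_IF, malformed))     # True
print(is_keyword_if(ROBUST_IF, identifier))    # False
```

With the fragile pattern, the malformed input is not recognized as a keyword at all, so the scanner would most likely report if as an identifier and the parser could not give a helpful message about the missing parenthesis. The fragile pattern also swallows the parenthesis into the lexeme of the keyword.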
The typesetting language TeX has support for configuring the lexical analysis. For example, it is possible to change which character starts a comment. Another example: if the macro \A expands to some and the macro \B expands to macro, then \csname\A\B\endcsname generates a call to the macro \somemacro. In this case, a token (a control sequence token) has been generated from the expansion of other macros.
The lexical analysis can also be implemented as a DFA. The DFA is run each time GetNextToken() is called. The tokens recognized by the DFA in this example are:
  <    Less than
  <=   Less than or equal
  >    Greater than
  >=   Greater than or equal
  =    Equal
  <>   Not equal
The * at states 4 and 8 means that the current input position must be moved back one step.
Optional exercise: Draw a DFA that recognizes the following tokens (from the language C):
  add     Plus operator: +
  incr    Increment operator: ++
  sub     Minus operator: -
  decr    Decrement operator: --
  arrow   Struct member accessor: ->
  id      Identifiers: [a-zA-Z_][a-zA-Z0-9_]*
  if      The keyword if
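The DFA for the relational operators can be coded directly as nested conditionals. The sketch below is written in Python; the function name get_relop, the token names, and the state numbers in the comments are assumptions (the numbering is chosen so that states 4 and 8 are the ones marked with *, as in the notes). Moving the input position back one step corresponds here to not consuming the lookahead character.

```python
def get_relop(text, pos):
    """Return (token, new_pos) for the relational operator at text[pos]."""
    def peek(i):
        return text[i] if i < len(text) else ""  # "" stands for end of input

    c = peek(pos)
    if c == "<":                      # state 0 -> state 1
        c = peek(pos + 1)
        if c == "=":
            return "LE", pos + 2      # state 2
        if c == ">":
            return "NE", pos + 2      # state 3
        return "LT", pos + 1          # state 4*: lookahead is not consumed
    if c == "=":
        return "EQ", pos + 1          # state 5
    if c == ">":                      # state 0 -> state 6
        if peek(pos + 1) == "=":
            return "GE", pos + 2      # state 7
        return "GT", pos + 1          # state 8*: lookahead is not consumed
    raise ValueError(f"no relational operator at position {pos}")

print(get_relop("<= 10", 0))  # ('LE', 2)
print(get_relop("<a", 0))     # ('LT', 1)
```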
This table encodes the DFA from the previous slide. Green cells mark success: a token is returned. Red cells mark a lexical error. In reality, other characters in state 0 would be the start of some other lexemes, since it is not common for a language to contain only relational operators.
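A table-driven scanner keeps the DFA as data and uses one generic loop to run it. The sketch below is one way to write this in Python. The dictionary layout, the state numbering, the token names, and the function name next_token are assumptions of this example and not necessarily the layout of the table on the slide. A missing entry plays the role of a red cell, and the ACCEPT map plays the role of the green cells.

```python
# TABLE[state][character class] -> next state.
# Character classes: "<", "=", ">" and "other" (anything else, or end of input).
TABLE = {
    0: {"<": 1, "=": 5, ">": 6},               # no "other": lexical error
    1: {"<": 4, "=": 2, ">": 3, "other": 4},
    6: {"<": 8, "=": 7, ">": 8, "other": 8},
}

# Accepting states: state -> (token, must the input position be moved back?)
ACCEPT = {
    2: ("LE", False), 3: ("NE", False), 4: ("LT", True),
    5: ("EQ", False), 7: ("GE", False), 8: ("GT", True),
}

def next_token(text, pos):
    """Run the DFA from text[pos]; return (token, position after the lexeme)."""
    state = 0
    while state not in ACCEPT:
        ch = text[pos] if pos < len(text) else None
        cls = ch if ch in ("<", "=", ">") else "other"
        if cls not in TABLE[state]:
            raise ValueError(f"lexical error at position {pos}")
        state = TABLE[state][cls]
        pos += 1
    token, retract = ACCEPT[state]
    if retract:
        pos -= 1  # states 4 and 8: give the lookahead character back
    return token, pos

print(next_token("<= 1", 0))    # ('LE', 2)
print(next_token("a >= b", 2))  # ('GE', 4)
```

Extending the scanner with more tokens then only means adding rows and columns to the table; the loop stays the same.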
With a keyword table, each recognized identifier is looked up in the table to see if it should be returned as a keyword token instead. Another strategy is to always test for keywords before identifiers, e.g. by constructing the DFA this way.
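The keyword-table strategy can be sketched in a few lines of Python. The contents of KEYWORDS, the token names, and the function name scan_word are hypothetical and only serve as an illustration: the scanner first matches the longest identifier-shaped lexeme and only then decides whether it is a keyword.

```python
import re

# Hypothetical keyword table: lexeme -> token name.
KEYWORDS = {"if": "IF", "else": "ELSE", "while": "WHILE"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def scan_word(text, pos):
    """Return (token, lexeme, new_pos) for the word starting at text[pos]."""
    m = IDENTIFIER.match(text, pos)
    if m is None:
        raise ValueError(f"no identifier at position {pos}")
    lexeme = m.group()
    # Identifiers found in the table are returned as keyword tokens instead.
    token = KEYWORDS.get(lexeme, "ID")
    return token, lexeme, m.end()

print(scan_word("if (x)", 0))    # ('IF', 'if', 2)
print(scan_word("iffy = 1", 0))  # ('ID', 'iffy', 4)
```

Since the whole lexeme is matched before the lookup, an identifier such as iffy is never mistaken for the keyword if.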