Monday, April 14, 2014

Topics for today

Grammars and Languages (Chapter 7)
    Finite State Machines
    Semantic actions
    Code generation
        Overview
        Analysis
        Algorithm 1: evaluation of postfix
        Algorithm 2: infix to postfix
        Algorithm 3: evaluation of infix

Finite State Machines (see 7.2)

If a language is regular (Type 3) we can answer the question

    Is string X in the language?

by means of an appropriate Finite State Machine (FSM). For every regular language there is an FSM that can recognize strings in the language. (The reverse is also true: for every FSM, we can represent the language it recognizes by means of a regular expression.)

Visually, an FSM might be a graph with the states as nodes and the transitions as arcs. One of the states is identified as the start state. The machine is input driven: the combination of current state and input determines the next state. Only strings that are in the language will cause the FSM to end up in one of the "final" states.

(In our graphs, the start state is labeled with - and final states are labeled with +.)

FSM Example 1: recognizer for label

    State 1 (-)   --letter-->            State 2
    State 2       --letter or digit-->   State 2
    State 2       --colon-->             State 3 (+)

Comp 162 Notes Page 1 of 10 April 14, 2014
As a transition table we might have something like the following, where the entries tell us the next state given the current state (row) and input (column).

                     Input
    Current State    Letter   Digit   Colon   Other
    1                2        error   error   error
    2                2        2       3       error

If we start in state 1, then any string that takes us to state 3 is a label. If the input is not a label, the FSM will end up in the error state.

The following general algorithm will determine if the input is in the language defined by a transition table.

    current_state = Start
    while ( !ismember(final_states, current_state) && current_state != error ) {
        get(input)
        current_state = transition_table[current_state][typeof(input)]
    }

FSM Example 2: recognizer for train

On a certain railroad, all trains begin with one or more locomotives followed by zero or more wagons followed by a caboose. Using W for wagons, L for locomotives, and C for cabooses, the following regular expression defines valid trains.

    LL*W*C

The following FSM recognizes valid trains.

    State 1 (-)   --L-->   State 2
    State 2       --L-->   State 2
    State 2       --W-->   State 3
    State 2       --C-->   State 4 (+)
    State 3       --W-->   State 3
    State 3       --C-->   State 4 (+)
As a table

                     Input
    Current State    L       W       C       Other
    1                2       error   error   error
    2                2       3       4       error
    3                error   3       4       error

Semantic actions

Semantic actions are actions associated with particular transitions from one state to another. They are useful for performing such operations as forming input characters into a string or creating a variable containing the value of an input number.

Example 1

We can attach semantic actions to our label checker that build up the string as it is read character by character and then, when we read the colon, enter the string into the symbol table. Here are the actions.

    Action   When done                                      Details
    A        On reading the first character                 String S = character
    B        On reading a subsequent non-colon character    Append character to S
    C        On reading a colon                             Enter S into symbol table

Here is the modified FSM with transitions labeled input/action.

    State 1 (-)   --letter/A-->              State 2
    State 2       --letter/B or digit/B-->   State 2
    State 2       --colon/C-->               State 3 (+)
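The label recognizer with its semantic actions can be sketched in Python as follows. This is only an illustration of the table-driven approach, not code from the course: the names `classify`, `TRANSITIONS` and `recognize_label`, and the use of a Python set as the symbol table, are all assumptions made for the example. As in the general algorithm, the loop stops as soon as the final state is reached.

```python
# Hypothetical sketch of the label FSM with semantic actions A, B and C.
ERROR = "error"
START, FINAL = 1, 3

def classify(ch):
    """Map an input character to a column of the transition table."""
    if ch.isalpha():
        return "letter"
    if ch in "0123456789":
        return "digit"
    if ch == ":":
        return "colon"
    return "other"

# (current state, input type) -> (next state, action); missing entries mean error.
TRANSITIONS = {
    (1, "letter"): (2, "A"),
    (2, "letter"): (2, "B"),
    (2, "digit"): (2, "B"),
    (2, "colon"): (3, "C"),
}

def recognize_label(text, symbol_table):
    """Return True if text starts with a label; enter the label into symbol_table."""
    state = START
    s = ""
    for ch in text:
        if state == FINAL:
            break                      # stop once the final state is reached
        state, action = TRANSITIONS.get((state, classify(ch)), (ERROR, None))
        if state == ERROR:
            return False
        if action == "A":
            s = ch                     # A: start the string
        elif action == "B":
            s += ch                    # B: append to the string
        elif action == "C":
            symbol_table.add(s)        # C: enter the string into the symbol table
    return state == FINAL

symbols = set()
print(recognize_label("loop1:", symbols), symbols)
```

Keeping the action name next to the next state in one table mirrors the input/action labels on the arcs of the diagram.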
FSM Example 3: recognizer for real numbers

Assume that in a real number we must have at least one digit before the decimal point or at least one digit after the decimal point. A number can begin with an optional sign character. The regular expression below also requires a decimal point, so 23 is not valid. Thus,

    Valid    Invalid
    -3.4     1.3.4
    .6       .
    +54.     34-
    3.       +-3.4
    2.1      23

Regular expression:

    [sign] ( digit digit* . digit* | digit* . digit digit* )

where [x] means none or one x.

Here is an FSM that recognizes real numbers defined in this way.

    [FSM diagram: start state labeled -, final state labeled +; transitions labeled sign, digit and dec. pt]

We can attach semantic actions to our real number parser to create a variable containing the value of the number. There will be an action performed before input is read, another action performed after input has finished, and actions associated with state-to-state transitions. The actions will use global variables. Here are the actions:
    Action       When done                      Details
    Initially                                   dec_places = 0, sign = 1, N = 0
    A            On reading a decimal point     dec_places = 0
    B            On reading a sign character    if (input == '-') sign = -1
    C            On reading a digit             N = N * 10 + input; dec_places++
    At the end                                  N = N * sign; N = N / 10^dec_places

Note that action C counts every digit, including those before the decimal point; action A resets the count to 0 when the decimal point is read, so at the end dec_places holds the number of digits after the decimal point.

So our augmented FSM, with transitions now depicting input/action, is

    [The same FSM diagram with each transition labeled input/action: sign/B, digit/C, dec. pt/A]

Here is a trace of what happens when the input is the 7-character string -217.86

    Input         Sign   N          Dec_places
    <initially>   1      0          0
    -             -1
    2                    2          1
    1                    21         2
    7                    217        3
    .                               0
    8                    2178       1
    6                    21786      2
    <end>                -21786
                         -217.86
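The actions above can be tried out with a short Python sketch. The state numbering and the transition table here are assumptions, chosen to be consistent with the regular expression and with the arc labels (one sign arc, three decimal-point arcs, five digit arcs); the function name `parse_real` is hypothetical, and local variables stand in for the global variables of the notes.

```python
# Hypothetical sketch: real-number FSM with semantic actions A, B and C.
# Assumed states: 1 start, 2 after sign, 3 digits before the point,
# 4 after the point with at least one digit seen (final),
# 5 after a point that had no digits before it.
TRANSITIONS = {
    (1, "sign"): (2, "B"), (1, "digit"): (3, "C"), (1, "point"): (5, "A"),
    (2, "digit"): (3, "C"), (2, "point"): (5, "A"),
    (3, "digit"): (3, "C"), (3, "point"): (4, "A"),
    (4, "digit"): (4, "C"),
    (5, "digit"): (4, "C"),
}
FINAL_STATES = {4}

def classify(ch):
    if ch in "+-":
        return "sign"
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "point"
    return "other"

def parse_real(text):
    """Return the value of text, or None if text is not a valid real number."""
    dec_places, sign, n = 0, 1, 0          # the "Initially" action
    state = 1
    for ch in text:
        key = (state, classify(ch))
        if key not in TRANSITIONS:
            return None                    # error state
        state, action = TRANSITIONS[key]
        if action == "A":
            dec_places = 0                 # A: decimal point resets the count
        elif action == "B":
            if ch == "-":                  # B: record a negative sign
                sign = -1
        else:
            n = n * 10 + int(ch)           # C: accumulate the digit
            dec_places += 1
    if state not in FINAL_STATES:
        return None
    return sign * n / 10 ** dec_places     # the "At the end" action

print(parse_real("-217.86"))
```

The whole input must be consumed and the machine must finish in the final state, so inputs such as 23 or a lone decimal point are rejected.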
Code generation (see section 7.4)

A compiler, having verified that the input string is legal in the particular programming language, will generate appropriate assembly language. Warford's example is not very general; we describe and solve a more general problem:

    Translate an arithmetic expression into Pep/8 assembly language

Like a compiler, we can produce the assembly code in two phases:

    (1) Analysis: verify input and build an appropriate data structure.
    (2) Synthesis: generate code from the data structure.

Example - simple assignment statements such as

    X = A + ( B + C ) * ( D + E )

Analysis: A binary tree representing the expression is constructed using semantic actions as the expression is read symbol by symbol. If the input were the expression above, we get this tree, which correctly represents the priorities of the arithmetic operations.

        =
       / \
      X   +
         / \
        A   *
           / \
          /   \
         +     +
        / \   / \
       B   C D   E

Synthesis: We traverse the tree in a systematic way to generate assembly code. For example, from the tree above, the sequence of Pep/8 assembly language begins

    subsp 2,i
    lda B,d
    sta 0,s
    subsp 2,i
    lda C,d
    etc

We will lead up to the Analysis and Synthesis algorithms involved by first looking at three simpler ones.
    Algorithm 1. Evaluation of a postfix expression
    Algorithm 2. Conversion of an infix expression to postfix form
    Algorithm 3. Evaluation of an infix expression (algorithms 1 and 2 combined)

Then we can look at the Analysis algorithm

    Algorithm 4: Building a tree from an infix expression

And finally at the Synthesis algorithm

    Algorithm 5: Generating assembly code

We know that the language of arithmetic expressions is not Type 3, so a simple finite state machine will not be sufficient to process it. Our algorithms will use stacks.
Algorithm 1: evaluation of a postfix expression

An infix expression is one where an operator is placed between its operands, as in

    A + B

A postfix expression is one where the operator follows its operands, as in

    A B +

An advantage of a postfix expression is that it is parenthesis free and can be evaluated from left to right (using a stack), unlike an infix expression such as

    A * B + C * D

The postfix expression corresponding to this infix example is

    A B * C D * +

Only two kinds of symbols appear in a postfix expression: operators and operands. There are no parentheses. So all our algorithm has to do is define what action to take for each type of symbol. These actions are:

    Symbol      Action
    operand     Put it on the stack.
    operator    Apply it to the top two stack items and replace them by the result.

Consider the postfix evaluation of

    23 5 * 7 4 * +

Here is what the stack looks like as the expression is read (top of stack first):

    Input:   23    5     *     7     4     *     +
    Stack:   23    5     115   7     4     28    143
                   23          115   7     115
                                     115

If the expression is well-formed there will be exactly one item on the stack after the last input symbol has been processed.
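Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under some assumptions: the input is already split into tokens, operands are integers, and the function name `eval_postfix` and the `OPS` table are hypothetical names chosen for the example.

```python
import operator

# Binary operators supported by this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_postfix(tokens):
    """Evaluate a list of postfix tokens using a single stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            if len(stack) < 2:
                raise ValueError("operator without two operands")
            right = stack.pop()            # top item is the right operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))         # operand: put it on the stack
    if len(stack) != 1:
        raise ValueError("expression is not well-formed")
    return stack[0]

print(eval_postfix("23 5 * 7 4 * +".split()))
```

Popping the right operand first matters for subtraction and division, where the order of the operands changes the result.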
Algorithm 2: conversion of an infix expression to postfix

It would be handy to have an algorithm that reads an infix expression from left to right and outputs the corresponding postfix. For example

    input:   ( A + B ) * C + D / E
    output:  A B + C * D E / +

Features of the algorithm:

    * it has to process open and close parentheses
    * the operands in the output are in the same order as in the input, but the order of the operators may be different, reflecting the different operator priorities - multiplication has higher priority than addition, for example
    * the stack used by the algorithm is a stack of operators - we include "(" in this category.

We just have to define what action to take for each of the 4 kinds of symbol that can appear in the input. Here are the actions.

    Symbol      Action
    (           Put it on the stack with lowest possible priority.
    )           Unstack and output operators until "(" is reached. Unstack but do not output the "(".
    operand     Output it.
    operator    While the stack is not empty and priority(top-of-stack) >= priority(input), unstack the top operator and output it. Then push the input onto the stack.

At the end of input there may be operators left on the stack; if so, we unstack and output them one by one.

Here is a trace of the example above (top of stack first).

    Input    Stack    Output so far
    (        (
    A        (        A
    +        + (      A
    B        + (      A B
    )                 A B +
    *        *        A B +
    C        *        A B + C
    +        +        A B + C *
    D        +        A B + C * D
    /        / +      A B + C * D
    E        / +      A B + C * D E
    <end>             A B + C * D E / +
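A possible Python rendering of Algorithm 2 is sketched below. It assumes tokens separated by spaces, the four left-associative operators + - * /, and a "greater than or equal" priority comparison when deciding whether to unstack; the names `PRIORITY` and `infix_to_postfix` are hypothetical.

```python
# "(" gets the lowest priority so that no operator ever unstacks it.
PRIORITY = {"(": 0, "+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(tokens):
    """Convert a list of infix tokens to a list of postfix tokens."""
    output = []
    stack = []                               # stack of operators, including "("
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            # Unstack and output operators until "(" is reached.
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unmatched )")
            stack.pop()                      # discard the "(" itself
        elif tok in PRIORITY:
            # Unstack operators of equal or higher priority first.
            while stack and PRIORITY[stack[-1]] >= PRIORITY[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)               # operand: output it
    while stack:                             # end of input: flush the stack
        output.append(stack.pop())
    return output

print(" ".join(infix_to_postfix("( A + B ) * C + D / E".split())))
```

Using "greater than or equal" rather than "greater than" makes operators of equal priority come out left to right, so A - B - C becomes A B - C -.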
Algorithm 3: evaluation of an infix expression - a combination of algorithms 1 and 2

We can combine algorithms 1 and 2 to give us an algorithm that reads an infix expression and determines its value. It will use two stacks (an operand stack as in algorithm 1 and an operator stack as in algorithm 2). As in algorithm 1, it will leave the value of the expression as the only item on the operand stack.

Here are the actions required for each of the 4 kinds of symbol:

    Symbol      Action
    (           Put it on the operator stack with lowest possible priority.
    )           Unstack and apply operators until "(" is reached. Unstack but do not apply the "(".
    operand     Put it on the operand stack.
    operator    While the operator stack is not empty and priority(top-of-operator-stack) >= priority(input), unstack and apply the top-of-operator-stack. Then push the input onto the operator stack.

At the end: unstack and apply any operators remaining on the operator stack.

"Apply an operator" means apply it to the top 2 items on the operand stack and replace them by the result of the operation (as we did in algorithm 1).

Here is a trace of the algorithm on ( 6 + 4 ) * 3 + 16 / 2 (top of each stack first).

    Input    Operator stack    Operand stack
    (        (
    6        (                 6
    +        + (               6
    4        + (               4 6
    )                          10
    *        *                 10
    3        *                 3 10
    +        +                 30
    16       +                 16 30
    /        / +               16 30
    2        / +               2 16 30
    <end>    +                 8 30
    <end>                      38

Reading

We are looking at an alternative to Warford's section 7.4
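For reference, the two-stack evaluation of Algorithm 3 might be sketched in Python as follows. The sketch assumes well-formed, space-separated input with integer operands and a "greater than or equal" priority comparison; the names `eval_infix`, `apply_top`, `OPS` and `PRIORITY` are hypothetical.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}
PRIORITY = {"(": 0, "+": 1, "-": 1, "*": 2, "/": 2}

def eval_infix(tokens):
    """Evaluate a list of infix tokens using an operand and an operator stack."""
    operands = []
    operators = []

    def apply_top():
        # Apply the top operator to the top two operands, replacing them by the result.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append(OPS[op](left, right))

    for tok in tokens:
        if tok == "(":
            operators.append(tok)
        elif tok == ")":
            while operators[-1] != "(":      # assumes a matching "(" exists
                apply_top()
            operators.pop()                  # unstack but do not apply the "("
        elif tok in OPS:
            while operators and PRIORITY[operators[-1]] >= PRIORITY[tok]:
                apply_top()
            operators.append(tok)
        else:
            operands.append(int(tok))        # operand: put on the operand stack
    while operators:                         # end of input: apply what is left
        apply_top()
    return operands[0]

print(eval_infix("( 6 + 4 ) * 3 + 16 / 2".split()))
```

The control structure is that of Algorithm 2, with "output an operator" replaced by "apply an operator" in the sense of Algorithm 1.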