Comp 162 Notes
Monday, April 15, 2013

Topics for today

  Code generation
    Analysis
      Algorithm 1: evaluation of postfix
      Algorithm 2: infix to postfix
      Algorithm 3: evaluation of infix
      Algorithm 4: infix to tree
    Synthesis
      Algorithm 5: tree to code

Code generation

We will lead up to the Analysis and Synthesis algorithms involved by first looking at three simpler ones.

  Algorithm 1. Evaluation of a postfix expression
  Algorithm 2. Conversion of an infix expression to postfix form
  Algorithm 3. Evaluation of an infix expression (algorithms 1 and 2 combined)

Then we can look at the Analysis algorithm

  Algorithm 4: Building a tree from an infix expression

and finally at the Synthesis algorithm

  Algorithm 5: Generating assembly code

We know that the language of arithmetic expressions is not Type 3, so a simple finite state machine will not be sufficient to process it. Our algorithms will use stacks.
Algorithm 1: evaluation of a postfix expression

An infix expression is one where an operator is placed between its operands, as in

    A + B

A postfix expression is one where the operator follows its operands, as in

    A B +

An advantage of a postfix expression is that it is parenthesis free and can be evaluated from left to right (using a stack), unlike an infix expression such as

    A * B + C * D

The postfix expression corresponding to this infix example is

    A B * C D * +

Only two kinds of symbols appear in a postfix expression: operators and operands. There are no parentheses. So all our algorithm has to do is define what action to take for each type of symbol. These actions are:

    Symbol     Action
    operand    put it on the stack
    operator   apply it to the top two stack items and replace them by the result

Consider the postfix evaluation of

    23 5 * 7 4 * +

Here is what the stack looks like as the expression is read:

    Input   Stack (top first)
    23      23
    5       5 23
    *       115
    7       7 115
    4       4 7 115
    *       28 115
    +       143

If the expression is well-formed there will be exactly one item on the stack after the last input symbol has been processed.
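The two actions of Algorithm 1 can be sketched in C as follows. This is a minimal sketch, assuming space-separated non-negative integer operands and the four binary operators + - * /. The name eval_postfix and the fixed stack size MAX_STACK are illustrative choices, not part of the notes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_STACK 64   /* hypothetical limit, enough for small examples */

/* Evaluate a space-separated postfix expression and return the single
   value left on the stack. Malformed input trips an assertion. */
long eval_postfix(const char *expr)
{
    long stack[MAX_STACK];
    int top = 0;                          /* number of items on the stack */
    const char *p = expr;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isdigit((unsigned char)*p)) {
            char *end;
            assert(top < MAX_STACK);
            stack[top++] = strtol(p, &end, 10);   /* operand: push it */
            p = end;
        } else {
            long right, left, result = 0;
            assert(top >= 2);             /* an operator needs two operands */
            right = stack[--top];
            left = stack[--top];
            switch (*p) {
            case '+': result = left + right; break;
            case '-': result = left - right; break;
            case '*': result = left * right; break;
            case '/': result = left / right; break;
            default:  assert(0 && "unknown symbol"); break;
            }
            stack[top++] = result;        /* replace the operands by the result */
            p++;
        }
    }
    assert(top == 1);                     /* well-formed input leaves one item */
    return stack[0];
}
```

Calling eval_postfix("23 5 * 7 4 * +") returns 143. Note that the item popped first is the right operand, which matters for - and /.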
Algorithm 2: conversion of an infix expression to postfix

It would be handy to have an algorithm that reads an infix expression from left to right and outputs the corresponding postfix. For example

    input:   ( A + B ) * C + D / E
    output:  A B + C * D E / +

Features of the algorithm:

* it has to process open and close parentheses
* the operands in the output are in the same order as in the input, but the order of operators may be different, reflecting the different operator priorities (multiplication has higher priority than addition, for example)
* the stack used by the algorithm is a stack of operators; we include "(" in this category.

We just have to define what action to take for each of the 4 kinds of symbol that can appear in the input. Here are the actions.

    Symbol     Action
    (          put it on the stack with lowest possible priority
    )          Unstack and output operators until "(" is reached.
               Unstack but do not output the "(".
    operand    output it
    operator   while ( priority(top-of-stack) >= priority(input) )
                   unstack it and output it.
               push the input onto the stack

At the end of input there may be operators left on the stack; if so, we unstack and output them one by one.

Here is a trace of the example above:

    Input   Stack (top first)   Output
    (       (
    A       (                   A
    +       + (
    B       + (                 B
    )       <empty>             +
    *       *
    C       *                   C
    +       +                   *
    D       +                   D
    /       / +
    E       / +                 E
    end     <empty>             / +
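The four actions of Algorithm 2 can be sketched in C as follows. This is a sketch under stated assumptions: operands are single letters or digits, the operators are + - * / with the usual priorities, and an operator of equal priority on the stack is unstacked first (so the operators associate to the left). The names infix_to_postfix and priority are illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Priority used in comparisons; "(" gets the lowest possible value. */
static int priority(char op)
{
    switch (op) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;         /* "(" */
    }
}

/* Convert infix to postfix. Symbols in out are separated by one space.
   out_size must be at least 2 * strlen(infix) + 1. */
void infix_to_postfix(const char *infix, char *out, size_t out_size)
{
    char ops[128];                        /* the operator stack */
    int top = 0;
    size_t n = 0;
    const char *p;

    assert(out_size >= 2 * strlen(infix) + 1);
    for (p = infix; *p != '\0'; p++) {
        char c = *p;
        if (isspace((unsigned char)c))
            continue;
        if (isalnum((unsigned char)c)) {  /* operand: output it */
            out[n++] = c; out[n++] = ' ';
        } else if (c == '(') {
            assert(top < 128);
            ops[top++] = c;
        } else if (c == ')') {            /* unstack and output until "(" */
            while (top > 0 && ops[top - 1] != '(') {
                out[n++] = ops[--top]; out[n++] = ' ';
            }
            assert(top > 0);              /* a matching "(" must exist */
            top--;                        /* unstack "(" without output */
        } else {                          /* operator */
            while (top > 0 && priority(ops[top - 1]) >= priority(c)) {
                out[n++] = ops[--top]; out[n++] = ' ';
            }
            assert(top < 128);
            ops[top++] = c;
        }
    }
    while (top > 0) {                     /* end of input: flush the stack */
        out[n++] = ops[--top]; out[n++] = ' ';
    }
    if (n > 0)
        n--;                              /* drop the trailing space */
    out[n] = '\0';
}
```

With a 64-byte buffer buf, infix_to_postfix("( A + B ) * C + D / E", buf, sizeof buf) leaves "A B + C * D E / +" in buf. The sketch does not report an unmatched "("; it would simply be copied to the output.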
Algorithm 3: evaluate infix (a combination of algorithms 1 and 2)

We can combine algorithms 1 and 2 to give us an algorithm that reads an infix expression and determines its value. It will use two stacks (an operand stack as in algorithm 1 and an operator stack as in algorithm 2). As in algorithm 1, it will leave the value of the expression as the only item on the operand stack.

Here are the actions required for each of the 4 kinds of symbol:

    Symbol     Action
    (          put it on the operator stack with lowest possible priority
    )          Unstack and apply operators until "(" is reached.
               Unstack but do not apply the "(".
    operand    put it on the operand stack
    operator   while ( priority(top-of-operator-stack) >= priority(input) )
                   unstack and apply the top-of-operator-stack.
               push the input onto the operator stack

At the end: unstack and apply any operators remaining on the operator stack.

"Apply an operator" means apply it to the top 2 items on the operand stack and replace them by the result of the operation (like we did in algorithm 1).

Here is a trace of the algorithm on

    ( 6 + 4 ) * 3 + 16 / 2

    Input   Operator stack   Operand stack
            (top first)      (top first)
    (       (
    6       (                6
    +       + (              6
    4       + (              4 6
    )       <empty>          10
    *       *                10
    3       *                3 10
    +       +                30
    16      +                16 30
    /       / +              16 30
    2       / +              2 16 30
    end     +                8 30
    end     <empty>          38
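A two-stack evaluator along these lines can be sketched in C. The sketch assumes non-negative integer operands, the operators + - * /, and that operators of equal priority are applied left to right. The type Stacks and the functions apply_top and eval_infix are illustrative names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_DEPTH 64   /* hypothetical limit on either stack */

typedef struct {
    long vals[MAX_DEPTH]; int nvals;      /* operand stack */
    char ops[MAX_DEPTH];  int nops;       /* operator stack */
} Stacks;

static int priority(char op)
{
    switch (op) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;         /* "(" has the lowest priority */
    }
}

/* "Apply": pop one operator and two operands, push the result. */
static void apply_top(Stacks *s)
{
    char op;
    long right, left, r = 0;
    assert(s->nops >= 1 && s->nvals >= 2);
    op = s->ops[--s->nops];
    right = s->vals[--s->nvals];
    left = s->vals[--s->nvals];
    switch (op) {
    case '+': r = left + right; break;
    case '-': r = left - right; break;
    case '*': r = left * right; break;
    case '/': r = left / right; break;
    default:  assert(0 && "unknown operator"); break;
    }
    s->vals[s->nvals++] = r;
}

long eval_infix(const char *expr)
{
    Stacks s;
    const char *p = expr;
    s.nvals = 0;
    s.nops = 0;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isdigit((unsigned char)*p)) {
            char *end;
            assert(s.nvals < MAX_DEPTH);
            s.vals[s.nvals++] = strtol(p, &end, 10);
            p = end;
        } else if (*p == '(') {
            assert(s.nops < MAX_DEPTH);
            s.ops[s.nops++] = *p++;
        } else if (*p == ')') {
            while (s.nops > 0 && s.ops[s.nops - 1] != '(')
                apply_top(&s);
            assert(s.nops > 0);           /* a matching "(" must exist */
            s.nops--;                     /* unstack "(" without applying it */
            p++;
        } else {
            while (s.nops > 0 && priority(s.ops[s.nops - 1]) >= priority(*p))
                apply_top(&s);
            assert(s.nops < MAX_DEPTH);
            s.ops[s.nops++] = *p++;
        }
    }
    while (s.nops > 0)                    /* end: apply what remains */
        apply_top(&s);
    assert(s.nvals == 1);
    return s.vals[0];
}
```

eval_infix("( 6 + 4 ) * 3 + 16 / 2") returns 38, passing through the same stack states as the trace.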
Algorithm 4: building a tree from an infix expression

This algorithm is a variation on algorithm 3. It uses two stacks, one of operators and one of pointers to tree nodes. Here are the actions for the four types of symbol:

    Symbol     Action
    (          put it on the operator stack with lowest possible priority
    )          Unstack and apply operators until "(" is reached.
               Unstack but do not apply the "(".
    operand    create a binary tree node with the operand as the data item
               and nil in the two pointer fields. Push a pointer to this
               node onto the operand stack
    operator   while ( priority(top-of-operator-stack) >= priority(input) )
                   unstack and apply the top-of-operator-stack.
               push the input onto the operator stack

At the end: unstack and apply any operators remaining on the operator stack.

What does "apply" an operator mean in this algorithm? It means create a binary tree node with the operator as the data item and left and right pointers containing the pointer values in the top two items on the operand stack. Then pop those two items and push a pointer to the new node. For example,

[Figure: operator and operand stacks before and after applying *. Before: * is on the operator stack and the operand stack points to leaf nodes 9 and -1. After: the operand stack points to a single * node whose two children are the 9 and -1 nodes.]
After the expression has been read, we should get a binary expression tree pointed to from the only item left on the operand stack.

Example: Expression A + B * C + D

[Figures: the operator stack and the operand stack (pointers to tree nodes) after each input symbol is read, in the order A, +, B, *, C, +, D. The operator stack is <empty> after the first symbol.]

At the end of input we unstack and apply the operators on the operator stack, resulting in first

[Figure: the stacks and partial tree after an operator has been applied]

and finally,

[Figure: operator stack <empty>; the operand stack holds a single pointer to the finished expression tree]
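The tree-building variant can be sketched in C. This sketch assumes single-character operands, left-to-right application of equal-priority operators, and that the operand popped first becomes the right child; the notes' own implementation may assign the two pointers differently. The names Node, build_tree and postorder are illustrative, and the sketch never frees its nodes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_DEPTH 64   /* hypothetical limit on either stack */

typedef struct Node {
    char data;                    /* operand name or operator symbol */
    struct Node *left, *right;    /* both NULL in a leaf */
} Node;

static Node *new_node(char data, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->data = data; n->left = left; n->right = right;
    return n;
}

static int priority(char op)
{
    switch (op) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;         /* "(" */
    }
}

/* "Apply": pop an operator and two subtrees, push the combined subtree. */
static void apply_top(char *ops, int *nops, Node **nodes, int *nnodes)
{
    char op;
    Node *right, *left;
    assert(*nops >= 1 && *nnodes >= 2);
    op = ops[--*nops];
    right = nodes[--*nnodes];
    left = nodes[--*nnodes];
    nodes[(*nnodes)++] = new_node(op, left, right);
}

Node *build_tree(const char *infix)
{
    char ops[MAX_DEPTH];    int nops = 0;
    Node *nodes[MAX_DEPTH]; int nnodes = 0;
    const char *p;

    for (p = infix; *p != '\0'; p++) {
        char c = *p;
        if (isspace((unsigned char)c))
            continue;
        if (isalnum((unsigned char)c)) {          /* operand: push a leaf */
            assert(nnodes < MAX_DEPTH);
            nodes[nnodes++] = new_node(c, NULL, NULL);
        } else if (c == '(') {
            assert(nops < MAX_DEPTH);
            ops[nops++] = c;
        } else if (c == ')') {
            while (nops > 0 && ops[nops - 1] != '(')
                apply_top(ops, &nops, nodes, &nnodes);
            assert(nops > 0);
            nops--;                               /* discard "(" */
        } else {
            while (nops > 0 && priority(ops[nops - 1]) >= priority(c))
                apply_top(ops, &nops, nodes, &nnodes);
            assert(nops < MAX_DEPTH);
            ops[nops++] = c;
        }
    }
    while (nops > 0)
        apply_top(ops, &nops, nodes, &nnodes);
    assert(nnodes == 1);                          /* one tree remains */
    return nodes[0];
}

/* Write the node labels in postorder into out; returns the count written. */
int postorder(const Node *t, char *out)
{
    int n;
    if (t == NULL)
        return 0;
    n = postorder(t->left, out);
    n += postorder(t->right, out + n);
    out[n++] = t->data;
    return n;
}
```

For "A + B * C + D" the root returned by build_tree is a + node whose right child is D, and postorder lists the labels as A B C * + D +, the postfix form of the expression.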
Error checking

Here are some simple error checks we can add to Algorithm 4 to make it more robust.

1. Allowable sequences of symbols in the input

                        Current Symbol
    Previous Symbol     (       )       Operator   Operand
    (                   OK      Error   Error      OK
    )                   Error   OK      OK         Error
    Operator            OK      Error   Error      OK
    Operand             Error   OK      OK         Error

2. Parentheses. We can maintain a counter, initially zero, that is incremented whenever we read an opening parenthesis and decremented when we read a closing one. The counter should never go negative and must be zero at the end of the input.

3. Operator. In general an operator requires n operands. In our case we just have n=2. There should be at least n items on the operand stack when we apply the operator.

Next we will see how to generate assembly code from the binary tree.
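Checks 1 and 2 can be sketched in C as a pre-pass over the input. The sketch assumes single-character operands and the operators + - * /. The table does not say how the first and last symbols are treated, so two further assumptions are made here: the start of input behaves like a previous "(", and the input must end with an operand or ")". The names Kind, allowed and well_formed are illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef enum { OPEN, CLOSE, OPERATOR, OPERAND } Kind;

/* allowed[previous][current], copied from the table of symbol sequences. */
static const bool allowed[4][4] = {
    /* current:       (      )      operator operand */
    /* (        */ { true,  false, false,   true  },
    /* )        */ { false, true,  true,    false },
    /* operator */ { true,  false, false,   true  },
    /* operand  */ { false, true,  true,    false },
};

static Kind classify(char c)
{
    if (c == '(') return OPEN;
    if (c == ')') return CLOSE;
    if (isalnum((unsigned char)c)) return OPERAND;
    return OPERATOR;
}

/* Return true if the symbol sequence and the parentheses are acceptable. */
bool well_formed(const char *infix)
{
    Kind prev = OPEN;       /* assumption: the start behaves like "(" */
    int depth = 0;          /* check 2: the parenthesis counter */
    const char *p;

    assert(infix != NULL);
    for (p = infix; *p != '\0'; p++) {
        Kind cur;
        if (isspace((unsigned char)*p))
            continue;
        if (!isalnum((unsigned char)*p) && strchr("()+-*/", *p) == NULL)
            return false;                 /* not a symbol we know */
        cur = classify(*p);
        if (!allowed[prev][cur])
            return false;                 /* check 1: bad sequence */
        if (cur == OPEN)
            depth++;
        if (cur == CLOSE && --depth < 0)
            return false;                 /* counter went negative */
        prev = cur;
    }
    /* assumption: the input must end with an operand or ")" */
    return depth == 0 && (prev == OPERAND || prev == CLOSE);
}
```

well_formed("( A + B ) * C + D / E") is true, while "A + * B" fails the sequence table and "( A + B" leaves the counter non-zero. Check 3 is enforced separately, at the moment an operator is applied.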
Code generation

Algorithm 5: generating assembly code

Visiting all the nodes in a linked list is easy. We start at the beginning and move node-by-node to the end. Note that a list can be viewed as a recursive structure because it is made up of a head (the first node) and a tail (the rest of the list). The tail is also a list. Thus we could write a function to print a list recursively as follows:

    void printlist(listnode *L)
    {
        if (L != NULL) {
            output(L->data);
            printlist(L->next);
        }
    }

If we want to print the list backwards, it is quite tricky to do with iteration, but a recursive solution is simple:

    void printlistbackwards(listnode *L)
    {
        if (L != NULL) {
            printlistbackwards(L->next);
            output(L->data);
        }
    }

A binary tree can also be regarded as a recursive structure. It consists of a root node and (possibly empty) left and right sub-trees, each of which in turn is a binary tree.

A "tree traversal" is a systematic visiting of all the nodes in a tree. Three common traversals are characterized by the order in which they visit the left and right sub-trees and the root node.

    Preorder:   root, left, right
    Inorder:    left, root, right
    Postorder:  left, right, root
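In the style of printlist, the three orders can be sketched in C. The sketch assumes each node carries a one-character label and, instead of printing, appends the labels to a caller-supplied buffer so the result can be inspected. The type TreeNode and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef struct TreeNode {
    char data;
    struct TreeNode *left, *right;    /* NULL for an empty sub-tree */
} TreeNode;

/* Each function appends labels to out starting at index n and returns
   the index just past the last label written. */

size_t preorder(const TreeNode *t, char *out, size_t n)
{
    assert(out != NULL);
    if (t == NULL)
        return n;
    out[n++] = t->data;                   /* root */
    n = preorder(t->left, out, n);        /* left */
    return preorder(t->right, out, n);    /* right */
}

size_t inorder(const TreeNode *t, char *out, size_t n)
{
    assert(out != NULL);
    if (t == NULL)
        return n;
    n = inorder(t->left, out, n);         /* left */
    out[n++] = t->data;                   /* root */
    return inorder(t->right, out, n);     /* right */
}

size_t postorder(const TreeNode *t, char *out, size_t n)
{
    assert(out != NULL);
    if (t == NULL)
        return n;
    n = postorder(t->left, out, n);       /* left */
    n = postorder(t->right, out, n);      /* right */
    out[n++] = t->data;                   /* root */
    return n;
}
```

The only difference between the three functions is where the line that records the root sits relative to the two recursive calls.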
The ordering is applied recursively to the sub-trees of the tree. For example, if the tree is

            t
           / \
          /   \
         w     x
        / \   / \
       h   b a   r
                / \
               j   m

A preorder traversal visits the nodes in this order:   t w h b x a r j m
An inorder traversal visits the nodes in this order:   h w b t a x j r m
A postorder traversal visits the nodes in this order:  h b w a j m r x t

Algorithm 5

Consider the binary expression tree that we have constructed by processing an arithmetic expression with Algorithm 4. We can traverse this tree and output appropriate instruction sequences for our target machine (in this case Pep/8). In the example that follows we assume that there is a MUL and a DIV instruction for * and / respectively. The sequence of instructions uses the Pep/8 user stack to evaluate the expression because, unlike the set of registers, this is virtually unlimited in size.

For example, if the binary expression tree is

      +
     / \
    X   Y

we generate

    ; next three lines from leaf X
    subsp 2,i
    lda X,d
    sta 0,s
    ; next three lines from leaf Y
    subsp 2,i
    lda Y,d
    sta 0,s
    ; next four lines from operator +
    lda 0,s
    adda 2,s
    sta 2,s
    addsp 2,i
When this sequence is executed on the Pep/8 machine we end up with the value of the expression as the top item on the Pep/8 stack. You can see from this example that the traversal we need is postorder (left, right, root).

Because the same ordering is used within subtrees, tree traversal algorithms are often recursive. This is illustrated in the following C/pseudocode function generate that outputs assembly language from a given expression tree. It assumes that the operands in the expression tree are names of global variables, so it uses direct mode to reference them.

    void generate(treenode *T)
    {
        if (T != NULL) {
            if (T->left == NULL) {              /* true if node is a leaf */
                printf("subsp 2,i\n");          /* make room on the stack */
                printf("lda %s,d\n", T->data);  /* Assume T->data is name of a global */
                printf("sta 0,s\n");            /* push the operand */
            } else {
                generate(T->left);
                generate(T->right);
                printf("lda 0,s\n");
                /* T->data is a string; its first character is the operator */
                if (T->data[0] == '+') printf("ADDa 2,s\n");
                if (T->data[0] == '-') printf("SUBa 2,s\n");
                if (T->data[0] == '*') printf("MULa 2,s\n");
                if (T->data[0] == '/') printf("DIVa 2,s\n");
                printf("sta 2,s\n");
                printf("addsp 2,i\n");          /* pop one operand */
            }
        }
    }

The tree generated from assignment X = ( A + B + C ) * ( D + E )
is

            =
           / \
          X   *
             / \
            /   \
           +     +
          / \   / \
         +   C D   E
        / \
       A   B

In our implementation, because of the way that pointers in nodes are actually assigned, the code generated begins

    subsp 2,i
    lda e,d
    sta 0,s
    subsp 2,i
    lda d,d
    sta 0,s
    lda 0,s
    ADDa 2,s
    sta 2,s

If we substitute a call to a MUL subroutine in place of the MULr instruction then the complete sequence contains 32 instructions:

    3 generated from each of the 5 leaves/operands
    4 generated from each of the 3 + operators
    2 generated from the * operator (call to MUL)
    3 for the assignment and tidy up of stack
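A runnable variant of generate can be sketched in C. It rests on several assumptions that are not confirmed by the notes: every leaf is pushed with subsp 2,i / lda / sta 0,s, every operator pops one word with addsp 2,i, and the right sub-tree is generated before the left one. That last choice is made because lda 0,s followed by suba 2,s computes (top of stack) minus (item below it), so the left operand has to be pushed last; it also reproduces the order seen in the listing that begins with lda e,d. The mnemonics mula and diva are hypothetical, as in the notes, and TreeNode, emit and the buffer interface are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct TreeNode {
    const char *data;                 /* variable name or operator symbol */
    struct TreeNode *left, *right;    /* both NULL in a leaf */
} TreeNode;

/* Append one line "op operand+mode" to the string in buf (capacity cap). */
static void emit(char *buf, size_t cap, const char *op,
                 const char *operand, const char *mode)
{
    size_t used = strlen(buf);
    assert(used < cap);
    snprintf(buf + used, cap - used, "%s %s%s\n", op, operand, mode);
}

/* Append the code for tree t to buf, which must already hold a string. */
void generate(const TreeNode *t, char *buf, size_t cap)
{
    if (t == NULL)
        return;
    if (t->left == NULL) {                    /* leaf: push its value */
        emit(buf, cap, "subsp", "2", ",i");
        emit(buf, cap, "lda", t->data, ",d"); /* operand is a global name */
        emit(buf, cap, "sta", "0", ",s");
    } else {
        const char *op = NULL;
        generate(t->right, buf, cap);         /* right operand pushed first */
        generate(t->left, buf, cap);          /* left operand ends up on top */
        switch (t->data[0]) {
        case '+': op = "adda"; break;
        case '-': op = "suba"; break;
        case '*': op = "mula"; break;         /* hypothetical instruction */
        case '/': op = "diva"; break;         /* hypothetical instruction */
        }
        assert(op != NULL);
        emit(buf, cap, "lda", "0", ",s");     /* A = left operand */
        emit(buf, cap, op, "2", ",s");        /* A = left op right */
        emit(buf, cap, "sta", "2", ",s");     /* overwrite the right operand */
        emit(buf, cap, "addsp", "2", ",i");   /* pop the left operand */
    }
}
```

Starting from an empty string in buf, generating code for the sub-tree d + e produces the leaf code for e, then the leaf code for d, then the four operator lines.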
Optimization

There is clearly scope for an optimizer to improve the code generated from Algorithm 5. Some possibilities are:

(1) Removing redundant load instructions. See line 7 in the example above.

(2) Combining/eliminating consecutive SP changes

The translation of x=a*b+c*d includes

    call MUL
    addsp 2,i
    subsp 2,i
    lda b,d

Both changes to SP can be removed. In general, changes to SP on consecutive lines can be combined and, if the net change is zero, they can be eliminated.

(3) Combining SP operations with a wider view

Optimizations (1) and (2) only require the optimizer to look at small sections of assembly code (perhaps two or three lines). By looking at larger sections, further savings might be possible. For example, the beginning of the translation of x=a*b+c*d is

    subsp 2,i
    lda d,d
    sta 0,s
    subsp 2,i
    lda c,d
    sta 0,s

which can be simplified to

    subsp 4,i
    lda d,d
    sta 2,s
    lda c,d
    sta 0,s

Optimizations (1) and (2) have been implemented as shown in the following example.
    $ codegen "x=a+5*b+7"        $ codegen "x=a+5*b+7" optimizer

    subsp 2,i                    subsp 2,i
    lda 7,i                      lda 7,i
    sta 0,s                      sta 0,s
    subsp 2,i                    subsp 2,i
    lda b,d                      lda b,d
    sta 0,s                      sta 0,s
    subsp 2,i                    subsp 2,i
    lda 5,i                      lda 5,i
    sta 0,s                      sta 0,s
    call MUL                     call MUL
    addsp 2,i
    subsp 2,i
    lda a,d                      lda a,d
    sta 0,s                      sta 0,s
    lda 0,s
    ADDa 2,s                     ADDa 2,s
    sta 2,s                      sta 2,s
    addsp 2,i                    addsp 2,i
    lda 0,s                      lda 0,s
    ADDa 2,s                     ADDa 2,s
    sta 2,s                      sta 2,s
    addsp 2,i                    addsp 2,i
    lda 0,s                      lda 0,s
    sta x,d                      sta x,d
    addsp 2,i                    addsp 2,i

Reading

Our treatment of Code Generation is an alternative to section 7.4.
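Optimizations (1) and (2) from the Optimization section can be sketched as a small peephole pass in C. This is not the notes' optimizer, only an illustration under stated assumptions: each instruction is held as a lowercase string with single spaces, a load counts as redundant only when lda 0,s directly follows sta 0,s, and SP changes are the immediate-mode forms addsp N,i and subsp N,i. The names optimize, sp_change and LINE_LEN are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_LEN 32   /* hypothetical maximum length of one instruction */

/* If line is "addsp N,i" or "subsp N,i", store the signed change to SP
   in *delta and return 1; otherwise return 0. */
static int sp_change(const char *line, int *delta)
{
    int n;
    char mode;
    if (sscanf(line, "addsp %d,%c", &n, &mode) == 2 && mode == 'i') {
        *delta = n;
        return 1;
    }
    if (sscanf(line, "subsp %d,%c", &n, &mode) == 2 && mode == 'i') {
        *delta = -n;
        return 1;
    }
    return 0;
}

/* Optimize n lines in place and return the new number of lines. */
size_t optimize(char lines[][LINE_LEN], size_t n)
{
    size_t out = 0;                   /* lines kept so far */
    size_t i;
    for (i = 0; i < n; i++) {
        int d1, d2;
        /* (1) the accumulator already holds what was just stored */
        if (out > 0 && strcmp(lines[i], "lda 0,s") == 0 &&
            strcmp(lines[out - 1], "sta 0,s") == 0)
            continue;
        /* (2) merge this SP change with the one kept just before it */
        if (out > 0 && sp_change(lines[i], &d2) &&
            sp_change(lines[out - 1], &d1)) {
            int net = d1 + d2;
            if (net == 0)
                out--;                /* the two changes cancel */
            else
                snprintf(lines[out - 1], LINE_LEN, "%s %d,i",
                         net > 0 ? "addsp" : "subsp",
                         net > 0 ? net : -net);
            continue;
        }
        if (out != i)
            strcpy(lines[out], lines[i]);
        out++;
    }
    assert(out <= n);
    return out;
}
```

Given the seven lines call MUL, addsp 2,i, subsp 2,i, lda a,d, sta 0,s, lda 0,s, adda 2,s, the pass keeps four: call MUL, lda a,d, sta 0,s, adda 2,s. Optimization (3) needs a wider window than two adjacent lines and is not attempted here.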