TEST 1
Date: 24 02 2015    Marks: 50
Subject & Code: Compiler Design (10CS63)    Class: VI CSE A & B
Name of faculty: Mrs. Shanthala P.T / Mrs. Swati Gambhire    Time: 8:30 - 10:00 AM

SOLUTION MANUAL

1. a. Define Compiler. What are the phases of the Compiler? Explain with a neat diagram. Mention the input and output for each.

Answer: A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code).

Phases of a compiler
The first three phases form the bulk of the analysis portion of a compiler. Two other activities, symbol table management and error handling, are shown interacting with the six phases.
Symbol table management
An essential function of a compiler is to record the identifiers used in the source program and collect information about various attributes of each identifier. A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. When the lexical analyzer detects an identifier in the source program, the identifier is entered into the symbol table.

Error detection and reporting
Each phase can encounter errors. A compiler that stops when it finds the first error is not as helpful as it could be. The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. Errors in which the token stream violates the syntax of the language are determined by the syntax analysis phase. During semantic analysis the compiler tries to detect constructs that have the right syntactic structure but no meaning to the operation involved.

The analysis phases
As translation progresses, the compiler's internal representation of the source program changes. The lexical analysis phase reads the characters in the source program and groups them into a stream of tokens, in which each token represents a logically cohesive sequence of characters, such as an identifier or a keyword. The character sequence forming a token is called the lexeme for the token. Certain tokens are augmented by a lexical value. For example, for any identifier the lexical analyzer generates not only the token id but also enters the lexeme into the symbol table, if it is not already present there. The lexical value associated with this occurrence of id points to the symbol-table entry for this lexeme.
Syntax analysis imposes a hierarchical structure on the token stream, which is represented by syntax trees.

Intermediate code generation
After syntax and semantic analysis, some compilers generate an explicit intermediate representation of the source program. This intermediate representation can have a variety of forms.

Code optimisation
The code optimization phase attempts to improve the intermediate code, so that faster-running machine code will result. Some optimizations are trivial. There is great variation in the amount of code optimization different compilers perform. In those that do the most, called optimising compilers, a significant fraction of the compiler's time is spent on this phase.

Code generation
The final phase of the compiler is the generation of target code, consisting normally of relocatable machine code or assembly code. Memory locations are selected for each of the variables used by the program. Then, intermediate instructions are each translated into a sequence of machine instructions that perform the same task. A crucial aspect is the assignment of variables to registers.

Inputs and outputs of the phases: the lexical analyzer takes the source program as input and produces a stream of tokens. The syntax analyzer takes the output of the lexical analyzer and produces a parse tree. The semantic analyzer takes the output of the syntax analyzer and produces an annotated tree. The intermediate code generator takes the tree produced by the semantic analyzer as input and produces intermediate code.
-------------------------------------------------------------------------------------------------------------------
b. What are Compiler Construction Tools? Explain its specifications in detail.

Answer: The compiler writer, like any software developer, can profitably use modern software development environments containing tools such as language editors, debuggers, version managers, profilers, test harnesses, and so on. In addition to these general software-development tools, other more specialized tools have been created to help implement various phases of a compiler. These tools use specialized languages for specifying and implementing specific components, and many use quite sophisticated algorithms.
The most successful tools are those that hide the details of the generation algorithm and produce components that can be easily integrated into the remainder of the compiler. Some commonly used compiler-construction tools include:
1. Parser generators, which automatically produce syntax analyzers from a grammatical description of a programming language.
2. Scanner generators, which produce lexical analyzers from a regular-expression description of the tokens of a language.
3. Syntax-directed translation engines, which produce collections of routines for walking a parse tree and generating intermediate code.
4. Code-generator generators, which produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.
5. Data-flow analysis engines, which facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
6. Compiler-construction toolkits, which provide an integrated set of routines for constructing various phases of a compiler.
------------------------------------------------------------------------------------------------------------------
2. a. Define the role of input buffer in lexical analysis.

Answer: The input buffer is used to read characters from the source program efficiently. If the source program is large, only part of it is loaded into the buffer at a time; because a lexeme may extend beyond the portion currently in memory, two input buffers are used. Together, the two buffers handle large lookaheads safely.

Buffer pairs: An important scheme involves two buffers that are alternately reloaded. Each buffer is of size N, where N is typically the size of a disk block, e.g., 4096 bytes. A sentinel character eof is placed at the end of each buffer; it is not part of the source program, and it saves the time otherwise spent testing for the end of the buffer on every character read. When scanning reaches the sentinel at the end of one buffer, the other buffer is reloaded and scanning continues there. Any eof that appears other than at the end of a buffer marks the end of the input.

Two pointers to the input are maintained:
1. lexemeBegin marks the beginning of the current lexeme.
2. forward scans ahead until a match for the next lexeme is found.
Once the next lexeme is determined, forward is set to the character at its right end. After the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin is set to the character immediately after the lexeme just found.

Code:
fwd++;
if ( *fwd == EOF ) {              /* sentinel reached: special processing */
    if ( fwd at end of first buffer ) {
        reload second buffer;
        fwd = beginning of second buffer;
    }
    else if ( fwd at end of second buffer ) {
        reload first buffer;
        fwd = beginning of first buffer;
    }
    else                          /* EOF within a buffer: end of input */
        terminate processing;
}

The code shows that when we reach the sentinel at the end of the first buffer we reload the second buffer and continue scanning there (and vice versa); an EOF anywhere other than a buffer end means the input is exhausted and processing terminates.
----------------------------------------------------------------------------------------------------------------
b. Construct the transition diagram to recognize the tokens given below. i). Relational operator. ii). Unsigned number.

Answer: As an intermediate step in the construction of a lexical analyzer, we first convert patterns into stylized flowcharts called transition diagrams.

i) Relational operators (relop): the diagram recognizes the lexemes matching the token relop. We begin in the start state 0. If the first input symbol is <, then among the lexemes matching the pattern for relop we could be looking at <, <> or <=, so we go to state 1 and read the next character. If it is =, we recognize the lexeme <=, enter state 2, and return the token relop with attribute LE, the constant representing this particular comparison operator. If in state 1 the next character is >, we instead have the lexeme <> and enter state 3 to return an indication that the not-equal operator has been found. On any other character, the lexeme is < itself, and we enter state 4 to return that information; state 4 carries a * to indicate that we must retract the input one position. If in state 0 the first character is =, then this single character must be the lexeme, and we immediately return from state 5. The remaining possibility is that the first character is >; we then enter state 6 and decide, on the basis of the next character, whether the lexeme is >= or simply > (again with retraction).
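The relop transition diagram above can be hand-coded directly, one branch per state. The following is a minimal illustrative sketch (not part of the original answer); the attribute names LT, LE, NE, EQ, GT, GE are the usual constants assumed for the comparison operators, and state numbers in the comments follow the description above.

```python
def relop(s, pos=0):
    """Return (attribute, next_pos) for a relational-operator lexeme
    starting at s[pos], or None if no relop matches."""
    c = s[pos] if pos < len(s) else ''
    if c == '<':                        # state 0 -> state 1
        nxt = s[pos + 1] if pos + 1 < len(s) else ''
        if nxt == '=':                  # state 2: lexeme "<="
            return ('LE', pos + 2)
        if nxt == '>':                  # state 3: lexeme "<>"
            return ('NE', pos + 2)
        return ('LT', pos + 1)          # state 4 (*): retract one character
    if c == '=':                        # state 5: lexeme "="
        return ('EQ', pos + 1)
    if c == '>':                        # state 6
        nxt = s[pos + 1] if pos + 1 < len(s) else ''
        if nxt == '=':                  # lexeme ">="
            return ('GE', pos + 2)
        return ('GT', pos + 1)          # lexeme ">" (*): retract one character
    return None                         # no relop lexeme here
```

For example, relop("<=x") returns ('LE', 2), while relop("<x") retracts and returns ('LT', 1).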
ii) Unsigned numbers: the transition diagram for the token number is the most complex diagram so far. Beginning in state 12, if we see a digit we go to state 13. In that state we can read any number of additional digits. If we then see anything but a digit or a dot, we have seen a number in integer form, e.g., 123; that case is handled by entering state 20, where we return the token number and a pointer to a table of constants where the found lexeme is entered. If we instead see a dot in state 13, we have an optional fraction: state 14 is entered, and we look for one or more further digits; state 15 is used for that purpose. If in state 15 we see an E, we have an optional exponent, whose recognition is the job of states 16 through 19. If in state 15 we instead see anything but E or a digit, then we have come to the end of the fraction, there is no exponent, and we return the lexeme found, via state 21.
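The unsigned-number diagram can likewise be sketched as code. This illustrative version (not part of the original answer) returns the longest matched lexeme, simulating the retraction states 20, 21 and 19 by remembering the last accepting position; the state numbers in the comments follow the description above.

```python
def unsigned_number(s):
    """Return the longest unsigned-number lexeme at the start of s,
    or None if s does not begin with a digit (state 12 fails)."""
    i, n = 0, len(s)
    if i >= n or not s[i].isdigit():          # state 12: must start with a digit
        return None
    while i < n and s[i].isdigit():           # state 13: read digits
        i += 1
    end = i                                   # state 20: integer form, e.g. 123
    if i < n and s[i] == '.':                 # state 14: optional fraction
        i += 1
        if i < n and s[i].isdigit():
            while i < n and s[i].isdigit():   # state 15: fraction digits
                i += 1
            end = i                           # state 21: e.g. 12.34
    if i < n and s[i] == 'E':                 # state 16: optional exponent
        i += 1
        if i < n and s[i] in '+-':            # state 17: optional sign
            i += 1
        if i < n and s[i].isdigit():
            while i < n and s[i].isdigit():   # state 18: exponent digits
                i += 1
            end = i                           # state 19: e.g. 12.3E+4
    return s[:end]                            # retract to last accepting state
```

For example, unsigned_number("12.3E+4*x") returns "12.3E+4", and unsigned_number("12.") retracts past the dot and returns "12".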
3. a. Show the translation for an assignment statement: Position = initial + rate * 60. Clearly indicate the output of each phase.

Answer:
Lexical analysis produces the token stream
<id,1> <=> <id,2> <+> <id,3> <*> <60>
where 1, 2 and 3 are the symbol-table entries for Position, initial and rate.
Syntax analysis builds the syntax tree for the assignment, with = at the root, id1 as its left child, and the subtree for id2 + (id3 * 60) as its right child.
Semantic analysis checks types and inserts a conversion of the integer 60 to a floating-point value: inttofloat(60).
Intermediate code generation produces three-address code:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Code optimization reduces this to:
t1 = id3 * 60.0
id1 = id2 + t1
Code generation emits target code, for example:
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
-------------------------------------------------------------------------------------------------------------------
b. For the grammar
S -> cAd
A -> ab | a
trace the input cad for the recursive-descent parser.

Trace for the given example:
S -> cAd
A -> ab | a
Input string: w = cad

Step 1: S has only one production, so we expand S. The first character of the input w = cad matches the leftmost leaf of the tree, c.

Step 2: We expand A -> ab. We have a match for the second input character, a, so we advance to the next input symbol, d. But b does not match d, so we report failure, go back to A to try another alternative, and reset the input pointer to position 2.

Step 3: The second alternative for A is A -> a. The leaf a matches the second input symbol, and the leaf d matches the third. We halt with a successful parsing message.
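The three steps of the trace can be sketched as a tiny backtracking recursive-descent parser. This is an illustrative sketch (not part of the original answer) for the grammar S -> cAd, A -> ab | a; the function names are my own.

```python
def parse_S(s, i):
    """Try S -> c A d starting at position i; return the position
    after the match, or None on failure."""
    if i < len(s) and s[i] == 'c':        # step 1: match leaf c
        j = parse_A(s, i + 1)
        if j is not None and j < len(s) and s[j] == 'd':
            return j + 1                  # match leaf d
    return None

def parse_A(s, i):
    # Step 2: first alternative A -> ab
    if s[i:i + 2] == 'ab':
        return i + 2
    # Step 3: backtrack (reset to i) and try A -> a
    if s[i:i + 1] == 'a':
        return i + 1
    return None

def accepts(s):
    return parse_S(s, 0) == len(s)
```

On the input cad, parse_A first tries A -> ab, fails because b does not match d, backtracks to position 1, and succeeds with A -> a, so accepts("cad") is True.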
----------------------------------------------------------------------------------------------------
4. a. Explain left recursion. Describe the algorithm used for eliminating left recursion.

Left recursion: A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α. For example, E -> E + T | T is immediately left recursive; it can be rewritten without left recursion as E -> T E', E' -> + T E' | ε.

Algorithm for eliminating left recursion:
- Arrange the nonterminals in some order A1, ..., An.
- for i from 1 to n do {
-     for j from 1 to i-1 do {
-         replace each production of the form Ai -> Aj γ
-         by Ai -> δ1 γ | δ2 γ | ... | δk γ,
-         where Aj -> δ1 | δ2 | ... | δk are all the current Aj-productions
-     }
-     eliminate the immediate left recursion among the Ai-productions
- }
----------------------------------------------------------------------------------------------------
b. Define FIRST and FOLLOW rules used in predictive parsing technique.

Rules to compute FIRST:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a nonterminal and X -> Y1 Y2 ... Yk is a production for some k >= 1, place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1).
   If ε is in FIRST(Yj) for all j = 1, ..., k, then add ε to FIRST(X).
3. If X -> ε is a production, then add ε to FIRST(X).

Rules to compute FOLLOW:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end-marker.
2. If there is a production A -> αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A -> αB, or a production A -> αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
----------------------------------------------------------------------------------------------------
5. Consider the following CFG, which has the set of terminals T = {id, (, ), [, ], ;}:
E -> id | id(A) | id[E]
A -> E | E ; A
(a) Left-factor this grammar so that no two productions with the same left-hand side have right-hand sides with a common prefix.
(b) Construct an LL(1) parsing table for the left-factored grammar.
(c) Show the operation of an LL(1) parser on the input string id(id[id]; id).

Answer:
(a) E -> id X
    X -> ε | (A) | [E]
    A -> E Y
    Y -> ε | ; A

(b) The FIRST and FOLLOW sets of the nonterminals are as follows:
FIRST(E) = {id}        FOLLOW(E) = {$, ], ;, )}
FIRST(X) = {(, [, ε}   FOLLOW(X) = {$, ], ;, )}
FIRST(A) = {id}        FOLLOW(A) = {)}
FIRST(Y) = {;, ε}      FOLLOW(Y) = {)}
Here is an LL(1) parsing table for the grammar:
        id          (           )          [           ]          ;          $
E    E -> idX
X                X -> (A)    X -> ε    X -> [E]    X -> ε    X -> ε    X -> ε
A    A -> EY
Y                            Y -> ε                          Y -> ;A

(c) Operation of the LL(1) parser on id(id[id]; id):

Stack        Input                  Action
E$           id(id[id]; id)$        E -> idX
idX$         id(id[id]; id)$        match terminal id
X$           (id[id]; id)$          X -> (A)
(A)$         (id[id]; id)$          match terminal (
A)$          id[id]; id)$           A -> EY
EY)$         id[id]; id)$           E -> idX
idXY)$       id[id]; id)$           match terminal id
XY)$         [id]; id)$             X -> [E]
[E]Y)$       [id]; id)$             match terminal [
E]Y)$        id]; id)$              E -> idX
idX]Y)$      id]; id)$              match terminal id
X]Y)$        ]; id)$                X -> ε
]Y)$         ]; id)$                match terminal ]
Y)$          ; id)$                 Y -> ;A
;A)$         ; id)$                 match terminal ;
A)$          id)$                   A -> EY
EY)$         id)$                   E -> idX
idXY)$       id)$                   match terminal id
XY)$         )$                     X -> ε
Y)$          )$                     Y -> ε
)$           )$                     match terminal )
$            $                      Accept
***************************************************************************
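The stack-based operation traced above can be sketched as a small table-driven LL(1) parser. This is an illustrative sketch (not part of the original answer): the table below hard-codes the entries derived in part (b), with empty lists standing for ε-productions and absent entries for errors.

```python
# LL(1) table: (nonterminal, lookahead) -> right-hand side.
TABLE = {
    ('E', 'id'): ['id', 'X'],
    ('X', '('): ['(', 'A', ')'],
    ('X', '['): ['[', 'E', ']'],
    ('X', ')'): [], ('X', ']'): [], ('X', ';'): [], ('X', '$'): [],
    ('A', 'id'): ['E', 'Y'],
    ('Y', ';'): [';', 'A'],
    ('Y', ')'): [],
}

def ll1_parse(tokens):
    """Return True iff the token list (without '$') is accepted."""
    tokens = tokens + ['$']
    stack = ['$', 'E']                # start symbol on top of $
    i = 0
    while stack:
        top = stack.pop()
        if top == tokens[i]:          # terminal (or '$') matches input
            i += 1
        elif (top, tokens[i]) in TABLE:
            # Expand the nonterminal: push the RHS in reverse order.
            stack.extend(reversed(TABLE[(top, tokens[i])]))
        else:
            return False              # blank (error) entry in the table
    return i == len(tokens)
```

Running ll1_parse(['id', '(', 'id', '[', 'id', ']', ';', 'id', ')']) replays exactly the moves in the table above and accepts the input string id(id[id]; id) from part (c).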