CS6660 COMPILER DESIGN L T P C



CS6660 COMPILER DESIGN L T P C

UNIT I INTRODUCTION TO COMPILERS 5
Translators - Compilation and Interpretation - Language Processors - The Phases of Compiler - Errors Encountered in Different Phases - The Grouping of Phases - Compiler Construction Tools - Programming Language Basics.

UNIT II LEXICAL ANALYSIS 9
Need and Role of Lexical Analyzer - Lexical Errors - Expressing Tokens by Regular Expressions - Converting Regular Expression to DFA - Minimization of DFA - Language for Specifying Lexical Analyzers - LEX - Design of Lexical Analyzer for a Sample Language.

UNIT III SYNTAX ANALYSIS 10
Need and Role of the Parser - Context Free Grammars - Top Down Parsing - General Strategies - Recursive Descent Parser - Predictive Parser - LL(1) Parser - Shift Reduce Parser - LR Parser - LR(0) Item - Construction of SLR Parsing Table - Introduction to LALR Parser - Error Handling and Recovery in Syntax Analyzer - YACC - Design of a Syntax Analyzer for a Sample Language.

UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT 12
Syntax Directed Definitions - Construction of Syntax Tree - Bottom-up Evaluation of S-Attribute Definitions - Design of Predictive Translator - Type Systems - Specification of a Simple Type Checker - Equivalence of Type Expressions - Type Conversions. RUN-TIME ENVIRONMENT: Source Language Issues - Storage Organization - Storage Allocation - Parameter Passing - Symbol Tables - Dynamic Storage Allocation - Storage Allocation in FORTRAN.

UNIT V CODE OPTIMIZATION AND CODE GENERATION 9
Principal Sources of Optimization - DAG - Optimization of Basic Blocks - Global Data Flow Analysis - Efficient Data Flow Algorithms - Issues in Design of a Code Generator - A Simple Code Generator Algorithm.

TOTAL: 45 PERIODS

TEXTBOOK:
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, Compilers: Principles, Techniques and Tools, 2nd Edition, Pearson Education.

REFERENCES:
1. Randy Allen, Ken Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, Morgan Kaufmann Publishers.
2. Steven S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers - Elsevier Science, India, Indian Reprint.
3. Keith D. Cooper and Linda Torczon, Engineering a Compiler, Morgan Kaufmann Publishers - Elsevier Science.
4. Charles N. Fischer, Richard J. LeBlanc, Crafting a Compiler with C, Pearson Education, 2008.

TABLE OF CONTENTS

UNIT I INTRODUCTION TO COMPILERS
1.1 Translators
1.2 Compilation and Interpretation
1.3 Language Processors
1.4 The Phases of Compiler
1.5 Errors Encountered in Different Phases
1.6 The Grouping of Phases
1.7 Compiler Construction Tools
1.8 Programming Language Basics

UNIT II LEXICAL ANALYSIS
2.1 Need and Role of Lexical Analyzer
2.2 Lexical Errors
2.3 Expressing Tokens by Regular Expressions
2.4 Converting Regular Expression to DFA
2.5 Minimization of DFA
2.6 Language for Specifying Lexical Analyzers - LEX
2.7 Design of Lexical Analyzer for a Sample Language

UNIT III SYNTAX ANALYSIS
3.1 Need and Role of the Parser
3.2 Context Free Grammars
3.3 Top Down Parsing - General Strategies
3.4 Recursive Descent Parser
3.5 Predictive Parser
3.6 LL(1) Parser
3.7 Shift Reduce Parser
3.8 LR Parser

3.9 LR(0) Item
3.10 Construction of SLR Parsing Table
3.11 Introduction to LALR Parser
3.12 Error Handling and Recovery in Syntax Analyzer
3.13 YACC
3.14 Design of a Syntax Analyzer for a Sample Language

UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
4.1 Syntax Directed Definitions
4.2 Construction of Syntax Tree
4.3 Bottom-up Evaluation of S-Attribute Definitions
4.4 Design of Predictive Translator
4.5 Type Systems
4.6 Specification of a Simple Type Checker
4.7 Equivalence of Type Expressions
4.8 Type Conversions
4.9 Run-Time Environment: Source Language Issues
4.10 Storage Organization
4.11 Storage Allocation
4.12 Parameter Passing
4.13 Symbol Tables
4.14 Dynamic Storage Allocation
4.15 Storage Allocation in FORTRAN

UNIT V CODE OPTIMIZATION AND CODE GENERATION
5.1 Principal Sources of Optimization
5.2 DAG
5.3 Optimization of Basic Blocks
5.4 Global Data Flow Analysis
5.5 Efficient Data Flow Algorithms
5.6 Issues in Design of a Code Generator
5.7 A Simple Code Generator Algorithm

UNIT I INTRODUCTION TO COMPILERS

1.1 TRANSLATORS
A translator is a program that takes a program in one form (the input) and converts it into another form (the output). The input program is written in the source language and the output program in the target language. The source language can be a low-level language like assembly language or a high-level language like C, C++, Java, FORTRAN, and so on. The target language can likewise be a low-level language (assembly language) or a machine language (the set of instructions executed directly by a CPU).

source language -> Translator -> target language
Figure 1.1: Translator

Types of translators:
(1) Compilers
(2) Interpreters
(3) Assemblers

1.2 COMPILATION AND INTERPRETATION
A compiler is a program that reads a program in one language and translates it into an equivalent program in another language. The translation done by a compiler is called compilation.
An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user. An interpreter executes the source program statement by statement. The translation done by an interpreter is called interpretation.

1.3 LANGUAGE PROCESSORS
(i) Compiler
A compiler is a program that can read a program in one language (the source language) and translate it into an equivalent program in another language (the target language); compilation is shown in Figure 1.2.

Source program (Input) -> Compiler -> Target program (Output)
Figure 1.2: A Compiler

An important role of the compiler is to report any errors in the source program that it detects during the translation process. If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.

Input -> Target Program -> Output
Figure 1.3: Running the target program

(ii) Interpreter
An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user, as shown in Figure 1.4.

Source program + Input -> Interpreter -> Output
Figure 1.4: An interpreter

The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs. A compiler converts the source program to the target program completely, whereas an interpreter executes the source program statement by statement. Usually an interpreter gives better error diagnostics than a compiler.

(iii) Hybrid Compiler
A hybrid compiler combines compilation and interpretation. Java language processors combine compilation and interpretation as shown in Figure 1.5. A Java source program is first compiled into an intermediate form called bytecodes. The bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is that bytecodes compiled on one machine can be interpreted on another machine.

Source program -> Translator -> Intermediate program; Intermediate program + Input -> Virtual Machine -> Output
Figure 1.5: A hybrid compiler

In order to achieve faster processing of inputs to outputs, some Java compilers, called just-in-time compilers, translate the bytecodes into machine language immediately before they run.

(iv) Language processing system
In addition to a compiler, several other programs may be required to create an executable target program, as shown in Figure 1.6.

Preprocessor: The preprocessor collects the source program, which may be divided into modules and stored in separate files. The preprocessor may also expand shorthands called macros into source language statements. E.g. #include <math.h>, #define PI 3.14

Compiler: The modified source program is then fed to a compiler. The compiler may produce an assembly-language program as its output, because assembly language is easier to produce as output and is easier to debug.

Assembler: The assembly-language program is then processed by a program called an assembler that produces relocatable machine code as its output.

Linker: The linker resolves external memory addresses, where the code in one file may refer to a location in another file. Large programs are often compiled in pieces, so the relocatable machine code may have to be linked together with other relocatable object files and library files into the code that actually runs on the machine.

Loader: The loader then puts together all of the executable object files into memory for execution. It also performs relocation of the object code.

Figure 1.6: A language-processing system

Note: Preprocessors, assemblers, linkers and loaders are collectively called the cousins of the compiler.

1.4 THE PHASES OF COMPILER / STRUCTURE OF COMPILER
The process of compilation is carried out in two parts: analysis and synthesis.
The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program. The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part. The analysis part is carried out in three phases: lexical analysis, syntax analysis and semantic analysis. The analysis part is often called the front end of the compiler.
The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table. The synthesis part is carried out in three phases: intermediate code generation, code optimization and code generation. The synthesis part is called the back end of the compiler.

Figure 1.7: Phases of a compiler

Lexical Analysis
The first phase of a compiler is called lexical analysis or scanning or linear analysis. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of the form

<token-name, attribute-value>

The first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token. Information from the symbol-table entry is needed for semantic analysis and code generation.
For example, suppose a source program contains the assignment statement

position = initial + rate * 60    (1.1)

Figure 1.8: Translation of an assignment statement

The characters in this assignment could be grouped into the following lexemes and mapped into the following tokens:
(1) position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position.
(2) The assignment symbol = is a lexeme that is mapped into the token <=>.
(3) initial is a lexeme that is mapped into the token <id, 2>.
(4) + is a lexeme that is mapped into the token <+>.
(5) rate is a lexeme that is mapped into the token <id, 3>.
(6) * is a lexeme that is mapped into the token <*>.
(7) 60 is a lexeme that is mapped into the token <60>.
Blanks separating the lexemes would be discarded by the lexical analyzer. The following sequence of tokens is produced after lexical analysis:

<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>    (1.2)
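To make the pair representation concrete, the following C fragment is a minimal sketch of our own (the Token type and the token-name codes are invented for this illustration, not part of the text) that builds the token stream (1.2) in memory:

#include <stdio.h>

/* Illustrative token-name codes; these names are an assumption. */
enum TokenName { ID, ASSIGN, PLUS, TIMES, NUM };

/* A token is a pair <token-name, attribute-value>; for ID the
   attribute is a symbol-table index, for NUM the numeric value. */
struct Token { enum TokenName name; int attr; };

int main(void) {
    /* Token stream (1.2) for: position = initial + rate * 60 */
    struct Token stream[] = {
        { ID, 1 }, { ASSIGN, 0 }, { ID, 2 }, { PLUS, 0 },
        { ID, 3 }, { TIMES, 0 }, { NUM, 60 }
    };
    for (int i = 0; i < 7; i++)
        printf("<%d, %d> ", stream[i].name, stream[i].attr);
    printf("\n");
    return 0;
}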

Syntax Analysis
The second phase of the compiler is syntax analysis or parsing or hierarchical analysis. The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. The hierarchical tree structure generated in this phase is called a parse tree or syntax tree. In a syntax tree, each interior node represents an operation and the children of the node represent the arguments of the operation.

Figure 1.9: Syntax tree for position = initial + rate * 60

The tree has an interior node labeled * with <id, 3> as its left child and the integer 60 as its right child. The node <id, 3> represents the identifier rate. The nodes <id, 2> and <id, 1> are represented similarly in the tree. The root of the tree, labeled =, indicates that we must store the result of this addition into the location for the identifier position.

Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It ensures the correctness of the program; matching of parentheses is also done in this phase. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, the compiler must report an error if a floating-point number is used to index an array. The language specification may permit some type conversions, like integer to float for float addition; these are called coercions.
Here, the operator * is applied to a floating-point number rate and an integer 60. The integer may be converted into a floating-point number by the operator inttofloat explicitly, as shown in the figure.

Figure 1.10: Semantic tree for position = initial + rate * 60

Intermediate Code Generation
After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation. The intermediate representation should have two important properties:
a. It should be easy to produce.
b. It should be easy to translate into the target machine.

Three-address code is one such intermediate representation; it consists of a sequence of assembly-like instructions with three operands per instruction. Each operand can act like a register. The output of the intermediate code generator in Figure 1.8 consists of the three-address code sequence for position = initial + rate * 60:

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3    (1.3)

Code Optimization
The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. Usually better means faster. Optimization has to improve the efficiency of the code so that the target program's running time and consumption of memory can be reduced.
The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform (1.3) into the shorter sequence

t1 = id3 * 60.0
id1 = id2 + t1    (1.4)

Code Generation
The code generator takes as input an intermediate representation of the source program and maps it into the target language. If the target language is machine code, then registers or memory locations are selected for each of the variables used by the program. The intermediate instructions are translated into sequences of machine instructions. For example, using registers R1 and R2, the intermediate code in (1.4) might get translated into the machine code

LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1    (1.5)

The first operand of each instruction specifies a destination. The F in each instruction tells us that it deals with floating-point numbers. The code in (1.5) loads the contents of address id3 into register R2, then multiplies it by the floating-point constant 60.0. The # signifies that 60.0 is to be treated as an immediate constant. The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in register R2. Finally, the value in register R1 is stored into the address of id1, so the code correctly implements the assignment statement (1.1).

Symbol-Table Management
The symbol table, which stores information about the entire source program, is used by all phases of the compiler. An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.

These attributes may provide information about the storage allocated for a name, its type, and its scope. In the case of procedure names, such things as the number and types of its arguments, the method of passing each argument (for example, by value or by reference), and the type returned are maintained in the symbol table.
The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
A symbol table can be implemented in one of the following ways:
o Linear (sorted or unsorted) list
o Binary search tree
o Hash table
Among these, symbol tables are mostly implemented as hash tables, where the source-code symbol itself is treated as a key for the hash function and the return value is the information about the symbol.
A symbol table may serve the following purposes depending upon the language in hand:
o To store the names of all entities in a structured form at one place.
o To verify if a variable has been declared.
o To implement type checking, by verifying assignments and expressions.
o To determine the scope of a name (scope resolution).

1.5 ERRORS ENCOUNTERED IN DIFFERENT PHASES
An important role of the compiler is to report any errors in the source program that it detects during the entire translation process. Each phase of the compiler can encounter errors; after errors are detected, they must be corrected for the compilation process to proceed. The syntax and semantic phases handle the largest number of errors in the compilation process. The error handler handles all types of errors: lexical errors, syntax errors, semantic errors and logical errors.

Lexical errors: The lexical analyzer detects errors in the input characters, such as the name of a keyword or identifier typed incorrectly. Example: switch is written as swich.

Syntax errors: Syntax errors are detected by the syntax analyzer; examples are a missing semicolon or unbalanced parentheses. Example: ((a+b* (c-d)). In this statement a ) is missing after b.

Semantic errors: Data type mismatch errors are handled by the semantic analyzer, such as assigning a value of an incompatible data type. Example: assigning a string value to an integer.

Logical errors: Examples include:
o Code not reachable and infinite loops.

o Misuse of operators.
o Code written after the end of the main() block.

1.6 THE GROUPING OF PHASES
The phases describe the logical organization of a compiler. Activities of several phases may be grouped together into a pass that reads an input file and writes an output file.
The front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be grouped together into one pass. Code optimization might be an optional pass. A back-end pass consists of code generation for a particular target machine.

Source program (input) -> Front end [Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator], which is source language dependent (machine independent) -> intermediate code -> Back end [Code Optimizer (optional) -> Code Generator], which is machine dependent -> Target program (output)
Figure 1.11: The grouping of phases of a compiler

Some compiler collections have been created around carefully designed intermediate representations that allow the front end for a particular language to interface with the back end for a certain target machine.

Advantages:

o With these collections, we can produce compilers for different source languages for one target machine by combining different front ends with the back end for that target machine.
o Similarly, we can produce compilers for one source language on different target machines by combining the front end with back ends for those target machines.

1.7 COMPILER CONSTRUCTION TOOLS
The compiler writer, like any software developer, can profitably use modern software development environments containing tools such as language editors, debuggers, version managers, profilers, test harnesses, and so on. Writing a compiler is a tedious and time-consuming task, so there are specialized tools to implement various phases of a compiler. These tools are called compiler construction tools.
Some commonly used compiler-construction tools are given below:
o Scanner generators [Lexical Analysis]
o Parser generators [Syntax Analysis]
o Syntax-directed translation engines [Intermediate Code]
o Data-flow analysis engines [Code Optimization]
o Code-generator generators [Code Generation]
o Compiler-construction toolkits [For all phases]

1. Scanner generators produce lexical analyzers from a regular-expression description of the tokens of a language. Unix has a tool for scanner generation called LEX.
2. Parser generators automatically produce syntax analyzers (parse trees) from a grammatical description of a programming language. Unix has a tool called YACC which is a parser generator.
3. Syntax-directed translation engines produce collections of routines for walking a parse tree and generating intermediate code.
4. Data-flow analysis engines facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
5. Code-generator generators produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.
6. Compiler-construction toolkits provide an integrated set of routines for constructing various phases of a compiler.

1.8 PROGRAMMING LANGUAGE BASICS
To design an efficient compiler we should know some language basics. Important concepts from popular programming languages like C, C++, C#, and Java are listed below. The programming language basics used in most languages are:

o The Static/Dynamic Distinction
o Environments and States
o Static Scope and Block Structure
o Explicit Access Control
o Dynamic Scope
o Parameter Passing Mechanisms
o Aliasing

The Static/Dynamic Distinction
A language uses a static policy if an issue can be decided at compile time. On the other hand, a policy that only allows a decision to be made when we execute the program is said to be a dynamic policy, or to require a decision at run time.
The scope of a declaration of x is the region of the program in which uses of x refer to this declaration. A language uses static scope or lexical scope if it is possible to determine the scope of a declaration by looking only at the program text. Otherwise, the language uses dynamic scope. With dynamic scope, as the program runs, the same use of x could refer to any of several different declarations of x.
Example: consider the use of the term "static" as it applies to data in a Java class declaration. In Java, a variable is a name for a location in memory used to hold a data value. Here, "static" refers not to the scope of the variable, but rather to the ability of the compiler to determine the location in memory where the declared variable can be found. A declaration like

public static int x;

makes x a class variable and says that there is only one copy of x, no matter how many objects of this class are created. Moreover, the compiler can determine a location in memory where this integer x will be held. In contrast, had "static" been omitted from this declaration, then each object of the class would have its own location where x would be held, and the compiler could not determine all these places in advance of running the program.

Environments and States
Programming languages determine how the values of data elements, or the interpretation of the names for those data, change as the program runs. For example, the execution of an assignment such as x = y + 1 changes the value denoted by the name x. More specifically, the assignment changes the value in whatever location is denoted by x.
The location denoted by x can change at run time. If x is not a static (or "class") variable, then every object of the class has its own location for an instance of variable x. In that case, the assignment to x can change any of those "instance" variables, depending on the object to which a method containing that assignment is applied.

names --(environment)--> locations (variables) --(state)--> values

The association of names with locations in memory (the store) and then with values can be described by two mappings that change as the program runs:
1. The environment is a mapping from names to locations in the store. Since variables refer to locations ("l-values" in the terminology of C), we could alternatively define an environment as a mapping from names to variables.

2. The state is a mapping from locations in the store to their values. That is, the state maps l-values to their corresponding r-values, in the terminology of C.

Environments change according to the scope rules of a language.
Example: Consider the C program fragment below. Integer i is declared a global variable, and also declared as a variable local to function f. When f is executing, the environment adjusts so that name i refers to the location reserved for the i that is local to f, and any use of i, such as the assignment i = 3 shown explicitly, refers to that location. Typically, the local i is given a place on the run-time stack.

int i;          /* global i */
...
void f(...) {
    int i;      /* local i */
    i = 3;      /* use of local i */
}
...
x = i + 1;      /* use of global i */

Whenever a function g other than f is executing, uses of i cannot refer to the i that is local to f. Uses of name i in g must be within the scope of some other declaration of i. An example is the explicitly shown statement x = i + 1, which is inside some procedure whose definition is not shown. The i in i + 1 presumably refers to the global i.

Static Scope and Block Structure
The scope rules for C are based on program structure; the scope of a declaration is determined implicitly by where the declaration appears in the program. Later languages, such as C++, Java, and C#, also provide explicit control over scopes through the use of keywords like public, private, and protected.
A block is a grouping of declarations and statements. C uses braces { and } to delimit a block; some languages use begin and end instead.
Example: The C++ program in Figure 1.12 has four blocks, with several definitions of variables a and b. As a memory aid, each declaration initializes its variable to the number of the block to which it belongs.
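The program itself appears in Figure 1.12, which is not reproduced in this text; the following sketch reconstructs the four-block structure from the surrounding description (each declaration initializes its variable to the number of its block), written here in C style for illustration:

/* A reconstruction from the description, not the original figure. */
#include <stdio.h>

int main(void) {
    int a = 1, b = 1;                      /* block B1 */
    {
        int b = 2;                         /* block B2 */
        {
            int a = 3;                     /* block B3 */
            printf("%d %d\n", a, b);       /* prints 3 2 */
        }
        {
            int b = 4;                     /* block B4 */
            printf("%d %d\n", a, b);       /* prints 1 4 */
        }
        printf("%d %d\n", a, b);           /* prints 1 2 */
    }
    printf("%d %d\n", a, b);               /* prints 1 1 */
    return 0;
}

The printed values trace exactly the scope claims made below: B1's a is visible in B4 and in B2 outside B3, while B3's own a hides it.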

Figure 1.12: Blocks in a C++ program

Consider the declaration int a = 1 in block B1. Its scope is all of B1, except for those blocks nested within B1 that have their own declaration of a. B2, nested immediately within B1, does not have a declaration of a, but B3 does. B4 does not have a declaration of a, so block B3 is the only place in the entire program that is outside the scope of the declaration of the name a that belongs to B1. That is, this scope includes B4 and all of B2 except for the part of B2 that is within B3. The scopes of all five declarations are summarized in Figure 1.13.

Figure 1.13: Scopes of declarations

Explicit Access Control
Classes and structures introduce a new scope for their members. If p is an object of a class with a field (member) x, then the use of x in p.x refers to field x in the class definition. The scope of a member declaration x in a class C extends to any subclass C', except if C' has a local declaration of the same name x.
Through the use of keywords like public, private, and protected, object-oriented languages such as C++ or Java provide explicit control over access to member names in a superclass. These keywords support encapsulation by restricting access. Thus, private names are purposely given a scope that includes only the method declarations and definitions associated with that class and any "friend" classes (the C++ term). Protected names are accessible to subclasses. Public names are accessible from outside the class.

Dynamic Scope
Technically, any scoping policy is dynamic if it is based on factors that can be known only when the program executes. The term dynamic scope, however, usually refers to the following policy: a use of a name x refers to the declaration of x in the most recently called procedure with such a declaration. Dynamic scoping of this type appears only in special situations. We shall consider two examples of dynamic policies: macro expansion in the C preprocessor and method resolution in object-oriented programming.
Example: In the C program below, identifier a is a macro that stands for the expression (x + 1). But we cannot resolve x statically, that is, in terms of the program text.

#define a (x+1)
int x = 2;

void b() { int x = 1; printf("%d\n", a); }

void c() { printf("%d\n", a); }

void main() { b(); c(); }

In fact, in order to interpret x, we must use the usual dynamic-scope rule. The function main first calls function b. As b executes, it prints the value of the macro a. Since (x + 1) must be substituted for a, we resolve this use of x to the declaration int x = 1 in function b. The reason is that b has a declaration of x, so the (x + 1) in the printf in b refers to this x. Thus, the value printed is 1.
After b finishes and c is called, we again need to print the value of macro a. However, the only x accessible to c is the global x. The printf statement in c thus refers to this declaration of x, and the value 2 is printed.

Parameter Passing Mechanisms
All programming languages have a notion of a procedure, but they can differ in how these procedures get their arguments. The actual parameters (the parameters used in the call of a procedure) are associated with the formal parameters (those used in the procedure definition).
In call-by-value, the actual parameter is evaluated (if it is an expression) or copied (if it is a variable). The value is placed in the location belonging to the corresponding formal parameter of the called procedure. This method is used in C and Java.
In call-by-reference, the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter. Uses of the formal parameter in the code of the callee are implemented by following this pointer to the location indicated by the caller. Changes to the formal parameter thus appear as changes to the actual parameter.
A third mechanism, call-by-name, was used in the early programming language Algol 60. It requires that the callee execute as if the actual parameter were substituted literally for the formal parameter in the code of the callee, as if the formal parameter were a macro standing for the actual parameter.

Aliasing
There is an interesting consequence of call-by-reference parameter passing, or its simulation, as in Java, where references to objects are passed by value. It is possible for two formal parameters to refer to the same location; such variables are said to be aliases of one another. As a result, any two variables, which may appear to take their values from two distinct formal parameters, can become aliases of each other.

Example: Suppose a is an array belonging to a procedure p, and p calls another procedure q(x, y) with a call q(a, a). Suppose also that parameters are passed by value, but that array names are really references to the location where the array is stored, as in C or similar languages. Now, x and y have become aliases of each other. The important point is that if within q there is an assignment x[10] = 2, then the value of y[10] also becomes 2.
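A short runnable C sketch of this situation (the function and variable names follow the example above but are our own rendering): array arguments decay to pointers in C, so both formal parameters refer to p's array.

#include <stdio.h>

/* x and y become aliases when q is called as q(a, a):
   in C an array argument is passed as a pointer to its storage. */
void q(int x[], int y[]) {
    x[10] = 2;
    printf("y[10] = %d\n", y[10]);   /* prints 2: y aliases x */
}

void p(void) {
    int a[20] = { 0 };
    q(a, a);                          /* both parameters refer to a */
}

int main(void) { p(); return 0; }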

UNIT II LEXICAL ANALYSIS

2.1 NEED AND ROLE OF LEXICAL ANALYZER
Lexical analysis is the first phase of the compiler. The lexical analyzer reads the input characters from left to right, one character at a time, from the source program, and generates a sequence of tokens for the lexemes. Each token is a logical cohesive unit such as an identifier, keyword, operator or punctuation mark. The lexical analyzer enters lexemes into the symbol table and also reads from the symbol table. These interactions are suggested in Figure 2.1.

Figure 2.1: Interactions between the lexical analyzer and the parser

Since the lexical analyzer is the part of the compiler that reads the source text, it may perform certain other tasks besides identification of lexemes. One such task is stripping out comments and whitespace (blank, newline, tab). Another task is correlating error messages generated by the compiler with the source program.

Needs / Roles / Functions of the lexical analyzer:
o It produces a stream of tokens.
o It eliminates comments and whitespace.
o It keeps track of line numbers.
o It reports errors encountered while generating tokens.
o It stores information about identifiers, keywords, constants and so on into the symbol table.

Lexical analyzers are divided into two processes:
a) Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one.
b) Lexical analysis is the more complex portion, where the scanner produces the sequence of tokens as output.

Lexical Analysis versus Parsing / Issues in lexical analysis:
1. Simplicity of design is the most important consideration. The separation of lexical and syntactic analysis often allows us to simplify these tasks; for example, whitespace and comments are removed by the lexical analyzer, so the parser need not handle them.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing. In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly.
3. Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer.

Tokens, Patterns, and Lexemes
A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier. Operators, special symbols and constants are also typical tokens.
A pattern is a description of the form that the lexemes of a token may take; a pattern is the set of rules that describe the token.
A lexeme is a sequence of characters in the source program that matches the pattern for a token.

Table 2.1: Tokens and Lexemes
TOKEN        INFORMAL DESCRIPTION (PATTERN)              SAMPLE LEXEMES
if           characters i, f                             if
else         characters e, l, s, e                       else
comparison   < or > or <= or >= or == or !=              <=, !=
id           letter, followed by letters and digits      pi, score, D2, sum, id_1, AVG
number       any numeric constant                        35, 0, 6.02e23
literal      anything surrounded by quotes               "Core", "Design Appasami"

In many programming languages, the following classes cover most or all of the tokens:
1. One token for each keyword. The pattern for a keyword is the same as the keyword itself.
2. Tokens for the operators, either individually or in classes such as the token comparison mentioned in Table 2.1.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers and literal strings.
5. Tokens for each punctuation symbol, such as left and right parentheses, comma, and semicolon.

Attributes for Tokens
When more than one lexeme can match a pattern, the lexical analyzer must provide the subsequent compiler phases additional information about the particular lexeme that matched. The lexical analyzer returns to the parser not only a token name, but an attribute value that describes the lexeme represented by the token. The token name influences parsing decisions, while the attribute value influences translation of tokens after the parse.
Information about an identifier - e.g., its lexeme, its type, and the location at which it is first found (in case of an error message) - is kept in the symbol table. Thus, the appropriate attribute value for an identifier is a pointer to the symbol-table entry for that identifier.
Example: The token names and associated attribute values for the Fortran statement E = M * C ** 2 are written below as a sequence of pairs.
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>

<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
Note that in certain pairs, especially operators, punctuation, and keywords, there is no need for an attribute value. In this example, the token number has been given an integer-valued attribute.

2.2 LEXICAL ERRORS
It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. Consider the C program statement fi ( a == f(x)). The lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser.
Suppose, however, a situation arises in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input. The simplest recovery strategy is "panic mode" recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.
Other possible error-recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
Transformations like these may be tried in an attempt to repair the input. The simplest such strategy is to see whether a prefix of the remaining input can be transformed into a valid lexeme by a single transformation. In practice most lexical errors involve a single character. A more general correction strategy is to find the smallest number of transformations needed to convert the source program into one that consists only of valid lexemes.

2.3 EXPRESSING TOKENS BY REGULAR EXPRESSIONS
Specification of Tokens
Regular expressions are an important notation for specifying lexeme patterns. While they cannot express all possible patterns, they are very effective in specifying those types of patterns that we actually need for tokens.

Strings and Languages
An alphabet is any finite set of symbols. Examples of symbols are letters, digits, and punctuation. The set {0, 1} is the binary alphabet. ASCII is an important example of an alphabet.
A string (sentence or word) over an alphabet is a finite sequence of symbols drawn from that alphabet. The length of a string s, usually written |s|, is the number of occurrences of symbols in s. For example, banana is a string of length six. The empty string, denoted ε, is the string of length zero.
A language is any countable set of strings over some fixed alphabet. Abstract languages like the empty set ∅, or {ε}, the set containing only the empty string, are languages under this definition.
Parts of strings:
1. A prefix of string s is any string obtained by removing zero or more symbols from the end of s. For example, ban, banana, and ε are prefixes of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana, banana, and ε are suffixes of banana.

3. A substring of s is obtained by deleting any prefix and any suffix from s. For instance, banana, nan, and ε are substrings of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to s itself.
5. A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana.
6. If x and y are strings, then the concatenation of x and y, denoted xy, is the string formed by appending y to x.

Operations on Languages
In lexical analysis, the most important operations on languages are union, concatenation, and closure, which are defined in Table 2.2.

Table 2.2: Definitions of operations on languages
OPERATION                    DEFINITION AND NOTATION
Union of L and M             L U M = { s | s is in L or s is in M }
Concatenation of L and M     LM = { st | s is in L and t is in M }
Kleene closure of L          L* = the set of strings obtained by concatenating zero or more strings from L
Positive closure of L        L+ = the set of strings obtained by concatenating one or more strings from L

Example: Let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits {0, 1, ..., 9}. Other languages can be constructed from L and D:
1. L U D is the set of letters and digits - strictly speaking, the language with 62 strings of length one, each of which is either one letter or one digit.
2. LD is the set of 520 strings of length two, each consisting of one letter followed by one digit.
3. L4 is the set of all 4-letter strings.
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L U D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.

Regular expressions
A regular expression is a sequence of symbols and characters expressing a string or pattern to be searched for. Regular expressions are a mathematical notation that describes the set of strings of a specific language. For example, the regular expression for identifiers is letter_ ( letter_ | digit )*. The vertical bar means union, the parentheses are used to group subexpressions, and the star means "zero or more occurrences of".
Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by r's subexpressions. The following rules define the regular expressions over some alphabet Σ.
Basis rules:
1. ε is a regular expression, and L(ε) is {ε}.
2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}, that is, the language with one string, of length one.
Induction rules: Suppose r and s are regular expressions denoting languages L(r) and L(s), respectively.
1. (r) | (s) is a regular expression denoting the language L(r) U L(s).

2. (r)(s) is a regular expression denoting the language L(r)L(s).
3. (r)* is a regular expression denoting (L(r))*.
4. (r) is a regular expression denoting L(r); that is, additional pairs of parentheses around an expression do not change its language.

Example: Let Σ = {a, b}.

Regular expression   Language                               Meaning
a | b                {a, b}                                 A single a or b
(a | b)(a | b)       {aa, ab, ba, bb}                       All strings of length two over the alphabet Σ
a*                   {ε, a, aa, aaa, ...}                   All strings of zero or more a's
(a | b)*             {ε, a, b, aa, ab, ba, bb, aaa, ...}    All strings of zero or more instances of a or b
a | a*b              {a, b, ab, aab, aaab, ...}             The string a and all strings of zero or more a's ending in b

A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s. For instance, (a | b) = (b | a), (a | b)* = (a*b*)*, (b | a)* = (a | b)*, and (a | b)(b | a) = aa | ab | ba | bb.

Algebraic laws
Algebraic laws that hold for arbitrary regular expressions r, s, and t:

LAW                                       DESCRIPTION
r | s = s | r                             | is commutative
r | (s | t) = (r | s) | t                 | is associative
r(st) = (rs)t                             Concatenation is associative
r(s | t) = rs | rt; (s | t)r = sr | tr    Concatenation distributes over |
εr = rε = r                               ε is the identity for concatenation
r* = (r | ε)*                             ε is guaranteed in a closure
r** = r*                                  * is idempotent

Extensions of Regular Expressions
A few notational extensions that were first incorporated into Unix utilities such as Lex are particularly useful in the specification of lexical analyzers.
1. One or more instances: the unary, postfix operator + represents the positive closure of a regular expression and its language. If r is a regular expression, then (r)+ denotes the language (L(r))+. Two useful algebraic laws are r* = r+ | ε and r+ = rr* = r*r.
2. Zero or one instance: the unary postfix operator ? means "zero or one occurrence." That is, r? is equivalent to r | ε, so L(r?) = L(r) U {ε}.

3. Character classes: a regular expression a1 | a2 | ... | an, where the ai's are each symbols of the alphabet, can be replaced by the shorthand [a1a2...an]. Thus, [abc] is shorthand for a | b | c, and [a-z] is shorthand for a | b | ... | z.

Example: Regular definition for C identifiers
letter_ → [A-Za-z_]
digit → [0-9]
id → letter_ ( letter_ | digit )*

Example: Regular definition for unsigned numbers
digit → [0-9]
digits → digit+
number → digits (. digits)? ( E [+-]? digits )?

Note: The operators *, +, and ? have the same precedence and associativity.

2.4 CONVERTING REGULAR EXPRESSION TO DFA
To construct a DFA directly from a regular expression, we construct its syntax tree and then compute four functions: nullable, firstpos, lastpos, and followpos, defined as follows. Each definition refers to the syntax tree for a particular augmented regular expression (r)#.
1. nullable(n) is true for a syntax-tree node n if and only if the subexpression represented by n has ε in its language. That is, the subexpression can be "made null" or the empty string, even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first symbol of at least one string in the language of the subexpression rooted at n.
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last symbol of at least one string in the language of the subexpression rooted at n.
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a1a2...an in L((r)#) such that for some i, there is a way to explain the membership of x in L((r)#) by matching ai to position p of the syntax tree and ai+1 to position q.

We can compute nullable, firstpos, and lastpos by a straightforward recursion on the height of the tree. The basis and inductive rules for nullable and firstpos are summarized in the table below:

NODE n                    nullable(n)                       firstpos(n)
A leaf labeled ε          true                              {}
A leaf with position i    false                             {i}
An or-node n = c1 | c2    nullable(c1) or nullable(c2)      firstpos(c1) U firstpos(c2)
A cat-node n = c1 c2      nullable(c1) and nullable(c2)     if nullable(c1) then firstpos(c1) U firstpos(c2) else firstpos(c1)
A star-node n = c1*       true                              firstpos(c1)

The rules for lastpos are essentially the same as for firstpos, but the roles of children c1 and c2 must be swapped in the rule for a cat-node.
There are only two ways that positions come to follow one another:
1. If n is a cat-node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).

initialize Dstates to contain only the unmarked state firstpos(n0), where n0 is the root of syntax tree T for (r)#;
while ( there is an unmarked state S in Dstates ) {
    mark S;
    for ( each input symbol a ) {
        let U be the union of followpos(p) for all p in S that correspond to a;
        if ( U is not in Dstates )
            add U as an unmarked state to Dstates;
        Dtran[S, a] = U;
    }
}

Converting a Regular Expression Directly to a DFA
Algorithm: Construction of a DFA from a regular expression r.
INPUT: A regular expression r.
OUTPUT: A DFA D that recognizes L(r).
METHOD:
1. Construct a syntax tree T from the augmented regular expression (r)#.
2. Compute nullable, firstpos, lastpos, and followpos for T.
3. Construct Dstates, the set of states of DFA D, and Dtran, the transition function for D, by the above procedure. The states of D are sets of positions in T. Initially, each state is "unmarked," and a state becomes "marked" just before we consider its out-transitions. The start state of D is firstpos(n0), where node n0 is the root of T. The accepting states are those containing the position for the endmarker symbol #.
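As an illustration, the following C sketch (our own, not from the text; the names and the bitmask encoding of position sets are assumptions) hard-codes the symbols and followpos sets for the example worked out below, (a | b)*abb#, and runs the Dstates loop above:

#include <stdio.h>

#define NPOS 6                 /* positions 1..6 of (a|b)*abb# */
char sym[NPOS + 1] = { 0, 'a', 'b', 'a', 'b', 'b', '#' };
/* followpos encoded as bitmasks: bit (p-1) represents position p */
int followpos[NPOS + 1] = {
    0,
    0x07,  /* followpos(1) = {1,2,3} */
    0x07,  /* followpos(2) = {1,2,3} */
    0x08,  /* followpos(3) = {4}     */
    0x10,  /* followpos(4) = {5}     */
    0x20,  /* followpos(5) = {6}     */
    0x00   /* followpos(6) = {}      */
};

int dstates[64], ndstates = 0, dtran[64][2];

/* return the index of a position set in Dstates, adding it if new */
int find_or_add(int set) {
    for (int i = 0; i < ndstates; i++)
        if (dstates[i] == set) return i;
    dstates[ndstates] = set;
    return ndstates++;
}

int main(void) {
    find_or_add(0x07);                  /* start state = firstpos(root) = {1,2,3} */
    for (int s = 0; s < ndstates; s++)  /* states beyond s are still "unmarked" */
        for (int c = 0; c < 2; c++) {
            int u = 0;                  /* U = union of followpos(p), sym[p] == a */
            for (int p = 1; p <= NPOS; p++)
                if ((dstates[s] & (1 << (p - 1))) && sym[p] == "ab"[c])
                    u |= followpos[p];
            dtran[s][c] = find_or_add(u);
        }
    for (int s = 0; s < ndstates; s++)
        printf("state %c: a -> %c, b -> %c%s\n",
               'A' + s, 'A' + dtran[s][0], 'A' + dtran[s][1],
               (dstates[s] & 0x20) ? "  (accepting: contains #)" : "");
    return 0;
}

Running it prints the four-state DFA of Figure 2.5, with the state containing endmarker position 6 accepting.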

Example: Construct a DFA for the regular expression r = (a | b)*abb.

Figure 2.2: Syntax tree for (a | b)*abb#

Figure 2.3: firstpos and lastpos for nodes in the syntax tree for (a | b)*abb#

We must also apply rule 2 to the star-node. That rule tells us positions 1 and 2 are in both followpos(1) and followpos(2), since both firstpos and lastpos for this node are {1, 2}. The complete followpos sets are summarized in the table below.

NODE n    followpos(n)
1         {1, 2, 3}
2         {1, 2, 3}
3         {4}
4         {5}
5         {6}
6         {}

Figure 2.4: Directed graph for the function followpos

nullable is true only for the star-node, and firstpos and lastpos are exhibited in Figure 2.3. The value of firstpos for the root of the tree is {1, 2, 3}, so this set is the start state of D. Call this set of states A. We must compute Dtran[A, a] and Dtran[A, b]. Among the positions of A, 1 and 3 correspond to a, while 2 corresponds to b. Thus, Dtran[A, a] = followpos(1) U followpos(3) = {1, 2, 3, 4}, and Dtran[A, b] = followpos(2) = {1, 2, 3}.

Figure 2.5: DFA constructed for (a | b)*abb#

The latter is state A, and so does not have to be added to Dstates, but the former, B = {1, 2, 3, 4}, is new, so we add it to Dstates and proceed to compute its transitions. The complete DFA is shown in Figure 2.5.

Example: Construct an NFA with ε-transitions for (a | b)*abb and convert it to a DFA by the subset construction.

Figure 2.6: NFA with ε-transitions for (a | b)*abb

Figure 2.7: NFA for (a | b)*abb

Figure 2.8: Result of applying the subset construction to the NFA of Figure 2.7

2.5 MINIMIZATION OF DFA
There can be many DFAs that recognize the same language. For instance, the DFAs of Figures 2.5 and 2.8 both recognize the same language L((a | b)*abb). We would generally prefer a DFA with as few states as possible, since each state requires entries in the table that describes the lexical analyzer.

Algorithm: Minimizing the number of states of a DFA.
INPUT: A DFA D with set of states S, input alphabet Σ, initial state s0, and set of accepting states F.

OUTPUT: A DFA D' accepting the same language as D and having as few states as possible.
METHOD:
1. Start with an initial partition Π with two groups, F and S - F, the accepting and nonaccepting states of D.
2. Apply the following procedure to construct a new partition Πnew:

initially, let Πnew = Π;
for ( each group G of Π ) {
    partition G into subgroups such that two states s and t are in the same subgroup if and only if for all input symbols a, states s and t have transitions on a to states in the same group of Π;
    /* at worst, a state will be in a subgroup by itself */
    replace G in Πnew by the set of all subgroups formed;
}

3. If Πnew = Π, let Πfinal = Π and continue with step (4). Otherwise, repeat step (2) with Πnew in place of Π.
4. Choose one state in each group of Πfinal as the representative for that group. The representatives will be the states of the minimum-state DFA D'.
5. The other components of D' are constructed as follows:
(a) The start state of D' is the representative of the group containing the start state of D.
(b) The accepting states of D' are the representatives of those groups that contain an accepting state of D.
(c) Let s be the representative of some group G of Πfinal, and let the transition of D from s on input a be to state t. Let r be the representative of t's group H. Then in D', there is a transition from s to r on input a.

Example: Let us reconsider the DFA of Figure 2.8 for minimization.

STATE   a   b
A       B   C
B       B   D
C       B   C
D       B   E
(E)     B   C

The initial partition consists of the two groups {A, B, C, D} {E}, which are respectively the nonaccepting states and the accepting states.
To construct Πnew, the procedure considers both groups and inputs a and b. The group {E} cannot be split, because it has only one state, so {E} will remain intact in Πnew. The other group {A, B, C, D} can be split, so we must consider the effect of each input symbol. On input a, each of these states goes to state B, so there is no way to distinguish these states using strings that begin with a. On input b, states A, B, and C go to members of group {A, B, C, D}, while state D goes to E, a member of another group. Thus, in Πnew, group {A, B, C, D} is split into {A, B, C} {D}, and Πnew for this round is {A, B, C} {D} {E}.

In the next round, we can split {A, B, C} into {A, C} {B}, since A and C each go to a member of {A, B, C} on input b, while B goes to a member of another group, {D}. Thus, after the second round, Πnew = {A, C} {B} {D} {E}.
For the third round, we cannot split the one remaining group with more than one state, since A and C each go to the same state (and therefore to the same group) on each input. We conclude that Πfinal = {A, C} {B} {D} {E}.
Now we construct the minimum-state DFA. It has four states, corresponding to the four groups of Πfinal; let us pick A, B, D, and E as the representatives of these groups. The initial state is A, and the only accepting state is E.

Table: Transition table of the minimum-state DFA

STATE   a   b
A       B   A
B       B   D
D       B   E
(E)     B   A
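To see how such a table drives a lexical analyzer, here is a small C sketch (our own illustration, not from the text) that encodes the minimum-state DFA above and tests whether a string of a's and b's belongs to L((a | b)*abb):

#include <stdio.h>

/* States A, B, D, E of the minimized DFA encoded as 0..3.
   Columns: input 'a' and input 'b'. */
enum { A, B, D, E };
int dtran[4][2] = {
    /* A */ { B, A },
    /* B */ { B, D },
    /* D */ { B, E },
    /* E */ { B, A }
};

/* returns 1 if s is in L((a|b)*abb); assumes s contains only 'a'/'b' */
int match(const char *s) {
    int state = A;                       /* start state */
    for (; *s; s++)
        state = dtran[state][*s - 'a'];  /* one table lookup per character */
    return state == E;                   /* E is the only accepting state */
}

int main(void) {
    printf("%d\n", match("abb"));    /* 1 */
    printf("%d\n", match("aabb"));   /* 1 */
    printf("%d\n", match("abab"));   /* 0 */
    return 0;
}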

2.6 LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS - LEX
There is a wide range of tools for the construction of lexical analyzers based on regular expressions. Lex is a tool (computer program) that generates lexical analyzers; the patterns for tokens are specified by regular expressions. The input specification is referred to as the Lex language, and the tool itself as the Lex compiler.

Use of Lex
The Lex compiler transforms the input patterns into a transition diagram and generates code. An input file lex.l is written in the Lex language and describes the lexical analyzer to be generated. The Lex compiler transforms lex.l to a C program, in a file that is always named lex.yy.c. The file lex.yy.c is compiled by the C compiler into a file a.out. The C compiler's output is a working lexical analyzer that can take a stream of input characters and produce a stream of tokens. The attribute value, whether it be another numeric code, a pointer to the symbol table, or nothing, is placed in the global variable yylval, which is shared between the lexical analyzer and the parser.

Figure 2.9: Creating a lexical analyzer with Lex

Structure of Lex Programs
A Lex program has the following form:

declarations
%%
translation rules
%%
auxiliary functions

The declarations section includes declarations of variables, manifest constants (identifiers declared to stand for a constant, e.g., the name of a token), and regular definitions.
The translation rules each have the form Pattern { Action }:

P1 { Action A1 }
P2 { Action A2 }
...
Pn { Action An }

Each pattern is a regular expression. The actions are fragments of code, typically written in C.
The third section holds whatever additional functions are used in the actions. Alternatively, these functions can be compiled separately and loaded with the lexical analyzer.
The lexical analyzer begins reading its remaining input, one character at a time, until it finds the longest prefix of the input that matches one of the patterns Pi. It then executes the associated action Ai. Typically, Ai will return to the parser, but if it does not (e.g., because Pi describes whitespace or comments), then the lexical analyzer proceeds to find additional lexemes, until one of the corresponding actions causes a return to the parser. The lexical analyzer returns a single value, the token name, to the parser, but uses the shared, integer variable yylval to pass additional information about the lexeme found.
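As a concrete illustration of this three-section layout, here is a minimal Lex specification of our own (the token codes ID and NUMBER and the small main are invented for the example) that recognizes identifiers and unsigned integers and skips whitespace:

%{
/* declarations section: C code copied into lex.yy.c */
#include <stdio.h>
#include <stdlib.h>
#define ID     1
#define NUMBER 2
int yylval;     /* attribute value shared with the parser */
%}
digit   [0-9]
letter  [A-Za-z_]
%%
[ \t\n]+                     { /* skip whitespace: no return, keep scanning */ }
{digit}+                     { yylval = atoi(yytext); return NUMBER; }
{letter}({letter}|{digit})*  { return ID; }
%%
/* auxiliary functions section */
int yywrap(void) { return 1; }

int main(void) {
    int tok;
    while ((tok = yylex()) != 0)
        printf("token %d  lexeme %s\n", tok, yytext);
    return 0;
}

Compiling it with "lex file.l" (or flex) and then compiling lex.yy.c with a C compiler yields a standalone scanner that prints one line per token.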

2.7 DESIGN OF LEXICAL ANALYZER FOR A SAMPLE LANGUAGE
A lexical-analyzer generator such as Lex is architected around an automaton simulator. The implementation of the Lex compiler can be based on either an NFA or a DFA.

The Structure of the Generated Analyzer
Figure 2.10 shows the architecture of a lexical analyzer generated by Lex. A Lex program is converted into a transition table and actions, which are used by a finite-automaton simulator. The program that serves as the lexical analyzer includes a fixed program that simulates an automaton; the automaton is deterministic or nondeterministic. The rest of the lexical analyzer consists of components that are created from the Lex program by Lex itself.

Figure 2.10: A Lex program is turned into a transition table and actions, which are used by a finite-automaton simulator

These components are:
1. A transition table for the automaton.
2. Those functions that are passed directly through Lex to the output.
3. The actions from the input program, which appear as fragments of code to be invoked at the appropriate time by the automaton simulator.

Pattern Matching Based on NFAs
To construct the automaton for several regular expressions, we combine all the NFAs into one by introducing a new start state with ε-transitions to each of the start states of the NFAs Ni for the patterns pi, as shown in Figure 2.11.

Figure 2.11: An NFA constructed from a Lex program

Example: Consider the three patterns and actions

a    { action A1 for pattern p1 }
abb  { action A2 for pattern p2 }
a*b+ { action A3 for pattern p3 }

Figure 2.12: NFA's for a, abb, and a*b+

Figure 2.13: Combined NFA

Figure 2.14: Sequence of sets of states entered when processing input aaba

Figure 2.15: Transition graph for DFA handling the patterns a, abb, and a*b+
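To make Figure 2.14 concrete, the following sketch (not part of the notes) simulates the combined NFA on the input aaba, representing each set of states as a bitmask; the state numbering (0 for the new start state, 1-2 for a, 3-6 for abb, 7-8 for a*b+) is an assumption chosen to match the usual construction.

#include <stdio.h>

/* Combined NFA: accepting states are 2 (a), 6 (abb), 8 (a*b+).
   State 0 is the new start state with eps-moves to 1, 3 and 7;
   those are the only eps-moves, so no further closure is needed
   after a move on an input character. */
#define NSTATES 9

/* mv[s][c]: successor of s on 'a' (column 0) or 'b' (column 1); -1 = none. */
static const int mv[NSTATES][2] = {
    /*      a    b  */
    /*0*/ { -1, -1 },
    /*1*/ {  2, -1 },
    /*2*/ { -1, -1 },
    /*3*/ {  4, -1 },
    /*4*/ { -1,  5 },
    /*5*/ { -1,  6 },
    /*6*/ { -1, -1 },
    /*7*/ {  7,  8 },
    /*8*/ { -1,  8 },
};

int main(void) {
    const char *input = "aaba";
    unsigned set = (1u<<0) | (1u<<1) | (1u<<3) | (1u<<7); /* eps-closure of {0} */
    printf("start: {0,1,3,7}\n");
    for (const char *p = input; *p; p++) {
        unsigned next = 0;
        for (int s = 0; s < NSTATES; s++)
            if (set & (1u << s)) {
                int t = mv[s][*p - 'a'];
                if (t >= 0) next |= 1u << t;
            }
        set = next;
        printf("after '%c':", *p);
        for (int s = 0; s < NSTATES; s++)
            if (set & (1u << s)) printf(" %d", s);
        printf("\n");
        if (set == 0) { printf("no transitions; stop\n"); break; }
    }
    return 0;
}

After reading aab the set {8} contains the accepting state of a*b+, so the simulator's last match is that pattern; the final a leads to an empty set of states, at which point a real lexer would report the match and retract the input pointer.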

UNIT III SYNTAX ANALYSIS

3.1 NEED AND ROLE OF THE PARSER
The parser takes the tokens produced by lexical analysis and builds the syntax tree (parse tree). The syntax tree can be easily constructed from a context-free grammar. The parser reports syntax errors in an intelligible fashion and recovers from commonly occurring errors so as to continue processing the remainder of the program.

Figure 3.1: Position of parser in compiler model

Role of the Parser:
Parser builds the parse tree.
Parser performs context-free syntax analysis.
Parser helps to construct intermediate code.
Parser produces appropriate error messages.
Parser attempts to correct a few errors.

Types of parsers for grammars:
Universal parsers: Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These general methods are too inefficient to use in production compilers, so they are not commonly used.
Top-down parsers: Top-down methods build parse trees from the top (root) to the bottom (leaves).
Bottom-up parsers: Bottom-up methods start from the leaves and work their way up to the root.

3.2 CONTEXT FREE GRAMMARS
The Formal Definition of a Context-Free Grammar
A context-free grammar G is defined by the 4-tuple G = (V, T, P, S), where
1. V is a finite set of nonterminals (variables).
2. T is a finite set of terminals.
3. P is a finite set of production rules of the form A → α, where A is a nonterminal and α is a string of terminals and/or nonterminals; P is a relation from V to (V ∪ T)*.
4. S is the start symbol (S ∈ V).

Example 3.1: The following grammar defines simple arithmetic expressions. In this grammar, the terminal symbols are id + - * / ( ). The nonterminal symbols are expression, term and factor, and expression is the start symbol.

expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id

Notational Conventions
The following notational conventions for grammars can be used:
1. These symbols are terminals:
(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0, 1, ..., 9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are nonterminals:
(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) When discussing programming constructs, uppercase letters may be used to represent nonterminals for the constructs. For example, nonterminals for expressions, terms, and factors are often represented by E, T, and F, respectively.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols; that is, either nonterminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals.
5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar symbols. Thus, a generic production can be written as A → α, where A is the head and α the body.
6. A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written A → α1 | α2 | ... | αk; we call α1, α2, ..., αk the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.

Example 3.2: Using these conventions, the grammar of Example 3.1 can be rewritten concisely as

E → E + T | E - T | T
T → T * F | T / F | F

F → ( E ) | id

Derivations
A derivation uses productions to generate a string (a sequence of terminals). The derivation is formed by replacing a nonterminal by the right-hand side of a suitable production rule. Derivations are classified into two types based on the order in which nonterminals are replaced:
1. Leftmost derivation: if the leftmost nonterminal is replaced by its production at each step, the derivation is called leftmost.
2. Rightmost derivation: if the rightmost nonterminal is replaced by its production at each step, the derivation is called rightmost.

Example 3.3: LMD and RMD for - ( id + id ), using the arithmetic expression grammar E → E + E | E * E | - E | ( E ) | id.

LMD for - ( id + id ):
E ⇒ - E ⇒ - ( E ) ⇒ - ( E + E ) ⇒ - ( id + E ) ⇒ - ( id + id )

RMD for - ( id + id ):
E ⇒ - E ⇒ - ( E ) ⇒ - ( E + E ) ⇒ - ( E + id ) ⇒ - ( id + id )

Example 3.4: Consider the context-free grammar (CFG) G = ({S}, {a, b, c}, P, S) where P = {S → SbS | ScS | a}. Derive the string abaca by leftmost derivation and by rightmost derivation.

Leftmost derivation for abaca:
S ⇒ SbS
  ⇒ abS (using rule S → a)
  ⇒ abScS (using rule S → ScS)
  ⇒ abacS (using rule S → a)
  ⇒ abaca (using rule S → a)

Rightmost derivation for abaca:
S ⇒ ScS
  ⇒ Sca (using rule S → a)
  ⇒ SbSca (using rule S → SbS)
  ⇒ Sbaca (using rule S → a)

  ⇒ abaca (using rule S → a)

Parse Trees and Derivations
A parse tree is a graphical representation of a derivation. It is convenient for seeing how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree.

Example 3.5: Construction of the parse tree for - ( id + id ).
Derivation: E ⇒ - E ⇒ - ( E ) ⇒ - ( E + E ) ⇒ - ( id + E ) ⇒ - ( id + id )

Figure 3.2: Parse tree for -(id + id)

Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Put another way, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence. A grammar G is said to be ambiguous if it has more than one parse tree, either in LMD or in RMD, for at least one string.

Example 3.6: The arithmetic expression grammar of Example 3.3 permits two distinct leftmost derivations for the sentence id + id * id:

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Figure 3.3: Two parse trees for id + id * id

Verifying the Language Generated by a Grammar
A proof that a grammar G generates a language L has two parts: show that every string generated by G is in L, and conversely that every string in L can indeed be generated by G.

Example 3.7: Consider the grammar S → ( S ) S | ε. This simple grammar generates all strings of balanced parentheses.

To show that every sentence derivable from S is balanced, we use an inductive proof on the number of steps n in a derivation.

BASIS: The basis is n = 1. The only string of terminals derivable from S in one step is the empty string ε, which surely is balanced.

INDUCTION: Now assume that all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of exactly n steps. Such a derivation must be of the form

S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y

The derivations of x and y from S take fewer than n steps, so by the inductive hypothesis x and y are balanced. Therefore, the string (x)y must be balanced. That is, it has an equal number of left and right parentheses, and every prefix has at least as many left parentheses as right.

Having thus shown that any string derivable from S is balanced, we must next show that every balanced string is derivable from S. To do so, we use induction on the length of the string.

BASIS: If the string is of length 0, it must be ε, which is derivable from S using S → ε.

INDUCTION: First, observe that every balanced string has even length. Assume that every balanced string of length less than 2n is derivable from S, and consider a balanced string w of length 2n, n ≥ 1. Surely w begins with a left parenthesis. Let (x) be the shortest nonempty prefix of w having an equal number of left and right parentheses. Then w can be written as w = (x)y where both x and y are balanced. Since x and y are of length less than 2n, they are derivable from S by the inductive hypothesis. Thus, we can find a derivation of the form

S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y

proving that w = (x)y is also derivable from S.

Context-Free Grammars versus Regular Expressions

Every regular language is a context-free language, but not vice versa.

Example 3.8: The grammar for the regular expression (a|b)*abb

A → aA | bA | aB
B → bC
C → b

describes the same language, the set of strings of a's and b's ending in abb. So we can easily describe such languages either by finite automata or by PDA.

On the other hand, the language L = {a^n b^n | n ≥ 1}, with an equal number of a's and b's, is a prototypical example of a language that can be described by a grammar but not by a regular expression. We can say that "finite automata cannot count," meaning that a finite automaton cannot accept a language like {a^n b^n | n ≥ 1} that would require it to keep count of the number of a's before it sees the b's. These kinds of languages (described by context-free grammars) are accepted by PDA's, as a PDA uses a stack as its memory.

Left recursion
A context-free grammar is said to be left recursive if it has a nonterminal A with productions of the form

A → A α | β

where α and β are sequences of terminals and nonterminals, and β does not start with A. Left recursion in top-down parsing can cause the parser to enter an infinite loop. It creates serious problems, so we must eliminate left recursion. For example, expr → expr + term | term is left recursive.

Figure 3.4: Left-recursive and right-recursive ways of generating a string

ALGORITHM 3.1: Eliminating left recursion.
INPUT: Grammar G with no cycles or ε-productions.
OUTPUT: An equivalent grammar with no left recursion.
METHOD: Apply the algorithm to G. Note that the resulting non-left-recursive grammar may have ε-productions.

arrange the nonterminals in some order A1, A2, ..., An;
for ( each i from 1 to n ) {
    for ( each j from 1 to i - 1 ) {
        replace each production of the form Ai → Aj γ by the productions
        Ai → δ1 γ | δ2 γ | ... | δk γ, where

        Aj → δ1 | δ2 | ... | δk are all current Aj-productions;
    }
    eliminate the immediate left recursion among the Ai-productions;
}

Note: Simply modify the left-recursive productions A → A α | β to

A → β A'
A' → α A' | ε

Example 3.9: Consider the grammar for arithmetic expressions.

E → E + T | T
T → T * F | F
F → (E) | id

Eliminate the left-recursive productions for E and T by applying the left-recursion elimination rule: if A → A α | β, then A → β A' and A' → α A' | ε.

The production E → E + T | T is replaced by
E → T E'
E' → + T E' | ε

The production T → T * F | F is replaced by
T → F T'
T' → * F T' | ε

Therefore, finally we obtain

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id

Example 3.10: Eliminate the left-recursive productions from the grammar

S → A a | b
A → A c | S d

The S-productions have no immediate left recursion. To expose the remaining left recursion, substitute the S-productions in A → S d:

A → A c | A a d | b d

A → A c | A a d | b d is replaced by

A → b d A'
A' → c A' | a d A' | ε

Therefore, finally we obtain the grammar without left recursion,

S → A a | b
A → b d A'
A' → c A' | a d A' | ε

Example 3.11: Consider the grammar

A → A B d | A a | a
B → B e | b

The grammar without left recursion is

A → a A'
A' → B d A' | a A' | ε
B → b B'
B' → e B' | ε

Example 3.12: Eliminate left recursion from the given grammar.

A → A c | A a d | b d | b c

After removing left recursion, the grammar becomes

A → b d A' | b c A'
A' → c A' | a d A' | ε

Left factoring
Left factoring is a process of factoring out the common prefixes of two or more production alternatives for the same nonterminal.

Algorithm 3.2: Left factoring a grammar.
INPUT: Grammar G.
OUTPUT: An equivalent left-factored grammar.
METHOD: For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a nontrivial common prefix, replace all of the A-productions

A → α β1 | α β2 | ... | α βn | γ

where γ represents all alternatives that do not begin with α, by

A → α A' | γ
A' → β1 | β2 | ... | βn

Here A' is a new nonterminal. Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix.

Example 3.13: Eliminate left factors from the given grammar.

S → T + S | T

After left factoring, the grammar becomes

S → T L
L → + S | ε
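As a small programming illustration (a sketch, not part of the notes), immediate left recursion for a single nonterminal can be eliminated mechanically; representing production bodies as plain strings, as below, is an assumption of this sketch.

#include <stdio.h>

/* Eliminate immediate left recursion for nonterminal 'nt'.
   Production bodies are given as strings, e.g. for
   A -> Ac | Aad | bd | bc the bodies are "Ac", "Aad", "bd", "bc".
   Output: A -> <beta>A' for each non-recursive body beta, and
           A' -> <alpha>A' for each recursive body A<alpha>,
   plus A' -> e (e stands for epsilon). */
void eliminate(char nt, const char *alts[], int n) {
    int i;
    for (i = 0; i < n; i++)              /* A -> beta A' */
        if (alts[i][0] != nt)
            printf("%c -> %s%c'\n", nt, alts[i], nt);
    for (i = 0; i < n; i++)              /* A' -> alpha A' */
        if (alts[i][0] == nt)
            printf("%c' -> %s%c'\n", nt, alts[i] + 1, nt);
    printf("%c' -> e\n", nt);            /* A' -> epsilon */
}

int main(void) {
    const char *alts[] = { "Ac", "Aad", "bd", "bc" };  /* Example 3.12 */
    eliminate('A', alts, 4);
    return 0;
}

Run on the alternatives of Example 3.12, it prints exactly the transformed grammar shown above (with e standing for ε).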

Example 3.14: Left factor the following grammar.

S → i E t S | i E t S e S | a
E → b

After left factoring, the grammar becomes

S → i E t S S' | a
S' → e S | ε
E → b

Uses: Left factoring is used in the predictive top-down parsing technique.

3.3 TOP DOWN PARSING - GENERAL STRATEGIES
Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first). Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.

Parsers are generally distinguished by whether they work top-down (start with the grammar's start symbol and construct the parse tree from the top) or bottom-up (start with the terminal symbols that form the leaves of the parse tree and build the tree from the bottom). Top-down parsers include recursive-descent and LL parsers, while the most common forms of bottom-up parsers are LR parsers.

Types of parser:
Top-down parsers: backtracking parsers (recursive descent parser) and predictive parsers (LL(1) parser).
Bottom-up parsers: shift-reduce parsers and LR parsers (SLR parser, LALR parser, and canonical LR (CLR) parser).

Figure 3.5: Types of parser

Example 3.15: The sequence of parse trees for the input id+id*id in a top-down parse (LMD), using the grammar

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

Figure 3.6: Top-down parse for id + id * id

3.4 RECURSIVE DESCENT PARSER
These parsers use a procedure for each nonterminal. The procedure looks at its input and decides which production to apply for its nonterminal. Terminals in the body of the production are matched to the input at the appropriate time, while nonterminals in the body result in calls to their procedures. Backtracking, in the case when the wrong production was chosen, is a possibility.

void A() {
    Choose an A-production, A → X1 X2 ... Xk;
    for ( i = 1 to k ) {
        if ( Xi is a nonterminal )
            call procedure Xi();
        else if ( Xi equals the current input symbol a )
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}

Example 3.16: Consider the grammar

S → c A d
A → a b | a

To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single node labeled S, and the input pointer pointing to c, the first symbol of w. S has only one

production, so we use it to expand S and obtain the tree of Figure 3.7(a). The leftmost leaf, labeled c, matches the first symbol of input w, so we advance the input pointer to a, the second symbol of w, and consider the next leaf, labeled A. Now, we expand A using the first alternative A → a b to obtain the tree of Figure 3.7(b). We have a match for the second input symbol, a, so we advance the input pointer to d, the third input symbol, and compare d against the next leaf, labeled b. Since b does not match d, we report failure and go back to A to see whether there is another alternative for A that has not been tried, but that might produce a match.

Figure 3.7: Steps in a top-down parse

The second alternative for A produces the tree of Figure 3.7(c). The leaf a matches the second symbol of w and the leaf d matches the third symbol. Since we have produced a parse tree for w, we halt and announce successful completion of parsing.
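To make the generic procedure concrete, here is a hedged sketch (not from the notes) of a recursive-descent parser with backtracking for the grammar of Example 3.16; the global input pointer and the helper names are assumptions of this sketch.

#include <stdio.h>

static const char *input;   /* the string being parsed */
static int pos;             /* current position in input */

static int match(char c) {            /* consume c if it is next */
    if (input[pos] == c) { pos++; return 1; }
    return 0;
}

/* A -> a b | a, trying the alternatives in order with backtracking */
static int A(void) {
    int save = pos;
    if (match('a') && match('b')) return 1;   /* first alternative */
    pos = save;                               /* backtrack */
    return match('a');                        /* second alternative */
}

/* S -> c A d */
static int S(void) {
    return match('c') && A() && match('d');
}

int main(void) {
    input = "cad";
    pos = 0;
    if (S() && input[pos] == '\0')
        printf("parsing successful\n");
    else
        printf("syntax error\n");
    return 0;
}

On the input cad, A() first tries A → ab, fails to match b against d, restores the input position, and succeeds with A → a, exactly mirroring the steps of Figure 3.7.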

3.5 PREDICTIVE PARSER (NON RECURSIVE)
A nonrecursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls. The parser mimics a leftmost derivation. If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that

S ⇒*lm w α

The table-driven parser in Figure 3.8 has an input buffer, a stack containing a sequence of grammar symbols, a parsing table, and an output stream. The input buffer contains the string to be parsed, followed by the endmarker $. We reuse the symbol $ to mark the bottom of the stack, which initially contains the start symbol of the grammar on top of $.

The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. If X is a nonterminal, the parser chooses an X-production by consulting entry M[X, a] of the parsing table M. Otherwise, it checks for a match between the terminal X and the current input symbol a.

Figure 3.8: Model of a table-driven predictive parser

Algorithm 3.3: Table-driven predictive parsing.
INPUT: A string w and a parsing table M for grammar G.
OUTPUT: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
METHOD: Initially, the parser is in a configuration with w$ in the input buffer and the start symbol S of G on top of the stack, above $. The following procedure uses the predictive parsing table M to produce a predictive parse for the input.

set ip to point to the first symbol of w;
set X to the top stack symbol;
while ( X ≠ $ ) {   /* stack is not empty */
    if ( X is a ) pop the stack and advance ip;
    else if ( X is a terminal ) error();
    else if ( M[X, a] is an error entry ) error();
    else if ( M[X, a] = X → Y1 Y2 ... Yk ) {
        output the production X → Y1 Y2 ... Yk;
        pop the stack;
        push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}

Example 3.17: Consider the moves made by the nonrecursive predictive parser on the input id + id * id, for the grammar

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

The corresponding leftmost derivation is

E ⇒ T E' ⇒ F T' E' ⇒ id T' E' ⇒ id E' ⇒ id + T E' ⇒ id + F T' E' ⇒ id + id T' E' ⇒ id + id * F T' E' ⇒ id + id * id T' E' ⇒ id + id * id E' ⇒ id + id * id

Figure 3.9: Moves made by a predictive parser on input id + id * id
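The following compact sketch (not from the notes) implements the driver of Algorithm 3.3 for the grammar of Example 3.17; the single-character encodings (e for E', t for T', i for id) and the table layout are assumptions made to keep the code short.

#include <stdio.h>
#include <string.h>

/* Nonterminals: E, e (for E'), T, t (for T'), F.
   Terminals: i (for id), +, *, (, ), $.  "" denotes an epsilon body. */
static const char *M(char X, char a) {      /* parsing table M[X, a] */
    switch (X) {
    case 'E': if (a == 'i' || a == '(') return "Te";   /* E -> T E'  */
              break;
    case 'e': if (a == '+') return "+Te";              /* E' -> +TE' */
              if (a == ')' || a == '$') return "";     /* E' -> eps  */
              break;
    case 'T': if (a == 'i' || a == '(') return "Ft";   /* T -> F T'  */
              break;
    case 't': if (a == '*') return "*Ft";              /* T' -> *FT' */
              if (a == '+' || a == ')' || a == '$') return ""; /* eps */
              break;
    case 'F': if (a == 'i') return "i";                /* F -> id    */
              if (a == '(') return "(E)";              /* F -> (E)   */
              break;
    }
    return NULL;                                       /* error entry */
}

int main(void) {
    const char *w = "i+i*i$";        /* id + id * id, then endmarker */
    char stack[100] = "$E";          /* $ at bottom, start symbol on top */
    int top = 1, ip = 0;
    while (stack[top] != '$') {
        char X = stack[top], a = w[ip];
        if (X == a) { top--; ip++; }                 /* match terminal */
        else if (strchr("i+*()", X)) { puts("error"); return 1; }
        else {
            const char *body = M(X, a);
            if (!body) { puts("error"); return 1; }
            printf("output %c -> %s\n", X, *body ? body : "eps");
            top--;                                   /* pop X */
            for (int k = (int)strlen(body) - 1; k >= 0; k--)
                stack[++top] = body[k];              /* push body reversed */
        }
    }
    puts(w[ip] == '$' ? "accept" : "error");
    return 0;
}

Each "output" line corresponds to one move of Figure 3.9; on i+i*i$ the program prints the productions of the leftmost derivation above and then accepts.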

3.6 LL(1) PARSER
A grammar such that it is possible to choose the correct production with which to expand a given nonterminal, looking only at the next input symbol, is called LL(1). These grammars allow us to construct a predictive parsing table that gives, for each nonterminal and each lookahead symbol, the correct choice of production. Error correction can be facilitated by placing error routines in some or all of the table entries that have no legitimate production.

LL(1) Grammars
Predictive parsers (recursive-descent parsers needing no backtracking) can be constructed for a class of grammars called LL(1). The first "L" in LL(1) stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to make parsing action decisions.

Transition Diagrams for Predictive Parsers
Transition diagrams are useful for visualizing predictive parsers. To construct the transition diagram from a grammar, first eliminate left recursion and then left factor the grammar. Then, for each nonterminal A,
1. Create an initial and final (return) state.
2. For each production A → X1 X2 ... Xk, create a path from the initial to the final state, with edges labeled X1, X2, ..., Xk. If A → ε, the path is an edge labeled ε.

A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.

3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if α ⇒* ε, then β does not derive any string beginning with a terminal in FOLLOW(A).

Predictive parsers can be constructed for LL(1) grammars, since the proper production to apply for a nonterminal can be selected by looking only at the current input symbol. Flow-of-control constructs with their distinguishing keywords generally satisfy the LL(1) constraints. For instance, given the productions

stmt → if ( expr ) stmt else stmt
     | while ( expr ) stmt
     | { stmt_list }

the keywords if, while, and the symbol { tell us which alternative is the only one that could possibly succeed.

To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set.
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a nonterminal and X → Y1 Y2 ... Yk is a production for some k ≥ 1, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X). For example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.
3. If X → ε is a production, then add ε to FIRST(X).

To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set.
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → α B β, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → α B, or a production A → α B β where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).

Example 3.18: Construct the predictive parsing table for the grammar

S → i E t S S' | a
S' → e S | ε
E → b

The grammar is not left recursive, so find the FIRST() and FOLLOW() sets.

FIRST(): FIRST(S) = {i, a}, FIRST(S') = {e, ε}, FIRST(E) = {b}
FOLLOW(): FOLLOW(S) = {e, $}, FOLLOW(S') = {e, $}, FOLLOW(E) = {t}

The predictive parsing table is:

NONTERMINAL    INPUT SYMBOL
               a        b        e                   i              t      $
S              S → a                                 S → iEtSS'
S'                               S' → eS / S' → ε                          S' → ε
E                       E → b

Since the entry M[S', e] contains two productions, the grammar is ambiguous (the dangling-else ambiguity) and hence not strictly LL(1).

3.7 SHIFT REDUCE PARSER
Bottom-up parsers generally operate by choosing, on the basis of the next input symbol (the lookahead symbol) and the contents of the stack, whether to shift the next input onto the stack or to reduce some symbols at the top of the stack. A reduce step takes a production body at the top of the stack and replaces it by the head of the production.

Example 3.19: Consider the shift-reduce parser on input id1 * id2 for the production rules

E → E + T | T
T → T * F | F
F → (E) | id

STACK       INPUT         ACTION
$           id1 * id2 $   shift
$ id1       * id2 $       reduce by F → id
$ F         * id2 $       reduce by T → F
$ T         * id2 $       shift
$ T *       id2 $         shift
$ T * id2   $             reduce by F → id
$ T * F     $             reduce by T → T * F
$ T         $             reduce by E → T
$ E         $             accept

In an LR parser, these actions of a shift-reduce parser on input id1 * id2 are driven by the LR(0) automaton: the stack holds states, and the grammar symbols corresponding to the states on the stack appear in a SYMBOLS column. At the first line, the stack holds the start state 0 of the automaton; the corresponding symbol is the bottom-of-stack marker $.

3.8 LR PARSER
A schematic of an LR parser is shown in Figure 3.10. It consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (ACTION and GOTO). The driver program is the same for all LR parsers; only the parsing table changes from one parser to another. The parsing program reads characters from an input buffer one at a time. Where a shift-reduce parser would shift a symbol, an LR parser shifts a state. Each state summarizes the information contained in the stack below it.

Figure 3.10: Model of an LR parser

The stack holds a sequence of states s0 s1 ... sm, where sm is on top. In the SLR method, the stack holds states from the LR(0) automaton; the canonical-LR and LALR methods are similar.

Structure of the LR Parsing Table
The parsing table consists of two parts: a parsing-action function ACTION and a goto function GOTO.
1. The ACTION function takes as arguments a state i and a terminal a (or $, the input endmarker). The value of ACTION[i, a] can have one of four forms:
(a) Shift j, where j is a state. The action taken by the parser effectively shifts input a to the stack, but uses state j to represent a.
(b) Reduce A → β. The action of the parser effectively reduces β on the top of the stack to the head A.
(c) Accept. The parser accepts the input and finishes parsing.
(d) Error. The parser discovers an error in its input and takes some corrective action.
2. We extend the GOTO function, defined on sets of items, to states: if GOTO(Ii, A) = Ij, then GOTO also maps state i and nonterminal A to state j.

Algorithm 3.4: LR-parsing algorithm.
INPUT: An input string w and an LR-parsing table with functions ACTION and GOTO for a grammar G.
OUTPUT: If w is in L(G), the reduction steps of a bottom-up parse for w; otherwise, an error indication.
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer.

let a be the first symbol of w$;
while (1) {   /* repeat forever */
    let s be the state on top of the stack;
    if ( ACTION[s, a] = shift t ) {
        push t onto the stack;
        let a be the next input symbol;

    }
    else if ( ACTION[s, a] = reduce A → β ) {
        pop |β| symbols off the stack;
        let state t now be on top of the stack;
        push GOTO[t, A] onto the stack;
        output the production A → β;
    }
    else if ( ACTION[s, a] = accept ) break;   /* parsing is done */
    else call error-recovery routine;
}

Example 3.20: Figure 3.11 shows the ACTION and GOTO functions of an LR-parsing table for the expression grammar

E → E + T | T
T → T * F | F
F → (E) | id

STATE   ACTION                                   GOTO
        id     +      *      (      )      $     E    T    F
0       s5                   s4                  1    2    3
1              s6                          acc
2              r2     s7            r2     r2
3              r4     r4            r4     r4
4       s5                   s4                  8    2    3
5              r6     r6            r6     r6
6       s5                   s4                       9    3
7       s5                   s4                            10
8              s6                   s11
9              r1     s7            r1     r1
10             r3     r3            r3     r3
11             r5     r5            r5     r5

Figure 3.11: Parsing table for the expression grammar
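As an illustration (a sketch, not from the notes), the driver of Algorithm 3.4 can be run against the table of Figure 3.11 encoded as arrays; the integer encodings of actions and symbols below are assumptions of this sketch.

#include <stdio.h>

/* Terminals: id + * ( ) $ -> columns 0..5; nonterminals E T F -> 0..2.
   ACTION encoding: 0 = error, ACC = accept, j > 0 = shift j,
   j < 0 = reduce by production -j. */
#define ACC 100

static const int action[12][6] = {
/*        id   +   *   (   )   $ */
/* 0*/ {   5,  0,  0,  4,  0,  0 },
/* 1*/ {   0,  6,  0,  0,  0, ACC},
/* 2*/ {   0, -2,  7,  0, -2, -2 },
/* 3*/ {   0, -4, -4,  0, -4, -4 },
/* 4*/ {   5,  0,  0,  4,  0,  0 },
/* 5*/ {   0, -6, -6,  0, -6, -6 },
/* 6*/ {   5,  0,  0,  4,  0,  0 },
/* 7*/ {   5,  0,  0,  4,  0,  0 },
/* 8*/ {   0,  6,  0,  0, 11,  0 },
/* 9*/ {   0, -1,  7,  0, -1, -1 },
/*10*/ {   0, -3, -3,  0, -3, -3 },
/*11*/ {   0, -5, -5,  0, -5, -5 },
};
static const int goto_tab[12][3] = {   /* GOTO[state][E,T,F]; 0 = unused */
    {1,2,3},{0,0,0},{0,0,0},{0,0,0},{8,2,3},{0,0,0},
    {0,9,3},{0,0,10},{0,0,0},{0,0,0},{0,0,0},{0,0,0},
};
static const int plen[7] = {0,3,1,3,1,3,1};   /* body lengths of r1..r6 */
static const int plhs[7] = {0,0,0,1,1,2,2};   /* heads: E=0, T=1, F=2 */
static const char *ptext[7] =
    {"", "E->E+T", "E->T", "T->T*F", "T->F", "F->(E)", "F->id"};

int main(void) {
    int input[] = {0, 2, 0, 5};      /* id * id $ as terminal indices */
    int stack[100] = {0}, top = 0, ip = 0;
    for (;;) {
        int act = action[stack[top]][input[ip]];
        if (act == ACC) { puts("accept"); return 0; }
        if (act > 0) { stack[++top] = act; ip++; }           /* shift */
        else if (act < 0) {                                  /* reduce */
            top -= plen[-act];                               /* pop |body| */
            stack[top + 1] = goto_tab[stack[top]][plhs[-act]];
            top++;
            puts(ptext[-act]);
        }
        else { puts("error"); return 1; }
    }
}

On the input id * id it prints the reductions F->id, T->F, F->id, T->T*F, E->T and accepts, matching the shift-reduce trace of Example 3.19.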

3.9 LR(0) ITEM
An LR parser makes shift-reduce decisions by maintaining states to keep track of where we are in a parse. States represent sets of "items". An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of the body. Thus, production A → XYZ yields the four items

A → ·XYZ
A → X·YZ
A → XY·Z
A → XYZ·

The production A → ε generates only one item, A → ·.

An item indicates how much of a production we have seen at a given point in the parsing process. For example, the item A → ·XYZ indicates that we hope to see a string derivable from XYZ next on the input. Item A → X·YZ indicates that we have just seen on the input a string derivable from X and that we hope next to see a string derivable from YZ. Item A → XYZ· indicates that we have seen the body XYZ and that it may be time to reduce XYZ to A.

One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for constructing a deterministic finite automaton that is used to make parsing decisions. Such an automaton is called an LR(0) automaton. To construct the canonical LR(0) collection for a grammar, we define an augmented grammar and two functions, CLOSURE and GOTO. If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and production S' → S. The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs only when the parser is about to reduce by S' → S.

Closure of Item Sets
If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A → α·Bβ is in CLOSURE(I) and B → γ is a production, then add the item B → ·γ to CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added to CLOSURE(I).

Intuitively, A → α·Bβ in CLOSURE(I) indicates that, at some point in the parsing process, we think we might next see a substring derivable from Bβ as input. The substring derivable from Bβ will have a prefix derivable from B by applying one of the B-productions. We therefore add items for all the B-productions; that is, if B → γ is a production, we also include B → ·γ in CLOSURE(I).

A convenient way to implement the function CLOSURE is to keep a boolean array added, indexed by the nonterminals of G, such that added[B] is set to true if and when we add the items B → ·γ for the B-productions B → γ.

We can divide all the sets of items of interest into two classes:
1. Kernel items: the initial item S' → ·S, and all items whose dots are not at the left end.
2. Nonkernel items: all items with their dots at the left end, except for S' → ·S.
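Before turning to the computation of CLOSURE, note how cheaply an item can be represented in code: the sketch below (not part of the notes; the struct layout is an assumption) stores an item as a production index plus a dot position and prints the four items of A → XYZ.

#include <stdio.h>

/* An LR(0) item: a production number plus the dot position in its body. */
struct item {
    int prod;   /* index of the production, e.g. 0 for A -> XYZ */
    int dot;    /* 0 .. body length; dot == body length means "reduce" */
};

int main(void) {
    const char *body = "XYZ";        /* body of A -> XYZ */
    for (int d = 0; d <= 3; d++) {   /* A->.XYZ, A->X.YZ, A->XY.Z, A->XYZ. */
        struct item it = {0, d};
        printf("A -> ");
        for (int k = 0; k < 3; k++) {
            if (k == it.dot) putchar('.');
            putchar(body[k]);
        }
        if (it.dot == 3) putchar('.');
        printf("\n");
    }
    return 0;
}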

SetOfItems CLOSURE(I) {
    J = I;
    repeat
        for ( each item A → α·Bβ in J )
            for ( each production B → γ of G )
                if ( B → ·γ is not in J )
                    add B → ·γ to J;
    until no more items are added to J on one round;
    return J;
}

Figure 3.32: Computation of CLOSURE

Example 3.21: Consider the augmented expression grammar:

E' → E
E → E + T | T
T → T * F | F
F → (E) | id

If I is the set containing only the item [E' → ·E], then CLOSURE(I) contains the set of items I0:

E' → ·E
E → ·E + T
E → ·T
T → ·T * F
T → ·F
F → ·(E)
F → ·id

3.10 CONSTRUCTION OF SLR PARSING TABLE
The SLR method for constructing parsing tables is a good starting point for LR parsing. An LR parser using a parsing table constructed by this method, an SLR parsing table, is called an SLR parser. The other two methods (canonical LR and LALR) augment the SLR method with lookahead information.

The SLR method begins with LR(0) items and LR(0) automata. Given a grammar G, we augment it to produce G', with a new start symbol S'. From G', we construct C, the canonical collection of sets of items for G', together with the GOTO function.

Algorithm 3.5: Constructing an SLR-parsing table.
INPUT: An augmented grammar G'.
OUTPUT: The SLR-parsing table functions ACTION and GOTO for G'.
METHOD:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A → α·aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j". Here a must be a terminal.
(b) If [A → α·] is in Ii, then set ACTION[i, a] to "reduce A → α" for all a in FOLLOW(A); here A may not be S'.

(c) If [S' → S·] is in Ii, then set ACTION[i, $] to "accept".
If any conflicting actions result from the above rules, we say the grammar is not SLR(1). The algorithm fails to produce a parser in this case.
3. The goto transitions for state i are constructed for all nonterminals A using the rule: if GOTO(Ii, A) = Ij, then GOTO[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing [S' → ·S].

An LR parser using the SLR(1) table for G is called the SLR(1) parser for G, and a grammar having an SLR(1) parsing table is said to be SLR(1). We usually omit the "(1)" after the "SLR", since we shall not deal here with parsers having more than one symbol of lookahead.

Example 3.22: Let us construct the SLR table for the augmented expression grammar, building the canonical collection of sets of LR(0) items for the grammar

E → E + T | T
T → T * F | F
F → (E) | id

Step 1: Augment the grammar with the E' production.

E' → E      (accept - r0)
E → E + T   (r1)
E → T       (r2)
T → T * F   (r3)
T → F       (r4)
F → (E)     (r5)
F → id      (r6)

Step 2: Closure of E' → ·E. The set of items I0:

E' → ·E
E → ·E + T
E → ·T
T → ·T * F
T → ·F
F → ·(E)
F → ·id

Step 3: GOTO operation on every symbol of the item sets:

Goto(I0, E) = I1:  E' → E·,  E → E· + T
Goto(I0, T) = I2:  E → T·,  T → T· * F
Goto(I0, F) = I3:  T → F·
Goto(I0, () = I4:  F → (·E),  E → ·E + T,  E → ·T,  T → ·T * F,  T → ·F,  F → ·(E),  F → ·id
Goto(I0, id) = I5: F → id·
Goto(I1, +) = I6:  E → E + ·T,  T → ·T * F,  T → ·F,  F → ·(E),  F → ·id
Goto(I2, *) = I7:  T → T * ·F,  F → ·(E),  F → ·id
Goto(I4, E) = I8:  F → (E·),  E → E· + T
Goto(I6, T) = I9:  E → E + T·,  T → T· * F
Goto(I7, F) = I10: T → T * F·
Goto(I8, )) = I11: F → (E)·

Step 4: Construction of the DFA.

Figure 3.12: LR(0) automaton - DFA with every state final and I0 as the initial state

Step 5: Construction of the FOLLOW sets for the nonterminals.

FOLLOW(E') = {$}, because E' is the start symbol.

FOLLOW(E):
E' → E, i.e., E ends the start production, so add $.
E → E + T, i.e., E can be followed by +, so add +.
F → (E), i.e., E can be followed by ), so add ).
FOLLOW(E) = {+, ), $}

FOLLOW(T):
As E' → E and E → T, i.e., E' ⇒ E ⇒ T, T can end the input, so add $.
E → E + T ⇒ T + T, i.e., T can be followed by +, so add +.
T → T * F, i.e., T can be followed by *, so add *. (Continuing in the same way, T can also be followed by ), giving FOLLOW(T) = {+, *, ), $}.)


Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701) Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov 2014 Compiler Design (170701) Question Bank / Assignment Unit 1: INTRODUCTION TO COMPILING

More information

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2 CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2 CS 536 Spring 2015 1 Reading Assignment Read Chapter 3 of Crafting a Com piler. CS 536 Spring 2015 21 The Structure

More information

CS 403: Scanning and Parsing

CS 403: Scanning and Parsing CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley

More information

UNIT III. The following section deals with the compilation procedure of any program.

UNIT III. The following section deals with the compilation procedure of any program. Pune Vidyarthi Griha s COLLEGE OF ENGINEERING, NASHIK-4. 1 UNIT III Role of lexical analysis -parsing & Token, patterns and Lexemes & Lexical Errors, regular definitions for the language constructs & strings,

More information

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No. # 01 Lecture No. # 01 An Overview of a Compiler This is a lecture about

More information

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis

More information

Compiler Design Aug 1996

Compiler Design Aug 1996 Aug 1996 Part A 1 a) What are the different phases of a compiler? Explain briefly with the help of a neat diagram. b) For the following Pascal keywords write the state diagram and also write program segments

More information

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language

More information

Compiler Design (40-414)

Compiler Design (40-414) Compiler Design (40-414) Main Text Book: Compilers: Principles, Techniques & Tools, 2 nd ed., Aho, Lam, Sethi, and Ullman, 2007 Evaluation: Midterm Exam 35% Final Exam 35% Assignments and Quizzes 10% Project

More information

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language? The Front End Source code Front End IR Back End Machine code Errors The purpose of the front end is to deal with the input language Perform a membership test: code source language? Is the program well-formed

More information

Compiler course. Chapter 3 Lexical Analysis

Compiler course. Chapter 3 Lexical Analysis Compiler course Chapter 3 Lexical Analysis 1 A. A. Pourhaji Kazem, Spring 2009 Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata

More information

Lexical and Syntax Analysis

Lexical and Syntax Analysis Lexical and Syntax Analysis In Text: Chapter 4 N. Meng, F. Poursardar Lexical and Syntactic Analysis Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input

More information

When do We Run a Compiler?

When do We Run a Compiler? When do We Run a Compiler? Prior to execution This is standard. We compile a program once, then use it repeatedly. At the start of each execution We can incorporate values known at the start of the run

More information

Compiling Regular Expressions COMP360

Compiling Regular Expressions COMP360 Compiling Regular Expressions COMP360 Logic is the beginning of wisdom, not the end. Leonard Nimoy Compiler s Purpose The compiler converts the program source code into a form that can be executed by the

More information

Part 5 Program Analysis Principles and Techniques

Part 5 Program Analysis Principles and Techniques 1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

More information

Chapter 3 Lexical Analysis

Chapter 3 Lexical Analysis Chapter 3 Lexical Analysis Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata Design of lexical analyzer generator The role of lexical

More information

Compilers and Code Optimization EDOARDO FUSELLA

Compilers and Code Optimization EDOARDO FUSELLA Compilers and Code Optimization EDOARDO FUSELLA The course covers Compiler architecture Pre-requisite Front-end Strong programming background in C, C++ Back-end LLVM Code optimization A case study: nu+

More information

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

More information