2014 Sem - VII
Introduction
Dixita Kagathara

1) Explain Phases of compiler or Analysis synthesis model of compilation.

A compiler is designed in two parts. The first part is the analysis phase and the second is the synthesis phase.

The main objective of the analysis phase is to break the source code into parts and then arrange these pieces into a meaningful structure according to the grammar of the language. Lexical analysis, syntax analysis and semantic analysis constitute the analysis phase.

The synthesis phase is concerned with generating target language statements that have the same meaning as the source statements.

Lexical Analysis:
Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. ids, constants, keywords, etc., and enters them into different tables. For each unit, lexical analysis builds a descriptor called a token. We represent a token as code#no, where code is the lexical class and no is the entry number in the table for that class.

Consider the following code:

i: integer;
a, b: real;
a := b + i;

The statement a := b + i is represented as the string of tokens

a     :=    b     +     i
Id#1  Op#1  Id#2  Op#2  Id#3

Syntax Analysis:
The syntax analyzer checks each statement against the grammar of the language and reports any mistake the programmer made while typing the code. Syntax analysis processes the string of tokens to determine the statement class and to check whether the given statement is syntactically valid.

[Figure: syntax tree for the statement a := b + i]

Semantic Analysis:
Semantic analysis identifies the sequence of actions necessary to implement the meaning of a source statement. For a declaration statement it adds information to the symbol table; for an imperative statement it adds an action to the sequence of actions. The analysis ends when the tree has been completely processed.
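The token string shown in the Lexical Analysis example can be reproduced with a small sketch. It assumes the statement is a := b + i, that only two lexical classes (Id and Op) are needed, and that entry numbers are assigned in order of first appearance in the statement. The function name tokenize and the regular expression are illustrative choices, not part of the notes.

```python
import re

# Hypothetical lexical classes: identifiers and a few operators/punctuators.
TOKEN_PATTERN = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<op>:=|[+\-*/;:,]))")

def tokenize(statement):
    """Return (tokens, tables); each token is a 'Class#n' descriptor."""
    tables = {"Id": [], "Op": []}   # one table per lexical class
    tokens = []
    statement = statement.rstrip()
    pos = 0
    while pos < len(statement):
        match = TOKEN_PATTERN.match(statement, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        lexeme = match.group("id") or match.group("op")
        cls = "Id" if match.group("id") else "Op"
        table = tables[cls]
        if lexeme not in table:      # enter the unit into its table once
            table.append(lexeme)
        tokens.append(f"{cls}#{table.index(lexeme) + 1}")
        pos = match.end()
    return tokens, tables

tokens, tables = tokenize("a := b + i")
print(" ".join(tokens))
```

For a := b + i this prints Id#1 Op#1 Id#2 Op#2 Id#3, the same token string as in the notes. A real lexical analyzer would also handle constants and keywords and would share the identifier table with the later phases.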
[Figure: steps of semantic analysis on the tree for a := b + i. Type information is added to the tree (a, real; b, real; i, int), i is converted to real giving i*, real, and the result of adding b and i* is held in temp, real.]

Intermediate representation
The intermediate representation (IR) contains the intermediate code and the tables.

Symbol table

No.  Symbol  Type  Length  Address
1    i       int
2    a       real
3    b       real
4    i*      real
5    temp    real

Intermediate code

1. t1 := int_to_real(id1)
2. t2 := t1 + b
3. a := t2

Code Optimization:
As the name suggests, this phase aims at optimizing the target code. The code can be optimized in terms of execution time, code length, memory used or any other criterion.

Code Generation:
Target code is generated in this phase from the intermediate representation of the source program. The machine instructions perform the same tasks as the intermediate code. Registers are allocated to variables in the program. This has to be done carefully to avoid clashes or repeated assignments. Various algorithms have been formulated to generate efficient machine code.

The synthesis phase may decide to hold the values of i* and temp in machine registers and may generate the assembly code

CONV_R AREG, I
ADD_R AREG, B
MOVEM AREG, A
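The path from the typed statement to the assembly code above can be sketched in a few lines. This is a minimal sketch under stated assumptions: it handles only a statement of the form target := left + right, it uses a single register AREG, and the helper names generate_ir and generate_code are hypothetical. The mnemonics CONV_R, ADD_R and MOVEM are the ones used in the notes.

```python
# Symbol table from the notes: name -> type (length and address omitted).
symbol_table = {"i": "int", "a": "real", "b": "real"}

def generate_ir(target, left, right, symbols):
    """Three-address code for `target := left + right`, converting int to real."""
    code = []

    def as_real(name):
        if symbols[name] == "real":
            return name
        temp = f"t{len(code) + 1}"
        code.append((temp, "int_to_real", name, None))
        return temp

    if symbols[target] == "real":
        left, right = as_real(left), as_real(right)
    temp = f"t{len(code) + 1}"
    code.append((temp, "+", left, right))
    code.append((target, ":=", temp, None))
    return code

def generate_code(ir):
    """Single-register code generation: temporaries live only in AREG."""
    asm = []
    in_register = None  # name of the temporary currently held in AREG
    for result, op, arg1, arg2 in ir:
        if op == "int_to_real":
            if in_register is not None:
                raise NotImplementedError("sketch has only one register")
            asm.append(f"CONV_R AREG, {arg1.upper()}")
            in_register = result
        elif op == "+":
            if in_register not in (arg1, arg2):
                raise NotImplementedError("one operand must already be in AREG")
            # Addition is commutative, so add whichever operand is in memory.
            memory_operand = arg2 if arg1 == in_register else arg1
            asm.append(f"ADD_R AREG, {memory_operand.upper()}")
            in_register = result
        elif op == ":=":
            if arg1 != in_register:
                raise NotImplementedError("value to store must be in AREG")
            asm.append(f"MOVEM AREG, {result.upper()}")
    return asm

ir = generate_ir("a", "b", "i", symbol_table)
assembly = generate_code(ir)
print("\n".join(assembly))
```

For a := b + i the sketch prints the same three instructions as the notes: CONV_R AREG, I, then ADD_R AREG, B, then MOVEM AREG, A. The generated add is t2 := b + t1 rather than t1 + b; the code generator relies on addition being commutative to pick the memory operand.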
[Figure: phases of a compiler. Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Generation, Code Optimization and Code Generation lead to the target program; the Symbol Table and Error detection and recovery are shown alongside the phases.]

A symbol table is a data structure used by a language translator, such as a compiler or interpreter, for storing the names encountered in the source program along with the relevant attributes of those names.

Information about the following entities is stored in the symbol table: variables/identifiers, procedures/functions, keywords (stored before lexical analysis starts), constants, class names, label names, and structure and union names.

The symbol table is used in different phases of the compiler, as listed below:

Semantic analysis: to check correct semantic usage of language constructs.
Code generation: all program variables and temporaries need to be allocated memory locations.
Error detection: to detect variables that are left undefined.
Optimization: to reduce the total number of variables used in a program, the temporaries generated by the compiler need to be reused.

2) What is a pass of a compiler? Also explain the forward reference issue.

Language processing = analysis of SP + synthesis of TP, where SP is the source program and TP is the target program.

Each compiler consists of mainly two phases:
1. Analysis phase
2. Synthesis phase

The analysis phase uses each component of the source language to determine relevant information
concerning a statement in the source program. Thus, analysis of a source statement consists of lexical, syntax and semantic analysis (the front end). The synthesis phase is concerned with the construction of target language statements. It includes mainly two activities: memory allocation and code generation (the back end).

[Figure: the compiler takes the source program through the analysis phase and then the synthesis phase to produce the target program; both phases report errors.]

Compilation could be performed on a statement-by-statement basis, that is, analysis of a source statement could be immediately followed by synthesis of the equivalent target statement. This may not be feasible due to:

Forward reference: a forward reference of a program entity is a reference to the entity which precedes its definition in the program. This problem can be solved by postponing the generation of target code until more information concerning the entity becomes available. This leads to the multipass model of language processing.

Language processor pass: a language processor pass is the processing of every statement in a source program to perform a language processing function.

Pass I: performs analysis of the source program and notes relevant information.
Pass II: analyses the source program once again and generates target code using the type information noted in Pass I.

In this scheme the language processor performs certain processing more than once, because the analysis is repeated in Pass II. This can be avoided by using an intermediate representation (IR) of the source program.

An intermediate representation is a representation of a source program which reflects the effect of some, but not all, analysis and synthesis tasks performed during language processing.

[Figure: Source -> Front End -> Intermediate representation (IR) -> Back End -> Target]

3) Explain cousins of the compiler OR Explain the roles of linker, loader and preprocessor in the compilation process.

Preprocessor
A preprocessor produces input to the compiler. It may perform the following functions:
1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessor: Such a preprocessor provides the user with built-in macros for constructs like while statements or if statements where none exist in the programming language itself.
4. Language extensions: These processors attempt to add capabilities to the language by what amounts to built-in macros. Ex: the language Equel is a database query language embedded in C. Statements beginning with ## are taken by the preprocessor to be database access statements unrelated to C, and are translated into procedure calls on routines that perform the database access.

Linker
A linker allows us to make a single program from several files of relocatable machine code. These files may have been the result of several different compilations, and one or more may be library files of routines provided by the system and available to any program that needs them.

Loader
The process of loading consists of taking relocatable machine code, altering the relocatable addresses and placing the altered instructions and data in memory at the proper locations.
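The loading step can be illustrated with a minimal sketch. The object format here is an assumption made for the example, not something the notes define: the relocatable code is a list of words assembled as if the program started at address 0, with one flag per word saying whether the word holds an address that must be altered. The function name load and the dictionary used as memory are likewise hypothetical.

```python
def load(words, relocatable, load_address, memory):
    """Place relocatable code in memory, altering address-sensitive words."""
    for offset, (word, is_address) in enumerate(zip(words, relocatable)):
        if is_address:
            # The word was assembled relative to origin 0, so shift it.
            word = word + load_address
        memory[load_address + offset] = word
    return memory

# Three words; the first and third hold addresses, the second is plain data.
memory = load([3, 7, 0], [True, False, True], 100, {})
print(memory)
```

Loading at address 100 gives {100: 103, 101: 7, 102: 100}: the two address words are shifted by the load address and the data word is copied unchanged.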