PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS


Objective
Explain what is meant by a compiler.
Explain how the compiler works.
Describe the various analyses of the source program.
Describe the phases of a compiler.
Explain about Loaders, Linkers & Assemblers.
Differentiate Lexical and Syntax Analysis.
Describe various compiler-construction tools.

1.1 Compiler
A compiler is a program that can read a program in one language - the source language - and translate it into an equivalent program in another language - the target language.
An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.

1.2 Analysis of the source program
Linear analysis, in which the stream of characters making up the source program is read from left to right and grouped into tokens that are sequences of characters having a collective meaning.
Hierarchical analysis, in which characters or tokens are grouped hierarchically into nested collections with collective meaning.
Semantic analysis, in which certain checks are performed to ensure that the components of a program fit together meaningfully.

1. Lexical analysis
Linear analysis is also called lexical analysis or scanning. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of the form (token-name, attribute-value) that it passes on to the subsequent phase, syntax analysis.
For example, suppose a source program contains the assignment statement position = initial + rate * 60. In lexical analysis the characters are grouped into the following tokens:
1. The identifier position
2. The assignment symbol =
3. The identifier initial
4. The plus sign
5. The identifier rate
6. The * sign
7. The number 60

2. Syntax Analysis
Hierarchical analysis is also called syntax analysis or parsing. The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation.
The hierarchical structure of a program is usually expressed by recursive rules. For example, we might have the following rules as part of the definition of expressions:

1. Any identifier is an expression.
2. Any number is an expression.
3. If expression1 and expression2 are expressions, then so are
   expression1 + expression2
   expression1 * expression2
   (expression1)

3. Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation. An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. The language specification may permit some type conversions called coercions. For example, a binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert or coerce the integer into a floating-point number.

1.3 Phases of a compiler
A compiler is a program that reads a program in one language, the source language, and translates it into an equivalent program in another language, the target language. The translation process should also report the presence of errors in the source program. There are two parts of compilation.

-> The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
-> The synthesis part constructs the desired target program from the intermediate representation.

The compiler has a number of phases plus a symbol-table manager and an error handler. The phases of a compiler are collected into a front end and a back end. The front end includes all the analysis phases and the intermediate code generator. The back end includes the code optimization phase and the final code generation phase. The front end analyzes the source program and produces intermediate code, while the back end synthesizes the target program from the intermediate code. A simple approach to implementing the front end is to run its phases serially.

1. Lexical Analysis
The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of the form (token-name, attribute-value) that it passes on to the subsequent phase, syntax analysis. In the token, the first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token. Information from the symbol-table entry is needed for semantic analysis and code generation.
For example, suppose a source program contains the assignment statement
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes and mapped into the following tokens passed on to the syntax analyzer:
position is a lexeme that would be mapped into the token (id, 1), where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
The assignment symbol = is a lexeme that is mapped into the token (=). Since this token needs no attribute-value, we have omitted the second component.
initial is a lexeme that is mapped into the token (id, 2), where 2 points to the symbol-table entry for initial.
+ is a lexeme that is mapped into the token (+).
rate is a lexeme that is mapped into the token (id, 3), where 3 points to the symbol-table entry for rate.
* is a lexeme that is mapped into the token (*).
60 is a lexeme that is mapped into the token (60).
Blanks separating the lexemes would be discarded by the lexical analyzer.
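The grouping of characters into tokens can be sketched concretely. The following Python fragment (not part of the original notes) is a minimal illustration of a scanner: the token names, the regular expressions, and the numbering of symbol-table entries are assumptions made for the example, not the conventions of any particular compiler.

import re

# A minimal scanner sketch: groups characters into (token-name, attribute) pairs
# and records identifiers in a symbol table, as described above.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),   # identifiers such as position, initial, rate
    ("number", r"\d+"),                       # integer literals such as 60
    ("=",      r"="),
    ("+",      r"\+"),
    ("*",      r"\*"),
    ("skip",   r"\s+"),                       # blanks are discarded
]
MASTER = re.compile("|".join("(?P<g%d>%s)" % (i, p) for i, (_, p) in enumerate(TOKEN_SPEC)))

def scan(source):
    symbol_table = []          # one entry per distinct identifier
    tokens = []
    for m in MASTER.finditer(source):
        name = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        lexeme = m.group()
        if name == "skip":
            continue                           # blanks separating lexemes are discarded
        if name == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif name == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((name,))             # operators carry no attribute-value
    return tokens, symbol_table

print(scan("position = initial + rate * 60"))
# ([('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('number', 60)],
#  ['position', 'initial', 'rate'])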


2. Syntax Analysis
The second phase of the compiler is syntax analysis or parsing. The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation.
A syntax tree for the token stream above is produced as the output of the syntax analyzer. This tree shows the order in which the operations in the assignment position = initial + rate * 60 are to be performed. The tree has an interior node labeled * with (id, 3) as its left child and the integer 60 as its right child. The node (id, 3) represents the identifier rate. The node labeled * makes it explicit that we must first multiply the value of rate by 60. The node labeled + indicates that we must add the result of this multiplication to the value of initial. The root of the tree, labeled =, indicates that we must store the result of this addition into the location for the identifier position. This ordering of operations is consistent with the usual conventions of arithmetic, which tell us that multiplication has higher precedence than addition, and hence that the multiplication is to be performed before the addition.
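A minimal sketch of how a parser might build such a tree from the token stream follows; the grammar levels (expression, term, factor) encode the precedence of * over +. The token and tree representations are illustrative assumptions carried over from the scanner sketch above, not the notation of any particular compiler.

# A minimal recursive-descent parser sketch. It consumes a token list like the one
# produced by the scanner above and builds a syntax tree as nested tuples,
# giving * higher precedence than + as described in the text.
def parse_assignment(tokens):
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(expected):
        nonlocal pos
        tok = tokens[pos]
        assert tok[0] == expected, "expected %s, found %s" % (expected, tok[0])
        pos += 1
        return tok

    def factor():                      # factor -> id | number
        if peek() == "id":
            return take("id")
        return take("number")

    def term():                        # term -> factor { * factor }
        node = factor()
        while peek() == "*":
            take("*")
            node = ("*", node, factor())
        return node

    def expr():                        # expr -> term { + term }
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())
        return node

    target = take("id")                # assignment -> id = expr
    take("=")
    return ("=", target, expr())

tokens = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("number", 60)]
print(parse_assignment(tokens))
# ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('number', 60))))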

3. Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation. An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.
The language specification may permit some type conversions called coercions. For example, a binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert or coerce the integer into a floating-point number.

4. Intermediate Code Generation
In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms. Syntax trees are a form of intermediate representation; they are commonly used during syntax and semantic analysis.
After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine. This intermediate representation should have two important properties: it should be easy to produce and it should be easy to translate into the target machine. The output of the intermediate code generator consists of the three-address code sequence
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
There are several points worth noting about three-address instructions. First, each three-address assignment instruction has at most one operator on the right side. Thus, these instructions fix the order in which operations are to be done; the multiplication precedes the addition as in the source program. Second, the compiler must generate a temporary name to hold the value computed by a three-address instruction. Third, some "three-address instructions", like the first and last in the sequence above, have fewer than three operands.
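A minimal sketch of how the three-address code above might be produced by walking the syntax tree follows. The node layout, the temporary-name counter, and the simple integer-versus-float coercion rule are assumptions made for illustration, not a full type system or the method of any particular compiler.

# A minimal three-address code generation sketch. It walks a syntax tree like the
# one built above, creates temporaries t1, t2, ..., and inserts an inttofloat
# coercion when an integer constant meets a float operand (an illustrative rule).
def gen_tac(tree):
    code = []
    counter = [0]

    def new_temp():
        counter[0] += 1
        return "t%d" % counter[0]

    def walk(node):
        kind = node[0]
        if kind == "id":                    # ('id', name, type) -> (address, type)
            return node[1], node[2]
        if kind == "num":                   # ('num', value) -> integer constant
            return str(node[1]), "int"
        # binary operator: ('+', left, right) or ('*', left, right)
        left, ltype = walk(node[1])
        right, rtype = walk(node[2])
        if ltype == "float" and rtype == "int":
            t = new_temp()
            code.append("%s = inttofloat(%s)" % (t, right))
            right, rtype = t, "float"
        t = new_temp()
        code.append("%s = %s %s %s" % (t, left, kind, right))
        return t, "float" if "float" in (ltype, rtype) else "int"

    target = tree[1][1]                     # tree is ('=', ('id', name, type), expr)
    value, _ = walk(tree[2])
    code.append("%s = %s" % (target, value))
    return code

tree = ("=", ("id", "id1", "float"),
        ("+", ("id", "id2", "float"),
              ("*", ("id", "id3", "float"), ("num", 60))))
for line in gen_tac(tree):
    print(line)
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3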

5. Code Optimization
The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power. For example, a straightforward algorithm generates the intermediate code using an instruction for each operator in the tree representation that comes from the semantic analyzer. A simple intermediate code generation algorithm followed by code optimization is a reasonable way to generate good target code.
The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once, to transmit its value to id1, so the optimizer can transform the code into the shorter sequence
t1 = id3 * 60.0
id1 = id2 + t1
There is a great variation in the amount of code optimization different compilers perform. In those that do the most, the so-called "optimizing compilers," a significant amount of time is spent on this phase.

6. Code Generation
The code generator takes as input an intermediate representation of the source program and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program. Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task. A crucial aspect of code generation is the judicious assignment of registers to hold variables. For example, using registers R1 and R2, the intermediate code above might get translated into the machine code
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
The first operand of each instruction specifies a destination. The F in each instruction tells us that it deals with floating-point numbers. The code loads the contents of address id3 into register R2, then multiplies it by the floating-point constant 60.0. The # signifies that 60.0 is to be treated as an immediate constant. The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in register R2. Finally, the value in register R1 is stored into the address of id1, so the code correctly implements the assignment statement.

7. Symbol-Table Management
An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name. These attributes may provide information about the storage allocated for a name, its type, its scope (where in the program its value may be used), and, in the case of procedure names, such things as the number and types of its arguments, the method of passing each argument (for example, by value or by reference), and the type returned.
The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
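A minimal sketch of such a symbol table follows; the choice of a Python dictionary and the particular attribute fields (type, scope) are illustrative assumptions. Real compilers typically use hash tables, often with one table per scope.

# A minimal symbol-table sketch: one record per name, with fields for the kinds of
# attributes mentioned above. A dictionary gives the fast lookup the text asks for.
class SymbolTable:
    def __init__(self):
        self._records = {}          # name -> attribute record

    def insert(self, name, **attributes):
        record = self._records.setdefault(name, {"name": name})
        record.update(attributes)   # e.g. type, scope, storage offset
        return record

    def lookup(self, name):
        return self._records.get(name)   # None if the name is not recorded

table = SymbolTable()
table.insert("position", type="float", scope="global")
table.insert("rate", type="float", scope="global")
print(table.lookup("rate"))       # {'name': 'rate', 'type': 'float', 'scope': 'global'}
print(table.lookup("count"))      # None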

Limitations
Requires an enormous amount of space to store tokens and trees.
Very slow, since each phase would have to read its input from and write its output to temporary disk files.

Summary
Language Processors. An integrated software development environment includes many different kinds of language processors such as compilers, interpreters, assemblers, linkers, loaders, debuggers, and profilers.
Compiler Phases. A compiler operates as a sequence of phases, each of which transforms the source program from one intermediate representation to another.
Machine and Assembly Languages. Machine languages were the first-generation programming languages, followed by assembly languages. Programming in these languages was time consuming and error prone.
Modeling in Compiler Design. Compiler design is one of the places where theory has had the most impact on practice. Models that have been found useful include automata, grammars, regular expressions, trees, and many others.
Code Optimization. Although code cannot truly be "optimized," the science of improving the efficiency of code is both complex and very important. It is a major portion of the study of compilation.
Higher-Level Languages. As time goes on, programming languages take on progressively more of the tasks that formerly were left to the programmer, such as memory management, type-consistency checking, or parallel execution of code.

Key Terms
>> compiler >> intermediate code generation >> tokens >> lexical analysis >> code optimization >> error handler >> syntax analysis >> code generation >> linear analysis >> semantic analysis >> symbol table >> hierarchical analysis

Key Term Quiz
1. A ---------------- is a program that can read the source language and translate it into an equivalent program in the target language.
2. An --------------- appears to directly execute the operations specified in the source program on inputs supplied by the user.

3. -------------- analysis is also called lexical analysis or scanning.
4. --------------- analysis, in which the stream of characters making up the source program is read from left to right and grouped into tokens that are sequences of characters having a collective meaning.
5. ------------------ analysis, in which characters or tokens are grouped hierarchically into nested collections with collective meaning.
6. ------------------ analysis, in which certain checks are performed to ensure that the components of a program fit together meaningfully.
7. The ----------------- phase attempts to improve the intermediate code so that better target code will result.
8. The lexical analyzer groups the characters into meaningful sequences called -----------------.
9. The ----------------- takes as input an intermediate representation of the source program and maps it into the target language.

Multiple Choice Questions
1. -------------------- is a program that can read the source language and translate it into an equivalent program in the target language.
a. compiler b. interpreter c. assembler d. loader
2. Which one of the following executes the operations specified in the source program on inputs supplied by the user?
a. compiler b. interpreter c. assembler d. loader
3. Linear analysis is also called
a. lexical analysis b. syntax analysis c. hierarchical analysis d. semantic analysis
4. The phase in which the stream of characters making up the source program is read from left to right and grouped into tokens is called
a. lexical analysis b. syntax analysis c. hierarchical analysis d. semantic analysis
5. Which one of the following phases belongs to the front end of the compiler?
a. lexical analysis b. code generation c. code optimization d. none of the above
6. Which one of the following phases belongs to the back end of the compiler?
a. lexical analysis b. linear analysis c. code optimization d. syntax analysis
7. Which phase attempts to improve the intermediate code so that better target code will result?
a. lexical analysis b. code generation c. code optimization d. syntax analysis
8. Which phase takes as input an intermediate representation of the source program and maps it into the target language?
a. lexical analysis b. code generation c. code optimization d. syntax analysis

9. Which phase is used to separate the tokens from the expression?
a. lexical analysis b. code generation c. code optimization d. syntax analysis
10. Which phase is used to construct the parse tree for the given expression?
a. lexical analysis b. code generation c. code optimization d. syntax analysis
11. A language translator is a program
a. which converts the program from machine to high level
b. which converts the program from C to C++
c. which converts the program from high level to machine
d. which converts the program from assembly to machine
12. Which of the following translators converts assembly language to object language?
a. assembler b. compiler c. macro processor d. linker
13. A compiler is a program that
a. places programs into memory and prepares them for execution
b. automates the translation of assembly language into machine language
c. accepts a program written in a high level language and produces an object program
d. appears to execute a source program as if it were machine language
14. An interpreter is a program that
a. places programs into memory and prepares them for execution
b. automates the translation of assembly language into machine language
c. accepts a program written in a high level language and produces an object program
d. appears to execute a source program as if it were machine language
15. A loader is a program that
a. places programs into memory and prepares them for execution
b. automates the translation of assembly language into machine language
c. accepts a program written in a high level language and produces an object program
d. appears to execute a source program as if it were machine language
16. A program written in a high level language is known as a
a. source program b. object program c. OS d. none
17. A compiler can diagnose
a. grammatical errors only b. logical errors only
c. grammatical as well as logical errors d. neither grammatical nor logical errors
18. An object program is a
a. program written in machine language

b. program translated into machine language
c. the translation of a high level language into machine language
d. none
19. A programmer has by mistake written an instruction to divide instead of multiply; such errors can be detected by
a. a compiler b. an interpreter c. a compiler or interpreter d. none
20. Semantic errors can be detected
a. at compile time only b. at run time only c. both d. none
21. The task of the lexical analysis program is
a. to parse the source program into basic elements or tokens of the language
b. to build a literal table and an identifier table
c. to build a uniform symbol table
d. all of the above
22. The function of the syntax analyzer is
a. to recognize the major constructs of the language and call the appropriate action routines that will generate the intermediate form for these constructs
b. to build a literal table and an identifier table
c. to build a uniform symbol table
d. to parse the source program into basic elements or tokens of the language
23. The error that can be pointed out by the compiler is a
a. syntax error b. semantic error c. logical error d. internal error

Review Questions
Two mark Questions
1. Define a symbol table.
2. Differentiate compiler and interpreter.
3. What is a language processing system?
4. What are error recovery actions in a lexical analyzer?
5. How is the source program analyzed?
6. What does translator mean?
7. What are the phases of a compiler?
8. Mention the role of semantic analysis.
9. What are the different data structures used for a symbol table?

10. What are the front-end and back-end of the compiler?
11. What is linear analysis?
12. What is the main difference between a phase and a pass of a compiler?

Big Questions
1. Write about the phases of the compiler; assume an input and show the output of the various phases.
2. Explain in detail the role of the lexical analyzer with the possible error recovery actions.

Lesson Labs
Exercise 1: What is the difference between a compiler and an interpreter?
Exercise 2: What are the advantages of (a) a compiler over an interpreter (b) an interpreter over a compiler?

1.4 Cousins of the Compiler
The cousins of the compiler are
1. Preprocessor.
2. Assembler.
3. Loader and Link-editor.

1. Preprocessors
Preprocessors produce input to compilers. They may perform the following functions.
Macro processing -> A preprocessor may allow a user to define macros that are shorthands for longer constructs.
File inclusion -> A preprocessor may include header files into the program text.
Rational preprocessors -> These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
Language extensions -> These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.
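The macro-processing and file-inclusion functions above can be pictured with a very small text-to-text transformer. The following Python sketch is illustrative only; the #define/#include directive syntax is borrowed from C for familiarity and is an assumption, not something prescribed by these notes.

import re

# A toy preprocessor sketch: collects simple parameterless #define macros,
# expands them in the remaining text, and splices in #include files verbatim.
def preprocess(source, read_file):
    macros = {}
    output = []
    for line in source.splitlines():
        m = re.match(r"#define\s+(\w+)\s+(.*)", line)
        if m:
            macros[m.group(1)] = m.group(2)       # macro processing
            continue
        m = re.match(r'#include\s+"([^"]+)"', line)
        if m:
            output.append(read_file(m.group(1)))  # file inclusion
            continue
        for name, body in macros.items():         # expand macros in ordinary lines
            line = re.sub(r"\b%s\b" % name, body, line)
        output.append(line)
    return "\n".join(output)

files = {"defs.h": "int limit;"}
src = '#include "defs.h"\n#define MAX 60\nrate = MAX;'
print(preprocess(src, files.get))
# int limit;
# rate = 60;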

2. Assembler
An assembler is a program that translates a symbolic version of an instruction into the binary version. Assemblers provide a friendlier representation than a computer's 0s and 1s that simplifies writing and reading programs. Symbolic names for operations and locations are one facet of this representation. An assembler reads a single assembly language source file and produces an object file containing machine instructions and bookkeeping information that helps combine several object files into a program. Assembly language is the symbolic representation of a computer's binary encoding, its machine language. Assembly language is more readable than machine language because it uses symbols instead of bits.

Two-Pass Assemblers
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.

3. Loaders and Link editors
Loader: a utility program that sets up an executable program in main memory ready for execution. This is the final stage of the compiling/assembly process.
Link editor (linkage editor, linker): the utility program that combines several separately compiled modules into one, resolving internal references between them. When a program is assembled or compiled, an intermediate form is produced into which it is necessary to incorporate library material containing the implementation of standard routines and procedures, and to add any other modules that may have been supplied by the user, possibly in other high-level languages.

1.5 The Grouping of Phases
The discussion of phases deals with the logical organization of a compiler. In an implementation, activities from several phases may be grouped together into a pass that reads an input file and writes an output file. For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be grouped together into one pass. Code optimization might be an optional pass. Then there could be a back-end pass consisting of code generation for a particular target machine.
Some compiler collections have been created around carefully designed intermediate representations that allow the front end for a particular language to interface with the back end for a certain target machine. With these collections, we can produce compilers for different source languages for one target machine by combining different front ends with the back end for that target machine. Similarly, we can produce compilers for different target machines by combining a front end with back ends for different target machines.

1.6 Compiler-Construction Tools
The compiler writer, like any software developer, can profitably use modern software development environments containing tools such as language editors, debuggers, version managers, profilers, test harnesses, and so on. In addition to these general software-development tools, other more specialized tools have been created to help implement various phases of a compiler.

These tools use specialized languages for specifying and implementing specific components, and many use quite sophisticated algorithms. The most successful tools are those that hide the details of the generation algorithm and produce components that can be easily integrated into the remainder of the compiler. Some commonly used compiler-construction tools include
1. Parser generators that automatically produce syntax analyzers from a grammatical description of a programming language.
2. Scanner generators that produce lexical analyzers from a regular-expression description of the tokens of a language.
3. Syntax-directed translation engines that produce collections of routines for walking a parse tree and generating intermediate code.
4. Code-generator generators that produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.
5. Data-flow analysis engines that facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for constructing various phases of a compiler.

1.7 Translators
Language translators convert programming source code into language that the computer processor understands. Programming source code has various structures and commands, but computer processors only understand machine language. Different types of translation must occur to turn programming source code into machine language, which is made up of bits of binary data. The three major types of language translators are compilers, assemblers, and interpreters.

(i) Compilers
Most 3GL and higher-level programming languages use a compiler for language translation. A compiler is a special program that takes written source code and turns it into machine language. When a compiler executes, it analyzes all of the language statements in the source code and builds the machine language object code. After a program is compiled, it is then in a form that the processor can execute one instruction at a time.

In some operating systems, an additional step called linking is required after compilation. Linking resolves the relative locations of instructions and data when more than one object module needs to be run at the same time and the modules cross-reference each other's instruction sequences or data.
Most high-level programming languages come with a compiler. However, object code is unique for each type of computer. Many different compilers exist for each language in order to translate for each type of computer. In addition, the compiler industry is quite competitive, so there are actually many compilers for each language on each type of computer. Although they require an extra step before execution, compiled programs often run faster than programs executed using an interpreter.

(ii) Assembler
An assembler translates assembly language into machine language. Assembly language is one step removed from machine language. It uses computer-specific commands and a structure similar to machine language, but assembly language uses names instead of numbers. An assembler is similar to a compiler, but it is specific to translating programs written in assembly language into machine language. To do this, the assembler takes basic computer instructions from assembly language and converts them into a pattern of bits for the computer processor to use to perform its operations.

(iii) Interpreters
Many high-level programming languages have the option of using an interpreter instead of a compiler. Some of these languages exclusively use an interpreter. An interpreter behaves very differently from compilers and assemblers. It converts programs into machine-executable form each time they are executed. It analyzes and executes each line of source code, in order, without looking at the entire program. Instead of requiring a step before program execution, an interpreter processes the program as it is being executed.
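A minimal sketch of this execute-directly behaviour follows: instead of emitting target code for the assignment used earlier, it walks an expression tree and performs the operations immediately, looking variables up in an environment. The tree layout and the environment values are illustrative assumptions, not from the original notes.

# A toy interpreter sketch: rather than generating code, it evaluates the syntax
# tree directly, so the operations are carried out each time the program runs.
def evaluate(node, env):
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "id":
        return env[node[1]]
    left = evaluate(node[1], env)
    right = evaluate(node[2], env)
    return left + right if kind == "+" else left * right

env = {"initial": 10.0, "rate": 2.5}
tree = ("+", ("id", "initial"), ("*", ("id", "rate"), ("num", 60)))
env["position"] = evaluate(tree, env)    # the assignment is executed immediately
print(env["position"])                   # 160.0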

1.8 One Pass Compilers
A one-pass compiler is a compiler that passes through the source code of each compilation unit only once. In other words, a one-pass compiler does not "look back" at code it previously processed. Another term sometimes used is narrow compiler, which emphasizes the limited scope a one-pass compiler is obliged to use. This is in contrast to a multi-pass compiler, which traverses the source code and/or the abstract syntax tree several times, building one or more intermediate representations that can be arbitrarily refined.
While one-pass compilers may be faster than multi-pass compilers, they are unable to generate programs that are as efficient, due to the limited scope available. (Many optimizations require multiple passes over a program, subroutine, or basic block.) In addition, some programming languages simply cannot be compiled in a single pass, as a result of their design. In contrast, some programming languages have been designed specifically to be compiled with one-pass compilers, and include special constructs to allow one-pass compilation. In short, one-pass compilers are faster, but may not generate as efficient a program, and they cannot compile all types of source code.

One-pass versus multi-pass compilers
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing a lot of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal). In some cases the design of a language feature may require a compiler to perform more than one pass over the source.

For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after the statements that they affect, with the actual translation happening during a subsequent pass.
The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.
Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:
A "source-to-source compiler" is a type of compiler that takes a high level language as its input and outputs a high level language. For example, an automatic parallelizing compiler will frequently take in a high level language program as input and then transform the code and annotate it with parallel code annotations (e.g. OpenMP) or language constructs (e.g. Fortran's DOALL statements).
A stage compiler compiles to the assembly language of a theoretical machine, as in some Prolog implementations.
A just-in-time compiler is used by Smalltalk and Java systems, and also by Microsoft .NET's Common Intermediate Language (CIL).

Difference between single pass compiler and multi pass compiler
A one-pass compiler is a compiler that passes through the source code of each compilation unit only once. A multi-pass compiler is a type of compiler that processes the source code or abstract syntax tree of a program several times.
A one-pass compiler is faster than a multi-pass compiler.
A one-pass compiler has only the limited scope of a single pass, whereas a multi-pass compiler has the wider scope of several passes.
Multi-pass compilers are sometimes called wide compilers, whereas one-pass compilers are sometimes called narrow compilers.
Many programming languages cannot be compiled with a single-pass compiler; for example, Pascal can be implemented with a single-pass compiler, whereas languages like Java require a multi-pass compiler.
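The forward-reference example above (a declaration on line 20 affecting a statement on line 10) can be made concrete with a tiny two-pass translator. The following Python sketch is illustrative only; the miniature declare/use input format is an assumption invented for the example, not a real language.

# A toy two-pass translator sketch. Pass 1 only collects declarations (which may
# appear after the statements that use them); pass 2 translates the uses once the
# symbol table is complete. A strict one-pass translator would fail on the first line.
def two_pass(lines):
    # Pass 1: gather declarations into a symbol table.
    types = {}
    for line in lines:
        words = line.split()
        if words[0] == "declare":            # e.g. "declare rate float"
            types[words[1]] = words[2]
    # Pass 2: translate the uses, now that every declaration is known.
    out = []
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if words[0] == "use":                # e.g. "use rate"
            out.append("line %d: %s has type %s" % (number, words[1], types[words[1]]))
    return out

program = [
    "use rate",            # like line 10 in the text's example: the use comes first
    "declare rate float",  # like line 20: the declaration comes later
]
for line in two_pass(program):
    print(line)
# line 1: rate has type float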

Summary
Assemblers provide a friendlier representation than a computer's 0s and 1s that simplifies writing and reading programs.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass).
A loader is a utility program that sets up an executable program in main memory ready for execution. This is the final stage of the compiling/assembly process.
A link editor (linkage editor, linker) is the utility program that combines several separately compiled modules into one, resolving internal references between them.
Translators convert programming source code into language that the computer processor understands.
An assembler translates assembly language into machine language. Assembly language is one step removed from machine language.
An interpreter behaves very differently from compilers and assemblers. It converts programs into machine-executable form each time they are executed.
A one-pass compiler is a compiler that passes through the source code of each compilation unit only once.

Key Terms
>> preprocessors >> linkers >> loaders >> translators >> assemblers >> interpreter >> one pass compiler

Key Term Quiz
1) ------------------- provide a friendlier representation than a computer's 0s and 1s that simplifies writing and reading programs.
2) ------------------ is a utility program that sets up an executable program in main memory ready for execution. This is the final stage of the compiling/assembly process.
3) ----------------- is the utility program that combines several separately compiled modules into one, resolving internal references between them.
4) ----------------- convert programming source code into language that the computer processor understands.
5) An assembler translates -------------------- language into -------------------- language. Assembly language is one step removed from machine language.

6) An -------------------- behaves very differently from compilers and assemblers. It converts programs into machine-executable form each time they are executed.
7) A ------------------- is a compiler that passes through the source code of each compilation unit only once.

Multiple Choice Questions
1. Which one of the following provides a friendlier representation than a computer's 0s and 1s that simplifies writing and reading programs?
(a) assemblers (b) loaders (c) linkers (d) interpreters
2. A ----------------- passes through the source code of each compilation unit only once.
(a) one pass compiler (b) two pass compiler (c) one pass assembler (d) two pass assembler
3. Which one of the following converts programming source code into language that the computer processor understands?
(a) assemblers (b) loaders (c) translators (d) interpreters
4. -------------- produce input to compilers.
(a) preprocessors (b) assemblers (c) loaders (d) linkers
5. A --------------- is a program that translates a symbolic version of an instruction into the binary version.
(a) preprocessor (b) assembler (c) loader (d) linker
6. A utility program that sets up an executable program in main memory ready for execution is called a
(a) preprocessor (b) assembler (c) loader (d) linker
7. The utility program that combines several separately compiled modules into one, resolving internal references between them, is called a
(a) preprocessor (b) assembler (c) loader (d) linker

Review Questions
Two mark Questions
1. Define preprocessor. What are the functions of preprocessors?
2. Define interpreters.
3. Name at least 4 compiler construction tools.
4. Define loaders and linkers.
5. Compare one-pass and two-pass assemblers.
6. What do you mean by an assembler?

7. Define one-pass compiler.
8. Compare one-pass and two-pass compilers.

Big Questions
1. Explain briefly about compiler construction tools.
2. Discuss briefly about one-pass compilers.

------------- END OF FIRST UNIT -----------