
LECTURE NOTES ON COMPILER DESIGN
(PCCS4305) COMPILER DESIGN
KISHORE KUMAR SAHU
SR. LECTURER, DEPARTMENT OF INFORMATION TECHNOLOGY
ROLAND INSTITUTE OF TECHNOLOGY, BERHAMPUR

PCCS4305 Compiler Design (3-0-0)

MODULE 1 (Lecture hours: 13)
Introduction: Overview and phases of compilation. (2 hours)
Lexical Analysis: Non-deterministic and deterministic finite automata (NFA & DFA), regular grammar, regular expressions and regular languages, design of a lexical analyser as a DFA, lexical analyser generator. (3 hours)
Syntax Analysis: Role of a parser, context free grammars and context free languages, parse trees and derivations, ambiguous grammar.
Top Down Parsing: Recursive descent parsing, LL(1) grammars, non-recursive predictive parsing, error reporting and recovery.
Bottom Up Parsing: Handle pruning and shift-reduce parsing, SLR parsers and construction of SLR parsing tables, LR(1) parsers and construction of LR(1) parsing tables, LALR parsers and construction of efficient LALR parsing tables, parsing using ambiguous grammars, error reporting and recovery, parser generator. (8 hours)

MODULE 2 (Lecture hours: 14)
Syntax Directed Translation: Syntax directed definitions (SDD), inherited and synthesized attributes, dependency graphs, evaluation orders for SDD, semantic rules, application of syntax directed translation. (5 hours)
Symbol Table: Structure and features of symbol tables, symbol attributes and scopes. (2 hours)
Intermediate Code Generation: DAG for expressions, three address codes (quadruples and triples), types and declarations, translation of expressions, array references, type checking and conversions, translation of Boolean expressions and control flow statements, back patching, intermediate code generation for procedures. (7 hours)

MODULE 3 (Lecture hours: 8)
Run Time Environment: Storage organizations, static and dynamic storage allocations, stack allocation, handling of activation records for calling sequences. (3 hours)
Code Generation: Factors involved, register allocation, simple code generation using stack allocation, basic blocks and flow graphs, simple code generation using flow graphs. (3 hours)
Elements of Code Optimization: Objective, peephole optimization, concepts of elimination of local common sub-expressions, redundant and unreachable code, basics of flow of control optimization. (2 hours)

Text Book: Compilers: Principles, Techniques and Tools. Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman. Publisher: Pearson.

Lecture Manual
Lecture 01: Overview of Compiler.
Lecture 02: Phases of compilation.
Lecture 03: Non-deterministic and deterministic finite automata (NFA & DFA).
Lecture 04: Regular grammar, regular expressions and regular languages.
Lecture 05: Design of a lexical analyser as a DFA, lexical analyser generator.
Lecture 06: Role of a parser, context free grammars and context free languages, parse trees and derivations, ambiguous grammar.
Lecture 07: Top Down Parsing: Recursive descent parsing, LL(1) grammars.
Lecture 08: Non-recursive predictive parsing, error reporting and recovery.
Lecture 09: Bottom Up Parsing: Handle pruning and shift-reduce parsing, SLR parsers and construction of SLR parsing tables.
Lecture 10: LR(1) parsers and construction of LR(1) parsing tables.
Lecture 11: LALR parsers and construction of efficient LALR parsing tables.
Lecture 12: Parsing using ambiguous grammars.
Lecture 13: Error reporting and recovery, parser generator.
Lecture 14: Syntax directed definitions (SDD).
Lecture 15: Inherited and synthesized attributes.
Lecture 16: Dependency graphs.
Lecture 17: Evaluation orders for SDD.
Lecture 18: Semantic rules, application of syntax directed translation.
Lecture 19: Structure and features of symbol tables.
Lecture 20: Symbol attributes and scopes.
Lecture 21: DAG for expressions.
Lecture 22: Three address codes (quadruples and triples).
Lecture 23: Types and declarations, translation of expressions.
Lecture 24: Array references, type checking and conversions.
Lecture 25: Translation of Boolean expressions and control flow statements.
Lecture 26: Back patching.
Lecture 27: Intermediate code generation for procedures.
Lecture 28: Storage organizations, static and dynamic storage allocations.
Lecture 29: Stack allocation.
Lecture 30: Handling of activation records for calling sequences.
Lecture 31: Factors involved in code generation, register allocation.
Lecture 32: Simple code generation using stack allocation.
Lecture 33: Basic blocks and flow graphs, simple code generation using flow graphs.
Lecture 34: Objective, peephole optimization, concepts of elimination of local common sub-expressions, redundant and unreachable code.
Lecture 35: Basics of flow of control optimization.

BY: KISHORE KUMAR SAHU, DEPT OF INFORMATION TECHNOLOGY, RIT, BERHAMPUR.

CHAPTER-01
INTRODUCTION TO COMPILERS

WHY TO STUDY COMPILER DESIGN
- To study the constructs of modern programming languages and their implementation in the machine language of a typical computer.
- To study the development of tools that can be used in the construction of certain translator components.

OVERVIEW OF COMPILERS

Translator
A translator is a program that takes as input a program written in one programming language (the source language) and produces as output a program in another language (the object or target language).

Compiler
If the source language is a high-level language like C, C++ etc. and the object language is a low-level language like assembly or machine language, then such a translator is called a compiler.
Executing a program written in a high-level programming language is basically a two-step process:
- The source program must first be compiled, that is, translated into the object program.
- The object program is then loaded into memory and executed.

Interpreter
A translator that transforms a programming language into a simplified language called intermediate code, which can be directly executed by a program called an interpreter. E.g. BASIC, SNOBOL, JCL etc.
An interpreter is usually smaller than a compiler, but it has the disadvantage that executing intermediate code is slower than executing the object program produced by a compiler.

Hybrid Compiler
Some languages like JAVA make use of a compiler as well as an interpreter. First the source code is converted to bytecode by a translator (the compiler), and then a virtual machine (the interpreter) runs the bytecode to give the output. Compilers used in this way are called hybrid compilers.

Assembler
If the source language is assembly language and the target language is machine language, then the translator is called an assembler.

Preprocessor
If the source language is a high-level language and the target language is a different high-level language, then the translator is called a preprocessor. E.g. FORTRAN.

NEED OF TRANSLATORS
Programming in machine language is difficult because we communicate directly with the computer in terms of bits, registers, and very primitive machine operations. Since code in machine language is a combination of 0's and 1's, it is difficult to tell operators from operands. Hence it is very difficult to modify code written in machine language.
Some more reasons for the need of translators are as follows:
- Symbolic Assembly Language: Assembly language is a symbolic low-level language that uses mnemonic names for operators and operands. The computer cannot understand these mnemonics; hence a translator is required to translate these symbols into binary code. This translator is called an assembler.
- Macros: Assembly language has a feature called the macro, which provides a text replacement capability. A macro name stands for a sequence of assembly code; without macros, that code would have to be inserted by hand wherever it is needed. The macro processor replaces each invocation of the macro with the macro body, substituting the appropriate arguments.
- High-Level Languages: It is also difficult to program in assembly language, as the programmer must be familiar with the instruction formats, data representation and complex operand mnemonics. To avoid these disadvantages, high-level languages have been developed that allow a programmer to express algorithms in a more natural notation that avoids many of the details of how a specific computer functions. Such a notation is far from being understood by the computer. Hence we have a compiler, yet another program, that translates the high-level language into machine language. A compiler is more complex in structure than an assembler.

STRUCTURE OF A COMPILER
The basic function of a compiler is to translate high-level code into machine instructions, which cannot be treated as a single-step process. To avoid complexity in the structure of a compiler, it is divided into a series of sub-processes called phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation.
The figure shows the phases of a compiler.
The work of each phase is as follows:

Lexical Analysis: The lexical analyser, otherwise called the scanner, is the first phase of compilation. It separates the characters of the source code into groups that logically belong together; these groups are called tokens. The usual tokens are keywords, identifiers, operators etc. The stream of tokens generated by the scanner is passed on to the next phase along with other relevant information.

Syntax Analysis: The syntax analyser, otherwise called the parser, is the second phase of compilation. It groups tokens together into syntactic structures like expressions, e.g. A+B, that can be further combined to form statements. Often the syntactic structure is represented as a parse tree whose leaves are the tokens from the scanner.

Semantic Analysis: After the parse tree has been generated, the types of the operands in each expression are checked. This is called type checking. If the types do not match, they are made the same by converting the lower data type to the higher one; this is called coercion.

Intermediate Code Generation: Accepts syntactic structures from the parser and produces a stream of simple instructions. There are many ways of representing intermediate code, but the most common is one that uses one operator and a few operands per instruction. Intermediate code is similar to assembly code but differs in that it does not specify the register to be used for each operation.

Code Optimization: An optional phase designed to improve the intermediate code so that the ultimate object program runs faster and/or takes less space. The output is another intermediate code that does the same job as the original but perhaps in a way that saves time and/or space.

Code Generation: The last phase of compilation, which produces the object code by deciding on the memory locations for data, selecting code to access each datum, and selecting the registers in which each computation is to be done. Designing an effective code generator is a difficult part of compiler design, both practically and theoretically.

Table Management: Otherwise called bookkeeping, this keeps track of the names used by the program and records essential information about each, such as its type (integer, real, etc). The data structure used to record this information is called the symbol table.

Error Handler: All the phases are linked to the error handler, as an error may be encountered at any phase of compilation. Such errors include:
- Lexical error due to a misspelled token.
- Syntax error due to a sentence not in accordance with the programming language syntax.
- Intermediate code generator error due to a mismatch in the types of the operands.
- Code optimization error due to unreachable code.
- Code generation error due to a compiler-generated constant too big to fit into a computer word.
- Symbol table entry error due to multiple entries for an identifier.

Passes: Portions of one or more phases are combined into a module called a pass. A pass reads the source program or the output of the previous pass, makes the transformations specified by its phases, and writes its output into an intermediate file, which may then be read by a subsequent pass.
The number of passes and the grouping of phases depend on the structure of the language and also on the machine. At least two passes are needed for languages that allow a variable to be used before its declaration. A multi-pass compiler is preferred on a machine with little memory, but it is slower than a single-pass compiler, which needs a machine with more memory. This is because a multi-pass compiler reads and writes intermediate files.
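To make the lexical analysis phase concrete, the following is a minimal sketch of a scanner that groups characters into tokens. The token classes, the keyword set and the names KEYWORDS, TOKEN_SPEC and scan are assumptions chosen for illustration; they do not come from any particular compiler.

```python
import re

# Hypothetical token classes for a tiny illustrative language.
KEYWORDS = {"if", "then", "else", "goto"}
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),  # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Group the characters of `source` into (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # white space separates tokens but is not a token itself
        if kind == "ERROR":
            raise SyntaxError(f"lexical error: unexpected character {text!r}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers
        tokens.append((kind, text))
    return tokens


print(scan("if A + B2 then goto L1"))
```

The list of (kind, text) pairs is what a parser would consume next; a real scanner would typically also record line numbers for error reporting.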

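The intermediate code generation phase described above can be sketched as follows. This is an illustration under stated assumptions, not the method of any specific compiler: Python's standard ast module stands in for the parser, and the function name three_address and the temporary names t1, t2, ... are hypothetical.

```python
import ast

# Operators supported by this sketch, mapped to their printed form.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address(expression):
    """Return (instructions, result_name) for an arithmetic expression."""
    code = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = walk(node.left)
            right = walk(node.right)
            counter += 1
            temp = f"t{counter}"  # a fresh temporary, not a machine register
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("construct not supported by this sketch")

    tree = ast.parse(expression, mode="eval")  # stand-in for the parser phase
    result = walk(tree.body)
    return code, result


instructions, result = three_address("A + B * C")
for line in instructions:
    print(line)
```

Each emitted instruction has one operator and at most two operands, and names only temporaries, which matches the point that intermediate code does not commit to registers.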
The number of passes can be reduced by a technique called backpatching. During compilation, the output of some phases cannot be determined immediately because it depends on input that is read only later. To overcome this hurdle, the phase can generate output with slots that are filled in later, after more of the input has been read. E.g. a single-pass compiler uses backpatching to implement GOTO statements.

COMPILER WRITING TOOLS
A number of tools are used to construct a compiler; they are called compiler-compilers, compiler-generators or translator-writing systems. They produce a compiler from some form of specification of a source language and target machine. The input specification for these systems may contain:
- A description of the lexical and syntactic structure of the source language.
- A description of what output is to be generated for each source language construct.
- A description of the target machine.
There is a trade-off between how much of the work the compiler-compiler can do automatically for the user and the flexibility of the system. E.g. the rules for identifier names are usually the same for all compilers, but if the language is flexible enough to allow additional rules for naming identifiers, then much less can be done by the compiler-compiler.
The supports given by a compiler-compiler are as follows:
- Scanner generator: A scanner can be most easily designed by giving the corresponding regular expressions for the tokens.
- Parser generator: A description of the syntax of the language can be given to the compiler-compiler in the form of a context free grammar to obtain a parser. The parser is a distinct phase in compiler design, and a mechanically generated parser tends to be more reliable than one produced by hand.
- Compiler-construction toolkits: provide an integrated set of routines for constructing various phases of a compiler.
- Facilities for code generation: The mapping from the high-level language to assembly, intermediate or object code is provided to the compiler-compiler, so that the appropriate routine can be called at the correct time during generation. The specification also includes the decision table that selects the object code.
- Syntax-directed translation engines: produce collections of routines for walking a parse tree and generating intermediate code.
- Code-generator generators: produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.
- Data-flow analysis engines: facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.

BOOTSTRAPPING
Three languages characterize a compiler:
- The source language (X)
- The target language (Y)
- The language in which the compiler is written (Z)
A compiler is represented as $C_Z^{XY}$, where X, Y and Z are as above.
A compiler may also run on one machine and produce object code for a different machine. Such a compiler is called a cross-compiler.
Bootstrapping is the process of creating a new compiler for a different machine by making use of an existing compiler. E.g. suppose we have two machines A and B, and we are to create a compiler for machine B by making use of a compiler for machine A. In each step below, the first compiler is given as input to the second, which produces the third. Here S denotes a language for which a compiler $C_A^{SA}$ already exists on machine A.

$C_S^{LA} \rightarrow C_A^{SA} \rightarrow C_A^{LA}$ // Compiler for machine A written in A for language L.
$C_L^{LB} \rightarrow C_A^{LA} \rightarrow C_A^{LB}$ // Compiler for machine B written in A for language L.
$C_L^{LB} \rightarrow C_A^{LB} \rightarrow C_B^{LB}$ // Compiler for machine B written in B for language L.

The steps above follow a general principle, which can be written as:

$C_A^{XY} \rightarrow C_A^{AB} \rightarrow C_B^{XY}$

This is how we obtain a compiler for a new machine from an existing compiler for a different machine.
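The bootstrapping steps can be checked mechanically with a small model. The sketch below is only an illustration: the Compiler record and the translate function are hypothetical names, and each language or machine is represented by a one-letter string as in the notation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str      # X: the language being compiled
    target: str      # Y: the language produced
    written_in: str  # Z: the language the compiler itself is written in


def translate(program, using):
    """Compile the compiler `program` with the compiler `using`.

    `using` must accept the language `program` is written in; the result
    is the same compiler, now expressed in the target language of `using`.
    """
    if program.written_in != using.source:
        raise ValueError("the second compiler cannot read the first one")
    return Compiler(program.source, program.target, using.target)


# Step 1: L->A written in S, compiled by S->A on A, gives L->A on A.
native_a = translate(Compiler("L", "A", "S"), using=Compiler("S", "A", "A"))
# Step 2: L->B written in L, compiled by L->A on A, gives a cross-compiler on A.
cross = translate(Compiler("L", "B", "L"), using=native_a)
# Step 3: L->B written in L, compiled by the cross-compiler, gives L->B on B.
native_b = translate(Compiler("L", "B", "L"), using=cross)

print(native_a, cross, native_b, sep="\n")
```

The check inside translate captures the one rule the equations rely on: the language a compiler is written in must be the source language of the compiler used to build it.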
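Returning to the backpatching technique described under Passes, the following minimal sketch shows how a single pass can handle a GOTO whose label appears later in the program. The input format (lines of the form "goto NAME" and "NAME:"), the instruction tuples and the function name assemble are assumptions made for this illustration.

```python
def assemble(lines):
    """Translate a tiny program in one pass, backpatching forward jumps."""
    code = []     # emitted instructions; a jump target may be None at first
    labels = {}   # label name -> index of the instruction it marks
    pending = {}  # label name -> indices of jumps still waiting for it

    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            name = line[:-1]
            labels[name] = len(code)
            # Fill in the slots left open by earlier jumps to this label.
            for index in pending.pop(name, []):
                code[index] = ("JUMP", labels[name])
        elif line.startswith("goto "):
            name = line[5:].strip()
            if name in labels:
                code.append(("JUMP", labels[name]))
            else:
                # Target not seen yet: emit a slot and remember where it is.
                pending.setdefault(name, []).append(len(code))
                code.append(("JUMP", None))
        else:
            code.append(("OP", line))

    if pending:
        raise NameError(f"undefined labels: {sorted(pending)}")
    return code


print(assemble(["goto end", "x = 1", "end:", "y = 2"]))
```

Without the pending table, the forward jump would require a second pass over the program just to learn the address of the label.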