Group B Assignment 9. Code generation using DAG. Title of Assignment: Problem Definition: Code generation using DAG / labeled tree.

Similar documents
Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction

Group A Assignment 3(2)


Code Generation. M.B.Chandak Lecture notes on Language Processing

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

Question Bank. 10CS63:Compiler Design

VIVA QUESTIONS WITH ANSWERS

LECTURE NOTES ON COMPILER DESIGN P a g e 2

Compiler Code Generation COMP360

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

Principles of Compiler Design

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Compilers and Interpreters

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

tokens parser 1. postfix notation, 2. graphical representation (such as syntax tree or dag), 3. three address code

Figure 28.1 Position of the Code generator

Alternatives for semantic processing

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

Intermediate Code Generation

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

INSTITUTE OF AERONAUTICAL ENGINEERING (AUTONOMOUS)

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

What do Compilers Produce?

CST-402(T): Language Processors

Intermediate Representations

G Compiler Construction Lecture 12: Code Generation I. Mohamed Zahran (aka Z)

CSE 401/M501 Compilers

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Intermediate Representations

Crafting a Compiler with C (II) Compiler V. S. Interpreter

COMPILER DESIGN - CODE OPTIMIZATION

Outline. Lecture 17: Putting it all together. Example (input program) How to make the computer understand? Example (Output assembly code) Fall 2002

PRINCIPLES OF COMPILER DESIGN

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

F453 Module 7: Programming Techniques. 7.2: Methods for defining syntax

Compiler Design Aug 1996

Programming Language Processor Theory

SYLLABUS UNIT - I UNIT - II UNIT - III UNIT - IV CHAPTER - 1 : INTRODUCTION CHAPTER - 4 : SYNTAX AX-DIRECTED TRANSLATION TION CHAPTER - 7 : STORA

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

CS 406/534 Compiler Construction Putting It All Together

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

Intermediate representation

List of Figures. About the Authors. Acknowledgments

Intermediate Code & Local Optimizations

UNIT IV INTERMEDIATE CODE GENERATION

A programming language requires two major definitions A simple one pass compiler

Undergraduate Compilers in a Day

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Principles of Compiler Design

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING ACADEMIC YEAR / EVEN SEMESTER

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

Intermediate Code & Local Optimizations. Lecture 20

Introduction to Compilers

VALLIAMMAI ENGINEERING COLLEGE

CS 132 Compiler Construction

G.PULLAIH COLLEGE OF ENGINEERING & TECHNOLOGY

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Stating the obvious, people and computers do not speak the same language.

Course introduction. Advanced Compiler Construction Michel Schinz

Compiler Design (40-414)

CS606- compiler instruction Solved MCQS From Midterm Papers

Low-level optimization

Appendix A The DL Language

1. (a) What are the closure properties of Regular sets? Explain. (b) Briefly explain the logical phases of a compiler model. [8+8]

CSE 504: Compiler Design. Code Generation

CS5363 Final Review. cs5363 1

4. An interpreter is a program that

Trustworthy Compilers

COP 3402 Systems Software Syntax Analysis (Parser)

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)

WACC Report. Zeshan Amjad, Rohan Padmanabhan, Rohan Pritchard, & Edward Stow

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

CS415 Compilers. Intermediate Represeation & Code Generation

Compiler Optimization

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Compilers Crash Course

Object-oriented Compiler Construction

When do We Run a Compiler?

Overview of a Compiler

HOLY ANGEL UNIVERSITY COLLEGE OF INFORMATION AND COMMUNICATIONS TECHNOLOGY COMPILER THEORY COURSE SYLLABUS

The View from 35,000 Feet

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

CSc 453. Code Generation I Christian Collberg. Compiler Phases. Code Generation Issues. Slide Compilers and Systems Software.

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

LANGUAGE PROCESSORS. Introduction to Language processor:

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

VETRI VINAYAHA COLLEGE OF ENGINEERING AND TECHNOLOGY

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Introduction. CSc 453. Compilers and Systems Software. 19 : Code Generation I. Department of Computer Science University of Arizona.

UNIT-V. Symbol Table & Run-Time Environments Symbol Table

Transcription:

Group B Assignment 9 Att (2) Perm(3) Oral(5) Total(10) Sign Title of Assignment: Code generation using DAG. 9.1.1 Problem Definition: Code generation using DAG / labeled tree. 9.1.2 Perquisite: Lex, Yacc, Compiler Construction 9.1.3 Relevant Theory / Literature Survey: 9.1.3.1 Machine code generator:- 1. A final phase of the compiler is machine code generation. 2. Target program (or object code) is generated. 3. Usually target language is a machine code or an assembly code. The input to the code generator typically consists of a parse tree or an abstract syntax tree. The tree is converted into a linear sequence of instructions, usually in an intermediate language such as three address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a peephole optimization pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.) SNJB s Late Sau. KBJ College Of Engineering, Chandwad 1

Major tasks or issues in code generation In addition to the basic conversion from an intermediate representation into a linear sequence of machine instructions, a typical code generator tries to optimize the generated code in some way. The generator may try to use faster instructions, use fewer instructions, exploit available registers, and avoid redundant computations. Tasks which are typically part of a sophisticated compiler's "code generation" phase include: 1) Input to code generator 2) Target program 3) Memory management 4) Instruction selection 5) Register allocation 6) Choice of evaluation order 7) Approaches to code generation. Input to Code Generation Phase (or Input to Code Generator): We know that input to code generator is the output of Intermediate Code Generator. Now output of intermediate code generator phase is: 1) Intermediate Representation (Intermediate code): This intermediate code is generated by front end of compiler. Intermediate code produced by intermediate phase can be any of the following as Qudraples Triples 2) Information in Symbol Table: This information is used to determine runtime address of all data objects which are specified by their names in the input of code generation phase i.e. Intermediate code. Thus, before code generation phase, source program is processed by Lexical, Syntax, Semantic Analyzers, Intermediate code generator and may or may not be Code Optimizer. Because code optimization can be implemented before and/or after and/or during code generation phase of the compiler. Thus, while designing code generator it is assumed that i. Source program is already analyzed, parsed and some equivalent intermediate code is generated. ii. Type checking is already performed. SNJB s Late Sau. KBJ College Of Engineering, Chandwad 2

iii. Intermediate code has some additional type conversion functions. iv. All syntactic and semantic errors are detected. v. Intermediate code given as input to Code Generation phase is error free. Intermediate code generated by intermediate code generation phase can be either of a possible types of intermediate representations as i. Linear representation as postfix expressions. ii. Three address intermediate representation as quadruples. iii. Virtual machine representation as stack machine code. iv. Graphical intermediate representation as parse tree pr DAGS. Now code generator algorithms are different for different intermediate representation of source program i.e. code generator algorithms has to be designed according to intermediate representation generated by intermediate code generator. Instruction selection: which instructions to use. Instruction scheduling: in which order to put those instructions. Scheduling is a speed optimization that can have a critical effect on pipelined machines. Register allocation: the allocation of variables to processor registers. Debug data generation if required so the code can be debugged. Instruction selection is typically carried out by doing a recursive postorder traversal on the abstract syntax tree, matching particular tree configurations against templates; for example, the tree W := ADD(X,MUL(Y,Z)) might be transformed into a linear sequence of instructions by recursively generating the sequences for t1 := X and t2 := MUL(Y,Z), and then emitting the instruction ADD W, t1, t2. In a compiler that uses an intermediate language, there may be two instruction selection stages one to convert the parse tree into intermediate code, and a second phase much later to convert the intermediate code into instructions from the instruction set of the target machine. This second phase does not require a tree traversal; it can be done linearly, and typically involves a simple replacement of intermediate-language operations with their corresponding opcodes. However, if the compiler is actually a language translator (for example, one that converts Eiffel to C), then the second code -generation phase may involve building a tree from the linear intermediate code. Runtime code generation When code generation occurs at runtime, as in just-in-time compilation (JIT), it is important that the entire process be efficient with respect to space and time. For example, SNJB s Late Sau. KBJ College Of Engineering, Chandwad 3

when regular expressions are interpreted and used to generate code at runtime, a nondetermistic finite state machine is often generated instead of a deterministic one, because usually the former can be created more quickly and occupies less memory space than the latter. Despite its generally generating less efficient code, JIT code generation can take advantage of profiling information that is available only at runtime. The fundamental task of taking input in one language and producing output in a non-trivially different language can be understood in terms of the core transformational operations of formal language theory. Consequently, some techniques that were originally developed for use in compilers have come to be employed in other ways as well. For example, YACC (Yet Another Compiler Compiler) takes input in Backus -Naur form and converts it to a parser in C. Though it was originally created for automatic generation of a parser for a compiler, yacc is also often used to automate writing code that needs to be modified each time specifications are changed. Many integrated development environments (IDEs) support some form of automatic source code generation, often using algorithms in common with compiler code generators, although commonly less complicated. Reflection In general, a syntax and semantic analyzer tries to retrieve the structure of the program from the source code, while a code generator uses this structural information (e.g., data types) to produce code. In other words, the former adds information while the latter loses some of the information. One consequence of this information loss is that reflection becomes difficult or even impossible. To counter this problem, code generators often embed syntactic and semantic information in addition to the code necessary for execution. We will now see how the intermediate code is transformed into target object code (assembly code, in this case). Directed Acyclic Graph Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see the flow of values flowing among the basic blocks, and offers optimization too. DAG provides easy transformation on basic blocks. DAG can be understood here: Leaf nodes represent identifiers, names or constants. Interior nodes represent operators. Interior nodes also represent the results of expressions or the identifiers/name where the values are to be stored or assigned. SNJB s Late Sau. KBJ College Of Engineering, Chandwad 4

Example: t0 = a + b t1 = t0 + c d = t0 + t1 [t0 = a + b] [t1 = t0 + c] [d = t0 + t1] Peephole Optimization This optimization technique works locally on the source code to transform it into an optimized code. By locally, we mean a small portion of the code block at hand. These methods can be applied on intermediate codes as well as on target codes. A bunch of statements is analyzed and are checked for the following possible optimization: Redundant instruction elimination At source code level, the following can be done by the user: int add_ten(int x) int add_ten(int x) int add_ten(int x) int add_ten(int x) int y, z; int y; int y = 10; return x + 10; y = 10; y = 10; return x + y; z = x + y; y = x + y; return z; return y; At compilation level, the compiler searches for instructions redundant in nature. Multiple loading and storing of instructions may carry the same meaning even if some of them are removed. For example: MOV x, R0 MOV R0, R1 We can delete the first instruction and re-write the sentence as: MOV x, R1 SNJB s Late Sau. KBJ College Of Engineering, Chandwad 5

Unreachable code Unreachable code is a part of the program code that is never accessed because of programming constructs. Programmers may have accidently written a piece of code that can never be reached. Example: void add_ten(int x) return x + 10; printf( value of x is %d, x); In this code segment, the printf statement will never be executed as the program control returns back before it can execute, hence printf can be removed. Flow of control optimization There are instances in a code where the program control jumps back and forth without performing any significant task. These jumps can be removed. Consider the following chunk of code:... MOV R1, R2 GOTO L1... L1 : GOTO L2 L2 : INC R1 In this code,label L1 can be removed as it passes the control to L2. So instead of jumping to L1 and then to L2, the control can directly reach L2, as shown below:... MOV R1, R2 GOTO L2... L2 : INC R1 Algebraic expression simplification There are occasions where algebraic expressions can be made simple. For example, the expression a = a + 0 can be replaced by a itself and the expression a = a + 1 can simply be replaced by INC a. SNJB s Late Sau. KBJ College Of Engineering, Chandwad 6

Strength reduction There are operations that consume more time and space. Their strength can be reduced by replacing them with other operations that consume less time and space, but produce the same result. For example, x * 2 can be replaced by x << 1, which involves only one left shift. Though the output of a * a and a 2 is same, a 2 is much more efficient to implement. Accessing machine instructions The target machine can deploy more sophisticated instructions, which can have the capability to perform specific operations much efficiently. If the target code can accommodate those instructions directly, that will not only improve the quality of code, but also yield more efficient results. Code Generator A code generator is expected to have an understanding of the target machine s runtime environment and its instruction set. The code generator should take the following things into consideration to generate the code: Target language : The code generator has to be aware of the nature of the target language for which the code is to be transformed. That language may facilitate some machine-specific instructions to help the compiler generate the code in a more convenient way. The target machine can have either CISC or RISC processor architecture. IR Type : Intermediate representation has various forms. It can be in Abstract Syntax Tree (AST) structure, Reverse Polish Notation, or 3-address code. Selection of instruction : The code generator takes Intermediate Representation as input and converts (maps) it into target machine s instructi on set. One representation can have many ways (instructions) to convert it, so it becomes the responsibility of the code generator to choose the appropriate instructions wisely. Register allocation : A program has a number of values to be maintained during the execution. The target machine s architecture may not allow all of the values to be kept in the CPU memory or registers. Code generator decides what values to keep in the registers. Also, it decides the registers to be used to keep these values. Ordering of instructions : At last, the code generator decides the order in which the instruction will be executed. It creates schedules for instructions to execute them. SNJB s Late Sau. KBJ College Of Engineering, Chandwad 7

Descriptors The code generator has to track both the registers (for availability) and addresses (location of values) while generating the code. For both of them, the following two descriptors are used: Register descriptor : Register descriptor is used to inform the code generator about the availability of registers. Register descriptor keeps track of values stored in each register. Whenever a new register is required during code generation, this descriptor is consulted for register availability. Address descriptor : Values of the names (identifiers) used in the program might be stored at different locations while in execution. Address descriptors are used to keep track of memory locations where the values of identifiers are stored. These locations may include CPU registers, heaps, stacks, memory or a combination of the mentioned locations. Code generator keeps both the descriptor updated in real-time. For a load statement, LD R1, x, the code generator: updates the Register Descriptor R1 that has value of x and updates the Address Descriptor (x) to show that one instance of x is in R1. Code Generation Basic blocks comprise of a sequence of three-address instructions. Code generator takes these sequence of instructions as input. Note : If the value of a name is found at more than one place (register, cache, or memory), the register s value will be preferred over the cache and main memory. Likewise cache s value will be preferred over the main memory. Main memory is barely given any preference. getreg : Code generator uses getreg function to determine the status of available registers and the location of name values. getreg works as follows: If variable Y is already in register R, it uses that register. Else if some register R is available, it uses that register. Else if both the above options are not possible, it chooses a register that requires minimal number of load and store instructions. For an instruction x = y OP z, the code generator may perform the following actions. Let us assume that L is the location (preferably register) where the output of y OP z is to be saved: Call function getreg, to decide the location of L. SNJB s Late Sau. KBJ College Of Engineering, Chandwad 8

Determine the present location (register or memory) of y by consulting the Address Descriptor of y. If y is not presently in register L, then generate the following instruction to copy the value of y to L: MOV y, L where y represents the copied value of y. Determine the present location of z using the same method used in step 2 for y and generate the following instruction: OP z, L where z represents the copied value of z. Now L contains the value of y OP z, that is intended to be assigned to x. So, if L is a register, update its descriptor to indicate that it contains the value of x. Update the descriptor of x to indicate that it is stored at location L. If y and z has no further use, they can be given back to the system. Other code constructs like loops and conditional statements are transformed into assembly language in general assembly way. 9.1.4 Assignment Questions: 1. Define code generation. 2. Define stack allocation. 3. What is basic block? 4. How do you assign registers during code generation? 5. What are the forms of the output of the code generator? 6. What are the issues in the design of code generator? 7. Give the two standard storage allocation strategies. 8. What do you meant by address descriptors? 9. What do you meant by register descriptor? 10. Give the representation of intermediate language. SNJB s Late Sau. KBJ College Of Engineering, Chandwad 9