UNIT-V: Symbol Table & Run-Time Environments


Symbol Table

A symbol table is a data structure used by the compiler to keep track of the semantics of variables; that is, it stores scope and binding information about names. The symbol table is built during the lexical and syntax analysis phases and is used by later phases: the semantic analysis phase consults it to detect type conflicts, and the code generation phase consults it to find out how much run-time space is allocated and what kind of space it is.

Use of the symbol table
To achieve compile-time efficiency, the compiler makes use of the symbol table: it associates lexical names with their attributes. The items stored in a symbol table are:
- variable names
- constants
- procedure names
- literal constants and strings
- compiler-generated temporaries
- labels in the source program

The compiler uses the following kinds of information from the symbol table:
- data type
- name
- procedure declarations
- offset in storage
- in the case of a structure or record, a pointer to the structure table
- for parameters, whether passed by value or by reference
- number and type of arguments passed
- base address

Types of symbol table

Ordered symbol table
Here, the entries for variables are made in alphabetical order. An ordered symbol table can be searched using linear or binary search.
Advantages: searching for a particular variable is efficient, and relationships between variables can be established easily.
Disadvantage: insertion is costly when the table has a large number of entries.

Unordered symbol table
In this type of table, variable entries are not kept sorted. Each time before inserting a variable, a lookup is made to check whether it is already present in the symbol table; if not, an entry is made.
Advantage: insertion of a variable is easier.
Disadvantage: searching uses linear search. For larger tables the method becomes inefficient, because a lookup is made before every insertion.

How are names stored in the symbol table?
There are two ways to store names in the symbol table.
Fixed-length names: a fixed amount of space is allocated for every symbol in the table. Space is wasted in this scheme when the name of a variable is short.

For example:

    Name         Attribute
    CALCULATE    Float
    SUM          Float
    A            Int
    B            Int

Variable-length names: only the amount of space actually required by the string is used to store each name. A name is recorded by its starting index and its length in a shared character array. E.g.:

    Starting index   Length   Attribute
    0                10       Float
    10               4        Float
    14               2        Int
    16               2        Int

    Index:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
    Char:   c a l c u l a t e $ s  u  m  $  a  $  b  $
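The variable-length scheme can be sketched in C. The shared character pool and the '$' end marker follow the table above; the helper names and fixed array sizes are illustrative assumptions, not part of the notes.

```c
#include <string.h>

/* Shared character pool, e.g. "calculate$sum$a$b$". */
static char pool[64];
static int pool_used = 0;

/* One table entry: where the name starts in the pool and how long it is. */
struct entry { int start; int length; const char *attribute; };
static struct entry table[16];
static int n_entries = 0;

/* Append a name (plus a '$' end marker) to the pool and record it. */
void add_name(const char *name, const char *attribute) {
    int len = (int)strlen(name) + 1;            /* +1 for the '$' marker */
    memcpy(pool + pool_used, name, len - 1);
    pool[pool_used + len - 1] = '$';
    table[n_entries].start = pool_used;
    table[n_entries].length = len;
    table[n_entries].attribute = attribute;
    n_entries++;
    pool_used += len;
}

/* Look a name up by comparing it against each (start, length) slice. */
const char *lookup(const char *name) {
    int len = (int)strlen(name) + 1;
    for (int i = 0; i < n_entries; i++)
        if (table[i].length == len &&
            strncmp(pool + table[i].start, name, len - 1) == 0)
            return table[i].attribute;
    return 0;   /* not found */
}
```

After inserting calculate, sum, a and b, the pool matches the 18-character layout shown in the table above.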

Symbol table management

Symbol table management is required for:
- quick insertion of identifiers and their related information, and
- quick search of identifiers.

The commonly used data structures for symbol table construction are:
- list data structure
- self-organizing list / linked list
- binary tree
- hash tables

List data structure
A linear list is the simplest mechanism for implementing a symbol table. In this method an array is used to store names and their associated information, and new names are added in the order in which they arrive. A pointer, 'available', is maintained at the end of all stored records:

    Name 1 | Info 1 | Name 2 | Info 2 | Name 3 | Info 3 | ... | Name n | Info n

To retrieve information about a name we start from the beginning and search up to the available pointer.
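The list organization can be sketched in C as follows; the fixed record sizes and helper names are assumptions for illustration.

```c
#include <string.h>

#define MAX 32

/* Array of (name, info) records with an 'available' pointer at the end. */
struct record { char name[16]; char info[16]; };
static struct record list[MAX];
static int available = 0;   /* index one past the last stored record */

/* Search from the beginning up to the available pointer. */
int find(const char *name) {
    for (int i = 0; i < available; i++)
        if (strcmp(list[i].name, name) == 0)
            return i;
    return -1;   /* "use of undeclared name" if this was a reference */
}

/* Insert only if not already present; -1 signals a multiply defined name. */
int insert(const char *name, const char *info) {
    if (find(name) >= 0)
        return -1;                  /* multiply defined name */
    strcpy(list[available].name, name);
    strcpy(list[available].info, info);
    return available++;
}
```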

If we reach the available pointer without finding the name, we report an error: "use of undeclared name". While inserting a new name we must ensure that it is not already there; if it is, another error occurs: "multiply defined name". The advantage of this organization is that it takes a small amount of space.

Self-organizing list
This symbol table representation uses a linked list: a link field is added to each record, and records are searched in the order given by the link fields. A pointer, 'first', points to the first record of the symbol table. When a name is referenced or created, its record is moved to the front of the list, so the most frequently referenced names tend to stay near the front and their access time is the least. Insertion is also easy.

Binary trees
The symbol table is represented as a binary tree whose nodes have the form:

    Left child | Symbol | Information | Right child

The left child holds the address of the previous symbol and the right child the address of the next symbol. The symbol field stores the name of the symbol, and the information field stores all of its attributes. The structure is basically a binary search tree (BST), in which every node in the left subtree is less than the parent and every node in the right subtree is greater, so both insertion and searching are efficient.

Exercise: create a BST structure for the following:

    int m, n, p;

    int compute(int a, int b, int c)
    {
        int t = a + b + c;
        return t;
    }

    void main()
    {
        int k;
        k = compute(10, 20, 30);
    }

Hash tables
Hashing is an important technique for searching the records of a symbol table, and it is superior to the list organization. In a hashing scheme two tables are maintained: a hash table and a symbol table. The hash table consists of k entries, numbered 0 to k-1; these entries are pointers into the symbol table, pointing to the names stored there. To determine whether a name is in the symbol table, we use a hash function h such that h(name) yields an integer between 0 and k-1, and we search starting from that entry. The hash function should distribute names uniformly over the table, so that the number of collisions is minimal. The advantage of hashing is that quick search is possible; the disadvantages are that it is complicated to implement, extra space is required, and obtaining the scope of variables is difficult.
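The hashing scheme can be sketched in C; the particular hash function (sum of characters mod k) and the chaining of collisions through a next field are assumptions for illustration, not part of the notes.

```c
#include <string.h>

#define K 7     /* number of hash-table entries, 0 to k-1 */
#define MAX 32

/* Symbol table records, chained through 'next' on hash collisions. */
struct sym { char name[16]; int next; };
static struct sym symtab[MAX];
static int n_syms = 0;
static int hashtab[K];          /* -1 marks an empty chain */
static int hash_init_done = 0;

static void init(void) {
    if (!hash_init_done) {
        for (int i = 0; i < K; i++) hashtab[i] = -1;
        hash_init_done = 1;
    }
}

/* h(name): any integer between 0 and k-1. */
int h(const char *name) {
    unsigned sum = 0;
    for (const char *p = name; *p; p++) sum += (unsigned)*p;
    return (int)(sum % K);
}

int hash_insert(const char *name) {
    init();
    int i = h(name);
    strcpy(symtab[n_syms].name, name);
    symtab[n_syms].next = hashtab[i];   /* push onto this entry's chain */
    hashtab[i] = n_syms;
    return n_syms++;
}

int hash_find(const char *name) {
    init();
    for (int i = hashtab[h(name)]; i != -1; i = symtab[i].next)
        if (strcmp(symtab[i].name, name) == 0) return i;
    return -1;
}
```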

Runtime environment

The compiler demands a block of memory from the operating system; this memory is used for executing the compiled program and is called run-time storage. The run-time storage is subdivided to hold:
- the generated target code,
- data objects, and
- information that keeps track of procedure activations.

The size of the generated code is fixed, so the target code occupies a statically determined area of memory; the compiler places it at the lower end of the address space. The amount of memory required by data objects whose size is known at compile time is also fixed, so such objects can be placed in a static data area. To maximize the utilization of space at run time, the two remaining areas, the stack and the heap, grow toward each other from opposite ends of the remaining address space. The stack stores data structures called activation records, which are generated during procedure calls; the heap holds storage allocated to variables at run time. The sizes of the stack and the heap are not fixed: they may grow or shrink during program execution.

Storage Organization
The executing target program runs in its own logical address space, in which each program value has a location. The management and organization of this logical address space is shared between the compiler, the operating system and the target machine. The operating system maps logical addresses into physical addresses, which are usually spread throughout memory.

Run-time storage comes in blocks, where a byte is the smallest unit of addressable memory; four bytes form a machine word. Multi-byte objects are stored in consecutive bytes and given the address of their first byte. The storage layout for data objects is strongly influenced by the addressing constraints of the target machine. For example, a character array of length 10 needs only enough bytes to hold 10 characters, but a compiler may allocate 12 bytes to satisfy alignment, leaving 2 bytes unused; this unused space due to alignment considerations is referred to as padding. Program objects whose size is known at compile time can be placed in an area called static; the dynamic areas used to maximize the utilization of space at run time are the stack and the heap.

Storage organization strategies
There are three strategies based on the division of run-time storage:
- static allocation
- stack allocation
- heap allocation

Static Allocation
The sizes of the data objects are known at compile time, and the names of these objects are bound to storage at compile time; such allocation is called static allocation. The amount of storage does not change during run time, so at compile time the compiler can fill in the addresses at which the target code will find the data it operates on.

The main limitation is that recursive procedures are not supported by this type of allocation, because of its static nature.

Stack allocation
Here the storage is organized as a stack (LIFO), also called the control stack. As an activation begins, its activation record is pushed onto the stack; when the activation completes, the corresponding record is popped off. The local variables are stored in each activation record, so locals are bound to a fresh activation record on each fresh activation. Data structures can be created dynamically under stack allocation. Memory addressing is done using pointers or index registers, so stack allocation is slower than static allocation.

Heap allocation
If the values of non-local variables must be retained even after an activation ends, such retention is not possible with stack allocation because of its LIFO nature; heap allocation is used in these situations. Heap allocation provides a contiguous block of memory when required, for the storage of activation records or other data objects, and this memory can be deallocated when the activation ends. Heap management can be done by keeping a linked list of free blocks: when memory is deallocated, its block is appended to this list.
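The limitation of static allocation can be illustrated with a recursive function: stack allocation gives each call its own activation record, and hence its own copy of the locals, which static allocation cannot provide. A minimal sketch in C:

```c
/* Each call to fact gets a fresh activation record on the control stack,
 * so the parameter n of one activation does not disturb another's.
 * Under static allocation there would be only one shared n, and the
 * recursion below could not keep its intermediate values apart. */
int fact(int n) {
    if (n <= 1)
        return 1;            /* this activation's record is popped on return */
    return n * fact(n - 1);  /* pushes a new record with its own n */
}
```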

Activation record

An activation record is a block of memory used for managing the information needed by a single execution of a procedure. The contents of an activation record are:
- Temporaries: temporary values used during the evaluation of expressions.
- Local data: data that is local to this execution of the procedure.
- Saved machine status: the state of the machine just before the procedure is called, including the machine registers and the program counter.
- Control link (optional): points to the activation record of the calling procedure; also called the dynamic link.
- Access link (optional): points to data needed by the called procedure but found elsewhere, i.e. in another activation record.
- Actual parameters: the parameters passed in the call.
- Return value: stores the result of a function call.

Example: sketch of a quicksort program, and an activation of quicksort (figures omitted).

Figures (omitted): activation tree representing the calls during an execution of quicksort; downward-growing stack of activation records.

Block and non-block structured storage allocation

Storage allocation is done for two types of data: local data and non-local data. Local data can be handled using activation records, whereas non-local data is handled using scope information.

Access to Local Data
Local data is accessed with the help of the activation record: an offset relative to the base pointer of the activation record locates each local variable within the record. Hence:

    reference to a variable x in a procedure
        = base pointer of the procedure's activation record + offset of x

E.g. consider the following:

    procedure A
        int a;
        procedure B
            int b;
            body of B;
        body of A;

The contents of the stack, relative to the base pointer of an activation record, are:

    Return value
    Saved registers
    Parameters
    Locals (e.g. a, at some offset from the base pointer)

On the stack, the activation record for A is followed by the activation record for B, each with its own base pointer and offsets.

Access to non-local Data
A procedure may refer to variables that are not local to it; such variables are called non-local variables. For non-local names there are two kinds of scope rules: static and dynamic. The static scope rule is also called lexical scope: the scope is determined by examining the program text, and languages that use static scope rules are called block-structured languages. Dynamic scope rules determine the scope of a declaration at run time, by considering the current activations.

Static or lexical scope
A block is a sequence of statements containing local data declarations, enclosed within delimiters. Blocks can be nested. The scope of a declaration in a block-structured language is given by the most closely nested rule: a name refers to the declaration in the nearest enclosing block that declares it.

E.g.:

    scope_test()                /* block B1 */
    {
        int p, q;
        {                       /* block B2 */
            int p;
            {                   /* block B3 */
                int r;
            }
        }
        {                       /* block B4 */
            int q, s, t;
        }
    }

The storage for the names corresponding to each block can be laid out accordingly (figure omitted).

Lexical scope can be implemented using access links or displays.

Access link: lexical scope can be implemented by adding a pointer to each activation record; these pointers are called access links. If a procedure p is nested within a procedure q, then the access link of p points to the most recent activation record of q.

Display: it is expensive to traverse the access links every time a non-local variable is accessed. Access to non-locals can be sped up by maintaining an array of pointers called a display:
- an array of pointers to activation records is maintained,
- the array is indexed by nesting level, and
- each pointer points to the currently accessible activation record at that level.

The display changes when a new activation occurs, and it must be reset when control returns from that activation.

The advantage of using a display is that if p is executing and needs to access an element x belonging to some procedure q, we need to look only at display[i], where i is the nesting depth of q: we follow the pointer display[i] to the activation record for q, in which x is found at some known offset. The compiler knows what i is, so it can access display[i] directly; there is no need to follow a long chain of access links.

Heap Management

The heap is the portion of memory that holds data whose lifetime is not tied to a single procedure activation: any memory that is supposed to remain usable throughout the program is allocated in the heap, so managing the heap well is important. A piece of software called the memory manager handles the allocation and deallocation of heap memory.

Memory Manager: the two basic functions of the memory manager are:
- Allocation: when the program requests memory for a variable, the memory manager produces a chunk of heap memory of the requested size.
- Deallocation: the memory manager returns deallocated space to the pool of free space so that it can be reused.

Desired properties of a memory manager:
- Space efficiency: it should minimize the total heap space required by the program.
- Program efficiency: it should make good use of space so that the program runs faster.
- Low overhead: allocation and deallocation themselves should be efficient.

There are two types of memory allocation techniques:
- explicit allocation
- implicit allocation

Explicit allocation is done using constructs like new and dispose, whereas implicit allocation is done by the compiler using run-time support packages.

Explicit allocation of fixed-size blocks
This is the simplest technique of explicit allocation, in which every block for which memory is allocated has the same fixed size.

In this technique a free list is used, which is a linked list of free blocks; memory is allocated from this list. Allocation is done by unlinking a block from the free list, and deallocation by linking the block back into it.

Explicit allocation of variable-sized blocks
Due to frequent allocation and deallocation, heap memory becomes fragmented. For allocating variable-sized blocks we use strategies such as first fit, best fit and worst fit.

Implicit allocation
Implicit allocation is performed for the user program by run-time support packages. The run-time package needs to know when a storage block is no longer in use. Two techniques make this possible:

Reference count (RC): a special counter used during implicit allocation. Whenever a block is referred to by another block, its reference count is incremented by one; when the value of RC reaches 0, the block can be deallocated.

Marking techniques: an alternative approach to determining whether a block is in use. In this method the user program is suspended temporarily, the pointers are frozen, and they are followed to mark all the blocks that are still in use.

Parameter passing mechanisms

Types of parameters:
- Formal parameters: the parameters used in the function definition.
- Actual parameters: the parameters passed during a function call.

What are l-values and r-values? An r-value is the value of the expression that appears on the right side of an assignment operator; an l-value is the address of the memory location (or variable) that appears on the left

side of the assignment operator.

What are the parameter passing methods?
- Call by value / pass by value
- Call by address / pass by reference
- Pass by copy-restore
- Pass by name

Example: swapping of two numbers (using the C language).

Pass by Value: in the pass-by-value mechanism, the calling procedure passes the r-values of the actual parameters, and the compiler puts them into the called procedure's activation record. The formal parameters then hold the values passed by the calling procedure. If the values held by the formal parameters are changed, it has no impact on the actual parameters.

Pass by Reference: in the pass-by-reference mechanism, the l-value of each actual parameter is copied into the activation record of the called procedure. The called procedure thus has the address (memory location) of the actual parameter, and the formal parameter refers to the same memory location. Therefore, if the value pointed to by the formal parameter is changed, the effect is seen on the actual parameter, since both refer to the same location.

Pass by Copy-restore: this mechanism works similarly to pass-by-reference, except that changes to the actual parameters take effect only when the called procedure ends. On the call, the values of the actual parameters are copied into the activation record of the called procedure; manipulating the formal parameters has no immediate effect on the actual parameters, but when the called procedure ends, the final values of the formal parameters are copied back into the l-values of the actual parameters.
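The swap example mentioned above can be sketched in C; since C itself always passes by value, pass by reference is simulated here with pointers, and the function names are illustrative.

```c
/* Pass by value: the formals a and b are copies of the actuals,
 * so swapping them has no effect on the caller's variables. */
void swap_by_value(int a, int b) {
    int t = a; a = b; b = t;    /* swaps only the local copies */
}

/* Pass by reference (simulated with pointers): the l-values of the
 * actuals are passed, so the swap is visible to the caller. */
void swap_by_reference(int *a, int *b) {
    int t = *a; *a = *b; *b = t;
}
```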

Copy-restore example:

    int y;

    calling_procedure() {
        y = 10;
        copy_restore(y);   /* the value of y is passed */
        printf("%d", y);   /* prints 99 */
    }

    copy_restore(int x) {
        x = 99;            /* y still has the value 10 (unaffected) */
        y = 0;             /* y is now 0 */
    }

When copy_restore ends, the value of the formal parameter x is copied back into the actual parameter y. Even though the value of y is changed inside the procedure, the copy-back of x into y makes the mechanism behave like call by reference.

Pass by Name: languages like ALGOL provide a parameter passing mechanism that works like the preprocessor in the C language. Pass-by-name textually substitutes the argument expression of a procedure call for the corresponding formal parameter throughout the body of the procedure, so that the body now works on the actual parameters, much like pass-by-reference.

Garbage collection
The process of collecting memory that was previously allocated to variables or objects but is no longer needed, and pooling it for reuse, is called garbage collection. A few languages support automatic garbage collection; in other languages we

need to apply garbage collection techniques explicitly. The basic idea is to keep track of what memory is referenced, and to reclaim memory when it is no longer accessible (for example, nodes of a linked list that have become unreachable).

Reference-count garbage collectors
Garbage collection (GC) works as follows: when an application needs free space to allocate nodes and no free space is available, a system routine called the garbage collector is invoked. The routine searches the system for nodes that are no longer accessible from external pointers; these nodes are made available for reuse by adding them to the free pool.

A reference count is a special counter used during implicit memory allocation: whenever a block is referred to by another block, its reference count is incremented by one.
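The reference-count mechanism can be sketched in C as follows; the block layout and helper names are illustrative assumptions, not part of the notes.

```c
#include <stdlib.h>

/* A heap block carrying the reference-count (RC) field described above. */
struct block {
    int rc;      /* number of references currently pointing at this block */
    int data;
};

struct block *block_new(int data) {
    struct block *b = malloc(sizeof *b);
    b->rc = 1;           /* the creator holds the first reference */
    b->data = data;
    return b;
}

/* Another block now refers to b: increment its reference count. */
void block_ref(struct block *b) { b->rc++; }

/* Drop one reference; when RC reaches 0 the block is deallocated
 * (here, returned to the system; returns 1 when that happens). */
int block_unref(struct block *b) {
    if (--b->rc == 0) { free(b); return 1; }
    return 0;
}
```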

The block is deallocated as soon as its reference count becomes zero; garbage collectors of this kind are called reference-count garbage collectors.

Advantages of GC:
- Manual memory management by the programmer (using malloc, realloc, free) is time-consuming and error-prone; GC removes this burden.
- Reusability of memory is achieved automatically.

Disadvantages:
- The execution of the program is stopped or paused during garbage collection.
- Sometimes a situation called thrashing occurs.

CODE GENERATION

Introduction
The final phase of a compiler is the code generator. It receives an intermediate representation (IR) along with the information in the symbol table, and produces a semantically equivalent target program. The code generator's main tasks are:
- instruction selection
- register allocation and assignment
- instruction ordering

Issues in the Design of a Code Generator

The most important criterion is that the code generator produce correct code.

Input to the code generator: the IR plus the symbol table. We assume the front end produces a low-level IR, i.e. one in which the values of names can be directly manipulated by the target machine, and that syntactic and semantic errors have already been detected.

The target program: common target architectures are RISC, CISC and stack-based machines. In this chapter we use a very simple RISC-like computer, with the addition of some CISC-like addressing modes.

Instruction selection: the generated code must map the IR program into a code sequence that can be executed by the target machine. The complexity of this mapping depends on:
- the level of the IR (intermediate representation),
- the nature of the instruction-set architecture, and
- the desired quality of the generated code (speed and size).

Register allocation: there are two subproblems:
- register allocation: selecting the set of variables that will reside in registers at each point in the program, and
- register assignment: selecting the specific register in which each such variable resides.

Evaluation ordering: the order in which computations are performed can affect the efficiency of the target code, because some computation orders require fewer registers to hold intermediate results than others.

The Target Language
The target language provides the following operations:

- Load operations: LD r, x and LD r1, r2
- Store operations: ST x, r
- Computation operations: OP dst, src1, src2
- Unconditional jumps: BR L
- Conditional jumps: Bcond r, L, e.g. BLTZ r, L

The simple target machine model uses the following addressing modes:
- variable name: x
- indexed address: a(r); e.g. LD R1, a(R2) means R1 = contents(a + contents(R2))
- integer indexed by a register: e.g. LD R1, 100(R2)
- indirect addressing: *r means the memory location whose address is the contents of register r, and *100(r) means contents(contents(100 + contents(r)))
- immediate constant: e.g. LD R1, #100

The three-address statement x = y - z can be implemented by the machine instructions:

    LD  R1, y          // R1 = y
    LD  R2, z          // R2 = z
    SUB R1, R1, R2     // R1 = R1 - R2
    ST  x, R1          // x = R1

Suppose a is an array whose elements are 8-byte reals. Then b = a[i] becomes:

    LD  R1, i          // R1 = i
    MUL R1, R1, 8      // R1 = R1 * 8
    LD  R2, a(R1)      // R2 = contents(a + contents(R1))
    ST  b, R2          // b = R2

a[j] = c becomes:

    LD  R1, c          // R1 = c
    LD  R2, j          // R2 = j
    MUL R2, R2, 8      // R2 = R2 * 8
    ST  a(R2), R1      // contents(a + contents(R2)) = R1

x = *p becomes:

    LD R1, p           // R1 = p
    LD R2, 0(R1)       // R2 = contents(0 + contents(R1))
    ST x, R2           // x = R2

The conditional-jump three-address instruction "if x < y goto M" becomes:

    LD   R1, x         // R1 = x
    LD   R2, y         // R2 = y
    SUB  R1, R1, R2    // R1 = R1 - R2
    BLTZ R1, M         // if R1 < 0 jump to M

Basic blocks and flow graphs
A graph representation of intermediate code is helpful for code generation. We partition the intermediate code into basic blocks, which are sequences of three-address code with the properties that:
- the flow of control can enter the basic block only through the first instruction in the block, i.e. there are no jumps into the middle of the block, and
- control leaves the block without halting or branching, except possibly at the last instruction in the block.

The basic blocks become the nodes of a flow graph, whose edges indicate which blocks can follow which other blocks.

Rules for finding leaders:
- The first three-address instruction of the intermediate code is a leader.
- Any instruction that is the target of a conditional or unconditional jump is a leader.
- Any instruction that immediately follows a conditional or unconditional jump is a leader.

Example: intermediate code to set a 10 x 10 matrix to the identity (source code figure omitted).
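The three leader rules can be sketched as a routine in C. The encoding is an assumption for illustration: instructions are numbered 1..n, and each is described only by its jump target (0 for a non-jump instruction).

```c
/* For n three-address instructions numbered 1..n, jump_target[i] is the
 * instruction that a (conditional or unconditional) jump at i branches
 * to, or 0 if instruction i is not a jump. leader[i] is set to 1
 * according to the three rules: the first instruction, every jump
 * target, and every instruction immediately following a jump. */
void find_leaders(const int jump_target[], int n, int leader[]) {
    for (int i = 1; i <= n; i++) leader[i] = 0;
    leader[1] = 1;                              /* rule 1: first instruction */
    for (int i = 1; i <= n; i++) {
        if (jump_target[i] != 0) {
            leader[jump_target[i]] = 1;         /* rule 2: target of a jump */
            if (i + 1 <= n) leader[i + 1] = 1;  /* rule 3: follows a jump */
        }
    }
}
```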

The three-address code for the above program is shown in the corresponding figure.

Flow graph
Once basic blocks are constructed, the flow of control between blocks can be represented using edges. There is an edge from block B to block C if and only if it is possible for the first instruction in C to immediately follow the last instruction in B. Such an edge can be justified in two ways:
- there is a conditional jump from the end of B to the beginning of C, or
- C immediately follows B in the original order of the three-address instructions and B does not end in an unconditional jump.

Example: in the three-address code above, the leader instructions are 1, 2, 3, 10, 12 and 13 (the first instruction, plus the targets and followers of the branch instructions). Using these leaders we can construct six basic blocks, which are then connected by edges as shown below.

(Flow graph)

A simple code generator

One of the primary issues in code generation is deciding how to use registers to best advantage. The principal uses of registers are:

In most machine architectures, some or all of the operands of an operation must be in registers in order to perform the operation.
Registers make good temporaries: places to hold the result of a subexpression while a larger expression is being evaluated.
Registers are used to hold global values that are computed in one basic block and used in other blocks.
Registers are often used to help with run-time storage management, for example to manage the run-time stack, including the maintenance of stack pointers and possibly the top elements of the stack itself.

Descriptors

For each available register, a register descriptor keeps track of the variable names whose current value is in that register. Since we use only registers that are available for local use within a basic block, we assume that initially all register descriptors are empty. As code generation progresses, each register will hold the value of zero or more names.

For each program variable, an address descriptor keeps track of the location or locations where the current value of that variable can be found. The location might be a register, a memory address, a stack location, or some set of more than one of these. The information can be stored in the symbol-table entry for that variable name.

The code-generation algorithm

The algorithm uses a function getreg(I), which selects registers for each memory location associated with the three-address instruction I.

Use getreg(x = y + z) to select registers for x, y, and z; call these Rx, Ry and Rz.
If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry, y', where y' is one of the memory locations for y (according to the address descriptor for y).
Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
Issue the instruction ADD Rx, Ry, Rz.

Rules for updating the register and address descriptors

1. For the instruction LD R, x:
   Change the register descriptor for register R so it holds only x.
   Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory location.
3. For an operation such as ADD Rx, Ry, Rz implementing a three-address instruction x = y + z:
   Change the register descriptor for Rx so that it holds only x.
   Change the address descriptor for x so that its only location is Rx. Note

that the memory location for x is no longer in the address descriptor for x.
   Remove Rx from the address descriptor of any variable other than x.
4. When we process a copy statement x = y, after generating the load of y into register Ry if needed, and after managing descriptors as for all load statements (per rule 1):
   Add x to the register descriptor for Ry.
   Change the address descriptor for x so that its only location is Ry.

Example

Consider a basic block containing the following three-address code:

t = a - b
u = a - c
v = t + u
a = d
d = v + u

The instructions generated and the changes in the register and address descriptors are shown below.
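The descriptor bookkeeping for a single instruction x = y OP z can be sketched in Python. This is a minimal illustration under assumed names (reg_desc, addr_desc, three registers); it handles only the easy getreg cases and does not implement spilling.

```python
reg_desc = {"R1": set(), "R2": set(), "R3": set()}   # register -> names it holds
addr_desc = {}                                        # name -> set of locations

def get_reg(name):
    # prefer a register already holding the name, else any empty register
    for r, names in reg_desc.items():
        if name in names:
            return r
    for r, names in reg_desc.items():
        if not names:
            return r
    raise RuntimeError("no free register; spilling is not sketched here")

def gen(x, op, y, z, out):
    srcs = []
    for name in (y, z):
        r = get_reg(name)
        if name not in reg_desc[r]:                   # load only if not already in r
            out.append(f"LD {r}, {name}")
            reg_desc[r] = {name}                      # rule 1: r holds only name
            addr_desc.setdefault(name, {name}).add(r)
        srcs.append(r)
    rx = get_reg(x)
    out.append(f"{op} {rx}, {srcs[0]}, {srcs[1]}")
    reg_desc[rx] = {x}                                # rule 3: Rx holds only x
    addr_desc[x] = {rx}                               # x's only location is Rx
    for name, locs in addr_desc.items():
        if name != x:
            locs.discard(rx)                          # Rx no longer holds others

out = []
gen("t", "SUB", "a", "b", out)                        # first instruction, t = a - b
print(out)
```

After this call the register descriptor for R3 holds only t, and t's address descriptor lists only R3, exactly as rule 3 requires.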

Rules for picking register Ry for y

If y is currently in a register, pick a register already containing y as Ry. Do not issue a machine instruction to load this register, as none is needed.
If y is not in a register, but there is a register that is currently empty, pick one such register as Ry.
The difficult case occurs when y is not in a register and no register is currently empty. We must pick one of the allowable registers anyway and make it safe to reuse, typically by first storing its value back to memory if needed.

Peephole optimization

An alternative approach to code generation is to generate naive code and then improve the quality of the target code by applying optimizations. A simple but effective technique for locally improving the target code is peephole optimization. It examines a sliding window of target instructions (called the peephole) and replaces the instruction sequences within the peephole by a shorter or faster sequence whenever possible.
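A peephole pass can be sketched as a single sweep over the instruction list with a two-instruction window. The tuple encoding of instructions is an assumption for illustration; a real pass must also check that no label falls between the paired instructions.

```python
def peephole(code):
    """code: instruction tuples such as ('LD', 'R0', 'a'), ('ST', 'a', 'R0'),
    or ('ADD', dst, src1, src2). Returns a locally improved sequence."""
    out = []
    i = 0
    while i < len(code):
        cur = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        # redundant-instruction elimination: ST x, R followed by LD R, x
        # (the load is redundant because x is already in R)
        if nxt and cur[0] == "ST" and nxt == ("LD", cur[2], cur[1]):
            out.append(cur)
            i += 2
            continue
        # algebraic simplification: x = x + 0 is a no-op
        if cur[0] == "ADD" and cur[1] == cur[2] and cur[3] == "#0":
            i += 1
            continue
        out.append(cur)
        i += 1
    return out

code = [("ST", "a", "R0"), ("LD", "R0", "a"), ("ADD", "R1", "R1", "#0")]
print(peephole(code))
```

Both the redundant load and the useless addition are removed, leaving only the store.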

Characteristic peephole optimizations:

Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms

Redundant-instruction elimination

In the sequence

LD R0, a
ST a, R0

the store is redundant, since a is already in R0, and can be deleted (provided it carries no label).

Eliminating unreachable code

Example 1: the call to printf is dead, because sum is always 0 when the test is reached:

sum = 0;
if (sum) printf("%d", sum);

Example 2: the printf after the return can never execute:

int fun(int a, int b) {
    c = a + b;
    return c;
    printf("%d", c);    /* unreachable */
}

Flow-of-control optimizations

The jump sequence

goto L1
...
L1: goto L2

can be replaced by:

goto L2
...
L1: goto L2

Algebraic simplifications

There is no end to the amount of algebraic simplification that can be attempted through peephole optimization, but only a few algebraic identities occur frequently enough to be worth implementing. For example, statements such as

x := x + 0
or
x := x * 1

are often produced by straightforward intermediate code-generation algorithms, and they can be eliminated easily through peephole optimization.

Reduction in strength

Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators. For example:

x^2 is invariably cheaper to implement as x * x than as a call to an exponentiation routine.
Fixed-point multiplication or division by a power of two is cheaper to implement as a shift.
Floating-point division by a constant can be implemented as multiplication by a constant, which may be cheaper.

Use of machine idioms

The target machine may have single instructions that implement certain operation sequences efficiently; replacing general instruction sequences by such idioms improves efficiency. For example, some machines have auto-increment and auto-decrement addressing modes that perform an addition or subtraction as a side effect of a memory access.

Register allocation and assignment

Instructions involving register operands are shorter and faster than those involving operands in memory. The use of registers is subdivided into two subproblems:

Register allocation: the set of variables that will reside in registers at each point in the program is selected.
Register assignment: the specific register that each such variable will reside in is picked.

The following techniques are discussed:

Global register allocation
Usage counts
Register assignment for outer loops
Register allocation by graph coloring

Global register allocation

The previously explained algorithm does local (block-based) register allocation, which requires all live variables to be stored back to memory at the end of each block.

To save some of these stores and their corresponding loads, we can assign registers to frequently used variables and keep these registers consistent across block boundaries (globally). Some options are:

Keep the values of variables used in loops in registers.
Use a graph-coloring approach for more global allocation.

Usage counts

The usage count estimates how many units of cost are saved by keeping a variable x in a register throughout a loop L. The approximate formula for the savings is

sum over all blocks B in L of ( use(x, B) + 2 * live(x, B) )

where use(x, B) counts the uses of x in B before any definition of x (each saves one unit), and live(x, B) is 1 if x is live on exit from B and is assigned a value in B (saving two units: the store at the end of B and a later load), and 0 otherwise.

Example: here the usage counts of a, b, c, d, e and f are 4, 5, 3, 6, 4 and 4 respectively. Hence, if three registers (R1, R2, R3) are available, they are allocated to the
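The usage-count formula above can be sketched directly. The per-block summary (use counts, definedness, liveness on exit) is assumed input that liveness analysis would normally supply.

```python
def usage_count(x, loop_blocks):
    """Estimate the savings of keeping x in a register throughout the loop:
    sum over blocks B of use(x, B) + 2 * live(x, B)."""
    total = 0
    for b in loop_blocks:
        total += b["use"].get(x, 0)                 # uses before any definition
        if x in b["live_on_exit"] and x in b["defined"]:
            total += 2                              # saved end-of-block store + later load
    return total

# hypothetical two-block loop: x is used once and redefined in the first
# block (and live on exit), and used twice in the second block
blocks = [
    {"use": {"a": 1}, "live_on_exit": {"a"}, "defined": {"a"}},
    {"use": {"a": 2}, "live_on_exit": set(), "defined": set()},
]
print(usage_count("a", blocks))
```

The estimate is 1 + 2 from the first block plus 2 from the second, i.e. five units saved; the variables with the largest such totals are the ones worth a global register.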

variables d, b and a, because they have the highest usage counts.

(Figure: flow graph of the inner loop)

Register assignment for outer loops

Consider two loops L1 and L2, where L2 is nested inside L1, and suppose variable a is to be allocated to a register. The following criteria apply:

If a is allocated a register in loop L2, it need not be allocated one in L1 - L2.
If a is allocated in L1 and not in L2, then store a on entrance to L2 and load a on exit from L2.
If a is allocated in L2 and not in L1, then load a on entrance to L2 and store a on exit from L2.

Register allocation by graph coloring

When a register is needed but all registers are occupied, some register must be freed for reuse. Graph coloring is a systematic technique for making this choice. Two passes are used:

In the first pass, target-machine instructions are selected as though there were unlimited registers: each variable is assigned a symbolic register.
In the second pass, the register-interference graph is constructed: each node is a symbolic register, and an edge connects two nodes when one is live at a point where the other is defined. A graph-coloring algorithm then assigns physical registers so that no two interfering symbolic registers receive the same physical register.
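The second pass can be sketched with a simple greedy coloring of the interference graph. This is an illustration of the coloring idea only, not a full Chaitin-style allocator; the node names and the k-register limit are assumptions of the example.

```python
def color_graph(nodes, edges, k):
    """nodes: symbolic registers; edges: pairs that interfere; k: number of
    physical registers. Returns name -> register index, or None if this
    greedy order cannot color the graph with k registers (spill needed)."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    assignment = {}
    for n in nodes:
        used = {assignment[m] for m in neighbors[n] if m in assignment}
        free = [c for c in range(k) if c not in used]
        if not free:
            return None                 # no color left: a spill would be required
        assignment[n] = free[0]         # smallest register not used by a neighbor
    return assignment

# s1 interferes with s2, and s2 with s3, so s1 and s3 may share a register
regs = color_graph(["s1", "s2", "s3"], {("s1", "s2"), ("s2", "s3")}, 2)
print(regs)
```

Three symbolic registers fit into two physical registers here because s1 and s3 never interfere; with only one physical register the function returns None, signalling that some value must be spilled to memory.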