Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan


Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Intermediate Code Generation
Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → parse tree → Semantic Analysis → AST → Intermediate Code Generator → intermediate language → Code Optimizer → intermediate language (OIL) → Code Generator → target language
Error Handler and Symbol Table: used by all phases.

Intermediate Code Generation

Intermediate Code
- Similar terms: intermediate representation, intermediate language
- Ties the front and back ends together
- Language and machine neutral
- Many forms
- More than one intermediate language may be used by a compiler

Intermediate Representation
- There is no standard intermediate representation (IR).
- IR is a step in expressing a source program so that the machine understands it.
- As the translation takes place, the IR is repeatedly analyzed and transformed.
- Compiler users want analysis and translation to be fast and correct.
- Compiler writers want optimizations to be simple to write, easy to understand, and easy to extend.
- An IR should be simple and lightweight while allowing easy expression of optimizations and transformations.

Issues in New IR Design
- How machine dependent should the IR be?
- Expressiveness: how many languages are covered
- Appropriateness for code optimization
- Appropriateness for code generation
(Figure labels: front end, optimizer)

Intermediate Language Types
- Graphical IRs: abstract syntax trees, DAGs, control flow graphs, etc.
- Linear IRs: stack based (postfix), three-address code (quadruples), etc.

Graphical IRs
- Abstract syntax trees (AST): retain the essential structure of the parse tree, eliminating unneeded nodes.
- Directed acyclic graphs (DAG): a compacted AST that avoids duplication, with a smaller footprint as well.
- Control flow graphs (CFG): explicitly model control flow.

Abstract Syntax Tree/DAG
- Condensed form of the parse tree
- Useful for representing language constructs
- Depicts the natural hierarchical structure of the source program
  - Each internal node represents an operator
  - Children of a node represent its operands
  - Leaf nodes represent operands
- A DAG is more compact than an abstract syntax tree because common subexpressions are eliminated

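ASTs and DAGs
Example: a := b * -c + b * -c
AST: the root := has children a and +. The + node has two separate * subtrees, each with children b and a unary minus (uni) applied to c.
DAG: the root := has children a and +. Both operands of + are the same shared * node, whose children are b and a unary minus (uni) applied to c.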
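
One common way to obtain a DAG instead of a tree is to look up each (operator, children) combination in a table before creating a node, so that a repeated subexpression maps to the node already built. The sketch below illustrates that idea for the example above; the names DagBuilder, tree_size, and product are hypothetical and not taken from the slides.

```python
class DagBuilder:
    """Creates expression nodes, reusing a node when an identical one already exists."""

    def __init__(self):
        self.nodes = []   # node id -> (op, child ids)
        self.index = {}   # (op, child ids) -> node id

    def node(self, op, *children):
        key = (op, children)
        if key not in self.index:        # first occurrence of this subexpression
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]           # repeated occurrences share one node


def tree_size(builder, node_id):
    """Number of nodes the same expression would need as a plain tree (AST)."""
    _, children = builder.nodes[node_id]
    return 1 + sum(tree_size(builder, child) for child in children)


def product(builder):
    """Build b * -c, with 'uminus' standing for the unary minus."""
    negated_c = builder.node("uminus", builder.node("c"))
    return builder.node("*", builder.node("b"), negated_c)


# a := b * -c + b * -c
dag = DagBuilder()
total = dag.node("+", product(dag), product(dag))
root = dag.node(":=", dag.node("a"), total)

print(len(dag.nodes))        # 7 nodes in the DAG
print(tree_size(dag, root))  # 11 nodes in the equivalent AST
```

The second call to product creates nothing new, which is why the DAG needs 7 nodes where the tree needs 11.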

Linear IR
- Low-level IL used before final code generation
- A linear sequence of low-level instructions
- Resembles assembly code for an abstract machine
- Explicit conditional branches and goto jumps
- Reflects the instruction sets of the target machine
- Stack-machine code and three-address code
- Implemented as a collection (table or list) of tuples
- Linear IR examples
  - Stack based (postfix)
  - Three-address code (quadruples)

Stack based: Postfix notation
- Linearized representation of a syntax tree
- List of the nodes of the tree
- A node appears immediately after its children
- The postfix notation for an expression E is defined as follows:
  - If E is a variable or constant, then the postfix notation for E is E itself.
  - If E is an expression of the form E1 op E2, where op is a binary operator, then the postfix notation for E is E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2 respectively.
  - If E is an expression of the form (E1), then the postfix notation for E1 is also the postfix notation for E.

Stack based: Postfix notation
- No parentheses are needed in postfix notation because the position and arity of the operators permit only one decoding of a postfix expression.
- The postfix notation for a = b * -c + b * -c is a b c - * b c - * + =, where each - is the unary minus.
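
The recursive definition of postfix notation translates almost directly into code. The following is a minimal sketch, assuming expressions are given as nested tuples of the form (op, operand, ...) and that the unary minus is spelled uminus so it cannot be confused with a binary minus; both conventions are assumptions made for this illustration.

```python
def postfix(expr):
    """Return the postfix token list for a nested-tuple expression.

    expr is either a name or constant (a string) or a tuple (op, operand, ...).
    """
    if isinstance(expr, str):      # variable or constant: its postfix form is itself
        return [expr]
    op, *operands = expr
    tokens = []
    for operand in operands:       # postfix forms of the operands come first...
        tokens += postfix(operand)
    tokens.append(op)              # ...followed by the operator
    return tokens


product = ("*", "b", ("uminus", "c"))
assignment = ("=", "a", ("+", product, product))

print(" ".join(postfix(assignment)))
# a b c uminus * b c uminus * + =
```

Parenthesized subexpressions need no special case here because the tuple nesting already records the grouping.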

Stack-machine code
- Also called one-address code
- Assumes an operand stack
  - Operations take operands from the stack and push results back onto the stack
  - Needs special operations, such as swapping the two operands on top of the stack
- Compact in space, simple to generate and execute
  - Most operands do not need names
  - Results are transitory unless explicitly moved to memory
- Used as IR for Smalltalk and Java
Stack-machine code for x - 2 * y:
  push 2
  push y
  multiply
  push x
  subtract
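
To make the operand-stack model concrete, here is a small interpreter sketch for the five-instruction program above. The opcode spellings, the run function, and the memory dictionary are hypothetical. It also assumes that subtract computes "top minus next", which is the convention under which the sequence above yields x - 2 * y; with the opposite convention a swap would be needed before the subtract.

```python
def run(program, memory):
    """Execute a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "push":
            # Names are looked up in memory; anything else is pushed as a literal.
            stack.append(memory.get(operand, operand))
        elif opcode == "multiply":
            top, below = stack.pop(), stack.pop()
            stack.append(top * below)
        elif opcode == "subtract":
            # Assumed convention: result is top - next.
            top, below = stack.pop(), stack.pop()
            stack.append(top - below)
        elif opcode == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


program = [
    ("push", 2),
    ("push", "y"),
    ("multiply", None),
    ("push", "x"),
    ("subtract", None),
]

result = run(program, {"x": 10, "y": 3})
print(result)  # 4, i.e. 10 - 2 * 3
```

Note that no intermediate value is ever named: the product 2 * y exists only on the stack until subtract consumes it.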

Three address code
- A sequence of statements of the general form X := Y op Z, where
  - X, Y, and Z are names, constants, or compiler-generated temporaries
  - op stands for any operator, such as a fixed- or floating-point arithmetic operator or a logical operator

Three address code
- Only one operator is allowed on the right-hand side
- A source expression like x + y * z might be translated into
    t1 := y * z
    t2 := x + t1
  where t1 and t2 are compiler-generated temporary names
- Unraveling of complicated arithmetic expressions and of control flow makes three-address code desirable for code generation and optimization
- The use of names for intermediate values allows three-address code to be easily rearranged
- Three-address code is a linearized representation of a syntax tree in which explicit names correspond to the interior nodes of the graph
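
The translation of x + y * z can be sketched as a post-order walk that allocates one fresh temporary per interior node. This is only an illustration: the gen_tac function and the nested-tuple input format are assumptions, not something the slides prescribe.

```python
import itertools


def gen_tac(expr, code, temps):
    """Append three-address statements for expr to code.

    Returns the name (variable, constant, or temporary) that holds expr's value.
    """
    if isinstance(expr, str):            # name or constant: no code needed
        return expr
    op, left, right = expr               # binary operators only in this sketch
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    target = f"t{next(temps)}"           # fresh compiler-generated temporary
    code.append(f"{target} := {left_name} {op} {right_name}")
    return target


code = []
result = gen_tac(("+", "x", ("*", "y", "z")), code, itertools.count(1))

print("\n".join(code))
# t1 := y * z
# t2 := x + t1
```

Each emitted statement has exactly one operator on its right-hand side, and each temporary names one interior node of the tree.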

Three address instructions
- Assignment: x = y op z, x = op y, x = y
- Function: param x, call p,n, return y
- Jump: goto L, if x relop y goto L
- Indexed assignment: x = y[i], x[i] = y
- Pointer: x = &y, x = *y, *x = y

Three address code
Every instruction manipulates at most two operands and one result. Typical forms include:
- Arithmetic operations: x := y op z, x := op y
- Data movement: x := y[z], x[z] := y, x := y
- Control flow: if y op z goto x, goto x
- Function call: param x, return y, call foo
Example: three-address code for x - 2 * y
  t1 := 2
  t2 := y
  t3 := t1 * t2
  t4 := x
  t5 := t4 - t3

Storing three-address code
- Store all instructions in a quadruple table
  - Every instruction has four fields: op, arg1, arg2, result
  - The label of an instruction → the index of the instruction in the table
Example:
  Three-address code    Quadruple entries
                             op      arg1  arg2  result
  t1 := -c              (0)  Uminus  c           t1
  t2 := b * t1          (1)  Mult    b     t1    t2
  t3 := -c              (2)  Uminus  c           t3
  t4 := b * t3          (3)  Mult    b     t3    t4
  t5 := t2 + t4         (4)  Plus    t2    t4    t5
  a := t5               (5)  Assign  t5          a
- Alternative: store all the instructions in a singly/doubly linked list
  - What is the tradeoff?
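
A quadruple table maps naturally onto a list of four-field records, with the list index serving as the instruction's label. The sketch below stores the six quadruples from the example and runs them in order to check that they compute b * -c + b * -c; the Quad type, the lowercase opcode spellings, and the execute function are hypothetical names chosen for this illustration.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]   # None for unary operators and plain assignment
    result: str


# The table from the example; an instruction's label is its list index.
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("mult", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("mult", "b", "t3", "t4"),
    Quad("plus", "t2", "t4", "t5"),
    Quad("assign", "t5", None, "a"),
]


def execute(table, env):
    """Run the quadruples in order over a copy of env and return the final values."""
    values = dict(env)
    for quad in table:
        if quad.op == "uminus":
            values[quad.result] = -values[quad.arg1]
        elif quad.op == "mult":
            values[quad.result] = values[quad.arg1] * values[quad.arg2]
        elif quad.op == "plus":
            values[quad.result] = values[quad.arg1] + values[quad.arg2]
        elif quad.op == "assign":
            values[quad.result] = values[quad.arg1]
        else:
            raise ValueError(f"unknown op: {quad.op}")
    return values


final = execute(quads, {"b": 3, "c": 2})
print(final["a"])  # -12, i.e. 3 * -2 + 3 * -2
```

On the tradeoff question: an indexed table gives constant-time access by label, which suits jump targets, while a linked list makes it cheaper to insert, delete, or reorder instructions during optimization.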

Linear IR: Example
Example: linear IR for x - 2 * y
  Stack-machine code    Two-address code    Three-address code
  push 2                MOV 2 => t1         t1 := 2
  push y                MOV y => t2         t2 := y
  multiply              MULT t2 => t1       t3 := t1 * t2
  push x                MOV x => t4         t4 := x
  subtract              SUB t1 => t4        t5 := t4 - t3

Generating Intermediate Code
- As we parse, generate IC for the given input.
- Use attributes to pass information about temporary variables up the tree.

Example 1: a := b * -c + b * -c
Walking the syntax tree bottom-up, each node is labeled with the temporary that holds its value, and one instruction is emitted per step:
  t0 = b            (leaf b)
  t1 = -c           (unary minus applied to c)
  t1 = t1 * t0      (first * node)
  t0 = b            (leaf b in the second subtree)
  t2 = -c           (unary minus in the second subtree)
  t0 = t0 * t2      (second * node)
  t1 = t0 + t1      (+ node)
  a = t1            (assign node)

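Example 1: a := b * -c + b * -c (using the DAG, where the * node is shared)
  t0 = b            (leaf b)
  t1 = -c           (unary minus applied to c)
  t1 = t1 * t0      (shared * node)
  t0 = t1 + t1      (+ node, both operands are the shared * node)
  a = t0            (assign node)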

Expressions
Example 2: a := b + c + d + e
Grammar:
  S → id := E
  E → E + E
  E → id
Each number in the parse tree corresponds to a temporary variable. The leaves b, c, d, and e are nodes 1, 2, 4, and 6; node 3 is 1 + 2, node 5 is 3 + 4, and node 7 is 5 + 6, directly below S.
Generate:
  t1 = b
  t2 = c
  t3 = t1 + t2
  t4 = d
  t5 = t3 + t4
  t6 = e
  t7 = t5 + t6
  a = t7
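
The numbering in Example 2 can be reproduced by a translator that gives every E node a fresh temporary and emits one instruction per grammar rule. The sketch below is an illustration under stated assumptions: tokens are separated by spaces, only + is supported, and the ambiguous rule E → E + E is resolved left-associatively, which is the grouping the numbered tree implies. The translate function and its helpers are hypothetical names.

```python
import itertools


def translate(statement):
    """Translate 'id := id + id + ...' into a list of three-address statements."""
    target, assign, *rest = statement.split()
    if assign != ":=" or len(rest) % 2 == 0:
        raise ValueError("expected: id := id + id + ...")

    temps = itertools.count(1)
    code = []

    def new_temp():
        return f"t{next(temps)}"

    def leaf(name):
        # E -> id: copy the identifier into a fresh temporary.
        place = new_temp()
        code.append(f"{place} = {name}")
        return place

    place = leaf(rest[0])
    for plus, name in zip(rest[1::2], rest[2::2]):
        if plus != "+":
            raise ValueError(f"unexpected token: {plus}")
        # E -> E + E: the left operand is everything translated so far.
        right = leaf(name)
        result = new_temp()
        code.append(f"{result} = {place} + {right}")
        place = result

    # S -> id := E: store the temporary that holds the whole expression.
    code.append(f"{target} = {place}")
    return code


print("\n".join(translate("a := b + c + d + e")))
# t1 = b
# t2 = c
# t3 = t1 + t2
# t4 = d
# t5 = t3 + t4
# t6 = e
# t7 = t5 + t6
# a = t7
```

The variable place plays the role of the attribute passed up the tree: it always names the temporary that holds the value of the subexpression translated so far.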

Other representations
- SSA: Static Single Assignment
- RTL: Register Transfer Language
- Stack machines: P-code
- CFG: Control Flow Graph
- Dominator trees
- DJ-graph: dominator tree augmented with join edges
- PDG: Program Dependence Graph
- VDG: Value Dependence Graph
- GURRR: Global Unified Resource Requirement Representation; combines the PDG with resource requirements
- Java intermediate bytecodes
- The list goes on...