Compiler Design IIIT Kalyani, West Bengal 1. Introduction. Goutam Biswas. Lect 1

Similar documents
Binghamton University. CS-220 Spring X86 Debug. Computer Systems Section 3.11

Intermediate Representations

CS527 Software Security

Credits and Disclaimers

Draft. Chapter 1 Program Structure. 1.1 Introduction. 1.2 The 0s and the 1s. 1.3 Bits and Bytes. 1.4 Representation of Numbers in Memory

CS 33. Linkers. CS33 Intro to Computer Systems XXV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Chapter 2 A Quick Tour

Generation. representation to the machine

Introduction Presentation A

Compiler Drivers = GCC

CST-402(T): Language Processors

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CS 4201 Compilers 2014/2015 Handout: Lab 1

Compiling Regular Expressions COMP360

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

CSE2421 Systems1 Introduction to Low-Level Programming and Computer Organization

L14: Structs and Alignment. Structs and Alignment. CSE 351 Spring Instructor: Ruth Anderson

Princeton University Computer Science 217: Introduction to Programming Systems. Assembly Language: Function Calls

18-600: Recitation #4 Exploits

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

Compiler Code Generation COMP360

CSE 351 Midterm - Winter 2017

CS356: Discussion #7 Buffer Overflows. Marco Paolieri

Credits and Disclaimers

1 Number Representation(10 points)

COMPILER DESIGN. For COMPUTER SCIENCE

Code Generation: An Example

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

CS 261 Fall Machine and Assembly Code. Data Movement and Arithmetic. Mike Lam, Professor

Introduction. Compiler Design CSE Overview. 2 Syntax-Directed Translation. 3 Phases of Translation

CS377P Programming for Performance Leveraging the Compiler for Performance

Machine-level Programs Procedure

CSE P 501 Exam Sample Solution 12/1/11

CS 2505 Computer Organization I Test 2. Do not start the test until instructed to do so! printed

Building an Executable

CS356: Discussion #8 Buffer-Overflow Attacks. Marco Paolieri

CS429: Computer Organization and Architecture

Undergraduate Compilers in a Day

Machine Program: Procedure. Zhaoguo Wang

Assembly Language: Function Calls

Compiler Design Overview. Compiler Design 1

Machine-Level Programming III: Procedures

Assembly III: Procedures. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

+ Machine Level Programming: x86-64 History

Structs & Alignment. CSE 351 Autumn Instructor: Justin Hsia

The role of semantic analysis in a compiler

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

Overview of Compiler. A. Introduction

Lab 10: Introduction to x86 Assembly

The Compilation Process

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

SOFTWARE ARCHITECTURE 5. COMPILER

CSE 351 Midterm - Winter 2015 Solutions

Program Translation. text. text. binary. binary. C program (p1.c) Compiler (gcc -S) Asm code (p1.s) Assembler (gcc or as) Object code (p1.

Princeton University Computer Science 217: Introduction to Programming Systems. Machine Language

Running a C program Compilation Python and C Variables and types Data and addresses Functions Performance. John Edgar 2

Lexical Scanning COMP360

Machine/Assembler Language Putting It All Together

Summary: Direct Code Generation

CSE 351 Midterm - Winter 2015

Essentials for Scientific Computing: Source Code, Compilation and Libraries Day 8

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

C to Assembly SPEED LIMIT LECTURE Performance Engineering of Software Systems. I-Ting Angelina Lee. September 13, 2012


Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Compiler, Assembler, and Linker


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

Procedures and the Call Stack

Pioneering Compiler Design

CSE 401/M501 Compilers

Introduction to Compiler Construction

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

void P() {... y = Q(x); print(y); return; } ... int Q(int t) { int v[10];... return v[t]; } Computer Systems: A Programmer s Perspective

The Stack & Procedures

Software II: Principles of Programming Languages

Instruction Set Architectures

18-600: Recitation #4 Exploits (Attack Lab)

Buffer Overflow Attack (AskCypert CLaaS)

Executables & Arrays. CSE 351 Autumn Instructor: Justin Hsia

LECTURE NOTES ON COMPILER DESIGN P a g e 2

COMPILER DESIGN LECTURE NOTES

CS , Fall 2007 Exam 1

COP 3402 Systems Software. Lecture 4: Compilers. Interpreters

Introduction to Compiler Construction

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

Compiler Structure. Lexical. Scanning/ Screening. Analysis. Syntax. Parsing. Analysis. Semantic. Context Analysis. Analysis.

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

The Stack & Procedures

Compilers Crash Course

COMPILERS BASIC COMPILER FUNCTIONS

6.035 Project 3: Unoptimized Code Generation. Jason Ansel MIT - CSAIL

C and Programming Basics

CS165 Computer Security. Understanding low-level program execution Oct 1 st, 2015

Machine-Level Programming (2)

Group A Assignment 3(2)

Transcription:

Compiler Design IIIT Kalyani, West Bengal 1 Introduction

Compiler Design IIIT Kalyani, West Bengal 2 Programming a Computer High level language program Assembly language program Machine language program

Compiler Design IIIT Kalyani, West Bengal 3 A Compiler A compiler is a program that accepts a source language program as input. It translates the input program and produces a target language program as output. The semantics or the meaning of the source language program is preserved. An example of source language is C++ and the target language is assembly or machine language of a class of processors e.g. x86-64.

Compiler Design IIIT Kalyani, West Bengal 4 Compiling a Compiler A compiler is also a program written in some language and compiled. The implementation language of a compiler may or may not be same as its source language. If the source language and the implementation language are the same, we are essentially compiling a new version of the compiler. A process known as bootstrapping.

Compiler Design IIIT Kalyani, West Bengal 5 C Program #include <stdio.h> int main() // first0.c { printf("my first program\n"); return 0; }

Compiler Design IIIT Kalyani, West Bengal 6 Assembly Language Program $ cc -Wall -S first0.c first0.s.file "first0.c".section.rodata.lc0:.string "My first program".text.globl main.type main, @function main:.lfb0:.cfi_startproc pushq %rbp

Compiler Design IIIT Kalyani, West Bengal 7.cfi_def_cfa_offset 16.cfi_offset 6, -16 movq %rsp, %rbp.cfi_def_cfa_register 6 movl $.LC0, %edi call puts movl $0, %eax popq %rbp.cfi_def_cfa 7, 8 ret.cfi_endproc.lfe0:.size main,.-main.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"

Compiler Design IIIT Kalyani, West Bengal 8.section.note.GNU-stack,"",@progbits

Compiler Design IIIT Kalyani, West Bengal 9 Object File $ cc -c first0.s first0.o $ objdump -d first0.o less 0000000000000000 <main>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: bf 00 00 00 00 mov $0x0,%edi 9: e8 00 00 00 00 callq e <main+0xe> e: b8 00 00 00 00 mov $0x0,%eax 13: 5d pop %rbp 14: c3 retq

Compiler Design IIIT Kalyani, West Bengal 10 Executable File $ cc first0.o a.out $ objdump -d a.out less 00000000004004f4 <main>: 4004f4: 55 push %rbp 4004f5: 48 89 e5 mov %rsp,%rbp 4004f8: bf fc 05 40 00 mov $0x4005fc,%edi 4004fd: e8 ee fe ff ff callq 4003f0 <puts@plt> 400502: b8 00 00 00 00 mov $0x0,%eax 400507: 5d pop %rbp 400508: c3 retq a.out file contains more code than this.

Compiler Design IIIT Kalyani, West Bengal 11 Using Software Interrupt: x86-64 #include <asm/unistd.h> #include <syscall.h> #define STDOUT_FILENO 1.file "first.s".section.rodata L1:.string "My First program\n" L2:.text.globl _start

Compiler Design IIIT Kalyani, West Bengal 12 _start: movl $(SYS_write), %eax # eax <-- 1 (write) # parameters to write movq $(STDOUT_FILENO), %rdi # rdi <-- 1 (stdout) movq $L1, %rsi # rsi <-- starting # address of string movq $(L2-L1), %rdx # rdx <-- L2 - L1 # string length syscall movl $(SYS_exit), %eax # software interrupt # user process requesting # OS for service # eax <-- 60 (exit) # parameters to exit

Compiler Design IIIT Kalyani, West Bengal 13 movq $0, %rdi # rdi <-- 0 syscall # software interrupt ret # return

Compiler Design IIIT Kalyani, West Bengal 14 Preprocessor, assembler and Linker $ /lib/cpp first.s first.s $ as -o first.o first.s $ ld first.o $./a.out My first program

Compiler Design IIIT Kalyani, West Bengal 15 Description/Specification of a Language Description of a well-formed program - syntax of a language. Description of the meaning of different constructs and their composition as a whole program - semantics of a language.

Compiler Design IIIT Kalyani, West Bengal 16 Description of Syntax Syntax of a programming language is specified and verified in two stages. 1. Identification of the tokens (atoms of different syntactic categories) from the character stream of a program. 2. Correctness of the syntactic structure of the program from the stream of tokens.

Compiler Design IIIT Kalyani, West Bengal 17 Description of Syntax Formal language specifications e.g. regular expression, formal grammar, automaton etc. are used to specify the syntax of a language.

Compiler Design IIIT Kalyani, West Bengal 18 Description of Syntax Regular language specification is used to specify different syntactic categories. Restricted subclass of the context-free grammar e.g. LL(1), LALR(1), or LR(1) are used to specify the structure of a syntactically correct program.

Compiler Design IIIT Kalyani, West Bengal 19 Note There are structural features of a programming language that are not specified by the grammar rules for efficiency reason and are handled differently.

Compiler Design IIIT Kalyani, West Bengal 20 Description of Meaning: Semantics Informal or semi-formal description by natural language and mathematical notations. Formal descriptions e.g. grammar rule with attributes, different formal specifications of semantics.

Compiler Design IIIT Kalyani, West Bengal 21 Users of Specification Programmer - often uses an informal description of the language construct. Implementer of the language translator for a target machine or language. People who want to verify a piece of program or who want to automate program writing (synthesis).

Compiler Design IIIT Kalyani, West Bengal 22 Source and Target A source language is usually a high-level language. But there are different types of high level languages. Imperative languages like C, object oriented languages like Java, functional languages like Haskell, logic programming languages like Prolog, languages for parallel and distributed programming etc.

Compiler Design IIIT Kalyani, West Bengal 23 Source and Target The target languages also have a wide spectrum. Assembly and machine languages of different architecture, some high-level language e.g. C, languages of virtual machines etc. Variation in machine architecture puts different code improvement demand on a compiler.

Compiler Design IIIT Kalyani, West Bengal 24 Front-end and Back-end The part of the compiler that analyzes the structure of the source program, extracts the semantic information and produces an internal representation of it is known as the front-end. The part that uses the semantic representation and synthesis the semantically equivalent target program is called the back-end.

Compiler Design IIIT Kalyani, West Bengal 25 Basic Phases of Compilation Read the program text - this is the most time consuming part as it involves I/O. Preprocessing - a phase before the actual compilation. It may involve inclusion (reading) of several files. Lexical analysis - identification of the syntactic symbols of the language and collecting their attributes.

Compiler Design IIIT Kalyani, West Bengal 26 Basic Phases of Compilation Syntax checking and static semantic analysis - the stream of token is parsed to form the parse tree or syntax tree. The semantic information is collected from the context and the syntax tree is annotated.

Compiler Design IIIT Kalyani, West Bengal 27 Basic Phases of Compilation Intermediate code generation and code improvement - language specific constructs are translated to more general and simple constructs. As an example a long expressions is broken down to expressions with fixed number of parameters. Different optimizations are performed on this intermediate code.

Compiler Design IIIT Kalyani, West Bengal 28 Basic Phases of Compilation Symbolic target code generation and architecture specific code improvement - the intermediate representation is translated to target code in symbolic form, and architecture specific code optimization is performed. Low level machine code generation.

Compiler Design IIIT Kalyani, West Bengal 29 Source Program Compiler Symbol Table etc. Compiler Preprocessor Lexical Analyzer String of characters (source program) Token String Parser and Static Semantics Analysis Annotated Syntax Tree Intermediate Code Gen. Intermediate Code Code Generator Assembly Language Program Assembler Rel Obj Module Linker Relocatable object files or library Exec Machine Code

Compiler Design IIIT Kalyani, West Bengal 30 source program Lexical Analyzer Token String Parser and Static Semantics Analizer Intermediate Code Gen. Annotated Syntax Tree Code Improvement Target Code Gen. Intermediate Code Intermediate Code Low level Symb. Code Code Improvement Low level Symb. Code Machine Code Gen. Relocatable object module or runnable machine code

Compiler Design IIIT Kalyani, West Bengal 31 Compiler and Interpreter In an idealistic situation the front-end of a compiler does not know anything of the target language. Its job is to transform the source program to intermediate representation. Similarly, the back-end is not aware of the source language. It works on the intermediate representation to generate the target code.

Compiler Design IIIT Kalyani, West Bengal 32 Compiler and Interpreter If the intermediate representation of the source program and the input data to it is available, the compiler can perform the action on the data specified by the source program using the intermediate representation. There is no need to generate the target code This type of compiler is called an interpreter.

Compiler Design IIIT Kalyani, West Bengal 33 Scanner or Lexical Analyzer A scanner or lexical analyzer breaks the program text (string of ASCII characters) into the alphabet of the language (into syntactic categories) called a tokens. A token is a symbol of the alphabet of the language. It may be encoded as a number. A token may have one or more attributes.

Compiler Design IIIT Kalyani, West Bengal 34 An Example Consider the following C function. double CtoF(double cel) { return cel * 9 / 5.0 + 32 ; }

Compiler Design IIIT Kalyani, West Bengal 35 Scanner or Lexical Analyzer The scanner uses the finite automaton model to identify different tokens. Software are available that takes the specification of the tokens (elements of different syntactic categories) in the form of regular expressions and generates a program that works as the scanner. The process is completely automated.

Compiler Design IIIT Kalyani, West Bengal 36 Scanner or Lexical Analyzer A syntax analyzer does not differentiate between different identifiers or different integer constants. So they are identified as identifier token and integer token. But the actual values are preserved as attributes for use in subsequent phases.

Compiler Design IIIT Kalyani, West Bengal 37 Syntactic Category, Token and Attribute String Type Token Attribute double keyword 302 CtoF identifier 401 CtoF ( delimiter 40 double keyword 302 cel identifier 401 cel ) delimiter 41 { delimiter 123 return keyword 315

Compiler Design IIIT Kalyani, West Bengal 38 String Type Token Attribute cel identifier 401 cel * operator 42 9 int-numeral 504 9 / operator 47 5.0 double-numeral 507 5.0 + operator 43 32 int-numeral 504 32 ; delimiter 59 } delimiter 125

Compiler Design IIIT Kalyani, West Bengal 39 Parser or Syntax Analyzer A parser or syntax analyzer checks whether the token string generated by the scanner, forms a valid program. It uses a restricted class of context-free grammars to specify the language constructs.

Compiler Design IIIT Kalyani, West Bengal 40 Context-Free Grammar function-definition decl-spec decl comp-stat decl-spec type-spec type-spec double decl d-decl d-decl ident ident ( par-list ) par-list par-dcl par-dcl decl-spec decl

Compiler Design IIIT Kalyani, West Bengal 41 Expression Grammar E E +E E E E E E/E (E) E var float-cons int-cons

Compiler Design IIIT Kalyani, West Bengal 42 Parse Tree

Compiler Design IIIT Kalyani, West Bengal 43 ret statement <315> return E E <43> E <59> ; E <47> E + <504, 32> E <42> E / <507, 5.0> 32 * <401, "cel"> <504, 9> 5.0 cel 9

Compiler Design IIIT Kalyani, West Bengal 44 Abstract Syntax Tree The parse tree generated during syntax checking is not suitable a representation for further processing. A modified form, known as abstract syntax tree (AST) is created. Semantic information are stored in the nodes of AST (annotated AST) as attributes. It is used for intermediate code generation, error checking and code improvement.

Compiler Design IIIT Kalyani, West Bengal 45 Abstract Syntax Tree (AST) ret statement + / 32 * 5.0 cel 9

Compiler Design IIIT Kalyani, West Bengal 46 Annotated AST ret statement <float, symtab temp3> + <float, symtab temp2> / 32 <int, const 32> <float, symtab temp1> * 5.0 <float, const 5.0> <float, symtab b> cel 9 <int, const 9>

Compiler Design IIIT Kalyani, West Bengal 47 Symbol Table The compiler maintains an important data structure called the symbol table to store variety of names and their attributes it encounters. A few examples are - variables, named constants, function names, type names, labels etc.

Compiler Design IIIT Kalyani, West Bengal 48 Semantic Analysis The symbol table corresponding to the function CtoF should have an entry for the variable cel with its type and other information. The constant 9 is of type int. It is to be converted to 9.0 of type double before it can be multiplied with cel. Similar is the case for 32.

Compiler Design IIIT Kalyani, West Bengal 49 Semantic Analysis It is not enough to know that x = x + 5; is syntactically correct. The operation is meaningful only if the variable x is declared, it is a number, and not a string etc. Even if x is a number, the generated code will be different depending on whether it is an int, float or a pointer.

Compiler Design IIIT Kalyani, West Bengal 50 Intermediate Code Generation The language specific AST is translated into more general constructs known as intermediate code e.g. 3-address code. The form of the intermediate code should be suitable for code improvement and target code generation.

Compiler Design IIIT Kalyani, West Bengal 51 Intermediate Code Generation A while-loop in a C-like language may be replaced by test, and conditional and unconditional jumps. A compiler may use more than one intermediate representations for different phases.

Compiler Design IIIT Kalyani, West Bengal 52 Intermediate Code ret statement<return v5> <v4=(double) 32> <v5=v3 +_d v4> + <v3=v2 /_d 5.0> / 32 <v1=(double) 9> <v2=cel *_d v1> * 5.0 cel 9 *_d is multiplication for double

Compiler Design IIIT Kalyani, West Bengal 53 Intermediate Code param cel v1 = (double) 9 # compile time v2 = cel * d v1 v3 = v2/ d 5.0 v4 = (double)32 # compile time v5 = v3 + d v4 return v5

Compiler Design IIIT Kalyani, West Bengal 54 Note v1, v2, v3, v4, v5 are called virtual registers. Finally they will be mapped to actual registers or memory locations. The distinct names of the virtual registers help the compiler to improve the code.

Compiler Design IIIT Kalyani, West Bengal 55 GCC IC - GIMPLE $ cc -Wall -fdump-tree-gimple -S ctof.c CtoF (double cel) { double D.1796; } _1 = cel * 9.0e+0; _2 = _1 / 5.0e+0; D.1796 = _2 + 3.2e+1; return D.1796;

Compiler Design IIIT Kalyani, West Bengal 56 Raw GIMPLE Code $ cc -Wall -fdump-tree-gimple-raw -S ctof.c CtoF (double cel) gimple_bind < double D.1796; > gimple_assign <mult_expr, _1, cel, 9.0e+0, NULL> gimple_assign <rdiv_expr, _2, _1, 5.0e+0, NULL> gimple_assign <plus_expr, D.1796, _2, 3.2e+1, NULL> gimple_return <D.1796 NULL>

Compiler Design IIIT Kalyani, West Bengal 57 Intermediate Code Improvement Different code improvement transformations are performed on the intermediate code. A few examples are constant propagation, constant folding, strength reduction, copy propagation, elimination of common sub-expression,, code in-lining etc.

Compiler Design IIIT Kalyani, West Bengal 58 Target Code Generation Program variables and temporary variables are allocated to memory and registers. Translates the intermediate code to a target language code e.g. sequence of assembly language instructions of a machine.

Compiler Design IIIT Kalyani, West Bengal 59 Target Code Improvement The sequence of target code e.g. assembly language code of an architecture may be modified to improve the quality of code. It may replace a sequence of instructions by a better sequence to make the code faster on an architecture.

Compiler Design IIIT Kalyani, West Bengal 60 64-bit Intel Code.file "ctof.c".text.globl CtoF.type CtoF, @function CtoF:.LFB2: pushq %rbp # save old base pointer.lcfi0: movq %rsp, %rbp # rbp <-- rsp new base pointe.lcfi1: movsd %xmm0, -8(%rbp) # cel <-- xmm0 (parameter) movsd -8(%rbp), %xmm1 # xmm1 <-- cel

Compiler Design IIIT Kalyani, West Bengal 61 # addressing of read-only dat movsd.lc0(%rip), %xmm0 # xmm0 <--- 9.0, PC relative mulsd %xmm0, %xmm1 # xmm1 <-- cel*9.0 movsd.lc1(%rip), %xmm0 # xmm0 <-- 5.0 divsd %xmm0, %xmm1 # xmm1 <-- cel*9.0/5.0 movsd.lc2(%rip), %xmm0 # xmm0 <-- 32.0 addsd %xmm1, %xmm0 # xmm0 <-- cel*9.0/5/0+32.0 # return value in xmm0 leave ret.lfe2:.size CtoF,.-CtoF.section.rodata.align 8

Compiler Design IIIT Kalyani, West Bengal 62.LC0:.long 0.long 1075970048.align 8.LC1:.long 0.long 1075052544.align 8.LC2:.long 0.long 1077936128

Compiler Design IIIT Kalyani, West Bengal 63 9.0 in IEEE-754 Double Prec. 63 0 1000 0000 010 0010 0000 0000 0000 0000 31 0000 0000 0000 0000 0000 0000 0000 0000 Exponent Bias: 1023, Actual exponent: 1026 1023 = 3. 9.0 10 = 1001.0 2 = 1.001 2 3.

Compiler Design IIIT Kalyani, West Bengal 64 9.0 and.lc0 Interpreted as integer we have the higher order 32-bits as 2 30 +2 21 +2 17 = 1075970048 and the lower order 32-bits as 0. In the little-endian (lsb) data storage, lower bytes comes first.

Compiler Design IIIT Kalyani, West Bengal 65 9.0 and.lc0.lc0:.align 8.long 0.long 1075970048 is 9.0.

Compiler Design IIIT Kalyani, West Bengal 66 Improved Code $ cc -Wall -S -O2 ctof.c.file "ctof.c".text.p2align 4,,15.globl CtoF.type CtoF, @function CtoF:.LFB2: mulsd.lc0(%rip), %xmm0 divsd.lc1(%rip), %xmm0 addsd.lc2(%rip), %xmm0 ret.lfe2:

Compiler Design IIIT Kalyani, West Bengal 67.size CtoF,.-CtoF.section.rodata.cst8,"aM",@progbits,8.align 8.LC0:.long 0.long 1075970048.align 8.LC1:.long 0.long 1075052544.align 8.LC2:.long 0.long 1077936128