Compiler, Assembler, and Linker

Similar documents
Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

Linking and Loading. ICS312 - Spring 2010 Machine-Level and Systems Programming. Henri Casanova


Gechstudentszone.wordpress.com

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

UNIT I - INTRODUCTION

CS 4201 Compilers 2014/2015 Handout: Lab 1

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

COMP455: COMPILER AND LANGUAGE DESIGN. Dr. Alaa Aljanaby University of Nizwa Spring 2013

The Assembly Language Level. Chapter 7

Formal Languages and Compilers Lecture I: Introduction to Compilers


COMPILER DESIGN. For COMPUTER SCIENCE

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

SPOS MODEL ANSWER MAY 2018

COMPILER DESIGN LECTURE NOTES

CIT 595 Spring System Software: Programming Tools. Assembly Process Example: First Pass. Assembly Process Example: Second Pass.

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

Chapter 3 Loaders and Linkers

Chapter 3 Loaders and Linkers

Computer Hardware and System Software Concepts

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

A Simple Syntax-Directed Translator

SYSTEMS PROGRAMMING. Srimanta Pal. Associate Professor Indian Statistical Institute Kolkata OXFORD UNIVERSITY PRESS

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Introduction to Lexical Analysis

CS2304-SYSTEM SOFTWARE 2 MARK QUESTION & ANSWERS. UNIT I INTRODUCTION

Chapter 2. Assembler Design

Have examined process Creating program Have developed program Written in C Source code

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

CST-402(T): Language Processors

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

AC59/AT59 OPERATING SYSTEMS & SYSTEMS SOFTWARE DECEMBER 2012

The A ssembly Assembly Language Level Chapter 7 1

PSD1C SYSTEM SOFTWAE UNIT: I - V PSD1C SYSTEM SOFTWARE

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

P.G.TRB - COMPUTER SCIENCE. c) data processing language d) none of the above

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

SOFTWARE ARCHITECTURE 5. COMPILER

Overview of Compiler. A. Introduction

Life Cycle of Source Program - Compiler Design

Dixita Kagathara Page 1

Principles of Compiler Design

The Software Stack: From Assembly Language to Machine Code

Software II: Principles of Programming Languages

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Formats of Translated Programs

(Extract from the slides by Terrance E. Boult

4. An interpreter is a program that

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements

Compilers and Interpreters

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

Compiler Design (40-414)

SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A

Principles of Programming Languages COMP251: Syntax and Grammars

What do Compilers Produce?

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Introduction. Interpreters High-level language intermediate code which is interpreted. Translators. e.g.

MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION (Autonomous) (ISO/IEC Certified) SUMMER-16 EXAMINATION Model Answer

Principles of Compiler Design

MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION (Autonomous) (ISO/IEC Certified) WINTER-15 EXAMINATION Model Answer Paper

2) Save the Macro definition:- The processor must store the macro instruction definitions which it will need for expanding macro calls.

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Pioneering Compiler Design

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

LANGUAGE TRANSLATORS

Chapter 2 A Quick Tour

GUJARAT TECHNOLOGICAL UNIVERSITY

Advanced Programming & C++ Language

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Errors During Compilation and Execution Background Information

COMPILER DESIGN LEXICAL ANALYSIS, PARSING

CSCI341. Lecture 22, MIPS Programming: Directives, Linkers, Loaders, Memory

Lexical Analysis. Chapter 2

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

AS-2883 B.Sc.(Hon s)(fifth Semester) Examination,2013 Computer Science (PCSC-503) (System Software) [Time Allowed: Three Hours] [Maximum Marks : 30]

Crafting a Compiler with C (II) Compiler V. S. Interpreter

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

UNIT II ASSEMBLERS. Figure Assembler

CS131: Programming Languages and Compilers. Spring 2017

CSE2421 Systems1 Introduction to Low-Level Programming and Computer Organization


Chapter 3 Loaders and Linkers

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

UNIT III LOADERS AND LINKERS

e-pg Pathshala Subject : Computer Science Paper: Embedded System Module: Embedded Software Development Tools Module No: CS/ES/36 Quadrant 1 e-text

A simple syntax-directed

When do We Run a Compiler?

Compiler Design. Lecture 1

CMPT 379 Compilers. Anoop Sarkar.

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

UNIT 1: MACHINE ARCHITECTURE

A software view. Computer Systems. The Compilation system. How it works. 1. Preprocesser. 1. Preprocessor (cpp)

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Loaders. Systems Programming. Outline. Basic Loader Functions

Unit 2 -- Outline. Basic Assembler Functions Machine-dependent Assembler Features Machine-independent Assembler Features Assembler Design Options

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

Laboratorio di Programmazione. Prof. Marco Bertini

Transcription:

Compiler, Assembler, and Linker Minsoo Ryu Department of Computer Science and Engineering Hanyang University msryu@hanyang.ac.kr

What is a Compilation? Preprocessor Compiler Assembler Linker Loader Contents 2 2

What is a Compilation? A compilation is a translation process that reads a program written in one language--the source language--and translate it into an equivalent program in another language--the target language Source Program Compilation Target Program 3 3

The Complete Process Preprocessor file inclusion macro expansion Compiler Assembler source program (.c) lexical analysis syntactic analysis semantic analysis target assembly program (.s) intermediate code code optimization code generation translation and SYMTAB construction assembly into machine code Archiver relocatable machine code (.o) relocatable library (.a) Linker concatenation, reference resolving (relocation) machine code (a.out) Loader loading program into memory (optionally with relocation) 4 4

What is a Compilation? Preprocessor Compiler Assembler Linker Loader Contents 5 5

Preprocessor Inclusion Copies the full content of a included file into the current file at the point at which the directive occurs Eg.) #include directive in C Macros Replace each macro call, in-line, with the corresponding macro definition E.g.) #define directive in C Conditional compilation (include or macro guard) In combination with macros, remove code fragments that need not be compiled E.g) #ifdef, #ifndef, #endif directives in C Other compiler directives #pragma once (to ensure single inclusion) 6 6

Notes on #pragma The #pragma directives offer a way for each compiler to offer machine- and operating system-specific features while retaining overall compatibility with the C and C++ languages Pragmas are machine- or operating system-specific by definition, and are usually different for every compiler Each implementation of C and C++ supports some features unique to its host machine or operating system Some programs, for instance, need to exercise precise control over the memory areas where data is placed or to control the way certain functions receive parameters 7 7

What is a Compilation? Preprocessor Compiler Assembler Linker Loader Contents 8 8

The Complete Process Preprocessor file inclusion macro expansion Compiler Assembler source program (.c) lexical analysis syntactic analysis semantic analysis target assembly program (.s) intermediate code code optimization code generation translation and SYMTAB construction assembly into machine code Archiver relocatable machine code (.o) relocatable library (.a) Linker concatenation, reference resolving (relocation) machine code (a.out) Loader loading program into memory (optionally with relocation) 9 9

Compiler Analysis Lexical analysis Syntactic analysis Semantic analysis Synthesis Intermediate code generation Code optimization Code generation 10 10

Lexical Analysis Lexical analysis is also known as linear addressing or scanning Lexical analysis; Reads the source program from left-to-right Identifies the lexemes in the input list of characters and categorize them into tokens Lexeme ( 어휘 ) and lexicon ( 어휘집합 ) A token is a syntactic category Token = lexeme s type (+ its value) In English: noun, verb, adjective, In a programming language: Identifier, Integer, Keyword, 11 11

Lexical Analysis if (i == j) z = 0; else z = 1; read \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1; tokenize (Whitespace, \t ) (Keyword, if ) (OpenPar, ( ) (Identifier, i ) (Relation, == ) (Identifier, j ) 12 12

Syntactic Analysis Syntactic analysis is also known as parsing Syntactic analysis; Analyzes a sequence of tokens to determine grammatical structure with respect to a given formal grammar Checks for syntax errors at the same time If there exists no syntax error, the result is often represented by a parse tree 13 13

Syntactic Analysis Expr 3*5+8 parse Expr Op Expr Expr Op Expr + 8 3 * 5 14 14

Semantic Analysis Semantic analysis performs; Semantic checks for; Types (checking for type errors) Scopes (variables declared before use, ODR violations) Construct symbol tables Variable names and function names + information Others Object binding (associating variable and function references with their definitions) Definite assignment (requiring all local variables to be initialized before use) Rejecting incorrect programs or issuing warnings Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase 15 15

Semantic Analysis (Symbol Tables) 16 16

Semantic Analysis (Scope Checking) 17 17

Intermediate Code Generation and Code Optimization Intermediate code generation Generates an explicit inter-mediate representation of the source program Often emulates a virtual (RISC) processor that provides a simple instruction set architecture Code optimization Attempts to improve the intermediate code, so that faster-running machine code will result 18 18

Code generation Code Generation Generates target code, consisting normally of relocatable machine code or assembly code Memory locations are selected for each of the variables used by the program Then, intermediate instructions are each translated into a sequence of machine instructions that perform the same task A crucial aspect is the assignment of variables to registers 19 19

What is a Compilation? Preprocessor Compiler Assembler Linker Loader Contents 20 20

The Complete Process Preprocessor file inclusion macro expansion Compiler Assembler source program (.c) lexical analysis syntactic analysis semantic analysis target assembly program (.s) intermediate code code optimization code generation translation and SYMTAB construction assembly into machine code Archiver relocatable machine code (.o) relocatable library (.a) Linker concatenation, reference resolving (relocation) machine code (a.out) Loader loading program into memory (optionally with relocation) 21 21

Assembler; Assembler Converts mnemonic operation codes to their machine language codes Converts symbolic (e.g., jump labels, variable names) operands to their machine addresses Uses proper addressing modes and formats to build efficient machine instructions Translates data constants into internal machine representations Outputs the object program and provide other information (e.g., for linker and loader) 22 22

Assembler 23 23

Assembler Line ILC Source Statement Machine Code 24 24

Pass 1 Two Pass Assembler Assign addresses to all statements in the program Save the values (addresses) assigned to all labels (including label and variable names) for use in Pass 2 (deal with forward references) Perform some processing of assembler directives that can affect address assignment Pass 2 Assemble instructions (generate opcode and look up addresses) Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing 25 25

Two Pass Assembler Source program READ (Label, opcode, operand) Pass 1 Pass 2 Object codes OPTAB SYMTAB SYMTAB Mnemonic and opcode mappings are referenced from here Label and address mappings enter here Label and address mappings are referenced from here 26 26

Content Operation Code Table (OPTAB) The mapping between mnemonic and machine code Also include the instruction format, available addressing modes, and length information Characteristic Static table (the content will never change) Implementation Array or hash table because the content will never change, we can optimize its search speed In pass 1, OPTAB is used to look up and validate mnemonics in the source program In pass 2, OPTAB is used to translate mnemonics to machine instructions 27 27

Instruction Location Counter (ILC) This variable can help in the assignment of addresses It is initialized to the beginning address specified in the START statement After each source statement is processed, the length of the assembled instruction and data area to be generated is added to LOCCTR Thus, when we reach a label in the source program, the current value of LOCCTR gives the address to be associated with that label 28 28

Symbol Table (SYMTAB) Content Include the label name and value (address) for each label in the source program Include type and length information With flag to indicate errors (e.g., a symbol defined in two places) Characteristic Dynamic table (symbols may be inserted, deleted, or searched in the table) Implementation Hash table can be used to speed up search Because variable names may be very similar (e.g., LOOP1, LOOP2), the selected hash function must perform well with such non-random keys 29 29

Symbol Table (SYMTAB) Hash coding of symbol tables 30 30

Object Code An object code in the end contains the following information: A header that says where in the files the sections below are locates A (concatenated) text segment, which contains all the source code (with some missing addresses) A (concatenated) data segment (which may combine the data and the bss segments) Relocation Table: identifies lines of code that need to be fixed Symbol Table: list of this file s referenceable labels Functions and global variables Perhaps debugging information (is compiled with -g from a high-level programming language) Source code line numbers, etc. 31 31

Object File Format Object files are often called "binaries It is common for the same file format to be used both as linker input and output, and thus as the library and executable file format There are many different object file formats; COFF and ELF for UNIX COM for DOS (the simplest format) Portable Executable (PE) for Windows For executables, object code, and DLLs, used in 32-bit and 64-bit versions of Windows operating systems Used for EXE, DLL, OBJ, SYS 32 32

What is a Compilation? Preprocessor Compiler Assembler Linker Loader Contents 33 33

The Complete Process Preprocessor file inclusion macro expansion Compiler Assembler source program (.c) lexical analysis syntactic analysis semantic analysis target assembly program (.s) intermediate code code optimization code generation translation and SYMTAB construction assembly into machine code Archiver relocatable machine code (.o) relocatable library (.a) Linker concatenation, reference resolving (relocation) machine code (a.out) Loader loading program into memory (optionally with relocation) 34 34

Linker Linker is also known as linkage editor Three steps Step 1: concatenate all the text segments from all the.o files Step 2: concatenate all the data/bss segments from all the.o files Step 3: Resolve references Use the relocation tables and the symbol tables to compute all absolute addresses 35 35

Linker The linker knows The length of each text and data segment The order in which they are Therefore the linker can compute an absolute address for each label assuming the beginning of the executable file is at address 0x0 For each label being referenced (that is for each line of code that s pointed to by the relocation table), find where it is defined In the symbol table of a.o file In some specified or standard library file (e.g., fprintf) If not found, print a symbol not found error message and abort If found in multiple tables, print a multiply defined error message and abort If found in exactly one table, replace the label by an absolute address 36 36

Linker 37 37

Loader With a linked executable, all addresses are known so that the program can run To actually run the program we need to use a loader, which is part of the O/S The loader does the following: Read the executable file s header to find out the size of the text and data segments Creates a new address space for the program that is large enough to hold the text and data segments, and to hold the stack (within some bounds) Copies the text and data segments into the address space Copies arguments passed to the program on the stack Initializes the registers Jump to a standard start up routine, which sets the PC and calls the exit() system call when the program terminates 38 38