The So'ware Stack: From Assembly Language to Machine Code

Similar documents
The Software Stack: From Assembly Language to Machine Code

Loops and Locality. with an introduc-on to the memory hierarchy. COMP 506 Rice University Spring target code. source code OpJmizer

The Processor Memory Hierarchy

Naming in OOLs and Storage Layout Comp 412

Intermediate Representations

The ILOC Virtual Machine (Lab 1 Background Material) Comp 412

Introduction to OS Processes in Unix, Linux, and Windows MOS 2.1 Mahmoud El-Gayyar

Optimizer. Defining and preserving the meaning of the program

Intermediate Representations

Carnegie Mellon. Processes. Lecture 12, May 19 th Alexandre David. Credits to Randy Bryant & Dave O Hallaron from Carnegie Mellon

Just-In-Time Compilers & Runtime Optimizers

Instruction Selection: Preliminaries. Comp 412

Code Shape Comp 412 COMP 412 FALL Chapters 4, 5, 6 & 7 in EaC2e. source code. IR IR target. code. Front End Optimizer Back End

Handling Assignment Comp 412

OS lpr. www. nfsd gcc emacs ls 1/27/09. Process Management. CS 537 Lecture 3: Processes. Example OS in operation. Why Processes? Simplicity + Speed

by Marina Cholakyan, Hyduke Noshadi, Sepehr Sahba and Young Cha

CS415 Compilers Register Allocation. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Fall 2015 COMP Operating Systems. Lab #3

CSci 4061 Introduction to Operating Systems. Processes in C/Unix

Runtime Support for OOLs Object Records, Code Vectors, Inheritance Comp 412

Arrays and Functions

Procedure and Function Calls, Part II. Comp 412 COMP 412 FALL Chapter 6 in EaC2e. target code. source code Front End Optimizer Back End

Main Points. Address Transla+on Concept. Flexible Address Transla+on. Efficient Address Transla+on

CS415 Compilers. Intermediate Represeation & Code Generation

Machine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine

Local Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code

Operating System Structure

CSC 1600 Unix Processes. Goals of This Lecture

www nfsd emacs lpr Process Management CS 537 Lecture 4: Processes Example OS in operation Why Processes? Simplicity + Speed

Runtime Support for Algol-Like Languages Comp 412

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements

Lecture 4: Process Management

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Reading Assignment 4. n Chapter 4 Threads, due 2/7. 1/31/13 CSE325 - Processes 1

University of Washington What is a process?

Code Shape II Expressions & Assignment

OS lpr. www. nfsd gcc emacs ls 9/18/11. Process Management. CS 537 Lecture 4: Processes. The Process. Why Processes? Simplicity + Speed

Main Memory. CISC3595, Spring 2015 X. Zhang Fordham University

UNIT V: CENTRAL PROCESSING UNIT

Generating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.

Compiler, Assembler, and Linker

CS61 Scribe Notes Date: Topic: Fork, Advanced Virtual Memory. Scribes: Mitchel Cole Emily Lawton Jefferson Lee Wentao Xu

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Operating Systems. II. Processes

Computing Inside The Parser Syntax-Directed Translation. Comp 412

ECE 498 Linux Assembly Language Lecture 1

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Processes. Operating System CS 217. Supports virtual machines. Provides services: User Process. User Process. OS Kernel. Hardware

Fall 2017 :: CSE 306. Process Abstraction. Nima Honarmand

The Procedure Abstraction

Preview. Process Control. What is process? Process identifier The fork() System Call File Sharing Race Condition. COSC350 System Software, Fall

UNIVERSITY OF NEBRASKA AT OMAHA Computer Science 4500/8506 Operating Systems Fall Programming Assignment 1 (updated 9/16/2017)

CSE351 Inaugural Edi7on Spring

Building a Parser Part III

Instruction Selection, II Tree-pattern matching

PROCESS VIRTUAL MEMORY PART 2. CS124 Operating Systems Winter , Lecture 19

Process Management! Goals of this Lecture!

Getting to know you. Anatomy of a Process. Processes. Of Programs and Processes

CS 550 Operating Systems Spring Process II

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS415 Compilers. Procedure Abstractions. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Operating Systems CMPSC 473. Process Management January 29, Lecture 4 Instructor: Trent Jaeger

Intermediate Representations Part II

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015

What is a Process? Processes and Process Management Details for running a program

Operating systems and concurrency - B03

Parsing II Top-down parsing. Comp 412

Implementing Control Flow Constructs Comp 412

CS 111. Operating Systems Peter Reiher

PROCESS CONTROL: PROCESS CREATION: UNIT-VI PROCESS CONTROL III-II R

Protection and System Calls. Otto J. Anshus

CISC Processor Design

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

16 Sharing Main Memory Segmentation and Paging

CS 537 Lecture 2 - Processes

Physical memory vs. Logical memory Process address space Addresses assignment to processes Operating system tasks Hardware support CONCEPTS 3.

Linking and Loading. ICS312 - Spring 2010 Machine-Level and Systems Programming. Henri Casanova

OS Structure, Processes & Process Management. Don Porter Portions courtesy Emmett Witchel

The UtePC/Yalnix Memory System

COE518 Lecture Notes Week 2 (Sept. 12, 2011)

PROCESS CONTROL BLOCK TWO-STATE MODEL (CONT D)

The Kernel Abstraction. Chapter 2 OSPP Part I

Separate compilation. Topic 6: Runtime Environments p.1/21. CS 526 Topic 6: Runtime Environments The linkage convention

Operating Systems and Networks Assignment 2

Part II Processes and Threads Process Basics

What is a Process. Preview. What is a Process. What is a Process. Process Instruction Cycle. Process Instruction Cycle 3/14/2018.

Address spaces and memory management

Introduction to Optimization Local Value Numbering

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.

Process Management 1

Processes. CS 416: Operating Systems Design, Spring 2011 Department of Computer Science Rutgers University

Excep&onal Control Flow: Excep&ons and Processes

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 20

Syntax Analysis, III Comp 412

A process. the stack

OS Interaction and Processes

(In columns, of course.)

CS 111. Operating Systems Peter Reiher

Memory management. Requirements. Relocation: program loading. Terms. Relocation. Protection. Sharing. Logical organization. Physical organization

CS240: Programming in C

Transcription:

Taken from COMP 506 Rice University Spring 2017 The So'ware Stack: From Assembly Language to Machine Code source code IR Front End OpJmizer Back End IR target code Somewhere Out Here Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 506 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educajonal insjtujons may use these materials for nonprofit educajonal purposes, provided this copyright nojce is preserved

The Hardware Execu7on Model Computers execute individual opera7ons, encoded into binary form Basic cycle of execujon is: (fetch, decode, execute)* Fetch brings an operajon from memory into the processor s decode unit Decode breaks the operajon into its fields, based on the opcode, & sets up the operajon s execujon writes values to appropriate control registers Execute clocks the processor s funcjonal unit to carry out the computajon Each opera;on has a fixed format A processor will support several formats Opcode determines format People are not par7cularly good at reading & wri7ng binary opera7ons ProducJvity & error rate are be[er with higher level of abstracjon People need a tool to translate some symbolic representajon to binary We invented assemblers to perform that translajon opcode reg 1 reg 2 reg 2 or constant 0 10 15 20 31 op constant 0 1 31 COMP Bits in 412, the Supplemental instrucjon format Materials are a serious architectural limitajon 2

Symbolic Assembly Code The ILOC that we have seen in examples is a symbolic assembly code for a simplified fic7onal RISC processor ILOC Code: Symbolic names for operajons load, store, add, mult, jump, Labels for addresses @a, @b, @c,... Register names specified with integers r 1, r 2, r 3,... Of course, the processor would not recognize all these symbolic names. Real processors run from code expressed in binary form. load @b r 1 load @c r 2 mult r 1,r 2 r 3 load @d r 4 add r 3,r 4 r 5 store r 5 @a load @f r 6 add r 5,r 6 r 7 store r 7 @e Symbolic assembly code must be translated before it can execute. COMP 412, Supplemental Materials 3

Symbolic Assembly Code The ILOC that we have seen in examples is a symbolic assembly code for a simplified fic7onal RISC processor ILOC Code: Symbolic names for operajons load, store, add, mult, jump, Labels for addresses @a, @b, @c,... Register names specified with integers r 1, r 2, r 3,... ILOC has symbolic labels and virtual registers load @b r 1 load @c r 2 mult r 1,r 2 r 3 load @d r 4 add r 3,r 4 r 5 store r 5 @a load @f r 6 add r 5,r 6 r 7 store r 7 @e Machine code needs addresses and physical registers Compiler maps virtual registers to physical registers Job of the compiler s regsiter allocator Soaware stack (assembler, linker, loader) converts labels to virtual addresses COMP 412, Supplemental Materials 4

Symbolic Assembly Code assembly code Assembler object file In assembly code, the fields in an opera7on are alphanumeric symbols Opcodes are represented with mnemonics character strings add instead of 0110 Registers and constants are wri[en using base 10 or base 16 numbers r15 instead of 01111 To prepare an assembly program for execu7on, it must be translated Mnemonics map directly to opcodes bit pa[erns Symbolic labels map directly to addresses bit pa[erns Base 10 numbers convert to base 2 numbers bit pa[erns For the most part, these translajons are simple and direct COMP 412, Supplemental Materials 5

Building an Assembler assembly code Assembler object file What does an object file contain? A list of opera?ons First operajon (almost always) has an exported label Complete operajons are in final binary form OperaJons with a symbolic reference have a hole & an index into a symbol table A symbol table Each symbol defined in the module has a symbol name, a type, and an offset from the start of the module Each imported symbol has a symbol name, a storage class, and a list of operajons in the module that reference this symbol Storage class might be stajc or dynamic, local or global COMP 412, Supplemental Materials 6

Building an Assembler assembly code Assembler object file Assemblers convert assembly code to object code Typical assemblers operate in two passes Pass 1 builds up a symbol table that maps names to addresses Use a hash table to record the map Linear pass over the input to find symbols & record <name, address> pairs Pass 2 rewrites the operajons into binary form All symbols should be defined aaer pass 1 Linear pass over the input to rewrite operajons into binary form Symbolic references marked with their name and table entry At the end, the assembler writes out an object file COMP 412, Supplemental Materials 7

Building an Assembler The algorithm for Pass 1 is simple IniJalize a counter to zero IniJalize an empty map (a hash table 1 ) At each operajon, from the first to the last If the operajon is labeled, enter the label in the map with the current counter Enter any undefined labels used as operands in the map, with an invalid counter Compute the length of the current operajon Increment the counter by the operajon s length Iterate over the map If any symbol has an invalid counter, mark it as an external symbol or signal error Choice of ac;on depends on the rules of the par;cular assembly language Pass 1 should operate in O( operajons ) 7me Number of symbols is O( operajons ) (if hash lookup is O(1)) 1 COMP See discussion 412, Supplemental of hash Materials tables in Chapter 6 and Appendix B in EaC2e 8

Building an Assembler Pass 2 iterates over the input again and rewrites it Pass 1 resolved all the symbols Pass 2 constructs the object code For each opera7on Mnemonic becomes a bit pa[ern for the opcode Opcode determines instrucjon format Operands become bit fields The excepjon is a reference to an externally defined symbol Replace externally defined symbol with a reference into table of such symbols Format for that reference and table is a system-wide convenjon Write out the finished binary operajon At the end, write out the symbol table informa7on Again, it is all O( operajons ) (if hash lookup is O(1)) COMP 412, Supplemental Materials 9

Efficient Building an Assembler assembly code Assembler object code An assembler can do most of its transla7on in the first pass Can translate most operajons directly in pass 1 E.g., add r1, r2 => r3 contains all the informajon that it needs Excep7on arises from a reference to a symbol that is not yet defined When the assembler finds an (as yet) undefined label, it can: Enter that symbol into the symbol table and mark it as undefined Add the current operajon to a list of operajons containing symbolic references It can process any operajon with an (already) defined label At the end of pass 1, the assembler can: Traverse the list of unfinished instrucjons Translate them in place Aaer that half pass, it can output the object code COMP 412, Supplemental Materials 10

Assembler Pseudo-Opera7ons Almost all assemblers provide pseudo-opera7ons to manipulate layout Pseudo-operaJons define storage, define labels, inijalize space, and mark symbols as external To create space for a global array A, define storage for it Label that pseudo-operajon with a known label, say _@A The pseudo-operajons serves two funcjons It advances the assembler s internal counter for addresses (by its argument) It provides a place to hang the label, at the start of that block of space If we define _@A as an external symbol, other code will be able to access it The linker will Je the symbols and addresses together The linker will match them by name, so only one object file can define _@A The process of crea7ng unique labels from names is called mangling Procedure fee() might create something like _.fee_ StaJc fee() in file foe() might create something like _.foe.fee @A: dss 1024 _@B: bss 128 COMP 412, Supplemental Materials 11

So, What Happens With All Those Object Modules? The compiler produces a.asm file for each compila7on unit. The assembler produces a.o file for each.asm file. What happens next? A collecjon of.o files can be combined to form an executable file (a.out) The program that performs this task is called a linker The linker takes a collecjon of object files and libraries of object files It selects out the pieces that it needs to build a complete executable It lays out the executable, computes addresses for all external symbols, and rewrites the executable, replacing the external symbols with virtual addresses COMP 412, Supplemental Materials 13

Building a Linker 1 or more.o files Linker executable code A linker composes one or more object modules into an executable program Finds all of the entry points and labels in the object code Both exported and imported names Finds the main entry point Determines the spajal layout of the executable Resolves all imported names in an appropriate way Names that are stajcally linked Imported names must match an exported name Imported names must be rewricen with the exported name s address Names that are dynamically linked Imported names must be rewriced with a stub to load & link the name at run?me COMP 412, Supplemental Materials 14

Building a Linker 1 or more.o files Linker executable code How does a linker work? Pass 1: Build a map of all symbols (exported & imported) by the object modules and libraries, and find the the main entry point Pass 2: Add object modules unjl all stajc symbols are resolved StarJng with the module containing main: Add the module to the end of the code Assign addresses to its internal labels Pass 3: Rewrite the code with resolved symbols StaJc symbols are rewri[en with addresses Dynamic symbols are rewri[en with a jump to a stub that finds, loads, and links the needed object code Operates over the symbol tables Operates over the object code Operates over the object code Granularity of inclusion is an object file COMP Library 412, can Supplemental be one big file Materials or a collecjon of smaller modules 15

Building a Linker 1 or more.o files Linker executable code How does a linker work? Pass 1: Build a map of all symbols (exported & imported) by the object modules and libraries, and find the the main entry point Pass 2: Add object modules unjl all stajc symbols are resolved StarJng with the module containing main: Add the module to the end of the code Assign addresses to its internal labels Pass 3: Rewrite the code with resolved symbols StaJc symbols are rewri[en with addresses Dynamic symbols are rewri[en with a jump to a stub that finds, loads, and links the needed object code Does order ma[er? Actually, it does See Pers & Hansen 90 or SecJon 8.7.2 in EaC2e Granularity of inclusion is an object file COMP Library 412, can Supplemental be one big file Materials or a collecjon of smaller modules 16

What Happens at Run7me? How does an a.out execute? Some running process 1 (the parent ) starts the execujon In a shell, the user types the name of an executable at an input prompt The parent process creates (or spawns ) a new process The new process executes a.out and returns In a shell, the parent process waits for the child to complete In a parallel computajon, the parent may spawn many children and wait for them to complete, or it may simply conjnue without waijng Obviously, this high-level view obscures some of the detail... 1 The first running process is the boot process, created by a hardware COMP signal 412, and Supplemental the code in the Materials boot sequence. 17

Crea7ng a New Process (Unix/Linux model) To create a new process, the parent calls fork() fork() clones the parent process New child has its own copies of all data Includes file descriptors, program counter, New child has a disjnct process id (PID) Fork() returns child s PID to the parent and 0 to the child The parent waits on the child The child executes the new command pid_t vfork(void); pid_t process; process = vfork(); if (process < 0) fprinw(stderr, vfork() error.\n ); else if (process == 0) { execute new command } else { wait() } If the child only executes the new command It does not need the parent s full address space vfork() copies just enough of the address space to allow an exec() call It clones file descriptors, process status informajon, and the code vfork() is safe if the process immediately calls exec() COMP 412, Supplemental Materials 18

Replacing the Address Space (Unix/Linux model) To replace the child process s address space, it calls exec() 1 execl() takes a path to the executable file and a list of arguments exec() constructs the address space of the executable file If the file begins with #! interpreter, it runs the interpreter on the rest of the file If the file is an executable, it reads the file s header informajon and: Loads the program text, starjng at the specified address Loads any inijalized data areas, starjng at their specified addresses Zeros any pages that are so specified Branches to the start address of main(), as specified in the file s header The original address space overwri[en, except for parts of its environment File descriptors remain open (unless modified by fcntl()) The contents of the global environ remain intact Arguments passed through exec() are passed to main as argc, argv, and envp int main( int argc, char **argv, char **envp) { } 1 COMP execve() 412, is Supplemental the general Materials form; execl() is a front end to execve() 19

A Couple More Points The linker describes the address space The executable file is fully resolved, except for dynamically linked labels exec(), in effect, inflates the address space based on header info in the a.out It creates the code region of the address space, along with stajc data areas and global data areas main() must create the stack and the heap main() starts the language s run7me system Allocates and inijalizes space for the stack and the heap Takes care of other crijcal details The compiler inserts a call to the runjme inijalizajon code at the start of main() Special case for the main entry point In c, the compiler recognizes main() by name. In other languages, the compiler may recognize it by declarajon On exit, main() must shut down the environment Compiler inserts a call to the runjme finalizajon code at the end of main() COMP 412, Supplemental Materials 20

What Happened to _@A? We went off on this tale to learn how a compile-7me label becomes a physical address. What did happen to_ @A? Two cases: _@A was a label in the code The assembler converted _@A to an offset from the start of the object module _@A was a label on a pseudo-operajon The assembler converted _@A to an address of a data area (inijalized or not) Then, The linker computed _@A s virtual address as it laid out memory The linker replaced occurrences of _@A with its actual virtual address exec() built the address space and inijalized the memory at _@A The relevant instrucjons executed with the _@A s virtual address Load, store, branch, jump, call, etc. see a virtual address rather than _@A COMP 412, Supplemental Materials 21

What Happens to the Virtual Address? The hardware uses physical address (for the most part) At runjme, the operajng system and the hardware must translate the virtual address of _@A to a physical address The process requires cooperajon between the processor and the OS The process requires both hardware and soaware support Remember last lecture? COMP 412, Supplemental Materials 22

Cache Memory L3 L2 L1 Core FuncJonal unit FuncJonal unit Data Data & Code Data & Code Registers Code FuncJonal unit FuncJonal unit Typically shared among 2 cores TLB Modern hardware features mul7ple levels of cache & of TLB L1 is typically core-private & tagged with virtual addresses L2 (and beyond) is typically shared between cores & tagged with physical addresses TranslaJon from virtual address to physical address is assisted by the TLB & requires cooperajon between OS and the hardware Lookup in L1 and TLB proceed in parallel TLB can be as fast as L1 because it is just a cache with very short lines COMP 412, Supplemental Materials 23

What Happens to the Virtual Address? The hardware uses physical address (for the most part) At runjme, the operajng system and the hardware must translate the virtual address of _@A to a physical address The process requires cooperajon between the processor and the OS The process requires both hardware and soaware support Remember last lecture? The L1 caches (code & data) have virtual tags If the code is in the L1 cache, the virtual address works Everything above L1 uses physical addresses The virtual address in a load or jump gets translated by the cache/tlb system. Requires a fast mechanism to translate a virtual address to a physical address Hence, the creajon of the TLB to speed that translajon TLB caches page frame address (physical address modulo page size) by virtual address If the virtual address is not in the TLB, it requires an expensive lookup in the page tables and, perhaps a page fault (which changes the TLB and the virtual to physical mapping) COMP 412, Supplemental Materials 24

What Happened to _@A? The combina7on of the sokware stack and the memory management hardware ensure that _@A in the code is a usable address at run7me It is complex, but it works billions of 7mes a second COMP 412, Supplemental Materials 25