Introduction to Computer Architecture


ECE 154A Introduction to Computer Architecture, Fall 2013. Dmitri Strukov. Software Interface

Agenda
Procedures and the stack
Memory mapping
Arrays vs. linked lists
Memory management
Program compilation, linking, loading, and execution

Big Idea
Architecture should be convenient for programmers: HW support for programming-language constructs, debugging, security, etc.

Why Are Subroutines (Procedures) Important?
Better structure: fewer bugs, i.e., faster and cheaper development.
More compact code: again fewer bugs. This was very important when memory was limited, e.g., in the early days.
Even for today's computers, compact code typically leads to better performance: fewer misses in the memory hierarchy. It can also have negative effects, though, if the call/return overhead (i.e., control instructions) is significant.

Implementing Subroutines
A call can be implemented with existing instructions:

        j proc      # call
cont:   xxx
        ...
proc:   xxx
        ...
        j cont      # return; hard-wired, works for this one call site only

But what if the procedure is written by somebody else and is already compiled (e.g., a library)? Patching binaries is still doable, but procedures are so frequent that MIPS has special instructions to support them: JAL and JR.

Instructions for Accessing Procedures
MIPS procedure call instruction:
    jal ProcedureAddress    # jump and link
jal saves PC+4 in register $ra, giving the procedure a link to the next instruction for its return. Machine format (J format): opcode 0x03, 26-bit address.
The procedure can then return with:
    jr $ra                  # return
Machine format (R format): opcode 0, function 0x08.

Illustrating a Procedure Call
main prepares the call and executes jal proc; the procedure saves registers etc., does its work, restores them, and ends with jr $ra; main then prepares to continue at PC+4. (Figure: relationship between the main program and a procedure.)

More Issues with Procedures
Q1: How do we pass data to a procedure, and how do we return data from it?
We would like the procedure (callee) to use as many registers as possible, to better exploit temporal locality, but some registers may already be in use by the caller.
Solution: spill registers (move register-file contents to main memory, then restore them). What is the exact mechanism for that? In particular:
Q2: Which registers are spilled?
Q3: Who is responsible for saving them (callee vs. caller)?
Q4: Where are they spilled to?
Solution: certain rules, enforced in software, make such an implementation work.

Typical Use of Registers (Answer to Q1)
$0       $zero    constant 0
$1       $at      reserved for assembler use
$2-$3    $v0-$v1  procedure results
$4-$7    $a0-$a3  procedure arguments
$8-$15   $t0-$t7  temporary values
$16-$23  $s0-$s7  saved across procedure calls (operands)
$24-$25  $t8-$t9  more temporaries
$26-$27  $k0-$k1  reserved for OS (kernel)
$28      $gp      global pointer
$29      $sp      stack pointer
$30      $fp      frame pointer
$31      $ra      return address
In principle, one can use registers freely without sticking to these guidelines (one exception: in MIPS, the kernel registers might be rewritten by hardware on special occasions, i.e., exceptions, so it is better not to use them). However, if the program is supposed to run together with others (e.g., under an OS, or if it uses registers or subroutines written by other people), then it is a good idea to stick to these rules.
(Byte ordering side note: a 4-byte word sits in consecutive memory addresses; in big-endian order the most significant byte has the lowest address, bytes numbered 3 2 1 0. When a byte is loaded into a register, it goes in the low end. A doubleword sits in consecutive memory locations, most significant word first in big-endian order.)

A Simple MIPS Procedure
Procedure to find the absolute value of an integer: $v0 = |($a0)|.
Solution: the absolute value of x is -x if x < 0, and x otherwise.

abs:  sub  $v0,$zero,$a0   # put -($a0) in $v0, in case ($a0) < 0
      bltz $a0,done        # if ($a0) < 0 then done
      add  $v0,$a0,$zero   # else put ($a0) in $v0
done: jr   $ra             # return to calling program

In practice, we seldom use such short procedures because of the overhead they entail: in this example, we have 3-4 instructions of overhead for 3 instructions of useful computation. No register spilling here; see the next example.

Typical Use of Registers (Answer to Q2)
(The register-usage table above is repeated here as the answer to Q2: the conventions determine which registers must be spilled, e.g., the $s registers are saved across procedure calls while the $t registers need not be.)

Six Steps in Execution of a Procedure (Answer to Q3)
1. The main routine (caller) places parameters where the procedure (callee) can access them: $a0-$a3, four argument registers.
2. The caller transfers control to the callee.
3. The callee acquires the storage resources it needs.
4. The callee performs the desired task.
5. The callee places the result value where the caller can access it: $v0-$v1, two value registers for result values.
6. The callee returns control to the caller: $ra, one return-address register to return to the point of origin.


Spilling Registers (Answer to Q4)
What if the callee needs to use more registers than those allocated to arguments and return values? The callee uses a stack: a last-in, first-out structure.
One of the general registers, $sp ($29), addresses the top of the stack, which grows from high addresses toward low addresses.
To push data onto the stack: $sp = $sp - 4, then store the data at the new $sp.
To pop data from the stack: load the data at $sp, then $sp = $sp + 4.

Allocating Space on the Stack
The segment of the stack containing a procedure's saved registers and local variables is its procedure frame (aka activation record). From high addresses down toward $sp it holds: saved argument registers (if any), the saved return address, saved local registers (if any), and local arrays & structures (if any).
The frame pointer ($fp) points to the first word of the frame of a procedure, providing a stable base register for the procedure. $fp is initialized using $sp on a call, and $sp is restored using $fp on a return.

Example: Parameters and Results
Before calling: the arguments c, b, a sit at the bottom of the frame for the current procedure, with $sp below them and $fp above. After calling: a new frame holds the callee's local variables, saved registers, and the old ($fp), below the previous frame (the stack grows toward low addresses), and the results z, y are passed back through the frame. (Figure: use of the stack by a procedure.)

More on Procedures
Prolog: spill to the stack all registers used by the procedure, except for $t0-$t9 and the registers used for returning values; advance the stack pointer ($sp) first, then write to the stack.
Body: the code of the procedure.
Epilog: restore all used registers; adjust the stack pointer ($sp) at the end.

Example of Using the Stack
Saving $fp, $ra, and $s0 onto the stack and restoring them at the end of the procedure:

proc: sw   $fp,-4($sp)    # save the old frame pointer
      addi $fp,$sp,0      # save ($sp) into $fp
      addi $sp,$sp,-12    # create 3 spaces on top of stack
      sw   $ra,-8($fp)    # save ($ra) in 2nd stack element
      sw   $s0,-12($fp)   # save ($s0) in top stack element
      ...
      lw   $s0,-12($fp)   # put top stack element back in $s0
      lw   $ra,-8($fp)    # put 2nd stack element back in $ra
      addi $sp,$fp,0      # restore $sp to original state
      lw   $fp,-4($sp)    # restore $fp to original state
      jr   $ra            # return from procedure

It could be a good idea to modify the stack pointer first in the prolog (before writing to the stack) and last in the epilog (after reading from it). Why?


Nested Procedure Calls
main executes jal abc; procedure abc saves its state, calls procedure xyz with jal xyz, restores, and returns with jr $ra; xyz in turn returns to abc with jr $ra. Note that abc must save $ra before its own jal, since jal xyz overwrites it. (Figure: example of nested procedure calls.)

Fibonacci Numbers (similar problem in HW4)
F(n) = F(n-1) + F(n-2), with F(1) = 1 and F(2) = 1.
n    = 1 2 3 4 5 6
F(n) = 1 1 2 3 5 8

/* Recursive function in C */
int fib(int n) {
    if (n == 1 || n == 2)
        return 1;
    return fib(n-1) + fib(n-2);
}

Memory mapping

Big Picture
The picture is more complicated for modern processors; many details are missing here.
Complication #1: IM and DM are caches: fast but small memories backed by main memory.
Complication #2: programs are mapped to a virtual address space: the mapping for the program and data in question must be aware of other programs and data (i.e., the O/S), and each program (process) is mapped to its own virtual address space. Additional mechanisms (implemented in SW and HW) take care of this (to be discussed later).
(Figure: the single-cycle datapath: PC, instruction memory, register file, ALU, data memory, and sign extension.)

Big Picture
Assume for now that only one program is mapped to physical memory. Questions to answer:
Where do we store code, and where do we store data?
Would a stack structure be enough to keep all the data?
What kinds of data are typically present?
A related question: how do we pass more than one parameter to a procedure?

Address Space (language- and OS-specific)
A program's address space contains 4 regions, from high addresses (~FFFFFFFF hex) down to ~0 hex:
stack: local variables; grows downward
dynamic data (heap): space requested via malloc() and reached through pointers; resizes dynamically, grows upward
static data: variables declared outside main; does not grow or shrink
code: loaded when the program starts; does not change
Why does the stack grow from top to bottom? For now, the OS somehow prevents accesses to the gap between stack and heap (the gray hashed lines in the figure); wait for virtual memory.

Memory Map in MiniMIPS (hex addresses)
00000000: reserved
00400000: text segment (program), 1M words
10000000: data segment, 63M words: static data first, then dynamic data; $gp ($28) = 10008000, so static data (10000000-1000ffff) is addressable with a 16-bit signed offset
7ffffffc: top of the stack segment, 448M words, growing downward ($sp = $29, $fp = $30)
80000000 and up: second half of the address space, reserved for memory-mapped I/O
(Figure: overview of the memory address space in MiniMIPS.)

Linked Lists vs. Arrays

Pointers (1/4)
Sometimes you want a procedure to increment a variable. What gets printed?

void main() {
    int y = 5;
    AddOne(y);
    printf("y = %d\n", y);
}

void AddOne(int x) { x = x + 1; }

Answer: y = 5. The caller copies y into $a0 (lw $a0, 12($fp); jal AddOne), and AddOne increments only that copy (AddOne: addi $t0, $a0, 1; jr $ra); y itself, at 12($fp) in main's frame, is untouched.

Pointers (2/4)
Solved by passing a pointer to our subroutine. Now what gets printed?

void main() {
    int y = 5;
    AddOne(&y);
    printf("y = %d\n", y);
}

void AddOne(int *p) { *p = *p + 1; }

Answer: y = 6. main passes y's address (addi $a0, $fp, 12; jal AddOne), and AddOne updates y in memory through it (AddOne: lw $t0, 0($a0); addi $t0, $t0, 1; sw $t0, 0($a0); jr $ra).

Pointers (2.5/4): another way of correcting it
Return the new value instead. What gets printed?

void main() {
    int y = 5;
    y = AddOne(y);
    printf("y = %d\n", y);
}

int AddOne(int x) { x = x + 1; return x; }

Answer: y = 6. main stores the returned value back into y (lw $a0, 12($fp); jal AddOne; sw $v0, 12($fp)), and AddOne returns the incremented copy (AddOne: addi $v0, $a0, 1; jr $ra).

Pointers (3/4)
But what if what you want changed is itself a pointer? What gets printed?

void main() {
    int A[3] = {50, 60, 70};
    int *q = A;
    IncrementPtr(q);
    printf("*q = %d\n", *q);
}

void IncrementPtr(int *p) { p = p + 1; }

Answer: *q = 50. The caller copies q into $a0 (lw $a0, 20($fp); jal IncPtr), and IncPtr increments only that copy (IncPtr: addi $t0, $a0, 1; jr $ra), which is then discarded; q in main still points at A[0] = 50.

Pointers (4/4)
Solution: pass a pointer to the pointer, declared as **h. Now what gets printed?

void main() {
    int A[3] = {50, 60, 70};
    int *q = A;
    IncrementPtr(&q);
    printf("*q = %d\n", *q);
}

void IncrementPtr(int **h) { *h = *h + 1; }

Answer: *q = 60. main passes q's address (addi $a0, $fp, 20; jal IncPtr), and IncPtr updates q through it (IncPtr: lw $t0, 0($a0); addi $t0, $t0, 4; sw $t0, 0($a0); jr $ra). Note the +4: incrementing an int pointer by 1 advances the address by sizeof(int) = 4 bytes, so q now points at A[1] = 60.

Arrays Example

void foo() {
    int *p, *q, x;
    int a[4];
    p = (int *) malloc(sizeof(int));
    q = &x;
    *p = 1;   // p[0] would also work here
    printf("*p:%u, p:%u, &p:%u\n", *p, p, &p);
    *q = 2;   // q[0] would also work here
    printf("*q:%u, q:%u, &q:%u\n", *q, q, &q);
    *a = 3;   // a[0] would also work here
    printf("*a:%u, a:%u, &a:%u\n", *a, a, &a);
}

With p at address 12, q at 16, x at 20, a at 24-36, and the unnamed malloc space at 40 (the figure numbers memory 0, 4, 8, ...), this prints:
*p:1, p:40, &p:12
*q:2, q:20, &q:16
*a:3, a:24, &a:24
Note that a and &a print the same address: an array name is not a variable.

Example of Arrays: C code and MIPS pseudocode

int a[100];                    /* static */
void main() {
    int b[10];                 /* stack */
    int size;
    int *p;
    ...
    p = (int *) malloc(sizeof(int) * size);   /* heap */
    ...
    free(p);
    ...
}

main:
    .data
a:  .word 100
    .text
    addi $sp, $sp, -(10*4 + 8 + #regs-to-spill * 4)
    addi $fp, $sp, 10*4 + 8 + #regs-to-spill * 4
    add  $t0, $fp, -10*4     # address of base of array b
    ...
    add  $a0, $0, $t1        # $t1 has the value of size*4
    jal  malloc              # malloc returns a memory address in $v0
    ...
    sw   $v0, 44($fp)        # modify p
    add  $a0, $v0, $0
    jal  free
    ...
    addi $sp, $sp, +(10*4 + 8 + #regs-to-spill * 4)
    jr   $ra

(malloc and free are OS/library procedures.)

C Structures
A struct is a data structure composed from simpler data types: like a class in Java/C++, but without methods or inheritance.

struct point {   /* type definition */
    int x;
    int y;
};

void PrintPoint(struct point p) {
    printf("(%d,%d)", p.x, p.y);
}

struct point p1 = {0,10};   /* x=0, y=10 */
PrintPoint(p1);

As always in C, the argument is passed by value: a copy is made.

C Structures: Pointers to Them
It is usually more efficient to pass a pointer to the struct. The C arrow operator (->) dereferences and extracts a structure field with a single operator. The following are equivalent:

struct point *p;
/* code to assign to pointer */
printf("x is %d\n", (*p).x);
printf("x is %d\n", p->x);

How Big Are Structs?
Recall the C operator sizeof(), which gives the size in bytes (of a type or variable). How big is sizeof(struct p)?

struct p {
    char x;
    int y;
};

5 bytes? 8 bytes? The compiler may word-align the integer y.

Array vs. Linked List
Array: slowly changing size/order; can be allocated dynamically or statically; contiguous location in memory; fast traversal and no memory overhead, but a fixed structure.
Linked list: quickly changing size/order; most often allocated dynamically (rarely statically); can be contiguous (when static) but most often is not; slower traversal and additional memory for storing pointers, but a flexible structure.

Example of a Linked List: C code and MIPS pseudocode

struct mylist {
    int value;
    struct mylist *next;
    struct mylist *prev;
};

In principle, one can do this (the nodes can be allocated in any type of memory):

struct mylist *list[100];

Most typically:

void main() {
    struct mylist *p, *cur;
    ...
    p = malloc(sizeof(struct mylist) * 1);
    add(cur, p);
    ...
    delete(cur);
    ...
}

main:
    ...
    addi $a0, $0, 12      # sizeof(struct mylist)
    jal  malloc           # malloc returns a memory address in $v0
    add  $a0, $0, $t1     # $t1 has the address of cur
    add  $a1, $0, $v0
    jal  addelement
    ...
    jal  delete
    add  $a0, $0, $t1
    jal  free
    ...
    jr   $ra

(All three storage classes appear: the code is static, the nodes are dynamic, and p and cur live on the stack.)

Deleting from a doubly linked list: examples I and II (shown as figures on the original slides).

Memory Management

Memory Management
How do we manage memory?
Code and static storage are easy: they never grow or shrink.
Stack space is also easy: stack frames are created and destroyed in last-in, first-out (LIFO) order.
Managing the heap is tricky: memory can be allocated / deallocated at any time.

Heap Management Requirements
We want malloc() and free() to run quickly.
We want minimal memory overhead.
We want to avoid fragmentation*: when most of our free memory is in many small chunks, we might have many free bytes but be unable to satisfy a large request, since the free bytes are not contiguous in memory.
* This is technically called external fragmentation.

Heap Management: An Example
Request R1 for 100 bytes. Then request R2 for 1 byte; it is placed just above R1 (heap layout: R2 (1 byte) above R1 (100 bytes)). The memory from R1 is then freed.

Heap Management: An Example (continued)
Now comes request R3 for 50 bytes. R3 fits in the freed 100-byte region below R2, or in the free space above it, but the 1-byte block R2 now splits the free memory into two non-contiguous chunks: a request larger than either chunk would fail even though enough free bytes exist in total.

Example: (K&R) Malloc/Free Implementation
Each block of memory is preceded by a header that has two fields: the size of the block and a pointer to the next block.
All free blocks are kept in a circular linked list; the pointer field is unused in an allocated block.

Example Implementation
malloc() searches the free list for a block that is big enough. If none is found, more memory is requested from the operatingating system; if what it gets can't satisfy the request, it fails.
free() checks whether the blocks adjacent to the freed block are also free. If so, the adjacent free blocks are merged (coalesced) into a single, larger free block; otherwise, the freed block is just added to the free list.

Choosing a Block in malloc()
If there are multiple free blocks of memory that are big enough for some request, how do we choose which one to use?
best fit: choose the smallest block that is big enough for the request
first fit: choose the first block we see that is big enough
next fit: like first fit, but remember where the last search finished and resume searching from there

Tradeoffs of Allocation Policies
Best fit: tries to limit fragmentation, but at the cost of time (it must examine all free blocks for each malloc). It leaves lots of small blocks (why?).
First fit: quicker than best fit (why?), but potentially more fragmentation. It tends to concentrate small blocks at the beginning of the free list (why?).
Next fit: does not concentrate small blocks at the front like first fit does, and should be faster as a result.

Compiling, Linking, and Loading Programs

The C Code Translation Hierarchy
C program -> compiler -> assembly code -> assembler -> object code -> linker (together with library routines) -> machine code executable -> loader -> memory

Compiler Benefits
Comparing performance for bubble (exchange) sort: sorting 100,000 words, with the array initialized to random values, on a Pentium 4 with a 3.06 GHz clock rate, a 533 MHz system bus, and 2 GB of DDR SDRAM, using Linux version 2.4.20.

gcc opt        Relative perf.  Clock cycles (M)  Instr count (M)  CPI
None           1.00            158,615           114,938          1.38
O1 (medium)    2.37            66,990            37,470           1.79
O2 (full)      2.38            66,521            39,993           1.66
O3 (proc mig)  2.41            65,747            44,993           1.46

The unoptimized code has the best CPI, the O1 version has the lowest instruction count, but the O3 version is the fastest. Why?

Assembler
Input: assembly language code (e.g., foo.s for MIPS)
Output: object code and information tables (e.g., foo.o for MIPS)
The assembler reads and uses directives, replaces pseudoinstructions, produces machine language, and creates the object file.

Assembler Directives
Directives give directions to the assembler, but do not produce machine instructions.
.text: subsequent items are put in the user text segment (machine code)
.data: subsequent items are put in the user data segment (binary representation of data in the source file)
.globl sym: declares sym global, so it can be referenced from other files
.asciiz str: store the string str in memory and null-terminate it
.word w1 ... wn: store the n 32-bit quantities in successive memory words

Producing Machine Language
What about jumps (j and jal)? Jumps require an absolute address. So, forward or not, we still can't generate the machine instruction without knowing the position of the instructions in memory.
What about references to data? la gets broken up into lui and ori, which require the full 32-bit address of the data.
These addresses can't be determined yet, so we create two tables.

Symbol Table
A list of the items in this file that may be used by other files. What are they?
Labels: for function calling.
Data: anything in the .data section; variables which may be accessed across files.

Relocation Table
A list of the items whose addresses this file needs filled in later. What are they?
Any label jumped to (j or jal): internal, or external (including library files).
Any piece of data referenced by address, such as with the la instruction.

Object File Format
object file header: size and position of the other pieces of the object file
text segment: the machine code
data segment: binary representation of the data in the source file
relocation information: identifies lines of code that need to be handled later
symbol table: list of this file's labels and data that can be referenced
debugging information

Linker (1/3)
Input: object code files and information tables (e.g., foo.o, libc.o for MIPS)
Output: executable code (e.g., a.out for MIPS)
The linker combines several object (.o) files into a single executable ("linking"). It enables separate compilation of files: changes to one file do not require recompilation of the whole program. (Windows NT source was > 40 M lines of code!)
Its old name, "link editor", comes from editing the links in jump and link instructions.

Linker (2/3)
.o file 1: text 1, data 1, info 1
.o file 2: text 2, data 2, info 2
Linker -> a.out: relocated text 1, relocated text 2, relocated data 1, relocated data 2

Linker (3/3)
Step 1: Take the text segment from each .o file and put them together.
Step 2: Take the data segment from each .o file, put them together, and concatenate this onto the end of the text segments.
Step 3: Resolve references: go through the relocation table and handle each entry, i.e., fill in all absolute addresses.

Acknowledgments
Some of the slides contain material developed and copyrighted by M. J. Irwin (Penn State), B. Parhami (UCSB), and D. Garcia (UCB), as well as instructor material for the textbook.

Extra Material

More on Linked Lists (D. Garcia UCB)

Linked List Example
Let's look at an example of using structures, pointers, malloc(), and free() to implement a linked list of strings.

/* node structure for linked list */
struct Node {
    char *value;
    struct Node *next;   /* recursive definition! */
};

typedef Simplifies the Code

/* "typedef" means define a new type */
typedef struct Node {
    char *value;
    struct Node *next;
} NodeStruct;

OR

struct Node {
    char *value;
    struct Node *next;
};
typedef struct Node NodeStruct;

THEN

typedef NodeStruct *List;
typedef char *String;   /* note the similarity to "String value;" */

/* Compare: to define 2 nodes without typedef */
struct Node {
    char *value;
    struct Node *next;
} node1, node2;

Linked List Example

/* Add a string to an existing list */
List cons(String s, List list) {
    List node = (List) malloc(sizeof(NodeStruct));
    node->value = (String) malloc(strlen(s) + 1);
    strcpy(node->value, s);
    node->next = list;
    return node;
}

{
    String s1 = "abc", s2 = "cde";
    List thelist = NULL;
    thelist = cons(s2, thelist);
    thelist = cons(s1, thelist);
    /* or, just like (cons s1 (cons s2 nil)) */
    thelist = cons(s1, cons(s2, NULL));
}

Linked List Example (tracing the 2nd call)
The original slides step through cons for the second call, redrawing the node and list diagrams as each line executes: after the first malloc, node points to uninitialized space; after the second malloc and strcpy, node->value points to a fresh copy of "abc"; after node->next = list, the new node is linked in front of the existing list; finally node is returned as the new head.

/* Add a string to an existing list, 2nd call */
List cons(String s, List list) {
    List node = (List) malloc(sizeof(NodeStruct));
    node->value = (String) malloc(strlen(s) + 1);
    strcpy(node->value, s);
    node->next = list;
    return node;
}

Important points to remember
Remember:
- Structure declaration does not allocate memory
- Variable declaration does allocate memory

So far we have talked about several different ways to allocate memory for data:
1. Declaration of a local variable:
       int i;  struct Node list;  char *string;  int ar[n];
2. Dynamic allocation at runtime by calling an allocation function (malloc):
       ptr = (struct Node *) malloc(sizeof(struct Node) * n);
One more possibility exists...
3. Data declared outside of any procedure (i.e., before main):
       int myglobal;
       main() { ... }
   Similar to #1 above, but has global scope.

More on Heap Management Schemes

Slab Allocator
A different approach to memory management (used in GNU libc).
- Divide blocks into large and small by picking an arbitrary threshold size. Blocks larger than this threshold are managed with a freelist (as before).
- For small blocks, allocate blocks in sizes that are powers of 2: e.g., if a program wants to allocate 20 bytes, actually give it 32 bytes.

Slab Allocator
- Bookkeeping for small blocks is relatively easy: just use a bitmap for each range of blocks of the same size.
- Allocating is easy and fast: compute the size of the block to allocate and find a free bit in the corresponding bitmap.
- Freeing is also easy and fast: figure out which slab the address belongs to and clear the corresponding bit.

Slab Allocator
(Diagram: three slabs of 16-, 32-, and 64-byte blocks, one bit per block.)

    16 byte block bitmap: 11011000
    32 byte block bitmap: 0111
    64 byte block bitmap: 00
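A minimal sketch of the bitmap bookkeeping for one slab, assuming a single slab of eight 32-byte blocks; the names `slab_alloc`/`slab_free` and the fixed-size region are invented for the example.

```c
#include <stdint.h>
#include <stddef.h>

#define NBLOCKS 8
static char slab[NBLOCKS * 32];     /* one slab of eight 32-byte blocks */
static uint8_t bitmap;              /* bit i set means block i is in use */

/* Allocate one 32-byte block: find a clear bit, set it, return the block. */
void *slab_alloc(void) {
    for (int i = 0; i < NBLOCKS; i++) {
        if (!(bitmap & (1u << i))) {
            bitmap |= (1u << i);
            return &slab[i * 32];
        }
    }
    return NULL;                    /* slab is full */
}

/* Free a block: compute its index from the address, clear its bit. */
void slab_free(void *p) {
    int i = (int)(((char *)p - slab) / 32);
    bitmap &= ~(1u << i);
}
```

Both operations are a handful of bit manipulations, which is why allocating and freeing small blocks is fast.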

Slab Allocator Tradeoffs
- Extremely fast for small blocks.
- Slower for large blocks, but presumably the program will take more time to do something with a large block, so the overhead is not as critical.
- Minimal space overhead.
- No fragmentation (as we defined it before) for small blocks, but still have wasted space!

Internal vs. External Fragmentation
- With the slab allocator, the difference between the requested size and the next power of 2 is wasted: e.g., if a program wants to allocate 20 bytes and we give it a 32 byte block, 12 bytes are unused.
- We also refer to this as fragmentation, but call it internal fragmentation, since the wasted space is actually within an allocated block.
- External fragmentation: wasted space between allocated blocks.

Buddy System
Yet another memory management technique (used in the Linux kernel).
- Like GNU's slab allocator, but only allocate blocks in sizes that are powers of 2 (internal fragmentation is possible).
- Keep separate free lists for each size: e.g., separate free lists for 16 byte, 32 byte, 64 byte blocks, etc.

Buddy System
- If no free block of size n is available, find a block of size 2n and split it into two blocks of size n.
- When a block of size n is freed, if its buddy of size n is also free, combine the two into a single block of size 2n.
- A block's buddy is the block in the other half of the larger block it was split from; two adjacent blocks of the same size are not necessarily buddies.
- Same speed advantages as the slab allocator.

Buddy memory allocation (1024K region)

1. Program A requests memory 34K..64K in size
2. Program B requests memory 66K..128K in size
3. Program C requests memory 35K..64K in size
4. Program D requests memory 67K..128K in size
5. Program C releases its memory
6. Program A releases its memory
7. Program B releases its memory
8. Program D releases its memory

    t = 0:  1024K
    t = 1:  A(64K) | 64K | 128K | 256K | 512K
    t = 2:  A(64K) | 64K | B(128K) | 256K | 512K
    t = 3:  A(64K) | C(64K) | B(128K) | 256K | 512K
    t = 4:  A(64K) | C(64K) | B(128K) | D(128K) | 128K | 512K
    t = 5:  A(64K) | 64K | B(128K) | D(128K) | 128K | 512K
    t = 6:  128K | B(128K) | D(128K) | 128K | 512K
    t = 7:  256K | D(128K) | 128K | 512K
    t = 8:  1024K

Allocation Schemes So which memory management scheme (K&R, slab, buddy) is best? There is no single best approach for every application. Different applications have different allocation / deallocation patterns. A scheme that works well for one application may work poorly for another application.