Linking and Loading. CS61, Lecture 16. Prof. Stephen Chong October 25, 2011

Similar documents
Systems Programming and Computer Architecture ( ) Timothy Roscoe

Lecture 16: Linking Computer Architecture and Systems Programming ( )

A Simplistic Program Translation Scheme

Linking Oct. 15, 2002

Example C program. 11: Linking. Why linkers? Modularity! Static linking. Why linkers? Efficiency! What do linkers do? 10/28/2013

Example C Program The course that gives CMU its Zip! Linking March 2, Static Linking. Why Linkers? Page # Topics

Systems I. Linking II

LINKING. Jo, Heeseung

Linking February 24, 2005

CSE 2421: Systems I Low-Level Programming and Computer Organization. Linking. Presentation N. Introduction to Linkers

Computer Organization: A Programmer's Perspective

Example C Program. Linking CS Instructor: Sanjeev Se(a. int buf[2] = {1, 2}; extern int buf[]; int main() { swap(); return 0; }

CS429: Computer Organization and Architecture

Computer Systems. Linking. Han, Hwansoo

Exercise Session 7 Computer Architecture and Systems Programming

Relocating Symbols and Resolving External References. CS429: Computer Organization and Architecture. m.o Relocation Info

Linking Oct. 26, 2009"

CS 201 Linking Gerson Robboy Portland State University

Linking. Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3. Instructor: Joanna Klukowska

Linking. Explain what ELF format is. Explain what an executable is and how it got that way. With huge thanks to Steve Chong for his notes from CS61.

(Extract from the slides by Terrance E. Boult

Revealing Internals of Linkers. Zhiqiang Lin

Linker Puzzles The course that gives CMU its Zip! Linking Mar 4, A Simplistic Program Translation Scheme. A Better Scheme Using a Linker

E = 2 e lines per set. S = 2 s sets tag. valid bit B = 2 b bytes per cache block (the data) CSE351 Inaugural EdiNon Spring

Outline. 1 Background. 2 ELF Linking. 3 Static Linking. 4 Dynamic Linking. 5 Summary. Linker. Various Stages. 1 Linking can be done at compile.

Sungkyunkwan University

Lecture 12-13: Linking

CS 550 Operating Systems Spring Process I

Systemprogrammering och operativsystem Laborationer. Linker Puzzles. Systemprogrammering 2007 Föreläsning 1 Compilation and Linking

Linking. Today. Next time. Static linking Object files Static & dynamically linked libraries. Exceptional control flows

u Linking u Case study: Library interpositioning

CS241 Computer Organization Spring Buffer Overflow

CSE2421 Systems1 Introduction to Low-Level Programming and Computer Organization

238P: Operating Systems. Lecture 7: Basic Architecture of a Program. Anton Burtsev January, 2018

COMPILING OBJECTS AND OTHER LANGUAGE IMPLEMENTATION ISSUES. Credit: Mostly Bryant & O Hallaron

Executables and Linking. CS449 Spring 2016

Binghamton University. CS-220 Spring Loading Code. Computer Systems Chapter 7.5, 7.8, 7.9

Carnegie Mellon. Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition

Executables and Linking. CS449 Fall 2017

Today. Linking. Example C Program. Sta7c Linking. Linking Case study: Library interposi7oning

Generating Programs and Linking. Professor Rick Han Department of Computer Science University of Colorado at Boulder

Link 2. Object Files

Link Edits and Relocatable Code

Link 7. Static Linking

Linking. CS 485 Systems Programming Fall Instructor: James Griffioen

Link 2. Object Files

Computer Systems Organization

A SimplisHc Program TranslaHon Scheme. TranslaHng the Example Program. Example C Program. Why Linkers? - Modularity. Linking

Link 4. Relocation. Young W. Lim Wed. Young W. Lim Link 4. Relocation Wed 1 / 22

Linkers and Loaders. CS 167 VI 1 Copyright 2008 Thomas W. Doeppner. All rights reserved.

Link 8.A Dynamic Linking

CIT 595 Spring System Software: Programming Tools. Assembly Process Example: First Pass. Assembly Process Example: Second Pass.

Link 4. Relocation. Young W. Lim Thr. Young W. Lim Link 4. Relocation Thr 1 / 26

Machine Programming 1: Introduction

Linking : Introduc on to Computer Systems 13 th Lecture, October 11th, Instructor: Randy Bryant. Carnegie Mellon

Link 7.A Static Linking

Machine Language, Assemblers and Linkers"

Systemprogrammering och operativsystem Linker Puzzles. Systemprogrammering 2009 Före lä s n in g 1 Compilation and Linking

Today s Big Adventure

Assembly Language Programming Linkers

Link 8. Dynamic Linking

Compiler Drivers = GCC

Link 3. Symbols. Young W. Lim Mon. Young W. Lim Link 3. Symbols Mon 1 / 42

Link 7. Dynamic Linking

Today s Big Adventure

CS 33. Linkers. CS33 Intro to Computer Systems XXV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Midterm. Median: 56, Mean: "midterm.data" using 1:2 1 / 37

Outline. Unresolved references

Lecture 8: linking CS 140. Dawson Engler Stanford CS department

o Code, executable, and process o Main memory vs. virtual memory

238P: Operating Systems. Lecture 4: Linking and Loading (Basic architecture of a program) Anton Burtsev October, 2018

CSC 1600 Memory Layout for Unix Processes"

143A: Principles of Operating Systems. Lecture 4: Linking and Loading (Basic architecture of a program) Anton Burtsev October, 2018

Link 4. Relocation. Young W. Lim Sat. Young W. Lim Link 4. Relocation Sat 1 / 33

Making Address Spaces Smaller

A software view. Computer Systems. The Compilation system. How it works. 1. Preprocesser. 1. Preprocessor (cpp)

Linking and Loading. ICS312 - Spring 2010 Machine-Level and Systems Programming. Henri Casanova

Introduction Presentation A

CS 33. Libraries. CS33 Intro to Computer Systems XXVIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p

Link 4. Relocation. Young W. Lim Tue. Young W. Lim Link 4. Relocation Tue 1 / 38

Link 4. Relocation. Young W. Lim Mon. Young W. Lim Link 4. Relocation Mon 1 / 35

Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction

High Performance Computing Lecture 1. Matthew Jacob Indian Institute of Science

CS5460/6460: Operating Systems. Lecture 21: Shared libraries. Anton Burtsev March, 2014

Dynamic Memory Allocation

Computer Systems Architecture I. CSE 560M Lecture 3 Prof. Patrick Crowley

Instruction Set Architectures

ELF (1A) Young Won Lim 10/22/14

CS 33. Libraries. CS33 Intro to Computer Systems XXIX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

ECE 471 Embedded Systems Lecture 4

Overview REWARDS TIE HOWARD Summary CS 6V Data Structure Reverse Engineering. Zhiqiang Lin

CS140 - Summer Handout #8

CS 3214 Computer Systems. Do not start the test until instructed to do so! printed

Draft. Chapter 1 Program Structure. 1.1 Introduction. 1.2 The 0s and the 1s. 1.3 Bits and Bytes. 1.4 Representation of Numbers in Memory

Department of Computer Science and Engineering Yonghong Yan

gpio timer uart printf malloc keyboard fb gl console shell

COS 318: Operating Systems

Representation of Information

CSL373: Operating Systems Linking

Transcription:

Linking and Loading CS61, Lecture 16 Prof. Stephen Chong October 25, 2011

Announcements Midterm exam in class on Thursday 80 minute exam Open book, closed note. No electronic devices allowed Please be punctual, so we can start on time CS 61 Infrastructure maintenance SEAS Academic Computing need to do some maintainance All CS 61 VMs will be shut down on Thursday You will need to re-register (i.e., run seas-cloud register ) next time you use the infrastructure If this will cause problems for you, please contact course staff Stephen Chong, Harvard University 2

Today Getting from C programs to executables The what and why of linking Symbol relocation Symbol resolution ELF format Loading Static libraries Shared libraries Stephen Chong, Harvard University 3

Linking Linking is the process of combining pieces of data and code into a single file than can be loaded into memory and executed. Why learn about linking? Help you build large programs Avoid dangerous programming errors Understand how to use shared libraries Linking can be done at compile time (aka statically) at load time at run time Stephen Chong, Harvard University 4

Static linking Compiler driver coordinates translation and linking gcc is a compiler driver, invoking C preprocessor (cpp), C compiler (cc1), assembler (as), and linker (ld) main.c swap.c Source files gcc -c gcc -c main.o swap.o Separately compiled relocatable object files gcc -o myprog... Stephen Chong, Harvard University myprog Fully linked executable object file Contains code and data from main.o and swap.o 5

Example program with two C files Definition of buf[] main.c int buf[2] = {1, 2}; int main() { swap(); return 0; } Reference to swap() Declaration of buf[] swap.c extern int buf[]; void swap() { int temp; } temp = buf[1]; buf[1] = buf[0]; buf[0] = temp; Definition of main() Definition of swap() References to buf[] Stephen Chong, Harvard University 6

The linker The linker takes multiple object files and combines them into a single executable file. Three basic jobs: 1) Copy code and data from each object file to the executable 2) Resolve references between object files 3) Relocate symbols to use absolute memory addresses, rather than relative addresses. Stephen Chong, Harvard University 7

Resolving and relocating main.c int buf[2] = {1, 2}; swap.c extern int buf[]; int main() { swap(); return 0; } void swap() { int temp; Resolve: Find the definition of each undefined reference in an object file E.g., The use of swap() in main.o needs to be resolved to definition found in swap.o Relocate: Assign code and data into absolute locations, and update references swap.c compiled without knowing where buf will be in memory. temp = buf[1]; buf[1] = buf[0]; buf[0] = temp; Linker must allocate buf (defined in main.o) into some memory location, and update uses of buf in swap.o to use this location. Stephen Chong, Harvard University 8 }

Why Linkers? Reason 1: Modularity Program can be written as a collection of smaller source files, rather than one monolithic mass. Can build libraries of common functions (more on this later) e.g., Math library, standard C library Stephen Chong, Harvard University 9

Why Linkers? Reason 2: Efficiency Time efficiency: Separate Compilation Change one source file, recompile just that file, and then relink the executable. No need to recompile other source files. Space efficiency: Libraries Common functions can be combined into a single library file... Yet executable files contain only code for the functions they actually use. Stephen Chong, Harvard University 10

Today Getting from C programs to executables The what and why of linking Symbol relocation Symbol resolution ELF format Loading Static libraries Shared libraries Stephen Chong, Harvard University 11

Disassembly of main.o main.c int buf[2] = {1,2}; int main() { swap(); return 0; } This is a bogus address! Just a placeholder for swap (which we don't know the address of... yet!) $ objdump -d main.o... 00000000 <main>:... a: 55 push %ebp b: 89 e5 mov %esp,%ebp d: 51 push %ecx e: 83 ec 04 sub $0x4,%esp 11: e8 fc ff ff ff call 12 <main+0x12> 16: b8 00 00 00 00 mov $0x0,%eax... Stephen Chong, Harvard University 12

main.o object file main.c int buf[2] = {1,2}; main.o.text section code for main() int main() { swap(); return 0; } Relocation info tells the linker where references to external symbols are in the code $ objdump -d main.o... 00000000 <main>:... a: 55 push %ebp b: 89 e5 mov %esp,%ebp d: 51 push %ecx e: 83 ec 04 sub $0x4,%esp 11: e8 fc ff ff ff call 12 <main+0x12> 16: b8 00 00 00 00 mov $0x0,%eax....data section buf[] symbol table name section off main.text 0 buf.data 0 swap undefined relocation info for.text name offset swap 12 Stephen Chong, Harvard University 13

Disassembly of swap.o swap.c extern int buf[]; void swap() { int temp; } temp = buf[1]; buf[1] = buf[0]; buf[0] = temp; These are again just placeholders. 0x0 refers to buf[0] 0x4 refers to buf[1] $ objdump -d swap.o... 00000000 <swap>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 8b 15 04 00 00 00 mov 0x4,%edx 9: a1 00 00 00 00 mov 0x0,%eax e: a3 04 00 00 00 mov %eax,0x4 13: 89 15 00 00 00 00 mov %edx,0x0 19: 5d pop %ebp 1a: c3 ret Stephen Chong, Harvard University 14

swap.o object file swap.c extern int buf[]; void swap() { int temp; } temp = buf[1]; buf[1] = buf[0]; buf[0] = temp; $ objdump -d swap.o... 00000000 <swap>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 8b 15 04 00 00 00 mov 0x4,%edx 9: a1 00 00 00 00 mov 0x0,%eax e: a3 04 00 00 00 mov %eax,0x4 13: 89 15 00 00 00 00 mov %edx,0x0 19: 5d pop %ebp 1a: c3 ret swap.o.text section code for swap() symbol table name section off swap.text 0 buf undefined relocation info for.text name offset buf 0x5 buf 0xa buf 0xf buf 0x15 Stephen Chong, Harvard University 15

The linker The linker takes multiple object files and combines them into a single executable file. Three basic jobs: 1) Copy code and data from each object file to the executable 2) Resolve references between object files 3) Relocate symbols to use absolute memory addresses, rather than relative addresses. Stephen Chong, Harvard University 16

Linker operation main.o.text section code for main() swap.o.text section code for swap() myprog.text section code for main().data section buf[] symbol table code for swap() symbol table name section off main.text 0 buf.data 0 swap undefined relocation info for.text name offset swap 12 name section off swap.text 0 buf undefined relocation info for.text name offset buf 0x5 buf 0xa buf 0xf buf 0x15.data section buf[] Executable file must contain the absolute memory addresses of the code and data (that is, where they will be located when the program actually runs!) Stephen Chong, Harvard University 17

Linker operation main.o.text section code for main() swap.o.text section code for swap() myprog.text section code for main().data section buf[] symbol table code for swap() symbol table name section off main.text 0 buf.data 0 swap undefined relocation info for.text name offset swap 12 name section off swap.text 0 buf undefined relocation info for.text name offset buf 0x5 buf 0xa buf 0xf buf 0x15.data section buf[] Linker uses relocation info and actual addresses of code and data to modify symbol references. Stephen Chong, Harvard University 18

Linker operation main.o.text section code for main() swap.o.text section code for swap() myprog.text section code for main().data section buf[] symbol table code for swap() symbol table name section off main.text 0 buf.data 0 swap undefined relocation info for.text name offset swap 12 name section off swap.text 0 buf undefined relocation info for.text name offset buf 0x5 buf 0xa buf 0xf buf 0x15.data section buf[] Linker uses relocation info and actual addresses of code and data to modify symbol references. Stephen Chong, Harvard University 19

Linker operation main.o.text section code for main() swap.o.text section code for swap() myprog.text section code for main().data section buf[] symbol table code for swap() symbol table name section off main.text 0 buf.data 0 swap undefined relocation info for.text name offset swap 12 name section off swap.text 0 buf undefined relocation info for.text name offset buf 0x5 buf 0xa buf 0xf buf 0x15.data section buf[] Linker uses relocation info and actual addresses of code and data to modify symbol references. Stephen Chong, Harvard University 20

Linker operation main.o.text section code for main() swap.o.text section code for swap() myprog.text section code for main().data section buf[] symbol table code for swap() symbol table name section off main.text 0 buf.data 0 swap undefined relocation info for.text name offset swap 12 name section off swap.text 0 buf undefined relocation info for.text name offset buf 0x5 buf 0xa buf 0xf buf 0x15.data section buf[] Linker uses relocation info and actual addresses of code and data to modify symbol references. Stephen Chong, Harvard University 21

Linker operation main.o.text section code for main() swap.o.text section code for swap() myprog.text section code for main().data section buf[] symbol table code for swap() symbol table name section off main.text 0 buf.data 0 swap undefined relocation info for.text name offset swap 12 name section off swap.text 0 buf undefined relocation info for.text name offset buf 0x5 buf 0xa buf 0xf buf 0x15.data section buf[] Linker uses relocation info and actual addresses of code and data to modify symbol references. Stephen Chong, Harvard University 22

Disassembly after linking 08048344 <main>:... 804834e: 55 push %ebp 804834f: 89 e5 mov %esp,%ebp 8048351: 51 push %ecx 8048352: 83 ec 04 sub $0x4,%esp 8048355: e8 0e 00 00 00 call 8048368 <swap> 804835a: b8 00 00 00 00 mov $0x0,%eax... 08048368 <swap>: 8048368: 55 push %ebp 8048369: 89 e5 mov %esp,%ebp 804836b: 8b 15 5c 95 04 08 mov 0x804955c,%edx 8048371: a1 58 95 04 08 mov 0x8049558,%eax 8048376: a3 5c 95 04 08 mov %eax,0x804955c 804837b: 89 15 58 95 04 08 mov %edx,0x8049558 8048381: 5d pop %ebp 8048382: c3 ret Stephen Chong, Harvard University 23

Disassembly after linking 08048344 <main>:... PC relative addressing: %eip = 0x804835a %eip + 0xe = 0x8048368 804834e: 55 push %ebp 804834f: 89 e5 mov %esp,%ebp 8048351: 51 push %ecx 8048352: 83 ec 04 sub $0x4,%esp 8048355: e8 0e 00 00 00 call 8048368 <swap> 804835a: b8 00 00 00 00 mov $0x0,%eax... 08048368 <swap>: 8048368: 55 push %ebp 8048369: 89 e5 mov %esp,%ebp 804836b: 8b 15 5c 95 04 08 mov 0x804955c,%edx 8048371: a1 58 95 04 08 mov 0x8049558,%eax 8048376: a3 5c 95 04 08 mov %eax,0x804955c 804837b: 89 15 58 95 04 08 mov %edx,0x8049558 8048381: 5d pop %ebp 8048382: c3 ret Stephen Chong, Harvard University 24

Disassembly after linking $ objdump -t myprog... 08049558 g O.data 00000008 buf... Address Section Size Symbol name 08048368 <swap>: 8048368: 55 push %ebp 8048369: 89 e5 mov %esp,%ebp 804836b: 8b 15 5c 95 04 08 mov 0x804955c,%edx 8048371: a1 58 95 04 08 mov 0x8049558,%eax 8048376: a3 5c 95 04 08 mov %eax,0x804955c 804837b: 89 15 58 95 04 08 mov %edx,0x8049558 8048381: 5d pop %ebp 8048382: c3 ret Stephen Chong, Harvard University 25

Today Getting from C programs to executables The what and why of linking Symbol relocation Symbol resolution ELF format Loading Static libraries Shared libraries Stephen Chong, Harvard University 26

The linker Recall: The linker takes multiple object files and combines them into a single executable file. Three basic jobs: 1) Copy code and data from each object file to the executable 2) Resolve references between object files 3) Relocate symbols to use absolute memory addresses, rather than relative addresses. We have looked at copying and relocation. How does linker resolve references? Stephen Chong, Harvard University 27

Strong vs. Weak Symbols The compiler exports each global symbol as either strong or weak Strong symbols: Functions Initialized global variables Weak symbols: Uninitialized global variables Stephen Chong, Harvard University 28

Strong vs. Weak Symbols Strong main.c Weak swap.c Not a global symbol (just a reference) Strong int buf[2] = {1, 2}; int main() { swap(); return 0; } Not a global symbol int myvar; extern int buf[]; void swap() { int temp; } temp = buf[1]; buf[1] = buf[0]; buf[0] = temp; Strong Stephen Chong, Harvard University 29

Linker rules Rule 1: Multiple strong symbols with the same name are not allowed. foo1.c int somefunc(){ return 0; } foo2.c int somefunc() { return 1; } $ gcc foo1.c foo2.c /tmp/ccacj1wn.o: In function `somefunc': foo2.c:(.text+0x0): multiple definition of `somefunc' /tmp/ccaef3ue.o:foo1.c:(.text+0x1e): first defined here collect2: ld returned 1 exit status Stephen Chong, Harvard University 30

Linker rules Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol. foo1.c foo2.c Strong void f(void); int x = 38; int main() { f(); printf( x = %d\n, x); return 0; } int x; void f() { x = 42; } Weak $ gcc -o myprog foo1.c foo2.c $./myprog x = 42 Potentially unexpected behavior! Stephen Chong, Harvard University 31

Linker rules This can lead to some pretty weird bugs!!! foo1.c foo2.c Strong void f(void); int x = 38; int y = 39; int main() { f(); printf( x = %d\n, x); printf( y = %d\n, y); return 0; } double x; void f() { x = 42.0; } Weak $ gcc -o myprog foo1.c foo2.c $./myprog x = 0 y = 1078263808 double x is 8 bytes in size! But resolves to address of int x in foo1.c. Stephen Chong, Harvard University 32

Linker rules Rule 3: Given multiple weak symbols, pick any one of them foo1.c foo2.c Weak void f(void); int x; int main() { x = 38; f(); printf( x = %d\n, x); return 0; } int x; void f() { x = 42; } Weak $ gcc -o myprog foo1.c foo2.c $./myprog x = 42 Stephen Chong, Harvard University 33

Static keyword In C, the keyword static affects the lifetime and linkage (visibility) of a variable A static global variable, declared at the top of a source file, is visible only within the source file Linker will not resolve any reference from another object file to it foo1.c static int x = 38; int main() { g(); printf( x = %d\n, x); return 0; } foo2.c int x; void g() { x = 42; } $ gcc -o myprog foo1.c foo2.c $./myprog x = 38 Stephen Chong, Harvard University 34

Today Getting from C programs to executables The what and why of linking Symbol relocation Symbol resolution ELF format Loading Static libraries Shared libraries Stephen Chong, Harvard University 35

Executable and Linkable Format (ELF) Standard binary format for object files Originally proposed by AT&T System V Unix Later adopted by BSD Unix variants and Linux One unified format for Relocatable object files (.o) Shared object files (.so) Executable files Stephen Chong, Harvard University 36

ELF Object File Format Elf header Magic number, type (.o, exec,.so), machine, byte ordering, etc. Segment header table Maps contiguous file sections to runtime memory segments.text section Code.rodata section Read-only data, e.g. jump tables, constant strings.data section Initialized global variables.bss section Uninitialized global variables Block Started by Symbol, Better Save Space Has section header but occupies no space ELF header Segment header table (required for executables).text section.rodata section.data section.bss section.symtab section.rel.text section.rel.data section.debug section Section header table 0 Stephen Chong, Harvard University 37

ELF Object File Format.symtab section Symbol table Procedure and static variable names Section names and locations.rel.text section Relocation info for.text section Addresses of instructions that will need to be modified in the executable Instructions for modifying..rel.data section Relocation info for.data section Addresses of pointer data that will need modification in merged executable.debug section Info for debugging (gcc -g) Section header table Offsets and sizes of each section Stephen Chong, Harvard University ELF header Segment header table (required for executables).text section.rodata section.data section.bss section.symtab section.rel.text section.rel.data section.debug section Section header table 0

objdump: Looking at ELF files Use the objdump tool to peek inside of ELF files (.o,.so, and executable files) objdump -h: print out list of sections in the file. $ objdump -h myprog myprog: file format elf32-i386 Sections: Idx Name Size VMA LMA File off Algn... 11.text 000001c8 080482a0 080482a0 000002a0 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE... 22.data 0000000c 080495f8 080495f8 000005f8 2**2 CONTENTS, ALLOC, LOAD, DATA 23.bss 0000000c 08049604 08049604 00000604 2**2 ALLOC Stephen Chong, Harvard University 39

objdump: Looking at ELF files Use the objdump tool to peek inside of ELF files (.o,.so, and executable files) objdump -s: print out full contents of each section. $ objdump -s myprog myprog: file format elf32-i386... Contents of section.text: 80482a0 31ed5e89 e183e4f0 50545268 c0830408 1.^...PTRh... 80482b0 68d08304 08515668 74830408 e8c7ffff h...qvht... 80482c0 fff49090 5589e553 83ec04e8 00000000...U..S...... Stephen Chong, Harvard University 40

objdump: Looking at ELF files Use the objdump tool to peek inside of ELF files (.o,.so, and executable files) objdump -t: print out contents of symbol table $ objdump -t myprog myprog: file format elf32-i386 SYMBOL TABLE:... 0804839c g F.text 00000022 swap 08049604 g *ABS* 00000000 bss_start 08049610 g *ABS* 00000000 _end 080495fc g O.data 00000008 buf 08049604 g *ABS* 00000000 _edata 08048439 g F.text 00000000.hidden i686.get_pc_thunk.bx 08048374 g F.text 00000028 main 08048250 g F.init 00000000 _init Stephen Chong, Harvard University 41

Today Getting from C programs to executables The what and why of linking Symbol relocation Symbol resolution ELF format Loading Static libraries Shared libraries Stephen Chong, Harvard University 42

Loading an executable Loading is the process of reading code and data from an executable file and placing it in memory, where it can run. This is done by the operating system. In UNIX, you can use the execve() system call to load and run an executable. What happens when the OS runs a program: 1) Create a new process to contain the new program (more later this semester!) 2) Allocate memory to hold the program code and data 3) Copy contents of executable file to the newly-allocated memory 4) Jump to the executable's entry point (which calls the main() function) Stephen Chong, Harvard University 43

Address space layout Executable file Process address space ELF header Segment header table indicates what sections to load where.text section.rodata section.data section.bss section.symtab section.debug section Section header table 0xc0000000 0x40000000 0x08048000 0x00000000 OS memory User stack Shared libraries Heap (used by malloc) Read/write segment.data,.bss Read-only segment.text,.rodata unused Stephen Chong, Harvard University 44 %esp

Virtual memory simplifies linking Linking Each program has similar virtual address space Code, stack, and shared libraries always start at the same address Stephen Chong, Harvard University 0xffffffff Process address space 0x00000000 OS memory User stack Shared libraries Heap (used by malloc) Read/write segment.data,.bss Read-only segment.text,.rodata unused

Virtual memory simplifies loading Doesn t make sense to immediately load all of a program into physical memory Lots of code and data may not be used E.g., rarely used functionality Initially, page table maps data, text, rodata, etc., segments to disk i.e., PTE are marked as invalid As segments are used (e.g., code accessed for the first time), page fault will bring them into physical memory Demand paging: pages copied into memory only when they are needed Stephen Chong, Harvard University 0xffffffff Process address space 0x00000000 OS memory User stack Shared libraries Heap (used by malloc) Read/write segment.data,.bss Read-only segment.text,.rodata unused

Demand-zero pages When page from bss 0xffffffff OS memory segment (and new User stack memory for the heap, stack, etc.) faulted for the first time, physical page of all zeros allocated. Once page is written, like any other page. Demand-zero pages Process address space Shared libraries Heap (used by malloc) Read/write segment.data,.bss Read-only segment.text,.rodata Stephen Chong, Harvard University 0x00000000 unused 47

Today Getting from C programs to executables The what and why of linking Symbol relocation Symbol resolution ELF format Loading Static libraries Shared libraries Stephen Chong, Harvard University 48

Packaging Commonly Used Functions How to package functions commonly used by programmers? printf, malloc, strcmp, all that stuff? Awkward, given the linker framework so far: Option 1: Put all functions in a single source file Programmers link big object file into their programs: gcc -o myprog myprog.o somebighonkinglibraryfile.o This would be really inefficient! Option 2: Put each routine in a separate object file Programmers explicitly link appropriate object files into their programs gcc -o myprog myprog.o printf.o malloc.o free.o strcmp.o strlen.o strerr.o... More efficient, but a real pain to the programmer Stephen Chong, Harvard University 49

Static Libraries Solution: Static libraries Combine multiple object files into a single archive file (file extension.a ) Example: libc.a contains a whole slew of object files bundled together. Linker can also take archive files as input: gcc -o myprog myprog.o /usr/lib/libc.a Linker searches the.o files within the.a file for needed references and links them into the executable. Stephen Chong, Harvard University 50

Creating Static Libraries atoi.c printf.c... random.c Source files gcc -c gcc -c... gcc -c atoi.o printf.o... random.o Separately compiled relocatable object files ar libc.a Static library Create a static library file using the UNIX ar command ar rs libc.a atoi.o printf.o random.o... Can list contents of a library using ar t libc.a Stephen Chong, Harvard University 51

Commonly Used Libraries libc.a (the C standard library) 2.8 MB archive of 1400 object files. I/O, memory allocation, signal handling, string handling, data and time, random numbers, integer math libm.a (the C math library) 0.5 MB archive of 400 object files. floating point math (sin, cos, tan, log, exp, sqrt, ) % ar t /usr/lib/libc.a sort fork.o fprintf.o fpu_control.o fputc.o freopen.o fscanf.o fseek.o fstab.o % ar t /usr/lib/libm.a sort e_acos.o e_acosf.o e_acosh.o e_acoshf.o e_acoshl.o e_acosl.o e_asin.o e_asinf.o e_asinl.o Stephen Chong, Harvard University 52

Using Static Libraries Linker s algorithm for resolving external references: Scan.o files and.a files in command line order. During the scan, keep a list of the current unresolved references. As each new.o or.a file is encountered, try to resolve each unresolved reference in the list against the symbols in the new file. If any entries unresolved when done, then return an error. Problem: Command line order matters! Moral: put libraries at the end of the command line. unix> gcc mylib.a libtest.o libtest.o: In function `main': libtest.o(.text+0x4): undefined reference to `libfun' unix> gcc libtest.o mylib.a # works fine!!! Stephen Chong, Harvard University 53

Shared Libraries Static libraries have the following disadvantages: Lots of code duplication in the resulting executable files Every C program needs the standard C library. e.g., Every program calling printf() would have a copy of the printf() code in the executable. Very wasteful! Lots of code duplication in the memory image of each running program OS would have to allocate memory for the standard C library routines being used by every running program! Any changes to system libraries would require relinking every binary! Solution: Shared libraries Libraries that are linked into an application dynamically, when a program runs! On UNIX,.so filename extension is used On Windows,.dll filename extension is used Stephen Chong, Harvard University 54

Shared Libraries When the OS runs a program, it checks whether the executable was linked against any shared library (.so) files. If so, it performs the linking and loading of the shared libraries on the fly. Example: gcc -o myprog main.o /usr/lib/libc.so Can use UNIX ldd command to see which shared libs an executable is linked against unix> ldd myprog linux-gate.so.1 => (0xffffe000) libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7df5000) /lib/ld-linux.so.2 (0xb7f52000) Can create your own shared libs using gcc -shared gcc -shared -o mylib.so main.o swap.o Stephen Chong, Harvard University 55

Runtime dynamic linking You can also link against shared libraries at run time! Three UNIX routines used for this: dlopen() Open a shared library file dlsym() Look up a symbol in a shared library file dlclose() Close a shared library file Why would you ever want to do this? Can load new functionality on the fly, or extend a program after it was originally compiled and linked. Examples include things like browser plug-ins, which the user might download from the Internet well after the original program was compiled and linked. Stephen Chong, Harvard University 56

Dynamic linking example #include <stdio.h> #include <dlfcn.h> int main() { void *handle; void (*somefunc)(int, int); char *error; /* dynamically load the shared * lib that contains somefunc() */ handle = dlopen("./libvector.so", RTLD_LAZY); if (!handle) { fprintf(stderr, "%s\n", dlerror()); exit(1); } /* get a pointer to the somefunc() * function we just loaded */ somefunc = dlsym(handle, "somefunc"); if ((error = dlerror())!= NULL) { fprintf(stderr, "%s\n", error); exit(1); } /* Now we can call somefunc() just * like any other function */ somefunc(42, 38); } /* unload the shared library */ if (dlclose(handle) < 0) { fprintf(stderr, "%s\n", dlerror()); exit(1); } return 0; Stephen Chong, Harvard University 57