Revealing Internals of Linkers. Zhiqiang Lin

Similar documents
Outline. 1 Background. 2 ELF Linking. 3 Static Linking. 4 Dynamic Linking. 5 Summary. Linker. Various Stages. 1 Linking can be done at compile.

Systems Programming and Computer Architecture ( ) Timothy Roscoe

Lecture 16: Linking Computer Architecture and Systems Programming ( )

Example C program. 11: Linking. Why linkers? Modularity! Static linking. Why linkers? Efficiency! What do linkers do? 10/28/2013

Example C Program The course that gives CMU its Zip! Linking March 2, Static Linking. Why Linkers? Page # Topics

Linking February 24, 2005

Systems I. Linking II

A Simplistic Program Translation Scheme

Linking Oct. 15, 2002

Computer Systems. Linking. Han, Hwansoo

Example C Program. Linking CS Instructor: Sanjeev Se(a. int buf[2] = {1, 2}; extern int buf[]; int main() { swap(); return 0; }

LINKING. Jo, Heeseung

CSE 2421: Systems I Low-Level Programming and Computer Organization. Linking. Presentation N. Introduction to Linkers

Computer Organization: A Programmer's Perspective

Exercise Session 7 Computer Architecture and Systems Programming

Sungkyunkwan University

Linking. Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3. Instructor: Joanna Klukowska

CS429: Computer Organization and Architecture

Lecture 12-13: Linking

Linking Oct. 26, 2009"

Relocating Symbols and Resolving External References. CS429: Computer Organization and Architecture. m.o Relocation Info

E = 2 e lines per set. S = 2 s sets tag. valid bit B = 2 b bytes per cache block (the data) CSE351 Inaugural EdiNon Spring

Linking and Loading. CS61, Lecture 16. Prof. Stephen Chong October 25, 2011

CS 201 Linking Gerson Robboy Portland State University

Linker Puzzles The course that gives CMU its Zip! Linking Mar 4, A Simplistic Program Translation Scheme. A Better Scheme Using a Linker

Today. Linking. Example C Program. Sta7c Linking. Linking Case study: Library interposi7oning

u Linking u Case study: Library interpositioning

(Extract from the slides by Terrance E. Boult

COMPILING OBJECTS AND OTHER LANGUAGE IMPLEMENTATION ISSUES. Credit: Mostly Bryant & O Hallaron

Linking. Today. Next time. Static linking Object files Static & dynamically linked libraries. Exceptional control flows

Linking. CS 485 Systems Programming Fall Instructor: James Griffioen

CS241 Computer Organization Spring Buffer Overflow

A SimplisHc Program TranslaHon Scheme. TranslaHng the Example Program. Example C Program. Why Linkers? - Modularity. Linking

Systemprogrammering och operativsystem Laborationer. Linker Puzzles. Systemprogrammering 2007 Föreläsning 1 Compilation and Linking

Carnegie Mellon. Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition

CSE2421 Systems1 Introduction to Low-Level Programming and Computer Organization

Link 7. Static Linking

CS 550 Operating Systems Spring Process I

Link 7.A Static Linking

Link 7. Dynamic Linking

238P: Operating Systems. Lecture 7: Basic Architecture of a Program. Anton Burtsev January, 2018

Link 8.A Dynamic Linking

Linking : Introduc on to Computer Systems 13 th Lecture, October 11th, Instructor: Randy Bryant. Carnegie Mellon

Linking. Explain what ELF format is. Explain what an executable is and how it got that way. With huge thanks to Steve Chong for his notes from CS61.

Link 2. Object Files

Link 2. Object Files

Computer Systems Organization

Executables and Linking. CS449 Spring 2016

Compiler Drivers = GCC

Link 4. Relocation. Young W. Lim Wed. Young W. Lim Link 4. Relocation Wed 1 / 22

Executables and Linking. CS449 Fall 2017

Link 8. Dynamic Linking

Link 4. Relocation. Young W. Lim Thr. Young W. Lim Link 4. Relocation Thr 1 / 26

Generating Programs and Linking. Professor Rick Han Department of Computer Science University of Colorado at Boulder

Linkers and Loaders. CS 167 VI 1 Copyright 2008 Thomas W. Doeppner. All rights reserved.

Link 3. Symbols. Young W. Lim Mon. Young W. Lim Link 3. Symbols Mon 1 / 42

Introduction Presentation A

Systemprogrammering och operativsystem Linker Puzzles. Systemprogrammering 2009 Före lä s n in g 1 Compilation and Linking

Link Edits and Relocatable Code

CIT 595 Spring System Software: Programming Tools. Assembly Process Example: First Pass. Assembly Process Example: Second Pass.

CS5460/6460: Operating Systems. Lecture 21: Shared libraries. Anton Burtsev March, 2014

A software view. Computer Systems. The Compilation system. How it works. 1. Preprocesser. 1. Preprocessor (cpp)

Binghamton University. CS-220 Spring Loading Code. Computer Systems Chapter 7.5, 7.8, 7.9

M2 Instruction Set Architecture

238P: Operating Systems. Lecture 4: Linking and Loading (Basic architecture of a program) Anton Burtsev October, 2018

143A: Principles of Operating Systems. Lecture 4: Linking and Loading (Basic architecture of a program) Anton Burtsev October, 2018

Midterm. Median: 56, Mean: "midterm.data" using 1:2 1 / 37

CMSC 313 Lecture 12 [draft] How C functions pass parameters

Overview REWARDS TIE HOWARD Summary CS 6V Data Structure Reverse Engineering. Zhiqiang Lin

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING PREVIEW SLIDES 16, SPRING 2013

High Performance Computing Lecture 1. Matthew Jacob Indian Institute of Science

Lecture 8: linking CS 140. Dawson Engler Stanford CS department

Link 4. Relocation. Young W. Lim Sat. Young W. Lim Link 4. Relocation Sat 1 / 33

Machine Language, Assemblers and Linkers"

CS 33. Linkers. CS33 Intro to Computer Systems XXV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Introduction to Computer Systems , fall th Lecture, Sep. 28 th

Outline. Unresolved references

Link 4. Relocation. Young W. Lim Tue. Young W. Lim Link 4. Relocation Tue 1 / 38

Outline. Outline. Common Linux tools to explore object/executable files. Revealing Internals of Loader. Zhiqiang Lin

Draft. Chapter 1 Program Structure. 1.1 Introduction. 1.2 The 0s and the 1s. 1.3 Bits and Bytes. 1.4 Representation of Numbers in Memory

Link 4. Relocation. Young W. Lim Mon. Young W. Lim Link 4. Relocation Mon 1 / 35

CSL373: Operating Systems Linking

ELF (1A) Young Won Lim 10/22/14

CS , Fall 2004 Exam 1

CS2141 Software Development using C/C++ Libraries

Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction

Link 4. Relocation. Young W. Lim Thr. Young W. Lim Link 4. Relocation Thr 1 / 48

CS 3214 Computer Systems. Do not start the test until instructed to do so! printed

CS165 Computer Security. Understanding low-level program execution Oct 1 st, 2015

Compila(on, Disassembly, and Profiling

Obtained the source code to gcc, one can just follow the instructions given in the INSTALL file for GCC.

Intro x86 Part 3: Linux Tools & Analysis

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p

CS 33. Libraries. CS33 Intro to Computer Systems XXVIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Linking and Loading. ICS312 - Spring 2010 Machine-Level and Systems Programming. Henri Casanova

CS140 - Summer Handout #8

COS 318: Operating Systems

CSC 591 Systems Attacks and Defenses Return-into-libc & ROP

CALL (Compiler/Assembler/Linker/ Loader)

Lectures 5-6: Introduction to C

Transcription:

CS 6V81-05: System Security and Malicious Code Analysis Revealing Internals of Linkers Zhiqiang Lin Department of Computer Science University of Texas at Dallas March 26 th, 2012

Outline 1 Background 2 ELF Linking 3 Static Linking 4 Dynamic Linking 5 Summary

Outline 1 Background 2 ELF Linking 3 Static Linking 4 Dynamic Linking 5 Summary

Linker Definition In computer science, a linker or link editor is a program that takes one or more objects generated by a compiler and combines them into a single executable program

Linker Definition In computer science, a linker or link editor is a program that takes one or more objects generated by a compiler and combines them into a single executable program

Linker Definition In computer science, a linker or link editor is a program that takes one or more objects generated by a compiler and combines them into a single executable program Various Stages 1 Linking can be done at compile time 2 at load time (by loaders) 3 at run time (by application programs)

Compilation vs. Linking Compilation Compilation refers to the processing of source code files (.c,.cc, or.cpp) and the creation of an object file. This step doesn t create anything the user can actually run. Instead, the compiler merely produces the machine language instructions that correspond to the source code file that was compiled.

Compilation vs. Linking Compilation Compilation refers to the processing of source code files (.c,.cc, or.cpp) and the creation of an object file. This step doesn t create anything the user can actually run. Instead, the compiler merely produces the machine language instructions that correspond to the source code file that was compiled. Linking Linking refers to the creation of a single executable file from multiple object files. In this step, it is common that the linker will complain about undefined functions (commonly, main itself).

Why Linkers? Modularity Program can be written as a collection of smaller source files, rather than one monolithic mass. Can build libraries of common functions (more on this later) e.g., Math library, standard C library

Non-shared v.s. Shared Library A library is a collection of pre-written function calls.

Non-shared v.s. Shared Library A library is a collection of pre-written function calls. Using existing libraries can save a programmer a lot of time. (e.g., the printf() in the C library)

Non-shared v.s. Shared Library A library is a collection of pre-written function calls. Using existing libraries can save a programmer a lot of time. (e.g., the printf() in the C library) However, if a library is not shared among many programs, but instead a copy is included in every program that uses it, the size of each linked program will be large. Also, a lot of disk and memory space will be wasted by these programs that uses non-shared libraries.

Non-shared v.s. Shared Library A library is a collection of pre-written function calls. Using existing libraries can save a programmer a lot of time. (e.g., the printf() in the C library) However, if a library is not shared among many programs, but instead a copy is included in every program that uses it, the size of each linked program will be large. Also, a lot of disk and memory space will be wasted by these programs that uses non-shared libraries. This motivates the concept of shared library.

Program with shared library

Why Linkers? (cont d) Efficiency Time: Separate compilation Change one source file, compile, and then relink. No need to recompile other source files. Space: Libraries Common functions can be aggregated into a single file... Yet executable files and running memory images contain only code for the functions they actually use.

Outline 1 Background 2 ELF Linking 3 Static Linking 4 Dynamic Linking 5 Summary

Elf Linking

How Does It Work? At link time, the linker searches through libraries to find modules that resolve undefined external symbols.

How Does It Work? At link time, the linker searches through libraries to find modules that resolve undefined external symbols. Rather than copying the contents of the module into the output file, the linker makes a note of what library the module come from and puts a list of the libraries in the executable.

How Does It Work? At link time, the linker searches through libraries to find modules that resolve undefined external symbols. Rather than copying the contents of the module into the output file, the linker makes a note of what library the module come from and puts a list of the libraries in the executable. When the program is loaded, startup code finds those libraries and maps them into the program s address space before the program starts.

Elf Linking Overview Goal: Merges object files merges multiple relocatable (.o) object files into a single executable object file that can loaded and executed by the loader.

Elf Linking Overview Goal: Merges object files merges multiple relocatable (.o) object files into a single executable object file that can loaded and executed by the loader. Step I: Resolves external references as part of the merging process, resolves external references. external reference: reference to a symbol defined in another object file

Elf Linking Overview Step II: Relocates symbols relocates symbols from their relative locations in the.o files to new absolute positions in the executable. updates all references to these symbols to reflect their new positions. references can be in either code or data code: a(); /* ref to symbol a */ data: int *xp=&x; /* ref to symbol x */ because of this modifying, linking is sometimes called link editing.

Example C Program main.c 1 int buf[2] = {1, 2}; 2 3 int main() 4 { 5 swap(); 6 return 0; 7 } swap.c 1 extern int buf[]; 2 3 int *bufp0 = &buf[0]; 4 static int *bufp1; 5 6 void swap() 7 { 8 int temp; 9 10 bufp1 = &buf[1]; 11 temp = *bufp0; 12 *bufp0 = *bufp1; 13 *bufp1 = temp; 14 }

Outline 1 Background 2 ELF Linking 3 Static Linking 4 Dynamic Linking 5 Summary

Static Linking Static Linking Programs are translated and linked using a compiler driver: Programs are translated and linked using a compiler driver: unix> unix> gcc gcc -O2 -g -o -op pmain.c main.c swap.c swap.c unix>./p unix>./p Carnegie Mellon main.c swap.c Source files Translators (cpp, cc1, as) main.o Translators (cpp, cc1, as) swap.o Separately compiled relocatable object files Linker (ld) p Fully linked executable object file (contains code and data for all functions defined in main.c and swap.c) 7

What Do Linkers Do? Step I: Symbol resolution Programs define and reference symbols (variables and functions): void swap() {...} /* define symbol swap */ swap(); /* reference symbol a */ int *xp = &x; /* define symbol xp, reference x */ Symbol definitions are stored (by compiler) in symbol table. Symbol table is an array of structs Each entry includes name, size, and location of symbol. Linker associates each symbol reference with exactly one symbol definition.

What Do Linkers Do?(cont d) Step II: Relocation Merges separate code and data sections into single sections Relocates symbols from their relative locations in the.o files to their final absolute memory locations in the executable. Updates all references to these symbols to reflect their new positions.

Recall: Three Kinds of Object Files Relocatable object file (.o file) Contains code and data in a form that can be combined with other relocatable object files to form executable object file. Each.o file is produced from exactly one source (.c) file Executable object file (a.out file) Contains code and data in a form that can be copied directly into memory and then executed. Shared object file (.so file) Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run-time. Called Dynamic Link Libraries (DLLs) by Windows

ELF object file format Carnegie Mellon Elf header ELF Object File Format Elf header magic number, type (.o, exec,.so), machine, byte ordering, Word size, byte ordering, file type (.o, exec, etc..so), machine type, etc. Program header table Segment header table page size, virtual Page size, addresses virtual addresses for memory segments segments (sections), segment segment sizes. sizes..text section.text section Code code.rodata section.data section Read only data: jump tables,... initialized.data (static) section data.bss section Initialized global variables.bss section uninitialized (static) data Block Started Uninitialized by Symbol global variables Better Save Block Space Started by Symbol Better Save Space has section header but occupies no space Has section header but occupies no space ELF header Segment header table (required for executables).text section.rodata section.data section.bss section.symtab section.rel.txt section.rel.data section.debug section Section header table 0 14

ELF object file format Carnegie Mellon.symtab section ELF Object File Format (cont.) symbol table.symtab section procedure and Symbol static table variable names section names Procedure and static variable names Section names and locations.rel.text section.rel.text section relocation info Relocation for.text info section for.text section addresses of Addresses instructions of instructions that willthat need will toneed be to be modified in the modified executable in the executable instructions Instructions for modifying. for modifying..rel.data section.rel.data section Relocation info for.data section relocation info Addresses for.data of pointer sectiondata that will need to be modified in the merged executable addresses of pointer data that will need to be.debug section modified in the Info merged for symbolic executable debugging (gcc -g).debug section Section header table Offsets and sizes of each section info for symbolic debugging (gcc -g) ELF header Segment header table (required for executables).text section.rodata section.data section.bss section.symtab section.rel.txt section.rel.data section.debug section Section header table 0 15

Linker Symbols Global symbols Symbols defined by module m that can be referenced by other modules. E.g.: non-static C functions and non-static global variables. External symbols Global symbols that are referenced by module m but defined by some other module. Local symbols Symbols that are defined and referenced exclusively by module m. E.g.: C functions and variables defined with the static attribute. Local linker symbols are not local program variables

Resolving Symbols Resolving Symbols Carnegie Me Global Global External Local int buf[2] = {1, 2}; extern int buf[]; int main() { swap(); return 0; } main.c int *bufp0 = &buf[0]; static int *bufp1; void swap() { int temp; Global External Linker knows nothing of temp bufp1 = &buf[1]; temp = *bufp0; *bufp0 = *bufp1; *bufp1 = temp; } swap.c

Relocating Code and Data Relocating Code and Data Carnegie Mellon Relocatable Object Files Executable Object File System code System data.text.data 0 Headers System code main.o main().text int buf[2]={1,2}.data swap.o swap().text int *bufp0=&buf[0].data static int *bufp1.bss main() swap() More system code System data int buf[2]={1,2} int *bufp0=&buf[0] int *bufp1.symtab.debug.text.data.bss Even though private to swap, requires allocation in.bss

Relocation Info (main) Relocation Info (main) main.c int buf[2] = {1,2}; int main() { swap(); return 0; } Carnegie Mellon main.o 0000000 <main>: 0: 8d 4c 24 04 lea 0x4(%esp),%ecx 4: 83 e4 f0 and $0xfffffff0,%esp 7: ff 71 fc pushl 0xfffffffc(%ecx) a: 55 push %ebp b: 89 e5 mov %esp,%ebp d: 51 push %ecx e: 83 ec 04 sub $0x4,%esp 11: e8 fc ff ff ff call 12 <main+0x12> 12: R_386_PC32 swap 16: 83 c4 04 add $0x4,%esp 19: 31 c0 xor %eax,%eax 1b: 59 pop %ecx 1c: 5d pop %ebp 1d: 8d 61 fc lea 0xfffffffc(%ecx),%esp 20: c3 ret Disassembly of section.data: Source: objdump r -d 00000000 <buf>: 0: 01 00 00 00 02 00 00 00 19

Relocs in i386 in static linking R_386_32 Simply deposit the absolute memory address of "symbol" into a dword R_386_PC32 Determine the destinance from this memory location to the "symbol", then add it to the value currently at this dword; deposit the result back into the dword

20 Relocation Info (swap,.text) Relocation Info (swap,.text) Carnegie Mellon swap.c extern int buf[]; int *bufp0 = &buf[0]; static int *bufp1; void swap() { int temp; } bufp1 = &buf[1]; temp = *bufp0; *bufp0 = *bufp1; *bufp1 = temp; swap.o Disassembly of section.text: 00000000 <swap>: 0: 8b 15 00 00 00 00 mov 0x0,%edx 2: R_386_32 buf 6: a1 04 00 00 00 mov 0x4,%eax 7: R_386_32 buf b: 55 push %ebp c: 89 e5 mov %esp,%ebp e: c7 05 00 00 00 00 04 movl $0x4,0x0 15: 00 00 00 10: R_386_32.bss 14: R_386_32 buf 18: 8b 08 mov (%eax),%ecx 1a: 89 10 mov %edx,(%eax) 1c: 5d pop %ebp 1d: 89 0d 04 00 00 00 mov %ecx,0x4 1f: R_386_32 buf 23: c3 ret

Relocation Info (swap,.data) Relocation Info (swap,.data) Carnegie Mellon swap.c extern int buf[]; int *bufp0 = &buf[0]; static int *bufp1; void swap() { int temp; Disassembly of section.data: 00000000 <bufp0>: 0: 00 00 00 00 0: R_386_32 buf } bufp1 = &buf[1]; temp = *bufp0; *bufp0 = *bufp1; *bufp1 = temp;

Executable Before/After Relocation (.text) Executable Before/After Relocation (.text) 0000000 <main>:... e: 83 ec 04 sub $0x4,%esp 11: e8 fc ff ff ff call 12 <main+0x12> 12: R_386_PC32 swap 16: 83 c4 04 add $0x4,%esp... Carnegie Mellon 0x8048396 + 0x1a = 0x80483b0 08048380 <main>: 8048380: 8d 4c 24 04 lea 0x4(%esp),%ecx 8048384: 83 e4 f0 and $0xfffffff0,%esp 8048387: ff 71 fc pushl 0xfffffffc(%ecx) 804838a: 55 push %ebp 804838b: 89 e5 mov %esp,%ebp 804838d: 51 push %ecx 804838e: 83 ec 04 sub $0x4,%esp 8048391: e8 1a 00 00 00 call 80483b0 <swap> 8048396: 83 c4 04 add $0x4,%esp 8048399: 31 c0 xor %eax,%eax 804839b: 59 pop %ecx 804839c: 5d pop %ebp 804839d: 8d 61 fc lea 0xfffffffc(%ecx),%esp 80483a0: c3 ret 22

Executable Before/After Relocation (.text) Carnegie Mellon 0: 8b 15 00 00 00 00 mov 0x0,%edx 2: R_386_32 buf 6: a1 04 00 00 00 mov 0x4,%eax 7: R_386_32 buf... e: c7 05 00 00 00 00 04 movl $0x4,0x0 15: 00 00 00 10: R_386_32.bss 14: R_386_32 buf... 1d: 89 0d 04 00 00 00 mov %ecx,0x4 1f: R_386_32 buf 23: c3 ret 080483b0 <swap>: 80483b0: 8b 15 20 96 04 08 mov 0x8049620,%edx 80483b6: a1 24 96 04 08 mov 0x8049624,%eax 80483bb: 55 push %ebp 80483bc: 89 e5 mov %esp,%ebp 80483be: c7 05 30 96 04 08 24 movl $0x8049624,0x8049630 80483c5: 96 04 08 80483c8: 8b 08 mov (%eax),%ecx 80483ca: 89 10 mov %edx,(%eax) 80483cc: 5d pop %ebp 80483cd: 89 0d 24 96 04 08 mov %ecx,0x8049624 80483d3: c3 ret 23

Executable After Relocation (.data) Executable After Relocation (.data) Disassembly of section.data: 08049620 <buf>: 8049620: 01 00 00 00 02 00 00 00 08049628 <bufp0>: 8049628: 20 96 04 08

Ca Strong and Weak Symbols Strong and Weak Symbols Program symbols are either strong or weak Program symbols are either strong or weak Strong: procedures and initialized globals Strong: procedures and initialized globals Weak: Weak: uninitialized uninitialized globals globals p1.c p2.c strong int foo=5; int foo; weak strong p1() { } p2() { } strong

Linker s Symbol Rules Rule 1: Multiple strong symbols are not allowed Each item can be defined only once Otherwise: Linker error Rule 2: Given a strong symbol and multiple weak symbol, choose the strong symbol References to the weak symbol resolve to the strong symbol Rule 3: If there are multiple weak symbols, pick an arbitrary one Can override this with gcc -fno-common

Relocating symbols and resolving external references Carnegie Mellon Symbols are lexical entities that name functions and variables. Relocating symbols and resolving external references Each symbol has a value (typically a memory address). Symbols are lexical entities that name functions and variables. Each symbol has a value (typically a memory address). Code consists of symbol definitions and references. Code consists of symbol definitions and references. References can be either local or external. Def of local symbol e Ref to external symbol exit (defined in libc.so) References can be either local or external. m.c int e=7; int main() { int r = a(); exit(0); } Ref to external symbol a Def of local symbol ep Def of local symbol a a.c extern int e; int *ep=&e; int x=15; int y; int a() { return *ep+x+y; } Refs of local symbols e,x,y Ref to external symbol e Defs of local symbols x and y 27

Linker Puzzles Linker Puzzles int x; p1() {} p1() {} Link time error: two strong symbols (p1) Carnegie Mellon int x; p1() {} int x; p2() {} References to x will refer to the same uninitialized int. Is this what you really want? int x; int y; p1() {} double x; p2() {} Writes to x in p2 might overwrite y! Evil! int x=7; int y=5; p1() {} double x; p2() {} Writes to x in p2 will overwrite y! Nasty! int x=7; p1() {} int x; p2() {} References to x will refer to the same initialized variable. Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules. Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules. 35

Role of.h Files Role of.h Files c1.c #include "global.h" int f() { return g+1; } c2.c #include <stdio.h> #include "global.h" global.h int main() { if (!init) g = 37; int t = f(); printf("calling f yields %d\n", t); return 0; } #ifdef INITIALIZE int g = 23; static int init = 1; #else int g; static int init = 0; #endif Carnegie Mellon 36

Running Preprocessor Running Preprocessor c1.c #include "global.h" int f() { return g+1; } global.h #ifdef INITIALIZE int g = 23; static int init = 1; #else int g; static int init = 0; #endif Carnegie Mellon -DINITIALIZE int g = 23; static int init = 1; int f() { return g+1; } no initialization int g; static int init = 0; int f() { return g+1; } #include causes C preprocessor to insert file verbatim 37

Role of.h Files(cont d) Role of.h Files (cont.) c2.c c1.c #include "global.h" int f() { return g+1; } #include <stdio.h> #include "global.h" global.h int main() { if (!init) g = 37; int t = f(); printf("calling f yields %d\n", t); return 0; } #ifdef INITIALIZE int g = 23; static int init = 1; #else int g; static int init = 0; #endif Carnegie Mellon What happens: gcc -o p c1.c c2.c?? gcc -o p c1.c c2.c \ -DINITIALIZE?? 38

Global Variables Avoid if you can Otherwise Use static if you can Initialize if you define a global variable Use extern if you use external global variable

Packaging Commonly Used Functions How to package functions commonly used by programmers? Math, I/O, memory management, string manipulation, etc. Awkward, given the linker framework so far: Option 1: Put all functions into a single source file Programmers link big object file into their programs Space and time inefficient Option 2: Put each function in a separate source file Programmers explicitly link appropriate binaries into their programs More efficient, but burdensome on the programmer

Library Structure Library Structure Carnegie Mellon foo.c gcc foo.o 01010010010101 100 10100100101000 1001000210010001 1001010100100101 011010100100101 011010101010101 01101010101010 010 1010100100101 bar.o 01010010010101 100 10100100101000 1001000210010001 1001010100100101 011010100100101 011010101010101 01101010101010 010 1010100100101 main 01010010010101 100 10100100101000 1001000210010001 1001010100100101 011010100100101 011010101010101 01101010101010 010 1010100100101 main.o 01010010010101 100 10100100101000 1001000210010001 1001010100100101 011010100100101 011010101010101 01101010101010 010 1010100100101 ar lib2.a 01010010010101 100 10100100101000 1001000210010001 1001010100100101 011010100100101 011010101010101 01101010101010 010 1010100100101 dynamic lib.a/lib.so 01010010010101 100 10100100101000 1001000210010001 1001010100100101 011010100100101 011010101010101 01101010101010 010 1010100100101

Solution: Static Libraries Static libraries (.a archive files) Concatenate related relocatable object files into a single file with an index (called an archive). Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives. If an archive member file resolves reference, link it into the executable.

Carnegie Mellon Creating Static Libraries Creating Static Libraries atoi.c printf.c random.c Translator Translator... Translator atoi.o printf.o random.o Archiver (ar) unix> ar rs libc.a \ atoi.o printf.o random.o libc.a C standard library Archiver allows incremental updates Archiver allows incremental updates Recompile function that changes and replace.o file in archive. Recompile function that changes and replace.o file in archive. 43

Commonly Used Libraries libc.a (the C standard library) 8 MB archive of 1392 object files. I/O, memory allocation, signal handling, string handling, data and time, random numbers, integer math libm.a (the C math library) 1 MB archive of 401 object files. floating point math (sin, cos, tan, log, exp, sqrt,...) % ar -t /usr/lib/libc.a sort... fork.o... fprintf.o fpu_control.o fputc.o freopen.o fscanf.o fseek.o fstab.o... % ar -t /usr/lib/libm.a sort... e_acos.o e_acosf.o e_acosh.o e_acoshf.o e_acoshl.o e_acosl.o e_asin.o e_asinf.o e_asinl.o...

Linking with Static Libraries Linking with Static Libraries Carnegie addvec.o multvec.o main2.c vector.h Archiver (ar) Translators (cpp, cc1, as) libvector.a libc.a Static libraries Relocatable object files main2.o addvec.o printf.o and any other modules called by printf.o Linker (ld) p2 Fully linked executable object file

Using Static Libraries Linker s algorithm for resolving external references: Scan.o files and.a files in the command line order. During the scan, keep a list of the current unresolved references. As each new.o or.a file, obj, is encountered, try to resolve each unresolved reference in the list against the symbols defined in obj. If any entries in the unresolved list at end of scan, then error. Problem: Command line order matters! Moral: put libraries at the end of the command line. unix> gcc -L. libtest.o -lmine unix> gcc -L. -lmine libtest.o libtest.o: In function main : libtest.o(.text+0x4): undefined reference to libfun

Loading Executable Object Files Executable Object File ELF header Program header table (required for executables).init section 0 0x100000000 Kernel virtual memory User stack (created at runtime) Memory outside 32-bit address space %esp (stack pointer).text section.rodata section.data section 0xf7e9ddc0 Memory-mapped region for shared libraries.bss section.symtab.debug.line.strtab Section header table (required for relocatables) 0x08048000 0 Run-time heap (created by malloc) Read/write segment (.data,.bss) Read-only segment (.init,.text,.rodata) Unused brk Loaded from the executable file

Outline 1 Background 2 ELF Linking 3 Static Linking 4 Dynamic Linking 5 Summary

Shared Libraries Static libraries have the following disadvantages: Duplication in the stored executables (every function need std libc) Duplication in the running executables Minor bug fixes of system libraries require each application to explicitly relink Modern solution: Shared Libraries Object files that contain code and data that are loaded and linked into an application dynamically, at either load-time or run-time Also called: dynamic link libraries, DLLs,.so files

Shared Libraries (cont.) Dynamic linking can occur when executable is first loaded and run (load-time linking). Common case for Linux, handled automatically by the dynamic linker (ld-linux.so). Standard C library (libc.so) usually dynamically linked. Dynamic linking can also occur after program has begun (run-time linking). In Linux, this is done by calls to the dlopen() interface. Distributing software. High-performance web servers. Runtime library interpositioning. Shared library routines can be shared by multiple processes. More on this when we learn about virtual memory

Dynamic Linking at Load-time Dynamic Linking at Load-time Relocatable object file main2.c Translators (cpp, cc1, as) main2.o vector.h Linker (ld) libc.so libvector.so Carnegie Mellon unix> gcc -shared -o libvector.so \ addvec.c multvec.c Relocation and symbol table info Partially linked executable object file p2 Loader (execve) libc.so libvector.so Fully linked executable in memory Dynamic linker (ld-linux.so) Code and data 50

Examples of Dynamic Linking #cat hello1.c main() { greet(); } #cat english.c greet() { printf("hello World"); } #gcc -fpic -c english.c #gcc -shared -o libenglish.so english.o #gcc -c hello1.c #gcc -o hello1 hello1.o -L. -lenglish # LD_LIBRARY_PATH=../hello1 Hello World

Trace it using objdump softos-server-debian:~/elf-test$ objdump -s -j.dynstr hello1 hello1: file type: elf32-i386.dynstr: 8048240 006c6962 656e676c 6973682e 736f005f.libenglish.so._ 8048250 44594e41 4d494300 5f696e69 74006772 DYNAMIC._init.gr 8048260 65657400 5f66696e 69005f47 4c4f4241 eet._fini._globa 8048270 4c5f4f46 46534554 5f544142 4c455f00 L_OFFSET_TABLE_. 8048280 5f4a765f 52656769 73746572 436c6173 _Jv_RegisterClas 8048290 73657300 5f5f676d 6f6e5f73 74617274 ses. gmon_start 80482a0 5f5f006c 6962632e 736f2e36 005f494f.libc.so.6._IO 80482b0 5f737464 696e5f75 73656400 5f5f6c69 _stdin_used. li 80482c0 62635f73 74617274 5f6d6169 6e005f65 bc_start_main._e 80482d0 64617461 005f5f62 73735f73 74617274 data. bss_start 80482e0 005f656e 6400474c 4942435f 322e3000._end.GLIBC_2.0.

Let s see again on library creating Static Library (How to Create) vi hello-1.c gcc -c hello-1.c ar -r hello-1.a hello-1.o -> get hello-1.c -> get hello-1.o -> get hello-1.a (static library) Static Library (How to Use) vi main-1.c gcc -c main-1.c gcc -o main-1 main-1.o hello-1.a./main-1 -> get main-1.c -> get main-1.o -> get main-1 -> execute main-1

Let s see again on library creating Dynamic Library (How to Create) vi hello-2.c -> get hello-2.c gcc -c hello-2.c -> get hello-2.o gcc -shared -o libhello-2.so hello-2.o -> get libhello-2.so (dynamic library) Dynamic Library (How to Use) vi main-2.c -> get main-2.c gcc -c main-2.c -> get main-2.o gcc -o main-2 main-2.o -L. -lhello-2 -> get main-2 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../main-2 -> execute main-2

Dynamic Linking at Run-time #include <stdio.h> #include <dlfcn.h> int x[2] = {1, 2}; int y[2] = {3, 4}; int z[2]; int main() { void *handle; void (*addvec)(int *, int *, int *, int); char *error; } /* dynamically load the shared lib that contains addvec() */ handle = dlopen("./libvector.so", RTLD_LAZY); if (!handle) { fprintf(stderr, "%s\n", dlerror()); exit(1);

Dynamic Linking at Run-time... /* get a pointer to the addvec() function we just loaded */ addvec = dlsym(handle, "addvec"); if ((error = dlerror())!= NULL) { fprintf(stderr, "%s\n", error); exit(1); } /* Now we can call addvec() just like any other function */ addvec(x, y, z, 2); printf("z = [%d %d]\n", z[0], z[1]); } /* unload the shared library */ if (dlclose(handle) < 0) { fprintf(stderr, "%s\n", dlerror()); exit(1); } return 0;

Outline 1 Background 2 ELF Linking 3 Static Linking 4 Dynamic Linking 5 Summary

Summary NAME ld - The GNU linker SYNOPSIS ld [options] objfile... DESCRIPTION ld combines a number of object and archive files, relocates their data and ties up symbol references. Usually the last step in compiling a program is to run ld. Linker is one of the key system components

References http://www.linuxjournal.com/article/6463 http://netwinder.osuosl.org/users/p/patb/ public_html/elf_relocs.html http://www.cs.cmu.edu/afs/cs/academic/class/15213- f10/www/lectures/11-linking.pdf http://en.wikipedia.org/wiki/linker_(computing) Linker and Loader http://v2.cache7.c.bigcache.googleapis.com/ books.lihui.org/cs2/linkers_and_loaders.pdf