Executables and Linking. CS449 Fall 2017

Similar documents
Executables and Linking. CS449 Spring 2016

Computer Systems Organization

(Extract from the slides by Terrance E. Boult

Systems Programming and Computer Architecture ( ) Timothy Roscoe

CSE2421 Systems1 Introduction to Low-Level Programming and Computer Organization

Systems Programming. Fatih Kesgin &Yusuf Yaslan Istanbul Technical University Computer Engineering Department 18/10/2005

Compiler Drivers = GCC

238P: Operating Systems. Lecture 7: Basic Architecture of a Program. Anton Burtsev January, 2018

Example C program. 11: Linking. Why linkers? Modularity! Static linking. Why linkers? Efficiency! What do linkers do? 10/28/2013

A Simplistic Program Translation Scheme

Draft. Chapter 1 Program Structure. 1.1 Introduction. 1.2 The 0s and the 1s. 1.3 Bits and Bytes. 1.4 Representation of Numbers in Memory

Example C Program The course that gives CMU its Zip! Linking March 2, Static Linking. Why Linkers? Page # Topics

Generating Programs and Linking. Professor Rick Han Department of Computer Science University of Colorado at Boulder

Linking Oct. 15, 2002

Computer Systems. Linking. Han, Hwansoo

LINKING. Jo, Heeseung

Generic Programming in C

CSE 2421: Systems I Low-Level Programming and Computer Organization. Linking. Presentation N. Introduction to Linkers

Linking February 24, 2005

Lecture 16: Linking Computer Architecture and Systems Programming ( )

Linking and Loading. CS61, Lecture 16. Prof. Stephen Chong October 25, 2011

2 Compiling a C program

Link 8.A Dynamic Linking

Relocating Symbols and Resolving External References. CS429: Computer Organization and Architecture. m.o Relocation Info

CS429: Computer Organization and Architecture

Assembly Language Programming Linkers

Linkers and Loaders. CS 167 VI 1 Copyright 2008 Thomas W. Doeppner. All rights reserved.

Obtained the source code to gcc, one can just follow the instructions given in the INSTALL file for GCC.

Lecture 8: linking CS 140. Dawson Engler Stanford CS department

CIT 595 Spring System Software: Programming Tools. Assembly Process Example: First Pass. Assembly Process Example: Second Pass.

Practical C Issues:! Preprocessor Directives, Multi-file Development, Makefiles. CS449 Fall 2017

Computer Organization: A Programmer's Perspective

CS 201 Linking Gerson Robboy Portland State University

Compiler Theory. (GCC the GNU Compiler Collection) Sandro Spina 2009

Link 7. Dynamic Linking

Linking. Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3. Instructor: Joanna Klukowska

High Performance Computing Lecture 1. Matthew Jacob Indian Institute of Science

CS140 - Summer Handout #8

Department of Computer Science and Engineering Yonghong Yan

CS 550 Operating Systems Spring Process I

Compilation, Disassembly, and Profiling (in Linux)

C03c: Linkers and Loaders

2012 LLVM Euro - Michael Spencer. lld. Friday, April 13, The LLVM Linker

Linking Oct. 26, 2009"

CSL373: Operating Systems Linking

Compila(on, Disassembly, and Profiling

COS 318: Operating Systems

CS240: Programming in C

Linking. Explain what ELF format is. Explain what an executable is and how it got that way. With huge thanks to Steve Chong for his notes from CS61.

Systems I. Linking II

Memory and C/C++ modules

C Compilation Model. Comp-206 : Introduction to Software Systems Lecture 9. Alexandre Denault Computer Science McGill University Fall 2006

CS 33. Linkers. CS33 Intro to Computer Systems XXV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

A software view. Computer Systems. The Compilation system. How it works. 1. Preprocesser. 1. Preprocessor (cpp)

Exercise Session 7 Computer Architecture and Systems Programming

Essentials for Scientific Computing: Source Code, Compilation and Libraries Day 8

CS2141 Software Development using C/C++ Libraries

Outline. Outline. Common Linux tools to explore object/executable files. Revealing Internals of Loader. Zhiqiang Lin

Example C Program. Linking CS Instructor: Sanjeev Se(a. int buf[2] = {1, 2}; extern int buf[]; int main() { swap(); return 0; }

Outline. Unresolved references

Link Edits and Relocatable Code

Making Address Spaces Smaller

The C standard library

Process Address Spaces and Binary Formats

CS631 - Advanced Programming in the UNIX Environment

#include <stdio.h> int main() { printf ("hello class\n"); return 0; }

CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language. Krste Asanović & Randy Katz

CS 33. Libraries. CS33 Intro to Computer Systems XXVIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Agenda. CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language 8/29/17. Recap: Binary Number Conversion

TDDB68 Concurrent Programming and Operating Systems. Lecture 2: Introduction to C programming

Midterm. Median: 56, Mean: "midterm.data" using 1:2 1 / 37

ECE 598 Advanced Operating Systems Lecture 10

Link 7. Static Linking

Link 8. Dynamic Linking

Lecture 3: C Programm

Important From Last Time

x86 assembly CS449 Fall 2017

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING LECTURE 12, FALL 2012

Page 1. Today. Important From Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right?

Link 7.A Static Linking

COS 318: Operating Systems. Overview. Andy Bavier Computer Science Department Princeton University

Basic C Programming. Bin Li Assistant Professor Dept. of Electrical, Computer and Biomedical Engineering University of Rhode Island

238P: Operating Systems. Lecture 4: Linking and Loading (Basic architecture of a program) Anton Burtsev October, 2018

Errors During Compilation and Execution Background Information

CS 261 Fall C Introduction. Variables, Memory Model, Pointers, and Debugging. Mike Lam, Professor

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco

Important From Last Time

How to learn C? CSCI [4 6]730: A C Refresher or Introduction. Diving In: A Simple C Program 1-hello-word.c

Beyond this course. Machine code. Readings: CP:AMA 2.1, 15.4

143A: Principles of Operating Systems. Lecture 4: Linking and Loading (Basic architecture of a program) Anton Burtsev October, 2018

CS 107 Lecture 18: GCC and Make

Today s Big Adventure

Introduction to C CMSC 104 Spring 2014, Section 02, Lecture 6 Jason Tang

Today s Big Adventure

Dynamic & Static Libraries in Linux (New Talk!) Wim Cardoen and Brett Milash CHPC User Services

Object Files. An Overview of. And Linking. Harry H. Porter III. Goals of this Paper. Portland State University cs.pdx.edu/~harry.

Introduction Presentation A

Operating Systems CMPSC 473. Process Management January 29, Lecture 4 Instructor: Trent Jaeger

Linker Puzzles The course that gives CMU its Zip! Linking Mar 4, A Simplistic Program Translation Scheme. A Better Scheme Using a Linker

Continue: How do I learn C? C Primer Continued (Makefiles, debugging, and more ) Last Time: A Simple(st) C Program 1-hello-world.c!

Transcription:

Executables and Linking CS449 Fall 2017

Remember External Linkage Scope? #include <stdio.h> int global = 0; void foo(); int main() { } foo(); printf( global=%d\n, global); return 0; <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc./main.c./foo.c >>./a.out global=10 How did global in foo.c find global in main.c? How did foo() in main.c find foo() in foo.c? Where is the code for printf() and how does the call to printf() find it? What does a.out look like stored as a file? How does it look like while executing in memory? 2

Compiler gcc C source Preprocessed source Object files Executable.c cpp cc1.o ld Preprocessor Compiler Linker

Preprocessor gcc C source Preprocessed source Object files Executable.c cpp cc1.o ld Preprocessor Compiler Linker

Preprocessor Input: A C source file with directives Output: The source after directives have been processed #include directive Copies the contents of the (header) file to (source) file Source file (files with.c extensions): Contains function/variable definitions Header file (files with.h extensions): Contains function/variable declarations and type definitions Typically included at top of.c file hence the name #include <...> or #include... <...> searches for file under standard include path (e.g. /usr/include)... searches for file under local directory or paths given as I arguments (e.g. gcc I ~/local/include main.c)

Preprocessor #define directive Defines a macro (a symbol that is expanded to some other text before compilation) #define PI 3.14 All instances of PI gets replaced by 3.14 in source #define AVG(a, b) (((a) + (b)) / 2) All instances of AVG(2, 4) gets replaced by (((2) + (4)) / 2) Not a function call, just a text replacement No argument / return type declarations Will even work on AVG( Hello, 3.14), even though the result will not make sense at all -E option produces preprocessed source file gcc E main.c dumps preprocessed code to stdout

Compiler gcc C source Preprocessed source Object files Executable.c cpp cc1.o ld Preprocessor Compiler Linker

Compiler Input: A preprocessed C source file Output: A binary object file in machine language In the process it performs: Syntax and type checking Compiler optimizations to reduce execution time and memory footprint E.g. gcc O2 main.c (applies level 2 optimizations) Code generation for given machine One object file (.o) for one C source file (.c) The c option invokes the compiler only and produces object files instead of the final linked executable gcc main.c foo.c translates to: gcc c main.c foo.c (produces.o files; this is compilation) gcc main.o foo.o (produces a.out; this is linking)

Linker gcc C source Preprocessed source Object files Executable.c cpp cc1.o ld Preprocessor Compiler Linker

Linker Input: One or more object files Output: A linked executable image Why separate out compile and link stages? Leads to shorter executable generation time Compiling is slow, linking is fast On a source code update: Need only compile changed source file and then re-link (cheap) No need to recompile the entire project (expensive) Intermediate object files can be reused A library file is a package of one or more object files e.g. C standard library, Math library, Networking library etc. Libraries can be reused across multiple projects

Memory Space of Program 0xc0000000 Stack (Automatic variables) brk Dynamic Data 0 Heap (Malloc ed Memory) Data Segment (Static variables) Text Segment ( Function Code) Executable Image Linker: generates executable image from multiple object files Executable image: copied to memory when program launched

main.o foo.o The Linking Process [Data Segment] int global; [Text Segment] Int main() { foo(); 0 } [Data Segment] [Text Segment] void foo() { global += 10; 0 } Link [Data Segment] int global; [Text Segment] Int main() { foo(); } [Text Segment] void foo() { global += 10; } Relocation: Figuring out address of main, foo, global in a.out Symbol Resolution: Replacing uses of foo, global with addresses 0 [Data Segment] a.out

thoth $ objdump -S./main.o Disassembly of section.text: 0000000000000000 <main>: The Linking Process 0: 55 push %rbp... 9: e8 00 00 00 00 callq <foo> e: 8b 15 00 00 00 00 mov... thoth $ objdump -S./foo.o Disassembly of section.text: 0000000000000000 <foo>: 0: 55 push %rbp... 14: c3 retq Reloca'on Resolu'on thoth $ gcc o./a.out./main.o./foo.o thoth $ objdump S./a.out Disassembly of section.text: 00000000004004c4 <main>: 4004c4: 55 push %rbp... 4004cd: e8 22 00 00 00 4004d2: 8b 15 04 04 20 00 mov... Reloca'on 00000000004004f4 <foo>: callq 4004f4 <foo> 4004f4: 55 push %rbp... 400508: c3 retq objdump -S: dumps text segment for object 13

The Linking Process (in Detail) Previous compiler stage embeds two tables in each object Symbol table: Stores addresses of symbol locations Where each symbol is located at (in either data or code section) Entry fields: symbol name, section name, address Relocation table: Stores addresses of symbol uses Locations of all the uses of symbol (in code section) Entry fields: symbol name, address Step 1: Relocation of object files into executable image Merge individual code / data sections into single sections Merge individual symbol / relocation tables into single tables Translate address within each object to address in final image Step 2: Symbol resolution across object files Replaces all symbol uses with addresses using merged global symbol table and relocation table

Symbol Tables (Before Linking) #include <stdio.h> int global = 0; int global2 = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> nm./main.o U foo 00000000 B global 00000004 B global2 00000000 T main >> nm./foo.o U printf 00000000 T foo U global nm: a tool that shows the symbol table offset + section name + symbol name Symbols foo, printf undefined in main.o Symbol global undefined in foo.o 15

Relocation Tables (Before Linking) #include <stdio.h> int global = 0; int global2 = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> objdump r./main.o [sic] OFFSET TYPE 0000000a R_X86_64_PC32 VALUE foo 00000010 R_X86_64_32 global 00000015 R_X86_64_32.rodata 00000021 R_X86_64_PC32 printf objdump -r: dumps relocation table for object Offsets store code locations using symbol For example, in main.o at address 0xa: 9: e8 00 00 00 00 (callq 00 00 00 00 <foo>) Linker will rewrite call target with final location of foo. In a.out after rewriting: 4004cd: e8 22 00 00 00 (callq 4004f4 <foo>) 16

Relocation Tables (Before Linking) #include <stdio.h> int global = 0; int global2 = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> objdump r./foo.o [sic] OFFSET TYPE VALUE 00000006 R_X86_64_PC32 global 0000000f R_X86_64_PC32 global In foo.o (read / write of global): 4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax a: 83 c0 0a add $0xa,%eax d: 89 05 00 00 00 00 mov %eax,0x0(%rip) Linker will rewrite memory accesses with final location of global. In a.out after rewriting: 4: 8b 05 e2 03 20 00 mov 0x2003e2(%rip),%eax a: 83 c0 0a add $0xa,%eax d: 89 05 d9 03 20 00 mov %eax, 0x2003d9(%rip) 17

Symbol Tables (After Linking) #include <stdio.h> int global = 0; int global2 = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> gcc./main.o./foo.o >> nm./a.out 00000000004004f4 T foo 00000000006008e0 B global 00000000006008e4 B global2 00000000004004c4 T main U printf@@glibc_2.2.5 Symbols foo, global defined and now have relocated addresses in memory Main.o data section relocated to 6008e0 Foo.o code section relocated to 4004f4 Symbol printf still undefined Will be defined through dynamic linking 18

Relocation Tables (After Linking) #include <stdio.h> int global = 0; int global2 = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> objdump R./a.out [sic] OFFSET TYPE [sic] VALUE 08049674 R_X86_64_JUMP_SLOT printf Relocation table of a.out contains printf since it has not yet been resolved This will be resolved later at load time during dynamic linking 19

Quick Quiz Q: Are automatic local variables relocated and resolved? A: No. Local variables are stored in the stack and is not part of the executable image generated by linker. Q: Are internal linkage symbols relocated and resolved? A: Yes. Internal linkage symbols are part of the data segment and need to be relocated. Also, uses of the symbols need to be resolved to the final address. Q: Are static local symbols relocated and resolved? A: Yes. Static local symbols are also part of the data segment since their lifetime is static. Hence, they need to be relocated and resolved for the same reasons.

3 Ways to Link in Libraries Static Linking Copy library code into executable image at compile time Done by linker (ld) at last stage of compilation Dynamic Linking Copy library code into memory at program load time Done by dynamic linker (ld-linux.so) when program first launched Dynamic Loading Copy library code into memory in the middle of execution Done through dynamic loader (libdl.so) library by program itself DL library provides APIs to load/unload libraries into memory, search in library symbol table to perform symbol resolution

Static Linking #include <stdio.h> #include <math.h> int main() { printf( The sqrt of 9 is %f\n, sqrt(9)); return 0; } Archives /usr/lib/libc.a Object files /usr/lib/libm.a Executable.o ld Linker

Static Linking #include <stdio.h> int global = 0; int global2 = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc -m32 -static./main.c./foo.c -lc >> ls l./a.out -rwxr-xr-x 1 wahn UNKNOWN1 639385 Sep 30 15:20./a.out >> objdump R./a.out objdump:./a.out: not a dynamic object objdump:./a.out: Invalid operation -static tells gcc linker to do static linking -lc links in library libc.a (found in /usr/lib) Static linking copies over all libraries into binary (just like with object files) That s why a.out is so large! 23

Dynamic Linking Stack /usr/lib/libc.so Shared Objects /usr/lib/libm.so libc.so libm.so Executable ld-linux.so Dynamic Linker Data (Heap) Data (Globals) Text (Code)

Dynamic Linking #include <stdio.h> int global = 0; int global2 = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc m32./main.c./foo.c -lc >> ls l./a.out -rwxr-xr-x 1 wahn UNKNOWN1 4769 Sep 30 15:26./a.out >> ldd./a.out linux-gate.so.1 => (0x00110000) libc.so.6 => /lib/libc.so.6 (0x00bbd000) /lib/ld-linux.so.2 (0x00b9b000) By default gcc does dynamic linking Now a.out is much smaller! -lc: marks dependency on libc.so ldd: prints out shared library dependencies Program will not start if not satisfied 25

Dynamic Loading Stack Stack libc.so libm.so Data (Heap) Data (Heap) Data (Globals) Text (Code) dlopen( libc.so ); dlopen( libm.so ); Data (Globals) Text (Code)

Function Pointers How are function names translated to function addresses on a library call? Static linking: linker resolves address in image Dynamic linking: dynamic linker resolves address when image is loaded into memory Dynamic loading: program itself does resolving Address of function is searched in object file symbol table, with the help of dynamic loader library The type of that address is a function pointer

Dynamic Loading #include <dlfcn.h> int main() { } void *handle; int (*pf)(const char *format,...); /* load C library into memory*/ handle = dlopen("libc.so.6", RTLD_LAZY); /* find printf in symbol table */ pf = dlsym(handle, "printf"); pf("hello world!\n"); /* unload C library from memory */ dlclose(handle); return 0; >> gcc -ldl./dlsym.c >>./a.out Hello world! For when program doesn t know what libraries will be used beforehand E.g. browser plugins that start working right away after download 28

Dynamic Linking (Loading) Pros/Cons Pro: Library code is not duplicated Efficient use of file storage library code need not be duplicated in every application using it Easier to update libraries don t have to relink every application on update Con: Program binary is not self-contained Less portable program relies on certain libraries being installed on the system to run correctly Maintain multiple versions of same library (DLL Hell) if programs rely on specific versions Security challenges security properties of program change depending on libraries installed on system

Executable File Includes everything needed to run on a system Executable image (that is loaded directly into memory) Code segment Data segment (with initial values of variables) Symbol / relocation tables (if applicable) Code / Data segment may be relocated (again) through dynamic linking Executable files and object files share same format Executables are generated by the linker Object files are generated by the compiler But they need mostly the same things

Older Executable Formats a.out (Assembler OUTput) Oldest UNIX format No longer commonly used COFF (Common Object File Format) Older UNIX Format No longer commonly used

Modern Executable Formats ELF (Executable and Linkable Format) Linux/UNIX format PE (Portable Executable) Windows format Based on COFF Mach-O file Mac format

a.out exec header (describes structure) text segment (code) data segment (static lifetime data) symbol table string table (symbol name strings) text relocation table data relocation table

Header Every a.out formatted binary file begins with an exec header structure: struct exec { unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long }; a_midmag; // format identifier a_text; // size of text segment a_data; // size of initialized data a_bss; // size of uninitialized data a_syms; // size of symbol table a_entry; // entry point of program a_trsize; // size of text relocation table a_drsize; // size of data relocation table

Linux Address Space