Executables and Linking. CS449 Spring 2016

Similar documents
Executables and Linking. CS449 Fall 2017

(Extract from the slides by Terrance E. Boult

CSE 2421: Systems I Low-Level Programming and Computer Organization. Linking. Presentation N. Introduction to Linkers

A Simplistic Program Translation Scheme

Computer Systems Organization

Draft. Chapter 1 Program Structure. 1.1 Introduction. 1.2 The 0s and the 1s. 1.3 Bits and Bytes. 1.4 Representation of Numbers in Memory

Systems Programming. Fatih Kesgin &Yusuf Yaslan Istanbul Technical University Computer Engineering Department 18/10/2005

LINKING. Jo, Heeseung

CSE2421 Systems1 Introduction to Low-Level Programming and Computer Organization

Example C program. 11: Linking. Why linkers? Modularity! Static linking. Why linkers? Efficiency! What do linkers do? 10/28/2013

2 Compiling a C program

CS 201 Linking Gerson Robboy Portland State University

Linking Oct. 15, 2002

Generic Programming in C

Systems Programming and Computer Architecture ( ) Timothy Roscoe

238P: Operating Systems. Lecture 7: Basic Architecture of a Program. Anton Burtsev January, 2018

Example C Program The course that gives CMU its Zip! Linking March 2, Static Linking. Why Linkers? Page # Topics

CIT 595 Spring System Software: Programming Tools. Assembly Process Example: First Pass. Assembly Process Example: Second Pass.

Compiler Drivers = GCC

Linking. Explain what ELF format is. Explain what an executable is and how it got that way. With huge thanks to Steve Chong for his notes from CS61.

CS 550 Operating Systems Spring Process I

High Performance Computing Lecture 1. Matthew Jacob Indian Institute of Science

Generating Programs and Linking. Professor Rick Han Department of Computer Science University of Colorado at Boulder

Obtained the source code to gcc, one can just follow the instructions given in the INSTALL file for GCC.

Assembly Language Programming Linkers

Linkers and Loaders. CS 167 VI 1 Copyright 2008 Thomas W. Doeppner. All rights reserved.

Link 8.A Dynamic Linking

Computer Organization: A Programmer's Perspective

Linking and Loading. ICS312 - Spring 2010 Machine-Level and Systems Programming. Henri Casanova

Lecture 8: linking CS 140. Dawson Engler Stanford CS department

Linking February 24, 2005

Lecture 16: Linking Computer Architecture and Systems Programming ( )

CS631 - Advanced Programming in the UNIX Environment

A Real Object File Format

CS240: Programming in C

CS140 - Summer Handout #8

Link 7. Dynamic Linking

Department of Computer Science and Engineering Yonghong Yan

Linking and Loading. CS61, Lecture 16. Prof. Stephen Chong October 25, 2011

The C standard library

C03c: Linkers and Loaders

CS2141 Software Development using C/C++ Libraries

Relocating Symbols and Resolving External References. CS429: Computer Organization and Architecture. m.o Relocation Info

CSL373: Operating Systems Linking

CS429: Computer Organization and Architecture

Link 7.A Static Linking

COS 318: Operating Systems

Computer Systems. Linking. Han, Hwansoo

Compiler Theory. (GCC the GNU Compiler Collection) Sandro Spina 2009

Outline. Outline. Common Linux tools to explore object/executable files. Revealing Internals of Loader. Zhiqiang Lin

Compiler, Assembler, and Linker

Introduction to C CMSC 104 Spring 2014, Section 02, Lecture 6 Jason Tang

CMPE-013/L. Introduction to C Programming

M2 Instruction Set Architecture

ECE 598 Advanced Operating Systems Lecture 10

How to learn C? CSCI [4 6]730: A C Refresher or Introduction. Diving In: A Simple C Program 1-hello-word.c

Midterm. Median: 56, Mean: "midterm.data" using 1:2 1 / 37

Practical C Issues:! Preprocessor Directives, Multi-file Development, Makefiles. CS449 Fall 2017

Exercise Session 7 Computer Architecture and Systems Programming

A software view. Computer Systems. The Compilation system. How it works. 1. Preprocesser. 1. Preprocessor (cpp)

Linking Oct. 26, 2009"

Systems I. Linking II

From Source to Execution:

#include <stdio.h> int main() { printf ("hello class\n"); return 0; }

CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language. Krste Asanović & Randy Katz

Agenda. CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language 8/29/17. Recap: Binary Number Conversion

Memory and C/C++ modules

TDDB68 Concurrent Programming and Operating Systems. Lecture 2: Introduction to C programming

CS 110 Computer Architecture. Lecture 2: Introduction to C, Part I. Instructor: Sören Schwertfeger.

Beyond this course. Machine code. Readings: CP:AMA 2.1, 15.4

9/19/18. COS 318: Operating Systems. Overview. Important Times. Hardware of A Typical Computer. Today CPU. I/O bus. Network

Basic C Programming. Bin Li Assistant Professor Dept. of Electrical, Computer and Biomedical Engineering University of Rhode Island

COS 318: Operating Systems. Overview. Andy Bavier Computer Science Department Princeton University

2012 LLVM Euro - Michael Spencer. lld. Friday, April 13, The LLVM Linker

ECE 471 Embedded Systems Lecture 4

C Compilation Model. Comp-206 : Introduction to Software Systems Lecture 9. Alexandre Denault Computer Science McGill University Fall 2006

gpio timer uart printf malloc keyboard fb gl console shell

Making Address Spaces Smaller

CS16 Exam #1 7/17/ Minutes 100 Points total

CS 33. Libraries. CS33 Intro to Computer Systems XXVIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Final CSE 131B Spring 2004

Link 7. Static Linking

Linking. Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3. Instructor: Joanna Klukowska

#include <stdio.h> int main() { char s[] = Hsjodi, *p; for (p = s + 5; p >= s; p--) --*p; puts(s); return 0;

CSE 333 Lecture 2 Memory

Essentials for Scientific Computing: Source Code, Compilation and Libraries Day 8

27-Sep CSCI 2132 Software Development Lecture 10: Formatted Input and Output. Faculty of Computer Science, Dalhousie University. Lecture 10 p.

Link 3. Symbols. Young W. Lim Mon. Young W. Lim Link 3. Symbols Mon 1 / 42

Continue: How do I learn C? C Primer Continued (Makefiles, debugging, and more ) Last Time: A Simple(st) C Program 1-hello-world.c!

ECE 471 Embedded Systems Lecture 5

CS240: Programming in C. Lecture 2: Overview

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco

Compilation, Disassembly, and Profiling (in Linux)

CSE 374 Programming Concepts & Tools

Important From Last Time

Errors During Compilation and Execution Background Information

Slide Set 5. for ENCM 339 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary

CS 33. Libraries. CS33 Intro to Computer Systems XXIX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Outline. Unresolved references

Lecture 2: C Programming Basic

Transcription:

Executables and Linking CS449 Spring 2016

Remember External Linkage Scope? #include <stdio.h> int global = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc./main.c./foo.c >>./a.out global=10 How did global in foo.c find global in main.c? How did foo() in main.c find foo() in foo.c? Where is the code for printf() and how does the call to printf() find it? What does a.out look like in the file system? How does it look like while executing in memory? 2

Compiler gcc C source Preprocessed source Object files Executable.c cpp cc1.o ld Preprocessor Compiler Linker

Preprocessor gcc C source Preprocessed source Object files Executable.c cpp cc1.o ld Preprocessor Compiler Linker

Preprocessor Input: A C source file with directives Output: A C source file after directives have been processed and comments stripped off. #include directive #include <...> or #include... Copies the contents of the file to source file Contain function/variable declarations and type definitions <...> searches for file under standard include path (usually /usr/include)... searches for file under local directory or paths given as I arguments (e.g. gcc I ~/local/include main.c)

Preprocessor #define directive Defines a macro (a symbol that is expanded to some other text) #define PI 3.14 All instances of PI gets replaced by 3.14 in source #define AVG(a, b) (((a) + (b)) / 2) AVG(2, 4) in source gets replaced by ((2 + 4) / 2) No type checking CPP is just a text translator; no concept of types Will even work on AVG( Hello, World )

Compiler gcc C source Preprocessed source Object files Executable.c cpp cc1.o ld Preprocessor Compiler Linker

Compiler Input: A preprocessed C source file Output: An object file in machine language In the process it performs: Syntax and type checking Compiler optimizations to reduce execution time and memory footprint E.g. gcc O2 main.c (applies level 2 optimizations) Code generation for given machine One object file for one C source file The c option produces an object file instead of an executable Regular gcc main.c command translates to the following: gcc c main.c (produces main.o) gcc main.o (produces a.out) rm main.o

Linker gcc C source Preprocessed source Object files Executable.c cpp cc1.o ld Preprocessor Compiler Linker

Why Linkers? Linkers allow the combination of multiple files into executables object files and libraries (libraries are archives of object files) Modularity Large program can be written as a collection of smaller files, rather than one monolithic mass Can build libraries of common functions e.g., Math library, C standard library Efficiency Time: Change one source file, compile, and then re-link No need to recompile other source files Space: Libraries of common functions can be put in a single file and linked to multiple programs

What does a Linker do? Step 1: Symbol resolution Programs define and reference symbols (variables and functions) void foo() { } /* define symbol foo */ foo(); /* reference symbol foo */ int *p = &x; /* define p, reference x */ Symbol definitions are stored (by compilers) in a symbol table Symbol table is an array of structs Each entry includes name, type, size, and location of symbol Linker associates each symbol reference with exactly one symbol definition Local symbols or static global symbols resolved within single object file Only symbols with external linkage need to be in symbol table

What does a Linker do? Step 2: Relocation Merges separate code and data sections into single sections Relocates symbols from their relative locations in the.o files to their final absolute memory locations in the executable Updates all references to these symbols to reflect their new positions using relocation tables Think of linkers as binary rewriters Uses symbol tables and relocation tables to rewrite addresses while mapping object files to memory space Treats the rest of the object file as a black box

Executables Includes everything needed to run on a system Code Data Dependencies Directions for laying out program in memory Self contained files that adhere to a standard format Generating executables is the job of the linker

Older Executable Formats a.out (Assembler OUTput) Oldest UNIX format No longer commonly used COFF (Common Object File Format) Older UNIX Format No longer commonly used

Modern Executable Formats PE (Portable Executable) Based on COFF Used in 32- and 64-bit Windows ELF (Executable and Linkable Format) Linux/UNIX Mach-O file Mac

Header Every a.out formatted binary file begins with an exec structure: struct exec { unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long }; a_midmag; //magic number a_text; // size of text segment a_data; // size of initialized data a_bss; // size of uninitialized data a_syms; // size of symbol table a_entry; // entry point of program a_trsize; // size of text relocation a_drsize; // size of data relocation

Contents of an Executable exec header text segment data segment text relocations data relocations symbol table string table

Header Every a.out formatted binary file begins with an exec structure: struct exec { unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long unsigned long }; a_midmag; //magic number a_text; // size of text segment a_data; // size of initialized data a_bss; // size of uninitialized data a_syms; // size of symbol table a_entry; // entry point of program a_trsize; // size of text relocation a_drsize; // size of data relocation

Process s Address Space $sp 0x7fffffff Stack brk brk _end Data (Heap) Data (Heap) Data (Globals) 0 Text (Code) CS 1550-2077

Symbol Tables #include <stdio.h> int global = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> nm./main.o U foo 00000000 B global 00000000 T main >> nm./foo.o U printf 00000000 T foo U global nm is a utility that shows the symbol table of object files and executables offset + symbol type + symbol name Symbol types: undefined (U), text (T), BSS (B). Symbols foo and printf are undefined (U) in main.o Symbol global is undefined (U) in foo.o 20

Symbol Tables #include <stdio.h> int global = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> nm./a.out [sic] 080483f0 T foo 08049684 B global U printf@@glibc_2.0 Symbols foo, global defined and point to appropriate places in memory space Symbol printf still undefined Will be defined at load time 21

Relocation Tables #include <stdio.h> int global = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> objdump r./main.o [sic] OFFSET TYPE 0000000a R_386_PC32 VALUE foo 00000010 R_386_32 global 00000015 R_386_32.rodata 00000021 R_386_PC32 printf Offsets store location of reference to symbol For example, in main.o call to foo() will be rewritten by the linker at the relocation phase. 22

Relocation Tables #include <stdio.h> int global = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc c./main.c./foo.c >> objdump R./a.out [sic] OFFSET TYPE [sic] VALUE 08049674 R_386_JUMP_SLOT printf Relocation table of a.out contains printf since it has not yet been resolved (no printf definition in the source files). 23

Libraries Not all code in a program is what you wrote (e.g. printf in C standard library). Code written by others can be included in your own program. How?

Linking Static Linking Copy code into executable at compile time Done by linker Dynamic Linking Copy code into Address Space at load time or later Done by link loader

Static Linking #include <stdio.h> #include <math.h> int main() { printf( The sqrt of 9 is %f\n, sqrt(9)); return 0; } Archives /usr/lib/libc.a /usr/lib/libm.a Object files Executable.o ld Linker

Static Linking #include <stdio.h> int global = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc -m32 -static./main.c./foo.c -lc >> ls l./a.out -rwxr-xr-x 1 wahn UNKNOWN1 639385 Sep 30 15:20./a.out >> objdump R./a.out objdump:./a.out: not a dynamic object objdump:./a.out: Invalid operation -static tells gcc linker to do static linking -lc links in library libc.a (found in /usr/lib) Static linking copies over all libraries into binary (just like with object files) That s why a.out is so large! 27

Dynamic Linking Shared Objects /usr/lib/libc.so /usr/lib/libm.so Stack Executable ld Link Loader Data (Heap) Data (Heap) Data (Globals) Text (Code)

Dynamic Linking #include <stdio.h> int global = 0; void foo(); int main() { foo(); printf( global=%d\n, global); return 0; } <main.c> extern int global; void foo() { global += 10; } <foo.c> >> gcc m32./main.c./foo.c -lc >> ls l./a.out -rwxr-xr-x 1 wahn UNKNOWN1 4769 Sep 30 15:26./a.out >> ldd./a.out linux-gate.so.1 => (0x00110000) libc.so.6 => /lib/libc.so.6 (0x00bbd000) /lib/ld-linux.so.2 (0x00b9b000) By default gcc does dynamic linking Now./a.out is much smaller! -lc marks dependency on libc.so ldd prints out shared library dependencies Program will not start if not satisfied 29

Dynamic Linking Pros/Cons Pros More efficient use of storage More efficient use of memory One copy of library can be mapped to multiple processes Much easier to fix bugs don t have to statically relink every application Cons Address resolution done at load time slow Versioning of multiple dynamic libraries (DLL Hell) Security holes due to replacement of libraries

Dynamic Loading and Dynamic Link Libraries DLL 2 DLL 1 Stack Stack Data (Heap) Data (Heap) Data (Globals) LoadLibrary( DLL1.dll ); LoadLibrary( DLL2.dll ); Data (Heap) Data (Heap) Data (Globals) Text (Code) Text (Code)

Function Pointers How do we call a function when we can t be sure what address it s loaded at? Need a level of indirection. Use a function pointer.

Dynamic Loading #include <dlfcn.h> int main() { void *handle; int (*pf)(const char *format,...); /* open the C standard library */ handle = dlopen("libc.so.6", RTLD_LAZY); /* find the address of the printf function */ pf = dlsym(handle, "printf"); (*pf)("hello world!\n"); dlclose(handle); return 0; } >> gcc -ldl./dlsym.c >>./a.out Hello world! 33

Problem 1 You have a commodity computer. There is a text file of size 4TB, written on the 12TB HDD. The file contains short integers only. Every line of the file contains exactly one number. The numbers are different. Propose a solution for sorting the file with minimal number of operations.

Problem 2 Implement the command wc. When run, the program should get 2 arguments. The first one is an option and the second one is a file name: wc <option> <file_name>. Option c instructs the program to count the number of characters. Option w instructs it to count the number of words and option l the number of lines.

Problem 3 Write a function in C, which gets a string as input parameter and checks if it is a palindrome. The function should return 1, if the string is a palindrome and -1 otherwise. A palindrome is a word, phrase, number, or other sequence of characters which reads the same backward or forward.