CSE 2421: Systems I Low-Level Programming and Computer Organization. Linking. Presentation N. Introduction to Linkers


CSE 2421: Systems I
Low-Level Programming and Computer Organization
Linking
Read/Study: Bryant 7.1-7.10
Gojko Babić, 11-15-2017

Introduction to Linkers

Linking is the process of collecting and combining pieces of code and data into a single file that can be loaded (copied) into memory and executed.

Linking can be performed at:
- compile time, when the source code is translated into machine code,
- load time, when the program in machine code is being loaded into memory by the loader,
- run time, when the program is executing.

Linkers play a critical role in software development because they enable separate compilation. Thus, instead of organizing a large application as one monolithic source file, we can decompose it into smaller and more manageable modules that can be modified and compiled separately.

Benefits of Understanding Linking

Understanding linking will:
- help you build larger programs,
- help you avoid dangerous programming errors,
- help you understand how language scoping rules are implemented,
- enable you to exploit shared libraries,
- help you understand other important system concepts; the executable object files produced by linkers play key roles in important systems functions such as loading and running programs, virtual memory, and memory mapping.

Why Linkers?

Reason 1: Modularity
- A program can be written as a collection of smaller source files, rather than one monolithic mass.
- Libraries of common functions can be built, e.g., the math library and the standard C library.

Reason 2: Efficiency
- Time: separate compilation
    o Change one source file, compile, and then relink.
    o No need to recompile other source files.
- Space: libraries
    o Common functions can be aggregated into a single file.
    o Yet executable files and running memory images contain only code for the functions they actually use.

Example C Program

File: main.c

    long sum(long *a, long n);

    long array[2] = {1, 2};

    int main()
    {
        long val = sum(array, 2);
        return val;
    }

File: sum.c

    long sum(long *a, long n)
    {
        long i, s = 0;

        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }

The function main() calls the function sum(), which adds the two elements of the global array.

Compiling sum.c

The command line:

    linux> gcc -c sum.c

creates the object file sum.o:

    sum.c    Source program (text)
      |  Preprocessor (cpp)
    sum.i    Modified source program (text)
      |  Compiler (cc1)
    sum.s    Assembly program (text)
      |  Assembler (as)
    sum.o    Relocatable object program (binary)

However, the command line:

    linux> gcc -o sum sum.c

reports an error during linking: undefined reference to main.

Compiling & Linking main.c with sum.o

The command line:

    linux> gcc -o p main.c sum.o

compiles main.c into main.o, links it with sum.o, and produces the executable p:

    main.c          Source program (text)
      |  Preprocessor (cpp)
    main.i          Modified source program (text)
      |  Compiler (cc1)
    main.s          Assembly program (text)
      |  Assembler (as)
    main.o, sum.o   Relocatable object programs (binary)
      |  Linker (ld)
    p               Executable object program (binary)

The linker is the program that combines (relocatable) object files (modules) to form the executable p. To run the executable p, issue the command line (use ./p if the current directory is not in your PATH):

    linux> p

Compiling & Linking main.c and sum.c

gcc is a compiler driver: it invokes the preprocessor, compiler, assembler, and linker programs as needed on behalf of the user:

    linux> gcc -o p main.c sum.c

    main.c                           sum.c
      |  Translators (cpp, cc1, as)    |  Translators (cpp, cc1, as)
    main.o                           sum.o
          \                         /
                 Linker (ld)
                      |
                      p

- .c files are C source files.
- .o files are separately compiled relocatable object files.
- File p is the fully linked executable object file (it contains code and data for all functions defined in main.c and sum.c), ready to be loaded into memory and run.

Three Kinds of Object Modules (Files)

- Relocatable object module (.o file): contains code and data in a form that can be combined with other relocatable object files to form an executable object file.
    o Each .o file is produced from exactly one source (.c) file.
- Executable object module (exec file): contains code and data in a form that can be copied directly into memory and then executed.
- Shared object module (.so file): a special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.

Executable and Linkable Format (ELF) is the standard binary format for all object files (modules) on Linux.

ELF Relocatable Object File Format

File layout, from start to end:

    ELF header
    .text
    .rodata
    .data
    .bss
    .symtab
    .rel.text
    .rel.data
    .debug
    Section header table

- ELF header: contains information such as word size, byte ordering, file type (.o, exec, .so), machine type (e.g. x86), etc.
- Section header table: contains the offsets and sizes of each section.
- .text section: contains the compiled program machine code.
- .rodata section: contains read-only data such as jump tables and printf format strings.
- .data section: initialized global variables.
- .bss section: uninitialized global variables.

Note: the .data and .bss sections also hold local variables declared with the static attribute.

ELF Relocatable Object File Format (cont.)

- .symtab section: contains the symbol table with function names, global variable names, and static local variable names.
- .rel.text section: contains relocation information for the .text section, such as a list of addresses of instructions that will need to be modified in the executable, i.e. those that call external functions or reference global variables.
- .rel.data section: contains relocation information for the .data section, such as addresses of pointer data that will need to be modified in the merged executable.
- .debug section: contains information for symbolic debugging, present when the program is compiled with the -g option.

What Do Linkers Do?

Step 1: Symbol resolution

Programs define and reference symbols (global variables and functions):

    long sum() { ... }    /* define symbol sum */
    sum();                /* reference symbol sum */
    long array[2]         /* define global variable array */

- Symbol definitions are stored in the object file (by the assembler) in the symbol table.
    o The symbol table is an array of structs.
    o Each entry includes the name, size, and location of a symbol.
- During this step, the linker associates each symbol reference with exactly one symbol definition.

Step 1: Symbol Resolution

File: main.c

    long sum(long *a, long n);

    long array[2] = {1, 2};          /* defining a global symbol */

    int main()                       /* defining a global symbol */
    {
        long val = sum(array, 2);    /* referencing global symbols: array is
                                        defined in this file, sum in sum.c;
                                        the linker knows nothing of val */
        return val;
    }

File: sum.c

    long sum(long *a, long n)        /* defining a global symbol */
    {
        long i, s = 0;               /* the linker knows nothing of i or s */

        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }

What Do Linkers Do? (cont.)

Step 2: Relocation
- Merges separate code and data sections into single sections.
- Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable.
- Updates all references to these symbols to reflect their new positions.

Step 2: Symbol Relocation

Relocatable object files:

    System code                      .text
    System data                      .data
    main.o   main()                  .text
             long array[2]={1,2}     .data
    sum.o    sum()                   .text

Executable object file (starting at address 0):

    Headers
    .text     System code, main(), sum(), more system code
    .data     System data, long array[2]={1,2}
    .symtab
    .debug

Packaging Commonly Used Functions

How to package functions commonly used by programmers?
- Math, I/O, memory management, string manipulation, etc.

This is awkward, given the linker framework so far:
- Option 1: Put all functions into a single source file and make one large relocatable object module:
    o programmers link that object file into their programs,
    o but this is space and time inefficient.
- Option 2: Put each function in a separate source file and make a relocatable object module for each function:
    o programmers explicitly link the appropriate object modules into their programs,
    o this is more efficient, but burdensome for the programmer.

Old-Fashioned Solution: Static Libraries

First solution: static libraries (.a archive files)
- Concatenate related relocatable object files into a single file with an index of symbols, called an archive; archive filenames use the .a suffix.
- Enhance the linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives.
- If an archive member file resolves a reference, link it into the executable.

The GNU ar program (the archiver) creates, modifies, and extracts from archives. An archive is a single file holding a collection of other files, called archive members, in a structure that makes it possible to retrieve the original individual files.

Updating Static Libraries

    atoi.c      printf.c    ...   random.c
      | Compiler   | Compiler        | Compiler
    atoi.o      printf.o    ...   random.o
                   |  Archiver (ar)
                libc.a   (C standard library)

    linux> ar rs libc.a atoi.o printf.o random.o

- The archiver ar allows incremental updates.
- Recompile the function that changes and replace its .o file in the archive.

GNU Development Tool: Archiver ar

The archiver ar creates an index to the symbols defined in the relocatable object modules in the archive.
- Operation r: insert the file member(s) into the archive with replacement; by default, new members are added at the end of the file, but you may use one of the modifiers to request placement relative to some existing member.
- Modifier c: create the archive.
- Modifier s: write an object-file index into the archive, or update an existing one.

Commonly Used Static Libraries

C standard library (libc.a):
- about 4.6 MB archive of 1496 object files,
- I/O, memory allocation, signal handling, string handling, date and time, random numbers, integer math.

C math library (libm.a):
- about 2 MB archive of 444 object files,
- floating point math (sin, cos, tan, log, exp, sqrt, ...).

    % ar t libc.a | sort
    fork.o
    fprintf.o
    fpu_control.o
    fputc.o
    freopen.o
    fscanf.o
    fseek.o
    fstab.o

    % ar t libm.a | sort
    e_acos.o
    e_acosf.o
    e_acosh.o
    e_acoshf.o
    e_acoshl.o
    e_acosl.o
    e_asin.o
    e_asinf.o
    e_asinl.o

Note: Our Linux system does not use static libraries!

Creating Private Static Library

File: addvec.c

    void addvec(int *x, int *y, int *z, int n)
    {
        int i;

        for (i = 0; i < n; i++)
            z[i] = x[i] + y[i];
    }

File: multvec.c

    void multvec(int *x, int *y, int *z, int n)
    {
        int i;

        for (i = 0; i < n; i++)
            z[i] = x[i] * y[i];
    }

This is how the private library libvector.a is created and updated:

    linux> gcc -c addvec.c multvec.c
    linux> ar rcs libvector.a addvec.o multvec.o

It is also useful to create the following header file, so that you do not have to write the function declarations in your own code:

File: vector.h

    void addvec(int *x, int *y, int *z, int n);
    void multvec(int *x, int *y, int *z, int n);

Using Private Static Library

File: main2.c

    #include <stdio.h>
    #include "vector.h"    /* quotes for a private library header */

    int x[2] = {1, 2};
    int y[2] = {3, 4};
    int z[2];

    int main()
    {
        addvec(x, y, z, 2);
        printf("z = %d %d\n", z[0], z[1]);
        return 0;
    }

To build the executable, the following command compiles main2.c and links it with addvec.o from libvector.a and printf.o from the C standard library libc.a (if we had a static library libc.a):

    linux> gcc -o p2 main2.c libvector.a

Linking with Static Libraries

    main2.c, stdio.h, vector.h
      |  Translators (cpp, cc1, as)
    main2.o                                     Relocatable object module
      +  addvec.o (from static library libvector.a)
      +  printf.o (from static library libc.a)
      |  Linker (ld)
    p2                                          Fully linked executable object file

Now the executable p2 can be loaded and run:

    linux> p2
    z = 4 6

Linker Symbols

Each relocatable object module has a symbol table that contains information about the linker symbols that are defined and referenced by the module. There are three different kinds of symbols relevant to a linker:
- Global symbols: symbols defined by the module that can be referenced from other modules; these include non-static functions and global variables defined without the static attribute.
- External symbols: global symbols that are referenced by the module but defined by some other module.
- Local symbols: symbols that are defined and referenced exclusively by the module; these are functions, global variables, and local variables that are all defined with the static attribute.

Local (linker) symbols are not local (program) variables.

Examples of Linker Symbols

First file:

    void swap();                 /* declaration of the external function */

    int buf[2] = {1, 2};         /* global symbol */

    int main()                   /* global symbol */
    {
        swap();                  /* external symbol */
        return 0;
    }

Second file:

    extern int buf[];            /* external symbol */

    int *bufp0 = &buf[0];        /* global symbol */
    static int *bufp1;           /* local symbol */

    void swap()                  /* global symbol */
    {
        int temp;                /* the linker knows nothing of temp */

        bufp1 = &buf[1];
        temp = *bufp0;
        *bufp0 = *bufp1;
        *bufp1 = temp;
    }

This example has no uninitialized global variable and no static local variable.

Symbol tables are built by assemblers using the symbols that compilers export into the assembly language .s files.

Local Linker Symbols

Local variables that are defined with the C static attribute are not managed on the stack.

    int f()
    {
        static int x = 0;
        return x;
    }

    int g()
    {
        static int x = 1;
        x = x + 1;
        return x;
    }

Here, the compiler allocates space for two integers in the .data section (a zero-initialized variable such as the x in f may instead be placed in .bss) and exports two local linker symbols to the assembler. It might use x.1 for the definition of x in function f and x.2 for the definition of x in function g.

Linker Step 1: Using Static Libraries

- The linker performs symbol resolution by associating each reference with exactly one symbol definition from the symbol tables of its input relocatable object files.
- Resolution is simple for local symbols: the compiler allows only one definition of each local symbol per module. The compiler also ensures that static local variables, which get local linker symbols, have unique names.
- When the compiler encounters a symbol that is not defined in the current module, it assumes that the symbol is defined in some other module and generates a symbol table entry for it.
- Resolving references to global symbols is more involved, since the same symbol may be defined by multiple object files.
- If the linker is unable to find a definition for a referenced symbol in any of its input modules, it terminates with an error message.

How Does the Linker Do Symbol Resolution?

The linker resolves external references as follows:
- It scans the .o files and .a files on the command line in order, from left to right. Note: any .c file is automatically translated into a .o file during the scan.
- It keeps a list of the current unresolved references and a list of the previously defined symbols.
- As each new .o or .a file is encountered, it tries to resolve each unresolved reference in the list; this may add to the list of previously defined symbols and also to the list of unresolved references.
- If any entry is left in the unresolved list at the end of the scan, it reports an error.

Since command-line order matters, put system libraries at the end of the command line.

How Linker Resolves Duplicate Symbols

Linker symbols are either strong or weak:
- Strong: functions and initialized global variables
- Weak: uninitialized global variables

    p1file.c                          p2file.c
    int foo = 5;     /* strong */     int foo;         /* weak */
    p1() { ... }     /* strong */     p2() { ... }     /* strong */

- Rule 1: Multiple strong symbols with the same name are not allowed; each item can be defined only once, otherwise the linker reports an error.
- Rule 2: Given a strong symbol and multiple weak symbols, all with the same name, choose the strong symbol, i.e. references to the weak symbol resolve to the strong symbol.
- Rule 3: If there are multiple weak symbols, all with the same name, pick an arbitrary one.

Linker Puzzle

    Module 1                      Module 2              Result
    int x; p1() {}                p1() {}               Link-time error: two strong symbols (p1).
    int x; p1() {}                int x; p2() {}        References to x will refer to one and the same
                                                        uninitialized int x. Is this what you really want?
    int x; int y; p1() {}         double x; p2() {}     Writes to x in p2 might overwrite y! Evil!
    int x=7; int y=5; p1() {}     double x; p2() {}     Writes to x in p2 will overwrite y! Nasty!
    int x=7; p1() {}              int x; p2() {}        References to x will refer to one and the same
                                                        initialized variable.

Strong and Weak Symbols: Example A

    // File: foo3.c
    #include <stdio.h>

    void f(void);

    int x = 1000;

    int main()
    {
        f();
        printf("x = %d\n", x);
        return 0;
    }

    // File: bar3.c
    int x;

    void f()
    {
        x = 300;
    }

    alpha ~/Cse2421/Linking> gcc -o foo3 foo3.c bar3.c
    alpha ~/Cse2421/Linking> foo3
    x = 300

Multiple Weak Symbols: Example

    // File: foo4.c
    #include <stdio.h>

    void f(void);

    int x;

    int main()
    {
        x = 2000;
        f();
        printf("x = %d\n", x);
        return 0;
    }

    // File: bar4.c
    int x;

    void f()
    {
        x = 600;
    }

    alpha ~/Cse2421/Linking> gcc -o foo4 foo4.c bar4.c
    alpha ~/Cse2421/Linking> foo4
    x = 600

Note: both examples rely on the compiler emitting uninitialized globals as weak (common) symbols. GCC 10 and later default to -fno-common, under which these programs fail to link with a multiple-definition error unless they are compiled with -fcommon.