Systems Programming
Fatih Kesgin & Yusuf Yaslan
Istanbul Technical University, Computer Engineering Department
18/10/2005
Outline
How to assemble and link
    nasm
    ld
    gcc
Debugging
    Using gdb: breakpoints, registers, memory
objdump, readelf, nm, ldd
Definition
Compilers and assemblers create object files containing the generated binary code and data for a source file. Linkers combine multiple object files into one; loaders take object files and load them into memory. (In an integrated programming environment, the compilers, assemblers, and linkers are run implicitly when the user tells it to build a program, but they're there under the covers.)
Example: Hello world
    segment .data
    msg     db "Hello, world!",10
    len     equ $ - msg

    segment .text
    global main
    main:
        mov eax,4       ; write syscall
        mov ebx,1       ; stdout
        mov ecx,msg     ; address of output buffer
        mov edx,len     ; length of buffer
        int 80h

        mov eax,1       ; exit syscall
        mov ebx,0       ; success
        int 80h
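For comparison, the first system call in the listing can be sketched in C. This is only an illustration, assuming a POSIX system where write() is available; say_hello is a hypothetical helper name that does not appear in the slides.

```c
#include <string.h>
#include <unistd.h>

/* Same bytes as msg in the assembly listing: the text plus a newline (10). */
static const char msg[] = "Hello, world!\n";

/* Counterpart of the first "int 80h": write(fd, buffer, length).
 * say_hello is a hypothetical name used only for this sketch. */
ssize_t say_hello(void)
{
    size_t len = strlen(msg);               /* plays the role of "len equ $ - msg" */
    return write(STDOUT_FILENO, msg, len);  /* ebx = 1, ecx = msg, edx = len */
}
```

Calling say_hello() and then _exit(0) from a C main function would mirror the two int 80h calls of the assembly program; the return value is the number of bytes written.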
Assembling hello.asm
nasm - the Netwide Assembler, a portable 80x86 assembler
    nasm -f elf hello.asm
This produces the object file hello.o. To change the output file name, use the -o option:
    nasm -f elf hello.asm -o merhaba.o
The -f elf option tells nasm to output the object code in the Executable and Linking Format (ELF) that Linux uses. The object code is still not executable.
Linking hello.asm
To create an executable file, we have to link the object file using a linker. We can use the GNU linker ld:
    ld hello.o -o hello
However, this results in the following warning:
    ld: warning: cannot find entry symbol _start; defaulting to 08048080
ld searches for a _start label to use as the entry point of the linked program. Since our entry point is main rather than _start, we have to tell ld to use the label main as the entry point:
    ld hello.o -o hello -e main
Let's execute the program.

Examining the listing file (little endianness)
A listing file can be created by nasm for the assembled code by using the -l option:
    nasm -f elf hello.asm -l hello.lis
The generated code is shown in hex on the left-hand side and the original source is displayed on the right.
Examining the .lis file
    10 00000000 B804000000    mov eax,4
Here we can see that the opcode of the mov eax instruction with an immediate operand is B8. The value 04000000 next to it corresponds to the immediate operand 4: it is stored in 32 bits, least significant byte first (little endian). This is because our system is a 32-bit little-endian system (Intel).
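The byte order seen in the listing can be reproduced with a short C sketch. The function names encode_le32 and encode_mov_eax_imm32 are hypothetical and serve only this illustration; the code builds the bytes explicitly, so it gives the same result on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value least significant byte first (little endian).
 * Returns out so that calls can be chained. */
unsigned char *encode_le32(uint32_t value, unsigned char out[4])
{
    for (size_t i = 0; i < 4; i++) {
        out[i] = (unsigned char)((value >> (8 * i)) & 0xFF);
    }
    return out;
}

/* Encode "mov eax, imm32": opcode B8 followed by the 32-bit immediate
 * in little-endian order, as in the listing line for "mov eax,4". */
unsigned char *encode_mov_eax_imm32(uint32_t imm, unsigned char out[5])
{
    out[0] = 0xB8;
    encode_le32(imm, out + 1);
    return out;
}
```

For the immediate 4 this yields the five bytes B8 04 00 00 00, matching the hex column of the listing.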
Differences between ld and gcc: entry points, size
We can also use the GNU C compiler gcc to link the object file:
    gcc hello.o -o hello
Note that gcc is not a linker but a compiler.
gcc is able to determine the type of its input files and take the appropriate actions to produce the executable.
gcc will invoke the linker ld in the background to generate the executable.
Compile hello.c and compare the file sizes.
gcc links the object files to the standard C runtime library by default.
Linking to the standard C runtime library increases the size of the executable.
Since there is a _start label in one of the standard C runtime library object files, ld does not complain about the entry point. (The _start function in the standard C runtime library is responsible for initializing the argc and argv variables for the main function of C programs.)
Russian peasant method of multiplication
Write the operands on top of two columns.
At each step, divide the number in the first column by two and multiply the number in the second column by two. Ignore the remainders of the division operations.
Each time you obtain an odd number in the first column, add the number in the second column to the result.
Stop when the number in the first column becomes 0.
Example: Multiply 92 by 37.
    first   second   result
       92       37
       46       74
       23      148   148
       11      296   148 + 296 = 444
        5      592   444 + 592 = 1036
        2     1184
        1     2368   1036 + 2368 = 3404
        0
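The steps of the method can be sketched in plain C as follows. This is not the rusmain.c or russian.asm solution from the slides; russian_multiply is a hypothetical name, and the sketch assumes unsigned operands whose product fits in an unsigned int.

```c
/* Russian peasant multiplication: halve the first operand, double the
 * second, and add the second to the result whenever the first is odd.
 * russian_multiply is a hypothetical name used only for this sketch. */
unsigned int russian_multiply(unsigned int first, unsigned int second)
{
    unsigned int result = 0;

    while (first != 0) {
        if (first & 1u) {       /* odd number in the first column */
            result += second;
        }
        first >>= 1;            /* divide by two, ignoring the remainder */
        second <<= 1;           /* multiply by two */
    }
    return result;
}
```

With the operands of the example, russian_multiply(92, 37) adds 148, 296, 592 and 2368, giving 3404. The shift and test-for-odd operations map directly onto shr, shl and test instructions, which is why the method suits an assembly exercise.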
Write a C program. The main function should read the values from the keyboard, multiply them using the assembly function, and display the result on the screen.
    solution: rusmain.c
Write a function in Intel assembly that multiplies its two operands using the algorithm described above and returns the result.
    solution: russian.asm
Replace the call to the assembly function with appropriate inline assembly instructions that implement the described multiplication algorithm.
    solution: rusinl.c
Note that AT&T syntax is used in rusinl.c.

How to assemble and link
Assembling russian.asm:
    nasm -f elf -g russian.asm -o russian.o
Compiling rusmain.c:
    gcc -c -g rusmain.c -o rusmain.o
Linking the object files russian.o and rusmain.o to produce the russian executable:
    gcc -g russian.o rusmain.o -o russian
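To give a feel for the AT&T syntax mentioned for the inline assembly version, here is a minimal sketch of a single addition step written with gcc extended inline assembly. It is not taken from rusinl.c; add_att is a hypothetical helper, and the sketch assumes gcc (or a compatible compiler) on x86, falling back to plain C elsewhere.

```c
/* One "add the second column to the result" step in AT&T syntax.
 * In AT&T syntax the source operand comes first and the destination last,
 * so "addl %1, %0" adds operand 1 (term) into operand 0 (sum).
 * add_att is a hypothetical name used only for this sketch. */
unsigned int add_att(unsigned int sum, unsigned int term)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0"
             : "+r" (sum)       /* operand 0: read and written */
             : "r" (term)       /* operand 1: read only */
             : "cc");           /* the flags register is modified */
#else
    sum += term;                /* portable fallback on other targets */
#endif
    return sum;
}
```

The same instruction in the Intel syntax used by nasm would list the destination first, for example add eax, ebx.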
Debugging
To debug the executable file, the -g option is used during compiling and linking.
    gdb russian
Add breakpoints to the program and run it:
    break main
    break russian
    run
Trace the registers and the program:
    info registers
Object File Formats
MS-DOS .COM files
A .COM file literally consists of nothing other than binary code.
It is loaded into memory, the segments are adjusted, and it is run.
If the program doesn't fit into one segment, fix-ups are needed.
Unix a.out files
Computers with hardware memory relocation usually create a new process with an empty address space for each newly run program, in which case programs can be linked to start at a fixed address and require no relocation at load time. The Unix a.out object format handles this situation.
Layout of an a.out file:
    a.out header
    text section
    data section
    other sections
a.out Header
    int a_magic;    // magic number
    int a_text;     // text segment size
    int a_data;     // initialized data size
    int a_bss;      // uninitialized data size
    int a_syms;     // symbol table size
    int a_entry;    // entry point
    int a_trsize;   // text relocation size
    int a_drsize;   // data relocation size
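The fields above can be gathered into a C structure. The sketch below assumes 32-bit integers for every field, and the names aout_header and aout_header_size are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the a.out header as a C structure.
 * Assumption: each field is a 32-bit integer. */
struct aout_header {
    int32_t a_magic;    /* magic number */
    int32_t a_text;     /* text segment size */
    int32_t a_data;     /* initialized data size */
    int32_t a_bss;      /* uninitialized data size */
    int32_t a_syms;     /* symbol table size */
    int32_t a_entry;    /* entry point */
    int32_t a_trsize;   /* text relocation size */
    int32_t a_drsize;   /* data relocation size */
};

/* Size of the header in bytes: eight 4-byte fields. */
size_t aout_header_size(void)
{
    return sizeof(struct aout_header);
}
```

Under this assumption the header occupies 32 bytes at the start of the file, followed by the sections listed in the layout.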
COFF (Common Object File Format)
Something better to support cross-compilation, dynamic linking and other modern system features
Time Sharing Problem

UNIX ELF (Executable & Linking Format)
ELF files come in three slightly different flavors: relocatable, executable, and shared object.
Relocatable files are created by compilers and assemblers but need to be processed by the linker before running.
Executable: all relocations are done and all symbols are resolved, except shared library symbols, which are resolved at runtime.
Shared object: symbol information plus runnable code.

ELF Summary
Complex but good
Flexible
Relocatable, supports C++
Efficient executable format for virtual memory with dynamic linking
Cross-compilation and cross-linking enabled
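The three ELF flavors are recorded in the e_type field of the ELF header, so they can be told apart by looking at the first bytes of a file. The following is a minimal sketch, assuming the caller has already read at least the first 18 bytes of the file into a buffer; elf_flavor is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the flavor of an ELF file given its first bytes, or NULL if the
 * buffer is too short or does not start with the ELF magic number.
 * elf_flavor is a hypothetical name used only for this sketch. */
const char *elf_flavor(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    uint16_t e_type;

    if (len < 18 || memcmp(buf, magic, 4) != 0) {
        return NULL;
    }

    /* Byte 5 of the identification (EI_DATA): 1 = little endian, 2 = big endian.
     * The 16-bit e_type field follows the 16 identification bytes. */
    if (buf[5] == 2) {
        e_type = (uint16_t)((buf[16] << 8) | buf[17]);
    } else {
        e_type = (uint16_t)(buf[16] | (buf[17] << 8));
    }

    switch (e_type) {
    case 1:  return "relocatable";      /* ET_REL */
    case 2:  return "executable";       /* ET_EXEC */
    case 3:  return "shared object";    /* ET_DYN */
    default: return "other";
    }
}
```

Feeding it the start of an object file produced by nasm -f elf should report "relocatable"; readelf -h displays the same field as Type.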
Inspecting the Object File After Compiling
objdump: display information from object files
    objdump -d rusmain.o
Note that the address in the call instruction is a dummy value and the address of main is 00000000.
    objdump -d russian.o
    objdump -d russian
In the linked executable, the address of the printf call is 80482ec and the new address of main is 080483e0.
readelf: display information about ELF files
    readelf -a russian.o
    readelf -a russian
nm: list symbols from object files
    nm russian.o
    nm russian
Static Linking
    gcc -static rusmain.o russian.o -o rus_static
Let's look at the file size of rus_static:
    ll -h
477K. Isn't it too big? Why?
    objdump -d rus_static
strace: trace system calls and signals
    strace ./russian
ltrace: a library call tracer
    ltrace ./russian