seculoader Implementation of a secure loader extending the libdetox framework


Bachelor Thesis

seculoader - Implementation of a secure loader extending the libdetox framework

Tobias Hartmann
Mathias Payer (Responsible assistant)
Prof. Thomas R. Gross
Laboratory for Software Technology, ETH Zurich
August 2011

Declaration of Originality

This sheet must be signed and enclosed with every piece of written work submitted at ETH.

I hereby declare that the written work I have submitted entitled "seculoader - Implementation of a secure loader extending the libdetox framework" is original work which I alone have authored and which is written in my own words.*

Author(s): Last name Hartmann, First name Tobias
Supervising lecturer: Last name Gross, First name Thomas R.

With the signature I declare that I have been informed regarding normal academic citation rules and that I have read and understood the information on 'Citation etiquette' ( students/exams/plagiarism_s_en.pdf). The citation conventions usual to the discipline in question here have been respected. The above written work may be tested electronically for plagiarism.

Place and date / Signature

*Co-authored work: The signatures of all authors are required. Each signature attests to the originality of the entire piece of written work in its final form.

Abstract

In times of increasing attacks on computer systems through malware and exploits, user-space process virtualization presents a promising approach for the safe execution of untrusted binary code. Current frameworks which implement software-based fault isolation (SFI) have a major drawback: they rely on features of the Linux dynamic loader to gain control over the application, which introduces security risks. First, the loader is treated as a black box and the framework only gains control after the loader has performed all necessary startup tasks like the loading of shared libraries, relocations and initializations. Depending on the setting, unchecked code is executed before the virtualization framework is even started. Further, the loader treats the framework like a normal shared library, which allows the application to notice and manipulate the sandbox using features of the dynamic loader. The increasing number of attacks on the Linux loader itself shows that these concerns are justified.

In this thesis we present seculoader, a reimplementation of the Linux dynamic loader with security as a key aspect and tight coupling with the virtualization framework. We use libdetox, a user-space process virtualization framework based on dynamic binary translation. Features like internal memory protection, control flow transfer checks and communication with the SFI platform build up a secure execution environment where the loader is part of the trusted computing base and forms a unit with the virtualization system. This new approach guarantees security right from the start and closes the gap between the framework and the operating system. The evaluation of the system shows that it is applicable in practice but needs more sophisticated optimizations to satisfy high performance demands.

Zusammenfassung (Summary)

In times of increasing attacks on computer systems through malware and exploits, user-space process virtualization offers a promising solution for the safe execution of untrusted binary code. Common systems based on software-based fault isolation (SFI) have a major drawback: they rely on functions of the Linux dynamic loader to gain control over the application, which leads to security risks. The first problem is that the loader is treated as a black box and the framework gains control only after the loader has already carried out all necessary startup tasks such as the loading of libraries, relocations and initializations. Depending on the circumstances, unchecked code is executed before the virtualization framework has even been started. Furthermore, the loader treats the framework like a normal library, which allows the executed program to notice and manipulate the sandbox. The increasing number of attacks on the Linux loader shows that these concerns are justified.

In this thesis we present seculoader, a reimplementation of the Linux dynamic loader with a focus on security and a tight coupling with the virtualization framework. We use libdetox, a user-space process virtualization framework based on dynamic binary translation. Features such as protection of internal memory, control flow transfer checks and communication with the SFI system form a secure execution environment in which the loader is part of the trusted computing base and forms a unit with the virtualization system. This new approach ensures security from the start and closes the gap between framework and operating system. The performance evaluation of the system shows that it is applicable in practice but needs further, more sophisticated optimizations to meet high performance demands.

Acknowledgments

First of all I want to thank my supervisor Mathias Payer for giving me the opportunity to take part in this challenging project and learn so much about programming. I really appreciated your flexibility and just the right mix of support and personal responsibility. Thank you for your faith in me. Further, special thanks to Jonas Pfefferle for his constructive feedback and Jens Schuessler for his proofreading and improvement of my orthography. Last but not least, I want to thank my girlfriend Natalie for her endless patience and support during the past months. I couldn't have done it without you.

Tobias Hartmann
August, 2011

Contents

1 Introduction

2 Background information and related work
    Linux dynamic loader
    Shared library preloading
    The auditing interface
    Binary translation
    Static binary translation
    Dynamic binary translation
    Libdetox framework
    The ELF format
    ELF header
    Program and section header
    Symbol table
    Hash tables
    Relocations
    Dynamic loading of ELF files
    Thread local storage
    Access Models
    TLS relocations
    The GNU C Library
    Application startup
    Connection with the loader
    Dependencies on the loader
    Related work
    Bionic library
    rtldi - Indirect runtime loader
    Control flow integrity

3 Design and implementation
    seculoader
    Loading of libraries
    Symbol lookup
    Relocations
    Lazy binding
    Initialization of libdetox
    Initialization and control transfer to the application
    Dynamic loading of shared libraries
    The loader interface
    Memory protection
    Communication with libdetox
    Control flow transfer checks
    Call instructions
    Jump instructions
    Return instructions
    Optimizations
    seculoader
    Control flow transfer checks

4 Evaluation
    Security analysis
    Performance
    Memory consumption
    Problems and Limitations

5 Future work
    Extensions
    Optimizations

6 Conclusion

A Appendix
    A.1 Unit tests
    A.2 Benchmarks
    A.3 Settings
    A.4 Useful tools
    A.4.1 GDB
    A.4.2 Readelf
    A.4.3 Objdump
    A.4.4 Strace

Bibliography

1 Introduction

In times of increasing attacks on computer systems through malware and exploits, securing untrusted applications is a hard problem. Many partial approaches exist, such as Address Space Layout Randomization (ASLR), Data Execution Prevention (DEP), stack canaries, software-based fault isolation (SFI) and policy-based system call authorization. The problem is that each of these techniques protects only against specific attacks and lacks integration into the operating system. User-space virtualization and sandboxing systems try to ensure security by simulating parts of the system or restricting the access to resources.

The libdetox framework implements such a user-space sandbox based on dynamic binary translation. It relies on two main principles. The first principle is software-based fault isolation, which establishes a sandbox by redirecting the control flow of an application at some point in time and encapsulating all instructions from that point on. This enables checking and modification of individual instructions of the untrusted program and guarantees that no unchecked code is executed. The second principle is policy-based system call authorization, which is the interception of system calls invoked by the untrusted application. These system calls and their parameters are then matched against a policy. If a violation occurs, the program is terminated.

Like libdetox, however, most current implementations have some major drawbacks, some of which even introduce additional security risks. First, they heavily depend on the Linux loader and the loader API to gain control over the untrusted application. Because the loader is treated as a black box, direct attacks on the loader cannot be detected by the virtualization framework. Furthermore, there are problems with the loading process of the framework which lead to unchecked code being executed before the virtualization framework has even started. This is explained in more detail in Section 2.1.

Second, the Linux loader treats the framework binary like a normal library with no special memory protection or additional security features. The untrusted application may therefore be aware of the virtualization and could try to manipulate the framework. Further, the untrusted application is viewed as a sequence of instructions without any attempt to analyze its overall structure.

Because the Linux loader is a key component of every system, it is also an attractive target for attacks [15]. The loader has access to all the loaded objects and is responsible for the loading of libdetox. A successful hijacking of the loader could therefore circumvent the entire virtualization system.

In this thesis we present seculoader, a reimplementation of the Linux loader with security as a key aspect and tight coupling of the virtualization system and the loader. Figure 1.1 shows a simplified comparison of the memory layout of the Linux dynamic loader and the seculoader. The untrusted parts of the system are colored in red and the trusted ones in green. The Linux loader on the left side has the problems mentioned above: the libdetox library is loaded into the normal chain of loaded objects with no special memory protection, and unchecked code may be executed. In contrast, seculoader on the right side forms a unit with the libdetox framework, secured by memory protection. The trusted parts are not reachable through the loader API from the untrusted parts. This guarantees that all untrusted code is executed inside the libdetox sandbox and that the application is not aware of the virtualization.

Figure 1.1: Comparison between the Linux dynamic loader and the seculoader. (Left: linux loader, libdetox, main program, shared library. Right: seculoader, libdetox, main program, shared library. Legend: not trusted, trusted.)

Further, seculoader communicates detailed information about the loaded objects to the libdetox framework, which is then used to construct a fine-grained execution model. This model specifies which control flow transfers are valid and is enforced by the virtualization system at runtime.

The contributions of this thesis are (i) seculoader, a reimplementation of the Linux dynamic loader with a focus on security aspects and tight coupling with libdetox, (ii) additional guards which extend the libdetox framework and protect intra- and inter-module control flow transfers, and (iii) a complete evaluation of the system regarding security aspects, memory consumption and performance.

2 Background information and related work

This chapter provides the background information needed in this thesis. First, we present details about how the virtualization framework is loaded by the standard Linux loader. Then, we give an introduction to binary translation and the libdetox framework. Further, the ELF format and the handling of Thread Local Storage (TLS) are presented, followed by an overview of the GNU C Library. Finally, we present related work that served as a basis for this thesis or inspired it.

2.1 Linux dynamic loader

The Linux dynamic loader is part of the GNU C Library [10] and is responsible for loading and linking the shared libraries of an executable. Most virtualization systems, like DynamoRIO [3], HDTrans [26] and libdetox [20], depend on the loader to gain control over the application. As mentioned in Chapter 1, however, there are some problems with the loading of the virtualization system by the Linux dynamic loader. In the following we give a short introduction to two options which force the Linux loader to load the framework before the untrusted application. We then explain the problems associated with those options.

2.1.1 Shared library preloading

The LD_PRELOAD environment variable allows an arbitrary shared library to be loaded before all other shared objects of a program. In combination with the INITFIRST¹ flag this should instruct the Linux loader to load and initialize the virtualization framework before the untrusted application and enable it to gain control over the control flow. There are several problems associated with LD_PRELOAD:

1. The Linux loader handles the INITFIRST flag only for the last loaded object which sets the flag. Because LD_PRELOAD forces the loader to load libdetox first, any other object of the untrusted application which sets the INITFIRST flag is initialized before the virtualization platform has even started, and unchecked code is executed.
2. Shared libraries specified in the LD_AUDIT environment variable are loaded into a different namespace than the libraries loaded normally and are therefore not accessible through loader API functions like dl_iterate_phdr². This leads to (i) unchecked code being executed and (ii) missing information about these libraries for additional features like control flow transfer checks (see Section 3.3).
3. The GNU extension GNU_IFUNC introduces a new relocation type which needs to be resolved by the execution of application code. In combination with the DF_1_NOW or DF_BIND_NOW flags³ this results in untranslated code being executed during the relocation phase.
4. Since overriding of loader functions is possible, the untrusted application or one of the dependent shared objects may override functions of the Linux loader that the virtualization system uses, and thereby manipulate its behavior.
5. The Linux loader treats libraries loaded through LD_PRELOAD just like any other shared object, without any special protection. They are accessible through the loader API, which allows the application to notice the virtualization, retrieve information or even try to manipulate it.

Problems 2, 4 and 5 can be handled by overriding the corresponding loader functions in the virtualization framework and providing the needed functionality, but the remaining problems 1 and 3 cannot be solved by the virtualization framework. This list is incomplete; other vulnerabilities may exist. The underlying problem is that the Linux loader is not in the trusted computing base and is therefore treated as a black box. Control over the untrusted application is gained only after the loader has already finished all the startup processing like relocations and loading.

¹ This flag indicates that the initialization functions of this object should be run before those of any other object.
² dl_iterate_phdr iterates over all loaded objects and invokes a callback function for each object.
³ These flags instruct the loader to use eager binding for function relocations.

2.1.2 The auditing interface

The second way to load the virtualization framework is to use the auditing interface by setting the LD_AUDIT environment variable to the framework library. Like LD_PRELOAD, it forces the Linux loader to load the specified shared library before the main program, but in a different namespace. It further offers a so-called auditing interface which notifies the loaded library at special auditing checkpoints, for example when a new library is loaded or a symbol is resolved. This auditing interface could be used by the framework to retrieve additional information about the loaded objects for features like control flow transfer checks, instead of using the loader API, which may not supply complete information. Because the framework is loaded into a different namespace, it is also not reachable through the loader API by the untrusted application.
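To make the loader API discussed above concrete, the following minimal sketch enumerates the loaded objects with dl_iterate_phdr, the glibc function described in the footnote. The helper names count_object and count_loaded_objects are hypothetical and only serve this illustration; the sketch is not taken from libdetox or seculoader.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Callback invoked by the loader once per loaded object.
 * `data` points to an int counter owned by the caller. */
static int count_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    assert(count != NULL);

    /* The main program is reported with an empty name. */
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name : "(main program)";
    printf("%-40s base=%#lx phnum=%u\n", name,
           (unsigned long)info->dlpi_addr, (unsigned)info->dlpi_phnum);

    (*count)++;
    return 0; /* a non-zero return value would stop the iteration */
}

/* Returns how many objects the loader reports in the default namespace. */
int count_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(count_object, &count);
    return count;
}
```

Calling count_loaded_objects() from a program's main function prints one line per visible object. Objects loaded through LD_AUDIT live in a separate namespace and would not show up in such a walk, which is the information gap described in problem 2.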

This partly solves problem 5 of LD_PRELOAD explained in Section 2.1.1, but problems 1 to 4 remain. Additionally, this approach introduces a high overhead because a dynamic function is executed at every auditing checkpoint.

This shows that both approaches, LD_PRELOAD and LD_AUDIT, have major security problems. Other possible approaches, like using the ptrace debugging interface⁴ to inject libraries, could also be detected and subverted [32] by the untrusted application.

⁴ System call which can be used by one process to control and analyze another.

2.2 Binary translation

Binary translation (BT) describes a technique to translate binary code from a source to a target instruction set. In this introduction we focus on the special case where the source and the target instruction sets are equal and we are only interested in adding, removing or replacing instructions of the translated program.

Binary translation makes it possible to add new functionality to an existing program or patch it at runtime. It enables the user to enrich a critical application with additional safety and validity checks or to do a performance analysis on specific parts. Further, BT is often used for user-space process virtualization, where an application is executed inside a sandbox with an additional layer that checks all interactions with the outside. BT enforces this additional layer, which encapsulates the code, catches and redirects critical instructions and guarantees that the application is unable to break out of the framework.

We differentiate between two types of binary translation: (i) static, ahead-of-time translation and (ii) dynamic, just-in-time translation. In the following we give a short introduction to both types.

2.2.1 Static binary translation

Static binary translation performs the translation before the execution of the program starts. All the code of the source binary is translated at the same time and only once. After the translation phase, no more translations are performed at runtime. Static BT is used, for example, for porting a program from an old system to a new one without needing to recompile it.

The main advantage of static translation is that there is no runtime overhead and that the translation can be arbitrarily complex, since it happens before execution. A big problem remains with code that is not reachable at translation time and hence not discovered by the translator. For example, some parts of the code may only be reachable through indirect branches whose destination is not known at translation time. The same problem exists with self-modifying or dynamically loaded code.

2.2.2 Dynamic binary translation

Dynamic BT translates the code in blocks right before they are executed for the first time. Further executions of a block are redirected to the translation of this block. This functionality is implemented with a code cache which stores all the already translated parts. Figure 2.1 illustrates the design of a dynamic binary translator.

Figure 2.1: Design of a dynamic binary translator. (Components: source program, translator, code cache with translated blocks and a trampoline to an untranslated block, and a mapping from original blocks to translated blocks.)

The main advantage of dynamic BT is that it is not limited to statically reachable code, because the translation takes place at runtime, when the destination addresses are known. Further, the runtime system is able to adapt to phase changes of the translated program and to optimize depending on these characteristics.

The big disadvantage is the runtime overhead of the translation. As we will see in Chapter 4, most overhead is caused by indirect control flow transfers, which have to be resolved by a runtime lookup. Fortunately, this overhead is amortized in most applications because the translated code sequences are executed multiple times.

As already mentioned in the introduction, another problem with dynamic binary translation is the process of gaining control over the translated program. At some point in time the translator is initialized and starts the translation process by inspecting the control flow. It is important, even more so when used for security tasks, that this is done as early as possible to guarantee that no unchecked code is executed.

There are some common design decisions, like the code cache or keeping the stack of the program unchanged. The latter is important for virtualization, since the translated application should not be able to detect the virtualization by checking whether the stack changes. Some systems also instrument the code to detect hot code regions and then optimize it further by dynamically recompiling these parts.
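The interplay of code cache and mapping table described above can be sketched in a few lines. This is a toy model under stated assumptions: blocks are identified by a plain integer address, "translation" only reserves a cache slot, and the names translator, lookup and dispatch are hypothetical. A real translator would emit machine code and would typically use a hash table rather than a linear scan.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64

/* One mapping-table entry: source block address -> code-cache slot. */
struct mapping {
    unsigned src;
    int slot;
};

struct translator {
    struct mapping map[MAX_BLOCKS];
    size_t used;            /* number of blocks already in the code cache */
    unsigned translations;  /* how often a block had to be translated */
};

/* Returns the cache slot of an already translated block, or -1. */
static int lookup(const struct translator *t, unsigned src)
{
    for (size_t i = 0; i < t->used; i++)
        if (t->map[i].src == src)
            return t->map[i].slot;
    return -1;
}

/* Returns the code-cache slot for `src`, translating on first use only. */
int dispatch(struct translator *t, unsigned src)
{
    int slot = lookup(t, src);
    if (slot >= 0)
        return slot;            /* hit: execution stays in the code cache */

    assert(t->used < MAX_BLOCKS);
    slot = (int)t->used;        /* miss: "translate" into the next free slot */
    t->map[t->used].src = src;
    t->map[t->used].slot = slot;
    t->used++;
    t->translations++;
    return slot;
}
```

Dispatching the same source address twice triggers only one translation, which is why the translation overhead amortizes for code that runs repeatedly.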

2.3 Libdetox framework

This section gives an introduction to the libdetox framework. Libdetox is a user-space process virtualization framework based on a low-overhead, table-based dynamic binary translator. It guarantees the safe execution of untrusted binary code. Binary translation enables control of the executed instructions and adds security guards which guarantee the secure execution of the untrusted application. This additional layer between the application and the operating system enforces several rules:

1. Code cannot break out of the sandbox.
2. Code cannot manipulate the sandbox.
3. No untranslated or unchecked code is executed.
4. The application cannot detect the sandbox.

In Section 2.1 we showed that points 3 and 4 are not guaranteed when libdetox is executed with the Linux dynamic loader.

Figure 2.2: Overview of an application running in the libdetox sandbox. (Elements: application code such as main, strlen and mprotect inside the sandbox, the interposition framework with its policy, and the kernel.)

Libdetox is further based on the following two principles:

1. Software-based fault isolation: The framework establishes a sandbox in which the application code is executed and ensures that it cannot break out of it by attaching special guards using dynamic binary translation. Those guards guarantee, for example, that all executed code is translated and that system calls are redirected to the interposition framework.
2. Policy-based system call interposition: All system calls issued by the application are redirected to the interposition framework and checked according to a policy. This policy specifies which system calls are allowed, depending on the name, parameters and call location.
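The second principle can be illustrated with a small default-deny check. This is a sketch under assumptions, not libdetox's policy format: the rule table, the predicate fd_is_stdout_or_stderr and the function policy_allows are hypothetical, and the call location mentioned above is left out for brevity. Only the SYS_* constants come from the system headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Optional predicate over the six raw system call arguments. */
typedef bool (*arg_check)(const long args[6]);

struct rule {
    long nr;          /* system call number */
    arg_check check;  /* NULL means: any arguments are fine */
};

static bool fd_is_stdout_or_stderr(const long args[6])
{
    return args[0] == 1 || args[0] == 2;
}

/* Hypothetical example policy: writing to the terminal and exiting only. */
static const struct rule policy[] = {
    { SYS_write,      fd_is_stdout_or_stderr },
    { SYS_exit_group, NULL },
};

/* Returns true if the call matches a rule; everything else is denied. */
bool policy_allows(long nr, const long args[6])
{
    assert(args != NULL);
    for (size_t i = 0; i < sizeof policy / sizeof policy[0]; i++) {
        if (policy[i].nr != nr)
            continue;
        return policy[i].check == NULL || policy[i].check(args);
    }
    return false; /* default deny */
}
```

An interposition layer would call such a check before forwarding a system call to the kernel and terminate the program on a violation.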

Figure 2.2 shows an overview of the two principles. Libdetox provides a simple interface which allows a program to precisely control which code is executed inside of the sandbox. See Section for more information.

Libdetox intercepts the control flow of the program before any of its code is executed (for example by using LD_PRELOAD). The instructions of the binary are then processed in basic blocks (like if-else blocks or entire functions) and placed in a code cache. The execution flow of the program always stays in this code cache, which contains all the already translated blocks. For all branches which point to untranslated code, a trampoline is added that transfers control back to the translator, which then translates the block and resumes the program. A mapping table is used to convert addresses of the original binary to the corresponding addresses in the code cache. If further branches are encountered in the translation process, libdetox first checks whether they point to already translated code. If this is the case it does not translate them again; otherwise it adds a runtime trampoline. See Figure 2.1 for an overview.

For the translation of individual instructions of a basic block, multidimensional translation tables are used. These tables contain entries for all IA-32 instructions with information on how to translate them. Each byte of the instruction is used for the lookup in one of those tables. Libdetox starts with the first byte and uses it as an index into the first table. The entry then contains either information about the instruction or a pointer to the next table if the instruction is not yet complete. The last entry contains information like parameters and registers used by the instruction and a pointer to an action function. Figure 2.3 illustrates such a chain of translation tables.

Figure 2.3: Chain of translation tables. Each entry contains flags (parameters, registers), a pointer to the next translation table and an action function.

These action functions define how the instruction is translated. They are responsible for the actual translation and may alter, copy, replace or even remove the instruction and then add it to the code cache. The translation tables are created using a table generator with a high-level interface. This allows the programmer to specify in detail how individual instructions are translated by using predefined action functions or providing new ones. Some of the important predefined actions are:

- action_jmp: This action function translates a direct jump. First it transfers control to the fbt_check_transfer function, which performs a control flow transfer check. If the check succeeds, the target address is looked up in the mapping table and a control flow transfer to the block in the code cache is emitted.
- action_jmp_ind: This function adds a runtime lookup because the target address of the indirect jump is not known at translation time. At runtime the target address is then translated to the corresponding address in the code cache. Further, it adds a runtime control flow transfer check.
- action_call: This action function first checks whether the call is a PLT call by invoking the loader function sl_resolve_plt_call and, if this is the case, emits a direct branch to the resolved address. Otherwise it checks whether the call is valid and, if so, emits a normal branch to the translated target in the code cache.
- action_call_ind: Behaves like action_call but adds a runtime lookup and a runtime control flow transfer check because the target address is not available at translation time.
- action_ret: Emits code which translates the original return address on the stack to the corresponding target in the code cache at runtime.

The translator basically just iterates over the instructions, performs a lookup in the translation tables and then executes the corresponding action function.

An important aspect is that the stack of the translated program is not changed by libdetox. The translated call instruction still pushes the original address, and the return instruction then translates the original address to the corresponding target in the code cache. This is important for features like exceptions and debugging, which rely on the stack layout or on the return address, which would otherwise point into the code cache. Further, the program does not know that it is translated and is unable to notice it by examining the stack. Libdetox implements several optimizations which mainly target indirect control flow transfers.
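The byte-wise walk through a chain of translation tables can be sketched as follows. This is a simplified model, not libdetox's generated tables: the struct layout, the enum that stands in for action function pointers, and the names init_tables and lookup_action are assumptions made for the illustration. Only the four IA-32 opcodes used (0x90 nop, 0xE8 call rel32, 0xC3 ret, 0x0F 0x84 jz rel32) are real encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the action function pointer of a table entry. */
enum action { ACT_NONE, ACT_COPY, ACT_CALL, ACT_RET, ACT_JCC };

struct entry {
    unsigned flags;            /* operand information, unused in this sketch */
    const struct entry *next;  /* next table if more opcode bytes follow */
    enum action action;
};

static struct entry two_byte[256]; /* reached through the 0x0F escape byte */
static struct entry one_byte[256]; /* indexed by the first opcode byte */

/* Fills in a handful of entries; all others stay ACT_NONE. */
void init_tables(void)
{
    one_byte[0x90].action = ACT_COPY;  /* nop */
    one_byte[0xE8].action = ACT_CALL;  /* call rel32 */
    one_byte[0xC3].action = ACT_RET;   /* ret */
    one_byte[0x0F].next   = two_byte;  /* two-byte opcode escape */
    two_byte[0x84].action = ACT_JCC;   /* jz rel32 */
}

/* Follows the chain one opcode byte at a time.
 * *len receives the number of opcode bytes consumed, or 0 if the
 * input ended before the opcode was complete. */
enum action lookup_action(const unsigned char *code, size_t avail, size_t *len)
{
    assert(code != NULL && len != NULL);
    const struct entry *table = one_byte;
    for (size_t i = 0; i < avail; i++) {
        const struct entry *e = &table[code[i]];
        if (e->next) {          /* instruction not finished: next table */
            table = e->next;
            continue;
        }
        *len = i + 1;
        return e->action;
    }
    *len = 0;
    return ACT_NONE;
}
```

A translator loop would call the action selected this way for each instruction of a basic block. Operand decoding (ModRM bytes, immediates, prefixes) is deliberately omitted here.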
More information is available in [18] and [17].

2.4 The ELF format

The Executable and Linking Format (ELF) was originally published by the UNIX System Laboratories as part of the Application Binary Interface (ABI) [24]. Today it is the widely used standard file format for executables, object code, shared libraries and core dumps in many modern operating systems like Linux, Solaris and FreeBSD. ELF files are very flexible because their main structures are not bound to a specific platform or architecture. In this section we give a short introduction with a focus on the information relevant for the loading and linking process. Additional information can be found in the TIS specification [31].

22 10 2. Background information and related work The ELF format basically defines two views for each Dynamic Shared Object (DSO). The first one is the program header that describes segments which are memory regions with same permissions at runtime. This information is important for linking and relocation. The second view is the section header with more fine grained information mostly needed for debugging purposes and not necessarily available at runtime. In the following, we will provide more detailed information about the parts of an ELF file. Header Program header table.text.rodata....data Section header table Figure 2.4: ELF format ELF header When parsing an ELF file, the loader first consults the ELF header. It contains all important information like the architecture for which the file was build, the type (executable or library) and the offset of the section and program headers. Listing 2.1 shows the detailed layout of the ELF header. Listing 2.1: ELF header 1 # define EI_NIDENT (16) 2 3 typedef struct { 4 unsigned char e_ident[ EI_NIDENT]; /* Magic number */ 5 Elf32_Half e_type; /* Object file type */ 6 Elf32_Half e_machine; /* Architecture */ 7 Elf32_Word e_version; /* Object file version */ 8 Elf32_Addr e_entry; /* Entry point */ 9 Elf32_Off e_phoff; /* Program header offset */

23 2. Background information and related work Elf32_Off e_shoff; /* Section header offset */ 11 Elf32_Word e_flags; /* Flags */ 12 Elf32_Half e_ehsize; /* ELF header size */ 13 Elf32_Half e_phentsize; /* Program header entry size */ 14 Elf32_Half e_phnum; /* Program header count */ 15 Elf32_Half e_shentsize; /* Section header entry size */ 16 Elf32_Half e_shnum; /* Section header count */ 17 Elf32_Half e_shstrndx; /* String table index */ 18 } Elf32_Ehdr; Program and section header The program header describes segments, which then contain one or more sections. The header defines their boundaries and provides further information for the loader on how to map them into memory. Some entries in the program header have special meanings. For example PT INTERP specifies the loader which should be used for loading the file or PT DYNAMIC points to the dynamic section which holds important information for the linker. The section header then provides detailed information about the sections which are only reliable for the on-disk version of the file, although the section header may be also available at runtime. Important sections like the Global Offset Table (GOT) or the Procedure Linkage Table (PLT) are explained in the following sections. Figure 2.4 illustrates the two different views. See listing 2.2 and 2.3 for more information. Listing 2.2: Program header entry 1 typedef struct { 2 Elf32_Word p_type; /* Type */ 3 Elf32_Off p_offset; /* File offset */ 4 Elf32_Addr p_vaddr; /* Virtual address */ 5 Elf32_Addr p_paddr; /* Physical address */ 6 Elf32_Word p_filesz; /* File size */ 7 Elf32_Word p_memsz; /* Memory size */ 8 Elf32_Word p_flags; /* Flags */ 9 Elf32_Word p_align; /* Alignment */ 10 } Elf32_Phdr; 1 typedef struct { Listing 2.3: Section header entry 2 Elf32_Word sh_name; /* Name */

    Elf32_Word sh_type;      /* Type */
    Elf32_Word sh_flags;     /* Flags */
    Elf32_Addr sh_addr;      /* Runtime address */
    Elf32_Off  sh_offset;    /* File offset */
    Elf32_Word sh_size;      /* Size */
    Elf32_Word sh_link;      /* Link to another section */
    Elf32_Word sh_info;      /* Additional information */
    Elf32_Word sh_addralign; /* Alignment */
    Elf32_Word sh_entsize;   /* Entry size */
} Elf32_Shdr;

Symbol table

The symbol table is one of the most important structures of the ELF format. It contains information about the symbols defined and referenced in the file, such as the addresses of function and data definitions, their size and their visibility (see Listing 2.4). The table is mostly used by the runtime linker for relocations and symbol lookups, but also by debugging tools like GDB⁵.

Listing 2.4: Symbol table entry

typedef struct {
    Elf32_Word    st_name;  /* Name */
    Elf32_Addr    st_value; /* Value */
    Elf32_Word    st_size;  /* Size */
    unsigned char st_info;  /* Type and binding */
    unsigned char st_other; /* Visibility */
    Elf32_Section st_shndx; /* Section index */
} Elf32_Sym;

We differentiate between the static and the dynamic symbol table. The static symbol table contains information about all symbols referenced in the program. This includes all types of local symbols as well as global symbols and symbols imported from other DSOs. The dynamic symbol table, in contrast, contains only the exported and imported symbols. This is the bare minimum because these definitions are needed by the runtime linker to resolve references from and to other DSOs. A shared object may contain both tables or only one of them, depending on how it was compiled. If it contains only the dynamic symbol table we call it stripped. Table 2.1 shows the symbol table of an example application which uses the strlen and printf functions from libc. The dynamic symbol table therefore contains entries for these symbols with value zero, which means that they have to be resolved by the runtime linker.

⁵ GDB is a standard debugger provided by the GNU software system.
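The way the st_info byte of Listing 2.4 combines type and binding, and the way an unresolved import is recognized, can be sketched in a few lines of C. This is a minimal sketch and not code from seculoader: sym32_t is a local mirror of Elf32_Sym with fixed-width types, and the helper names sym_bind, sym_type and sym_is_undefined are hypothetical. The bit layout (binding in the upper four bits, type in the lower four) and the numeric constants follow the ELF specification.

```c
#include <stdint.h>

/* Local mirror of Elf32_Sym (Listing 2.4) using fixed-width types. */
typedef struct {
    uint32_t      st_name;
    uint32_t      st_value;
    uint32_t      st_size;
    unsigned char st_info;   /* binding (upper 4 bits) and type (lower 4 bits) */
    unsigned char st_other;
    uint16_t      st_shndx;
} sym32_t;

/* Numeric values as defined by the ELF specification. */
enum { BIND_LOCAL = 0, BIND_GLOBAL = 1, BIND_WEAK = 2 };
enum { TYPE_NOTYPE = 0, TYPE_OBJECT = 1, TYPE_FUNC = 2 };
enum { SHNDX_UNDEF = 0 };

/* Hypothetical helper: extract the binding from st_info. */
unsigned sym_bind(const sym32_t *s) { return s->st_info >> 4; }

/* Hypothetical helper: extract the type from st_info. */
unsigned sym_type(const sym32_t *s) { return s->st_info & 0x0f; }

/* Hypothetical helper: a symbol with section index UND is not defined in
 * this object and has to be resolved against another DSO. */
int sym_is_undefined(const sym32_t *s) { return s->st_shndx == SHNDX_UNDEF; }
```

Applied to an entry such as the one for strlen in the example application, sym_bind would yield GLOBAL, sym_type would yield FUNC, and sym_is_undefined would report that the runtime linker still has to resolve it.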

Num  Value  Size  Type    Bind    Vis      Ndx  Name
0:                NOTYPE  LOCAL   DEFAULT  UND
1:                NOTYPE  WEAK    DEFAULT  UND  __gmon_start__
2:                FUNC    GLOBAL  DEFAULT  UND  __libc_start_main@...
3:                FUNC    GLOBAL  DEFAULT  UND  strlen@GLIBC_2.0 (2)
4:                FUNC    GLOBAL  DEFAULT  UND  printf@GLIBC_2.0 (2)
5:                FUNC    GLOBAL  DEFAULT  UND  puts@GLIBC_2.0 (2)
6:   c      4     OBJECT  GLOBAL  DEFAULT  16   _IO_stdin_used

Table 2.1: Example of the dynamic symbol table

Hash tables

Hash tables are used to speed up the symbol lookup in a shared object. Instead of searching the entire symbol table for a definition, the loader uses a hash table for the lookup. There are generally two types of hash tables: the old, normal hash table and the faster GNU hash table, which also uses a Bloom filter⁶. See [31] and [2] for more information.

Relocations

Because the loader does not map the dynamic shared objects to a fixed address but uses a dynamic address instead, the files have to be relocated. This means that each symbolic reference needs to be replaced by the actual runtime address, which depends on the base address at which the DSO was mapped. For this purpose each object has a relocation table which contains the addresses and information on how to adjust them after the object is mapped into memory. Several different relocation types for function and data references exist, but here we only focus on the most common ones on the x86 architecture. There are two categories of relocations, called RELA and REL, which differ in the way they specify the addend. On x86 there are only REL relocations, where the addend resides at the address marked for relocation. This type is simple: it defines the address which has to be relocated, the type of the relocation and the index of the symbol which belongs to the address. Listing 2.5 defines the format.
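To make the REL format more concrete, the following sketch decodes the r_info word and applies two relocation types to a small byte buffer that stands in for a mapped object. It is an illustration under stated assumptions, not the seculoader implementation: rel32_t is a local mirror of Elf32_Rel, the helper names rel_sym, rel_type and apply_rel are hypothetical, and r_offset is treated as an offset into the buffer. The packing of r_info (symbol index in the upper 24 bits, type in the lower 8) and the numeric values of R_386_32 and R_386_RELATIVE follow the i386 ABI.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of Elf32_Rel (Listing 2.5). */
typedef struct {
    uint32_t r_offset;  /* location to patch (here: offset into the image) */
    uint32_t r_info;    /* symbol index (upper 24 bits) and type (lower 8) */
} rel32_t;

/* Numeric values of R_386_32 and R_386_RELATIVE from the i386 ABI. */
enum { REL_386_32 = 1, REL_386_RELATIVE = 8 };

uint32_t rel_sym(uint32_t info)  { return info >> 8; }
uint32_t rel_type(uint32_t info) { return info & 0xff; }

/* Hypothetical helper: apply one REL entry. With REL relocations the
 * addend is the 32-bit word already stored at the patched location.
 * sym_addr is the resolved address of the referenced symbol (unused
 * for R_386_RELATIVE). Returns 0 on success, -1 for unhandled types. */
int apply_rel(uint8_t *image, uint32_t base, const rel32_t *rel, uint32_t sym_addr)
{
    uint32_t word;

    memcpy(&word, image + rel->r_offset, sizeof word);  /* implicit addend */
    switch (rel_type(rel->r_info)) {
    case REL_386_RELATIVE:
        word += base;       /* base address + addend */
        break;
    case REL_386_32:
        word += sym_addr;   /* symbol address + addend */
        break;
    default:
        return -1;
    }
    memcpy(image + rel->r_offset, &word, sizeof word);
    return 0;
}
```

A RELA entry would carry the addend in an additional field instead of reading it from the patched location, which is the only difference between the two categories in this sketch.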
Listing 2.5: REL relocation

typedef struct {
    Elf32_Addr r_offset; /* Address */
    Elf32_Word r_info;   /* Type and symbol index */
} Elf32_Rel;

⁶ A filter that is used to determine if an element is a member of a set without the problem of false negatives.

As we already mentioned, libraries are loaded at dynamic addresses. The DSOs are therefore compiled as Position Independent Code (PIC) and other DSOs that reference symbols in these
libraries contain two additional tables: the Global Offset Table (GOT) and the Procedure Linkage Table (PLT). Symbols defined in one DSO are referenced indirectly through these tables from another DSO because the runtime addresses are not known at compile time. The loader then updates the table entries to point to the actual symbols. The principle is explained in the following sections.

Global Offset Table (GOT)

The Global Offset Table contains absolute addresses of all data (functions and variables) referenced in the program. If position independent code accesses global data, it first determines the address of the GOT, stores it in a register (usually %ebx) and then reads the entry at the corresponding offset, which contains the absolute address. This is possible because the Linux loader resolves the references and updates the GOT entries before the application gains control. The same principle is applicable to function references, but because symbol lookups are expensive this is done with lazy binding through the Procedure Linkage Table (PLT) if the BIND_NOW⁷ flag is not set.

⁷ The BIND_NOW flag indicates that all relocations for this object must be processed before the control transfer.

Procedure Linkage Table (PLT)

The Procedure Linkage Table is a special table because its entries are executable code blocks. Similar to the GOT, which redirects position-independent data references to runtime addresses, the PLT redirects position-independent function calls to runtime locations. A function call to the libc malloc function, for example, may look like this:

    call 498 <malloc@plt>

The call destination is an entry in the PLT (see Listing 2.6), where the first instruction is a jmp through an entry in the GOT (%ebx contains the address of the GOT). This entry is initially set to the address of the next instruction. This means that it points to 0x49e, which is a push of the corresponding offset in the relocation table. The following instruction jumps to the beginning of the PLT, where first some loader-dependent data which identifies the calling DSO is pushed and then control is transferred through the GOT to the loader.

Listing 2.6: Procedure Linkage Table

    458: ff b3 04 00 00 00    pushl  0x4(%ebx)     # DSO Data
    45e: ff a3 08 00 00 00    jmp    *0x8(%ebx)    # Loader resolve func.
    <malloc@plt>:
    498: ff a3 18 00 00 00    jmp    *0x18(%ebx)   # Next instruction
    49e: 68 18 00 00 00       push   $0x18         # Rel. offset
    4a3: e9 b0 ff ff ff       jmp    458 <printf@plt-0x10>

Thus the loader receives the two arguments pushed in the PLT: (i) the offset in the relocation table and (ii) a pointer to some data which identifies the DSO. It is now able to look at the entry in the relocation table, do a symbol lookup for the call destination (in this case the malloc function in the libc) and perform the necessary relocation. The relocation sets the entry in the GOT, which initially pointed to the next instruction, to the address that was just resolved. Further calls to this function are therefore redirected directly to the resolved function without another symbol lookup. This procedure is called lazy binding: a function reference is only resolved if necessary. In order for this to work, the loader has to initialize the 2nd entry of the GOT to point to some internal identification data and set the 3rd entry to the address of the resolve function.

Relocation types

We distinguish between function and data relocations. As already stated, function relocations are handled through lazy binding. The corresponding relocation type is R_386_JMP_SLOT, which points to the entry in the GOT to be fixed. Table 2.2 gives an overview of the important data relocation types.

Name            Description                     Code example
R_386_32        Non-GOT reference.              Static pointer to exported symbol.
R_386_PC32      PC-relative non-GOT reference.  -
R_386_GLOB_DAT  Reference through GOT.          Pointer to global symbol.
R_386_RELATIVE  Relocatable data reference.     Pointer to locally defined static data.
R_386_COPY      Copy relocation.                Reference to global data from main object.

Table 2.2: 32-bit x86 Data Relocation Types.

R_386_32 and R_386_PC32 are for data references which do not use the Global Offset Table, like a static pointer to a symbol defined in another DSO. In contrast, R_386_GLOB_DAT points to an entry in the GOT which needs to be relocated. For locally defined data references, which also need to be relocated because the DSO is not mapped at a fixed address, R_386_RELATIVE is used. Another special relocation type is the copy relocation R_386_COPY, which instructs the runtime linker to copy data from a shared object into the application memory space. Further information about the relocation types and on how the addresses are calculated can be found in the chapter "Relocation Sections" of the Linker and Libraries Guide [28]. There also exist relocation types for Thread Local Storage (TLS). These are explained in Section 2.5.
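The effect of lazy binding on an R_386_JMP_SLOT entry can be mimicked in portable C with function pointers. The sketch below is only an analogy under stated assumptions, not the PLT mechanism itself: the one-element array got stands in for a GOT slot, lazy_stub for the path through the resolver, resolve_symbol for the expensive symbol lookup, and strlen for the resolved library function. All of these names except strlen are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*func_t)(const char *);

static size_t lazy_stub(const char *s);

/* Toy GOT with a single slot. Like a real GOT entry before the first
 * call, it initially leads to the resolver path, not the function. */
static func_t got[1] = { lazy_stub };

static int lookups = 0;  /* counts how often the "symbol lookup" ran */

/* Stand-in for the loader's symbol lookup. */
static func_t resolve_symbol(size_t slot)
{
    (void)slot;
    lookups++;
    return strlen;
}

/* Stand-in for the resolver: look up the symbol, patch the GOT slot,
 * then forward the original call to the resolved function. */
static size_t lazy_stub(const char *s)
{
    got[0] = resolve_symbol(0);
    return got[0](s);
}

/* Stand-in for the PLT entry: always jumps through the GOT slot. */
size_t call_via_plt(const char *s)
{
    return got[0](s);
}

int lookup_count(void)
{
    return lookups;
}
```

The first call to call_via_plt goes through lazy_stub and triggers one lookup. Every later call reaches strlen directly through the patched slot, so lookup_count stays at one.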

Dynamic loading of ELF files

The dynamic section contains all information necessary to load and execute the file. This includes all the shared objects the application depends on, the important tables (symbol tables, hash tables) introduced in the last sections, initialization and finalization functions and much more. The loader uses this information to map all needed objects into memory, resolves the dependencies between them and finally transfers control to the program.

2.5 Thread local storage

Normally all threads of a process share the same address space. Thread Local Storage (TLS) enables the programmer to declare data as local to each thread. The declaration of thread-local variables depends on the compiler. With GCC⁸ a thread-local variable is declared by using the __thread keyword; for example, __thread int value; declares an integer variable local to each thread. In the following we explain how thread local storage is supported with the help of the loader.

⁸ GCC is the C compiler part of the GNU Compiler Collection.

Each shared object which uses TLS defines a block of memory in which all its TLS variables reside. On startup the loader combines the blocks of all loaded DSOs by copying the initialization images from the DSOs and creates one static TLS block. It assigns an offset to each of the blocks and generates an entry in the so-called Dynamic Thread Vector (DTV) that points to it. Figure 2.5 shows this layout for three loaded DSOs which use TLS.

Figure 2.5: Thread Local Storage Layout. (Labels: tlsoffset 1 to 3, thread pointer, static blocks 1 to 3, TCB, dynamically loaded TLS blocks 4 and 5, DTV with gen and dtv 1 to dtv 5; lower addresses to higher addresses.)

Each newly created thread is assigned its own copy of the entire TLS structure, which guarantees that the TLS variables are actually local to each thread. This structure is then accessible through
the thread pointer (residing in the %gs register) that points to the Thread Control Block (TCB). The instruction movl %gs:0, %eax, for example, loads the address of the TCB into %eax, which can then be used to access a TLS variable in the static block. For dynamically loaded DSOs (through dlopen) the loader allocates a new block outside of the static block and adds a DTV entry which points to it. These variables can only be accessed with runtime support of the loader because the addresses of these blocks in memory are not known at linking time. For this purpose there are several possible ways to access a TLS variable, depending on the type (static or dynamic) and the possible optimizations.

Access Models

We distinguish between static and dynamic TLS. Static TLS is the fastest but most restricted access model because it can only access variables which are part of the initial static TLS block. Dynamic TLS, in contrast, can also access variables which belong to shared objects loaded at runtime, whose TLS block is therefore not part of the static block. Because this needs support from the loader it is slower.

Static TLS:

Local Executable (LE): This model can only reference TLS variables of the main executable. No references to variables outside are possible, because the offsets in the main TLS block are statically calculated at linking time and the address of the variable is then calculated by simply subtracting this offset from the thread pointer (see Figure 2.5).

Initial Executable (IE): This model can only reference TLS variables which are part of the initial static TLS template. This means that new variables added by dynamically loaded code cannot be referenced. The address is calculated by subtracting the statically known offset in the TLS block from the address of the block supplied by the loader.
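The address calculation of the static models boils down to one subtraction, which the following toy model illustrates. It is a sketch under simplifying assumptions and not the real thread setup: thread_area_t, thread_pointer and static_tls_addr are hypothetical names, the static TLS area has a fixed size of 64 bytes, and the TCB is reduced to a single word located directly above the static blocks, as in Figure 2.5.

```c
#include <stddef.h>

enum { TLS_AREA_SIZE = 64 };

/* Toy per-thread area: the static TLS blocks lie directly below the
 * TCB, so variables are found at negative offsets from the thread
 * pointer. */
typedef struct {
    unsigned char static_tls[TLS_AREA_SIZE];
    void         *tcb_self;  /* first word of the toy TCB */
} thread_area_t;

/* Hypothetical helper: the thread pointer points to the TCB. */
unsigned char *thread_pointer(thread_area_t *t)
{
    return (unsigned char *)t + offsetof(thread_area_t, tcb_self);
}

/* Static TLS access: subtract the offset fixed at link time (LE) or
 * set by a relocation at load time (IE) from the thread pointer. */
unsigned char *static_tls_addr(thread_area_t *t, size_t tls_offset)
{
    return thread_pointer(t) - tls_offset;
}
```

No call into the loader is involved at access time, which is why these models are the fastest ones.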
Dynamic TLS:

Dynamic TLS needs runtime support from the linker to determine the address of the TLS block, since it is not known at linking time. The __tls_get_addr function supplied by the linker takes the module ID and the TLS offset of the variable and calculates the address.

Local Dynamic (LD): This model can only reference TLS variables that have a fixed offset in the TLS block of the corresponding DSO (static as well as dynamically loaded DSOs). Only the address of the TLS block has to be calculated at runtime.

General Dynamic (GD): This is the most general model and can therefore reference all TLS variables. Both the TLS block and the offset in the block have to be determined at runtime.
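The lookup performed for the dynamic models can be sketched as an indexed access into the DTV. The code below is a simplified model and not the loader's actual function: dtv_t, tls_index_t and toy_tls_get_addr are hypothetical names, the DTV has a fixed capacity, and a missing block simply yields NULL, where a real loader may allocate the block at that point.

```c
#include <stddef.h>

enum { MAX_MODULES = 8 };

/* Toy DTV: a generation counter plus one block pointer per module ID.
 * Module IDs start at 1, so slot 0 of the array stays unused. */
typedef struct {
    size_t         generation;
    unsigned char *block[MAX_MODULES + 1];
} dtv_t;

/* The argument pair described in the text: module ID and offset of
 * the variable inside the TLS block of that module. */
typedef struct {
    size_t module_id;
    size_t offset;
} tls_index_t;

/* Hypothetical stand-in for the address lookup: find the TLS block of
 * the module in the DTV and add the offset of the variable. */
void *toy_tls_get_addr(const dtv_t *dtv, const tls_index_t *ti)
{
    unsigned char *block;

    if (ti->module_id == 0 || ti->module_id > MAX_MODULES)
        return NULL;
    block = dtv->block[ti->module_id];
    if (block == NULL)
        return NULL;  /* block not set up in this toy model */
    return block + ti->offset;
}
```

In terms of the two models, LD needs one such lookup per module because the offsets inside the block are fixed, whereas GD has to obtain both values of the pair at runtime for every variable.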

More information with examples of the different access models is available in ELF Handling For Thread-Local Storage by Ulrich Drepper [7] and in the Linker and Libraries Guide [28].

TLS relocations

Because the loader creates the static TLS block, the offsets are not determined at linking time. Therefore relocations are needed to set the offsets and module IDs. Table 2.3 lists the most important ones and describes for which model they are used.

Name                Description
R_386_TLS_TPOFF     IE model: Set negative offset relative to TCB.
R_386_TLS_TPOFF32   IE model: Set positive offset relative to TCB.
R_386_TLS_DTPMOD32  LD and GD: Set module ID of object containing TLS symbol.
R_386_TLS_DTPOFF32  GD: Set offset in TLS block of symbol.

Table 2.3: 32-bit x86 TLS relocation types.

The local executable model needs no relocations because all information is already available at linking time. The R_386_TLS_TPOFF and R_386_TLS_TPOFF32 relocations are used for the initial executable model to determine the offset of the TLS variable relative to the thread control block. R_386_TLS_DTPMOD32 is used for the local and the general dynamic model because they both need the module ID of the dynamically loaded shared object to determine the address using __tls_get_addr. R_386_TLS_DTPOFF32 is only needed for the general dynamic model to get the offset relative to the TLS block of the symbol.

2.6 The GNU C Library

The GNU C Library is an implementation of the C standard library released by the GNU Project and implements important functionality used by many programs. In the following we present some detailed information about the libc relevant for the loading of shared objects.

Application startup

The libc supports the startup of programs and is responsible for the control transfer between the loader and the application. After the loader has finished its tasks it transfers control to the _start function defined in the program. This function then calls the libc function __libc_start_main, which is responsible for the following tasks:

- Perform security checks
- Initialize the threading system
- Register the fini handler supplied by the loader, which is used to transfer control back to the loader
- Call the main function with arguments

After the program has finished, control is transferred back to __libc_start_main, which then calls exit with the return value of main.

Connection with the loader

If a program uses, for example, the dlopen function to load a dynamic shared object at runtime, it does not directly invoke the corresponding function of the Linux loader. Instead, it includes the libdl library, which is part of the libc. This library supplies wrappers around the loader functions for the dynamic loading of libraries. The wrappers perform additional work like security and consistency checks of the arguments and then call the internal versions of the loader interface. The same holds for the threading library libpthread.

Dependencies on the loader

The libc has some direct data dependencies on the Linux loader. These are needed for different purposes like the application startup and thread local storage, but mainly to access the loader API. In the following we list and describe these data dependencies.

- _rtld_global: A global structure which is writable and contains information about the loading process like namespaces, TLS data and locks.
- _rtld_global_ro: Another global but read-only structure which contains mainly function pointers to the functions of the loader interface (see Section 3.1.8).
- _dl_argv: Pointer to the command line arguments of the executable.
- __libc_stack_end: The end of the stack.
- __libc_enable_secure: Security mode of the libc.

2.7 Related work

In this section we present other related work that served as a basis for this thesis or inspired it. On the one hand, there already exist many reimplementations of the Linux dynamic loader for different purposes, like the loader part of the Bionic Library⁹ [11] or the Rtldi Project [23] that try

⁹ Bionic is a reimplementation of the GNU libc developed by Google for the Android mobile software platform.
