Building a Reactive Immune System for Software Services


Tobias Haupt
January 24, 2007

Abstract

In this article I summarize the ideas and concepts of the paper "Building a Reactive Immune System for Software Services" by Sidiroglou, Locasto, Boyd and Keromytis from Columbia University, in which the authors propose a way to treat software bugs that may not be foreseen during development and testing. These may be "software failures, ranging from remotely exploitable vulnerabilities to more mundane bugs that cause abnormal program termination [...] or other recognizable bad behaviour" [SLBK05]. Various software tools are used to create an environment in which an application can be monitored. Whenever abnormal program behaviour is observed, the application can be healed with the help of an instruction-level emulator and special exception-handling methods, so that future use of the application should then be possible without running into the same problem. The paper discusses the structure of such a reactive immune system by demonstrating a prototype implementation. Focus is also put on performance issues resulting from emulating parts of the application.

Contents

1 Introduction
2 Basics
3 Reactive Immune System
  3.1 Overview
  3.2 Monitoring the application
  3.3 Selective Transactional EMulation (STEM)
  3.4 Types of treated errors
    3.4.1 Division by zero
    3.4.2 Memory dereferencing
    3.4.3 Buffer overflow (see also [TK02])
    3.4.4 Denial of service
  3.5 Error handling
  3.6 Implementation of STEM
4 Evaluation
  4.1 Attacks, exploits
  4.2 Error virtualization
  4.3 Performance
5 Related Work
6 Discussion
7 Conclusion

1 Introduction

There are different approaches to avoiding failures in software systems, but even today, when well-trained software developers work together on large projects, there are still bugs in these systems that nobody finds until they are, for example, exploited by attackers or lead to critical errors. This is most dangerous in the domain of (web) services, which have to be highly available or deal with valuable private data. The current approaches to avoiding vulnerabilities in these systems can be classified into four categories:

Proactive approaches: Selection of safe programming languages and other development tools and methodologies to make the software as dependable as possible during development.

Debugging: Analysis after faults have happened to determine the error in the source code.

Runtime protection: Prevent faulty applications from accessing the underlying system and valuable data by providing a safe environment for execution (sandboxes, emulators).

Fault tolerance: Multiple instances of the application agree on the result of an operation, so that an error in, or an attack on, a single instance can be tolerated.

In the paper the authors describe a reactive approach: its aim is to react to the occurrence of previously unseen errors by trying to fix the error and ensuring safe execution of the erroneous operation. The backbone of the system is a so-called Observe Orient Decide Act (OODA) feedback loop. At first the application is observed by the operating system. Whenever a failure occurs, a recovery mechanism is invoked to handle it: the mechanism chooses and extends the region of code that will be emulated in the future by altering the application's source code, recompiling, and restarting it. The system is capable of running unemulated and emulated code simultaneously because the emulator is integrated into the source code of the application.
Another main idea in the paper is so-called error virtualization, a method to extend the error-handling mechanisms of the application by "retrofit[ting] an exception catching mechanism onto code that wasn't explicitly written to have such a capability" [SLBK05]. The domain studied in the paper is server applications, because in general those applications have higher requirements in terms of reliability. Such applications, e.g. the Apache web server, process input streams that arrive over the network. Therefore it is easy to replay the sequence of events that led to an error, easier than, for example, in user-oriented applications with graphical user interfaces. The experiments in the paper show that the error virtualization and emulation techniques make applications considerably more dependable, but have a deep impact on performance. Future work will focus on this issue by fine-tuning the whole system, especially the work of the emulator.

2 Basics

Server applications like web or database servers have very strong requirements regarding availability and security. Often these applications are exposed to arbitrary input from any number of different clients, so bugs and vulnerabilities are a serious threat. Attacks that lead to critical errors, like execution of dangerous code through buffer overflows, or simply to denial of service, have to be avoided at all costs. Therefore server applications normally check all input and provide error-handling routines for invalid input streams. Nevertheless, unforeseen boundary conditions may still occur and result in vulnerabilities if they are discovered and misused by attackers. It often takes time to provide a suitable bug fix, and even more time until everybody has updated the vulnerable application. In a normal scenario an attacker can use this time to intrude into other instances of the same version of the vulnerable application on different servers and exploit the bug there. It is most dangerous if the attacker can get access to secret data about other users, the system, or the company. Attackers usually need more than one attempt to figure out how to construct a dangerous input sequence, and this is the point at which the approach described in the paper comes into play: the first abnormal behaviour of the application triggers a sequence of actions to prevent the same error in the future.

3 Reactive Immune System

3.1 Overview

Figure 1: Immune system overview [SLBK05]

Figure 1 shows an overview of the reactive immune system proposed by the authors. The system is separated into three main sections. On the production server the application runs with several kinds of sensors attached that can detect different kinds of errors. On a second server an instrumented instance of the same application is running; tests can be performed on this instance to obtain a vaccine for an error that occurred in the application on the production server. This is done by replaying the sequence of input vectors that potentially led to the previous error. The third part is an automated testing environment where knowledge about types of errors, and techniques to avoid them, can be applied to the testing instance of the application.

3.2 Monitoring the application

In the paper, two types of sensors are described. The first approach is based on the operating system the application is running on: whenever an abnormal program termination occurs, the operating system stops the application and creates a core dump file with the type of failure and a stack trace.
The search for the exact point of failure in the code can then start at the last called function and continue along the application's call graph. The second approach makes use of a so-called honeypot server that is only visible to attackers who scan the network for available server instances; normal requests are not processed by this server. In this instance of the application one can instrument the parts that may be vulnerable to a specific type of attack with an instruction-level emulator that checks every single instruction before it is actually executed. In such an environment it is much easier to extract the input that led to the failure and afterwards replay the attack. Furthermore, performance is not a critical factor on a server that does not process normal service requests.

3.3 Selective Transactional EMulation (STEM)

The recovery mechanism uses "an instruction-level emulator, STEM, that can be selectively invoked for arbitrary segments of code" [SLBK05]. Code that is in some way error-prone (as shown by previously detected errors in that region) can be emulated by STEM. STEM runs single instructions on a virtual processor and can therefore handle errors in a way that does not affect the real execution environment. Snapshots of the internal state are copied to the real environment whenever an emulated part of the application completes successfully. Should an instruction provoke an error, STEM handles the failure by discarding all changes to memory and processor registers: it simply throws away the internal state of the virtual environment. This procedure explains the "transactional" in the name of the emulator: the emulated part of the program is treated like a single operation that either succeeds without fault or has no effect on processor and memory.

3.4 Types of treated errors

Depending on the sensor that detected the fault, the emulator knows different methods to prevent future failures in the emulated code. Some examples mentioned in the paper are listed below.

3.4.1 Division by zero

The emulator only needs to check the operand of a div instruction.

3.4.2 Memory dereferencing

Any memory access has to be checked for whether the addressed page is inside the address space of the process. This can be done using the mincore() system call.

3.4.3 Buffer overflow (see also [TK02])

A very serious threat to server software is the buffer overflow and the attack methods derived from it. Buffer overflows occur when a large string is copied to a fixed-size memory buffer by an unsafe C function that does not check the boundaries of the target buffer.
Whenever an input string from a client is handled directly by such an unsafe function, without checking the length and content of the string, an attacker can force a buffer overflow by providing malicious data.

Figure 2: Stack layout [TK02]

Figure 3: Operation of strcpy [TK02]

Figure 2 shows the stack layout of a function compiled by gcc, while figure 3 shows the effect on that stack of a strcpy call with too large a string as its parameter. As can be seen, such a call may overwrite the return address of the function and thus cause the instruction pointer to point into the stack. An attacker can construct the string so that it contains executable code and the overwritten return address points to that code, leading to execution of the injected code with the permissions of the running process. STEM is able to detect buffer overflows by tagging the available buffer memory with one extra byte and controlling memory access to this very byte; writing over the boundaries of a fixed-length buffer can be detected this way.

3.4.4 Denial of service

Some attacks aim to force a denial of service by requesting an algorithmically complex operation; a very simple example is a program that runs into an infinite loop because of malicious input. Denial of service can also be achieved by coordinating multiple clients to request the service at the same time: the server cannot distinguish requests from attackers from normal requests and will therefore deny service to a number of incoming requests because it is overloaded. STEM can also prevent such attacks by counting the number of executed instructions and aborting once a predefined threshold is exceeded.

3.5 Error handling

"Upon detecting a fault, our recovery mechanism undoes all memory changes and forces an error return from the currently executing function. To determine the appropriate error return value, we analyse the declared type of the function." [SLBK05] Based on heuristics, an error return value is generated from the declared type of the function that may crash; for example, -1 is returned when the type is int.
More difficult to determine is the error value for pointer return types or value-return parameters. Depending on the code of the calling function, one might need to expand the emulated region of code to include that function, because simply returning a NULL value is not always useful and may lead to further errors. The evaluation shows that these heuristics work remarkably well, but future work should analyse the source code more precisely and may even ask the programmer to define a common error-code convention.

3.6 Implementation of STEM

STEM is implemented as a C library that defines special functions which can be inserted into the source code of the application to be emulated. It is possible to switch the emulator on for only a part of the program, for example for a function that handles input from the client. This is beneficial because only parts of the application need to run inside the emulated environment rather than the whole application, which results in lower performance costs. The downside is that, at least at the moment, the sources of the application are needed in order to insert the special emulator commands that start and stop the emulation.

Figure 4: Inserting the emulator in the method foo() [SLBK05]

Figure 4 shows an extract of the source code of an application with the required calls to initialise, start, stop and terminate the emulator inside a single function foo(). The only instruction that is emulated in this example is the incrementing of a local variable. The first call, emulate_init(), moves all the program state into a data structure for the emulator; this structure represents the virtual environment needed by the emulator. emulate_begin() starts the emulation process by obtaining the address of the first emulated instruction. emulate_end() stops the emulation, and if no error occurred during emulation, emulate_term() copies the virtual environment back to the real environment, after which execution continues under normal conditions. In case of an error, the emulator returns to the original program state from before the emulation began.

4 Evaluation

A large part of the original paper is about the evaluation of the presented approach. Three main points are addressed. First, the usability of the system against real-life attacks in a production environment.
Second, tests of whether the idea of error virtualization and the slicing-off of functionality in vulnerable methods works in practice. The last point is the performance of fully and partially emulated systems.

4.1 Attacks, exploits

One example of a real attack is the Apache server and the Apache-Scalp exploit, which "takes advantage of a buffer overflow vulnerability based on the incorrect calculation of the required buffer sizes for chunked encoding requests" [SLBK05].

The Apache server was inserted into the immune-system environment, and the system had no knowledge of the vulnerabilities inside the server application. Appropriate sensors were used to detect errors in the software, as described in the previous sections. During the experiment, the selective emulator was attached to the vulnerable method, successfully recovering from the attack and serving subsequent requests.

4.2 Error virtualization

To handle errors, STEM slices off the functionality of emulated methods by returning predefined error values. The question was whether an application could keep working without the missing functionality and would not crash. The authors again tested the Apache server, by automatically inserting early return statements into leaf methods (methods that do not call other methods). A web-service performance measurement tool was used to test the altered server application. The result was that aborting a method did not influence the test in 139 of 154 cases. The question remains whether the functionality of such applications is still correct with regard to security: missing functionality may, for example, result in missing authorization checks.

4.3 Performance

The main performance issue is the emulation of parts of, or the whole, application. To measure this, the application was tested with a performance measurement tool while different areas of the code were emulated.

Figure 5: Performance testing by emulating different parts of the Apache application [SLBK05]

Figure 5 shows the number of handled requests per second in relation to the number of threads for the different clients that were running. The graph at the bottom shows a complete emulation of Apache inside a different emulator than STEM, the Valgrind emulator. In contrast to STEM, Valgrind is a high-performance emulator that has been optimised in many ways, so full emulation is much faster with Valgrind than with STEM. However, partial emulation with STEM is still faster than full emulation with Valgrind. Future work has to optimise the performance of STEM, for example by applying well-known techniques like caching translated instructions. Other ways to improve performance aim at minimising the size of the emulated regions in the application's code; more intelligent mechanisms to detect the origin of a fault in the code may be applied.

5 Related Work

Valgrind is "a program supervision framework that enables in-depth instrumentation and analysis of IA-32 binaries without recompilation. Valgrind has been used [...] to implement instruction set randomization" techniques to protect programs against code-insertion attacks (see [SLBK05] and [BAF+03]). There are other approaches to making applications safer and more resistant against attacks, such as sandboxing or virtual machines; often these approaches also rely on safe languages like Java. The approach in this paper, in contrast, focuses on an unsafe language like C, and not only prevents errors in the application but also tries to repair the vulnerable application by automatically providing a kind of bug fix. It is usable in production environments at runtime.

6 Discussion

The only weak point of the paper is that its structure is not clear enough: some things are described in more or less detail in multiple sections, which sometimes makes it hard to read. The authors put a lot of focus on the evaluation and testing of their idea; they mention promising results but do not ignore the problems that occurred. In my opinion the approach will lead to a suitable environment for safer software that can handle errors automatically by itself.
I think that such systems will not work without some human interaction. Most of the approaches suffer from a lack of knowledge about the application they work with; this may lead to serious threats, too, and is also the reason why the authors plan to create a framework for correctness testing. During my presentation there were some questions about real performance numbers and how to interpret them. The system was capable of detecting and healing about 88% of the bugs inside the application, bugs it did not know at the beginning of the test. Obviously, the authors did know the attacks that exploited vulnerabilities of the application and tested the system by running those attacks against it. So, in general, one can say that the system does heal unknown bugs and does prevent unknown attacks. In any case, the system needs knowledge about the kinds of bugs that may occur: appropriate sensors have to be provided that are capable of detecting errors like code injection through buffer overflow vulnerabilities. Another question was how one can figure out exactly which input led to a detected error. In a production environment this is very difficult because one has to take into account multiple clients and their request strings, so it will always be hard to isolate the actual sequence of inputs that triggered the error. The most convincing method is the honeypot server that only attackers will track down inside the network. In this way, the malicious sequence of inputs can be isolated from normal inputs, and the testing and analysis can determine the size of the emulated region in the code by replaying these inputs until the application no longer crashes. Because the performance of the honeypot server is not relevant, one might even run the whole application inside the emulator and check every single instruction for different kinds of errors.
In this environment it is easy to provide a vaccine that, for example, implements array bounds checking to prevent buffer overflows. The question of how large an input sequence the immune system needs in order to replay the error and find a cure for it is hard to answer; the authors do not say much about this issue in the paper. There might be different strategies to minimise the input. One might be to enlarge the amount of replayed input step by step until the error shows up again. In the scenario with the honeypot server there will probably be only one client that tries to attack the application, and a more intelligent analysis algorithm may look for patterns in the client's input vector to isolate the dangerous sequence, for example a series of slightly changed requests aimed at constructing a successful exploit.

7 Conclusion

It has been shown that there are ways to cope with unknown errors even in larger software applications. Automatic environments can be set up to ensure the quality and correctness of server applications on the internet. The greatest problem is still the performance of such applications, which is compromised by the use of emulators. A lot of investigation remains to be done regarding the correctness of self-altering application systems, so that these alterations do not merely lead to more bugs.

References

[BAF+03] E. G. Barrantes, D. H. Ackley, S. Forrest, D. Stefanovic, and D. D. Zovi. Randomized instruction set emulation to disrupt binary code injection attacks. In 10th ACM Conference on Computer and Communications Security (CCS), October 2003.

[SLBK05] Stelios Sidiroglou, Michael E. Locasto, Stephen W. Boyd, and Angelos D. Keromytis. Building a reactive immune system for software services. In USENIX Annual Technical Conference, pages 149-161, 2005.

[TK02] T. Toth and C. Kruegel. Accurate buffer overflow detection via abstract payload execution. In Proceedings of the 5th Symposium on Recent Advances in Intrusion Detection (RAID), October 2002.