A SYSTEM FOR PROGRAM EXECUTION IDENTIFICATION ON THE MICROSOFT WINDOWS PLATFORMS

Yujiang Xiong, Zhiqing Liu, Hu Li
School of Software Engineering, Beijing University of Post and Telecommunications, Beijing, China
xyjs2003@163.com

Abstract

This paper describes a system for identifying the execution of programs from their execution events. The system is based on a model of program execution for security purposes, and is implemented on the Microsoft Windows platforms using an operating system technique called DLL (Dynamic Linked Library) replacement. Compared to related work, this paper makes two key contributions. First, it describes a systematic way to retain all system API calls made by application programs, dynamically and in real time, on the Microsoft Windows platforms. Second, it presents a new model of program execution, in which the frequencies of program execution events are considered in addition to their patterns. Our experimental data indicate improved results.

Keywords: Program execution model, DLL replacement, DLL interception, API call footprint

1. Introduction

Network and system security has become a major concern now that Microsoft Windows systems are so widely used. Many security problems have been discovered in these systems. For example, many computer viruses target them, which may leak critical information or destroy important files. In addition, software running on the Windows platforms in unsecured environments is often required to provide certain security assurances. The ability to identify the execution of a program from its execution events would help in both cases. Liu has presented a model of program execution that can serve as a framework for program execution identification [1]. This paper presents a program execution identification system constructed for the Microsoft Windows platforms.
By using an operating system technique called DLL (Dynamic Linked Library) replacement, this system is able to efficiently intercept and retain all system API (Application Programming Interface) calls made by application programs in real time, alongside the normal execution of those programs. Our system employs a new model of program execution, in which system call footprints are used, together with their frequency factors, to identify the execution of programs. The use of frequency factors helps improve the accuracy of our program execution identification system.

2. Related Works

The importance of program modeling is widely recognized. Programs can be modeled either statically or dynamically, and the former is relatively easy. Trustworthy programs can be modeled using either program signatures or proof-carrying code (PCC) [2][3]. Untrustworthy programs, as static entities, are commonly modeled using program signatures as well. Unlike the case of trustworthy programs, where a MAC is generally used as the signature, signatures of untrustworthy programs are often obtained by selecting unusual, short code sequences within the program. For example, this technique is widely used in anti-virus software, in which a large set of such signatures for various computer viruses is collected and updated periodically with signatures for newly found viruses [4].

However, it is significantly more difficult to identify the execution of a program, owing to the dynamic and non-deterministic nature of program execution: a program must be executed in order to observe its dynamic behavior, and executing it more than once may reveal different behaviors. De Pauw et al. presented a model of object-oriented program execution, focusing on its visualization [5][6]. This model was constructed at the level of messages for object creation, deletion, method invocation, method return, and so on. Like De Pauw's model, most program models and systems [7][8] were designed for traditional purposes such as visualization, debugging, and optimization. Using program execution models to improve software reliability and assurance is a more recent effort, and our work shares this objective. In particular, the executable specification project at Microsoft Research tries to verify program execution conformance under a formal specification framework. However, specifications under that framework are articulated statically, while our system relies upon experimental results obtained through software testing.

3. Program Execution Model

3.1 Program Execution Event Specification

We use the term program execution events, or PEEs, to refer to all security-related behaviors that can be formally described in program execution. Examples of PEEs include reading a file and accessing a network connection. Liu's model is built upon PEEs and supports procedures for both program execution identification and program execution verification by matching PEEs against known results.
3.2 Identification and Verification of Program Execution

Program execution identification (PEI) allows the execution of a program to be identified from its execution events, by comparing the events with well-known patterns. Program execution verification (PEV) allows the execution of a program to be verified as conforming to some of its well-known behaviors. Both procedures are feasible only within the framework of a program execution model.

3.3 Program Execution Model

Symbolically, system call sequences and system call strings are all strings over an alphabet formed by the set of system calls on a given platform. Each element of a system call footprint (SCF), when the footprint is defined as a set of system call strings of a fixed length, can be viewed as a vector or a point in a space of n dimensions, where n is the fixed length. Thus, a system call footprint is a set of points in the n-dimension space, and Liu's program execution model defines important properties for system call footprints.

4. A New Model of Program Execution

4.1 Previous Model

The previous model consists of a representation of PEEs in a mathematical space and defines a number of properties of that representation. Mathematically, the set of system calls on a given platform forms an alphabet A, and system call sequences and system call strings are all strings over A. Because the system call strings have a fixed length, denoted n, each of them can be viewed as a vector or a point in a space of n dimensions. We can thus model a SCF as a set of points in the n-dimension space. This geometric representation of SCFs allows us to define a number of important operations and properties.

Given a SCF $S$, we define the size of $S$, denoted $|S|$, as the number of points in $S$. Obviously $|S|$ depends on n: the size of a SCF is generally larger when it is represented in a higher dimension space. The relative sizes of programs' SCFs generally represent the relative complexities of the programs' execution.
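As a rough illustration of this representation, the sketch below builds a footprint from a recorded call trace. It assumes that the system call strings are taken as every window of n consecutive calls, which the text does not state explicitly; the function name `footprint` and the sample trace are made up for illustration. Frequencies are kept alongside the points, although the size only counts distinct points.

```python
from collections import Counter

def footprint(calls, n=2):
    """Count every window of n consecutive calls in a trace.

    Assumption: a system call string is a sliding window of length n.
    The keys of the result are the points of the footprint; the values
    are how often each point occurred in the trace.
    """
    windows = zip(*(calls[i:] for i in range(n)))
    return Counter(windows)

# Hypothetical trace of API calls made by one program run.
trace = ["CreateFileW", "ReadFile", "ReadFile", "CloseHandle",
         "CreateFileW", "ReadFile", "CloseHandle"]

scf2 = footprint(trace, n=2)
scf3 = footprint(trace, n=3)

# The size |S| is the number of distinct points.
print(len(scf2), len(scf3))
```

On this toy trace the footprint has 4 points for n = 2 and 5 points for n = 3, in line with the remark that the size is generally larger in a higher dimension space.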
Given two SCFs $S_1$ and $S_2$, we say that $S_1$ conforms to $S_2$ if and only if $S_1$ is a subset of $S_2$, i.e. $S_1 \subseteq S_2$. In particular, we say that $S_1$ equals $S_2$, denoted $S_1 = S_2$, if $S_1$ conforms to $S_2$ and $S_2$ conforms to $S_1$.

Given two SCFs $S_1$ and $S_2$, we define the difference between $S_1$ and $S_2$, denoted $|S_1 - S_2|$ and $|S_2 - S_1|$ equivalently, as the number of points that are in one system call footprint but not in the other, i.e. $|S_1 - S_2| = |S_1 \cup S_2| - |S_1 \cap S_2|$. $|S_1 - S_2|$ depends on n: the difference is generally larger when represented in a higher dimension space.

Given two SCFs $S_1$ and $S_2$, we define the distance of $S_1$ and $S_2$, which is symmetric in $S_1$ and $S_2$, as the difference of $S_1$ and $S_2$ as a fraction of the total size of $S_1$ and $S_2$, i.e., $\frac{|S_1 - S_2|}{|S_1 \cup S_2|}$. This value, in the range [0,1], is a relative distance measurement of SCFs that is largely independent of n, and is thus generally preferred over the difference measurement.

Given two SCFs $S_1$ and $S_2$, we define the similarity of $S_1$ and $S_2$, which is also symmetric, as the number of points shared by both $S_1$ and $S_2$ as a fraction of the total size of $S_1$ and $S_2$, i.e., $\frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}$. This value, also in the range [0,1], is a relative similarity measurement of SCFs that is largely independent of n. Note that the distance and the similarity of two SCFs are complementary, i.e., $\frac{|S_1 - S_2|}{|S_1 \cup S_2|} + \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|} = 1$.

4.2 Our New Model

In the previous model, every element of a SCF has an appearance frequency, but this frequency is not used. We believe that the frequency factor of a SCF is important in improving the accuracy of program execution identification. We thus define the similarity of two SCFs with their frequency factors as follows. We are given two SCFs $S_1 = \{s_{11}, s_{12}, \ldots, s_{1i}, \ldots\}$ and $S_2 = \{s_{21}, s_{22}, \ldots, s_{2i}, \ldots\}$, with their respective frequency factors denoted $F_1 = \{f_{11}, f_{12}, \ldots, f_{1i}, \ldots\}$ and $F_2 = \{f_{21}, f_{22}, \ldots, f_{2i}, \ldots\}$. We define the following properties.

Frequency intersection: $F(S_1 \cap S_2) = \sum_{(S_1 \cap S_2)} \frac{\min(f_{1i}, f_{2i})}{\max(f_{1i}, f_{2i})}$. That is, for each element shared by $S_1$ and $S_2$, the smaller of its two frequencies is divided by the larger, and the quotients are summed.

Frequency union: $F(S_1 \cup S_2) = F(S_1 \cap S_2) + |S_1 - S_2|$. That is, the frequency intersection plus the number of elements that differ between $S_1$ and $S_2$.

Similarity of $S_1$ and $S_2$: $\sqrt{\frac{F(S_1 \cap S_2)}{F(S_1 \cup S_2)}}$. That is, the frequency intersection is divided by the frequency union, and we then take the root of the quotient. Taking the root magnifies the comparative difference of the result, which makes the result clearer. The effect can be seen in the experimental results in Section 7.

5. Program Identification on the Microsoft Windows Platforms

Many functions are called when a program is executed. Some of these functions are defined by users, and the others are supplied by the operating system. For instance, kernel32.dll on the Microsoft Windows platform provides many system functions that are essential to a program. When a program starts to run, all system DLLs and user-defined DLLs are loaded first. Figure 1 shows a logical view of program execution on the Windows platforms.

Figure 1: Logic view of program execution on the Windows platforms (labels in the figure: Function A, Function B, System DLL, Program, Function X, Function Y, User DLL)

As application programmers, we normally do not touch system DLLs. However, we need to modify them when we want to make certain customized enhancements to the system. A further benefit of modifying them is enforcement, in the sense that user applications cannot bypass the customized enhancements. This technique is commonly known as DLL replacement, and is discussed briefly next [9].

If we intercept the system APIs residing in system-supplied DLLs, we can insert our own code to add functionality, such as logging system API calls for the purpose of program execution identification. Hooking the APIs in program executable files is a feasible way to do so, but it is less efficient, because the import table of each program must be modified. Another way is Detours, which performs binary interception of Win32 functions; it shares the same problem as the API hook approach. We prefer DLL replacement, in which the system DLLs of interest are replaced with DLLs we define ourselves. We instruct programs to load our DLLs first, and these may invoke the original system DLLs when necessary. With DLL replacement, a program calls our function first, and execution then jumps to the original function. We are thus able to insert code that logs a system API call whenever a program makes it. In other words, we can trace all API calls.

6. System Implementation

Our program execution identification system first provides the ability to log system API calls. We implement system API logging as follows:
(1) Define our own versions of kernel32.dll and ws2_32.dll to intercept the system API calls of interest.
(2) Modify the OS registry (with regedit) so that the DLLs are loaded properly.
(3) Rename the DLLs in the system32 directory to work with applications.

7. Experimental Results

We have used the implemented system to produce SCFs and to conduct PEI on a computer running Microsoft Windows XP; this section shows our results. We use the following three programs in our experiments: the media player, the 3D pinball program, and the winmine program, all of which are supplied with Windows XP. We executed each program ten times, obtained their Individual SCFs (ISCFs), and then assembled them collectively into their Full SCFs (FSCFs), as discussed in [1]. We show their SCFs in a 2-dimension space.

Figure 2: SCFs of media player
Figure 3: SCFs of pinball
Figure 4: SCFs of winmine

Figures 2, 3 and 4 show the SCFs for the three programs. We can note from these figures that executing the same application typically produces similar SCFs. In addition, we also show API logs in the form of histograms. Figures 5 and 6 are two histograms of the media player. Figures 7 and 8 are those of the pinball.
Figures 9 and 10 are those of winmine. We can see that different programs produce traces distinct from each other, while the traces of the same program generally conform to each other.

Figure 5: Histogram of media player-1
Figure 6: Histogram of media player-2
Figure 7: Histogram of pinball-1
Figure 8: Histogram of pinball-2
Figure 9: Histogram of winmine-1
Figure 10: Histogram of winmine-2

After the SCFs were produced, we performed the PEI procedure on each of the ISCFs against the three FSCFs. Table 1 shows similarity measurements based upon Liu's similarity algorithm. In comparison, Table 2 shows similarity measurements based on our new algorithm with the frequency factor. We can see from the tables that our new algorithm helps distinguish executions of different programs more clearly and accurately.

Table 1: Similarity (n = 2) based upon Liu's algorithm

ISCF        pinball   winmine   wmplayer
pinball0    0.98      0.40      0.34
pinball1    0.95      0.38      0.33
pinball2    0.90      0.39      0.32
winmine0    0.40      1.00      0.18
winmine1    0.32      1.00      0.14
winmine2    0.40      0.80      0.14
wmplayer0   0.44      0.22      0.81
wmplayer1   0.43      0.23      0.76
wmplayer2   0.42      0.20      0.80

Table 2: Similarity (n = 2) based upon our new algorithm with frequency factor

ISCF        pinball   winmine   wmplayer
pinball0    0.97      0.08      0.04
pinball1    0.96      0.11      0.05
pinball2    0.95      0.07      0.04
winmine0    0.07      1.00      0.01
winmine1    0.05      0.94      0.02
winmine2    0.06      0.85      0.01
wmplayer0   0.05      0.02      0.82
wmplayer1   0.04      0.02      0.78
wmplayer2   0.05      0.01      0.70
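The two similarity measurements compared in Tables 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: footprints are assumed to be dictionaries mapping each system call string to a positive frequency, the root in the new similarity is assumed to be a square root, and all function names and sample data are hypothetical.

```python
from math import sqrt

def set_similarity(s1, s2):
    """Previous measure: |S1 ∩ S2| / |S1 ∪ S2|, frequencies ignored."""
    a, b = set(s1), set(s2)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def frequency_similarity(s1, s2):
    """New measure: root of frequency intersection over frequency union.

    Assumption: the root is a square root.
    """
    shared = set(s1) & set(s2)
    differing = len(set(s1) ^ set(s2))  # points in one footprint only
    f_inter = sum(min(s1[e], s2[e]) / max(s1[e], s2[e]) for e in shared)
    f_union = f_inter + differing
    return sqrt(f_inter / f_union) if f_union else 1.0

def identify(iscf, fscfs, similarity):
    """Return the name of the full footprint most similar to iscf."""
    return max(fscfs, key=lambda name: similarity(iscf, fscfs[name]))

# Hypothetical footprints with n = 2: call pairs mapped to frequencies.
iscf = {("A", "B"): 10, ("B", "C"): 1, ("C", "A"): 4}
fscfs = {
    "prog_a": {("A", "B"): 9, ("B", "C"): 1, ("C", "A"): 5, ("C", "D"): 1},
    "prog_b": {("A", "B"): 1, ("B", "C"): 50, ("C", "D"): 2},
}

for name, fscf in fscfs.items():
    print(name, set_similarity(iscf, fscf), frequency_similarity(iscf, fscf))
print(identify(iscf, fscfs, frequency_similarity))
```

On this toy data the set-based similarity is 0.75 against prog_a and 0.5 against prog_b, while the frequency-based similarity is about 0.85 and 0.24 respectively, so the gap between the matching and non-matching program widens once frequencies are taken into account.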

It can be seen from the tables that the previous algorithm does yield the similarity of programs, and that the similarity between SCFs is high for the same program and lower for different ones. However, the difference is not that pronounced. For instance, the minimum similarity between a wmplayer ISCF and the wmplayer FSCF is 0.76, while the maximum similarity is 0.34 between wmplayer and pinball, and 0.18 between wmplayer and winmine. As a result, there may be confusion in identifying executing programs in some circumstances. The distinction between the similarities is clearer once the new algorithm is adopted, since the frequency factor is of great importance to the SCFs of programs. The similarity between SCFs remains high for the same program but is much lower for different ones. For instance, the minimum similarity between a wmplayer ISCF and the wmplayer FSCF is 0.70, while the maximum similarity is 0.05 between wmplayer and pinball, and 0.02 between wmplayer and winmine. The separation is so clear that executing programs can be identified easily. We can thus adopt the following rough identification thresholds: 0.90 for pinball, 0.80 for winmine, and 0.70 for wmplayer.

8. Discussions and Conclusions

In this program execution identification system, we have successfully replaced more than 100 system DLL functions. With the replacements installed, the system runs smoothly, and many programs run without much difference from an ordinary Windows system, except for some slowdown in performance that is barely noticeable without prior knowledge. As an example of the effectiveness of this system, it produced a full system API trace of a well-known computer virus when the system was accidentally infected during testing. In summary, this system is a success. However, certain issues still need to be resolved before it can be made into a product, which is our ultimate goal.
One of the key issues is that the Windows operating system checks the signature of each system DLL file every time the system boots. To deal with this issue without acquiring signatures for our own versions, we have to put the original DLLs back when shutting down the system, which makes things more complicated. Another critical issue is how to construct better SCFs. We currently use substrings of length 2 as the definition of SCFs; this definition seems reasonably accurate but is not yet perfect. We want to improve it further by finding a better approach to characterizing program execution, thus making PEI and PEV more accurate.

In conclusion, SCFs provide a compact and convenient way to formally specify the security-related execution behaviors of programs. The work described in this paper makes two key contributions. First, we have successfully implemented PEI on the Microsoft Windows platforms using the technique of DLL replacement. Second, we have developed an enhanced model of program execution, in which the frequency factor is used in SCFs for the formal specification of program execution events.

References

[1] Z. Liu, A Model of Program Execution for Security Purposes, in Proceedings of the 8th IASTED International Conference on Software Engineering and Applications, Cambridge, MA, November 2004.
[2] G. C. Necula and P. Lee, Safe Kernel Extensions Without Run-Time Checking, OSDI '96.
[3] G. C. Necula, Proof-Carrying Code, POPL '97.
[4] J. O. Kephart and S. R. White, Measuring and Modeling Computer Virus Prevalence, in Proceedings of the 1993 IEEE Computer Society Symposium on Research in Security and Privacy, Oakland, California, May 24-26, 1993, pages 2-15.
[5] W. De Pauw, R. Helm, D. Kimelman, and J. Vlissides, Visualizing the Behavior of Object-Oriented Systems, in Object-Oriented Programming Systems, Languages, and Applications Conference, pages 326-337, 1993.
[6] W. De Pauw, D. Kimelman, and J. Vlissides, Modeling Object-Oriented Program Execution, ECOOP '94 Conference.
[7] R. Snodgrass, A Relational Approach to Monitoring Complex Systems, ACM Transactions on Computer Systems, 6(2):157-196, May 1988.
[8] J. Domingue, Compressing and Comparing Metric Execution Spaces, in INTERACT '90, pages 997-1002, Elsevier Science Publishers B.V. (North Holland), 1990.
[9] J. Richter, Windows Core Programming (Machine Building Press, Beijing, China, May 2000).