WDS'06 Proceedings of Contributed Papers, Part I, 146-150, 2006. ISBN 80-86732-84-3 MATFYZPRESS

First Steps to Automated Driver Verification via Model Checking

T. Matoušek
Charles University Prague, Faculty of Mathematics and Physics, Prague, Czech Republic.

Abstract. The paper summarizes the current state of our work on verifying Windows kernel drivers via model checking. Our goal is to implement a tool that extracts verification models from driver source code and from specifications of the kernel environment written in the DeSpec language, which we introduced previously. The DeSpec language enables specifying the kernel environment as well as the rules imposed on drivers. The DeSpec Model Extractor tool builds a Zing model capturing those parts of the driver and kernel behavior that are related to a selected subset of the specification rules. Processing the resulting model in the Zing model checker could reveal driver errors that are commonly difficult to discover via traditional software testing due to the concurrency and complexity of the Windows kernel.

Introduction

Model Checking
The model checking technique [2] is a formal verification method based on a thorough examination of a model that emulates the software unit with respect to a verified property. This model should ideally retain those parts of the software that influence the property so that the verification is sound and complete with respect to the property. On the other hand, the model should be much simpler than the original software because the time and space requirements of the verification process grow exponentially with the number of operations, threads, and variables used in the program (the state explosion problem [19]). This is because the model checker explores all possible states of the model to check that the property holds in each of them.
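The exhaustive exploration described above can be illustrated by a minimal sketch of explicit-state search. It is written in Python and is not related to the Zing implementation; the toy program (each thread performs a non-atomic increment of a shared variable x), the state encoding, and the names successors and check are assumptions made only for this illustration.

```python
from collections import deque

# Hypothetical toy model (not Zing): each thread runs "tmp = x; x = tmp + 1".
# A state is (x, threads), where threads holds one (pc, tmp) pair per thread.

def successors(state):
    """Yield every state reachable by letting one thread take one step."""
    x, threads = state
    for i, (pc, tmp) in enumerate(threads):
        if pc == 0:        # load the shared variable into the local copy
            new_x, new_t = x, (1, x)
        elif pc == 1:      # store the incremented local copy back
            new_x, new_t = tmp + 1, (2, tmp)
        else:              # this thread has finished
            continue
        yield new_x, threads[:i] + (new_t,) + threads[i + 1:]

def check(n_threads):
    """Explore all reachable states.

    Returns the number of visited states and the final states that
    violate the property "x equals the number of threads at exit".
    """
    init = (0, ((0, 0),) * n_threads)
    seen, queue, violations = {init}, deque([init]), []
    while queue:
        state = queue.popleft()
        x, threads = state
        if all(pc == 2 for pc, _ in threads) and x != n_threads:
            violations.append(state)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen), violations

states_2, violations_2 = check(2)
states_3, _ = check(3)
```

For two threads the search finds the interleaving in which both threads load x before either stores it, so an increment is lost. Comparing states_2 with states_3 shows how the number of reachable states grows when a thread is added.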

Verification of Windows Drivers
Windows kernel drivers are relatively small libraries written mainly in C and running in a privileged mode that enables them to work directly with hardware. This introduces a high risk of damaging other parts of the kernel if the driver contains an error. Hence, the correctness of drivers is crucial for operating system security and stability, and drivers are a common subject of software verification. Microsoft itself has developed several tools that verify driver correctness. These include the Driver Verifier [13], which tests drivers at run-time by emulating critical conditions in tight cooperation with the kernel; PREfast [14], which statically analyses the driver's code, searching for common erroneous code patterns; and the Static Driver Verifier (SDV) [16], based on techniques of static analysis and model checking.

Zing Modeling Language
The target modeling language for our model extractor is the Zing language [18] [1], developed by the Microsoft Research group on top of the Microsoft .NET Framework platform [12]. This language has been chosen for the rich modeling functionality it provides and for the state of its current development: a preview implementation of the model checker is available and works quite well. However, most ideas behind our work are independent of the target model checker and can be applied to any modeling language that provides at least a basic level of abstractions such as classes, methods, exceptions, non-deterministic choices, and threads. Another modeling language meeting these criteria should be the new version of the Bandera Intermediate Representation (BIR), a modeling language of the Bogor model checking framework [23].

Driver Environment Specification Language
In our previous work [9] [10], we introduced DeSpec, a new object-oriented specification language primarily targeting the Windows kernel driver environment. It allows writing formal specifications of the kernel API provided to drivers, modeling the kernel's behavior towards the drivers, and capturing rules imposed on the drivers in a formal yet still comprehensible form. The language integrates the majority of Zing modeling language features and adds means for defining parameterized abstractions of the kernel functions and structures at varying levels of detail. It makes it possible to map C language constructs to object-oriented constructs of the Zing language. In this sense, the DeSpec language bridges the gap between the C source code and the Zing model. We have demonstrated [9] the expressiveness and suitability of the DeSpec language on a significant part of the Windows kernel API and on many rules described in the Driver Development Kit [15] as well as those verified by the Microsoft Static Driver Verifier tool.

Driven by DeSpec specifications, the Model Extractor is supposed to generate a Zing model from the driver source code and kernel header files. An essential part of the DeSpec project is therefore the Specification Repository, whose task is to load specifications from DeSpec source files and provide them to the Model Extractor in a convenient form.

Contribution
In this paper, we summarize the current state of our work addressing the verification of Windows kernel drivers via model checking. The current implementation of the DeSpec Model Extractor is capable of extracting Zing models from C programs using our novel approach to modeling C pointers and arrays. In Section 4, we present this approach and show that it is feasible in practice. Section 2 introduces the Model Extractor's front-end, the part of the Model Extractor responsible for transforming C source code into the inner representation used in the rest of the tool.
Section 3 summarizes the slicing algorithms that the Model Extractor applies to the inner representation prior to the Zing model generation in order to reduce the size of the model. Finally, Section 5 concludes and outlines our future work.

Model Extractor Front-end
An appropriate front-end that can parse and represent the source code of the driver needs to be chosen. The major requirement on the front-end is support for Microsoft extensions to the C language, including e.g. structured exception handling, which is commonly used by Windows drivers. The Infrastructure for C Program Analysis and Transformation [21] [22] is a suitable front-end for the extractor as it is able to parse, merge, normalize, and transform C source code and supports both Microsoft and GCC extensions. It converts the source code to the C Intermediate Language (CIL), which is basically a subset of the C language that replaces complicated constructs with simpler, equivalent ones. CIL is much easier to analyze since it considerably reduces the number of cases the analyzer has to distinguish.

For projects comprising multiple source files, which is usually the case, the infrastructure provides a source code merging feature. It is able to merge multiple source files into a single compilation unit and to remove superfluous type definitions. A single CIL abstract syntax tree (AST) then represents the entire program source code. Hence, the tools analyzing the code need not care about multiple source files. The system is also extensible by custom modules that operate on the internal CIL representation. A chain of modules can be executed, enriching the AST with additional information or computing other structures such as a control flow graph. The process of source code parsing, file merging, AST building, and execution of the extension modules is implemented by a tool called Cilly.
The infrastructure is written mainly in the OCaml programming language [6] and is currently available for the Windows platform using the Cygwin environment. On the .NET Framework platform, the majority of the OCaml language is implemented by Microsoft Research's F# system [17]. Unfortunately, some of the OCaml language features used by the infrastructure are not currently supported by F#, so it is not possible to run it directly on the .NET Framework. That is why a workaround is needed. To overcome the platform difference, we have implemented a CIL dump module. It is a simple Cilly extension written in OCaml that goes through the entire CIL AST and dumps it into a text file.
The file is then consumed by a C# utility that builds a representation resembling the CIL AST in the managed environment of the .NET Framework. The dump module is placed at the end of the module chain, which allows useful transformations of the CIL AST that are already implemented in OCaml to be performed before the AST is dumped. Their results can therefore be loaded by the C# representation builder. Once the F# system supports all features used by Cilly, the intermediate text file could be dropped and the dumper could build our representation directly from Cilly's representation.

The DeSpec Model Extractor loads the driver's source code representation in three phases. First, it runs the driver builder (i.e. the build command) from the Windows DDK, which driver developers use for building drivers. This utility is used to provide full compatibility with the current driver building process. However, some instrumentation of the builder is necessary to get the preprocessed source files instead of the driver binary. One more change is needed to get all the information required for the model extraction into the preprocessed source files. Macros cause a problem when a function that the kernel specification refers to is actually a macro that either renames the function to an internal kernel name or completely removes the function call and replaces it with inline code. If the preprocessor expanded the macro before the CIL AST is built, the information about the original function call would be lost. Therefore, such macros have to be removed from the set of preprocessor symbols and replaced with function stubs. The second phase builds the CIL AST by executing Cilly on the preprocessed files and dumps it to the text file. In the final phase, the Model Extractor reads the text file and creates the internal C# representation.
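The round trip through an intermediate text file can be sketched as follows. The sketch is in Python rather than OCaml and C#, and both the tuple-based tree and the line format (one node per line in pre-order, written as kind, child count, payload) are assumptions made for illustration; the actual dump format of the tool is not described here.

```python
# Hypothetical dump format (an assumption, not the tool's real format):
# one node per line in pre-order, "<kind> <child count> <payload>".
# A node is a tuple (kind, payload, children); kind contains no spaces
# and payload is a non-empty string without newlines.

def dump(node, out):
    """Append the pre-order text form of node to the list out."""
    kind, payload, children = node
    out.append(f"{kind} {len(children)} {payload}")
    for child in children:
        dump(child, out)
    return out

def load(lines):
    """Rebuild the tree from the lines produced by dump."""
    it = iter(lines)

    def read():
        kind, count, payload = next(it).split(" ", 2)
        # The child count tells the reader how many subtrees follow.
        return (kind, payload, [read() for _ in range(int(count))])

    return read()

# A tiny tree standing in for a function body "x = 1; return x;".
tree = ("Func", "main", [
    ("Set", "x", [("Const", "1", [])]),
    ("Return", "x", []),
])
lines = dump(tree, [])
rebuilt = load(lines)
```

Storing the child count with each node lets the reader rebuild the tree in a single pass without any bracket matching, which keeps the consuming side simple.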

Slicing
Many operations need to be performed on the driver's code representation before the generation of the Zing model can take place. Program slicing is one of the most important prerequisites since the resulting model should contain as little code and as few variables as possible. Otherwise, the resulting model could be infeasible to model check due to its size. At the beginning of the extraction process, the user is expected to choose a set of rules to be verified from the Specification Repository. The Model Extractor should then slice out code and data that are irrelevant to the selected rules.

The complexity of program slicing ranges from relatively simple algorithms for slicing sequential code without pointers up to the undecidable problems of slicing programs with unrestricted use of pointers. Slicing methods are covered extensively by [7] and by dozens of other research works. So far, we have implemented intraprocedural pointer-less slicing based on the Program Dependence Graph (PDG) data structure [3]. The PDG captures both data and control dependencies among statements and expressions within a function body. Its control dependency sub-graph can be constructed using the Lengauer-Tarjan dominator algorithm [8] and the data dependency sub-graph by the minimal fixed-point algorithm. The PDG can be further extended to the Interprocedural PDG (IPDG) or the threaded PDG (tPDG) for the purpose of interprocedural and concurrent slicing [7].

To extend slicing algorithms to programs with pointers, some kind of points-to analysis [4] is necessary. Such an analysis discovers sets of aliases for chosen variables. When modeling function pointers, we also need to discover the set of functions that could possibly be targeted by a given function pointer variable. The points-to analysis can give us that information. Although it is not always possible to determine the points-to sets precisely, an approximation should be sufficient for the model extraction purpose.
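To make the notion of an approximate points-to set concrete, the following minimal sketch computes points-to sets with a flow-insensitive, inclusion-based (Andersen-style) analysis. The choice of this particular analysis is our assumption for illustration; the text above does not commit to a specific algorithm. The statement encoding and all variable and function names in the example are hypothetical.

```python
from collections import defaultdict

def points_to(statements):
    """Flow-insensitive, inclusion-based points-to analysis (a sketch).

    statements is a list of tuples over variable names:
      ("addr",  p, x)  models  p = &x
      ("copy",  p, q)  models  p = q
      ("load",  p, q)  models  p = *q
      ("store", p, q)  models  *p = q
    Returns a dict mapping each variable to the set of names it may point to.
    """
    pts = defaultdict(set)
    changed = True
    while changed:                       # iterate until a fixed point is reached
        changed = False
        for op, lhs, rhs in statements:
            if op == "addr":
                targets, new = [lhs], {rhs}
            elif op == "copy":
                targets, new = [lhs], set(pts[rhs])
            elif op == "load":
                targets = [lhs]
                new = set().union(*(pts[t] for t in pts[rhs]))
            elif op == "store":
                targets, new = list(pts[lhs]), set(pts[rhs])
            else:
                raise ValueError(f"unknown statement kind: {op}")
            for t in targets:
                if not new <= pts[t]:
                    pts[t] |= new
                    changed = True
    return {v: s for v, s in pts.items() if s}

# Hypothetical program fragment, including a function pointer fp that
# is assigned two different handlers on different paths.
example = [
    ("addr", "p", "a"),          # p = &a
    ("addr", "q", "b"),          # q = &b
    ("copy", "r", "p"),          # r = p
    ("store", "r", "q"),         # *r = q
    ("load", "s", "r"),          # s = *r
    ("addr", "fp", "on_read"),   # fp = &on_read
    ("addr", "fp", "on_write"),  # fp = &on_write
]
result = points_to(example)
```

Because the analysis ignores control flow, fp is reported as possibly targeting both on_read and on_write. This is an over-approximation, but it is still far smaller than assuming that fp may target any function in the program.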
The goal of applying the analysis is to reduce the size of the model. Without the analysis, the extractor may conservatively assume that pointers can point to any data and create a larger model incorporating all the possibilities. It is, however, desirable to make the model as small as possible and hence to find an acceptable trade-off between the precision (and complexity) of the analysis and the model size.

Extracting Zing Models from C Source Code
We propose a novel approach to the extraction of verification models from C source code and provide an implementation targeting the Zing model checker. Existing works either focus on Java-like languages (e.g. Bandera [23], Java Path Finder [20]), do not extract the model fully automatically (e.g. SPIN [5]), and/or are very limited in the constructs that can be used in the source code (e.g. SPIN does not support unbounded heap allocation, call stacks, or dynamic thread creation).

The major issues of C program model extraction stem from pointer and array operations. In our work [11], we distinguish four kinds of pointers depending on the kind of memory and the possible number of items they point to. Although this differentiation leads to more complicated dereferencing operations, it minimizes the state space of the model. Due to the atomicity of the dereferencing operations, the complexity increase does not influence the resulting model size. Each pointer is represented by a pair <target, offset>, where target is a reference either to the Zing object representing the value the pointer points to, or to the Zing array storing multiple values if the pointer points to (or can point to) a sequence of values. In the latter case, the offset is the index into the array. If the pointer target is allocated dynamically in the C program, the target does not refer directly to the value the pointer points to. Instead, it refers to an instance of the Memory class that represents the allocated memory and holds the value the pointer points to.

We showed that our approach is feasible in practice by verifying the correctness of a C implementation of a synchronized priority queue represented by a singly linked list. The C source code has around 110 lines and the entire generated Zing model about 900 lines. All tests were performed on a 1.4 GHz/1 GB machine. Race conditions deliberately introduced into the implementation were discovered by the model checker within a few seconds.
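The <target, offset> representation of pointers described in this section can be sketched as follows. The sketch is in Python, not Zing, and it does not reproduce the four pointer kinds of [11]; the class names Cell and Pointer, the method names, and the explicit bounds check are assumptions made for illustration. Only the pair itself and the Memory indirection for dynamically allocated targets are taken from the description.

```python
class Cell:
    """Stands in for a Zing object holding a single value."""
    def __init__(self, value):
        self.value = value

class Memory:
    """Stands in for dynamically allocated memory that holds the content."""
    def __init__(self, content):
        self.content = content   # a Cell or a list of values

class Pointer:
    """A pointer modeled as the pair <target, offset>."""
    def __init__(self, target, offset=0):
        self.target = target     # Cell, list (array), or Memory
        self.offset = offset     # index into the array, 0 for a single value

    def _storage(self):
        # Dynamically allocated targets are reached through Memory.
        t = self.target
        return t.content if isinstance(t, Memory) else t

    def _index(self, array):
        if not 0 <= self.offset < len(array):
            raise IndexError("pointer dereferenced outside its array")
        return self.offset

    def load(self):
        s = self._storage()
        return s[self._index(s)] if isinstance(s, list) else s.value

    def store(self, value):
        s = self._storage()
        if isinstance(s, list):
            s[self._index(s)] = value
        else:
            s.value = value

    def add(self, n):
        """Pointer arithmetic changes only the offset, never the target."""
        return Pointer(self.target, self.offset + n)

array = [10, 20, 30]
p = Pointer(array)          # models: int *p = array;
q = p.add(2)                # models: int *q = p + 2;
q.store(5)                  # models: *q = 5;
heap = Pointer(Memory(Cell(7)))   # models a pointer to allocated memory
```

Since pointer arithmetic touches only the offset, an out-of-range dereference such as p.add(3).load() is detected at the dereference, where the model can report it as an error instead of silently reading unrelated memory.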
The correct implementation running 3 producers, each inserting 3 items into the queue, passed the verification in about 30 minutes. We also observed that the number of threads has a much greater impact than the number of items inserted into the queue, which is positive as race conditions are usually revealed even for a small number of threads.

Conclusion
In our previous work, we introduced DeSpec, a new specification language targeting the Windows kernel environment. The language is designed to enable writing modular, readable, and well-arranged specifications of the Windows kernel driver environment and to capture formally, yet still comprehensibly, the rules imposed on drivers by the kernel and documented in plain English in the DDK.

Subsequently, we started to implement the Model Extractor tool, which should eventually be used for the extraction of a Zing model from the source code of the driver, kernel header files, and the DeSpec specifications of the driver environment. The Model Extractor uses the CIL infrastructure for building an internal representation of the driver's source code and the DeSpec Specification Repository for managing the specifications. We have already implemented the front-end of the Repository, which parses DeSpec files and builds an appropriate representation in the form of an abstract syntax tree. Further work will include the implementation of a specification analyzer that checks the consistency of the specifications and performs the transformations that are required before they can be provided to the Model Extractor.

To get the information about the driver source code that is necessary for the model generation, we implement various static analyses of C code. The results of these analyses also allow us to reduce the resulting model and so address the state explosion problem.
So far, we have implemented the Lengauer-Tarjan algorithm for building the Program Dependence Graph and used this data structure for intraprocedural slicing in the absence of procedure calls and pointers. In our future work, we will enhance the slicing capabilities of the extractor with interprocedural and concurrent slicing and with points-to analysis.

We also implement the component of the Model Extractor tool that automatically generates a Zing model from the source code of the program. We have proposed a novel approach to modeling various constructs of the C language that do not map to the Zing modeling language straightforwardly (e.g. pointers and arrays), and we have shown on several examples that the verification of the extracted model is feasible in practice. Our future work in this area will focus on improvements to the Model Extractor that make the generated models more compact.

References
[1] Andrews, T., Qadeer, S., Rajamani, S. K., Rehof, J., Xie, Y.: Zing: A model checker for concurrent software, Technical report, Microsoft Research, 2004.
[2] Clarke, E. M., Grumberg, O., Peled, D. A.: Model Checking, MIT Press, 2000.
[3] Ferrante, J., Ottenstein, K. J., Warren, J. D.: The Program Dependence Graph and Its Use in Optimization, ACM Transactions on Programming Languages and Systems, Vol. 9, No. 3, July 1987, Pages 319-349.
[4] Hind, M.: Pointer analysis: Haven't we solved this problem yet? In 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE '01), 2001.
[5] Holzmann, G. J.: The SPIN Model Checker: Primer and Reference Manual, Addison-Wesley Professional, 2003.
[6] INRIA: The OCaml Language, http://caml.inria.fr
[7] Krinke, J.: Advanced Slicing of Sequential and Concurrent Programs, PhD thesis, Fakultät für Mathematik und Informatik, Universität Passau, 2003.
[8] Lengauer, T., Tarjan, R. E.: A Fast Algorithm for Finding Dominators in a Flow Graph, ACM Transactions on Programming Languages and Systems, 1:121-141, 1979.
[9] Matousek, T.: Model of the Windows Driver Environment, Master Thesis at Department of Software Engineering, Charles University in Prague, 2005.
[10] Matousek, T., Jezek, P.: DeSpec: Modeling the Windows Driver Environment
[11] Matousek, T., Zavoral, F.: Extracting Zing Models from C Source Code
[12] Microsoft: .NET Framework, MSDN, http://msdn.microsoft.com/netframework
[13] Microsoft: Driver Verifier, http://www.microsoft.com/whdc/devtools/tools/drvverifier.mspx
[14] Microsoft: PREfast, http://www.microsoft.com/whdc/devtools/tools/prefast.mspx
[15] Microsoft: Windows Driver Development Kit, WHDC, http://www.microsoft.com/whdc/devtools/ddk/default.mspx
[16] Microsoft: Static Driver Verifier: Finding Driver Bugs at Compile-Time, WHDC, http://www.microsoft.com/whdc/devtools/tools/sdv.mspx
[17] Microsoft Research: F#, http://research.microsoft.com/projects/ilx/fsharp.aspx
[18] Microsoft Research: Zing Model Checker, http://research.microsoft.com/zing
[19] McMillan, K. L.: Symbolic model checking: an approach to the state explosion problem, PhD thesis, SCS, Carnegie Mellon University, 1992.
[20] NASA Intelligent Systems Division: Java Path Finder, http://ase.arc.nasa.gov/havelund/jpf.html
[21] Necula, G. C., McPeak, S., Rahul, S. P., Weimer, W.: CIL: Intermediate Language for Analysis and Transformation of C Programs, Proceedings of Conference on Compiler Construction, 2002.
[22] Necula, G. C., McPeak, S., Weimer, W., Liblit, B., Harren, M.: CIL: Infrastructure for C Program Analysis and Transformation, http://manju.cs.berkeley.edu/cil
[23] Robby, Dwyer, M. B., Hatcliff, J.: Bogor: An Extensible and Highly Modular Software Model Checking Framework, SIGSOFT Software Engineering Notes 28, 5, 267-276, 2003.