Portable Software Fault Isolation

Size: px
Start display at page:

Download "Portable Software Fault Isolation"

Transcription

1 Portable Software Fault Isolation Joshua A. Kroll Computer Science Department Princeton University Princeton, NJ Gordon Stewart Computer Science Department Princeton University Princeton, NJ Andrew W. Appel Computer Science Department Princeton University Princeton, NJ Abstract We present a new technique for architecture portable software fault isolation (SFI), together with a prototype implementation in the Coq proof assistant. Unlike traditional SFI, which relies on analysis of assembly-level programs, we analyze and rewrite programs in a compiler intermediate language, the Cminor language of the CompCert C compiler. But like traditional SFI, the compiler remains outside of the trusted computing base. By composing our program transformer with the verified back-end of CompCert and leveraging CompCert s formally proved preservation of the behavior of safe programs, we can obtain binary modules that satisfy the SFI memory safety policy for any of CompCert s supported architectures (currently: PowerPC, ARM, and x86-32). This allows the same SFI analysis to be used across multiple architectures, greatly simplifying the most difficult part of deploying trustworthy SFI systems. I. INTRODUCTION Recently, there has been significant interest in systems that use Software Fault Isolation (SFI) to couple near native performance with strong isolation. SFI is a memory safety technique applicable to arbitrary programs and first proposed by Wahbe et al. [1]. SFI limits isolated code modules to only a portion of their address space, making it possible to load untrusted modules in the same address space as trusted code. SFI isolates programs by rewriting them to enforce policy (and verifying that binaries have a specific safe form), rather than proving safety either of the program statically or of particular executions. For example, SFI has been used to provide high performance, securely sandboxed client side web applications [2]; to provide integrity for kernel modules [3], [4]; and to extend the Java Security Manager across the Java Native Interface [5]. SFI is a natural choice, given its demonstrated low overhead [6], minimal programmer annotation load and language independence [7], and minimal trusted computing base (TCB) [8]. The original SFI system was designed to support safe (i.e. non-crashing) extensional functionality for a database engine [1]. However, SFI techniques are very tightly bound to the architecture to which they are applied, and so these applications suffer from a lack of portability: SFI modules must be specially compiled and verified for each architecture on which they are to be deployed [1]. And this means that the trusted SFI verifier must be completely rewritten for every targeted architecture [9], adding to the size of the TCB. Further, while the trusted verifier is generally quite small, its interaction with the architecture is subtle and has led to security-critical bugs even in carefully evaluated systems [10]. We propose a different approach, which simultaneously improves security and portability: instead of doing SFI translation and verification at the level of assembly code, we perform SFI on a compiler intermediate language, Cminor, rewriting programs to guarantee safety and policy compliance. The invariant we establish by rewriting the code allows us to pass the rewritten program through the back end of CompCert [11], [12], a certified-correct compiler, to obtain a safe, policy-compliant executable binary, leveraging the safety- and semantics-preserving properties of CompCert. Because this technique allows us to perform our security analysis at a level which is architecture-independent, the resulting analysis is portable to any of the architectures targeted by CompCert, currently PowerPC, ARM, and x86. We prove that our method is correct, producing only securitycompliant programs, by building a prototype implementation in Gallina, the specification language of the automated proof assistant Coq [13], [14], and proving that it implements our security policy soundly. In turn, we use Coq s extraction facility to turn our Gallina functions into OCaml code to compile and isolate real programs. Our system provides for a small trusted computing base: we have no need for a separate trusted SFI verifier, as in traditional SFI [1], [8]. Instead, we can guarantee that the output of our transformation is safe with respect to the Cminor operational semantics and thus that CompCert will produce assembly language that has the same (safe) observable behavior. While there has been previous work on formally verified SFI systems (e.g. [15] [19]), in each case the SFI and its formal verification are tightly coupled to the choice of instruction-set architecture. Specifically, our analysis takes in any program in our chosen intermediate language Cminor, and produces a transformed program that is safe in the sense that it is not

2 stuck 1 with respect to the Cminor operational semantics provided by CompCert. The transformed program satisfies two properties: Program safety/sfi The transformed program is guaranteed to execute safely with respect to the standard SFI memory safety policy: no reads or writes are done outside predefined, pre-allocated regions. Correctness Every safe execution of the input program corresponds exactly (i.e. one-to-one) to a single execution of the transformed program with the same observable behavior. Programs which are not safe may be transformed to have arbitrary (but safe) behavior. This correspondence property follows easily by inspection for each of our portable SFI transformations but has not yet been proved in Coq. This approach differs from classical static analysis: we establish the dynamic safety of any execution of any transformed program. This is the defining feature of SFI: any program can be transformed into a safe program even without determining whether the input program was in fact safe to begin with. Because CompCert is guaranteed to compile safe programs correctly, this scheme provides the same model of assurance as SFI but retains architecture portability and offers a small trusted computing base. What is particularly notable about this analysis is that it is possible to isolate the security properties we need at the level of an intermediate representation. Previous SFI systems have been described in terms of their action on concrete addresses and have relied for correctness on the ability of the SFI compiler and verifier to perform arithmetic on concrete representations of addresses as numbers (for example, by bitwise and and or). However, an optimizing compiler must treat addresses abstractly in order to perform program transformations such as spilling/reloading and stack frame allocation that rearrange a program s memory layout. Indeed, the abstract nature of the CompCert memory model (described in Section III) is central to CompCert s proof of correctness. Thus, we cannot rely on numerical properties of addresses in our reasoning. One of our major contributions, then, is a demonstration of how to address this gap: we present an abstract SFI enforcement mechanism and demonstrate how to make that mechanism concrete while preserving its essential properties. In order for our analysis to be valid, we must carefully examine how assumptions 1 The operational semantics of a programming language define the rules for transitions between states of a model abstract machine. If, at some state, the execution of the program would cause the machine to transition according to these rules to another, well defined state, we say that the program takes a step. A program which can always take a step (or which is safely halted) is said to be safe. That is, executing the program from its initial state never yields a machine state that is incompatible with all rules in the operational semantics. If a program reaches a state where it cannot take a step, we say it is stuck. Note that this does not mean that the program stops executing - it just means our model no longer explains the execution behavior. in the abstract CompCert memory model can be realized faithfully in the memory of a real machine. This is described in more detail in Section VI. An alternative proposal to address the portable distribution of code meant for an SFI system is to distribute bytecode for an intermediate language such as LLVM (as with Portable Native Client [20]). The bytecode can be compiled by an untrusted compiler back-end on the client and then verified by an SFI verifier for that client s architecture. This approach solves operational concerns surrounding application portability (e.g. that applications which have to be distributed on a per-architecture basis will become fragmented between platforms over time). But it does not address the problem of needing to re-engineer and re-verify the trusted computing base for every architecture. Additionally, such an approach requires exposing a compiler back-end, software typically developed to run in a friendly environment, to arbitrary input. Modern optimizing compiler back-ends are large and complicated artifacts not typically built under the assumption that input may be chosen maliciously to cause unintended side-effects. One recent study of widely used compilers found several severe bugs including incorrect code generation and crashes [21]. CompCert was the only compiler that did not exhibit bugs when subjected to arbitrary input; no issues were found in CompCert s proved-correct components. Because our analysis accepts arbitrary abstract syntax in one of CompCert s intermediate languages, it is no more vulnerable to bugs caused by malformed input than CompCert as a whole. Cminor and LLVM are in fact quite similar intermediate representations. Both are sufficiently high-level to be architecture independent. Both are sufficiently low-level that a variety of source languages can be compiled to either one. Armed with the system we present here, it would be possible to build a drop-in replacement for Portable NaCl [20] just by building a verified translation from LLVM to Cminor. Indeed, there have been recent efforts to formalize the semantics of LLVM in Coq [22], making this goal all the more realistic. The main contributions of this work are: A method for software fault isolation (SFI) that is portable across multiple architectures. A prototype implementation of our method together with a mechanized specification and a machine-checked proof that guarantees the prototype s soundness for memory operations at the Cminor level. We also describe how to prove additionally that our prototype does not change the behavior of already-safe programs. We have not yet completed a machine-checked proof of end-to-end security of assembly language programs produced by composing our prototype with CompCert. We begin by discussing traditional SFI methods in Section II. We discuss our approach as a counterpoint in Section III. In Section IV, we describe our transformation

3 in detail. In Section V, we show how to prove the transformation correct and how our specification ties in to the correctness theorem in CompCert to provide both safety and isolation. In Section VI, we describe our model of memory and how we relate our abstract address analysis to a flattened address space. In Section VII, we describe early benchmarks of programs compiled with our techniques and expand on our early experiences with the system. Finally, Section IX presents related ideas from the literature and Section X contains our conclusions. II. TRADITIONAL SFI Here, we explain the goals and operation of classical SFI methods, which typically have three essential components: a rewriter, a verifier, and a runtime system. In traditional SFI, sandboxed assembly language is rewritten from the originally generated assembly language [8] or generated by a specialized compiler [1], [2]. Code is then passed through a trusted assembler 2 to create a sandboxed binary in machine language. A trusted verifier and runtime system ingest this binary, disassemble it, and verify that it satisfies certain structural invariants, which imply a twopronged policy over dynamic executions: Reliable Execution Only machine instructions that have been disassembled and analyzed will ever be executed. All of these instructions are found in memory in a region of addresses bounded by [code lo, code hi). Memory Safety All writes to memory (and in some implementations also reads from memory) will occur only to the region bounded by the addresses [data lo, data hi). The code and data bounds must be chosen at compile-time and must be known to the compiler and verifier. Typically, the SFI region has a base address that, taken as a number, is evenly divisible by some power of two and also has a size equal to the same power of two, allowing extremely efficient computation of in-bounds addresses using bit-level arithmetic [8]. 3 We will assume these conditions on the SFI regions for the rest of this paper. The goal of the SFI code generator takes arbitrary programs and make them verifiably policy-compliant. The verifier uses the special structure of sandboxed binaries to trade soundness for completeness; any program accepted by the verifier is guaranteed to be policy-compliant, but the verifier 2 The assembler need not be trusted, but at least one of the assembler or the disassembler used by the verifier must be. For simplicity, for the rest of the paper, we will refer to the problem of converting (in either direction) between assembly and machine code in a trustworthy way as the problem of requiring a trusted assembler. 3 These details are inessential but are common features of prior implementations [1], [2], [7] [9]. For example, the dynamic enforcement of pointer values could be with respect to any bounds, not just power-of-two aligned bounds. And rather than explicitly modifying bad pointer values, the inlined reference monitor code could simply redirect control to a trusted error handler. will not accept all policy-compliant programs. Conversely, programs which go wrong by violating the SFI policy will be rewritten to have different (safe) behavior or else will be rejected by the verifier. The SFI rewriter must check static references to memory to ensure that they are in-bounds and reject or modify unsafe programs. But many memory accesses occur through dynamic constructs such as fixed offsets from unknown pointers. It is not possible to know statically whether all such accesses will be policy-compliant. For these, including all memory accesses through a CPU register, the rewriter inserts sandboxing instructions: extra CPU instructions which modify the address held in the register or pointer, fixing the high order bits of the address to a specific value, the SFI mask. Note that if the address was in-bounds, the sandboxing instructions have no effect. If the address was out-of-bounds, a new, possibly unknown (but definitely in-bounds) address is substituted. In this way, the rewriter need not determine whether a particular memory access is safe; all sandboxed accesses are safe by construction. A program with arbitrary control flow could try to jump into the middle of an instruction as decoded by the verifier, yielding a different stream of instructions than that checked for safety. Thus, all control transfers must target a special set of aligned addresses. The difficult case here are dynamic control transfers, such as indirect jumps, calls via function pointers and returns to addresses held on the stack, which are handled like dynamic memory accesses: by the insertion of sandboxing instructions before an indirect call, jump, or return. Finally, it is important that the sandboxing checks strictly dominate (in the control flow graph) the operations they are meant to protect. Otherwise, a program could prepare an invalid value in a particular CPU register and then jump to an operation in which that unsafe value would be used. This is an under-appreciated, subtle aspect of the structural invariants imposed by an SFI rewriter: it is mentioned briefly in the explanation of verifier invariants in Pittsfield [8] but is missing from the provided implementation; Native Client [2] does not describe these invariants, but does implement them correctly. The fact that this observation was deficient from the discussion and implementation of SFI systems until recently underscores the need for formal verification. Later work (e.g. XFI [23]) exploits dominator analysis of the control flow graph to avoid redundant checks and reduce overhead. SFI is typically implemented as a translation validation: the rewriter need not be part of the TCB. Instead, trust is generally placed in a separate verifier that operates on rewritten code. However, in some systems (including ours) no verifier is necessary; the SFI code generator generates policy-compliant code. In any case, as discussed below, SFI requires a trusted runtime (at minimum, to load the isolated code correctly).

4 The structural invariants described above can be checked by a verifier that is just a finite-state machine. Typically, the verifier operates at load time and ensures that the loaded binary meets the above requirements. The verifier can be implemented very simply in a few hundred lines of code. For such an SFI system to be secure, only the verifier and the runtime system (including the loader) need to be trusted. A small TCB is one of the things that makes SFI attractive as a mechanism for module isolation. Compared with techniques such as full-scale reference monitors [24, Chapter 8], interpreted languages [25], and process-level hardware-enforced isolation [26], SFI requires a much smaller TCB. Still, even in carefully hardened systems, mistakes in the verifier do occur; modern architectures are complicated and supporting a large subset of the instructions in any ISA leads to complexity that is difficult to manage successfully. 4 Finally, most SFI implementations include runtime systems that provide services such as inter-module communications and RPC, limited access to system calls, or calls into trusted, unsandboxed libraries. These runtime systems are also necessarily part of the trusted computing base. We will not discuss such elements, as they are highly variable. III. PORTABLE SFI Traditional SFI requires multiple trusted verifiers, one for each target architecture. Portable SFI instead defines a single SFI analysis, at the level of an architecture-independent compiler intermediate language, CompCert s Cminor, and relies on the compiler s correctness theorem to prove security of the generated assembly programs. At the Cminor level, CompCert treats memory abstractly, not as a collection of addressable bytes. This is highly desirable: CompCert aims to be an optimizing compiler, and many of the phases after ours will apply optimizations that change the memory layout of the program significantly. An abstract representation of memory is necessary to facilitate the reasoning in these later phases. Thus, memory locations are described by a block number and an offset. While infinitely many blocks may be created dynamically, each block has a finite size assigned to it when it is allocated. Our strategy is to treat the entire SFI data region as a single block and to map all potentially unsafe accesses to memory (all accesses through pointers visible at the Cminor level) into accesses only into this block. As in traditional SFI, we aim not to check that this property holds of particular pointers but rather to ensure this property through the use of special mask operations, as we describe below. A. Overview Figure 1 gives a high-level overview of the two major phases of the portable SFI (PSFI) process. In phase one 4 See [27] for a discussion of one such bug in Native Client. Source Program Cminor extern Global Variables PSFI Cminor Assembly Frontend Call Stack Blocks (one per frame) Stack Frame Stack Frame Stack Frame Call Stack Blocks (one per frame - zero width) Activation Records (one per frame, spilled temporaries, compiler managed) Executable Machine Code Low Pages Unmapped Heap Blocks Generated by malloc Host Allocated Memory Host Stack PSFI Transformation (Phase 1) SFI Block extern Global Variables Shadow Stack Heap Blocks (Generated by SFI-aware malloc) Compilation with CompCert (Phase 2) Assembly SFI Block SFI Region SFI Region SFI Region SFI Block SFI Spill Stack extern Global Variables Shadow Stack Heap Blocks (Generated by SFI-aware malloc) Machine Memory Figure 1. Our Portable SFI system, including a diagram of the abstract memory layout before SFI, after the SFI transformation, after both the SFI transformation and compilation with the CompCert back end, and finally after assembly in a one-dimensional, finite machine memory. (labeled PSFI transformation in the figure), the PSFI compiler rewrites the input Cminor program to a new Cminor program in which all loads and stores are to the safe SFI region (as fixed at load time). The SFI transformations, which we detail in Section IV-E, include operations like relocating address-taken function local variables and mallocs to the SFI region. (Locals which are not address-taken are treated as temporaries in Cminor, and therefore are independent from memory.) In the figure, we depict the stack reallocation operation as the movement of the stack frames on the lefthand side of the Cminor box to a portion of the SFI region we call the shadow stack on the right-hand side of the box labeled PSFI Cminor. In a similar way, we relocate heap blocks generated by calls to malloc in the original Cminor program to heap blocks in the SFI region. This is accomplished rewriting the original program to call an SFIaware version of malloc. In phase two (labeled Compilation with CompCert in the figure), PSFI compiles the rewritten Cminor program to PowerPC, x86, or ARM using stock CompCert. CompCert s safety and semantics preservation theorems make it possible to show that the Cminor programs produced by stage one result in compiled assembly programs satisfying the following

5 SFI security invariant: Definition (SFI Security). The output assembly program loads and stores only in the allowed SFI region, as fixed at load time, or in spill locations in stack frames introduced by CompCert (containing, e.g., spilled temporaries and function return addresses). Informally, SFI security follows from the following four observations: 1) Stage one produces safe Cminor programs. 2) Cminor programs rewritten by stage one do not allocate memory outside the SFI region: PSFI transforms calls to malloc in the source program to mallocs within the SFI region. Address-taken local variables are relocated to the SFI region as well. Cminor local variables that are not addresstaken are modeled in a separate temporaries environment independent from memory. 3) In the CompCert languages, there are two ways to allocate memory: Calling observable external functions such as malloc or pushing activation records onto the stack. (CompCert does not consider the latter an observable event.) 4) CompCert s semantics preservation theorem means the compiler does not introduce observable calls to external functions such as malloc that were not present in the source program. Thus by 2) and 3), the target program at most allocates memory for activation records or at external calls to the SFI-aware malloc. By 1) and CompCert s safety preservation theorem, we get that the target program could not have read or written to locations outside the SFI region or stack frames introduced by the compiler, since otherwise it would have become stuck. B. Advantages of PSFI over traditional approaches By piggy-backing on CompCert s compiler correctness theorem, we avoid the need to define and trust an SFI verifier for each target architecture. This is a significant gain; the RockSalt project [17] demonstrated that removing the verifier for even a single target architecture like x86 from the TCB of an SFI tool requires significant effort, requiring about 15,000 lines of machine checked specification and proof. One must first define semantics for a realistic subset of x86 (enough to serve as a target for multiple compilers and for hand-written assembly), then verify the verifier with respect to this semantics. PSFI sidesteps the proliferating verifiers problem by defining a proved-correct SFI analysis, then compiling with a proved-correct compiler. Because our SFI compiler produces verified assembly code, no separate verification step is required. In effect, we have used the certified compiler CompCert to build a certifying (in the sense of proof carrying code [28] 5 ) SFI compiler. The resulting program can be assembled and 5 The certificate in this case would be the correctness proof of our SFI transformation, coupled with the correctness proof of CompCert itself. executed without a heavyweight runtime because its safety is certified by the compilation process. Performing the security analysis at a higher level in the compiler allows for the easy analysis of more interesting and complicated properties than the standard SFI invariants. Further, the optimizing phases of the compiler have an opportunity to act on the sandboxed code rather than the sandboxing transformation acting on the optimized code. This has the potential to make the sandboxed code more efficient (for example, redundant or unnecessary sandboxing operations can potentially be eliminated) and also makes the sandboxing transformation simpler (since it does not need to know about idioms generated by particular optimizations). Indeed, the fact that we do not need to worry about the security of most types of indirect control flow (with the exception of calls through function pointers) significantly reduces the implementation complexity of our system relative to competing approaches. In the next section, we describe the SFI transformation and the parts of CompCert on which it relies: the intermediate language Cminor, the CompCert memory model, and the certified back end of CompCert that ties the system together. IV. TECHNICAL APPROACH A. CompCert and Compiler Correctness CompCert [12] is an industrial-strength compiler supporting nearly all of the ISO C 90 standard (a.k.a. ANSI C). The compiler is built using the Coq proof assistant and has a machine-checked proof that the code it generates is observationally equivalent to the compiled source program. This is the key to our method: because the compiler preserves semantics, invariants established by analysis and rewriting at the Cminor level are true at lower levels. Accordingly, we can enforce the SFI security specification of Section III-A by rewriting programs as they are compiled. Of course, for the specification to be preserved, we must also guarantee that the program we analyze does not get stuck. So we also rewrite programs so that they are safe with respect to the Cminor operational semantics. Our SFI transformation T is implemented as a series of seven new compilation phases in CompCert that convert an arbitrary Cminor source program S to a Cminor program T (S), such that T (S) is safe and policy-compliant (in the sense of Section III-A). Safety of T (S) ensures that CompCert will produce an assembly-language program which has the same observable behavior as T (S), meaning that the compiled program C is also policy-compliant. B. The CompCert Memory Model All of the CompCert intermediate languages, including Cminor and assembly, share a common memory model [29], [30]. Memory states in this model are collections of memory blocks of finite size, each with an integer block number b Z. Blocks are assigned a size at allocation time, and are

6 by construction separate. Byte-level addresses are referenced via an offset δ, which must be within the block s bounds. Pointers are constructed by specifying a block number b and an offset δ, so an address is written (b, δ). In the version of CompCert we target in this work (1.11), functions are modeled as code blocks occurring at negative block addresses, meaning it is possible to have function pointers in all CompCert intermediate languages. It is perhaps surprising that, in such an abstract memory model, we can express a technique such as software fault isolation, which is usually thought of in terms of the very concrete expression of pointers as patterns of bits. Certainly, it is not possible in CompCert s memory model to understand where a pointer will end up in the concrete byte-addressed memory array of the real machine on which the program will eventually run. However, because memory blocks in CompCert are separate by construction, we can reason about interference properties of pointers. As we will show, this is sufficient for our purposes. C. Cminor, CompCert, and Infrastructure Cminor is a simple imperative language designed as a compiler intermediate language for CompCert. It is comparable to a stripped-down, typeless variant of C. Like highlevel languages, it is processor-independent (indeed, it is the lowest level processor-independent intermediate representation in the CompCert stack). CompCert provides a small-step operational semantics for Cminor, which we adopt for this work. We chose Cminor because it is processor-independent yet also easy to target as an intermediate representation as it has few high-level constructs. Any lower-level intermediate stage in CompCert would require architecture-dependent analysis. Any higher-level intermediate stage would introduce source-language dependence. Cminor is used in other work as a natural target in the CompCert stack for compiling various source languages [31]. Temporaries in Cminor do not have rich type information. While the Cminor operational semantics specifies what values must be held in temporaries in order for expressions to be valid or for statements to step in the operational semantics, the temporaries themselves are not typed and Cminor is encountered in the compiler after type checking. Therefore, the first pass of our SFI transformer is a simple type inferencer for int/float temporaries, as we describe below. We use this type information in the PSFI transformation phases to rewrite potentially stuck expressions. As in C, Cminor programs may make calls both to internal and to external functions. Internal functions are those that are defined in the current translation unit. External functions are only prototyped. In CompCert s trace model, calls to external functions generate observable events. T 1 T 2 T 3 T 4 T 5 T 6 T 7 Cminor Source Program S Typechecking Define Program Variables Flatten Expressions Heap Allocate Stack Variables Insert Mask Functions Safe Execution (Expressions) Safe Execution (Statements) Cminor SFI Program T(S) = T 7 (...(T 2 (T 1 (S)))) Figure 2. A diagram of the structure of our SFI transformation. Each box T i shows what is rewritten at that stage. D. The SFI Mask Recall that our strategy for expressing SFI constraints at the Cminor level is to rewrite all memory accesses in the sandboxed program so that they reference a single distinguished CompCert memory block (the SFI block). The SFI block is allocated during program initialization, before control has passed to main. Its size is fixed to the size of the data region in the SFI policy. To relocate loads and stores to the SFI block, we add to CompCert a new external function called SFI mask, or often just mask. At the CompCert level and for the purposes of proofs, we axiomatize this function to have the following properties: After a call to the SFI mask, the variable passed as an argument to the function contains a pointer to a valid address inside the SFI block. If the variable passed to SFI mask contained a pointer to a valid address inside the SFI block, then it contains the same pointer after the function returns. The SFI mask is idempotent. The SFI mask is a pure, total function. We insert masks to restrict expressions that are used as addresses in potentially unsafe loads and stores. Once masked, pointers are guaranteed to point inside the SFI region. E. The SFI Transformation The SFI transformation comprises several passes. Each pass is implemented as a function on Cminor ASTs. Here we describe each pass briefly as well as the invariants the pass establishes. Figure 2 shows the order in which the passes are applied. Figure 1 shows the memory layout of the program in the Cminor source, in the transformed Cminor after all SFI passes, in CompCert s abstract assembly language, and finally in a flat machine memory model. Because CompCert is an ANSI C compiler, the semantics, even at the Cminor intermediate representation, reflect the semantics of ANSI C. The details and restrictions described

7 below are not merely idiosyncratic: they make C programs whose executions are not unpredictable. 1) Type Inference: In the first pass, we use a typechecker to infer the int-floatness of the local variables occurring in the program. Type inference will either fail with a type error, or will succeed with a symbol table mapping each identifier in the program to either Tint or Tfloat. The symbol table is then used in later passes to convey typing information. By this point in the compiler, the input program has already been typechecked in C, but that information is not accessible to our transformations, so we have implemented our own typechecker for Cminor. This is necessary anyway, as our system supports any source program which can be targeted to Cminor, not just those written in C. 2) Define Program Variables: In the second phase, we use the type information to explicitly define (initialize) every local variable that occurs in the program. Although safe Cminor programs would never reference an undefined variable, this phase ensures that even unsafe programs, which are acceptable inputs to our SFI transformation, do not exhibit this particular unsafe behavior. Safe programs, moreover, must define each program variable before use. Thus it is conservative to predefine each program variable with an arbitrary value of the appropriate type, as long as the new definitions occur before any existing definitions. The variable definition pass ensures that this is the case by inserting the new definitions as a block in the preamble of each function. For example, the following unsafe Cminor function (expressed in C syntax). void f(void) { 2 int c; double d; c = (int)(-1.*d); 4 } would be transformed to void f(void) { 2 int c; double d; d = -0.; 4 c = (int)(-1.*d); } in order to properly initialize d before first use. 3) Flatten: Because loads in Cminor may occur at nested positions within expressions, it is not possible to apply the mask operation directly to every loaded or stored address. Instead, we perform an intermediate flattening transformation that pulls loads to top-level, while ensuring that loads only occur to the right of assignment statements. To do so, we must introduce fresh local variables of the appropriate types to store intermediate values. However, because our SFI passes operate above register allocation and optimization in the compiler stack, these temporaries will often be optimized away by later compiler phases. As an example of flattening, consider the following code block containing nested loads in the call to printf. int i = 10; 2 int * p; 4 p = &i; printf( %d\n, *p + *p); After flattening, we get 1 int i = 10; int * p; 3 p = &i; 5 a = *p; b = *p; 7 printf( %d\n, a + b); where a is a fresh temporary, and the expression *p + *p in the call to printf has been replaced by the new assignments a = *p; b = *p and the expression a + b. 4) Malloc Shadow Stack: In Cminor, there are three kinds of local variables. First, there are stack-allocated local variables which have their addresses taken. At the Cminor stage in CompCert, these have been turned into explicit loads and stores into the programmer-visible stack frame. Second, there are temporaries that do not have their addresses taken. Temporaries at the Cminor level may be allocated to registers or spilled later in compilation. Finally, there are user-invisible local variables such as the return address of each call frame. Later in the compilation process, CompCert will spill temporaries and store return addresses into a user-invisible portion of the stack frame. A complete stack frame consists of a user-visible portion, made up of variables accessed by dereferencing user-visible pointers, surrounded by a userinvisible portion made up of spilled temporaries and userinvisible local variables. To enforce the integrity of the userinvisible stack, in this pass we move the user-visible portion into a shadow stack in the main SFI block. To accomplish this, we malloc sufficient memory for all of the stack-allocated locals in each Cminor function and rewrite stack accesses to reference this new memory in the SFI block. 6 We must link at runtime with a memory allocator which is aware of the location and size of the SFI block and which only allocates safe memory. Additionally, if a function has no stack-allocated local variables, we explicitly check for any stack accesses in the function and remove them. This is necessary because Cminor contains a special constant type for stack references, which is 6 To preserve locality, this malloc may be a special shadow stack malloc which allocates safe memory on a shadow stack within the SFI region.

8 interpreted as an offset to the current stack block. This allows us to guarantee that, after this pass is complete, the program contains only heap accesses to memory, local temporaries, and global variables. Concretely, consider the following Cminor program. void main(void) { 2 int i = 10; int * p = &i; 4 *p = 11; 6 *&i = 12; return; 8 } Here i is allocated on the stack because it is addressed in the assignment int * p = &i. After the shadow stack transformation, we get the program void main(void) { 2 void * sb = malloc(4); *(sb + 0) = 10; 4 int * p = sb + 0; 6 *p = 11; *(sb + 0) = 12; 8 free(sb); return; 10 } in which a new stack frame has been allocated within the SFI region by a call to an SFI-aware malloc function. Each access of i in the original program has been replaced by a dereference of the address sb + 0, the top of the allocated stack frame. Here sb is a fresh temporary pointer variable. Before the transformed function returns, a call to an SFIaware free is inserted to deallocate the stack frame. The vast majority of functions in typical C programs do not contain any addressed local variables. In these cases, we can avoid the calls to malloc and free entirely along with the concomitant performance penalty. 5) Mask: We add the SFI mask to protect every operation that dereferences a pointer. Since the previous Flatten pass simplified expressions, it is easy to insert the mask function where required. We model the mask function as a compiler builtin function with a signature Tint -> Tint. As in the Flatten pass, we must add fresh temporaries in order to place the return value in the correct location. These temporaries are removed by optimization. We must also ensure correct control flow at this stage. In Cminor, function pointers are the only real source of indirect control flow. Other control flow structures (such as conditional branches, switch statements, block-and-exit constructions, and jumps to labels) have well-defined semantics in Cminor. CompCert will generate correct, faithful assemblylevel implementations which exhibit the same observable behavior as long as we guarantee that the constructions are semantically safe at the Cminor level. Thus, control flow (with the exception of indirect control flow through function pointers) is confined to the control flow graph visible at the Cminor level. For calls through function pointers to be semantically safe, they must point to a known function (functions pointers are represented as pointers to blocks with block numbers b < 0) and the known function must have a signature that matches the stated signature in the call node. We thus introduce a new mask function to isolate function pointers. We require a portion of the text region to be set aside for special SFI-aligned memory blocks, one per distinct function signature. These new blocks may not overlap each other or the data region but may otherwise be located anywhere in memory. Inside these blocks, we place trampolines: small, fixed instruction sequences that implement calls to the appropriate function. Function pointers are allowed to point at this sequence, even if they are not allowed to point at the actual function they reference (which might be loaded at a different address). The new allowable regions are quite small: each allowable region needs only to be large enough to hold the trampolines for those functions with the given arity and signature that are called by function variables. We must specify an implementation in assembly for the data mask function and for function pointer trampolines for each architecture we wish to target. The particular choice of sandboxing instructions is not important except for performance, so long as the implementation has the desired properties in the CompCert assembly semantics for the given architecture. For example, on x86, the mask instruction might be a single and instruction that fixes the high-order bits of pointers by clearing any that are set erroneously. Together with a loader that unmaps a memory block equal in size to the SFI data block at the bottom of the address space, this mask is sufficient to satisfy the SFI properties [8]. 6) Reliable Execution: Finally, we rewrite the program so that it is guaranteed never to get stuck in the Cminor operational semantics. That is, executing the program from its initial state never yields a machine state that is incompatible with all rules in the operational semantics. This is analogous to the code alignment constraints in traditional SFI where aligned chunks allow the verifier to guarantee that it observes the same sequence of instructions as will be executed. Here, we need not worry about rogue control flow CompCert guarantees that it will create a program that satisfies (at least observably) the control flow constraints we can see at the Cminor level. However, in order to take advantage of CompCert s correctness theorem, we must give it a program that we know does not observably go wrong, that is, a program which never makes a state transition not covered by the provided operational semantics. First, we consider expressions, which must always evaluate to well-defined values. In order to guarantee expres-

9 sion evaluation, it is sometimes necessary to insert runtime checks (when subexpression values must satisfy certain constraints) and to substitute made-up values for poorly formed expressions. For example, if a and b have type Tint, the statement c = a/b; will only evaluate in the Cminor operational semantics if b 0. So we replace it with the statement c = (b!= 0)? a / b : 0; where we replace the evaluation of the expression with an invented value if it would not otherwise evaluate. The constant value 0 chosen here is arbitrary. Recall that because we only guarantee to preserve the behavior of semantically safe (i.e. well defined) programs but require safe behavior of all output programs, we can sensibly invent safe (but not semantics-preserving) behavior for programs that are not themselves semantically safe. Such runtime checks are also necessary to resolve value-dependent execution semantics for pointer subtraction, modular arithmetic, bit-level shift operations, and pointer-integer comparison (in Cminor, a pointer may be compared sensibly to the integer 0, but not to any other integer). Next, we turn our attention to statements. We ensure that all named function calls are safe with respect to the Cminor operational semantics by ensuring that the calls appropriately match the template of the named function. Function pointers must match the template of some declared function (we only require that a function pointer can be resolved, since we cannot know which function it actually points to). F. Runtime Our system requires a minimal runtime for each architecture that will be targeted. The primary role of the runtime is to lay out memory correctly. The loader must place the data section of the SFI executable in an appropriatelyaligned location for the choice of implementation of data mask function (and may also have to protect pages around this location appropriately). Also, the loader must ensure that global (extern) variables are mapped into the SFI region, especially if those variables have their addresses taken. Additionally, the loader must install function pointer trampolines as described above (or load the functions themselves at aligned addresses). The loader may also need to protect certain regions of memory depending on the choice of implementation of the mask function: for example, the loader may need to unmap guard pages around the SFI block to account for pointer-relative addressing modes or to unmap pages at the bottom of the address space to allow for efficient masks that only clear (but do not set) erroneous bits in sandboxed pointers. Finally, the runtime must include an architectureappropriate malloc implementation which will return only buffers in the SFI region. We can provide such a malloc library simply by applying the PSFI transformation to a standard memory allocation library. Thus, the memory allocator is also verified and need not be part of the trusted base. V. VERIFICATION AND TRUST In this section, we describe the specification of the SFI system and outline the proof that our implementation meets this specification. There are two properties that define a security system as SFI: (1) The system must be able to transform any any input program so that it executes safely; and (2) the program transformer does not alter the behavior of safe programs. SFI systems typically achieve (1) by modifying input programs so that they are safe, without analyzing whether that behavior would have been safe without modification. To define SFI security for Cminor programs, we first introduce the notion of OK addresses. Definition (OK Address). An address (b, δ) is OK, where b is a CompCert memory block and δ is an offset into block b, when either b is the distinguished SFI block sb, as fixed at program load time, and lo sb δ < hi sb, where lo sb and hi sb are the low and high offsets, respectively, of the SFI region; or (b, δ) addresses a compiler-managed memory region such as the part of a stack frame used for spilling. We say a program is SFI-secure when it is (1) safe and (2) loads or stores only to OK addresses. Now, we can define the desired properties of our system more formally Property 1 (Program Safety and Security). Let S be a source Cminor program and let T (S) be the program produced by the SFI transformer. Let C be the assembly program compiled by CompCert from T (S). Then C is SFIsecure. Property 2 (Semantics Preservation). Let S be a safe Cminor source program. Let T (S) be the Cminor program produced by running the SFI transformer on S. Then S and T (S) are observationally equivalent. 7 Furthermore, by CompCert s correctness theorem, we have the following corollary. Corollary (Semantics Equivalence of Compiled Code). Assembly programs C compiled from T (S) have the same observable behavior as T (S), and thus as S. To establish that our PSFI transformations, when composed with compilation by CompCert, produce assembly programs conforming to the SFI security policy, we first prove that the Cminor programs output by the program transformer described in Section IV satisfy SFI security. To do so, we establish formally the transformation invariants described in Section IV. For example, after the Masking pass, programs satisfy the following invariant: 7 As in CompCert, programs are observationally equivalent when they produce equivalent event traces.

10 Lemma (Masking Invariant). Take source Cminor program S. Assume S satisfies the invariant established by the flattening pass (loads only at top-level, and only within an assignment statement). Let T (S) be the program produced by the masking transformation. Then loaded ids(t (S)) stored ids(t (S)) masked ids(t (S)) In other words, assuming the flattening invariant holds initially of S, the set of masked identifiers in T (S) is a superset of the set of identifiers loaded or stored in T (S) (every identifier in T (S) containing an address that is either loaded or stored has also been masked). This invariant, together with the fact that masks are inserted directly before the corresponding loads and stores, and a second invariant which ensures that loads and stores are only to (potentially fresh) temporaries, never of arbitrary expressions, establishes that every memory access in T (S) is guarded correctly. The formal statements of the invariants for other passes are not very interesting. For example, to formally express the flattening invariant used above, we first characterize what it means for an expression e to be load-free: load free (e : expr) = match e with Evar i True Econst c True Eunop op e 1 load free e 1 Eload ch e 1 False Ebinop op e 1 e 2 load free e 1 load free e 2 Econdition e 1 e 2 e 3 load free e 1 load free e 2 load free e 3 A statement s satisfies the flattening invariant (all loads in s are top-level assignments) when every expression in s (besides top-level loads themselves) is load-free: stmt fmap (P : expr Prop) (s : stmt) = match s with Sassign i (Eload ch e) P (e) Sassign i e P (e) Sloop s stmt fmap P s load free stmt = stmt fmap load free We lift this property to whole programs in a similar way. A. Trusted Computing Base Our system enjoys a very small TCB: it is not necessary to trust that the isolated program is bug free, because we rewrite it to be safe. It is not necessary to trust the SFI compiler because we have proved it sound. It is not even necessary to trust the proofs of our transformation, as they are checked by machine. One must only trust the statements of our soundness theorems above. Some of the theorems above are in fact about completeness, not security: in reality, while the code consumer cares about security and the trusted base, as it is her trust in question, the code producer also needs assurance that the code has not been broken by the SFI system. At runtime, only the loader need be trusted. Any additional runtime services which might be implemented (such as intermodule communication or trusted system call handlers) would, of course, also be trusted. The mechanized proof infrastructure must be trusted. In Coq, a small kernel verifier checks proof objects against types generated from the theorem specifications. It has been carefully validated and is widely considered trustworthy. Additionally, in order to actually run our code on real programs, it is necessary to extract the Gallina code to executable OCaml. The extraction facility itself must be trusted, along with the OCaml runtime. The compiler, CompCert, need not be completely trusted: because it has a machine-checked proof of correctness, only the (much smaller) specification of that proof is in the TCB. Finally, as in traditional SFI, it is necessary to trust the assembler to preserve the semantics of the assembly language program faithfully in the final, executable machine language representation (or to analyze those semantics properly during disassembly). Here, again, it would be possible to make our technique foundational by extending the semantic preservation proof in CompCert to machine language. VI. MODELING MEMORY In our model of the SFI transformation, we define mask as an external function that takes as argument a (possibly unsafe) address and returns an address into the SFI region as result. We model the SFI region as a distinguished CompCert block. Since the dynamic semantics of both Cminor and the CompCert assembly languages employ exactly the same memory model, we can specify uniformly the semantics of mask across these languages. Here we consider what happens when we expand our built-in mask function to inlined assembly instructions in a finite, flat, byte-addressable memory model. We justify this transformation step by showing that the block-level abstraction of mask is an adequate model of our implementation of mask as inlined assembly. Claim (Adequacy of the Mask Specification). The inlined assembly implementation of mask refines, in the onedimensional virtual address space allocated to a running process by the operating system, the specification of mask defined above on CompCert memories. This follows from the fact that CompCert blocks represent contiguous regions of memory. In particular, CompCert preserves the behavior of programs that perform pointer arithmetic within a block, but behavior is not preserved for pointer arithmetic across blocks.

A concrete memory model for CompCert

A concrete memory model for CompCert A concrete memory model for CompCert Frédéric Besson Sandrine Blazy Pierre Wilke Rennes, France P. Wilke A concrete memory model for CompCert 1 / 28 CompCert real-world C to ASM compiler used in industry

More information

Type Checking and Type Equality

Type Checking and Type Equality Type Checking and Type Equality Type systems are the biggest point of variation across programming languages. Even languages that look similar are often greatly different when it comes to their type systems.

More information

0x1A Great Papers in Computer Security

0x1A Great Papers in Computer Security CS 380S 0x1A Great Papers in Computer Security Vitaly Shmatikov http://www.cs.utexas.edu/~shmat/courses/cs380s/ slide 1 Reference Monitor Observes execution of the program/process At what level? Possibilities:

More information

Sandboxing Untrusted Code: Software-Based Fault Isolation (SFI)

Sandboxing Untrusted Code: Software-Based Fault Isolation (SFI) Sandboxing Untrusted Code: Software-Based Fault Isolation (SFI) Brad Karp UCL Computer Science CS GZ03 / M030 9 th December 2011 Motivation: Vulnerabilities in C Seen dangers of vulnerabilities: injection

More information

COS 320. Compiling Techniques

COS 320. Compiling Techniques Topic 5: Types COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer 1 Types: potential benefits (I) 2 For programmers: help to eliminate common programming mistakes, particularly

More information

Programming Languages Third Edition. Chapter 7 Basic Semantics

Programming Languages Third Edition. Chapter 7 Basic Semantics Programming Languages Third Edition Chapter 7 Basic Semantics Objectives Understand attributes, binding, and semantic functions Understand declarations, blocks, and scope Learn how to construct a symbol

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

Programming Languages Third Edition. Chapter 10 Control II Procedures and Environments

Programming Languages Third Edition. Chapter 10 Control II Procedures and Environments Programming Languages Third Edition Chapter 10 Control II Procedures and Environments Objectives Understand the nature of procedure definition and activation Understand procedure semantics Learn parameter-passing

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2016 Lecture 3a Andrew Tolmach Portland State University 1994-2016 Formal Semantics Goal: rigorous and unambiguous definition in terms of a wellunderstood formalism (e.g.

More information

CIS 341 Final Examination 3 May 2011

CIS 341 Final Examination 3 May 2011 CIS 341 Final Examination 3 May 2011 1 /16 2 /20 3 /40 4 /28 5 /16 Total /120 Do not begin the exam until you are told to do so. You have 120 minutes to complete the exam. There are 12 pages in this exam.

More information

Language Techniques for Provably Safe Mobile Code

Language Techniques for Provably Safe Mobile Code Language Techniques for Provably Safe Mobile Code Frank Pfenning Carnegie Mellon University Distinguished Lecture Series Computing and Information Sciences Kansas State University October 27, 2000 Acknowledgments:

More information

Lecture Notes on Memory Layout

Lecture Notes on Memory Layout Lecture Notes on Memory Layout 15-122: Principles of Imperative Computation Frank Pfenning André Platzer Lecture 11 1 Introduction In order to understand how programs work, we can consider the functions,

More information

High-Level Language VMs

High-Level Language VMs High-Level Language VMs Outline Motivation What is the need for HLL VMs? How are these different from System or Process VMs? Approach to HLL VMs Evolutionary history Pascal P-code Object oriented HLL VMs

More information

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

SEMANTIC ANALYSIS TYPES AND DECLARATIONS SEMANTIC ANALYSIS CS 403: Type Checking Stefan D. Bruda Winter 2015 Parsing only verifies that the program consists of tokens arranged in a syntactically valid combination now we move to check whether

More information

CS164: Programming Assignment 5 Decaf Semantic Analysis and Code Generation

CS164: Programming Assignment 5 Decaf Semantic Analysis and Code Generation CS164: Programming Assignment 5 Decaf Semantic Analysis and Code Generation Assigned: Sunday, November 14, 2004 Due: Thursday, Dec 9, 2004, at 11:59pm No solution will be accepted after Sunday, Dec 12,

More information

Types. Type checking. Why Do We Need Type Systems? Types and Operations. What is a type? Consensus

Types. Type checking. Why Do We Need Type Systems? Types and Operations. What is a type? Consensus Types Type checking What is a type? The notion varies from language to language Consensus A set of values A set of operations on those values Classes are one instantiation of the modern notion of type

More information

Simply-Typed Lambda Calculus

Simply-Typed Lambda Calculus #1 Simply-Typed Lambda Calculus #2 Back to School What is operational semantics? When would you use contextual (small-step) semantics? What is denotational semantics? What is axiomatic semantics? What

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/22891 holds various files of this Leiden University dissertation Author: Gouw, Stijn de Title: Combining monitoring with run-time assertion checking Issue

More information

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions. Programming Languages and Compilers Qualifying Examination Fall 2017 Answer 4 of 6 questions. GENERAL INSTRUCTIONS 1. Answer each question in a separate book. 2. Indicate on the cover of each book the

More information

1 Introduction. 3 Syntax

1 Introduction. 3 Syntax CS 6110 S18 Lecture 19 Typed λ-calculus 1 Introduction Type checking is a lightweight technique for proving simple properties of programs. Unlike theorem-proving techniques based on axiomatic semantics,

More information

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc. Chapter 1 GETTING STARTED SYS-ED/ Computer Education Techniques, Inc. Objectives You will learn: Java platform. Applets and applications. Java programming language: facilities and foundation. Memory management

More information

Buffer overflow background

Buffer overflow background and heap buffer background Comp Sci 3600 Security Heap Outline and heap buffer Heap 1 and heap 2 3 buffer 4 5 Heap Outline and heap buffer Heap 1 and heap 2 3 buffer 4 5 Heap Address Space and heap buffer

More information

Control Flow Integrity & Software Fault Isolation. David Brumley Carnegie Mellon University

Control Flow Integrity & Software Fault Isolation. David Brumley Carnegie Mellon University Control Flow Integrity & Software Fault Isolation David Brumley Carnegie Mellon University Our story so far Unauthorized Control Information Tampering http://propercourse.blogspot.com/2010/05/i-believe-in-duct-tape.html

More information

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13 Run-time Environments Lecture 13 by Prof. Vijay Ganesh) Lecture 13 1 What have we covered so far? We have covered the front-end phases Lexical analysis (Lexer, regular expressions,...) Parsing (CFG, Top-down,

More information

INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT) PROGRAMME. Project FP7-ICT-2009-C CerCo. Report n. D2.2 Prototype implementation

INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT) PROGRAMME. Project FP7-ICT-2009-C CerCo. Report n. D2.2 Prototype implementation INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT) PROGRAMME Project FP7-ICT-2009-C-243881 CerCo Report n. D2.2 Prototype implementation Version 1.0 Main Authors: Nicolas Ayache, Roberto M. Amadio, Yann

More information

Heap Management. Heap Allocation

Heap Management. Heap Allocation Heap Management Heap Allocation A very flexible storage allocation mechanism is heap allocation. Any number of data objects can be allocated and freed in a memory pool, called a heap. Heap allocation is

More information

Lecture Notes on Queues

Lecture Notes on Queues Lecture Notes on Queues 15-122: Principles of Imperative Computation Frank Pfenning Lecture 9 September 25, 2012 1 Introduction In this lecture we introduce queues as a data structure and linked lists

More information

Chapter 1. Introduction

Chapter 1. Introduction 1 Chapter 1 Introduction An exciting development of the 21st century is that the 20th-century vision of mechanized program verification is finally becoming practical, thanks to 30 years of advances in

More information

Lecture 3 Notes Arrays

Lecture 3 Notes Arrays Lecture 3 Notes Arrays 15-122: Principles of Imperative Computation (Summer 1 2015) Frank Pfenning, André Platzer 1 Introduction So far we have seen how to process primitive data like integers in imperative

More information

CSE 504: Compiler Design. Runtime Environments

CSE 504: Compiler Design. Runtime Environments Runtime Environments Pradipta De pradipta.de@sunykorea.ac.kr Current Topic Procedure Abstractions Mechanisms to manage procedures and procedure calls from compiler s perspective Runtime Environment Choices

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from

More information

Lecture 1 Contracts : Principles of Imperative Computation (Fall 2018) Frank Pfenning

Lecture 1 Contracts : Principles of Imperative Computation (Fall 2018) Frank Pfenning Lecture 1 Contracts 15-122: Principles of Imperative Computation (Fall 2018) Frank Pfenning In these notes we review contracts, which we use to collectively denote function contracts, loop invariants,

More information

Static Program Analysis Part 1 the TIP language

Static Program Analysis Part 1 the TIP language Static Program Analysis Part 1 the TIP language http://cs.au.dk/~amoeller/spa/ Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University Questions about programs Does the program terminate

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Formal C semantics: CompCert and the C standard

Formal C semantics: CompCert and the C standard Formal C semantics: CompCert and the C standard Robbert Krebbers 1, Xavier Leroy 2, and Freek Wiedijk 1 1 ICIS, Radboud University Nijmegen, The Netherlands 2 Inria Paris-Rocquencourt, France Abstract.

More information

Run-time Environments

Run-time Environments Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction

More information

Lecture 10 Notes Linked Lists

Lecture 10 Notes Linked Lists Lecture 10 Notes Linked Lists 15-122: Principles of Imperative Computation (Spring 2016) Frank Pfenning, Rob Simmons, André Platzer 1 Introduction In this lecture we discuss the use of linked lists to

More information

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning Lecture 1 Contracts 15-122: Principles of Imperative Computation (Spring 2018) Frank Pfenning In these notes we review contracts, which we use to collectively denote function contracts, loop invariants,

More information

Run-time Environments

Run-time Environments Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2017 Lecture 3a Andrew Tolmach Portland State University 1994-2017 Binding, Scope, Storage Part of being a high-level language is letting the programmer name things: variables

More information

Advances in Programming Languages: Regions

Advances in Programming Languages: Regions Advances in Programming Languages: Regions Allan Clark and Stephen Gilmore The University of Edinburgh February 22, 2007 Introduction The design decision that memory will be managed on a per-language basis

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui Projects 1 Information flow analysis for mobile applications 2 2 Machine-learning-guide typestate analysis for UAF vulnerabilities 3 3 Preventing

More information

Buffer overflow prevention, and other attacks

Buffer overflow prevention, and other attacks Buffer prevention, and other attacks Comp Sci 3600 Security Outline 1 2 Two approaches to buffer defense Aim to harden programs to resist attacks in new programs Run time Aim to detect and abort attacks

More information

Lecture Notes on Arrays

Lecture Notes on Arrays Lecture Notes on Arrays 15-122: Principles of Imperative Computation July 2, 2013 1 Introduction So far we have seen how to process primitive data like integers in imperative programs. That is useful,

More information

CPSC 3740 Programming Languages University of Lethbridge. Data Types

CPSC 3740 Programming Languages University of Lethbridge. Data Types Data Types A data type defines a collection of data values and a set of predefined operations on those values Some languages allow user to define additional types Useful for error detection through type

More information

Confinement (Running Untrusted Programs)

Confinement (Running Untrusted Programs) Confinement (Running Untrusted Programs) Chester Rebeiro Indian Institute of Technology Madras Untrusted Programs Untrusted Application Entire Application untrusted Part of application untrusted Modules

More information

Chapter 1 INTRODUCTION SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Chapter 1 INTRODUCTION SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC. hapter 1 INTRODUTION SYS-ED/ OMPUTER EDUATION TEHNIQUES, IN. Objectives You will learn: Java features. Java and its associated components. Features of a Java application and applet. Java data types. Java

More information

CSCE 314 Programming Languages. Type System

CSCE 314 Programming Languages. Type System CSCE 314 Programming Languages Type System Dr. Hyunyoung Lee 1 Names Names refer to different kinds of entities in programs, such as variables, functions, classes, templates, modules,.... Names can be

More information

How much is a mechanized proof worth, certification-wise?

How much is a mechanized proof worth, certification-wise? How much is a mechanized proof worth, certification-wise? Xavier Leroy Inria Paris-Rocquencourt PiP 2014: Principles in Practice In this talk... Some feedback from the aircraft industry concerning the

More information

School of Computer & Communication Sciences École Polytechnique Fédérale de Lausanne

School of Computer & Communication Sciences École Polytechnique Fédérale de Lausanne Principles of Dependable Systems Building Reliable Software School of Computer & Communication Sciences École Polytechnique Fédérale de Lausanne Winter 2006-2007 Outline Class Projects: mtgs next week

More information

Chapter 8 :: Composite Types

Chapter 8 :: Composite Types Chapter 8 :: Composite Types Programming Language Pragmatics, Fourth Edition Michael L. Scott Copyright 2016 Elsevier 1 Chapter08_Composite_Types_4e - Tue November 21, 2017 Records (Structures) and Variants

More information

Motivation was to facilitate development of systems software, especially OS development.

Motivation was to facilitate development of systems software, especially OS development. A History Lesson C Basics 1 Development of language by Dennis Ritchie at Bell Labs culminated in the C language in 1972. Motivation was to facilitate development of systems software, especially OS development.

More information

Semantic Analysis. Lecture 9. February 7, 2018

Semantic Analysis. Lecture 9. February 7, 2018 Semantic Analysis Lecture 9 February 7, 2018 Midterm 1 Compiler Stages 12 / 14 COOL Programming 10 / 12 Regular Languages 26 / 30 Context-free Languages 17 / 21 Parsing 20 / 23 Extra Credit 4 / 6 Average

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 4 Thomas Wies New York University Review Last week Control Structures Selection Loops Adding Invariants Outline Subprograms Calling Sequences Parameter

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Formal verification of an optimizing compiler

Formal verification of an optimizing compiler Formal verification of an optimizing compiler Xavier Leroy INRIA Rocquencourt RTA 2007 X. Leroy (INRIA) Formal compiler verification RTA 2007 1 / 58 The compilation process General definition: any automatic

More information

CSCI 171 Chapter Outlines

CSCI 171 Chapter Outlines Contents CSCI 171 Chapter 1 Overview... 2 CSCI 171 Chapter 2 Programming Components... 3 CSCI 171 Chapter 3 (Sections 1 4) Selection Structures... 5 CSCI 171 Chapter 3 (Sections 5 & 6) Iteration Structures

More information

Identity-based Access Control

Identity-based Access Control Identity-based Access Control The kind of access control familiar from operating systems like Unix or Windows based on user identities This model originated in closed organisations ( enterprises ) like

More information

Verasco: a Formally Verified C Static Analyzer

Verasco: a Formally Verified C Static Analyzer Verasco: a Formally Verified C Static Analyzer Jacques-Henri Jourdan Joint work with: Vincent Laporte, Sandrine Blazy, Xavier Leroy, David Pichardie,... June 13, 2017, Montpellier GdR GPL thesis prize

More information

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection Deallocation Mechanisms User-controlled Deallocation Allocating heap space is fairly easy. But how do we deallocate heap memory no longer in use? Sometimes we may never need to deallocate! If heaps objects

More information

Semantic Analysis. CSE 307 Principles of Programming Languages Stony Brook University

Semantic Analysis. CSE 307 Principles of Programming Languages Stony Brook University Semantic Analysis CSE 307 Principles of Programming Languages Stony Brook University http://www.cs.stonybrook.edu/~cse307 1 Role of Semantic Analysis Syntax vs. Semantics: syntax concerns the form of a

More information

CS 6110 S11 Lecture 25 Typed λ-calculus 6 April 2011

CS 6110 S11 Lecture 25 Typed λ-calculus 6 April 2011 CS 6110 S11 Lecture 25 Typed λ-calculus 6 April 2011 1 Introduction Type checking is a lightweight technique for proving simple properties of programs. Unlike theorem-proving techniques based on axiomatic

More information

Lexical Considerations

Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Fall 2005 Handout 6 Decaf Language Wednesday, September 7 The project for the course is to write a

More information

CPS311 Lecture: Procedures Last revised 9/9/13. Objectives:

CPS311 Lecture: Procedures Last revised 9/9/13. Objectives: CPS311 Lecture: Procedures Last revised 9/9/13 Objectives: 1. To introduce general issues that any architecture must address in terms of calling/returning from procedures, passing parameters (including

More information

Why Study Assembly Language?

Why Study Assembly Language? Why Study Assembly Language? This depends on the decade in which you studied assembly language. 1940 s You cannot study assembly language. It does not exist yet. 1950 s You study assembly language because,

More information

Static Analysis and Bugfinding

Static Analysis and Bugfinding Static Analysis and Bugfinding Alex Kantchelian 09/12/2011 Last week we talked about runtime checking methods: tools for detecting vulnerabilities being exploited in deployment. So far, these tools have

More information

The Java Language Implementation

The Java Language Implementation CS 242 2012 The Java Language Implementation Reading Chapter 13, sections 13.4 and 13.5 Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches, pages 1 5. Outline Java virtual

More information

Modal Logic: Implications for Design of a Language for Distributed Computation p.1/53

Modal Logic: Implications for Design of a Language for Distributed Computation p.1/53 Modal Logic: Implications for Design of a Language for Distributed Computation Jonathan Moody (with Frank Pfenning) Department of Computer Science Carnegie Mellon University Modal Logic: Implications for

More information

Type Systems. Pierce Ch. 3, 8, 11, 15 CSE

Type Systems. Pierce Ch. 3, 8, 11, 15 CSE Type Systems Pierce Ch. 3, 8, 11, 15 CSE 6341 1 A Simple Language ::= true false if then else 0 succ pred iszero Simple untyped expressions Natural numbers encoded as succ succ

More information

Inline Reference Monitoring Techniques

Inline Reference Monitoring Techniques Inline Reference Monitoring Techniques In the last lecture, we started talking about Inline Reference Monitors. The idea is that the policy enforcement code runs with the same address space as the code

More information

Low level security. Andrew Ruef

Low level security. Andrew Ruef Low level security Andrew Ruef What s going on Stuff is getting hacked all the time We re writing tons of software Often with little regard to reliability let alone security The regulatory environment

More information

CS107 Handout 08 Spring 2007 April 9, 2007 The Ins and Outs of C Arrays

CS107 Handout 08 Spring 2007 April 9, 2007 The Ins and Outs of C Arrays CS107 Handout 08 Spring 2007 April 9, 2007 The Ins and Outs of C Arrays C Arrays This handout was written by Nick Parlante and Julie Zelenski. As you recall, a C array is formed by laying out all the elements

More information

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Spring 2015 1 Handling Overloaded Declarations Two approaches are popular: 1. Create a single symbol table

More information

Lecture Notes on Memory Management

Lecture Notes on Memory Management Lecture Notes on Memory Management 15-122: Principles of Imperative Computation Frank Pfenning Lecture 21 April 5, 2011 1 Introduction Unlike C0 and other modern languages like Java, C#, or ML, C requires

More information

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs CCured Type-Safe Retrofitting of C Programs [Necula, McPeak,, Weimer, Condit, Harren] #1 One-Slide Summary CCured enforces memory safety and type safety in legacy C programs. CCured analyzes how you use

More information

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program. Language Translation Compilation vs. interpretation Compilation diagram Step 1: compile program compiler Compiled program Step 2: run input Compiled program output Language Translation compilation is translation

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 7b Andrew Tolmach Portland State University 1994-2017 Values and Types We divide the universe of values according to types A type is a set of values and

More information

1. Abstract syntax: deep structure and binding. 2. Static semantics: typing rules. 3. Dynamic semantics: execution rules.

1. Abstract syntax: deep structure and binding. 2. Static semantics: typing rules. 3. Dynamic semantics: execution rules. Introduction Formal definitions of programming languages have three parts: CMPSCI 630: Programming Languages Static and Dynamic Semantics Spring 2009 (with thanks to Robert Harper) 1. Abstract syntax:

More information

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1 Agenda CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Summer 2004 Java virtual machine architecture.class files Class loading Execution engines Interpreters & JITs various strategies

More information

Machine-Independent Optimizations

Machine-Independent Optimizations Chapter 9 Machine-Independent Optimizations High-level language constructs can introduce substantial run-time overhead if we naively translate each construct independently into machine code. This chapter

More information

Version:1.1. Overview of speculation-based cache timing side-channels

Version:1.1. Overview of speculation-based cache timing side-channels Author: Richard Grisenthwaite Date: January 2018 Version 1.1 Introduction This whitepaper looks at the susceptibility of Arm implementations following recent research findings from security researchers

More information

LESSON 13: LANGUAGE TRANSLATION

LESSON 13: LANGUAGE TRANSLATION LESSON 13: LANGUAGE TRANSLATION Objective Interpreters and Compilers. Language Translation Phases. Interpreters and Compilers A COMPILER is a program that translates a complete source program into machine

More information

5.Coding for 64-Bit Programs

5.Coding for 64-Bit Programs Chapter 5 5.Coding for 64-Bit Programs This chapter provides information about ways to write/update your code so that you can take advantage of the Silicon Graphics implementation of the IRIX 64-bit operating

More information

Lectures 5-6: Introduction to C

Lectures 5-6: Introduction to C Lectures 5-6: Introduction to C Motivation: C is both a high and a low-level language Very useful for systems programming Faster than Java This intro assumes knowledge of Java Focus is on differences Most

More information

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program and Code Improvement Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program Review Front end code Source code analysis Syntax tree Back end code Target code

More information

A Feasibility Study for Methods of Effective Memoization Optimization

A Feasibility Study for Methods of Effective Memoization Optimization A Feasibility Study for Methods of Effective Memoization Optimization Daniel Mock October 2018 Abstract Traditionally, memoization is a compiler optimization that is applied to regions of code with few

More information

Œuf: Minimizing the Coq Extraction TCB

Œuf: Minimizing the Coq Extraction TCB Œuf: Minimizing the Coq Extraction TCB Eric Mullen University of Washington, USA USA Abstract Zachary Tatlock University of Washington, USA USA Verifying systems by implementing them in the programming

More information

Some Basic Concepts EL6483. Spring EL6483 Some Basic Concepts Spring / 22

Some Basic Concepts EL6483. Spring EL6483 Some Basic Concepts Spring / 22 Some Basic Concepts EL6483 Spring 2016 EL6483 Some Basic Concepts Spring 2016 1 / 22 Embedded systems Embedded systems are rather ubiquitous these days (and increasing rapidly). By some estimates, there

More information

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview COMP 181 Compilers Lecture 2 Overview September 7, 2006 Administrative Book? Hopefully: Compilers by Aho, Lam, Sethi, Ullman Mailing list Handouts? Programming assignments For next time, write a hello,

More information

A Short Summary of Javali

A Short Summary of Javali A Short Summary of Javali October 15, 2015 1 Introduction Javali is a simple language based on ideas found in languages like C++ or Java. Its purpose is to serve as the source language for a simple compiler

More information

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1 CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Winter 2008 3/11/2008 2002-08 Hal Perkins & UW CSE V-1 Agenda Java virtual machine architecture.class files Class loading Execution engines

More information

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors Compiler Design Subject Code: 6CS63/06IS662 Part A UNIT 1 Chapter 1 1. Introduction 1.1 Language Processors A compiler is a program that can read a program in one language (source language) and translate

More information

A brief introduction to C programming for Java programmers

A brief introduction to C programming for Java programmers A brief introduction to C programming for Java programmers Sven Gestegård Robertz September 2017 There are many similarities between Java and C. The syntax in Java is basically

More information

Lecture 7: Type Systems and Symbol Tables. CS 540 George Mason University

Lecture 7: Type Systems and Symbol Tables. CS 540 George Mason University Lecture 7: Type Systems and Symbol Tables CS 540 George Mason University Static Analysis Compilers examine code to find semantic problems. Easy: undeclared variables, tag matching Difficult: preventing

More information

COP 3330 Final Exam Review

COP 3330 Final Exam Review COP 3330 Final Exam Review I. The Basics (Chapters 2, 5, 6) a. comments b. identifiers, reserved words c. white space d. compilers vs. interpreters e. syntax, semantics f. errors i. syntax ii. run-time

More information

CS107 Handout 13 Spring 2008 April 18, 2008 Computer Architecture: Take II

CS107 Handout 13 Spring 2008 April 18, 2008 Computer Architecture: Take II CS107 Handout 13 Spring 2008 April 18, 2008 Computer Architecture: Take II Example: Simple variables Handout written by Julie Zelenski and Nick Parlante A variable is a location in memory. When a variable

More information

Lecture Notes on Ints

Lecture Notes on Ints Lecture Notes on Ints 15-122: Principles of Imperative Computation Frank Pfenning Lecture 2 August 26, 2010 1 Introduction Two fundamental types in almost any programming language are booleans and integers.

More information

Symbol Tables Symbol Table: In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is

More information

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors

More information