Creating Formally Verified Components for Layered Assurance with an LLVM-to-ACL2 Translator

Jennifer Davis, David Hardin, Jedidiah McClurg
December 2013

Introduction

Research objectives:
- Reduce the need to trust compiler optimizations by reasoning about post-optimization intermediate representations
- Reason about small fragments of code written in assembly language

We wish to create a library of formally verified software component models for layered assurance.
- In our current work, the components are in LLVM intermediate form.
- We wish to translate them into the language of a theorem prover, such as ACL2.
- We built an LLVM-to-ACL2 translator.

Motivating Work

- Jianzhou Zhao (U Penn) et al. produced several different formalizations of operational semantics for LLVM in Coq (2012)
  - Intention to produce a verified LLVM compiler
- Magnus Myreen's (Cambridge) decompilation-into-logic work (2009)
  - Imperative machine code (PPC, x86, ARM) -> HOL4
  - Extracts functional behavior of imperative code
  - Assures that the decompilation process is sound
- Andrew Appel (Princeton) observed that "SSA is functional programming" (1998)
  - This inspired us to build a translator from LLVM to ACL2

LLVM

- LLVM is the intermediate form for many common compilers, including clang.
- LLVM code generation targets exist for a wide variety of machines.
- LLVM is a register-based intermediate form in Static Single Assignment (SSA) form: each variable is assigned exactly once.
- An LLVM program consists of a list of entities. There are eight types, including:
  - function declarations
  - function definitions
- Our software component models are created from code that has been compiled into the LLVM intermediate form.

ACL2

- A Computational Logic for Applicative Common Lisp (ACL2)
- Highly automated theorem proving system
- Functional language with admission criteria
- Executable subset of the language
- Rich set of legal identifiers (@foo, %.06, @bar_%.0)
- Side-effect-free subset of Lisp, so it inherits Lisp peculiarities:
  - Function definition: (defun funname (parm1 parm2 parm3) (<body>))
  - Function invocation: (funname x y z)
  - let binds variables to values within a function body
  - Multiway conditionals use the cond form
  - Lists are the fundamental data structure
  - ACL2 supports integers and rationals
  - Lisp predicate names are traditionally given a suffix of p

LLVM-to-ACL2 Translation Toolchain

[Figure: toolchain diagram; the only surviving label is "theorem prover".]

Example C Source

    long sumarr(unsigned int n, long sum, long *array) {
      unsigned int j = 0;
      for (j = 0; j < n; j++) {
        sum += array[j];
      }
      return sum;
    }

We can produce LLVM from the C source as follows:

    clang -O4 -S -emit-llvm sumarr.c

Example LLVM

C source:

    long sumarr(unsigned int n, long sum, long *array) {
      unsigned int j = 0;
      for (j = 0; j < n; j++) { sum += array[j]; }
      return sum;
    }

LLVM (on the slide, the two phi nodes in block .lr.ph are annotated "j" and "sum"):

    define i64 @sumarr(i32 %n, i64 %sum, i64* nocapture %array) nounwind uwtable readonly {
      %1 = icmp eq i32 %n, 0
      br i1 %1, label %._crit_edge, label %.lr.ph

    .lr.ph:                                   ; preds = %.lr.ph, %0
      %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
      %.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]
      %2 = getelementptr inbounds i64* %array, i64 %indvars.iv
      %3 = load i64* %2, align 8, !tbaa !0
      %4 = add nsw i64 %3, %.06
      %indvars.iv.next = add i64 %indvars.iv, 1
      %lftr.wideiv = trunc i64 %indvars.iv.next to i32
      %exitcond = icmp eq i32 %lftr.wideiv, %n
      br i1 %exitcond, label %._crit_edge, label %.lr.ph

    ._crit_edge:                              ; preds = %.lr.ph, %0
      %.0.lcssa = phi i64 [ %sum, %0 ], [ %4, %.lr.ph ]
      ret i64 %.0.lcssa
    }

Translation Snippet

Each block within an LLVM function contains a list of instructions in SSA form with type information. Hence we can readily convert a list of instructions into an appropriate let construct.

LLVM:

    %2 = getelementptr inbounds i64* %array, i64 %indvars.iv
    %3 = load i64* %2, align 8, !tbaa !0
    %4 = add nsw i64 %3, %.06

ACL2:

    (let ((%2 (getelementptr %array %indvars.iv 8)))
      (let ((%3 (load-i64l %2 st)))
        (let ((%4 (ifix (+ %3 %.06))))
          ...)))
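The instruction-list-to-let conversion above can be sketched in a few lines of Python. This is only an illustration, not the authors' translator: the function name nest_lets and the representation of instructions as (name, expression) pairs of already-translated ACL2 text are assumptions made for the example.

```python
def nest_lets(bindings, result):
    """Wrap `result` in one ACL2 `let` per SSA binding, first binding outermost."""
    body = result
    # Build from the inside out so the first binding ends up outermost.
    for name, expr in reversed(bindings):
        body = f"(let (({name} {expr})) {body})"
    return body


# Bindings mirror the three instructions on the slide, already rendered
# as ACL2 expressions (hypothetical input format).
bindings = [
    ("%2", "(getelementptr %array %indvars.iv 8)"),
    ("%3", "(load-i64l %2 st)"),
    ("%4", "(ifix (+ %3 %.06))"),
]
acl2_text = nest_lets(bindings, "%4")
print(acl2_text)
```

Because each SSA name is assigned exactly once, no binding can shadow an earlier one, which is what makes this one-let-per-instruction nesting safe.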

Main Translator Algorithm

[Figure: translator pipeline.]

LLVM AST -> Translator -> ACL2 Code

Translator steps:
1. Get Function Names
2. Remove Aliases
3. Promote Blocks to Functions
4. Translate to ACL2

Remove Aliases

Aliases allow new names (e.g., @foo1 and @foo2) to be used for globals and functions:

    @bar = global i32 123
    @foo1 = alias i32* @bar
    @foo2 = alias i32* @bar

We eliminate these.
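One way to picture alias elimination is as a substitution pass: resolve each alias to the name it ultimately stands for, then rewrite every use. The Python sketch below assumes a hypothetical representation (a dict from alias name to target, and instructions as token lists); the slides do not say how the translator actually implements this step.

```python
def resolve_aliases(aliases):
    """Map every alias name to the non-alias name it ultimately refers to."""
    resolved = {}
    for name in aliases:
        target = aliases[name]
        seen = {name}
        # Follow chains such as @a -> @b -> @c, and refuse cycles.
        while target in aliases:
            if target in seen:
                raise ValueError(f"alias cycle through {target}")
            seen.add(target)
            target = aliases[target]
        resolved[name] = target
    return resolved


def substitute(tokens, resolved):
    """Replace alias names in a token list with their resolved targets."""
    return [resolved.get(tok, tok) for tok in tokens]


# The two aliases from the slide.
aliases = {"@foo1": "@bar", "@foo2": "@bar"}
resolved = resolve_aliases(aliases)
rewritten = substitute(["load", "i32*", "@foo1"], resolved)
print(rewritten)
```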

Promote Blocks to Functions

- As we have seen, LLVM functions often contain inner blocks and branch instructions.
- Each of these blocks is pulled out as a new function.
- For each block, the phi instructions denote variables that become parameters of that new function:

      %.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]

- The phi instructions also tell us the parameter values that must be used at the new function's call site(s).
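The role of the phi instructions can be illustrated with a small Python sketch. The data layout (a list of destination names paired with a dict from predecessor label to incoming value) and both function names are assumptions for the example. The sketch covers only the phi-derived parameters; the translated function shown in these slides also receives %n, %array, and st, which the sketch does not attempt to compute.

```python
def phi_parameters(phis):
    """Return the parameter names introduced by a block's phi instructions."""
    return [dest for dest, _incoming in phis]


def call_arguments(phis, predecessor):
    """Return the argument values that a given predecessor block must pass."""
    return [incoming[predecessor] for _dest, incoming in phis]


# Phi instructions of block %.lr.ph from the sumarr example:
# each entry is (destination, {predecessor label: incoming value}).
phis = [
    ("%indvars.iv", {"%.lr.ph": "%indvars.iv.next", "%0": "0"}),
    ("%.06", {"%.lr.ph": "%4", "%0": "%sum"}),
]

params = phi_parameters(phis)                 # formal parameters of the new function
entry_args = call_arguments(phis, "%0")       # call from the function entry block
loop_args = call_arguments(phis, "%.lr.ph")   # recursive call from the loop body
print(params, entry_args, loop_args)
```

Read this way, the back edge of the loop becomes a recursive call whose arguments are the values the phi nodes would have selected for that predecessor.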

Dealing with Order of Declarations

- ACL2 requires functions and constants to be defined before they are used.
- We do a topological sort on each of the call/dependency graphs.
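The ordering step can be sketched with Python's standard graphlib module (Python 3.9 or later). The dependency graph below is hypothetical and only loosely modeled on the sumarr example; the slides do not show the translator's actual graph. Self-recursive calls are left out of the graph on purpose, since TopologicalSorter raises CycleError for any cycle, including a self-loop.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key maps to the definitions it uses.
depends_on = {
    "@sumarr": {"@sumarr_%.lr.ph"},
    "@sumarr_%.lr.ph": {"getelementptr", "load-i64l"},
    "getelementptr": set(),
    "load-i64l": set(),
}

# static_order() yields every node after all of its predecessors, so
# each definition is emitted after the definitions it refers to.
emit_order = list(TopologicalSorter(depends_on).static_order())
print(emit_order)
```

A genuine cycle between distinct functions cannot be ordered this way at all, which is the situation ACL2's mutual-recursion form exists for.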

Translate to ACL2

- Function declaration -> ACL2 function stub
- Function definition with instruction list -> ACL2 defun construct with a nested let-bound expression
- Memory -> ACL2 single-threaded object (stobj) for efficient execution
- Floating-point number -> corresponding rational number
  - 1.234 -> (+ 1 (/ 2 10) (/ 3 100) (/ 4 1000))
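The floating-point mapping above can be reproduced digit by digit. The Python sketch below is an illustration under a narrow assumption: the input is a plain non-negative decimal literal written as a string (no sign, no exponent, no hexadecimal form). The function name decimal_to_acl2 is hypothetical, and the slides do not say how the translator handles other literal forms.

```python
from fractions import Fraction


def decimal_to_acl2(literal):
    """Render a plain decimal literal as an ACL2 sum of exact rationals."""
    whole, _, frac = literal.partition(".")
    terms = [whole or "0"]
    # The digit at position p after the point contributes digit / 10**p.
    for position, digit in enumerate(frac, start=1):
        if digit != "0":
            terms.append(f"(/ {digit} {10 ** position})")
    return "(+ " + " ".join(terms) + ")"


acl2_expr = decimal_to_acl2("1.234")
print(acl2_expr)

# The sum of the terms is the exact rational value of the literal.
exact = Fraction(1) + Fraction(2, 10) + Fraction(3, 100) + Fraction(4, 1000)
print(exact == Fraction("1.234"))
```

Working from the decimal text rather than from a binary float matters here: Fraction(1.234) is the value of the nearest binary double, which differs from the exact rational 617/500.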

Example ACL2

LLVM block:

    .lr.ph:                                   ; preds = %.lr.ph, %0
      %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
      %.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]
      %2 = getelementptr inbounds i64* %array, i64 %indvars.iv
      %3 = load i64* %2, align 8, !tbaa !0
      %4 = add nsw i64 %3, %.06
      %indvars.iv.next = add i64 %indvars.iv, 1
      %lftr.wideiv = trunc i64 %indvars.iv.next to i32
      %exitcond = icmp eq i32 %lftr.wideiv, %n
      br i1 %exitcond, label %._crit_edge, label %.lr.ph

Translated ACL2:

    (defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st)
      (declare (xargs :stobjs st
                      :guard (and (integerp %.06) (natp %indvars.iv)
                                  (natp %n) (natp %array))))
      (let ((%2 (getelementptr %array %indvars.iv 8)))
        (let ((%3 (load-i64l %2 st)))
          (let ((%4 (ifix (+ %3 %.06))))
            (let ((%indvars.iv.next (nfix (+ %indvars.iv 1))))
              (let ((%exitcond (if (= %indvars.iv.next %n) 1 0)))
                (if (= %exitcond 1)
                    %4
                    (@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st))))))))

Example ACL2

The same function with a termination measure, a strengthened guard, and mbt tests on the inputs:

    (defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st)
      (declare (xargs :measure (nfix (- (nfix %n) (nfix %indvars.iv)))
                      :stobjs st
                      :guard (and (integerp %.06) (natp %indvars.iv)
                                  (natp %n) (natp %array)
                                  (< %indvars.iv %n))))
      (if (not (and (mbt (integerp %.06))
                    (mbt (natp %indvars.iv))
                    (mbt (natp %n))
                    (mbt (natp %array))
                    (mbt (< %indvars.iv %n))))
          %.06
          (let ((%2 (getelementptr %array %indvars.iv 8)))
            (let ((%3 (load-i64l %2 st)))
              (let ((%4 (ifix (+ %3 %.06))))
                (let ((%indvars.iv.next (nfix (+ %indvars.iv 1))))
                  (let ((%exitcond (if (= %indvars.iv.next %n) 1 0)))
                    (if (= %exitcond 1)
                        %4
                        (@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st)))))))))
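To see why the measure (nfix (- (nfix %n) (nfix %indvars.iv))) supports termination, it helps to trace it along the recursive calls. The Python sketch below mimics nfix and replays only the index arithmetic of the function above; the names measure and measure_trace are hypothetical, and running a trace is of course no substitute for the ACL2 termination proof.

```python
def nfix(x):
    """Mimic ACL2's nfix: naturals pass through, anything else becomes 0."""
    return x if isinstance(x, int) and x >= 0 else 0


def measure(n, i):
    """The measure from the slide, with n for %n and i for %indvars.iv."""
    return nfix(nfix(n) - nfix(i))


def measure_trace(n):
    """Measures seen at each call, starting from i = 0 with the guard i < n."""
    if n < 1:
        # Without the guard (< i n) the exit test i + 1 == n is never reached.
        raise ValueError("guard requires 0 <= i < n")
    trace = []
    i = 0
    while True:
        trace.append(measure(n, i))
        i_next = nfix(i + 1)
        if i_next == n:  # the %exitcond test
            return trace
        i = i_next


trace = measure_trace(4)
print(trace)
```

The rejected n = 0 case shows what the extra guard conjunct buys: the exit test is an equality, so an index that starts at or beyond %n would never hit it.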

Tail Recursion

- Note that the translated function for the LLVM loop becomes a tail-recursive function (it uses an accumulator) in ACL2.
- Tail-recursive functions are nice for execution, since an arbitrary number of recursive tail calls can be made without exhausting the stack.
- However, tail-recursive functions are not convenient for reasoning because they pollute the induction scheme.
- We can generate non-tail-recursive functions operating over simple lists from tail-recursive, stobj-based functions. This technique is called Hardin's Bridge*.

*In memory of Scott Hardin, a civil engineer who designed several physical bridges, and a man who valued rigor. He was the father of one of the authors.

Hardin's Bridge

[Figure: diagram of the bridge. It connects the four forms below through transitions numbered 1, 2, and 3; the label "defiteration" appears twice in the diagram.]

- Form: imperative and operating over an array

      for (k = 0; k < *SZ*; k++) { res = op(d[k], res); }

- Form: tail recursive with mutable state

      (x-tail k res st)

- Form: non-tail-recursive with mutable state

      (x-iter j res st)

- Form: non-tail-recursive and operating over a list

      (defun x (res d)
        (if (endp d)
            res
            (op (car d) (x res (cdr d)))))
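The forms on this slide can be mimicked in Python to show what the bridge relates. This is a loose analogue, assuming op is addition and using an ordinary Python list in place of both the array and the stobj, so the mutable-state forms collapse into one function; the names sum_loop, sum_tail, and sum_list are hypothetical. The final check only tests one input, whereas the bridge technique proves the equivalence.

```python
def sum_loop(res, d):
    """Imperative form: a loop updating an accumulator."""
    for k in range(len(d)):
        res = d[k] + res
    return res


def sum_tail(k, res, d):
    """Tail-recursive form: loop index and accumulator are parameters."""
    if k >= len(d):
        return res
    return sum_tail(k + 1, d[k] + res, d)


def sum_list(res, lst):
    """Non-tail-recursive form over a list, mirroring (defun x (res d) ...)."""
    if not lst:
        return res
    return lst[0] + sum_list(res, lst[1:])


data = [3, -1, 4, 1, -5, 9]
results = (sum_loop(10, data), sum_tail(0, 10, data), sum_list(10, data))
print(results)
```

The loop and tail-recursive forms fold the elements in from the front, while the list form combines them from the back, so the three agree here because addition is associative and commutative. Python does not eliminate tail calls, so sum_tail is only suitable for short inputs.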

Applying Hardin's Bridge

We use the bridge technique to prove that @sumarr_%.lr.ph is equal to the following non-tail-recursive function:

    (defun sumlist64 (res lst)
      (declare (xargs :measure (len lst)))
      (cond ((not (true-listp lst)) (ifix res))
            ((endp lst) (ifix res))
            (t (+ (ifix (load-i64ll (take 8 lst)))
                  (sumlist64 res (nthcdr 8 lst))))))

Proving properties about @sumarr_%.lr.ph can then be accomplished by proving them instead about sumlist64, a non-tail-recursive function better suited for theorem proving.

Current State of the Translator

- We use the def::ung macro to automatically define the domain of recursive functions. This allows recursive functions to be admitted in ACL2 without manually adding measures.
- We added support for modular arithmetic (e.g., fixed-width addition).
- We have rerun the sumarr example with the current translator:
  - No editing of the translated code was needed.
  - The recursive function is admitted automatically via def::ung.
  - Fixed-width addition is preserved in the non-tail-recursive spec.
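Fixed-width addition means that a sum is reduced modulo 2^64 and reinterpreted as a signed value, rather than growing without bound as ACL2 integers do. The Python sketch below shows that arithmetic for a 64-bit width; the helper names wrap_i64 and add_i64 are hypothetical, and the slides do not show how the translator itself encodes the operation in ACL2.

```python
def wrap_i64(value):
    """Reduce an integer to the signed 64-bit range (two's complement)."""
    value &= (1 << 64) - 1  # keep the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value


def add_i64(a, b):
    """Add two integers with 64-bit wraparound."""
    return wrap_i64(a + b)


largest = (1 << 63) - 1
print(add_i64(5, 7))
print(add_i64(largest, 1) == -(1 << 63))
```

Python integers, like ACL2 integers, are unbounded, which is why the wraparound has to be written out explicitly.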

Limitations of the Translator

- Exceptions
- Indirect call instructions

Future Work

- Stack analysis and data structure analysis
- LLVM DataLayout directives (global endianness, alignment/padding specification)
- LLVM intrinsic functions (there are a large number of these)
- Variable-length argument lists
- Attempt to eliminate cycles in the call graph by code rewrites when possible (rather than just blindly emitting mutual-recursion)

Conclusion

- Built an LLVM-to-ACL2 translator
- Produced an executable ACL2 specification
  - Tail recursion
  - Efficient execution with in-place updates via ACL2's stobj mechanism
- Demonstrated that the translation produces working ACL2 code for a recursive example program
- Presented a technique for reasoning about tail-recursive ACL2 functions that execute in place
  - Utilizes the formally proven Hardin's Bridge to non-tail-recursive versions operating on lists
- Tested examples with global variables, pointers, and string constants