Creating Formally Verified Components for Layered Assurance with an LLVM-to-ACL2 Translator

Similar documents
Development of a Translator from LLVM to ACL2

Creating Formally Verified Components for Layered Assurance with an LLVM to ACL2 Translator

Reasoning About LLVM Code Using Codewalker

Visualizing code structure in LLVM

Efficient, Formally Verifiable Data Structures using ACL2 Single-Threaded Objects for High-Assurance Systems

15-411: LLVM. Jan Hoffmann. Substantial portions courtesy of Deby Katz

Induction Schemes. Math Foundations of Computer Science

Introduction to ACL2. CS 680 Formal Methods for Computer Verification. Jeremy Johnson Drexel University

LLVM and IR Construction

Compiler Construction: LLVMlite

Reasoning About Programs Panagiotis Manolios

A Mechanically Checked Proof of the Correctness of the Boyer-Moore Fast String Searching Algorithm

Lecture 2 Overview of the LLVM Compiler

Parameterized Congruences in ACL2

Lecture 3 Overview of the LLVM Compiler

Lecture 3 Overview of the LLVM Compiler

Reasoning About Programs Panagiotis Manolios

Efficient execution in an automated reasoning environment

Symbolic Programming. Dr. Zoran Duric () Symbolic Programming 1/ 89 August 28, / 89

4/1/15 LLVM AND SSA. Low-Level Virtual Machine (LLVM) LLVM Compiler Infrastructure. LL: A Subset of LLVM. Basic Blocks

Verifying Centaur s Floating Point Adder

Translation Validation for a Verified OS Kernel

CIS 341 Final Examination 4 May 2017

Baggy bounds with LLVM

Directions in ISA Specification. Anthony Fox. Computer Laboratory, University of Cambridge, UK

Functional programming with Common Lisp

A Machine-Checked Safety Proof for a CISC-Compatible SFI Technique

CSCC24 Functional Programming Scheme Part 2

COP4020 Programming Languages. Functional Programming Prof. Robert van Engelen

A Brief Introduction to Using LLVM. Nick Sumner

Targeting LLVM IR. LLVM IR, code emission, assignment 4

Single-source SYCL C++ on Xilinx FPGA. Xilinx Research Labs Khronos 2017/11/12 19

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

Practical Formal Verification of Domain-Specific Language Applications

CS 61A Interpreters, Tail Calls, Macros, Streams, Iterators. Spring 2019 Guerrilla Section 5: April 20, Interpreters.

From Bigints to Native Code

Modeling Algorithms in SystemC and ACL2. John O Leary, David Russinoff Intel Corporation

Turning proof assistants into programming assistants

Mechanized Operational Semantics

Progress Report: Term Dags Using Stobjs

Outline. What is semantics? Denotational semantics. Semantics of naming. What is semantics? 2 / 21

Intermediate Representations & Symbol Tables

11/6/17. Functional programming. FP Foundations, Scheme (2) LISP Data Types. LISP Data Types. LISP Data Types. Scheme. LISP: John McCarthy 1958 MIT

ACL2 Challenge Problem: Formalizing BitCryptol April 20th, John Matthews Galois Connections

Introduction to LLVM compiler framework

Functional Programming Languages (FPL)

A Tool for Simplifying ACL2 Definitions

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Common Lisp Fundamentals

A Verifying Core for a Cryptographic Language Compiler

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

Functional Programming. Big Picture. Design of Programming Languages

A Robust Machine Code Proof Framework for Highly Secure Applications

A Framework for Automatic OpenMP Code Generation

An example of optimization in LLVM. Compiler construction Step 1: Naive translation to LLVM. Step 2: Translating to SSA form (opt -mem2reg)

LECTURE 16. Functional Programming

COS 320. Compiling Techniques

Dynamic Dispatch and Duck Typing. L25: Modern Compiler Design

The Low-Level Bounded Model Checker LLBMC

Proof-Pattern Recognition and Lemma Discovery in ACL2

CSCI-GA Scripting Languages

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

CSC 533: Programming Languages. Spring 2015

Recursion & Iteration

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

DEVIRTUALIZATION IN LLVM

Refinement and Theorem Proving

Denotational Semantics. Domain Theory

CSE 413 Midterm, May 6, 2011 Sample Solution Page 1 of 8

Summer 2017 Discussion 10: July 25, Introduction. 2 Primitives and Define

Scheme as implemented by Racket

CS 314 Principles of Programming Languages

The Specification, Verification, and Implementation of a High-Assurance Data Structure: An ACL2 Approach

Reasoning About Programs Panagiotis Manolios

Semantic Analysis. CSE 307 Principles of Programming Languages Stony Brook University

Modern Programming Languages. Lecture LISP Programming Language An Introduction

Functional Languages. Hwansoo Han

CSc 520 Principles of Programming Languages

Homework #3: CMPT-379

CS 415 Midterm Exam Spring 2002

Common LISP-Introduction

FUNKCIONÁLNÍ A LOGICKÉ PROGRAMOVÁNÍ 3. LISP: ZÁKLADNÍ FUNKCE, POUŽÍVÁNÍ REKURZE,

Advanced C Programming

Applied Theorem Proving: Modelling Instruction Sets and Decompiling Machine Code. Anthony Fox University of Cambridge, Computer Laboratory

CS 480. Lisp J. Kosecka George Mason University. Lisp Slides

ECE1387 Exercise 3: Using the LegUp High-level Synthesis Framework

5. Semantic Analysis. Mircea Lungu Oscar Nierstrasz

An Industrially Useful Prover

UMBC CMSC 331 Final Exam

Introduction to LLVM compiler framework

Lecture08: Scope and Lexical Address

Pierce Ch. 3, 8, 11, 15. Type Systems

CSCI 3155: Principles of Programming Languages Exam preparation #1 2007

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Finite Set Theory. based on Fully Ordered Lists. Jared Davis UT Austin. ACL2 Workshop 2004

Background. From my PhD (2009): Verified Lisp interpreter in ARM, x86 and PowerPC machine code

CONCEPTS OF PROGRAMMING LANGUAGES Solutions for Mid-Term Examination

Functional Programming. Pure Functional Programming

Lecture Notes on Loop Optimizations

Functions, Conditionals & Predicates

6.001 Notes: Section 8.1

Transcription:

Creating Formally Verified Components for Layered Assurance with an LLVM-to-ACL2 Translator Jennifer Davis, David Hardin, Jedidiah McClurg December 2013

Introduction Research objectives: Reduce need to trust compiler optimizations by reasoning about post-optimization intermediate representations Reason about small fragments of code written in assembly language We wish to create a library of formally verified software component models for layered assurance. In our current work, the components are in LLVM intermediate form. We wish to translate them to a theorem proving language, such as ACL2. We built an LLVM-to-ACL2 translator 2

Motivating Work Jianzhou Zhao (U Penn) et al. produced several different formalizations of operational semantics for LLVM in Coq. (2012) Intention to produce a verified LLVM compiler Magnus Myreen s (Cambridge) decompilation into logic work (2009) Imperative machine code (PPC, x86, ARM) -> HOL4 Extracts functional behavior of imperative code Assures decompilation process is sound Andrew Appel (Princeton) observed that SSA is functional programming (1998) This inspired us to build a translator from LLVM to ACL2 3

LLVM LLVM is the intermediate form for many common compilers, including clang. LLVM code generation targets exist for a wide variety of machines LLVM is a register-based intermediate in Static Single Assignment (SSA) form (each variable is assigned exactly once). An LLVM program consists of a list of entities. There are eight types including: function declarations function definitions Our software component models are created from code that has been compiled into the LLVM intermediate form 4

ACL2 A Computational Logic for Applicative Common Lisp (ACL2) Highly automated theorem proving system Functional language with admission criteria Executable subset of language Rich set of legal identifiers (@foo, %.06, @bar_%.0) Side-effect free subset of Lisp, so it inherits Lisp peculiarities Function definition: (defun funname (parm1 parm2 parm3) (<body>)) Function invocation: (funname x y z) let binds variables to values within a function body Multiway conditionals use the cond form Lists are the fundamental data structure ACL2 supports integers and rationals Lisp predicate names are traditionally given a suffix of p 5

LLVM-to-ACL2 Translation Toolchain theorem prover 6

Example C Source long sumarr(unsigned int n, long sum, long *array) { unsigned int j = 0; for (j = 0; j < n; j++) { sum += array[j]; } return sum; } We can produce LLVM from C source as follows: clang O4 S emit-llvm sumarr.c 7

Example LLVM C Source: long sumarr(unsigned int n, long sum, long *array) { unsigned int j = 0; for (j = 0; j < n; j++) { sum += array[j];} return sum;} define i64 @sumarr(i32 %n, i64 %sum, i64* nocapture %array) nounwind uwtable readonly { %1 = icmp eq i32 %n, 0 br i1 %1, label %._crit_edge, label %.lr.ph j sum.lr.ph: ; preds = %.lr.ph, %0 %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ] %.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ] %2 = getelementptr inbounds i64* %array, i64 %indvars.iv %3 = load i64* %2, align 8,!tbaa!0 %4 = add nsw i64 %3, %.06 %indvars.iv.next = add i64 %indvars.iv, 1 %lftr.wideiv = trunc i64 %indvars.iv.next to i32 %exitcond = icmp eq i32 %lftr.wideiv, %n br i1 %exitcond, label %._crit_edge, label %.lr.ph._crit_edge: ; preds = %.lr.ph, %0 %.0.lcssa = phi i64 [ %sum, %0 ], [ %4, %.lr.ph ] ret i64 %.0.lcssa } 8

Translation Snippet Each block within an LLVM function contains a list of instructions in SSA form with type information. Hence we can readily convert a list of instructions into an appropriate let construct. %2 = getelementptr inbounds i64* %array, i64 %indvars.iv %3 = load i64* %2, align 8,!tbaa!0 %4 = add nsw i64 %3, %.06 (let ((%2 (getelementptr %array %indvars.iv 8))) (let ((%3 (load-i64l %2 st))) (let ((%4 (ifix (+ %3 %.06))))...))) 9

Main Translator Algorithm Translator Get Function Names Remove Aliases Promote Blocks to Functions Translate to ACL2 LLVM AST ACL2 Code 10

Remove Aliases Aliases allow new names (e.g., @foo1 and @foo2) to be used for globals and functions @bar = global i32 123 @foo1 = alias i32* @bar @foo2 = alias i32* @bar We eliminate these 11

Main Translator Algorithm Translator Get Function Names Remove Aliases Promote Blocks to Functions Translate to ACL2 LLVM AST ACL2 Code 12

Promote Blocks to Functions As we have seen, LLVM functions often contain inner blocks and branch instructions Each of these blocks is pulled out as a new function. For each block, the phi instructions denote variables that become parameters for that new function. %.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ] The phi instructions also tell us the parameter values that must be used at the new function s call site(s) 13

Dealing with Order of Declarations ACL2 requires functions and constants to be defined before they are used We do a topological sort on each of the call/dependency graphs 14

Main Translator Algorithm Translator Get Function Names Remove Aliases Promote Blocks to Functions Translate to ACL2 LLVM AST ACL2 Code 15

Translate to ACL2 Function declaration ACL2 function stub Function definition with instruction list ACL2 defun construct with a nested let-bound expression Memory ACL2 single-threaded object (stobj) for efficient execution Floating-point number corresponding rational number 1.234 (+ 1 (/2 10) (/3 100) (/4 1000)) 16

Example ACL2.lr.ph: ; preds = %.lr.ph, %0 %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ] %.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ] %2 = getelementptr inbounds i64* %array, i64 %indvars.iv %3 = load i64* %2, align 8,!tbaa!0 %4 = add nsw i64 %3, %.06 %indvars.iv.next = add i64 %indvars.iv, 1 %lftr.wideiv = trunc i64 %indvars.iv.next to i32 %exitcond = icmp eq i32 %lftr.wideiv, %n br i1 %exitcond, label %._crit_edge, label %.lr.ph (defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st) (declare (xargs :stobjs st :guard (and (integerp %.06) (natp %indvars.iv) (natp %n) (natp %array)))) (let ((%2 (getelementptr %array %indvars.iv 8))) (let ((%3 (load-i64l %2 st))) (let ((%4 (ifix (+ %3 %.06)))) (let ((%indvars.iv.next (nfix (+ %indvars.iv 1)))) (let ((%exitcond (if (= %indvars.iv.next %n) 1 0))) (if (= %exitcond 1) %4 (@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st)))))))) 17

Example ACL2 (defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st) (declare (xargs :measure (nfix (- (nfix %n) (nfix %indvars.iv))) :stobjs st :guard (and (integerp %.06) (natp %indvars.iv) (natp %n) (natp %array) (< %indvars.iv %n)))) (if (not (and (mbt (integerp %.06)) (mbt (natp %indvars.iv)) (mbt (natp %n)) (mbt (natp %array)) (mbt (< %indvars.iv %n)))) %.06 (let ((%2 (getelementptr %array %indvars.iv 8))) (let ((%3 (load-i64l %2 st))) (let ((%4 (ifix (+ %3 %.06)))) (let ((%indvars.iv.next (nfix (+ %indvars.iv 1)))) (let ((%exitcond (if (= %indvars.iv.next %n) 1 0))) (if (= %exitcond 1) %4 (@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st))))))))) 18

Tail Recursion Note that the translated function for the LLVM loop becomes a tail-recursive function (uses an accumulator) in ACL2. Tail-recursive functions are nice for execution, since an arbitrary number of recursive tail calls can be made without exhausting the stack. However, tail-recursive functions are not convenient for reasoning because they pollute the induction scheme. We can generate non-tail-recursive functions operating over simple lists from tail-recursive, stobj-based functions. This technique is called Hardin s Bridge*. *in memory of Scott Hardin, a civil engineer who designed several physical bridges, and a man who valued rigor. He was the father of one of the authors. 19

Hardin's Bridge defiteration Form: Tail recursive with mutable state (x-tail k res st) defiteration for(k=0; k< *SZ*; k++) { res = op(d[k], res); } Form: Imperative and operating over an array Form: Non-tail-recursive with mutable state (x-iter j res st) 1 2 3 1 2 (defun x (res d) (if (endp d) res (op (car d) (x res (cdr d)))) Form: Non-tail-recursive and operating over a list 3 20

Applying Hardin s Bridge We use the bridge technique to prove that @sumarr_%.lr.ph is equal to the following non-tail-recursive function: (defun sumlist64 (res lst) (declare (xargs :measure (len lst))) (cond ((not (true-listp lst)) (ifix res)) ((endp lst) (ifix res)) (t (+ (ifix (load-i64ll (take 8 lst))) (sumlist64 res (nthcdr 8 lst)))))) Proving properties about @sumarr_%.lr.ph can then be accomplished by proving them instead about sumlist64, a nontail-recursive function better suited for theorem proving 21

Current State of the Translator We use the def::ung macro to automatically define the domain of recursive functions. This allows recursive functions to be admitted in ACL2 without manually adding measures. We added support for modular arithmetic (e.g., fixed width addition). We have rerun the sumarr example with the current translator. No editing of the translated code was needed. Recursive function admitted automatically via def::ung Fixed width addition is preserved in the non-tail-recursive spec. 22

Limitations of the Translator Exceptions Indirect call instructions 23

Future Work Stack analysis and data structure analysis LLVM DataLayout directives (global endianness, alignment/padding specification) LLVM intrinsic functions (there are a large number of these). Variable-length argument lists Attempt to eliminate cycles in the call graph by code rewrites when possible (rather than just blindly emitting mutualrecursion). 24

Conclusion Built an LLVM-to-ACL2 translator Produced an executable ACL2 specification Tail recursion Efficient execution with in-place updates via ACL2 s stobj mechanism Demonstrated that the translation produces working ACL2 code for a recursive example program Presented technique for reasoning about tail-recursive ACL2 functions that execute in-place Utilizes formally proven Hardin s bridge to non-tail-recursive versions operating on lists Tested examples with global variables, pointers, and string constants. 25