Verifying Optimizations using SMT Solvers

Verifying Optimizations using SMT Solvers Nuno Lopes

Why verify optimizations?
- Catch bugs before they even exist
- Corner cases are hard to debug
- Time spent in an additional verification step pays off
- Technology available today, with more to follow

Not a replacement for testing
"Beware of bugs in the above code; I have only proved it correct, not tried it." (Donald Knuth, 1977)

Outline
- SAT/SMT Solvers
- InstCombine
- Assembly
- ConstantRange
- Future developments

SAT Solvers
A SAT solver takes a Boolean formula as input (here, a formula over the variables a, b, and c) and returns:
- SAT, if the formula is satisfiable
- UNSAT, if the formula is unsatisfiable
If SAT, we also get a model: a = true, b = false, c = false
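The SAT/UNSAT/model contract can be sketched with a naive enumeration of all assignments. This is only an illustration, not how real SAT solvers work, and the formula below is a hypothetical one chosen for the example; it is not the formula from the slide.

```python
from itertools import product

def solve(formula, names):
    """Return a satisfying assignment as a dict, or None if the formula is UNSAT."""
    for values in product([False, True], repeat=len(names)):
        model = dict(zip(names, values))
        if formula(**model):
            return model
    return None

# Hypothetical example formula: (a or b) and (not b or not c) and (not c)
def example(a, b, c):
    return (a or b) and (not b or not c) and (not c)

# A formula with no satisfying assignment
def contradiction(a):
    return a and not a

model = solve(example, ["a", "b", "c"])
print("SAT" if model is not None else "UNSAT", model)
print("SAT" if solve(contradiction, ["a"]) is not None else "UNSAT")
```

Enumeration returns the first model it finds; other assignments, such as a = true, b = false, c = false, may satisfy the same formula.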

SMT Solvers
Generalization of SAT solvers: variables can range over other domains:
- Booleans
- Bit-vectors
- Reals (linear / non-linear)
- Integers (linear / non-linear)
- Arrays
- Data types
- Floating point
- Uninterpreted functions (UFs)

Available SMT Solvers
- Boolector
- CVC4
- MathSAT 5
- STP
- Z3 (http://rise4fun.com/z3/)

Bit-Vector Theory
Operations between bit-vector variables:
- Add/Sub/Mul/Div/Rem
- Shift and rotate
- Zero/sign extend
- Bitwise: and/or/neg/not/nand/xor/...
- Comparison: ge/le/...
- Concat and extract
Includes signed/unsigned variants
Variables have a fixed bit width

Bit-vector theory: example
Let's prove that the following are equivalent:
  (x - 1) & x = 0
  x & (-x) = x
Thinking SMT:
- Both formulas give the same result for all x
- There isn't a value for x such that the results of the formulas differ

Example in SMT-LIB 2
  (declare-fun x () (_ BitVec 32))
  (assert (not (=
    ; x&(x-1) == 0
    (= (bvand x (bvsub x #x00000001)) #x00000000)
    ; x&(-x) == x
    (= (bvand x (bvneg x)) x))))
  (check-sat)
  > unsat
http://rise4fun.com/z3/2yfz
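The same "no x makes the two formulas differ" question can be checked by exhaustive enumeration when the bit width is small. The sketch below assumes a 16-bit width instead of the 32 bits used on the slide, so that plain enumeration stays fast; it swaps the SMT solver for brute force and is meant only to make the query concrete.

```python
WIDTH = 16                      # assumption: smaller than the slide's 32 bits
MASK = (1 << WIDTH) - 1

def clears_to_zero(x):
    # x & (x - 1) == 0, with the subtraction wrapping at WIDTH bits
    return (x & ((x - 1) & MASK)) == 0

def lowest_bit_is_x(x):
    # x & (-x) == x, with two's-complement negation at WIDTH bits
    return (x & (-x & MASK)) == x

# A counterexample is any x on which the two formulas disagree.
counterexamples = [x for x in range(1 << WIDTH)
                   if clears_to_zero(x) != lowest_bit_is_x(x)]
print("unsat" if not counterexamples else counterexamples[:5])
```

An empty counterexample list plays the role of the solver's unsat answer: the negated equivalence has no model.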

Example: really testing for power of 2?
  (declare-fun x () (_ BitVec 4))
  (assert (not (=
    ; x&(x-1) == 0
    (= (bvand x (bvsub x #x1)) #x0)
    ; x == 1 or x == 2 or x == 4 or x == 8
    (or (= x #x1) (= x #x2) (= x #x4) (= x #x8)))))
  (check-sat)
  (get-model)
  > sat
  > (model
      (define-fun x () (_ BitVec 4) #x0))
http://rise4fun.com/z3/qgl2
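At 4 bits the search space has only 16 values, so the counterexample reported by the solver can be reproduced by listing every x on which the bit trick and the explicit power-of-two test disagree. A minimal sketch, with function names chosen for this example:

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def bit_trick(x):
    # x & (x - 1) == 0, with wrap-around at WIDTH bits
    return (x & ((x - 1) & MASK)) == 0

def is_power_of_two(x):
    # the explicit disjunction from the slide
    return x in (1, 2, 4, 8)

mismatches = [x for x in range(1 << WIDTH) if bit_trick(x) != is_power_of_two(x)]
print(mismatches)
```

The only disagreement is x = 0, which matches the model the solver returns.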

InstCombine
- Optimizes sequences of instructions
- Perfect target for verification with SMT solvers

InstCombine Example
  ; (A ^ -1) & (1 << B) != 0
  %neg = xor i32 %A, -1
  %shl = shl i32 1, %B
  %and = and i32 %neg, %shl
  %cmp = icmp ne i32 %and, 0
  ; (1 << B) & A == 0
  %shl = shl i32 1, %B
  %and = and i32 %shl, %A
  %cmp = icmp eq i32 %and, 0

InstCombine Example
  (declare-fun A () (_ BitVec 32))
  (declare-fun B () (_ BitVec 32))
  (assert (not (=
    ; (1 << B) & (A ^ -1) != 0
    (not (= (bvand (bvshl #x00000001 B) (bvxor A #xffffffff)) #x00000000))
    ; (1 << B) & A == 0
    (= (bvand (bvshl #x00000001 B) A) #x00000000))))
  (check-sat)
  > sat
  > (model
      (define-fun A () (_ BitVec 32) #x00000000)
      (define-fun B () (_ BitVec 32) #x00020007))
http://rise4fun.com/z3/omrp

InstCombine Example
With the shift amount constrained to B <= 31 (shl by the bit width or more has no defined result in LLVM IR), the two forms are equivalent:
  (declare-fun A () (_ BitVec 32))
  (declare-fun B () (_ BitVec 32))
  (assert (bvule B #x0000001f))
  (assert (not (=
    ; (1 << B) & (A ^ -1) != 0
    (not (= (bvand (bvshl #x00000001 B) (bvxor A #xffffffff)) #x00000000))
    ; (1 << B) & A == 0
    (= (bvand (bvshl #x00000001 B) A) #x00000000))))
  (check-sat)
  > unsat
http://rise4fun.com/z3/pj2b
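The effect of the bound on B can be reproduced by enumeration at a reduced width. The sketch below assumes 8-bit values (the slides use 32) and models the shift the way SMT-LIB's bvshl behaves, where a shift amount of WIDTH or more yields zero; the function names are chosen for this example.

```python
WIDTH = 8                        # assumption: scaled down from 32 bits
MASK = (1 << WIDTH) - 1

def shl(value, amount):
    # bvshl semantics: shifted-out bits are dropped, so amount >= WIDTH gives 0
    return (value << amount) & MASK

def before(a, b):
    # (1 << B) & (A ^ -1) != 0
    return (shl(1, b) & (a ^ MASK)) != 0

def after(a, b):
    # (1 << B) & A == 0
    return (shl(1, b) & a) == 0

def counterexamples(max_shift):
    """All (A, B) pairs with B <= max_shift on which the two forms disagree."""
    return [(a, b)
            for a in range(1 << WIDTH)
            for b in range(max_shift + 1)
            if before(a, b) != after(a, b)]

unbounded = counterexamples(MASK)       # any B, like the first query
bounded = counterexamples(WIDTH - 1)    # B restricted to valid shift amounts
print(unbounded[0], len(bounded))
```

Every disagreement has an out-of-range shift amount, which is why adding the bound turns the answer from sat into unsat.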

IR to Assembly
PR16426: poor code for multiple __builtin_*_overflow()
  // returns x * y + z
  // 17 instructions on X86
  unsigned foo(unsigned x, unsigned y, unsigned z) {
    unsigned res;
    if (__builtin_umul_overflow(x, y, &res) ||
        __builtin_uadd_overflow(res, z, &res)) {
      return 0;
    }
    return res;
  }

PR16426: IR
  define i32 @foo(i32 %x, i32 %y, i32 %z) {
  entry:
    %0 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
    %1 = extractvalue { i32, i1 } %0, 1
    %2 = extractvalue { i32, i1 } %0, 0
    %3 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %2, i32 %z)
    %4 = extractvalue { i32, i1 } %3, 1
    %or3 = or i1 %1, %4
    br i1 %or3, label %return, label %if.end
  if.end:
    %5 = extractvalue { i32, i1 } %3, 0
    br label %return
  return:
    %retval.0 = phi i32 [ %5, %if.end ], [ 0, %entry ]
    ret i32 %retval.0
  }

PR16426: Current X86 Assembly (17 instructions)
  foo:
  # BB#0:                 # %entry
      pushl %esi
      movl 8(%esp), %eax
      mull 12(%esp)
      pushfl
      popl %esi
      addl 16(%esp), %eax
      setb %dl
      xorl %ecx, %ecx
      pushl %esi
      popfl
      jo .LBB0_3
  # BB#1:                 # %entry
      testb %dl, %dl
      jne .LBB0_3
  # BB#2:                 # %if.end
      movl %eax, %ecx
  .LBB0_3:                # %return
      movl %ecx, %eax
      popl %esi
      ret

PR16426: Proposed X86 Assembly (8 instructions)
      movl 8(%esp), %eax
      mull 12(%esp)
      addl 16(%esp), %eax
      adcl %edx, %edx
      jne .LBB0_1
  .LBB0_2:
      # result already in EAX
      ret
  .LBB0_1:
      xorl %eax, %eax
      jmp .LBB0_2

PR16426: Michael says my proposal has a bug
      movl 8(%esp), %eax
      mull 12(%esp)
      addl 16(%esp), %eax
      adcl $0, %edx
      jne .LBB0_1
  .LBB0_2:
      # result already in EAX
      ret
  .LBB0_1:
      xorl %eax, %eax
      jmp .LBB0_2

PR16426: Asm in SMT
  ; movl 8(%esp), %eax
  ; mull 12(%esp)
  (assert (let ((mul (bvmul ((_ zero_extend 32) x) ((_ zero_extend 32) y))))
    (and
      (= EAX ((_ extract 31 0) mul))
      (= EDX ((_ extract 63 32) mul)))))

PR16426: Asm in SMT
  ; addl 16(%esp), %eax
  (assert (and
    (= EAX2 (bvadd EAX z))
    (= CF ((_ extract 32 32)
            (bvadd ((_ zero_extend 1) EAX) ((_ zero_extend 1) z))))))

PR16426: Asm in SMT
  ; adcl %edx, %edx
  (assert (and
    (= EDX2 (bvadd EDX EDX ((_ zero_extend 31) CF)))
    (= ZF (= EDX2 #x00000000))))

PR16426: Asm in SMT
      jne .LBB0_1          # Jump if ZF=0
  .LBB0_2:
      ret
  .LBB0_1:
      xorl %eax, %eax
      jmp .LBB0_2
  (assert (= asm_result (ite ZF EAX2 #x00000000)))

PR16426: IR in SMT
  (assert (= llvm_result
    (let ((overflow (or
            (bvugt (bvmul ((_ zero_extend 32) x) ((_ zero_extend 32) y))
                   #x00000000ffffffff)
            (bvugt (bvadd ((_ zero_extend 4) (bvmul x y)) ((_ zero_extend 4) z))
                   #x0ffffffff))))
      (ite overflow #x00000000 (bvadd (bvmul x y) z)))))

PR16426: Correctness
  (declare-fun x () (_ BitVec 32))
  (declare-fun y () (_ BitVec 32))
  (declare-fun z () (_ BitVec 32))
  (assert (not (= asm_result llvm_result)))
  (check-sat)
  (get-model)
  > sat
  > (model
      (define-fun z () (_ BitVec 32) #x15234d22)
      (define-fun y () (_ BitVec 32) #x84400100)
      (define-fun x () (_ BitVec 32) #xf7c5ebbe))
http://rise4fun.com/z3/vixt
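The counterexample can be replayed by simulating the instruction sequence directly. The sketch below is an assumption-laden model rather than the author's tooling: `reference` follows the IR semantics (zero on any overflow), `asm` follows the SMT encoding of mull/addl/adcl/jne, and the `fixed` flag switches from `adcl %edx, %edx` to the `adcl $0, %edx` variant shown on the "has a bug" slide. The exhaustive check uses a hypothetical 4-bit machine so that enumeration is cheap.

```python
def reference(x, y, z, width=32):
    """IR semantics: x * y + z, or 0 if the multiply or the add overflows."""
    limit = 1 << width
    product = x * y
    if product >= limit or product + z >= limit:
        return 0
    return product + z

def asm(x, y, z, width=32, fixed=False):
    """Simulate the proposed sequence; fixed=True uses adcl $0, %edx."""
    mask = (1 << width) - 1
    product = x * y
    eax, edx = product & mask, product >> width     # mull: EDX:EAX = x * y
    total = eax + z                                 # addl: sets the carry flag
    eax2, cf = total & mask, total >> width
    if fixed:
        edx2 = (edx + cf) & mask                    # adcl $0, %edx
    else:
        edx2 = (edx + edx + cf) & mask              # adcl %edx, %edx
    zf = edx2 == 0
    return eax2 if zf else 0                        # jne .LBB0_1 / xorl %eax, %eax

# Model reported by the solver on the slide
x, y, z = 0xF7C5EBBE, 0x84400100, 0x15234D22
print(hex(asm(x, y, z)), hex(reference(x, y, z)))

def mismatches(width, fixed):
    """Exhaustively compare asm and reference at a small width."""
    n = 1 << width
    return [(a, b, c) for a in range(n) for b in range(n) for c in range(n)
            if asm(a, b, c, width, fixed) != reference(a, b, c, width)]

print(len(mismatches(4, fixed=False)), len(mismatches(4, fixed=True)))
```

For the solver's model the high half of the product is 0x80000000, so doubling it in `adcl %edx, %edx` wraps to zero and the overflow goes unnoticed. At 4 bits the `adcl $0` variant shows no mismatches, which is evidence for, not a proof of, the 32-bit case.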

ConstantRange
- Data structure that represents ranges of integers with overflow semantics (i.e., bit-vectors)
  - [0,5): from 0 to 4
  - [5,2): from 5 to INT_MAX or from 0 to 1
- Used by Lazy Value Info (LVI) and Correlated Value Propagation (CVP)
- Several bugs in the past (correctness and optimality)
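Membership in a wrapping half-open range can be sketched as follows. This is an illustrative model, not LLVM's implementation: it assumes 4-bit unsigned values, and it assumes the convention that equal bounds mean the full set when both are all-ones and the empty set otherwise.

```python
WIDTH = 4                                   # assumption: tiny width for readability
MAX_VALUE = (1 << WIDTH) - 1

def contains(value, lower, upper):
    """Is value inside the half-open, possibly wrapping range [lower, upper)?"""
    if lower == upper:
        return lower == MAX_VALUE           # full set if all-ones, else empty
    if lower < upper:
        return lower <= value < upper       # ordinary range
    return value >= lower or value < upper  # range wraps past the maximum

def members(lower, upper):
    return [v for v in range(1 << WIDTH) if contains(v, lower, upper)]

print(members(0, 5))
print(members(5, 2))
```

With these assumptions, [0,5) holds 0 through 4, while [5,2) holds 5 up to the maximum value together with 0 and 1.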

ConstantRange::signExtend()
8 lines of C++. Is it correct?

Auxiliary definitions in SMT
  (define-sort Integer () (_ BitVec 32))
  (define-sort Interval () (_ BitVec 64))
  (define-sort Interval2 () (_ BitVec 72))
  (define-fun L ((I Interval)) Integer
    ((_ extract 63 32) I))
  (define-fun H ((I Interval)) Integer
    ((_ extract 31 0) I))
  (define-fun isfullset ((I Interval)) Bool
    (and (= (L I) (H I)) (= (L I) #xffffffff)))

signextend() in SMT
  (define-fun signextend ((I Interval)) Interval2
    (ite (isemptyset I)
      EmptySet
      (ite (or (isfullset I) (issignwrappedset I))
        (getinterval #xf80000000 (bvadd #x07fffffff #x000000001))
        (getinterval ((_ sign_extend 4) (L I)) ((_ sign_extend 4) (H I))))))

Correctness of signextend()
  (declare-fun n () Integer)
  (declare-fun N () Interval)
  (assert (and
    (contains n N)
    (not (contains2 ((_ sign_extend 4) n) (signextend N)))))
  (check-sat)
  > unsat
http://rise4fun.com/z3/wlfx

Optimality of signextend()
It's correct, cool, but...
- Does signextend() always return the tightest range?
- Or are we missing optimization opportunities?

Optimality of signextend() in SMT
  (declare-fun N () Interval)
  (declare-fun R () Interval2)
  (assert (bvult (getsetsize R) (getsetsize (signextend N))))
  (assert (forall ((n Integer))
    (=> (contains n N)
        (contains2 ((_ sign_extend 4) n) R))))
  (check-sat-using qfbv)
  > sat
  > (model
      (define-fun N () (_ BitVec 64) #x8000010080000000)
      (define-fun R () (_ BitVec 72) #xe010000004009e0d04))
http://rise4fun.com/z3/wlfx

Debugging with SMT models
  (eval (L N))                  #x80000100
  (eval (H N))                  #x80000000
  (eval (L2 R))                 #xe01000000
  (eval (H2 R))                 #x4009e0d04
  (eval (L2 (signextend N)))    #xf80000100
  (eval (H2 (signextend N)))    #xf80000000
http://rise4fun.com/z3/wlfx

Optimality of signextend() Fixed
  --- llvm/trunk/lib/Support/ConstantRange.cpp 2013/10/28 16:52:38 193523
  +++ llvm/trunk/lib/Support/ConstantRange.cpp 2013/10/31 19:53:53 193795
  @@ -445,6 +445,11 @@
     unsigned SrcTySize = getBitWidth();
     assert(SrcTySize < DstTySize && "Not a value extension");
  +
  +  // special case: [X, INT_MIN) -- not really wrapping around
  +  if (Upper.isMinSignedValue())
  +    return ConstantRange(Lower.sext(DstTySize), Upper.zext(DstTySize));
  +
     if (isFullSet() || isSignWrappedSet()) {
       return ConstantRange(APInt::getHighBitsSet(DstTySize,DstTySize-SrcTySize+1),
                            APInt::getLowBitsSet(DstTySize, SrcTySize-1) + 1);
http://rise4fun.com/z3/4pl9s
http://rise4fun.com/z3/ogaw
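Both the correctness query and the effect of the fix can be replayed by enumeration on a scaled-down model. The sketch below is hypothetical in several respects: it extends 4-bit ranges to 8 bits (the slides use 32 and 36), it assumes equal bounds mean full set when all-ones and empty otherwise, and it assumes a range is "sign-wrapped" when it contains both INT_MAX and INT_MIN, since the slides do not show that definition.

```python
SRC, DST = 4, 8                      # assumption: scaled-down bit widths
INT_MIN = 1 << (SRC - 1)             # smallest signed SRC-bit value, as a bit pattern
INT_MAX = INT_MIN - 1

def mask(width):
    return (1 << width) - 1

def sext(value):
    """Sign-extend a SRC-bit value to DST bits."""
    return ((value ^ INT_MIN) - INT_MIN) & mask(DST)

def contains(value, lower, upper, width):
    if lower == upper:               # full set if all-ones, else empty
        return lower == mask(width)
    if lower < upper:
        return lower <= value < upper
    return value >= lower or value < upper

def set_size(lower, upper, width):
    if lower == upper:
        return (1 << width) if lower == mask(width) else 0
    return (upper - lower) & mask(width)

def sign_extend(lower, upper, fixed):
    """Model of signExtend(); fixed=True adds the [X, INT_MIN) special case."""
    if lower == upper and lower != mask(SRC):
        return (0, 0)                                   # empty set
    if fixed and upper == INT_MIN:
        return (sext(lower), upper)                     # Lower.sext, Upper.zext
    sign_wrapped = (contains(INT_MAX, lower, upper, SRC)
                    and contains(INT_MIN, lower, upper, SRC))
    if sign_wrapped:                                    # also true for the full set
        return (sext(INT_MIN), INT_MAX + 1)
    return (sext(lower), sext(upper))

def is_sound(fixed):
    """Every n in N must have sext(n) in sign_extend(N)."""
    for lower in range(1 << SRC):
        for upper in range(1 << SRC):
            if lower == upper and lower not in (0, mask(SRC)):
                continue                                # not a valid range
            lo2, hi2 = sign_extend(lower, upper, fixed)
            for n in range(1 << SRC):
                if (contains(n, lower, upper, SRC)
                        and not contains(sext(n), lo2, hi2, DST)):
                    return False
    return True

# 4-bit analogue of the solver's N = [0x80000100, 0x80000000)
old = sign_extend(0x9, 0x8, fixed=False)
new = sign_extend(0x9, 0x8, fixed=True)
print(is_sound(False), is_sound(True))
print(set_size(*old, DST), set_size(*new, DST))
```

In this model both versions are sound, but for a range of the form [X, INT_MIN) the original result covers almost the whole destination type, while the special case returns a range exactly as large as the source set.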

Future work
- Automatic translation from *.cpp to *.smt2
- Recursive functions in SMT (Horn clauses)
- Floating point in SMT (for OpenCL?)
- Verify more complex stuff (SCEV, ?)
- Termination checking: do InstCombine and LegalizeDAG (and other canonicalization passes) terminate for all inputs?

Conclusion
- Software verification (namely SMT solvers) is ready to verify some parts of compilers
- InstCombine, DAG Combiner, LegalizeDAG, etc. can be verified today
- Ideal for answering "What if I make this change...?" questions
Syntax of SMT-LIB 2:
- Bit-vectors: http://smtlib.cs.uiowa.edu/logics/qf_bv.smt2
- Arrays: http://smtlib.cs.uiowa.edu/theories/arraysex.smt2
- Floating point: https://groups.google.com/forum/#!forum/smt-fp
Stack Overflow: #smt, #z3
