The Correctness-Security Gap in Compiler Optimization

Similar documents
REFLECTIONS ON TRUSTING BITCODE

Sandwiches for everyone

Reflections on using C(++) Root Cause Analysis

System Administration and Network Security

Outline. Classic races: files in /tmp. Race conditions. TOCTTOU example. TOCTTOU gaps. Vulnerabilities in OS interaction

CS 61: Systems programming and machine organization. Prof. Stephen Chong November 15, 2010

Important From Last Time

Important From Last Time

New York University CSCI-UA : Advanced Computer Systems: Spring 2016 Midterm Exam

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Turning proof assistants into programming assistants

Page 1. Today. Important From Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right?

Test- Case Reduc-on for C Compiler Bugs. John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, Xuejun Yang

Type checking. Jianguo Lu. November 27, slides adapted from Sean Treichler and Alex Aiken s. Jianguo Lu November 27, / 39

Final CSE 131B Spring 2004

Homework 3 CS161 Computer Security, Fall 2008 Assigned 10/07/08 Due 10/13/08

CSE 127 Computer Security

Verifying Concurrent Programs

4. Jump to *RA 4. StackGuard 5. Execute code 5. Instruction Set Randomization 6. Make system call 6. System call Randomization

Computer Aided Cryptographic Engineering

FaCT. A Flexible, Constant-Time Programming Language

CS-527 Software Security

Compilers. Lecture 2 Overview. (original slides by Sam

C11 Compiler Mappings: Exploration, Verification, and Counterexamples

SoK: Eternal War in Memory

MC: Meta-level Compilation

Undefinedness and Non-determinism in C

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

A Study on the Preservation on Cryptographic Constant-Time Security in the CompCert Compiler

Memory Consistency Models

Page 1. Today. Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right? Compiler requirements CPP Volatile

Verified compilers. Guest lecture for Compiler Construction, Spring Magnus Myréen. Chalmers University of Technology

Usually, target code is semantically equivalent to source code, but not always!

Lecture 03 Bits, Bytes and Data Types

Programming Language Implementation

T Jarkko Turkulainen, F-Secure Corporation

Program Testing via Symbolic Execution

An Evil Copy: How the Loader Betrays You

Language Security. Lecture 40

Constant-time programming in C

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall 2011.

Computer Systems Lecture 9

Generating Low-Overhead Dynamic Binary Translators. Mathias Payer and Thomas R. Gross Department of Computer Science ETH Zürich

Software security, secure programming

Browser Security Guarantees through Formal Shim Verification

Lecture Notes for 04/04/06: UNTRUSTED CODE Fatima Zarinni.

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.

Programming in C and C++

X86 Review Process Layout, ISA, etc. CS642: Computer Security. Drew Davidson

Adam Chlipala University of California, Berkeley ICFP 2006

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Quiz I

Martin Kruliš, v

Translation Validation for a Verified OS Kernel

Overview of Compiler. A. Introduction

Group B Assignment 9. Code generation using DAG. Title of Assignment: Problem Definition: Code generation using DAG / labeled tree.

Other array problems. Integer overflow. Outline. Integer overflow example. Signed and unsigned

CSC 405 Computer Security Shellcode

An Abstract Stack Based Approach to Verified Compositional Compilation to Machine Code

Compilers Crash Course

Buffer Overflows Defending against arbitrary code insertion and execution

CS 161 Computer Security

Formal C semantics: CompCert and the C standard

JavaScript & Security get married. Yan Zhu NCC Group SF Open Forum 9/17/15

COSC345. Programs are Data Processing and Generating code

Overview. Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading

Finding User/Kernel Pointer Bugs with Type Inference p.1

Two s Complement Review. Two s Complement Review. Agenda. Agenda 6/21/2011

Simply-Typed Lambda Calculus

x86 architecture et similia

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

CS 161 Computer Security

kguard++: Improving the Performance of kguard with Low-latency Code Inflation

EECS 213 Introduction to Computer Systems Dinda, Spring Homework 3. Memory and Cache

What is concurrency? Concurrency. What is parallelism? concurrency vs parallelism. Concurrency: (the illusion of) happening at the same time.

Field Analysis. Last time Exploit encapsulation to improve memory system performance

5) Attacker causes damage Different to gaining control. For example, the attacker might quit after gaining control.

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

secubt Hacking the Hackers with User Space Virtualization

Concurrency. Johan Montelius KTH

Dangerous Optimizations and the Loss of Causality

Towards Automatic Generation of Vulnerability- Based Signatures

3. Logical Values. Boolean Functions; the Type bool; logical and relational operators; shortcut evaluation

Verification of an ML compiler. Lecture 1: An introduction to compiler verification

Compiler Optimization

W4118: PC Hardware and x86. Junfeng Yang

Automatic program generation for detecting vulnerabilities and errors in compilers and interpreters

This time. Defenses and other memory safety vulnerabilities. Everything you ve always wanted to know about gdb but were too afraid to ask

Randomized Stress-Testing of Link-Time Optimizers

Machine Programming 3: Procedures

6.035 Project 3: Unoptimized Code Generation. Jason Ansel MIT - CSAIL

Software Testing CS 408. Lecture 10: Compiler Testing 2/15/18

3. Logical Values. Our Goal. Boolean Values in Mathematics. The Type bool in C++

Komodo: Using Verification to Disentangle Secure-Enclave Hardware from Software

CS 61C: Great Ideas in Computer Architecture Introduction to C

3. Logical Values. Our Goal. Boolean Values in Mathematics. The Type bool in C++

Formal verification of an optimizing compiler

Analyzing Systems. Steven M. Bellovin November 26,

Communicating with People (2.8)

Adapting applications to exploit virtualization management knowledge

Secure Programming Lecture 3: Memory Corruption I (Stack Overflows)

Transcription:

The Correctness-Security Gap in Compiler Optimization Vijay D Silva, Mathias Payer, Dawn Song LangSec 2015

1 Compilers and Trust 2 Correctness vs. Security by Example 3 Correctness vs. Security, Formally 4 Future Directions 2

TURING AWARD LECTURE Reflections on Trusting Trust To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software. KEN THOMPSON pile" is called to compile the next line of source. Figure 3.2 shows a simple modification to the compiler that will deliberately miscompile source whenever a particular pattern is matched. If this were not deliberate, it would be called a compiler "bug." Since it is deliberate, it should be called a "Trojan horse." The actual bug I planted in the compiler would match code in the UNIX "login" command. The replacement code would miscompile the login command so that it would accept either the intended encrypted password or a particular known password. Thus if this code were installed in binary and the binary were used MORAL The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code. In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well-installed microcode bug will be almost impossible to detect. After trying to convince you that I cannot be trusted,

Testing Verification Whalley, 94, vpoiso McKeeman, 98 McPeak & Wilkerson, 03, Delta Goerigk, 00 (in ACL2) Lacey et al. 02 Lerner et al. 05, Rhodium Yang et al. 11, C-Smith Regehr et al. 12,C-Reduce Leroy et al. 06, CompCert Tatlock & Lerner, 10, XCert Taming Compiler Fuzzers, PLDI 13 Formal Verification of a Realistic Compiler, CACM 08

1 A very brief history of compiler correctness 2 Correctness vs. Security by Example 3 Correctness vs. Security, Formally 4 Future Directions 8

Dead Store Elimination #include <string> using std::string; #include <memory> // The specifics of this function are // not important for demonstrating this bug. const string getpasswordfromuser() const; bool ispasswordcorrect() { bool ispasswordcorrect = false; string Password("password"); if(password == getpasswordfromuser()) { ispasswordcorrect = true; // This line is removed from the optimized code // even though it secures the code by wiping // the password from memory. memset(password, 0, sizeof(password)); return ispasswordcorrect; From the GCC mailing list, 2002 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8537 9

From: "Joseph D. Wagner" <wagnerjd@prodigy.net> To: <fw@gcc.gnu.org>,! <gcc-bugs@gcc.gnu.org>,! <gcc-prs@gcc.gnu.org>,! <nobody@gcc.gnu.org>,! <wagnerjd@prodigy.net>,! <gcc-gnats@gcc.gnu.org> Cc: Subject: RE: optimization/8537: Optimizer Removes Code Necessary for Security Date: Sun, 17 Nov 2002 08:59:53-0600 Direct quote from: http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/bug-criteria.html "If the compiler produces valid assembly code that does not correctly execute the input source code, that is a compiler bug." So to all you naysayers out there who claim this is a programming error or poor coding, YES, IT IS A BUG! From the GCC mailing list, 2002 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8537 10

Motivating Questions: Can a formally verified, correctly implemented compiler optimization introduce a security bug not present in the source? 11

PLDI 2003 POPL 2002 POPL 2006 12 POPL 2004

Motivating Questions: Can a formally verified, correctly implemented compiler optimization introduce a security bug not present in the source? YES! This is the Correctness-Security Gap 13

Motivating Questions: Can a formally verified, correctly implemented compiler optimization introduce a security bug not present in the source? YES! How prevalent is the problem? Also, what gives? 14

Memory Persistence Dead store elimination Function call inlining Code motion Side Channels Subexpression Elimination Strength reduction Peephole Optimizations Language Specifics Undefinedness in C/C++ Memory model issues Synchronization issues

Function Call Inlining char *getpwhash() { // code performing a secure computation // assuming a trusted execution environment. void compute() { // local variables long i, j; char *sha; // Code in this function does not assume // a trusted execution environment.... //call secure function sha=getpwhash();... (from the paper) 16

Side Channels *SCALE=\(2); # 2 or 8, that is the question:-) Value of 8 results # in 16KB large table, which is tough on L1 cache, but eliminates # unaligned references to it. Value of 2 results in 4KB table, but # 7/8 of references to it are unaligned. AMD cores seem to be # allergic to the latter, while Intel ones - to former [see the # table]. I stick to value of 2 for two reasons: 1. smaller table # minimizes cache trashing and thus mitigates the hazard of side- # channel leakage similar to AES cache-timing one; 2. performance # gap among different µ-archs is smaller.... &set_label("roundsdone",16);! &mov! ("esi",&dwp(0,"ebx"));!! # reload argument block! &mov! ("edi",&dwp(4,"ebx"));! &mov! ("eax",&dwp(8,"ebx"));! for($i=0;$i<8;$i++) { &pxor(@mm[$i],&qwp($i*8,"edi")); # L^=inp! for($i=0;$i<8;$i++) { &pxor(@mm[$i],&qwp($i*8,"esi")); # L^=H! for($i=0;$i<8;$i++) { &movq(&qwp($i*8,"esi"),@mm[$i]); # H=L! &lea! ("edi",&dwp(64,"edi"));! # inp+=64! &sub! ("eax",1);!!! # num--! &jz!(&label("alldone"));! &mov! (&DWP(4,"ebx"),"edi");!! # update argument block! &mov! (&DWP(8,"ebx"),"eax");! &jmp! (&label("outerloop")); https://github.com/openssl/openssl/blob/e0fc7961c4fbd27577fb519d9aea2dc788742715/crypto/ whrlpool/asm/wp-mmx.pl 17

Common Subexpression Elimination int crypt(int k*){ int key = 0; if (k[0]==0xc0de){ key=k[0]*15+3; key+=k[1]*15+3; key+=k[2]*15+3; else { key=2*15+3; key+=2*15+3; key+=2*15+3; int crypt(int k*){ int key = 0; if (k[0]==0xc0de){ key=k[0]*15+3; key+=k[1]*15+3; key+=k[2]*15+3; else { // replaced by tmp = 2*15+3; key = 3*tmp; (from the paper) 18

Undefinedness (null dereferences) static unsigned int tun_chr_poll(struct file *file, poll_table * wait) { struct tun_file *tfile = file- >private_data; struct tun_struct *tun = tun_get(tfile); struct sock *sk = tun->sk; unsigned int mask = 0; if (!tun) return POLLERR; static unsigned int tun_chr_poll(struct file *file, poll_table * wait) { struct tun_file *tfile = file- >private_data; struct tun_struct *tun = tun_get(tfile); struct sock *sk = tun->sk; unsigned int mask = 0; return POLLERR;...... http://lwn.net/articles/341773/ 19

https://lkml.org/lkml/2007/5/7/213 20

SOSP 2013 21

1 A very brief history of compiler correctness 2 Correctness vs. Security by Example 3 Correctness vs. Security, Formally 4 Future Directions 22

Observations Compiler correctness proofs show that the behaviour of the code is the same before and after a transformation. Behaviour is defined as some observable aspect of execution, typically state. Execution is defined with respect to a hypothetical abstract machine. 23

A Simple, Correct Transformation int increment(int a) { int b = a; b++; return b; int increment(int a) { return a + 1; 24

A Simple, Correct Transformation int increment(int a) { int b = a; b++; return b; int increment(int a) { return a + 1; p1, a:5 p2, a:5, b:5 p3, a:5, b:6 call inc(x); y = ret b; pc, x:5, y:6 25

A Simple, Correct Transformation int increment(int a) { int b = a; b++; return b; int increment(int a) { return a + 1; call inc(x); p4, a:5 y = ret b; 26 pc, x:5, y:6

A Simple, Correct Transformation int increment(int a) { int b = a; b++; return b; int increment(int a) { return a + 1; p1, a:5 p2, a:5, b:5 p3, a:5, b:6 call inc(x); p4, a:5 y = ret b; 27 pc, x:5, y:6

Compiler Writer vs. Attacker Source Program Compiled Program Execution Environment Optimizations, Correctness Exploits, Vulnerabilities, Abstract Machine Runtime Machine OS, Architecture, etc. 28

More Observations Attackers reason about details (residual state, timing, etc.) not modelled by the abstract semantics machine. Correctness guarantees do not preserve security because those exploits are not even possible in the machine used in proofs! 29

However... 30

A Less Abstract Execution int increment(int a) { int b = a; b++; return b; int increment(int a) { return a + 1; p1, a:5 p2, a:5, b:5 p3, a:5, b:6 call inc(x); y = ret b; p3, a:5, b:6 is closer to implementation. 31 pc, x:5, y:6

A Simple, Correct Transformation int increment(int a) { int b = a; b++; return b; int increment(int a) { return a + 1; p4, a:5 p4, a:5 pc, x:5, y:5 32

A Less Abstract Execution int increment(int a) { int b = a; b++; return b; int increment(int a) { return a + 1; p1, a:5 p2, a:5, b:5 p3, a:5, b:6 p4, a:5 p4, a:5 p3, a:5, b:6 pc, x:5, y:5 pc, x:5, y:6 33

Formal Model vs. Proof Technique Source Program Compiled Program Execution Environment Proof technique Same proof technique! Model Abstract Machine Runtime Machine OS, Architecture, etc. Different models 34

1 Compilers and Trust 2 Correctness vs. Security by Example 3 Correctness vs. Security, Formally 4 Future Directions 35

Testing for a Correctness-Security Gap Memory Persistence Dead store elimination Function call inlining Code motion Side Channels Subexpression Elimination Strength reduction Peephole Optimizations Language Specifics Undefinedness in C/C++ Memory model issues Synchronization issues 36

New, Formal Machine Models Source Semantics Machine IR Semantics Machine Assembler Semantics Machine Timing Machines Memory Hierarchy Machines Power Machines 37

Parameterized Correctness Proofs Optimization Machine Attacker Is the code before and after optimization, equivalent from the viewpoint of an attacker observing the machine? 38

Weak Memory Models Security-Preserving Compilers Litmus tests Memory barriers Fence insertion? New formal models Correctness modulo memory...