Fuzzing Clang to find ABI Bugs. David Majnemer

Similar documents
DEVIRTUALIZATION IN LLVM

Connecting the EDG front-end to LLVM. Renato Golin, Evzen Muller, Jim MacArthur, Al Grant ARM Ltd.

Non 8-bit byte support in Clang and LLVM Ed Jones, Simon Cook

QUIZ. What is wrong with this code that uses default arguments?

Automatic program generation for detecting vulnerabilities and errors in compilers and interpreters

Week 5, continued. This is CS50. Harvard University. Fall Cheng Gong

Memory Corruption 101 From Primitives to Exploit

CSE 220: Systems Programming

Heap Arrays and Linked Lists. Steven R. Bagley

Devirtualize Documentation

A brief introduction to C programming for Java programmers

Introduction to Programming Using Java (98-388)

Short Notes of CS201

Data Storage. August 9, Indiana University. Geoffrey Brown, Bryce Himebaugh 2015 August 9, / 19

CS201 - Introduction to Programming Glossary By

DDMD AND AUTOMATED CONVERSION FROM C++ TO D

Pointers, Arrays and Parameters

These are notes for the third lecture; if statements and loops.

C++ internals. Francis Visoiu Mistrih - 1 / 41

Introduction to C++ with content from

Preliminaries. Part I

EuroLLVM Adrien Guinet 2018/04/17

PIC 10A Pointers, Arrays, and Dynamic Memory Allocation. Ernest Ryu UCLA Mathematics

QUIZ. Can you find 5 errors in this code?

Heap Arrays. Steven R. Bagley

Why C? Because we can t in good conscience espouse Fortran.

Skip the FFI! Embedding Clang for C Interoperability. Jordan Rose Compiler Engineer, Apple. John McCall Compiler Engineer, Apple

Inheritance, Polymorphism and the Object Memory Model

CS 137 Part 6. ASCII, Characters, Strings and Unicode. November 3rd, 2017

Item 4: Extensible Templates: Via Inheritance or Traits?

Control Flow. COMS W1007 Introduction to Computer Science. Christopher Conway 3 June 2003

Overview. Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading

Going to cover; - Why we have SPIR-V - Brief history of SPIR-V - Some of the core required features we wanted - How OpenCL will use SPIR-V - How

6.S096 Lecture 10 Course Recap, Interviews, Advanced Topics

C++ without Classes. CMSC433, Fall 2001 Programming Language Technology and Paradigms. More C++ without Classes. Project 1. new/delete.

Aliasing restrictions of C11 formalized in Coq

Ch. 11: References & the Copy-Constructor. - continued -

Formal Verication of C++ Object Construction and Destruction

American National Standards Institute Reply to: Josee Lajoie

CS140 Final Project. Nathan Crandall, Dane Pitkin, Introduction:

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Bits and Bytes and Numbers

2SKILL. Variables Lesson 6. Remembering numbers (and other stuff)...

Introducing C++ David Chisnall. March 15, 2011

Introduction to C++ Introduction. Structure of a C++ Program. Structure of a C++ Program. C++ widely-used general-purpose programming language

C++ (Non for C Programmer) (BT307) 40 Hours

CS3157: Advanced Programming. Outline

Semantic Analysis and Type Checking

CSE 565 Computer Security Fall 2018

C++ for Java Programmers

Variables. Data Types.

The compilation process is driven by the syntactic structure of the program as discovered by the parser

Deep C (and C++) by Olve Maudal

Input And Output of C++

Variables and Data Representation

Making Inheritance Work: C++ Issues

Making Inheritance Work: C++ Issues

ECE 15B COMPUTER ORGANIZATION

Deep C by Olve Maudal

logistics: ROP assignment

Identifying Memory Corruption Bugs with Compiler Instrumentations. 이병영 ( 조지아공과대학교

Overview. Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading

CSE 565 Computer Security Fall 2018

C Language Advanced Concepts. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff

Midterm 2. 7] Explain in your own words the concept of a handle class and how it s implemented in C++: What s wrong with this answer?

Chapter 1 Getting Started

QUIZ. What are 3 differences between C and C++ const variables?

Enhanced Debugging with Traces

Topic Notes: Bits and Bytes and Numbers

CSE351 Winter 2016, Final Examination March 16, 2016

CS-537: Midterm Exam (Fall 2008) Hard Questions, Simple Answers

Intro. Scheme Basics. scm> 5 5. scm>

C++11/14 Rocks. Clang Edition. Alex Korban

Final Exam. 12 December 2018, 120 minutes, 26 questions, 100 points

Basic program The following is a basic program in C++; Basic C++ Source Code Compiler Object Code Linker (with libraries) Executable

CS311 Lecture: The Architecture of a Simple Computer

G52CPP C++ Programming Lecture 7. Dr Jason Atkin

Lecture 4 September Required reading materials for this class

Runtime Environments, Part II

CS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018

Exploiting Vulnerabilities in Media Software. isec Partners

ECE 449 OOP and Computer Simulation Lecture 14 Final Exam Review

Bounding Object Instantiations, Part 2

Malloc Lab & Midterm Solutions. Recitation 11: Tuesday: 11/08/2016

Lesson 3: Accepting User Input and Using Different Methods for Output

Having fun with apple s IOKit. Ilja van sprundel

CHAPTER 3: FUNDAMENTAL OF SOFTWARE ENGINEERING FOR GAMES.

C++ assign array values. C++ assign array values.zip

Tokens, Expressions and Control Structures

An Insight Into Inheritance, Object Oriented Programming, Run-Time Type Information, and Exceptions PV264 Advanced Programming in C++

Ch. 12: Operator Overloading

ECE220: Computer Systems and Programming Spring 2018 Honors Section due: Saturday 14 April at 11:59:59 p.m. Code Generation for an LC-3 Compiler

Assignment 1c: Compiler organization and backend programming

Language Security. Lecture 40

CS61C : Machine Structures

Representing Memory-Mapped Devices as Objects

Supporting Class / C++ Lecture Notes

C++ Undefined Behavior What is it, and why should I care?

Other array problems. Integer overflow. Outline. Integer overflow example. Signed and unsigned

Functions, Pointers, and the Basics of C++ Classes

CSE 401/M501 Compilers

Transcription:

Fuzzing Clang to find ABI Bugs David Majnemer

What s in an ABI? The size, alignment, etc. of types Layout of records, RTTI, virtual tables, etc. The decoration of types, functions, etc. To generalize: anything that you need N > 1 compilers to agree upon

C++: A complicated language union U { int a; int b; };! int U::*x = &U::a; int U::*y = &U::b;! Does x equal y?

We ve got a standard How hard could it be?

[T]wo pointers to members compare equal if they would refer to the same member of the same most derived object or the same subobject if indirection with a hypothetical object of the associated class type were performed, otherwise they compare unequal. No ABI correctly implements this.

Why does any of this matter? Data passed across ABI boundaries may be interpreted by another compiler Unpredictable things may happen if two compilers disagree about how to interpret this data Subtle bugs can be some of the worst bugs

Finding bugs isn t easy ABI implementation techniques may collide with each other in unpredictable ways One compiler permutes field order in structs if the alignment is 16 AND it has an empty virtual base AND it has at least one bitfield member AND Some ABIs are not documented Even if they are, you can t always trust the documentation

What happens if we aren t proactive Let users find our bugs for us This can be demoralizing for users, eroding their trust Altruistic; we must hope that the user will file the bug At best, the user s time has been spent on something they probably didn t want to do

Let computers find the bugs 1. Generate some C++ 2. Feed it to the compiler 3. Did the compiler die? If so, we have an interesting test case 4. If not, let s ask another compiler to do the same 5. Compare the output of the two compilers

What we managed to attack External name generating (name mangling) Virtual table layout Thunk generation Record layout IR generation

In the beginning, there was record layout Thought to be high value, low effort to fuzz Generate a single TU execution test; expected identical results upon execution We want full coverage but without an excessive number of tests

The plan for version 0.1 of the fuzzer seemed unambitious Generate hierarchies of classes Fill classes with fields Support C scalar types (int, char, etc.) Support bitfields No arrays, pointer to member functions, etc. No virtual methods No pragmas or attributes Dump offsets of fields All classes must have a constructor

First steps Let s generate some hierarchies

First steps struct A { }; struct B : virtual A { }; struct C : virtual B, A { };! warning C4584: 'C': base-class 'A' is already a base-class of 'B' struct A { }; struct B : virtual A { }; struct C : A, virtual B { };! error C2584: 'C': direct base 'A' is inaccessible; already a base of 'B'

First Lesson Successful fuzzing requires a model of what good test cases should look like High failure rate can completely cripple the fuzzer Less restrictive is better than more restrictive, you might lose out on test cases otherwise

Fuzzer 0.1, while quite limited, was wildly successful Support for #pragma pack and declspec(align) was added

A typical test case struct A {}; struct B {}; #pragma pack(push, 1) struct C : virtual A, virtual B { }; #pragma pack(pop) struct D : C {}; alignof(c) == 1, correct alignof(d) == 1, wrong! correct answer is 4

It s like whack-a-mole struct A {}; struct B {}; struct C : virtual A, virtual B { }; #pragma pack(push, 1) struct D : C {}; #pragma pack(pop) alignof(c) == 4, correct alignof(d) == 4, wrong! correct answer is 1

Testing synthesis of default operators Copy constructor IR generation is sophisticated Tries to use memcpy if it s valid & profitable, otherwise falls back to field-by-field initialization Sophistication comes at a cost: complexity ABI-specific assumptions baked into generic code, resulting in surprising IR Fuzz tested by sticking dllexport on all classes Forces emission of all special member operators

C++ type to LLVM IR type We need an IR type for a particular C++ type in different contexts Surprisingly leads to different IR types for the same C++ type Increased attack surface

Meet CGRecordLayout We asked the compiler to zero-initialize u union U { double x; long long y; };! U u;! %union.u = type { double }! @u = global %union.u zeroinitializer First named union member is initialized Shocking number of compilers get this wrong Code is relatively simple, largely powered by AST layout algorithms

Meet ConstStructBuilder We asked the compiler to aggregate-initialize u union U { double x; long long y; };! U u = {.y = 0 };! %union.u = type { double }! @u = global { i64 } { i64 0 } Can t use %union.u to initialize, wrong type Anonymous type used instead Slavishly builds a new type from scratch Has its own bitfield layout algorithm!

CGRecordLayout Used for zero-initialization Memory type, used for loads and stores ConstStructBuilder Used for aggregate initialization (C99 designated initializers, C++11 initializer lists) This seems complicated, why not let one rule them all? CGRecordLayout is useful, largely reduces the number of new types we need but cannot always be used for aggregate initialization ConstStructBuilder can handle aggregate initialization but has no idea how to handle virtual bases, vtordisps, etc. These problems aren t insurmountable but they aren t trivial either :(

What about virtual tables? Some ABIs have a virtual base table and a virtual function table, others concatenate both into one table Virtual function table entries might point to virtual functions or to thunks which then delegate to the actual function body Thunk might adjust the this pointer, the returned value or both! RTTI data lives in the virtual function table Composed of complex structures which describe inheritance structure, layout, accessibility, etc.

Comparing VTables Initial virtual function table comparer was a wrapper around llvm s obj2yaml Worked excellently at first, eventually became a bottleneck A dedicated tool was written, llvm-vtabledump More sophisticated: can parse RTTI data, dump virtual base offsets, etc.

S::`vftable'[0]: const S::`RTTI Complete Object Locator' S::`vftable'[4]: public: virtual void * thiscall S::`destructor'(unsigned int) S::`vbtable'[0]: -4 S::`vbtable'[4]: 4 S::`RTTI Complete Object Locator'[IsImageRelative]: 0 S::`RTTI Complete Object Locator'[OffsetToTop]: 0 S::`RTTI Complete Object Locator'[VFPtrOffset]: 0 S::`RTTI Base Class Array'[0]: S::`RTTI Base Class Descriptor at (0,-1,0,64)'

A typical VTable testcase Clang s vftable for C: struct A { virtual A *f(); };! struct B : virtual A { virtual B *f(); B() {} };! struct C : virtual A, B {}; A* B::f() [thunk] MS vftable for C: B* B::f() [thunk] B* B::f() Both compilers are wrong! A* B::f() [thunk] B* B::f()

A cute trick used for pure classes Would like to be able to reference virtual function table struct A { virtual A *f() = 0; };! struct B : virtual A { virtual B *f() = 0; }; Can t construct an object of type A or B Don t want to add ctor or dtor, both have ABI implications declspec(dllexport) references the vftable so it may be exported ;)

This approach worked marvelously for RTTI RTTI was the first complex component started after the fuzzer was written Feedback loop was created, made it possible to iteratively improve compatibility Zero known bugs in RTTI as of this talk

Virtual tables don t seem so hard, what s the big deal? It turns out the other compiler has bugs (*cue gasps*) Develop heuristics to determine when clang is correct and they are incorrect We hope we didn t miss any interesting cases :( Non-virtual overloads can have an effect on virtual table contents

String literals Some ABIs mangle their string literals Wait, seriously? Yeah, that way they merge across translation units

Examples hello! turns into??_c@_06ganfphod@hello?$cb?$aa@ L hello! turns into??_c@_1o@imicciob@?$aah?$aae? $AAl?$AAl?$AAo?$AA?$CB?$AA?$AA@ Wonderful, right?

Custom fuzzer written I thought I was on the right track but I wanted to be sure, this was easily tested with a purposebuilt fuzzer

// <char-type> ::= 0 # char // ::= 1 # wchar_t // ::=??? # char16_t/char32_t will need a mangling too... // // <literal-length> ::= <non-negative integer> # the length of the literal // // <encoded-crc> ::= <hex digit>+ @ # crc of the literal including // # null-terminator // // <encoded-string> ::= <simple character> # uninteresting character // ::= '?$' <hex digit> <hex digit> # these two nibbles // # encode the byte for the // # character // ::= '?' [a-z] # \xe1 - \xfa // ::= '?' [A-Z] # \xc1 - \xda // ::= '?' [0-9] # [,/\:. \n\t'-] // // <literal> ::= '??_C@_' <char-type> <literal-length> <encoded-crc> // <encoded-string> '@'

Is this approach effective? 98 MS ABI bugs found since the fuzzer was written: 17748 17750 17761 17767 17768 17772 17816 18021 18022 18024 18025 18026 18027 18035 18039 18118 18167 18168 18169 18170 18172 18173 18175 18186 18215 18216 18248 18264 18278 18433 18434 18435 18436 18437 18444 18464 18467 18474 18476 18479 18617 18618 18675 18692 18694 18702 18826 18844 18845 18880 18902 18917 18951 18967 19012 19025 19066 19104 19172 19180 19181 19240 19361 19398 19399 19407 19408 19413 19414 19487 19505 19506 19733 20017 20047 20221 20315 20343 20351 20418 20444 20464 20477 20479 20653 20719 20897 20947 21022 21164 21031 21046 21060 21061 21062 21064 21073 21074 Bug numbers in bold are bugs found by superfuzz.

Thanks!