Reflections on using C(++) Root Cause Analysis

Similar documents
Software and Web Security 1. Root Cause Analysis. Abstractions Assumptions Trust. sws1 1

Software Security Buffer Overflows more countermeasures. Erik Poll. Digital Security Radboud University Nijmegen

Software Security Buffer Overflows more countermeasures. Erik Poll. Digital Security Radboud University Nijmegen

Limitations of the stack

Software Security Buffer Overflows public enemy number 1. Erik Poll. Security of Systems (SoS) group Radboud University Nijmegen

Last week. Data on the stack is allocated automatically when we do a function call, and removed when we return

Programmer s Points to Remember:

Outline. Classic races: files in /tmp. Race conditions. TOCTTOU example. TOCTTOU gaps. Vulnerabilities in OS interaction

Hacking in C. Pointers. Radboud University, Nijmegen, The Netherlands. Spring 2019

Pointers (continued), arrays and strings

CIS 551 / TCOM 401 Computer and Network Security. Spring 2007 Lecture 2

Secure Programming I. Steven M. Bellovin September 28,

Pointers (continued), arrays and strings

ECS 153 Discussion Section. April 6, 2015

CNIT 127: Exploit Development. Ch 18: Source Code Auditing. Updated

Software Security Buffer Overflows. public enemy number 1. Erik Poll. Digital Security Radboud University Nijmegen

Computer Security. Robust and secure programming in C. Marius Minea. 12 October 2017

A brief introduction to C programming for Java programmers

Other array problems. Integer overflow. Outline. Integer overflow example. Signed and unsigned

CSCE 548 Building Secure Software Buffer Overflow. Professor Lisa Luo Spring 2018

CS 61: Systems programming and machine organization. Prof. Stephen Chong November 15, 2010

Topics in Software Security Vulnerability

Buffer overflow background

Important From Last Time

Important From Last Time

Copyright (2004) Purdue Research Foundation. All rights reserved.

(Early) Memory Corruption Attacks

Page 1. Today. Important From Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right?

Lecture 4 September Required reading materials for this class

Agenda. Security and Standard C Libraries. Martyn Lovell Visual C++ Libraries Microsoft Corporation

Verification & Validation of Open Source

Advanced Systems Security: Ordinary Operating Systems

This time. Defenses and other memory safety vulnerabilities. Everything you ve always wanted to know about gdb but were too afraid to ask

CS 261 Fall C Introduction. Variables, Memory Model, Pointers, and Debugging. Mike Lam, Professor

Advanced Systems Security: Ordinary Operating Systems

COSC Software Engineering. Lecture 16: Managing Memory Managers

18-642: Code Style for Compilers

CSC 1600 Memory Layout for Unix Processes"

Secure Coding Techniques

Sandwiches for everyone

Introduction. Background. Document: WG 14/N1619. Text for comment WFW-1 of N1618

Module: Safe Programming. Professor Trent Jaeger. CSE543 - Introduction to Computer and Network Security

Betriebssysteme und Sicherheit Sicherheit. Buffer Overflows

Linux Memory Layout. Lecture 6B Machine-Level Programming V: Miscellaneous Topics. Linux Memory Allocation. Text & Stack Example. Topics.

IAGO ATTACKS: WHY THE SYSTEM CALL API IS A BAD UNTRUSTED RPC INTERFACE

CYSE 411/AIT681 Secure Software Engineering Topic #10. Secure Coding: Integer Security

One-Slide Summary. Lecture Outline. Language Security

2/9/18. Readings. CYSE 411/AIT681 Secure Software Engineering. Introductory Example. Secure Coding. Vulnerability. Introductory Example.

2/9/18. CYSE 411/AIT681 Secure Software Engineering. Readings. Secure Coding. This lecture: String management Pointer Subterfuge

CSCI-243 Exam 1 Review February 22, 2015 Presented by the RIT Computer Science Community

Pointer Casts and Data Accesses

Software Security: Buffer Overflow Attacks

ECE 471 Embedded Systems Lecture 22

Making C Less Dangerous

Programming refresher and intro to C programming

CSE 565 Computer Security Fall 2018

o Code, executable, and process o Main memory vs. virtual memory

Lecture 05 Integer overflow. Stephen Checkoway University of Illinois at Chicago

Software Security II: Memory Errors - Attacks & Defenses

C Introduction. Comparison w/ Java, Memory Model, and Pointers

Buffer Overflows Defending against arbitrary code insertion and execution

Topic 8: I/O. Reading: Chapter 7 in Kernighan & Ritchie more details in Appendix B (optional) even more details in GNU C Library manual (optional)

Lecture 8 Dynamic Memory Allocation

CMPSC 497: Midterm Review

Brave New 64-Bit World. An MWR InfoSecurity Whitepaper. 2 nd June Page 1 of 12 MWR InfoSecurity Brave New 64-Bit World

Recitation: C Review. TA s 20 Feb 2017

Page 1. Today. Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right? Compiler requirements CPP Volatile

Announcements. assign0 due tonight. Labs start this week. No late submissions. Very helpful for assign1

Language Security. Lecture 40

Secure Programming. An introduction to Splint. Informatics and Mathematical Modelling Technical University of Denmark E

Buffer Overflows. Buffer Overflow. Many of the following slides are based on those from

'Safe' programming languages

Required reading: StackGuard: Simple Stack Smash Protection for GCC

CSC C69: OPERATING SYSTEMS

CS 261 Fall Mike Lam, Professor. Structs and I/O

System Assertions. Andreas Zeller

QUIZ. What is wrong with this code that uses default arguments?

Agenda. Peer Instruction Question 1. Peer Instruction Answer 1. Peer Instruction Question 2 6/22/2011

CSE 127 Computer Security

Buffer overflows (a security interlude) Address space layout the stack discipline + C's lack of bounds-checking HUGE PROBLEM

Secure Programming. Buffer Overflows Learning objectives. Type. 3 Buffer Overflows. 4 An Important Vulnerability. 5 Example Overflow

Introduction to Computer Systems , fall th Lecture, Sep. 28 th

CSCE 548 Building Secure Software Integers & Integer-related Attacks & Format String Attacks. Professor Lisa Luo Spring 2018

Dynamic Allocation in C

C Characters and Strings

When you add a number to a pointer, that number is added, but first it is multiplied by the sizeof the type the pointer points to.

Secure Software Development: Theory and Practice

Basic Buffer Overflows

TI2725-C, C programming lab, course

Lecture Notes on Types in C

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING

Stanford University Computer Science Department CS 295 midterm. May 14, (45 points) (30 points) total

DAY 3. CS3600, Northeastern University. Alan Mislove

18-642: Code Style for Compilers

EL2310 Scientific Programming

Memory Corruption 101 From Primitives to Exploit

C Refresher, Advance C, Coding Standard, Misra C Compliance & Real-time Programming

Ethical Hacking: Preventing & Writing Buffer Overflow Exploits

C and C++ Secure Coding 4-day course. Syllabus

System Administration and Network Security

Transcription:

Hacking in C Reflections on using C(++) Root Cause Analysis Abstractions Complexity Assumptions Trust hic 1

There are only two kinds of programming languages: the ones people complain about and the ones nobody uses. Bjarne Stroustrup, the creator of C++ hic 2

What we have seen in this course Some complexities that hide under the hood of C: representation of data types, eg long as big/little endian sequences of bytes string, or char*, as char sequence terminated by the null character \0 allocation of data on stack (by default) and heap (with malloc) execution of code - esp. function calls - using the stack with return addresses and frame pointers with some consequences unexpected interpretation of data implicit casts between data types, p+1 for pointers p, %n %s %p in format strings, undefined behaviour unintended manipulation of data array index outside bounds, pointer arithmetic, accessing uninitialized or de-allocated memory, memory leaks which allows attacks hic 3

The good news C is a small language that is close to the hardware you can produce highly efficient code compiled code runs on raw hardware with minimal infrastructure Therefore C is typically the programming language of choice for highly efficient code for embedded systems (which have limited capabilities) for system software (operating systems, device drivers,...) hic 4

The somewhat bad news The precise semantics of C programs depends on underlying hardware, eg sizes of data types differ per architecture casting between types will reveal endianness of the platform... The precise semantics of a C program can only be determined by looking at 1) the compiled code 2) the underlying hardware For efficiency it is unavoidable you have to know the underlying hardware, but for the semantics you d wish to avoid this. This also hampers portability of code hic 5

The really bad news Writing secure C(++) code is hard, because the built-in notorious sources of security vulnerabilities buffer overruns, and absence of array bound checks dynamic memory, managed by the programmer with malloc and free and using complex pointers More generally: undefined behavior caused by this And format string attacks, though these should be easy to fix.. hic 6

The good vs the bad news C is a terse and unforgiving abstraction of silicon hic 7

Undefined behaviour Defined in the FAQ as Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended. hic 8

Undefined causing problems [Example from Linux kernel] 1. unsigned int tun_chr_poll(struct file *file, 2. poll_table *wait) 3. { 4. struct tun_file *tfile = file->private_data; 5. struct tun_struct *tun = tun_get(tfile); 6. struct sock *sk = tun->sk; // shorthand for (*tun).sk 7. if (!tun) return POLLERR; // check if tun is non-null 8...... } Line 7 checks if tun is NULL, and if so, returns error But in line 6 tun is already de-referenced... Compiler is now allowed to assume that tun is non-null in line 7 because if tun is null, then line 6 is undefined behaviour, and any code the compiler produces is ok gcc will actually remove line 7 as compiler optimisation. This caused a security flaw in Linux kernel when compiled with gcc. hic 9

Undefined causing problems [Example from Linux kernel] The code below is careful to check for negative lengths and potential integer overflows 1. int vsnprintf(char *buf, size_t size,...) { 2. char *end; 3. if (size < 0) return 0; // reject negative length 4. end = buf + size; 5. /* Make sure end is always >= buf */ 6. if (end < buf ) {...} 7.... 8. } Programmer assumes that if buf+size overflows, it will become negative. However, buf+size overflowing is undefined behaviour, and all bets are off. Compiler may remove line 6; it knows that size is non-negative and may assume that buf+size >= buf for any addition with a non-negative number because if the addition overflows, this is undefined behaviour, and any code it produces is ok hic 10

C(++) vs safer languages You can write insecure code in any programming language. Still, some programming languages offer more in-built protection than others, eg against buffer overruns, by checking array bounds problems with dynamic memory, eg by garbage collection missing initialisation, by offering default initialisation suspicious type casts, by disallowing these integer overflows, by raising exceptions when these occur C(++) programmer is like trapeze artist without safety net hic 11

Consequences of the bad news Many products should carry a government health warning Warning: this product contains C(++) code and therefore, unless the programmers were experts and never made a mistake, is likely to contain buffer overflow vulnerabilities. As a C(++) programmer, you have to become an expert at avoiding classic security flaws in these languages incl. using all the defenses discussed last week hic 12

C(++) secure coding standards & guidelines Fortunately, there is now good info available, eg C(++) Secure Coding Standards by CERT https://www.securecoding.cert.org NB If you are going to write C(++) code, you have to read such documents! If you work for a company that produces C(++) code, you ll have to make sure that all programmers read them too! hic 13

Extreme risk gets Dangerous C system calls [source: Building secure software, J. Viega & G. McGraw, 2002] High risk strcpy strcat sprintf scanf sscanf fscanf vfscanf vsscanf streadd strecpy strtrns realpath syslog getenv getopt getopt_long getpass Moderate risk getchar fgetc getc read bcopy Low risk fgets memcpy snprintf strccpy strcadd strncpy strncat vsnprintf 14

C(++) secure coding standards & guidelines More general coding guidelines, which also cover other (OS-related) aspects besides generic C(++) issues: Secure programming HOWTO by Dan Wheeler chapter 6 on buffer overflows Linux/UNIX oriented http://www.dwheeler.com/secure-programs/secure-programs-howto/buffer-overflow.html Writing Secure Code by Howard & Leblanc chapter 5 on buffer overflows Microsoft-oriented hic 15

Some root cause analysis hic 16

Recurring theme: functionality vs security There is often a tension between functionality and security. People always choose for functionality over security: Classic example: efficiency of the language (not checking array bounds) vs security of the language (checking array bounds) hic 17

functionality vs security 18

Recurring theme: mixing user & control data Mixing control data (namely return addresses & frame pointers) and untrusted user data (which may overrun buffers) next to each other on the stack was not a great idea. Remember the root cause of phone phreaking! Note that format string attacks also involve control data (or control characters) inside user input hic 19

Recurring theme: complexity Who understands all implicit conversions between C data types? Who can understand whether a large C program leaks memory? Or accidentally accesses freed memory? (because there are too many/too few/the wrong free statements) Who understands all the sources of possibly undefined behaviour? Who understands the compiler optimalisations that are allowed in the presence of undefined behavior? Should a programmer have to know the entire C language specification to be able to write secure code? Complexity is a big enemy of security Abstractions are our main (only?) tool to combat or at least control complexity. hic 20

Controlling complexity: Abstractions We want to deal with abstractions instead of complex underlying representations, eg a long instead of a big- or little-endian sequence of sizeof(long) bytes a string "Hello" instead of a null-terminated sequence of array indexing, eg. a[12], instead of pointer arithmetic in &a+12*sizeof(int) a function call f(12) instead of push 12 on the stack push current address & frame pointer on stack allocate local variables and jump to f upon return, pop everything from the stack & continue at the return address you found on the stack hic 21

Abstractions Ideally, abstractions should be rock solid, so they cannot be broken. This means they do not rely on assumptions (on how they are used) the underlying representation does not matter, and is never revealed even when users supply malicious input. If that is the case we do not have to trust programs, or the programmers that write them, to use abstractions in the right way. hic 22

Implicit assumptions Often abstractions rely on implicit assumptions, which can be broken For example, assuming that there is infinite stack memory, so function calls never fail there is infinite heap memory, so a malloc never fails, and free s are not really needed int s are mathematical integers strings fit into buffers the return address on the stack still points to the right place a char array has a null terminator a pointer is not null... Implicit invalid assumptions are important cause of security problems hic 23

Trust Is trust good? Is trust bad? ie. do we want as much trust as possible, or as little trust as possible? hic 24

Trust vs trustworthiness vertrouwen vs betrouwbaarheid Trust backed by trustworthiness is good. Trust without trustworthiness is bad. We want to minimise trust, because trust in something means that that something could damage you. TCB (Trusted Computing Base) is the collection of software & hardware that we have to trust for a system to be secure We want the TCB to be as small as possible. Of course, we also want our TCB to be as trustworthy as possible. hic 25

Trusted Computing Base (TCB) The Trusted Computing Base of a C(++) program includes all the code, incl. libraries because any buffer overflow, flawed pointer arithmetic,... somewhere can effect the memory anywhere The Trusted Computing Base also includes the compiler the OS the hardware hic 26

Reflection on trusting trust Title of Ken Thompson s Turing award acceptance speech, where he revealed a backdoor in UNIX and Trojan in C-compiler. 1. backdoor in login.c: if (name == "ken") {don't check password; log in as root} 2. code in C compiler to add backdoor when recompiling login.c 3. code in C compiler to add code (2 & 3!) when (re)compiling a compiler Moral of the story: you may be trusting more than you expect! Trust is transitive hic 27

Overview: recurring themes in insecurity complexity incl. unexpected interpretation of special characters %n %s ; \.. functionality vs security mixing user and control data abstractions that can be broken (misplaced) trust in these abstractions trust in general, and not realising what the TCB is hic 28