D as Better C Compiler. by Walter Bright dlang.org

Similar documents
CYSE 411/AIT681 Secure Software Engineering Topic #10. Secure Coding: Integer Security

2/9/18. Readings. CYSE 411/AIT681 Secure Software Engineering. Introductory Example. Secure Coding. Vulnerability. Introductory Example.

2/9/18. CYSE 411/AIT681 Secure Software Engineering. Readings. Secure Coding. This lecture: String management Pointer Subterfuge

advanced data types (2) typedef. today advanced data types (3) enum. mon 23 sep 2002 defining your own types using typedef

CSC209H Lecture 3. Dan Zingaro. January 21, 2015

CPS104 Recitation: Assembly Programming

UNDEFINED BEHAVIOR IS AWESOME

o Code, executable, and process o Main memory vs. virtual memory

CSCI-243 Exam 1 Review February 22, 2015 Presented by the RIT Computer Science Community

Deep C. Multifile projects Getting it running Data types Typecasting Memory management Pointers. CS-343 Operating Systems

Final CSE 131B Spring 2004

High-performance computing and programming Intro to C on Unix/Linux. Uppsala universitet

Programming. Pointers, Multi-dimensional Arrays and Memory Management

CSC 1600 Memory Layout for Unix Processes"

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

Review Topics. Final Exam Review Slides

CSci 4061 Introduction to Operating Systems. Programs in C/Unix

C BOOTCAMP DAY 2. CS3600, Northeastern University. Alan Mislove. Slides adapted from Anandha Gopalan s CS132 course at Univ.

Dynamic memory allocation

C++ Undefined Behavior What is it, and why should I care?

Memory Corruption Vulnerabilities, Part II

Tutorial 1: Introduction to C Computer Architecture and Systems Programming ( )

Final Exam. Fall Semester 2016 KAIST EE209 Programming Structures for Electrical Engineering. Name: Student ID:

C++ Undefined Behavior

Lecture 9 Assertions and Error Handling CS240

Dynamic memory. EECS 211 Winter 2019

Data types, arrays, pointer, memory storage classes, function call

ECE 264 Exam 2. 6:30-7:30PM, March 9, You must sign here. Otherwise you will receive a 1-point penalty.

CSE409, Rob Johnson, Alin Tomescu, November 11 th, 2011 Buffer overflow defenses

In Java we have the keyword null, which is the value of an uninitialized reference type

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated

Today s class. Finish review of C Process description and control. Informationsteknologi. Tuesday, September 18, 2007

Biography. Background

Betriebssysteme und Sicherheit Sicherheit. Buffer Overflows

Lecture 03 Bits, Bytes and Data Types

Memory Allocation in C

System calls and assembler

P.G.TRB - COMPUTER SCIENCE. c) data processing language d) none of the above

COSC Software Engineering. Lecture 16: Managing Memory Managers

CSE 333 Midterm Exam Sample Solution 7/28/14

CS527 Software Security

Basic C Programming (2) Bin Li Assistant Professor Dept. of Electrical, Computer and Biomedical Engineering University of Rhode Island

Page 1. Stuff. Last Time. Today. Safety-Critical Systems MISRA-C. Terminology. Interrupts Inline assembly Intrinsics

Buffer Overflow Vulnerability

Sample Midterm (Spring 2010)

Outline. Lecture 1 C primer What we will cover. If-statements and blocks in Python and C. Operators in Python and C

Armide Documentation. Release Kyle Mayes

From Java to C. Thanks to Randal E. Bryant and David R. O'Hallaron (Carnegie-Mellon University) for providing the basis for these slides

Homework 3 CS161 Computer Security, Fall 2008 Assigned 10/07/08 Due 10/13/08

Memory Allocation in C C Programming and Software Tools. N.C. State Department of Computer Science

CSE 30 Fall 2013 Final Exam

Subject: Fundamental of Computer Programming 2068

Memory Corruption 101 From Primitives to Exploit

Concurrent Programming in the D Programming Language. by Walter Bright Digital Mars

CS 31: Intro to Systems Pointers and Memory. Martin Gagne Swarthmore College February 16, 2016

Arrays and Memory Management

Lecture 08 Control-flow Hijacking Defenses

Roadmap: Security in the software lifecycle. Memory corruption vulnerabilities

This code has a bug that allows a hacker to take control of its execution and run evilfunc().

CptS 360 (System Programming) Unit 4: Debugging

CSE 374 Programming Concepts & Tools

CSE 30 Winter 2014 Final Exam

Call The Project Dynamic-Memory

Pointer Arithmetic and Lexical Scoping. CS449 Spring 2016

Secure Programming Lecture 3: Memory Corruption I (Stack Overflows)

Midterm Exam Nov 8th, COMS W3157 Advanced Programming Columbia University Fall Instructor: Jae Woo Lee.

Kurt Schmidt. October 30, 2018

Pointers, Arrays, Memory: AKA the cause of those Segfaults

Lecture 14. Dynamic Memory Allocation

Introduction to Computer Systems , fall th Lecture, Sep. 28 th

JTSK Programming in C II C-Lab II. Lecture 3 & 4

Cling: A Memory Allocator to Mitigate Dangling Pointers. Periklis Akritidis

CS2141 Software Development using C/C++ Debugging

Important From Last Time

2 Sadeghi, Davi TU Darmstadt 2012 Secure, Trusted, and Trustworthy Computing Chapter 6: Runtime Attacks

Semantics of C++ Hauptseminar im Wintersemester 2009/10 Templates

CA341 - Comparative Programming Languages

Computer Systems Lecture 9

CSE 30 Fall 2012 Final Exam

CMPSC 497 Other Memory Vulnerabilities

Memory and C/C++ modules

Memory Management. a C view. Dr Alun Moon KF5010. Computer Science. Dr Alun Moon (Computer Science) Memory Management KF / 24

CS 261 Data Structures. Introduction to C Programming

Buffer Overflow & Format Strings

Final CSE 131B Spring 2005

CSE 12 Spring 2016 Week One, Lecture Two

CSE 333 Autumn 2013 Midterm

Buffer Overflow Vulnerability Lab Due: September 06, 2018, Thursday (Noon) Submit your lab report through to

From Over ow to Shell

MISRA-C. Subset of the C language for critical systems

Semantics (cont.) Symbol Table. Static Scope. Static Scope. Static Scope. CSE 3302 Programming Languages. Static vs. Dynamic Scope

EURECOM 6/2/2012 SYSTEM SECURITY Σ

Short Notes of CS201

When you add a number to a pointer, that number is added, but first it is multiplied by the sizeof the type the pointer points to.

CSE 333 Autumn 2014 Midterm Key

ch = argv[i][++j]; /* why does ++j but j++ does not? */

CSE 30 Fall 2007 Final Exam

The D Programming Language

Linux Memory Layout. Lecture 6B Machine-Level Programming V: Miscellaneous Topics. Linux Memory Allocation. Text & Stack Example. Topics.

CS201 - Introduction to Programming Glossary By

Transcription:

D as Better C Compiler by Walter Bright dlang.org

C Brilliantly conceived language Major force for 40 years Engine for major critical software Well known and understood Man behind the curtain

All Is Not Roses 40 years of advancement in programming languages No memory safety

Memory Safety being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. wikipedia

Costs of Memory Unsafety millions of dollars endless hours of effort to find and correct constant threat embarrassment

Buffer Overflow Heartbleed https://en.wikipedia.org/wiki/heartbleed

Buffer Overflow Public Enemy #1 Microsoft's Writing Secure Code 2

Buffer Overflow Morris Worm First internet worm relied on buffer overflow C bug

Buffer Overflow #3 on list of top 25 security vulnerabilities http://cwe.mitre.org/top25/index.html#cwe-120

Not a Buffer Overflow

Good News! It's not your fault

Has persisted for 40 years Not a tractable problem Not fixable by fixing the programmer

It's a Tooling Problem

int sum(int *array, size_t length) { int sum = 0; for (size_t i = 0; i <= length; ++i) sum += array[i]; return sum; } Wizard of Oz

D Arrays are Phat Pointers struct Array { size_t length; int* ptr; } https://dlang.org/spec/arrays.html#dynamic-arrays

First D Version int sum(int[ ] array) { int sum = 0; for (size_t i = 0; i < array.length; ++i) sum += array[i]; return sum; }

Using foreach int sum(int[ ] array) { int sum = 0; foreach (i; 0.. array.length) sum += array[i]; return sum; }

More Advanced foreach int sum(int[ ] array) { int sum = 0; foreach (value; array) sum += value; return sum; }

Same problem with 0 terminated strings D strings are just arrays of characters

And Much More Safety no uninitialized pointers no pointers to expired stack frames no aliasing pointers with other types no implicit narrowing conversions pointer lifetime tracking etc...

Calling C Code from D extern (C) void* malloc(size_t); T[ ] allocarray(t)(size_t numelements) { auto p = malloc(t.sizeof * numelements); assert(p); return p[0.. numelements]; }

Problem D code needs to link with the D runtime library. But an existing C project does not link with the D runtime library. Linking it in produces issues with the size of the D runtime library, and how/when it is initialized compared to that of the existing C project.

Solution - BetterC Create a subset of D that does not require the D runtime library.

Features Altered assert() failures now go to C standard library RAII no longer unwinds exceptions No dynamic type info No exception handling No automatic memory management

What's this going to cost?

C #include <stdio.h> int main(size_t argc, char** argv) { for (size_t i = 0; i < argc; ++i) printf("arg[%zd] = %s\n", i, argv[i]); return 0; }

D as Better C import core.stdc.stdio; extern (C) int main(size_t argc, char** argv) { foreach (i, s; argv[0.. argc]) printf("arg[%zd] = %s\n", i, s); return 0; }

Some Assembly Required

sub ESP,01Ch mov 014h[ESP],ESI mov ESI,020h[ESP] mov 018h[ESP],EDI mov EDI,024h[ESP] mov 010h[ESP],EBX xor EBX,EBX test ESI,ESI je L1 L2: mov ECX,[EBX*4][EDI] mov 8[ESP],ECX mov 4[ESP],EBX mov [ESP],offset _DATA call _printf inc EBX cmp EBX,ESI jb L2 L1: mov EBX,010h[ESP] mov ESI,014h[ESP] mov EDI,018h[ESP] add ESP,01Ch xor EAX,EAX ret sub ESP,01Ch mov 014h[ESP],ESI mov ESI,020h[ESP] mov 018h[ESP],EDI mov EDI,024h[ESP] mov 010h[ESP],EBX xor EBX,EBX test ESI,ESI je L1 L2: mov ECX,[EBX*4][EDI] mov 8[ESP],ECX mov 4[ESP],EBX mov [ESP],offset CONST call _printf inc EBX cmp EBX,ESI jb L2 L1: mov EBX,010h[ESP] mov ESI,014h[ESP] mov EDI,018h[ESP] add ESP,01Ch xor EAX,EAX ret

No Compromise The same code is generated!

How to compile and link in a BetterC function into a C project...

The Original C Code #include <stdio.h> int sum(int *array, size_t length) { int sum = 0; for (size_t i = 0; i < length; ++i) sum += array[i]; return sum; } int main(size_t argc, char** argv) { int array[3] = { 1, 37, 28 }; int s = sum(array,sizeof(array)/sizeof(array[0])); printf("sum = %d\n", s); return 0; }

main.c #include <stdio.h> int sum(int *array, size_t length); int main(size_t argc, char** argv) { int array[3] = { 1, 37, 28 }; int s = sum(array,sizeof(array)/sizeof(array[0])); printf("sum = %d\n", s); return 0; }

sum.d extern ( C ) int sum(int *array, size_t length) { return sum(array[0.. length]); } @safe int sum(int[ ] array) { int sum = 0; foreach (value; array) sum += value; return sum; }

Compiling dmd -c sum.d dmc main.c sum.obj

Converting a C file to D Remove preprocessor metaprogramming (should get rid of it anyway) Copy the code to a.d file Translate one function at a time run the test suite after each

Don't Refactor/Improve/Fix Wait until it is all translated and passes all tests! resist overwhelming temptation or you'll be sorry! Just rote translate it

Proof That This Works The Digital Mars C/C++ Compiler Front End https://github.com/digitalmars/compiler/tree/mast er/dm/src/dmc

Now That It's Working in D What can be done to benefit? (Besides using proper arrays!)

#include <stdio.h> #include <stdlib.h> #include <stdbool.h> bool see(void* buf, int maybe); bool ispossible(int maybe) { void *buf = malloc(100); if (see(buf, maybe)) { free(buf); return true; } free(buf); return false; }

import core.stdc.stdlib; // modules! bool see(void* buf, int maybe); bool ispossible(int maybe) { void *buf = malloc(100); scope(exit) free(buf); // scope guard if (see(buf, maybe)) return true; return false; }

RAII import core.stdc.stdlib; bool see(void* buf, int maybe); struct S { void* buf; ~this() { free(buf); } } bool ispossible(int maybe) { S s; s.buf = malloc(100); if (see(s.buf, maybe)) return true; return false; }

int compute(int x, int y) { if (x == y) return -1; // error if (x == 5) return 2; if (x == y + 3) return 7; return -1; // error }

int compute(int x, int y) { const bool log = true; if (x == y) { if (log) printf("fail\n"); return -1; } if (x == 5) { if (log) printf("success 2\n"); return 2; } if (x == y + 3) { if (log) printf("success 7\n"); return 7; } if (log) printf("fail\n"); return -1; }

Nested Functions int compute(int x, int y) { enum log = true; int fail() { if (log) printf("fail\n"); return -1; } int success(int i) { if (log) printf("success %d\n", i); return i; } if (x == y) return fail(); if (x == 5) return success(2); if (x == y + 3) return success(7); return fail(); }

Some Things D Can't Do

#define BEGIN { #define END } int sum(int *array, size_t length) BEGIN int sum = 0; size_t i; for (i = 0; i < length; ++i) BEGIN sum += array[i]; END return sum; END

Get Started Today! Incrementally use D functions in large C project Stop buffer overflows and other safety bugs Use D to improve code dlang.org