MC: Meta-level Compilation

Similar documents
Checking System Rules Using System-Specific, Programmer- Written Compiler Extensions

Motivation. What s the Problem? What Will we be Talking About? What s the Solution? What s the Problem?

Verification & Validation of Open Source

Information Flow Analysis and Type Systems for Secure C Language (VITC Project) Jun FURUSE. The University of Tokyo

Static Analysis versus Software Model Checking for bug finding

Examining the Code. [Reading assignment: Chapter 6, pp ]

Buffer overflow prevention, and other attacks

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco

Recitation: C Review. TA s 20 Feb 2017

Static Analysis in C/C++ code with Polyspace

Lecture 10. Pointless Tainting? Evaluating the Practicality of Pointer Tainting. Asia Slowinska, Herbert Bos. Advanced Operating Systems

APISan: Sanitizing API Usages through Semantic Cross-checking

CNIT 127: Exploit Development. Ch 3: Shellcode. Updated

CSCI 171 Chapter Outlines

6. Pointers, Structs, and Arrays. 1. Juli 2011

PRINCIPLES OF OPERATING SYSTEMS

Oracle Developer Studio Code Analyzer

Memory Corruption 101 From Primitives to Exploit

CS 31: Intro to Systems Pointers and Memory. Martin Gagne Swarthmore College February 16, 2016

Verification and Validation

Principles of Software Construction: Objects, Design and Concurrency. Static Analysis. toad Spring 2013

DEBUGGING: DYNAMIC PROGRAM ANALYSIS

Windows architecture. user. mode. Env. subsystems. Executive. Device drivers Kernel. kernel. mode HAL. Hardware. Process B. Process C.

ENCE Computer Organization and Architecture. Chapter 1. Software Perspective

Scalable Program Analysis Using Boolean Satisfiability: The Saturn Project

CSCE Operating Systems Interrupts, Exceptions, and Signals. Qiang Zeng, Ph.D. Fall 2018

Static Analysis Basics II

Important From Last Time

Page 1. Today. Important From Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right?

INITIALISING POINTER VARIABLES; DYNAMIC VARIABLES; OPERATIONS ON POINTERS

CS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018

Important From Last Time

6. Pointers, Structs, and Arrays. March 14 & 15, 2011

Data Types and Computer Storage Arrays and Pointers. K&R, chapter 5

Short Notes of CS201

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Outline. Lecture 1 C primer What we will cover. If-statements and blocks in Python and C. Operators in Python and C

Introduction to C. Robert Escriva. Cornell CS 4411, August 30, Geared toward programmers

Class Information ANNOUCEMENTS

Final CSE 131B Spring 2004

CS201 - Introduction to Programming Glossary By

CS-527 Software Security

Lecture 8 Dynamic Memory Allocation

C Programming SYLLABUS COVERAGE SYLLABUS IN DETAILS

Outline. Threads. Single and Multithreaded Processes. Benefits of Threads. Eike Ritter 1. Modified: October 16, 2012

Formats of Translated Programs

The C Preprocessor (and more)!

Lecture 14 Notes. Brent Edmunds

Please refer to the turn-in procedure document on the website for instructions on the turn-in procedure.

Ian Sommerville 2006 Software Engineering, 8th edition. Chapter 22 Slide 1

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui

Static Analysis. Principles of Software System Construction. Jonathan Aldrich. Some slides from Ciera Jaspan

Homework 3 CS161 Computer Security, Fall 2008 Assigned 10/07/08 Due 10/13/08

Static Analysis in Practice

Other array problems. Integer overflow. Outline. Integer overflow example. Signed and unsigned

Overview AEG Conclusion CS 6V Automatic Exploit Generation (AEG) Matthew Stephen. Department of Computer Science University of Texas at Dallas

1.1 For Fun and Profit. 1.2 Common Techniques. My Preferred Techniques

CS164: Programming Assignment 5 Decaf Semantic Analysis and Code Generation

CSCE 548 Building Secure Software Integers & Integer-related Attacks & Format String Attacks. Professor Lisa Luo Spring 2018

Lecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Lecture 13. Notation. The rules. Evaluation Rules So Far

How to Sandbox IIS Automatically without 0 False Positive and Negative

18-600: Recitation #3

82V391x / 8V893xx WAN PLL Device Families Device Driver User s Guide

Objectives. Chapter 19. Verification vs. validation. Topics covered. Static and dynamic verification. The V&V process

Quiz 0 Review Session. October 13th, 2014

ΗΜΥ 317 Τεχνολογία Υπολογισμού

CS 5513 Entry Quiz. Systems Programming (CS2213/CS3423))

Debugging. ICS312 Machine-Level and Systems Programming. Henri Casanova

Semaphore. Originally called P() and V() wait (S) { while S <= 0 ; // no-op S--; } signal (S) { S++; }

Cppcheck Cppcheck 1.66

Lecture 9 Assertions and Error Handling CS240

SoK: Eternal War in Memory

CS61C : Machine Structures

pthreads CS449 Fall 2017

Motivation was to facilitate development of systems software, especially OS development.

Operating- System Structures

kguard++: Improving the Performance of kguard with Low-latency Code Inflation

Introduction to C. Sean Ogden. Cornell CS 4411, August 30, Geared toward programmers

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

Lecture 07 Debugging Programs with GDB

Introduction to C. Ayush Dubey. Cornell CS 4411, August 31, Geared toward programmers

To obtain the current global trace mask, call meitraceget(...). To modify the global trace mask, call meitraceset(...).

Memory and C/C++ modules

Recitation: Cache Lab & C

Static Analysis in Practice

ELEC 377 Operating Systems. Week 1 Class 2

CSC209: Software tools. Unix files and directories permissions utilities/commands Shell programming quoting wild cards files

CSC209: Software tools. Unix files and directories permissions utilities/commands Shell programming quoting wild cards files. Compiler vs.

CSci 4061 Introduction to Operating Systems. Programs in C/Unix

Turning proof assistants into programming assistants

RCU. ò Walk through two system calls in some detail. ò Open and read. ò Too much code to cover all FS system calls. ò 3 Cases for a dentry:

RCU in the Linux Kernel: One Decade Later

CS61C : Machine Structures

VFS, Continued. Don Porter CSE 506

Lecture 3. More About C

CS 550 Operating Systems Spring Interrupt

CNIT 127: Exploit Development. Ch 18: Source Code Auditing. Updated

Introduction CHAPTER. Review Questions

CYSE 411/AIT681 Secure Software Engineering Topic #12. Secure Coding: Formatted Output

Transcription:

MC: Meta-level Compilation Extending the Process of Code Compilation with Application-Specific Information for the layman developer (code monkey) Gaurav S. Kc 8 th October, 2003 1

Outline Dawson Engler Overview of the Compilation Process Meta-level Compilation early days with MAGIK current incarnation: MC good for detecting bugs:» NULL pointer misuse» memory leak (failure to deallocate memory)» memory corruption (illegal use of deallocated memory)» security holes (buffer overflow, formatstring vulnerabilities) Conclusions 2

Dawson Engler The man behind MAGIK and MC PhD from MIT '98 Stanford Faculty (Metalevel Compilation Group) http://metacomp.stanford.edu The goal of the Meta-level Compilation (MC) project is to allow system implementors to easily build simple domain- and application-specific compiler extensions to check, optimize, and transform code. Publications on MC at OSDI, PLDI, SOSP, Oakland Symposium, ACM CCS Coverity.com: commercialised MC 3

Compilers S/W lifecycle phases Requirements engineering Design, and implementation Repeat, and maintain Compilation phases Pre-process (cpp) : macro processing Compiler proper (cc1) front end synthesis: source IR, symbol table, control-flow, data-flow middle end optimisation: IR IR back end generation: IR optimised machine assembly Assembler (as): assembler macro processing, translate ASCII instructions into binary machine code Linkage editor (ld): combine several object modules (and library files) to produce static or dynamically-linked executables 4

Meta-level Compilation Static information generated by the front-end synthesis phase is lost after compilation Application-specific compiler extensions & optimisations can benefit from this information Compiler developer cannot anticipate all possible domain-specific extensions Application writer doesn't want to learn compiler internals Need: Simpler mechanism for coding applicationspecific extensions for integration into compiler 5

MC Paper: Incorporating Application Semantics and Control into Compilation Dawson R. Engler, First Conference on Domain Specific Languages, 1997 Programmers can be active users of compilers Incorporate domain-specific extensions into the compilation process Facilitate previously impossible application-level optimisations and semantic-checking (dereference NULL) Leave application source code unmodified Source-level (IR) modifications for portable user extensions Full compiler optimisations on modified IR Leave compiler source code unmodified Extensions will be exhibit "built-in" behaviour in compiler 6

magik: An ANSI-C api to LCC IR Dynamically linked into modified LCC compiler User extensions: Code: invoked at every function definition Data: invoked at every struct definition Examples: Automatically replace a poly-typed function (output) with printf and appropriate format-string output("i = ", i, ", j = ", j) printf("%s%d%s%d", "i = ", i, ", j = ", j) Mandatory checking of return codes for system calls read(fd, buffer, size) if (0 > read(fd, buffer, size)) error("failed system call <read>\n") 7

magik: illustration Replace poly-typed function with printf equivalent foreach function-call ( "output" ) foreach function-argument ( = arg ) switch argument-type ( arg ) { case Integer: strcat ( typestring, "%d" ); break; } case Pointer: if rawpointertype ( arg ) == CHAR strcat ( typestring, "%s" ); else strcat ( typestring, "0x%p" ); break; replace-call ( function-call, "printf" ) insert-argument (function-call, typestring ) 8

MC Paper: Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions Operating Systems Design and Implementation, 2000 System rules for Operating System Kernel Kernel sanitises user-space data before accessing it (do X before Y) A lock must have a corresponding unlock on every code path (when X, do Y) Peer reviews for manual inspection of source code: not rigorous, human error. Automated enforcing of system rules Testing: time-consuming, not exhaustive since complexity/size scale with system size. Impractical to test all device drivers for Linux Formal Verification: model checkers, theorem provers/checkers to validate consistency of abstract specification of system. Hard to accurately represent system in specifications: over-simplification, omission of features, unless generating code from specs Compiler-based static analysis tools are useful No scalability problem. Works directly on source code System rules have straightforward mapping to source code Rules are enforced as new phases in the compilation 9

metal: A high-level, state machine language yacc-like specification for SM: matched patterns in source code causes transitions between different states Linkable object code compiled from metal specifications using mcc. Dynamically linked into compiler, xg++ (based on GNU g++, working on gcc version) SM is applied down all possible control-flow paths for each function 10

MC / metal: illustration Ensure corresponding sti (re-enable interrupts) for every cli sm check_interrupts { // Patterns pattern enable = { sti(); }; pattern disable = { cli(); }; // States and transitions / actions is_enabled : disable ==> is_disabled enable ==> { error( "double enable" ); } ; } is_disabled: enable ==> is_enabled disable ==> { error( "double disable" ); } $end-of-path$ ==> { error( "exit w/ intr" } ; 11

Other MC / metal checks Make the kernel check user-space pointers before de-referencing (applicable to library interfaces) For states {unknown, null, not_null, freed}, find when pointers are used: before being checked on NULL paths after being free d Find double-free errors Find error paths (returning a negative value) that don t free allocated memory Cannot handle multi-threaded applications 12

Conclusions Meta-level compilation New phases for user-extensible compiler Domain-specific checks for locating application bugs enforcing system rules Compiler experience required 13