Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

Similar documents
Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

Real-World Buffer Overflow Protection in User & Kernel Space

Prototyping Architectural Support for Program Rollback Using FPGAs

Raksha: A Flexible Information Flow Architecture for Software Security

DBT Tool. DBT Framework

Filtering Metadata Lookups in Instruction-Grain Application Monitoring

What are Exceptions? EE 457 Unit 8. Exception Processing. Exception Examples 1. Exceptions What Happens When Things Go Wrong

AFRL-RI-RS-TR

EE 457 Unit 8. Exceptions What Happens When Things Go Wrong

Thread-Safe Dynamic Binary Translation using Transactional Memory

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

Sandboxing Untrusted Code: Software-Based Fault Isolation (SFI)

High-Performance Parallel Accelerator for Flexible and Efficient Run-Time Monitoring

Static Vulnerability Analysis

Tolerating Malicious Drivers in Linux. Silas Boyd-Wickizer and Nickolai Zeldovich

Multithreaded Processors. Department of Electrical Engineering Stanford University

Hardware Enforcement of Application Security Policies Using Tagged Memory

The Design Complexity of Program Undo Support in a General-Purpose Processor

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture

Tainting is Not Pointless

How to Sandbox IIS Automatically without 0 False Positive and Negative

The Design Complexity of Program Undo Support in a General Purpose Processor. Radu Teodorescu and Josep Torrellas

Caches. Hiding Memory Access Times

M 3 Microkernel-based System for Heterogeneous Manycores

CS5460: Operating Systems

Spectre and Meltdown. Clifford Wolf q/talk

data block 0, word 0 block 0, word 1 block 1, word 0 block 1, word 1 block 2, word 0 block 2, word 1 block 3, word 0 block 3, word 1 Word index cache

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing

From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware

Lecture 10. Pointless Tainting? Evaluating the Practicality of Pointer Tainting. Asia Slowinska, Herbert Bos. Advanced Operating Systems

Caching and reliability

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

A program execution is memory safe so long as memory access errors never occur:

NFSv4 as the Building Block for Fault Tolerant Applications

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs

POLITECNICO DI MILANO. Exception handling. Donatella Sciuto:

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Koji Inoue Department of Informatics, Kyushu University Japan Science and Technology Agency

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Itanium 2 Processor Microarchitecture Overview

EE108B Lecture 13. Caches Wrap Up Processes, Interrupts, and Exceptions. Christos Kozyrakis Stanford University

Confinement (Running Untrusted Programs)

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2

Announcements. EE108B Lecture 13. Caches Wrap Up Processes, Interrupts, and Exceptions. Measuring Performance. Memory Performance

250P: Computer Systems Architecture. Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019

Syscalls, exceptions, and interrupts, oh my!

One-Slide Summary. Lecture Outline. Language Security

CS 537 Lecture 2 Computer Architecture and Operating Systems. OS Tasks

are Softw Instruction Set Architecture Microarchitecture are rdw

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA

Motivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk

Ensuring Critical Data Integrity via Information Flow Signatures

Lecture 11 Cache. Peng Liu.

CS 61C: Great Ideas in Computer Architecture Virtual Memory. Instructors: John Wawrzynek & Vladimir Stojanovic

Pipelining to Superscalar

Virtual Memory Oct. 29, 2002

Virtual Memory. Motivations for VM Address translation Accelerating translation with TLBs

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH

secubt Hacking the Hackers with User Space Virtualization

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming

Motivation. Threads. Multithreaded Server Architecture. Thread of execution. Chapter 4

EITF20: Computer Architecture Part2.2.1: Pipeline-1

COMPUTER ORGANIZATION AND DESI

EECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors

Mobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for Security and Energy Efficiency

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Nemesis: Preventing Web Authentication & Access Control Vulnerabilities. Michael Dalton, Christos Kozyrakis Stanford University

IsoStack Highly Efficient Network Processing on Dedicated Cores

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

Flexible Architecture Research Machine (FARM)

Full Datapath. Chapter 4 The Processor 2

Branch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines

The Nios II Family of Configurable Soft-core Processors

Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching

CISC 360. Virtual Memory Dec. 4, 2008

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

Operating System: Chap13 I/O Systems. National Tsing-Hua University 2016, Fall Semester

virtual memory Page 1 CSE 361S Disk Disk

Impact of Cache Coherence Protocols on the Processing of Network Traffic

Securing Software Applications Using Dynamic Dataflow Analysis. OWASP June 16, The OWASP Foundation

CS 152 Computer Architecture and Engineering

V. Primary & Secondary Memory!

Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names

Language Security. Lecture 40

Beyond Block I/O: Rethinking

Lecture 18: Multithreading and Multicores

Runtime Defenses against Memory Corruption

Road Map. Road Map. Motivation (Cont.) Motivation. Intel IXA 2400 NP Architecture. Performance of Embedded System Application on Network Processor

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

COMPUTER ORGANIZATION AND DESIGN

Unleashing D* on Android Kernel Drivers. Aravind Machiry

Effective Memory Protection Using Dynamic Tainting

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University

Transcription:

1 Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Hari Kannan, Michael Dalton, Christos Kozyrakis Presenter: Yue Zheng Yulin Shi

Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 2 2

Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 3 3

Motivation & Background? Why is the pointer dereference potentially dangerous Dynamic Information Flow Tracking (DIFT) Tag data from untrusted source Track tainted data propagation Check unsafe tainted data usage Vulnerable C Code int idx = tainted_input; buffer[idx] = x; // memory corruption Tainted Pointer Dereference R1 load R2 add R4 store M[R4] TRAP &tainted_input M[R1] R2 + R3 R5 #Re g R1 R2 R3 R4 R5 Data &tainted_input idx (tainted_input) &buffer &buffer + idx x Tag 4 4

Why is the pointer dereference potentially dangerous? Buffer Overflow Attack store M[R4] R5 Vulnerable C Code char buf[126]; strcpy(buf,str); TRAP 5 5

Motivation & Background Software DIFT Use Dynamic Binary Translation (DBT) to implement DIFT Avoid recompilation Introduce significant overheads Limitation Incompatible with self-modifying and multithreaded programs 6 6

Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 7 7

Hardware DIFT overview In-core DIFT Minimize runtime overhead Require significant modifications to processor structure 8 8

Hardware DIFT overview Offloading DIFT Offer the flexibility of analysis in software Introduce significant overheads halve the throughput, double the power consumption Require pipeline changes 9 9

Hardware DIFT overview Off-core DIFT DIFT synchronizes with main core only on system calls Eliminate the changes to processor Allow pairing with multiple processor designs 10 10

Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 11 11

DIFT Coprocessor Design Security Model R1 Security Model load R2 &tainted_input M[R1] Synchronization Coprocessor Instruction Exception add R4 store M[R4] TRAP R2 + R3 R5 System Call Tainted Pointer Dereference IF ID RE N Main Core R O B EX EX TRAP Commit 12 12

DIFT Coprocessor Design - Architecture Main Core Processor VS Coprocessor Main Core Processor Coprocessor Handle data 32 bits Complexity Handle tag propagation and check 4 bits Simple 13 13

DIFT Coprocessor Design - Architecture Four-stage pipeline: Decode, Execution, TagCheck, WriteBack Main Core Decoupling Queue Tag Reg File Security Decode Tag ALU Tag Check Logic Writeback Queue Stall DIFT Coprocessor Tag Cache L2 Cache DRAM Tag s Decode Execution TagCheck WriteBack 14 14

DIFT Coprocessor Design - Interface Coprocessor Setup Software control security policies Instruction Flow Information PC, Instruction Encoding and Memory Address Decoupling Greater or equal processing rate to avoid full queue Security Exceptions Asynchronous interrupts and run in trusted mode PC, Inst Encoding, Mem Addr Security Decode 15 15

Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 16 16

Prototype Hardware SPARC V8 Processor FPGA Board Software Gentoo Linux 2.6 Design Statistics 7.64% area overhead 17 17

Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Architecture Prototype System Evaluation Conclusion Discussion points 18 18

Evaluation Security Evaluation Wide Low High Independent level workload attacks of range programming language Common Utilities Servers Web apps Kernel code 19 Program Lan g. Attack Detected Vulnerability Gzip C Directory traversal Open tainted directory Tar C Directory traversal Open tainted directory Scry PHP Cross-site scripting Tainted <script> tag Htdig C++ Cross-site scripting Tainted <script> tag Polymorph C Buffer overflow Tainted code ptr dereference Sendmail C Buffer overflow Tainted data ptr dereference Quotactl syscall C User/kernel pointer dereference Tainted pointer to kernelspace SUS C Format string bug Tainted %n in syslog WU_FTPD C Format string bug Tainted %n in vfprintf 19

Evaluation Performance Evaluation Execution time 512-byte tag cache 6-entry queue Runtime overhead < 1% 20 20

Evaluation Performance Evaluation Scaling the tag cache 21 21

Evaluation Performance Evaluation Scaling the decoupling queue 16-byte tag cache Tag miss 22 22

Evaluation Processor/Coprocessor Performance Ratio 11.7% 3.8% 16-entry queue Can be paired with various main cores 23 23

Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Architecture Prototype System Evaluation Conclusion Discussion points 24 24

Conclusion DIFT: a promising security technique Proposed an off-core, decoupling coprocessor for DIFT Provide the same security features as in-core DIFT Reduce DIFT implementation cost drastically Has low area and performance overheads Developed a full-system prototype Protect real-world Linux applications 25 25

Questions? 26 26

27 Debate Is a wider-issue coprocessor better than a single-issue coprocessor for 3-way superscalar processors? Is it worth to add a checkpoint scheme to DIFT to provide reliable recovery? ( A checkpoint scheme allows the system to rollback for recovery)