Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor


1 Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor
Hari Kannan, Michael Dalton, Christos Kozyrakis
Computer Systems Laboratory, Stanford University

2 Motivation
Dynamic analysis helps us better understand software behavior
  Security, debugging, full-system profiling
Hardware support for such analyses is very useful
  Provides a speed advantage over software solutions
  Systems manage metadata for analysis in hardware
Implementation challenges
  Storage overheads of metadata (Suh 05)
  Processing of metadata
    Need fast processing (low overheads)
    Need cost-effective implementation
Solution: tightly coupled coprocessor for analysis

3 Case Study: DIFT (Dynamic Information Flow Tracking)
DIFT taints data from untrusted sources
  Extra tag bit per word marks whether the word is untrusted
Propagate taint during program execution
  Operations with tainted data produce tainted results
Check for suspicious uses of tainted data
  Tainted code execution
  Tainted pointer dereference (code & data)
  Tainted SQL command
Can detect both low-level & high-level threats

4 DIFT Example: Memory Corruption
Vulnerable C code
  int idx = tainted_input;
  buffer[idx] = x; // memory corruption
Instruction sequence
  set r1 <- &tainted_input
  load r2 <- M[r1]
  add r4 <- r2 + r3
  store M[r4] <- r5    TRAP
Register state (T = tag bit, Data = value)
  r1: 0, then &input
  r2: idx = input
  r3: &buffer
  r4: 0, then &buffer+idx
  r5: x
Tainted pointer dereference raises a security trap
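The trace on this slide can be replayed in a few lines of Python. This is only a sketch: the addresses, the stored value 42, and the names `Tagged`, `SecurityTrap`, and `run_example` are hypothetical, and the rules used (constants untainted, loads inherit the tag of the loaded word, adds OR their source tags, stores check the address tag) are one plausible reading of the slide rather than the authors' exact policy.

```python
from dataclasses import dataclass


class SecurityTrap(Exception):
    """Raised when a tainted value is about to be used as a pointer."""


@dataclass
class Tagged:
    value: int
    taint: bool = False  # the one-bit tag described on the slide


def run_example(untrusted_idx: int) -> dict:
    # Hypothetical addresses, chosen only for illustration.
    buffer_base, input_addr = 0x1000, 0x2000
    # The word holding the untrusted input is tagged as tainted.
    memory = {input_addr: Tagged(untrusted_idx, taint=True)}
    regs = {"r3": Tagged(buffer_base), "r5": Tagged(42)}

    # set r1 <- &tainted_input: an immediate address is untainted
    regs["r1"] = Tagged(input_addr)

    # load r2 <- M[r1]: check the address tag, then copy value and tag
    if regs["r1"].taint:
        raise SecurityTrap("tainted load address")
    loaded = memory[regs["r1"].value]
    regs["r2"] = Tagged(loaded.value, loaded.taint)

    # add r4 <- r2 + r3: output tag is the OR of the source tags
    regs["r4"] = Tagged(regs["r2"].value + regs["r3"].value,
                        regs["r2"].taint or regs["r3"].taint)

    # store M[r4] <- r5: the address register is tainted, so trap
    if regs["r4"].taint:
        raise SecurityTrap("tainted pointer dereference in store")
    memory[regs["r4"].value] = regs["r5"]
    return memory
```

Calling `run_example` with any index raises `SecurityTrap` at the store, because the taint on the input word flows through r2 into r4.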

5 HW Option 1: In-core DIFT
Diagram labels: I-Cache, Decode, RegFile, ALU, D-Cache, Traps, WB, Policy Decode, Tag ALU, Tag Check
Integrated DIFT hardware [Dalton 07, Suh 04, Chen 05]
  No performance overhead; minor power and area overhead
  Invasive changes to processor
  High design and validation costs
  Synchronizes metadata and data per instruction

6 HW Option 2: Offloading DIFT
Diagram labels: General Purpose Core, Core 1 (App), Capture Trace; General Purpose Core, Core 2 (DIFT), Analyze Trace; Log buffer (L2 cache)
SW DIFT on modified multi-core chip (e.g., CMU's LBA)
  Flexible support for various analyses
  Large area & power overhead (2nd core, trace compression)
  Large performance overhead (DBT, memory traffic)
  Significant changes to processor & memory hierarchy

7 Our Proposal: DIFT Coprocessor
Diagram labels: General Purpose Core, Main Core, Cache; Instructions, Exceptions; DIFT Coprocessor, Tag Core, Tag Cache; L2 Cache
Off-core DIFT coprocessor (similar to watchdog processors)
  Small performance, power, and area overhead
  Minor changes to processor
  Reuse across processor designs

8 Outline
Motivation & Overview
Software Interface of the coprocessor
Architecture of the coprocessor
Performance & Security Evaluation
Conclusion

9 Coprocessor Setup
A pair of policy registers
  Accessible via coprocessor instructions
  Could also be memory-mapped
Policy granularity: operation type
  Select input operands to be checked (if tainted)
  Select input operands that propagate taint to output
  Select the propagation mode (and, or, xor)
ISA instructions are decomposed into one or more operations
  Types: ALU, logical, branch, memory, compare, FP, ...
  Makes policies independent of ISA packaging
  Same HW policies for both RISC & CISC ISAs
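A per-operation-type policy of this kind can be modelled as a small record holding the check selection, the propagate selection, and the mode. The sketch below is illustrative only: `OpPolicy`, `POLICIES`, `DECOMPOSE`, and the specific settings are hypothetical and do not reflect the actual policy register encoding.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class OpPolicy:
    check: Tuple[bool, ...]      # which source operands are checked for taint
    propagate: Tuple[bool, ...]  # which source operands feed the output tag
    mode: str                    # "and", "or", or "xor"


def output_tag(policy: OpPolicy, src_tags: Sequence[bool]) -> bool:
    """Combine the tags of the selected source operands using the policy mode."""
    selected = [tag for tag, use in zip(src_tags, policy.propagate) if use]
    if not selected:
        return False
    result = selected[0]
    for tag in selected[1:]:
        if policy.mode == "and":
            result = result and tag
        elif policy.mode == "or":
            result = result or tag
        elif policy.mode == "xor":
            result = result != tag
        else:
            raise ValueError(f"unknown propagation mode: {policy.mode}")
    return result


def violates(policy: OpPolicy, src_tags: Sequence[bool]) -> bool:
    """True if any operand selected for checking is tainted."""
    return any(tag and chk for tag, chk in zip(src_tags, policy.check))


# Hypothetical settings: ALU ops OR their source tags and check nothing;
# memory ops check operand 0 (the address) and propagate nothing from it.
POLICIES = {
    "alu": OpPolicy(check=(False, False), propagate=(True, True), mode="or"),
    "memory": OpPolicy(check=(True, False), propagate=(False, False), mode="or"),
}

# Hypothetical decomposition: a RISC add is one operation, while a CISC add
# with a memory operand is a memory operation followed by an ALU operation.
DECOMPOSE = {
    "risc_add": ["alu"],
    "cisc_add_from_memory": ["memory", "alu"],
}
```

Because policies attach to operation types rather than to opcodes, both entries in `DECOMPOSE` reuse the same two policies, which is the ISA independence the slide describes.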

10 What happens without Proc/Coproc Synchronization?
Vulnerable C code
  int idx = tainted_input;
  buffer[idx] = x; // memory corruption
Instruction sequence
  set r1 <- &tainted_input
  load r2 <- M[r1]
  add r4 <- r2 + r3
  store M[r4] <- r5
  exec (sys call)
Register state (T = tag bit, Data = value)
  r1: 0, then &input
  r2: idx = input
  r3: &buffer
  r4: 0, then &buffer+idx
  r5: x
Slide annotations: EXPLOIT, SYSTEM COMPROMISE
Attacker executes system call, leading to system compromise

11 System Calls as Sync Points
Key idea: main core and coprocessor synchronize at system calls
Security
  This prevents the attacker from executing system calls
  The application's corrupted address space can be discarded
  Does not weaken the DIFT model
    DIFT detects an attack only at the time of exploit, not corruption
Performance
  Synchronization overhead is typically tens of cycles
    Function of decoupling queue size
  Lost in the noise of system call overheads (hundreds of cycles)

12 System Call Synchronization
Vulnerable C code
  int idx = tainted_input;
  buffer[idx] = x; // memory corruption
Instruction sequence
  set r1 <- &tainted_input
  load r2 <- M[r1]
  add r4 <- r2 + r3
  store M[r4] <- r5
  exec (sys call)
Register state (T = tag bit, Data = value)
  r1: 0, then &input
  r2: idx = input
  r3: &buffer
  r4: 0, then &buffer+idx
  r5: x
Slide annotations: TRAP, STALL
Tainted pointer dereference raises a security exception
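The synchronization rule can be illustrated with a toy model of the decoupling queue. In this sketch (all names are hypothetical, and the coprocessor is simplified so that it only runs when the queue is full or at a sync point), the main core commits instructions ahead of the coprocessor but must wait for the queue to drain before any system call executes.

```python
from collections import deque


class SecurityException(Exception):
    """Raised by the coprocessor model when a check fails."""


def run_decoupled(trace, queue_size=6):
    """trace: list of (op, violates_policy) pairs in commit order.

    Returns the system calls that actually executed.
    """
    queue = deque()
    executed_syscalls = []

    def coprocessor_step():
        op, violation = queue.popleft()
        if violation:
            raise SecurityException(f"tainted pointer dereference in '{op}'")

    for op, violation in trace:
        if op.startswith("syscall"):
            # Sync point: the main core stalls until the coprocessor has
            # checked every instruction committed before the system call.
            while queue:
                coprocessor_step()
            executed_syscalls.append(op)
            continue
        if len(queue) == queue_size:
            # Queue full: the main core stalls until one entry is checked.
            coprocessor_step()
        queue.append((op, violation))
    return executed_syscalls


attack = [("set", False), ("load", False), ("add", False),
          ("store", True), ("syscall exec", False)]
benign = [("set", False), ("load", False), ("add", False),
          ("store", False), ("syscall exec", False)]
```

With `attack`, the store commits before it is checked (memory is already corrupted), but `run_decoupled` raises `SecurityException` before `syscall exec` runs. With `benign`, the call completes and the system call is executed.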

13 Coprocessor Design
Diagram labels: Processor Core, I-Cache, D-Cache, L2 Cache; Decoupling queue (PC, Inst Encoding, Physical Address); DIFT Coprocessor (Policy Decode, Tag RF, Tag ALU, Tag Check, WB, Tag Cache); Stall; Security exception
DIFT functionality in a coprocessor
  4 tag bits of metadata per word of data
Coprocessor interface (via decoupling queue)
  Pass committed instruction information
  Instruction encoding could be at micro-op granularity (in x86)
  Physical address obviates need for MMU in coprocessor
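To make the "4 tag bits per word" figure concrete, the sketch below computes the storage overhead and shows one way a physical address could be mapped to its tag. It assumes 32-bit words and a packed layout with two 4-bit tags per byte; the layout and the helper names `tag_location` and `read_tag` are assumptions for illustration, not the prototype's actual tag organization.

```python
WORD_BYTES = 4  # assumption: 32-bit data words
TAG_BITS = 4    # from the slide: 4 tag bits of metadata per word


def tag_location(paddr: int, tag_base: int = 0):
    """Map a physical data address to (byte offset, bit shift) in tag storage.

    Assumes tags are packed back to back, two 4-bit tags per byte.
    """
    word_index = paddr // WORD_BYTES
    bit_index = word_index * TAG_BITS
    return tag_base + bit_index // 8, bit_index % 8


def read_tag(tag_mem: bytes, paddr: int) -> int:
    """Return the 4-bit tag for the word containing paddr."""
    offset, shift = tag_location(paddr)
    return (tag_mem[offset] >> shift) & 0xF


# Metadata storage relative to data storage: 4 bits per 32-bit word.
storage_overhead = TAG_BITS / (WORD_BYTES * 8)

# Under the packed layout, bytes of data covered by 512 bytes of tags.
data_covered_by_512B_of_tags = 512 * 8 // TAG_BITS * WORD_BYTES
```

Under these assumptions the overhead is 12.5% of data storage, and 512 bytes of tags describe 4 KB of data. Since the queue already carries physical addresses, a lookup like `tag_location` needs no address translation in the coprocessor.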

14 Prototype
Hardware
  Paired with simple SPARC V8 core (Leon-3)
  Mapped to FPGA board
  Board diagram labels: @40MHz, Leon-3 @65MHz, 512MB DRAM, Ethernet, AoE
Software
  Fully featured Linux 2.6
Design statistics
  Clock frequency: same as original
  Logic: +7.5% overhead relative to a simple in-order core with no speculation

15 System Performance Overheads
[Figure: runtime overhead (%) per SPEC benchmark, axis from 0.00% to 1.00%; benchmarks: gzip, gap, vpr, gcc, mcf, crafty, parser, vortex, bzip2, twolf]
Runtime overhead < 1% over SPEC benchmarks
  512-byte tag cache
  6-entry decoupling queue

16 Scaling the tag cache
[Figure: runtime overhead (%), axis from 0% to 25%, versus size of the tag cache (16B, 32B, 64B, 128B, 256B, 512B, 1K); series: queue-full stalls, memory-contention stalls]
Worst-case micro-benchmark
512-byte tag cache provides good performance

17 Scaling the decoupling queue
[Figure: runtime overhead (%), axis from 0% to 12%, versus size of the queue (no. of entries); series: queue-full stalls, memory-contention stalls]
Worst-case micro-benchmark
6-entry queue reduces performance overhead

18 Coprocessors for complex cores
[Figure: relative overhead versus ratio of main core's clock to coprocessor's clock; benchmarks: gzip, gcc, twolf]
Modest overheads with higher IPC cores
  Because main core rarely achieves peak IPC (=1)
  Coprocessor performs very simple operations
Implies coprocessor can be paired with complex cores

19 Security Policies Overview
Tag-bit columns on the slide: P Bit, T Bit, B Bit, S Bit

Policy                               Description                                                                      Tag-bit marks
Buffer Overflow Policy               Identify all pointers, and track data taint. Check for illegal tainted ptr use.  Y Y
Offset-based attacks (control ptr)   Track data taint, and bounds check to validate.
Format String Policy                 Check tainted args to print commands.                                            Y Y
SQL/XSS                              Check tainted commands.                                                          Y Y
Red zone Policy                      Sandbox heap data.                                                               Y Y
Sandboxing Policy                    Protect the security handler.                                                    Y
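With four tag bits per word, each word's metadata fits in a small bit mask. The sketch below shows how a buffer-overflow check in the spirit of "identify all pointers, track data taint, check for illegal tainted pointer use" might read those bits. The bit positions, the meanings attached to P, T, B, and S, and the exact rule are assumptions made for illustration; the slide does not spell them out.

```python
from enum import IntFlag


class Tag(IntFlag):
    # Bit positions and meanings are assumed, not taken from the slides.
    P = 0b0001  # word was identified as a legitimate pointer
    T = 0b0010  # word is tainted (derived from untrusted input)
    B = 0b0100  # word is subject to a bounds check
    S = 0b1000  # word is sandboxed


def illegal_pointer_use(addr_tag: Tag) -> bool:
    """Assumed rule: flag a dereference whose address is tainted
    but was never identified as a legitimate pointer."""
    return bool(addr_tag & Tag.T) and not (addr_tag & Tag.P)
```

For example, `illegal_pointer_use(Tag.T)` is true, while an address tagged `Tag.T | Tag.P` or carrying no tag bits at all passes the check.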

20 Security Experiments

Program            Lang.   Attack                             Detected Vulnerability
tar                C       Directory Traversal                Open tainted dir
gzip               C       Directory Traversal                Open tainted dir
Wu-FTPD            C       Format String                      Tainted %n in vfprintf string
SUS                C       Format String                      Tainted %n in syslog
quotactl syscall   C       User/kernel pointer dereference    Tainted pointer to kernelspace
sendmail           C       Buffer (BSS) Overflow              Tainted code ptr
polymorph          C       Buffer Overflow                    Tainted code ptr
htdig              C++     Cross-site Scripting               Tainted <script> tag
Scry               PHP     Cross-site Scripting               Tainted <script> tag

Unmodified SPARC binaries from real-world programs
  Basic/net utilities, servers, web apps, search engine

21 Security Experiments (same table as slide 20)
Protection against low-level memory corruptions
  Both in userspace and kernelspace

22 Security Experiments (same table as slide 20)
Protection against semantic vulnerabilities

23 Security Experiments (same table as slide 20)
Protection is independent of programming language
  Propagation & checks at the level of basic ops

24 Conclusions
Hardware dynamic analyses aid program understanding
Decoupling analyses from main core essential for practicality
Proposed a tightly coupled coprocessor for DIFT
  Does not compromise security model
  Has low performance and area overheads
Full-system FPGA prototype
  Reliably catches exploits in user & kernel-space
