Breaking Kernel Address Space Layout Randomization (KASLR) with Intel TSX. Yeongjin Jang, Sangho Lee, and Taesoo Kim Georgia Institute of Technology

Similar documents
Meltdown or "Holy Crap: How did we do this to ourselves" Meltdown exploits side effects of out-of-order execution to read arbitrary kernelmemory

Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR

Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX

T-SGX: Eradicating Controlled-Channel

Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing

Overhead Evaluation about Kprobes and Djprobe (Direct Jump Probe)

Locking Down the Processor via the Rowhammer Attack

Hardware Enclave Attacks CS261

Intel Transactional Synchronization Extensions (Intel TSX) Linux update. Andi Kleen Intel OTC. Linux Plumbers Sep 2013

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware.

UniSan: Proactive Kernel Memory Initialization to Eliminate Data Leakages

Virtual Memory. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Address spaces and memory management

Anne Bracy CS 3410 Computer Science Cornell University

[537] Virtual Machines. Tyler Harter

Spectre and Meltdown. Clifford Wolf q/talk

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Profiling: Understand Your Application

Meltdown and Spectre - understanding and mitigating the threats (Part Deux)

Chapter 5 B. Large and Fast: Exploiting Memory Hierarchy

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Computer Architecture Background

Dan Noé University of New Hampshire / VeloBit

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

ECE 498 Linux Assembly Language Lecture 1

Undermining Information Hiding (And What to do About it)

ARMageddon: Cache Attacks on Mobile Devices

Meltdown and Spectre - understanding and mitigating the threats

Xen and the Art of Virtualization. CSE-291 (Cloud Computing) Fall 2016

SGX Security Background. Masab Ahmad Department of Electrical and Computer Engineering University of Connecticut

Multi-level Translation. CS 537 Lecture 9 Paging. Example two-level page table. Multi-level Translation Analysis

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Syscalls, exceptions, and interrupts, oh my!

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst

238P: Operating Systems. Lecture 5: Address translation. Anton Burtsev January, 2018

Lec 22: Interrupts. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements

Anne Bracy CS 3410 Computer Science Cornell University

Accelerating Dynamic Binary Translation with GPUs

Last 2 Classes: Introduction to Operating Systems & C++ tutorial. Today: OS and Computer Architecture

Hiding in the Shadows: Empowering ARM for Stealthy Virtual Machine Introspection ACSAC 2018

Hakim Weatherspoon CS 3410 Computer Science Cornell University

CS 550 Operating Systems Spring Interrupt

Spring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand

CS3210: Virtual memory. Taesoo Kim w/ minor updates K. Harrigan

ECE 3055: Final Exam

Who am I? Moritz Lipp PhD Graz University of

LAZARUS: Practical Side-channel Resilient Kernel-Space Randomization

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

An Implementation Of Multiprocessor Linux

CSE 120 Principles of Operating Systems

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs

CSE 410 Final Exam 6/09/09. Suppose we have a memory and a direct-mapped cache with the following characteristics.

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

The Kernel Abstraction

Synopsis: Intel CPU information leak. Haswell, probably others. Date: Nov 28, Issue: ======

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

V. Primary & Secondary Memory!

Lecture 22: Virtual Memory: Survey of Modern Systems

Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX

SPECULOSE: Analyzing the Security Implications of Speculative Execution in CPUs

CS533 Concepts of Operating Systems. Jonathan Walpole

24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant.

Micro-Architectural Attacks and Countermeasures

5 Solutions. Solution a. no solution provided. b. no solution provided

QPSI. Qualcomm Technologies Countermeasures Update

COS 318: Operating Systems

Martin Kruliš, v

The Last Mile An Empirical Study of Timing Channels on sel4

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory

Chapter 5. Large and Fast: Exploiting Memory Hierarchy. Part II Virtual Memory

HideM: Protecting the Contents of Userspace Memory in the Face of Disclosure Vulnerabilities

Memory. Principle of Locality. It is impossible to have memory that is both. We create an illusion for the programmer. Employ memory hierarchy

CSE502: Computer Architecture CSE 502: Computer Architecture

Virtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

Lecture 19: Survey of Modern VMs. Housekeeping

arxiv: v1 [cs.cr] 3 Jan 2018

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

A Global Operating System for HPC Clusters

Exokernel: An Operating System Architecture for Application Level Resource Management

Virtual Memory: From Address Translation to Demand Paging

Intel Analysis of Speculative Execution Side Channels

HAVEGE. HArdware Volatile Entropy Gathering and Expansion. Unpredictable random number generation at user level. André Seznec.

0x1A Great Papers in Computer Security

Basic Memory Management

SGX Enclave Life Cycle Tracking TLB Flushes Security Guarantees

Lecture 15: I/O Devices & Drivers

Virtualization. Adam Belay

Computer Architecture Lecture 13: Virtual Memory II

OS: An Overview. ICS332 Operating Systems

CS 161 Computer Security

March 10, Linux Live Patching. Adrien schischi Schildknecht. Why? Who? How? When? (consistency model) Conclusion

CSC369 Lecture 2. Larry Zhang

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Memory Hierarchy. Mehran Rezaei

Microkernels and Portability. What is Portability wrt Operating Systems? Reuse of code for different platforms and processor architectures.

143A: Principles of Operating Systems. Lecture 6: Address translation. Anton Burtsev January, 2017

Transcription:

Breaking Kernel Address Space Layout Randomization (KASLR) with Intel TSX Yeongjin Jang, Sangho Lee, and Taesoo Kim Georgia Institute of Technology

Kernel Address Space Layout Randomization (KASLR) A statistical mitigation for memory corruption exploits Randomize address layout per each boot Efficient (<5% overhead) Attacker should guess where code/data are located for exploit. In Windows, a successful guess rate is 1/8192. 2

Example: Linux To escalate privilege to root through a kernel exploit, attackers want to call commit_creds(prepare_kernel_creds(0)). 3

Example: Linux KASLR changes kernel symbol addresses every boot. 4

Example: Linux KASLR changes kernel symbol addresses every boot. 1 st Boot 5

Example: Linux KASLR changes kernel symbol addresses every boot. 1 st Boot 2 nd Boot 6

KASLR Makes Attacks Harder KASLR introduces an additional bar to exploits Finding an information leak vulnerability Pr[ Memory Corruption Vuln ] Both attackers and defenders aim to detect info leak vulnerabilities. 7

KASLR Makes Attacks Harder KASLR introduces an additional bar to exploits Finding an information leak vulnerability Pr[ Memory Corruption Vuln ] Pr[ information_leak ] Pr[ Memory Corruption Vuln] Both attackers and defenders aim to detect info leak vulnerabilities. 8

Is there any other way than info leak? Practical Timing Side Channel Attacks Against Kernel Space ASLR (Hund et al., Oakland 2013) A hardware-level side channel attack against KASLR No information leak vulnerability in OS is required 9

TLB Timing Side Channel Virtual Address TLB Miss Hit 10

TLB Timing Side Channel Virtual Address TLB Miss Hit Mapped address generate page fault quicker! 11

TLB Timing Side Channel Virtual Address TLB Miss Unmapped address Hit takes ~40 cycles more for page table walk Mapped address generate page fault quicker! 12

TLB Timing Side Channel Virtual Address TLB Miss Unmapped address Hit takes ~40 cycles more for page table walk Mapped address generate page fault quicker! 13

TLB Timing Side Channel Virtual Address TLB Miss Unmapped address Hit takes ~40 cycles more for page table walk Mapped address generate page fault quicker! 14

TLB Timing Side Channel Virtual Address TLB Miss Unmapped address Hit takes ~40 cycles more for page table walk Mapped address generate page fault quicker! 15

TLB Timing Side Channel Virtual Address TLB Miss Unmapped address Hit takes ~40 cycles more for page table walk Mapped address generate page fault quicker! 16

TLB Timing Side Channel Virtual Address TLB Miss Unmapped address Hit takes ~40 cycles more for page table walk Mapped address generate page fault quicker! 17

TLB Timing Side Channel Result: Fault with TLB hit took less than 4050 cycles While TLB miss took more than that Limitation: Too noisy Why???? 18

TLB Timing Side Channel Result: Fault with TLB hit took less than 4050 cycles While TLB miss took more than that Limitation: Too noisy Why???? Unmapped Mapped 19

TLB Timing Side Channel T User CPU L OS Exception Handling OS Noise B CPU T L B OS Noise Measured Time (~4000 cycles) Timing Side Channel (~40 cycles) OS Noise (~100 cycles) Fault Handling Noise is too much! User Execution CPU Exception TLB Side Channel OS Execution OS Handling Noise 20

TLB Timing Side Channel T User CPU L OS Exception Handling OS Noise B CPU T L B OS Noise Measured Time (~4000 cycles) If we can eliminate the noise at OS, then the Timing Side Channel (~40 cycles) timing channel will be more stable. OS Noise (~100 cycles) Fault Handling Noise is too much! User Execution CPU Exception TLB Side Channel OS Execution OS Handling Noise 21

A More Practical Side Channel Attack on KASLR The DrK Attack: We present a practical side channel attack on KASLR De-randomizing Kernel ASLR (this is where DrK comes from) Exploit Intel TSX for eliminate the noise from OS Distinguish mapped and unmapped pages Distinguish executable and non-executable pages 22

Transactional Synchronization Extension (Intel TSX) TSX: relaxed but faster way of handling synchronization 23

Transactional Synchronization Extension (Intel TSX) TSX: relaxed but faster way of handling synchronization 1. Do not block, do not use lock 24

Transactional Synchronization Extension (Intel TSX) TSX: relaxed but faster way of handling synchronization 1. Do not block, do not use lock 2. Try atomic operation (can fail) 25

Transactional Synchronization Extension (Intel TSX) TSX: relaxed but faster way of handling synchronization 1. Do not block, do not use lock 2. Try atomic operation (can fail) 3. If failed, handle failure with abort handler (retry, get back to traditional lock, etc.) 26

Transaction Aborts If Exist any of a Conflict Condition of Conflict Thread races Cache eviction (L1 write/l3 read) Interrupt Context Switch (timer) Syscalls Exceptions Page Fault General Protection Debugging Run If Transaction Aborts 27

Transaction Aborts If Exist any of a Conflict Abort Handler of TSX Suppress all sync. exceptions E.g., page fault Do not notify OS Just jump into abort_handler() No Exception delivery to the OS! (returns quicker, so less noisy than OS exception handler) Run If Transaction Aborts 28

Reducing Noise with Intel TSX Measured Time (~ 4000 cycles) T User CPU L OS Exception Handling OS Noise B User Execution CPU Exception TLB Side Channel OS Execution OS Handling Noise 29

Reducing Noise with Intel TSX Measured Time (~ 4000 cycles) T User CPU L OS Exception Handling OS Noise B Measured Time (~ 180 cycles) User CPU T L B Timing Side Channel (~ 40 cycles) Not involving OS, Less noisy! User Execution CPU Exception TLB Side Channel OS Execution OS Handling Noise 30

Exploiting TSX as an Exception Handler How to use TSX as an exception handler? 31

Exploiting TSX as an Exception Handler How to use TSX as an exception handler? 1. Timestamp at the beginning 32

Exploiting TSX as an Exception Handler How to use TSX as an exception handler? 1. Timestamp at the beginning 2. Access kernel memory within the TSX region (always aborts) 33

Exploiting TSX as an Exception Handler How to use TSX as an exception handler? 1. Timestamp at the beginning 2. Access kernel memory within the TSX region (always aborts) 3. Measure timing at abort handler 34

Exploiting TSX as an Exception Handler How to use TSX as an exception handler? 1. Timestamp at the beginning 2. Access kernel memory within the TSX region (always aborts) Processor directly calls the handler OS handling path is not involved 3. Measure timing at abort handler 35

Measuring Timing Side Channel Mapped / Unmapped kernel addresses (across 4 CPUs) Ran 1000 iterations for the probing, minimum clock on 10 runs Processor Mapped Page Unmapped Page i7-6700k (4.0Ghz) 209 240 (+31) i5-6300hq (2.3Ghz) 164 188 (+24) i7-5600u (2.6Ghz) 149 173 (+24) E3-1271v3 (3.6Ghz) 177 195 (+18) Mapped page always faults faster 36

Measuring Timing Side Channel Executable / Non-executable kernel addresses Ran 1000 iterations for the probing, minimum clock on 10 runs Processor Executable Page Non-exec Page i7-6700k (4.0Ghz) 181 226 (+45) i5-6300hq (2.3Ghz) 142 178 (+36) i7-5600u (2.6Ghz) 134 164 (+30) E3-1271v3 (3.6Ghz) 159 189 (+30) Executable page always faults faster 37

Clear Timing Channel Unmapped Non-Executable or Unmapped Mapped Executable Clear separation between different mapping status! 38

Attack on Various OSes Attack Targets DrK is hardware side-channel attack The mechanism is independent to OS We target popular OSes: Linux, Windows, and macos Attack Types Type 1: Revealing mapping status of each page (X / NX / U) Type 2: Finer-grained module detection 39

Attack on Various OSes Type 1: Revealing mapping status of each page Try to reveal the mapping status per each page in the area X (executable) / NX (Non-executable) / U (unmapped) 40

Attack on Various OSes Type 2: Finer-grained module detection Section-size Signature Modules are allocated in fixed size of X/NX sections if the attacker knows the binary file X NX 0x4000 0x4000 libahci Example If the size of executable map is 0x4000, and the size of nonexecutable section is 0x4000, then it is libahci! X NX 0x16000 0x1a000 iwlwifi 41

Demo 2: Full Attack on Linux 42

Result Summary Linux: 100% of accuracy around 0.1 second Windows: 100% for M/U in 5 sec, 99.28% for X/NX for 45 sec OS X: 100% for detecting ASLR slide, in 31ms Linux on Amazon EC2: 100% of accuracy in 3 seconds 43

Timing Side Channel (M/U) For Mapped / Unmapped addresses Measured performance counters (on 1,000,000 probing) Perf. Counter Mapped Page Unmapped Page Description dtlb-loads 3,021,847 3,020,243 dtlb-load-misses 84 2,000,086 TLB-miss on U Observed Timing 209 (fast) 240 (slow) dtlb hit on mapped pages, but not for unmapped pages. Timing channel is generated by dtlb hit/miss 44

Timing Side Channel (M/U) For Mapped / Unmapped addresses Measured performance counters (on 1,000,000 probing) Perf. Counter Mapped Page Unmapped Page Description dtlb-loads 3,021,847 3,020,243 dtlb-load-misses 84 2,000,086 TLB-miss on U Observed Timing 209 (fast) 240 (slow) dtlb hit on mapped pages, but not for unmapped pages. Timing channel is generated by dtlb hit/miss 45

Path for an Unmapped Page Probing an unmapped page took 240 cycles Page Table dtlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 46

Path for an Unmapped Page Probing an unmapped page took 240 cycles Kernel address access dtlb Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 47

Path for an Unmapped Page Probing an unmapped page took 240 cycles Kernel address access dtlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 48

Path for an Unmapped Page Probing an unmapped page took 240 cycles Kernel address access dtlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Page fault! 49

Path for an Unmapped Page Probing an unmapped page took 240 cycles Kernel address access dtlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Always do page table walk (slow) Page fault! 50

Path for a mapped Page On the first access, 240 cycles Page Table dtlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 51

Path for a mapped Page On the first access, 240 cycles Kernel address access dtlb Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 52

Path for a mapped Page On the first access, 240 cycles Kernel address access dtlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 53

Path for a mapped Page On the first access, 240 cycles Kernel address access dtlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Page fault! 54

Path for a mapped Page On the first access, 240 cycles Kernel address access dtlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Cache TLB entry! Page fault! 55

Path for a mapped Page On the second access, 209 cycles dtlb Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 56

Path for a mapped Page On the second access, 209 cycles Page Table Kernel address access dtlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 57

Path for a mapped Page On the second access, 209 cycles Page Table Kernel address access dtlb hit dtlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Page fault! 58

Path for a mapped Page On the second access, 209 cycles Page Table Kernel address access dtlb hit dtlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Page fault! No page table walk on the second access (fast) 59

Timing Side Channel (X/NX) For Executable / Non-executable addresses Measured performance counters (on 1,000,000 probing) Perf. Counter Exec Page Non-exec Page Unmapped Page itlb-loads (hit) 590 1,000,247 272 itlb-load-misses 31 12 1,000,175 Observed Timing 181 (fast) 226 (slow) 226 (slow) Point #1: itlb hit on Non-exec, but it is slow (226) why? itlb is not the origin of the side channel 60

Timing Side Channel (X/NX) For Executable / Non-executable addresses Measured performance counters (on 1,000,000 probing) Perf. Counter Exec Page Non-exec Page Unmapped Page itlb-loads (hit) 590 1,000,247 272 itlb-load-misses 31 12 1,000,175 Observed Timing 181 (fast) 226 (slow) 226 (slow) Point #1: itlb hit on Non-exec, but it is slow (226) why? itlb is not the origin of the side channel 61

Timing Side Channel (X/NX) For Executable / Non-executable addresses Measured performance counters (on 1,000,000 probing) Perf. Counter Exec Page Non-exec Page Unmapped Page itlb-loads (hit) 590 1,000,247 272 itlb-load-misses 31 12 1,000,175 Observed Timing 181 (fast) 226 (slow) 226 (slow) Point #2: itlb does not even hit on Exec page, while NX page hits itlb itlb did not involve in the fast path Is there any cache that does not require address translation? 62

Intel Cache Architecture From the patent US 20100138608 A1, 63 registered by Intel Corporation

Intel Cache Architecture L1 instruction cache Virtually-indexed, Physically-tagged cache (requires TLB access) Caches actual x86/x64 opcode From the patent US 20100138608 A1, 64 registered by Intel Corporation

Intel Cache Architecture Decoded i-cache An instruction will be decoded as micro-ops (RISC-like instruction) Decoded i-cache stores micro-ops Virtually-indexed, Virtually-tagged cache (no TLB access) From the patent US 20100138608 A1, 65 registered by Intel Corporation

Path for an Unmapped Page On the second access, 226 cycles Page Table itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 66

Path for an Unmapped Page On the second access, 226 cycles Kernel address access itlb Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 67

Path for an Unmapped Page On the second access, 226 cycles Kernel address access itlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 68

Path for an Unmapped Page On the second access, 226 cycles Kernel address access itlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Page fault! 69

Path for an Unmapped Page On the second access, 226 cycles Kernel address access itlb TLB miss Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Always do page table walk (slow) Page fault! 70

Path for an Executable Page On the first access Decoded I-cache itlb Page Table PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 71

Path for an Executable Page On the first access Page Table Kernel address access Decoded I-cache itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 72

Path for an Executable Page On the first access Page Table Kernel address access Decoded I-cache miss itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 73

Path for an Executable Page On the first access Page Table Kernel address access Decoded I-cache miss itlb TLB miss PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 74

Path for an Executable Page On the first access Page Table Kernel address access Decoded I-cache miss itlb TLB miss PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Insufficient privilege, fault! 75

Path for an Executable Page On the first access Page Table Kernel address access Decoded I-cache miss itlb TLB miss Cache TLB PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Insufficient privilege, fault! 76

Path for an Executable Page On the first access Page Table Kernel address access Decoded I-cache uops miss itlb TLB miss Cache TLB PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Cache Decoded Instructions Insufficient privilege, fault! 77

Path for an Executable Page On the second access, 181 cycles Page Table Decoded I-cache uops itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 78

Path for an Executable Page On the second access, 181 cycles Page Table Kernel address access Decoded I-cache uops itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 79

Path for an Executable Page On the second access, 181 cycles Page Table Kernel address access Decoded I-cache uops Decoded I-cache hit! itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Insufficient privilege, fault! 80

Path for an Executable Page On the second access, 181 cycles Page Table Kernel address access Decoded I-cache uops Decoded I-cache hit! itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 Insufficient privilege, fault! No TLB access, No page table walk (fast) 81

Path for a non-executable, but mapped Page On the second access, 226 cycles Page Table Decoded I-cache itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 82

Path for a non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 83

Path for a non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache miss itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 84

Path for a non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache miss itlb PML4 PML3 PML3 PML2 PML2 PML2 TLB hit PML1 PML1 PML1 Page fault! 85

Path for a non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache miss itlb PML4 PML3 PML3 PML2 PML2 PML2 TLB hit PML1 PML1 PML1 Page fault! If no page table walk, it should be faster than unmapped (but not!) 86

Cache Coherence and TLB TLB is not a coherent cache in Intel Architecture 87

Cache Coherence and TLB TLB is not a coherent cache in Intel Architecture Core 1 1. Core 1 sets 0xff01 as Non-executable memory TLB 0xff01->0x0010, NX 88

Cache Coherence and TLB TLB is not a coherent cache in Intel Architecture Core 1 TLB 0xff01->0x0010, NX 1. Core 1 sets 0xff01 as Non-executable memory 2. Core 2 sets 0xff01 as Executable memory No coherency, do not update/invalidate TLB in Core 1 Core 2 TLB 0xff01->0x0010, X 89

Cache Coherence and TLB TLB is not a coherent cache in Intel Architecture Core 1 TLB 0xff01->0x0010, NX 1. Core 1 sets 0xff01 as Non-executable memory 2. Core 2 sets 0xff01 as Executable memory No coherency, do not update/invalidate TLB in Core 1 3. Core 1 try to execute on 0xff01 -> fault by NX Core 2 TLB 0xff01->0x0010, X 90

Cache Coherence and TLB TLB is not a coherent cache in Intel Architecture Core 1 TLB 0xff01->0x0010, NX Execute 1. Core 1 sets 0xff01 as Non-executable memory 2. Core 2 sets 0xff01 as Executable memory No coherency, do not update/invalidate TLB in Core 1 3. Core 1 try to execute on 0xff01 -> fault by NX Core 2 TLB 0xff01->0x0010, X 4. Core 1 must walk through the page table The page table entry is X, update TLB, then execute! 91

Path for a Non-executable, but mapped Page On the second access, 226 cycles Page Table Decoded I-cache itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 92

Path for a Non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 93

Path for a Non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache miss itlb PML4 PML3 PML3 PML2 PML2 PML2 PML1 PML1 PML1 94

Path for a Non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache miss itlb PML4 PML3 PML3 PML2 PML2 PML2 TLB hit PML1 PML1 PML1 NX, cannot execute! 95

Path for a Non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache miss itlb PML4 PML3 PML3 PML2 PML2 PML2 TLB hit PML1 PML1 PML1 NX, cannot execute! 96

Path for a Non-executable, but mapped Page On the second access, 226 cycles Page Table Kernel address access Decoded I-cache miss itlb Cache TLB PML4 PML3 PML3 PML2 PML2 PML2 TLB hit PML1 PML1 PML1 NX, cannot execute! NX, Page fault! 97

Root-cause of Timing Side Channel (X/NX) For executable / non-executable addresses Fast Path (X) Slow Path (NX) Slow Path (U) 1. Jmp into the Kernel addr 2. Decoded I-cache hits 3. Page fault! 1. Jmp into the kernel addr 2. itlb hit 3. Protection check fails, page table walk. 4. Page fault! Cycles: 181 Cycles: 226 Cycles: 226 1. Jmp into the kernel addr 2. itlb miss 3. Walks through page table 4. Page fault! Decoded i-cache generates timing side channel 98

Countermeasures? Modifying CPU to eliminate timing channels Difficult to be realized L Turning off TSX Cannot be turned off in software manner (neither from MSR nor from BIOS) Coarse-grained timer? A workaround could be having another thread to measure the timing indirectly (e.g., counting i++;) 99

Countermeasures? Using separated page tables for kernel and user processes High performance overhead (~30%) due to frequent TLB flush TLB flush on every copy_to_user() Fine-grained randomization Compatibility issues on memory alignment, etc. Inserting fake mapped / executable pages between the maps Adds some false positives to the DrK Attack 100

Conclusion Intel TSX makes cache side-channel less noisy Suppress OS Exception Timing side channel can distinguish X / NX / U pages dtlb (for Mapped & Unmapped) Decoded i-cache (for executable / non-executable) Work across 3 different architectures, commodity OSes, and Amazon EC2 Current KASLR is not as secure as expected 101

Any Questions? Try DrK at https://github.com/sslab-gatech/drk 102