Spectre and Meltdown. Clifford Wolf q/talk



1 Spectre and Meltdown Clifford Wolf q/talk

2 Spectre and Meltdown

Spectre (CVE and CVE ) is an architectural security bug that affects most modern processors with speculative execution. It allows a program to read memory locations in its own address space without technically accessing those locations. This is a problem for code running in sandbox environments, such as a web browser executing JavaScript code: the JavaScript code can read all data in the browser's memory, such as login credentials for web pages.

Meltdown (CVE ) is a related hardware vulnerability in some Intel x86, some IBM POWER, and some ARM processors. It allows a process to read all memory in the system.

3 But how does it work?

To answer this question we must first discuss some implementation details of modern speculating, superscalar, out-of-order processors:
- Scalar: execute one instruction per cycle.
- Superscalar: execute more than one instruction per cycle.
- Out-of-order: execute instructions in a different order than they appear in the program code.
- Speculative execution: instead of waiting for the result of a computation, guess the result and keep executing. Roll back if the guess turns out to be incorrect. This helps avoid pipeline stalls in cases where it is possible to make good guesses.

The types of speculative execution that matter for understanding Spectre and Meltdown:
- Branch prediction: guess whether a branch is taken or not.
- Branch target prediction: guess the target of a dynamic (indirect) jump.
- Trap optimism: always guess that instructions will not cause traps.

When a guess was wrong, we need to roll back the entire CPU state so that it looks to the software as if no code had been executed speculatively.

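4 What is pipelining?

The more work we do in one cycle, the longer each cycle has to be, i.e. the lower the clock rate. This is a slow circuit:

[Figure: Data -> FF -> Task 1 (decode) -> Task 2 (load) -> Task 3 (exec) -> Task 4 (store) -> FF, both flip-flops driven by the same clock]

But we want high clock rates for CPUs! This pipeline works with a 4x faster clock rate, because each stage only does a quarter of the work:

[Figure: Data -> FF -> Stage 1 -> FF -> Stage 2 -> FF -> Stage 3 -> FF -> Stage 4 -> FF, all flip-flops driven by the clock]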

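5 In-order pipeline stalls

Program:
    A: r1 r2 r3
    B: r4 r5 r6
    C: r7 r8 r9
    D: r9 r1 r1
    E: r10 r11 r12
    F: r13 r14 r15
    G: r16 r17 r18

D has to wait for C (both use r9), so the pipeline stalls.

Pipeline occupancy (time runs downward, one row per cycle; - = empty stage):

    Cycle Stage 1  Stage 2  Stage 3  Stage 4
    1        D        -        -        -
    2        B        A        -        -
    3        C        B        A        -
    4        D        C        B        A
    5        D        -        C        B
    6        D        -        -        C
    7        D        -        -        -
    8        E        D        -        -
    9        F        E        D        -
    10       G        F        E        D
    11       -        G        F        E
    12       -        -        G        F
    13       -        -        -        G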

6 Out-of-order execution to the rescue!

Program:
    A: r1 r2 r3
    B: r4 r5 r6
    C: r7 r8 r9
    D: r9 r1 r1
    E: r10 r11 r12
    F: r13 r14 r15
    G: r16 r17 r18

Pipeline occupancy (time runs downward, one row per cycle; - = empty stage):

    Cycle Stage 1  Stage 2  Stage 3  Stage 4
    1        A        -        -        -
    2        B        A        -        -
    3        C        B        A        -
    4        E        C        B        A
    5        F        E        C        B
    6        G        F        E        C
    7        D        G        F        E
    8        -        D        G        F
    9        -        -        D        G
    10       -        -        -        D

E, F, and G are executed out of order to improve system performance. But what if D traps?

7 Out-of-order execution in modern CPUs

Some instructions (such as memory loads) can stall for more than 100 cycles. We need very deep out-of-order execution to hide this latency. Without speculative execution it would be impossible to keep the processor busy for so many cycles. There is no way around speculative execution for modern high-speed processors.

We need many more physical registers than there are architectural registers in the ISA to remember previous states (scoreboarding isn't sufficient; this calls for register renaming, e.g. Tomasulo's algorithm):
- We need previous states for rollback when instructions trap or branch prediction is wrong.
- And we need more registers because the dynamic instruction order may have significantly higher register pressure than the original instruction order.

But there is more to the processor state than just general-purpose registers. Clean rollback is incredibly hard!
- For memory writes there is a store buffer for the pending writes during speculative execution.
- CPU flags may be stored in shadow registers for each checkpoint we might need to roll back to.
- But there is no mechanism to roll back the state of the CPU caches.

Caches are just a performance optimization, so it can't hurt if information from speculative execution can be recovered from cache timings... right? Unfortunately, this is wrong.

8 What is a CPU cache?

Caches are local memories close to the CPU that have much faster access times than main memory.
- Addresses in the cache can be accessed quickly.
- The first access to an address moves that memory location into the cache.
- Addresses that haven't been accessed in a while are evicted from the cache.
- The granularity of this is aligned cache lines, usually 64 bytes each.

There are special instructions to flush the CPU caches (such as clflush on x86). Even without those instructions we can access memory in a way that guarantees that all cache lines of interest are evicted from the cache, by accessing other memory locations that are mapped to the same cache slot.

By measuring the access time to a memory location we can determine whether that location is in the cache or not. This allows us to detect which memory locations the CPU has accessed recently.

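9 Spectre Variant 1 CVE (bounds check bypass, simplified explanation)

Consider something like the following code:

    uint8_t unprotected_data[128];  // assumed 64-byte aligned
    uint8_t protected_data[1];      // assumed to lie directly after unprotected_data

    int peek(int i) {
        flush_or_evict_caches();
        // The condition resolves slowly, and the branch predictor guesses "true".
        if (slow_predicted_true(i < 128)) {
            int a = unprotected_data[i];
            // Loads the cache line at offset 0 or offset 64, depending on bit 0 of a.
            int b = unprotected_data[64 * (a & 1)];
            return b;
        }
        return is_in_cache(&unprotected_data[64]);
    }

peek(128) will return 1 if the least significant bit of protected_data[0] is set: the mispredicted branch speculatively reads protected_data[0] (one past the end of unprotected_data) and brings the cache line of unprotected_data[64] into the cache only if that bit is set. We have effectively bypassed the (i < 128) bounds check.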

10 Spectre Variant 2 CVE (branch target injection)

Variant 1 relies on tricking the branch predictor into making an incorrect guess about whether a branch is taken or not. But processors can also branch to dynamic locations:
- x86: jmp eax; jmp [eax]; ret; jmp dword ptr [0x ]
- ARM: MOV pc, r14
- MIPS: jr $ra
- RISC-V: jalr x0, x1, 0

Spectre Variant 2 tricks the branch predictor into incorrectly guessing the destination of such dynamic jumps. This can be used to speculatively execute arbitrary code gadgets, similar to return-oriented programming (ROP). The data is then exfiltrated using the cache side channel.

11 Spectre and JIT Sandboxes

Spectre only allows a process to read its own memory. So you might ask: what is the problem? The problem is JIT sandboxes, where we run JIT-compiled untrusted code in our process, assuming that the bounds checks added by the JIT compiler will prevent the code from reading data it should not have access to. For example, a website running JavaScript code in your browser might access security credentials or other private data in your browser's memory.

But that means the JavaScript code must be tailored to a JIT compiler to yield the correct malicious machine code. For example, you can't simply flush the CPU caches. Instead you must execute a memory access pattern that will evict the relevant cache lines from the cache. That's why the Variant 1 slide said "simplified explanation". The Spectre paper contains a JavaScript code snippet that demonstrates such an attack using the V8 JavaScript engine.

12 Meltdown CVE

The Meltdown attack exploits a privilege escalation vulnerability specific to some processors: at least sometimes, Intel processors don't check memory protection during speculative execution. Instead, memory protection is checked after the fact, when instructions are committed. But at that point we have already exfiltrated the data using the cache side channel.

By adding a trapping instruction before the access to privileged memory, we prevent the access from being committed. So it never happened, and no access violation is detected by the OS. But the data that was read can still be reconstructed from the cache state.

Which processors are affected?
- Every Intel x86 / x86_64 processor since 1995; the only exceptions, as far as I know, are Intel Atom processors from before 2013.
- AMD x86 / x86_64 processors are not affected by Meltdown.
- Very few ARM processors are affected, for example the ARM Cortex-A75.
- IBM POWER and System Z are also affected by variants of Meltdown.

13 Meltdown Mitigations

Short-term mitigation for existing processors:
- Flush TLBs when leaving kernel code.
- This prevents speculative access to kernel memory.
- But it also adds a performance penalty that can be significant for some workloads, especially on processors that do not support selective TLB flushing (most Intel processors before Haswell).

Long-term fix:
- Better isolation of kernel and user-land page tables.
- Probably at the cost of not allowing speculative execution into kernel code (such as system calls).

In my opinion there is no doubt that Meltdown is a hardware bug that needs to be addressed in future hardware generations. But Intel says its processors work as designed, and calls the mitigation a security feature instead of a bug fix.

14 Spectre Mitigations

In my opinion it is still unclear to what extent we need to change our processors, and to what extent we need to change the model of what a processor does that we use to write software.

Possible mitigations without software changes include:
- Do not perform any speculative execution. For example, your phone most likely has a processor that does not even perform out-of-order execution.
- Do not speculatively load or evict any cache lines. That would slow down the processor. This slowdown would be significant on a system where access to main memory is pretty slow (such as a modern PC).
- Add special hidden cache slots used for speculative execution. This would allow rollback to also correctly restore the cache state, eliminating the cache side channel.

Possible mitigations that require software changes and some kind of hardware and/or compiler support:
- Use special code sequences for bounds checking that make sure we never speculatively execute memory accesses that are out of bounds.
- Use special code sequences for dynamic jumps that eliminate branch speculation (or always speculate with a safe branch target).

Possible software-only mitigations:
- Never run JIT code with sensitive data mapped into the process address space. For example, run JIT code in a separate process and use explicit IPC to exchange data between it and the main program.

15 This is just the tip of the iceberg!

Spectre and Meltdown are just the beginning. We need to fundamentally rethink the way we design complex computer systems:
- Formal modeling of all information flow.
- With regard to side channels in general: what is SW responsibility and what is HW/OS responsibility?
- We need to rethink the models of HW that we use for writing SW.

Other variations of Spectre:
- Scenarios where a victim process only executes the speculative code and the data is then extracted from the side channel in another process. This would still be an issue even if rollback were perfect, because the other process could monitor the usage of shared resources in real time while speculative execution is happening.
- A variation on this would be using multiple threads on a hyper-threaded CPU. This would enable attacks on systems that don't speculatively load data into the L1 cache: measure whether the other thread gets scheduled as a result of the victim thread being stalled, and use loads of data not in the L1 cache to signal the result of the speculative execution.
- Instead of L1 cache timings, an attacker could monitor any other part of the system that gets utilized during speculative execution, such as the L2/L3 cache, main memory throughput, utilization of compute resources shared between cores (such as FPUs), power consumption, etc.

16 Questions?

References:
Spectre and Meltdown papers and additional information:
Link to this presentation:
