Performance is awesome!
|
|
- Hugh Willis
- 5 years ago
- Views:
Transcription
1
2 Acknowledgements I Background music for the choir song kindly provided by Kerbo-Kev. Cooking photos kindly provided by Becca Lee (ladyfaceblog). Santa Clause images by Some picture components are included from Mickey s Christmas Carol under fair use. 1
3 Acknowledgements II We want to thank our collaborators: Anders Fogh, Benjamin von Berg, Daniel Genkin, Dmitry Evtyushkin, Frank Piessens, Jann Horn, Jo Van Bulck, Mike Hamburg, Paul Kocher, Philipp Ortner, Stefan Mangard, Thomas Prescher, Werner Haas, and Yuval Yarom. The research behind this talk was partially funded by a generous gift from ARM and a generous gift from Intel. Any opinions, findings, and conclusions or recommendations are those of the authors and do not necessarily reflect the views of the funding parties. 2
4 3 Performance is awesome!
5 3 Performance is awesome!
6 Performance is awesome!
7 Performance is awesome! MHz 3
8 Performance is awesome! MHz RISC emulating CISC 3
9 Performance is awesome! MHz RISC emulating CISC 256 KB L2 cache integrated! 3
10 Performance is awesome! MHz RISC emulating CISC 256 KB L2 cache integrated! branch prediction 3
11 Performance is awesome! MHz RISC emulating CISC 256 KB L2 cache integrated! branch prediction out-of-order execution 3
12 More and More Performance The future is going to be fast: 4
13 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches 4
14 Spectres of The Past
15 Side-Channel Attacks Bug-free software does not mean safe execution 5
16 Side-Channel Attacks Bug-free software does not mean safe execution Information leaks due to underlying hardware 5
17 Side-Channel Attacks Bug-free software does not mean safe execution Information leaks due to underlying hardware Exploit leakage through side-effects 5
18 Side-Channel Attacks Bug-free software does not mean safe execution Information leaks due to underlying hardware Exploit leakage through side-effects Power consumption Execution time CPU caches 5
19 Architecture and Microarchitecture Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC,... ) 6
20 Architecture and Microarchitecture Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC,... ) Interface between hardware and software 6
21 Architecture and Microarchitecture Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC,... ) Interface between hardware and software Microarchitecture is an ISA implementation 6
22 Architecture and Microarchitecture Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC,... ) Interface between hardware and software Microarchitecture is an ISA implementation 6
23 Microarchitectural Elements Modern CPUs contain multiple microarchitectural elements 7
24 Microarchitectural Elements Modern CPUs contain multiple microarchitectural elements Caches and buffer Predictor 7
25 Microarchitectural Elements Modern CPUs contain multiple microarchitectural elements Caches and buffer Predictor Transparent for the programmer 7
26 Microarchitectural Elements Modern CPUs contain multiple microarchitectural elements Caches and buffer Predictor Transparent for the programmer Timing optimizations side-channel leakage 7
27 8 Food Cache
28 8 Food Cache
29 8 Food Cache
30 8 Food Cache
31 8 Food Cache
32 8 Food Cache
33 8 Food Cache
34
35 Revolutionary concept! Store your food at home, never go to the grocery store during cooking. Can store ALL kinds of food. ONLY TODAY INSTEAD OF $1,300 ORDER VIA PHONE:
36
37
38 CPU Cache printf("%d", i); printf("%d", i); 9
39 CPU Cache printf("%d", i); printf("%d", i); Cache miss 9
40 CPU Cache printf("%d", i); Cache miss Request printf("%d", i); 9
41 CPU Cache printf("%d", i); printf("%d", i); Cache miss Request Response 9
42 CPU Cache printf("%d", i); printf("%d", i); Cache miss i Request Response 9
43 CPU Cache printf("%d", i); printf("%d", i); Cache miss Cache hit i Request Response 9
44 CPU Cache DRAM access, slow printf("%d", i); printf("%d", i); Cache miss Cache hit i Request Response 9
45 CPU Cache DRAM access, slow printf("%d", i); printf("%d", i); Cache miss Cache hit No DRAM access, much faster i Request Response 9
46 Caching speeds up Memory Accesses Cache Hits Number of accesses Access time [CPU cycles] 10
47 Caching speeds up Memory Accesses Cache Hits Cache Misses Number of accesses Access time [CPU cycles] 10
48 Flush+Reload Attacker flush access Shared Memory Victim access 11
49 Flush+Reload Attacker flush access cached Shared Memory Shared Memory cached Victim access 11
50 Flush+Reload Attacker flush access Shared Memory Shared Memory Victim access 11
51 Flush+Reload Attacker flush access Shared Memory Victim access 11
52 Flush+Reload Attacker flush access Shared Memory Victim access 11
53 Flush+Reload Attacker flush access Shared Memory Shared Memory Victim access 11
54 Flush+Reload Attacker flush access Shared Memory Shared Memory Victim access 11
55 Flush+Reload Attacker flush access Shared Memory Shared Memory Victim access vs 11 Victim accessed (fast) Victim did not access (slow)
56 Attacks Just by looking at cache hits/misses, we can... 12
57 Attacks Just by looking at cache hits/misses, we can... Leak AES keys from the cache 12
58 Attacks Just by looking at cache hits/misses, we can... Leak AES keys from the cache Leak keystroke timings via the cache 12
59 Attacks Just by looking at cache hits/misses, we can... Leak AES keys from the cache Leak keystroke timings via the cache Covertly send data through the cache 12
60 Attacks Just by looking at cache hits/misses, we can... Leak AES keys from the cache Leak keystroke timings via the cache Covertly send data through the cache Browser, Cloud, TEEs,... 12
61
62 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches 13
63 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches Intel more out-of-order parallelism 13
64
65
66
67
68
69
70
71
72
73
74 Spectres of The Present
75 Out-of-Order Execution int width = 10, height = 5; float diagonal = sqrt ( width * width + height * height ); int area = width * height ; printf (" Area %d x %d = %d\n", width, height, area ); 14
76 Out-of-Order Execution Dependency int width = 10, height = 5; float diagonal = sqrt ( width * width + height * height ); int area = width * height ; Parallelize printf (" Area %d x %d = %d\n", width, height, area ); 14
77 Out-of-Order Execution L1 Instruction Cache ITLB Frontend Branch Predictor µop Cache µops Instruction Fetch & PreDecode Instruction Queue 4-Way Decode µop µop µop µop CDB Execution Engine MUX Allocation Queue µop µop µop µop Reorder buffer µop µop µop µop µop µop µop µop Scheduler µop µop µop µop µop µop µop µop ALU, AES,... ALU, FMA,... ALU, Vect,... ALU, Branch Load data Load data Store data AGU Execution Units Instructions are fetched and decoded in the front-end Memory Subsystem Load Buffer Store Buffer DTLB L1 Data Cache STLB L2 Cache 15
78 Out-of-Order Execution L1 Instruction Cache ITLB Frontend Branch Predictor µop Cache µops Instruction Fetch & PreDecode Instruction Queue 4-Way Decode µop µop µop µop CDB Execution Engine MUX Allocation Queue µop µop µop µop Reorder buffer µop µop µop µop µop µop µop µop Scheduler µop µop µop µop µop µop µop µop ALU, AES,... ALU, FMA,... ALU, Vect,... ALU, Branch Load data Load data Store data AGU Execution Units Instructions are fetched and decoded in the front-end dispatched to the backend Memory Subsystem Load Buffer Store Buffer DTLB L1 Data Cache STLB L2 Cache 15
79 Out-of-Order Execution L1 Instruction Cache ITLB Frontend Branch Predictor µop Cache µops Instruction Fetch & PreDecode Instruction Queue 4-Way Decode µop µop µop µop CDB Execution Engine MUX Allocation Queue µop µop µop µop Reorder buffer µop µop µop µop µop µop µop µop Scheduler µop µop µop µop µop µop µop µop ALU, AES,... ALU, FMA,... ALU, Vect,... ALU, Branch Load data Load data Store data AGU Execution Units Instructions are fetched and decoded in the front-end dispatched to the backend processed by individual execution units Memory Subsystem Load Buffer Store Buffer DTLB L1 Data Cache STLB L2 Cache 15
80 Building the Code An experiment *( volatile char *) 0; array [84 * 4096] = 0; 16
81 Building the Code An experiment *( volatile char *) 0; array [84 * 4096] = 0; volatile because compiler was not happy warning: statement with no effect [- Wunused - value ] *( char *) 0; 16
82 Building the Code An experiment *( volatile char *) 0; array [84 * 4096] = 0; volatile because compiler was not happy warning: statement with no effect [- Wunused - value ] *( char *) 0; Static code analyzer is still not happy warning: Dereference of null pointer *( volatile char *) 0; 16
83 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page 17
84 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page Unreachable code line was actually executed 17
85 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page Unreachable code line was actually executed Exception was only thrown afterwards 17
86 Building the Code Out-of-order instructions leave microarchitectural traces 18
87 Building the Code Out-of-order instructions leave microarchitectural traces We can see them for example in the cache 18
88 Building the Code Out-of-order instructions leave microarchitectural traces We can see them for example in the cache We call them transient instructions 18
89 Building the Code Out-of-order instructions leave microarchitectural traces We can see them for example in the cache We call them transient instructions Execution indirectly observable 18
90 19 Loading an address
91 19 Loading an address
92 19 Loading an address
93 19 Loading an address
94 19 Loading an address
95 19 Loading an address
96 Building the Code Add another layer of indirection to test char data = *( char *) 0 xffffffff81a000e0 ; array [ data * 4096] = 0; 20
97 Building the Code Add another layer of indirection to test char data = *( char *) 0 xffffffff81a000e0 ; array [ data * 4096] = 0; Then check if any part of array is cached 20
98 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page Index of cache hit reveals data 21
99 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page Index of cache hit reveals data Permission check fails sometimes 21
100
101
102 Meltdown Mitigation Kernel addresses in user space are a problem 23
103 Meltdown Mitigation Kernel addresses in user space are a problem Why don t we take the kernel addresses... 23
104 Meltdown Mitigation...and remove them if not needed? 24
105 Meltdown Mitigation...and remove them if not needed? User accessible check in hardware is not reliable 24
106 Meltdown-US Mitigation Unmap the kernel in user space 25
107 Meltdown-US Mitigation Unmap the kernel in user space Kernel addresses are then no longer present 25
108 Meltdown-US Mitigation Unmap the kernel in user space Kernel addresses are then no longer present Memory which is not mapped cannot be accessed at all 25
109 KAISER Userspace Kernelspace Applications Operating Memory System 26
110 KAISER Kernel View Userspace Applications User View Kernelspace Operating Memory System Userspace Applications context switch 27 Kernelspace
111 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 L1 Cache 28
112 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 present L1 Cache 28
113 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 present Guest Physical to Host Physical L1 Cache 28
114 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 present Guest Physical to Host Physical Physical Page L1 Cache L1 lookup with physical address 28
115 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 not present L1 Cache 28
116 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 not present L1 lookup with virtual address L1 Cache 28
117 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches Intel more ports, more parallelism, larger reorder buffer 29
118 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches Intel more ports, more parallelism, larger reorder buffer AMD perceptron-based prediction mechanisms 29
119
120 Speculative Execution Let us get rid of bottlenecks 31
121 Speculative Execution Use the naughty/nice list of last year 31
122 Speculative Execution Finally, check predictions with list of this year 31
123 Speculative Execution Throwing away wrongly manufactured presents 31
124 Speculative Execution Correct predictions result in free time 31
125
126 Spectre-PHT (aka Spectre Variant 1) LUT index = 0; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
127 Spectre-PHT (aka Spectre Variant 1) LUT index = 0; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
128 Spectre-PHT (aka Spectre Variant 1) LUT index = 0; char* data = "textkey"; if (index < 4) el se en h t Speculate Prediction LUT[data[index] * 4096] 33 0
129 Spectre-PHT (aka Spectre Variant 1) LUT index = 0; char* data = "textkey"; if (index < 4) Execute Index 't' el se en h t Prediction LUT[data[index] * 4096] 33 0
130 Spectre-PHT (aka Spectre Variant 1) LUT index = 1; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
131 Spectre-PHT (aka Spectre Variant 1) LUT index = 1; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
132 Spectre-PHT (aka Spectre Variant 1) LUT index = 1; char* data = "textkey"; if (index < 4) Index 'e' Speculate el en h t se Prediction LUT[data[index] * 4096] 33 0
133 Spectre-PHT (aka Spectre Variant 1) LUT index = 1; char* data = "textkey"; if (index < 4) Index e then Prediction else 33 LUT[data[index] * 4096] 0
134 Spectre-PHT (aka Spectre Variant 1) LUT index = 2; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
135 Spectre-PHT (aka Spectre Variant 1) LUT index = 2; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
136 Spectre-PHT (aka Spectre Variant 1) LUT index = 2; char* data = "textkey"; if (index < 4) Speculate el se en h t Index 'x' Prediction LUT[data[index] * 4096] 33 0
137 Spectre-PHT (aka Spectre Variant 1) LUT index = 2; char* data = "textkey"; if (index < 4) 33 Index x then Prediction else LUT[data[index] * 4096] 0
138 Spectre-PHT (aka Spectre Variant 1) LUT index = 3; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
139 Spectre-PHT (aka Spectre Variant 1) LUT index = 3; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
140 Spectre-PHT (aka Spectre Variant 1) LUT index = 3; char* data = "textkey"; if (index < 4) Speculate Index 't' el se en h t Prediction LUT[data[index] * 4096] 33 0
141 Spectre-PHT (aka Spectre Variant 1) LUT index = 3; char* data = "textkey"; if (index < 4) Index t then Prediction else 33 LUT[data[index] * 4096] 0
142 Spectre-PHT (aka Spectre Variant 1) LUT index = 4; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
143 Spectre-PHT (aka Spectre Variant 1) LUT index = 4; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
144 Spectre-PHT (aka Spectre Variant 1) LUT Index 'K' index = 4; char* data = "textkey"; if (index < 4) Speculate el en h t se Prediction LUT[data[index] * 4096] 33 0
145 Spectre-PHT (aka Spectre Variant 1) LUT Index 'K' index = 4; char* data = "textkey"; if (index < 4) Execute el n he se t Prediction LUT[data[index] * 4096] 33 0
146 Spectre-PHT (aka Spectre Variant 1) LUT index = 5; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
147 Spectre-PHT (aka Spectre Variant 1) LUT index = 5; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
148 Spectre-PHT (aka Spectre Variant 1) LUT Index 'E' index = 5; char* data = "textkey"; if (index < 4) Speculate el en h t se Prediction LUT[data[index] * 4096] 33 0
149 Spectre-PHT (aka Spectre Variant 1) LUT Index 'E' index = 5; char* data = "textkey"; if (index < 4) Execute el en h t se Prediction LUT[data[index] * 4096] 33 0
150 Spectre-PHT (aka Spectre Variant 1) LUT index = 6; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
151 Spectre-PHT (aka Spectre Variant 1) LUT index = 6; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0
152 Spectre-PHT (aka Spectre Variant 1) LUT Index 'Y' index = 6; char* data = "textkey"; if (index < 4) Speculate el en h t se Prediction LUT[data[index] * 4096] 33 0
153 Spectre-PHT (aka Spectre Variant 1) LUT Index 'Y' index = 6; char* data = "textkey"; if (index < 4) Execute el en h t se Prediction LUT[data[index] * 4096] 33 0
154 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() fly() swim() swim() Prediction LUT[data[index] * 4096] 0 34
155 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() ) y( l f swim() Speculate sw im () Prediction LUT[data[index] * 4096] 34 0
156 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() fly() swim() swim() Prediction LUT[data[index] * 4096] 0 34
157 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() Execute ) y( swim() fl sw im () Prediction LUT[data[index] * 4096] 34 0
158 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() fly() fly() swim() Prediction LUT[data[index] * 4096] 0 34
159 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() Speculate ) y( fly() fl sw im () Prediction LUT[data[index] * 4096] 34 0
160 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() fly() fly() swim() Prediction LUT[data[index] * 4096] 0 34
161 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() fly() fly() swim() Prediction LUT[data[index] * 4096] 0 34
162 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() Speculate ) y( l f fly() sw im () Prediction LUT[data[index] * 4096] 34 0
163 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() fly() fly() swim() Prediction LUT[data[index] * 4096] 0 34
164 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() ) y( l f fly() Execute sw im () Prediction LUT[data[index] * 4096] 34 0
165 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() fly() swim() swim() Prediction LUT[data[index] * 4096] 0 34
166 Spectre-RSB function()... Victim Attacker RSB 35
167 Spectre-RSB function()... reg = secret Victim reg = dummy Attacker RSB 35
168 Spectre-RSB function()... reg = secret Victim call function(short) reg = dummy Attacker RSB 35 &victim
169 Spectre-RSB function()... reg = secret Victim call function(short) Attacker reg = dummy call function(long) data[reg * 4096] RSB 35 &attacker &victim
170 Spectre-RSB function()... reg = secret Victim call function(short) Attacker reg = dummy call function(long) data[reg * 4096] RSB 35 &attacker &victim
171 Spectre-RSB function()... reg = secret Victim call function(short) Attacker reg = dummy call function(long) data[reg * 4096] RSB 35 &victim
172 Spectre-RSB function()... reg = secret Victim call function(short) Attacker reg = dummy call function(long) data[reg * 4096] RSB 35 &victim
173 Meltdown vs. Spectre operation #n time 36
174 Meltdown vs. Spectre operation #n prediction time 36
175 Meltdown vs. Spectre operation #n prediction predict CF/DF operation #n+2 time 36
176 Meltdown vs. Spectre operation #n prediction predict CF/DF possibly architectural operation #n+2 transient execution time 36
177 Meltdown vs. Spectre operation #n retire predict CF/DF possibly architectural prediction operation #n+2 transient execution time 36
178 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction predict CF/DF possibly architectural prediction operation #n+2 transient execution retire time 36
179 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction predict CF/DF possibly architectural prediction operation #n+2 transient execution retire retire time 36
180 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction operation #n prediction retire predict CF/DF possibly architectural operation #n+2 transient execution retire time time 36
181 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction operation #n prediction retire data predict CF/DF possibly architectural operation #n+2 transient execution retire time time 36
182 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction operation #n prediction retire data predict CF/DF possibly architectural operation #n+2 transient execution retire time data dependency operation #n+2 time 36
183 Meltdown vs. Spectre operation #n predict CF/DF possibly architectural retire prediction operation #n+2 transient execution flush pipeline on wrong prediction retire retire time possibly architectural operation #n data exception retire data dependency operation #n+2 transient execution time 36
184 Meltdown vs. Spectre operation #n predict CF/DF possibly architectural retire prediction operation #n+2 transient execution flush pipeline on wrong prediction retire retire time possibly architectural operation #n data exception retire data dependency operation #n+2 transient execution time 36
185 Meltdown vs. Spectre operation #n predict CF/DF possibly architectural retire prediction operation #n+2 transient execution flush pipeline on wrong prediction retire retire time possibly architectural operation #n data exception retire data dependency Meltdown operation #n+2 transient execution time 36
186 Meltdown vs. Spectre operation #n predict CF/DF possibly architectural retire prediction operation #n+2 transient execution flush pipeline on wrong prediction retire retire time possibly architectural operation #n data exception retire data dependency Meltdown operation #n+2 transient execution raise retire time 36
187 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches Intel more ports, more parallelism, larger reorder buffer AMD perceptron-based prediction mechanisms 37
188 Spectres of The Future
189 Intel MPK Protection key for a group of pages 38
190 Intel MPK Protection key for a group of pages 4 bits in PTE identify key for protected memory regions 38
191 Intel MPK Protection key for a group of pages 4 bits in PTE identify key for protected memory regions Quick update of access rights 38
192 Meltdown-PK Protection keys are lazily enforced 39
193 Meltdown-PK Protection keys are lazily enforced Protected value is forwarded to transient instructions 39
194 Meltdown-BR x86 provides dedicated instruction raising #BR exception if bound-range is exceeded 40
195 Meltdown-BR x86 provides dedicated instruction raising #BR exception if bound-range is exceeded Data used in transient execution 40
196 Meltdown-BR x86 provides dedicated instruction raising #BR exception if bound-range is exceeded Data used in transient execution Attacker determines accessed cache line using Flush+Reload 40
197 40 Meltdown-BR
198 40 Meltdown-BR
199 40 Meltdown-BR
200 Meltdown-BR x86 provides dedicated instruction raising #BR exception if bound-range is exceeded Data used in transient execution Attacker determines accessed cache line using Flush+Reload First Meltdown-type attack on AMD 40
201 41 Vulnerable Vendors Vendor Attack Intel ARM AMD Meltdown-US Meltdown-P Meltdown-GP Meltdown-NM Meltdown-RW Meltdown-PK Meltdown-BR Meltdown-DE Meltdown-AC Meltdown-UD Meltdown-SS Meltdown-XD Meltdown-SM
202 Meltdown Defense Categorization Meltdown defenses in 2 categories: 42
203 Meltdown Defense Categorization Meltdown defenses in 2 categories: D1 Architecturally inaccessible data is also microarchitecturally inaccessible 42
204 Meltdown Defense Categorization Meltdown defenses in 2 categories: D1 Architecturally inaccessible data is also microarchitecturally inaccessible D2 Preventing occurrence of faults 42
205 43 Meltdown-P Mitigation
206 Meltdown-P Mitigation Clear phyiscal address field of unmapped PTEs 43
207 Meltdown-P Mitigation Clear phyiscal address field of unmapped PTEs Flush L1 upon switching protection domains 43
208 Demo Foreshadow-NG
209 Transient Execution Attacks: Classification Transient cause? 44
210 Transient Execution Attacks: Classification Spectre-type prediction Transient cause? fault Meltdown-type 44
211 Transient Execution Attacks: Classification Spectre-PHT Spectre-BTB Spectre-type Spectre-RSB Spectre-STL prediction Transient cause? fault Meltdown-type 44
212 Spectre: Mistraining Strategies Victim same address space/ in place Spectrevulnerable branch 45
213 Spectre: Mistraining Strategies Victim same address space/ out of place Aliased branch Address collision same address space/ in place Spectrevulnerable branch 45
214 Spectre: Mistraining Strategies Victim same address space/ out of place Aliased branch Address collision same address space/ in place Spectrevulnerable branch Shared Branch Prediction State 45
215 Spectre: Mistraining Strategies Victim Attacker same address space/ out of place Aliased branch Address collision same address space/ in place Spectrevulnerable branch Shared Branch Prediction State 45
216 Spectre: Mistraining Strategies Victim Attacker same address space/ out of place Aliased branch Address collision same address space/ in place Spectrevulnerable branch Same address cross address space/ in place Shared Branch Prediction State 45
217 Spectre: Mistraining Strategies Victim Attacker same address space/ out of place Aliased branch Aliased address cross address space/ out of place Address collision Address collision same address space/ in place Spectrevulnerable branch Same address cross address space/ in place Shared Branch Prediction State 45
218 Spectre Mistraining: Vulnerable Vendors Method Attack Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL
219 Spectre Mistraining: Vulnerable Vendors Intel Method same-address-space cross-address-space Attack in-place out-of-place in-place out-of-place Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL
220 Spectre Mistraining: Vulnerable Vendors Intel ARM Method same-address-space cross-address-space same-address-space cross-address-space Attack in-place out-of-place in-place out-of-place in-place out-of-place in-place out-of-place Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL
221 Spectre Mistraining: Vulnerable Vendors Intel ARM AMD Method same-address-space cross-address-space same-address-space cross-address-space same-address-space cross-address-space Attack in-place out-of-place in-place out-of-place in-place out-of-place in-place out-of-place in-place out-of-place in-place out-of-place Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL 46
222 Super Effective Solution: Drilling template 47 Drilling template
223 Spectre Defense Categorization Spectre defenses in 3 categories: 48
224 Spectre Defense Categorization Spectre defenses in 3 categories: C1 Mitigate or reduce accuracy of covert channels 48
225 Spectre Defense Categorization Spectre defenses in 3 categories: C1 Mitigate or reduce accuracy of covert channels C2 Mitigate or abort speculation 48
226 Spectre Defense Categorization Spectre defenses in 3 categories: C1 Mitigate or reduce accuracy of covert channels C2 Mitigate or abort speculation C3 Ensure secret cannot be reached 48
227 Spectre Defenses: Microarchitectural Target Microarchitectural Element Defense Cache TLB BTB BHB PHT RSB AVX FPU Execution Ports InvisiSpec SafeSpec DAWG RSB Stuffing Retpoline SLH YSNB IBRS Taint Tracking Timer Reduction STIPB IBPB Serialization Sloth SSBD/SSBB Poison Value Index Masking Site Isolation Category: C1 C2 C3 49
228 Site Isolation Each site executed in its own process 50
229 Site Isolation Each site executed in its own process limits amount of data that is exposed 50
230 Site Isolation Each site executed in its own process limits amount of data that is exposed Chrome 67: default, Firefox: work in progress 50
231 Serialization Insert instructions stopping speculation 51
232 Serialization Insert instructions stopping speculation insert after every bounds check 51
233 Serialization Insert instructions stopping speculation insert after every bounds check x86: LFENCE, ARM: CSDB with conditional selects or moves 51
234 InvisiSpec Make transient loads invisible in the cache hierarchy 52
235 InvisiSpec Make transient loads invisible in the cache hierarchy all transient loads use a speculative buffer 52
236 InvisiSpec Make transient loads invisible in the cache hierarchy all transient loads use a speculative buffer Correct prediction: buffer content loaded into cache 52
237 InvisiSpec Make transient loads invisible in the cache hierarchy all transient loads use a speculative buffer Correct prediction: buffer content loaded into cache Wrong prediction: transient load is reverted 52
238 Spectre: Defense Analysis Attack Defense InvisiSpec DAWG SafeSpec RSB Stuffing Retpoline Poison Value Index Masking Site Isolation SLH YSNB IBRS STIPB IBPB Serialization Timer Reduction Sloth Taint Tracking SSBD/SSBB
239 Spectre: Defense Analysis Attack Defense InvisiSpec DAWG SafeSpec RSB Stuffing Retpoline Poison Value Index Masking Site Isolation SLH YSNB IBRS STIPB IBPB Serialization Timer Reduction Sloth Taint Tracking SSBD/SSBB Intel Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL
240 Spectre: Defense Analysis Attack Defense InvisiSpec DAWG SafeSpec RSB Stuffing Retpoline Poison Value Index Masking Site Isolation SLH YSNB IBRS STIPB IBPB Serialization Timer Reduction Sloth Taint Tracking SSBD/SSBB Intel ARM Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL
241 Spectre: Defense Analysis Attack Defense InvisiSpec DAWG SafeSpec RSB Stuffing Retpoline Poison Value Index Masking Site Isolation SLH YSNB IBRS STIPB IBPB Serialization Timer Reduction Sloth Taint Tracking SSBD/SSBB Intel ARM AMD Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL 53
242
243
244
245
246 Transient Execution Attacks: Classification Transient cause? 54
247 Transient Execution Attacks: Classification Spectre-type prediction Transient cause? fault Meltdown-type 54
248 Transient Execution Attacks: Classification Spectre-type microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL prediction Transient cause? fault Meltdown-type 54
249 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space Meltdown-type 54
250 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space in-place (IP) vs., out-of-place (OP) PHT-SA-IP BTB-CA-IP BTB-CA-OP RSB-CA-IP Meltdown-type RSB-CA-OP RSB-SA-IP RSB-SA-OP 54
251 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP Meltdown-type RSB-CA-OP RSB-SA-IP RSB-SA-OP 54
252 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP Meltdown-type Meltdown-PF RSB-CA-OP RSB-SA-IP RSB-SA-OP fault type Meltdown-BR 54 Meltdown-GP
253 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault Meltdown-type microarchitectural buffer fault type Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM Meltdown-AC. Meltdown-DE. Meltdown-PF Meltdown-UD. Meltdown-SS. Meltdown-BR mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP RSB-CA-OP RSB-SA-IP RSB-SA-OP 54 Meltdown-GP
254 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault Meltdown-type microarchitectural buffer fault type Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM Meltdown-AC. Meltdown-DE. Meltdown-PF Meltdown-UD. Meltdown-SS. mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space Meltdown-US Meltdown-P Meltdown-RW in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP RSB-CA-OP RSB-SA-IP RSB-SA-OP Meltdown-BR Meltdown-MPX 54 Meltdown-GP
255 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault Meltdown-type microarchitectural buffer fault type Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM Meltdown-AC. Meltdown-DE. Meltdown-PF Meltdown-UD. Meltdown-SS. mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space Meltdown-US Meltdown-P Meltdown-RW Meltdown-XD. Meltdown-SM. in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP RSB-CA-OP RSB-SA-IP RSB-SA-OP Meltdown-BR Meltdown-MPX 54 Meltdown-GP
256 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault Meltdown-type microarchitectural buffer fault type Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM Meltdown-AC. Meltdown-DE. Meltdown-PF Meltdown-UD. Meltdown-SS. mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space Meltdown-US Meltdown-P Meltdown-RW Meltdown-PK Meltdown-XD. Meltdown-SM. in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP RSB-CA-OP RSB-SA-IP RSB-SA-OP 54 Meltdown-BR Meltdown-GP Meltdown-MPX Meltdown-BND
257 Lessons learned We have ignored microarchitectural attacks for many years: 55
258 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto 55
259 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed 55
260 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR 55
261 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway 55
262 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs 55
263 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs not within threat model 55
264 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs not within threat model Rowhammer 55
265 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs not within threat model Rowhammer only some cheap modules 55
266 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs not within threat model Rowhammer only some cheap modules for years we solely optimized for performance 55
267 56 Conclusion
268 Conclusion Optimizations always come at a cost 56
269 Conclusion Optimizations always come at a cost Some mitigations cost more than gained by the feature they defend 56
270 Conclusion Optimizations always come at a cost Some mitigations cost more than gained by the feature they defend Transient-execution attacks will keep us busy for a while 56
271 Moritz Lipp Michael Schwarz Claudio Canella Daniel Gruss
Software-based Microarchitectural Attacks
Software-based Microarchitectural Attacks Daniel Gruss January 23, 2019 Graz University of Technology Frame Rate of Slide Deck: 11 fps 1 Daniel Gruss Graz University of Technology 1337 4242 Revolutionary
More informationWho am I? Moritz Lipp PhD Graz University of
Who am I? Moritz Lipp PhD student @ Graz University of Technology @mlqxyz moritz.lipp@iaik.tugraz.at 1 Moritz Lipp, Michael Schwarz, Daniel Gruss Graz University of Technology Who am I? Michael Schwarz
More informationThe Story of Meltdown and Spectre. Jann Horn & Daniel Gruss May 17, 2018
The Story of Meltdown and Spectre Jann Horn & Daniel Gruss May 17, 2018 1 Who are we Jann Horn Google Project Zero jannh@google.com 2 Who are we Daniel Gruss Post-Doc @ Graz University Of Technology @lavados
More informationSpectre and Meltdown: Data leaks during speculative execution
Spectre and Meltdown: Data leaks during speculative execution Speaker: Jann Horn (Google Project Zero) Paul Kocher (independent) Daniel Genkin (University of Pennsylvania and University of Maryland) Yuval
More informationMeltdown and Spectre: Complexity and the death of security
Meltdown and Spectre: Complexity and the death of security May 8, 2018 Meltdown and Spectre: Wait, my computer does what? May 8, 2018 Meltdown and Spectre: Whoever thought that was a good idea? May 8,
More informationarxiv: v1 [cs.cr] 3 Jan 2018
Meltdown arxiv:1801.01207v1 [cs.cr] 3 Jan 2018 Abstract Moritz Lipp 1, Michael Schwarz 1, Daniel Gruss 1, Thomas Prescher 2, Werner Haas 2, Stefan Mangard 1, Paul Kocher 3, Daniel Genkin 4, Yuval Yarom
More informationCSE 127 Computer Security
CSE 127 Computer Security Stefan Savage, Spring 2018, Lecture 19 Hardware Security: Meltdown, Spectre, Rowhammer Vulnerabilities and Abstractions Abstraction Reality Vulnerability Review: ISA and µarchitecture
More informationSoftware Solutions to Micro-architectural Side Channels. Yinqian Zhang Assistant Professor Computer Science & Engineering The Ohio State University
Software Solutions to Micro-architectural Side Channels Yinqian Zhang Assistant Professor Computer Science & Engineering The Ohio State University Introduction Research interests Computer system security
More informationSpectre Returns! Speculation Attacks Using Return Stack Buffer
Spectre Returns! Speculation Attacks Using Return Stack Buffer Esmaeil Mohammadian, Khaled N. Khasawneh, Chengyue Song and Nael Abu-Ghazaleh University of California, Riverside WOOT 2018 BALTIMORE, USA
More informationTo accelerate our learnings, we brought in an expert in CPU side channel attacks. Anders Fogh
To accelerate our learnings, we brought in an expert in CPU side channel attacks Anders Fogh Virtualization-based isolation Microsoft Azure, Hyper-V Affected Kernel-user separation Windows Affected Process-based
More informationWhite Paper SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD PROCESSORS
White Paper SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD PROCESSORS INTRODUCTION Speculative execution is a basic principle of all modern processor designs and is critical to support high performance
More informationMeltdown and Spectre - understanding and mitigating the threats (Part Deux)
Meltdown and Spectre - understanding and mitigating the threats (Part Deux) Gratuitous vulnerability logos Jake Williams @MalwareJake SANS / Rendition Infosec sans.org / rsec.us @SANSInstitute / @RenditionSec
More informationMeltdown or "Holy Crap: How did we do this to ourselves" Meltdown exploits side effects of out-of-order execution to read arbitrary kernelmemory
Meltdown or "Holy Crap: How did we do this to ourselves" Abstract Meltdown exploits side effects of out-of-order execution to read arbitrary kernelmemory locations Breaks all security assumptions given
More informationarxiv: v2 [cs.cr] 16 Jun 2018
SafeSpec: Banishing the Spectre of a Meltdown with Leakage-Free Speculation Khaled N. Khasawneh Department of Computer Science and Engineering University of California, Riverside Email: kkhas1@ucr.edu
More informationMeltdown, Spectre, and Security Boundaries in LEON/GRLIB. Technical note Doc. No GRLIB-TN-0014 Issue 1.1
Template: GQMS-TPLT-1-1-0 Meltdown, Spectre, and Security Boundaries in LEON/GRLIB Technical note 2018-03-15 Doc. No Issue 1.1 Date: 2018-03-15 Page: 2 of 8 CHANGE RECORD Issue Date Section / Page Description
More informationExploiting Branch Target Injection. Jann Horn, Google Project Zero
Exploiting Branch Target Injection Jann Horn, Google Project Zero 1 Outline Introduction Reverse-engineering branch prediction Leaking host memory from KVM 2 Disclaimer I haven't worked in CPU design I
More informationSpectre and Meltdown. Clifford Wolf q/talk
Spectre and Meltdown Clifford Wolf q/talk 2018-01-30 Spectre and Meltdown Spectre (CVE-2017-5753 and CVE-2017-5715) Is an architectural security bug that effects most modern processors with speculative
More informationMicro-architectural Attacks. Chester Rebeiro IIT Madras
Micro-architectural Attacks Chester Rebeiro IIT Madras 1 Cryptography Passwords Information Flow Policies Privileged Rings ASLR Virtual Machines and confinement Javascript and HTML5 (due to restricted
More informationÉvolution des attaques sur la micro-architecture
Évolution des attaques sur la micro-architecture Clémentine Maurice, CNRS, IRISA 23 Janvier 2019 Journée Nouvelles Avancées en Sécurité des Systèmes d Information, INSA de Toulouse/LAAS-CNRS Attacks on
More informationDisclaimer. This talk vastly over-simplifies things. See notes for full details and resources.
Greg Kroah-Hartman Disclaimer This talk vastly over-simplifies things. See notes for full details and resources. https://github.com/gregkh/presentation-spectre Spectre Hardware bugs Valid code can be tricked
More informationMeltdown and Spectre - understanding and mitigating the threats
Meltdown and Spectre - understanding and mitigating the threats Gratuitous vulnerability logos Jake Williams @MalwareJake SANS / Rendition Infosec sans.org / rsec.us @RenditionSec The sky isn t falling!
More informationSecurity-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat
Security-Aware Processor Architecture Design CS 6501 Fall 2018 Ashish Venkat Agenda Theme Selection (due today at 11:59:59pm) Readings and Presentation Logistics Quick Processor Architecture Review (continued
More informationLeaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX
Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX W. Wang, G. Chen, X, Pan, Y. Zhang, XF. Wang, V. Bindschaedler, H. Tang, C. Gunter. September 19, 2017 Motivation Intel
More informationSpectre, Meltdown, and the Impact of Security Vulnerabilities on your IT Environment. Orin Jeff Melnick
Spectre, Meltdown, and the Impact of Security Vulnerabilities on your IT Environment Orin Thomas @orinthomas Jeff Melnick Jeff.Melnick@Netwrix.com In this session Vulnerability types Spectre Meltdown Spectre
More informationFrom bottom to top: Exploiting hardware side channels in web browsers
From bottom to top: Exploiting hardware side channels in web browsers Clémentine Maurice, Graz University of Technology July 4, 2017 RMLL, Saint-Étienne, France Rennes Graz Clémentine Maurice PhD since
More informationPentium IV-XEON. Computer architectures M
Pentium IV-XEON Computer architectures M 1 Pentium IV block scheme 4 32 bytes parallel Four access ports to the EU 2 Pentium IV block scheme Address Generation Unit BTB Branch Target Buffer I-TLB Instruction
More informationMeltdown and Spectre: Complexity and the death of security
Meltdown and Spectre: Complexity and the death of security May 8, 2018 Meltdown and Spectre: Wait, my computer does what? January 24, 2018 Meltdown and Spectre: Whoever thought that was a good idea? January
More informationSuperscalar Processors Ch 14
Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion
More informationSuperscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?
Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion
More informationHardware Enclave Attacks CS261
Hardware Enclave Attacks CS261 Threat Model of Hardware Enclaves Intel Attestation Service (IAS) Process Enclave Untrusted Trusted Enclave Code Enclave Data Process Process Other Enclave OS and/or Hypervisor
More informationSecuring High-performance RISC-V Processors from Time Speculation Christopher Celio, Jose Renau Esperanto Technologies {christopher.
Securing High-performance RISC-V Processors from Time Speculation Christopher Celio, Jose Renau Esperanto Technologies {christopher.celio, jose.renau} @esperantotech.com 8th RISC-V Workshop Barcelona,
More informationDisclaimer. This talk vastly over-simplifies things. See notes for full details and resources.
Greg Kroah-Hartman Disclaimer This talk vastly over-simplifies things. See notes for full details and resources. https://github.com/gregkh/presentation-spectre Spectre Hardware bugs Valid code can be tricked
More informationWhite Paper. How the Meltdown and Spectre bugs work and what you can do to prevent a performance plummet. Contents
White Paper How the Meltdown and Spectre bugs work and what you can do to prevent a performance plummet Programs that do a lot of I/O are likely to be the worst hit by the patches designed to fix the Meltdown
More informationCS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines
CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines Sreepathi Pai UTCS September 14, 2015 Outline 1 Introduction 2 Out-of-order Scheduling 3 The Intel Haswell
More informationIntel Analysis of Speculative Execution Side Channels
Intel Analysis of Speculative Execution Side Channels White Paper Revision 1.0 January 2018 Document Number: 336983-001 Intel technologies features and benefits depend on system configuration and may require
More informationHY225 Lecture 12: DRAM and Virtual Memory
HY225 Lecture 12: DRAM and irtual Memory Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS May 16, 2011 Dimitrios S. Nikolopoulos Lecture 12: DRAM and irtual Memory 1 / 36 DRAM Fundamentals Random-access
More informationARMageddon: Cache Attacks on Mobile Devices
ARMageddon: Cache Attacks on Mobile Devices Moritz Lipp, Daniel Gruss, Raphael Spreitzer, Clémentine Maurice, Stefan Mangard Graz University of Technology 1 TLDR powerful cache attacks (like Flush+Reload)
More informationThe Pentium II/III Processor Compiler on a Chip
The Pentium II/III Processor Compiler on a Chip Ronny Ronen Senior Principal Engineer Director of Architecture Research Intel Labs - Haifa Intel Corporation Tel Aviv University January 20, 2004 1 Agenda
More informationEN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design
EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown
More informationEffective Virtual CPU Configuration in Nova
Effective Virtual CPU Configuration in Nova Kashyap Chamarthy OpenStack Summit Berlin, 2018 1 / 39 Timeline of recent CPU flaws, 2018 (a) Jan 03 Spectre v1: Bounds Check Bypass Jan
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More informationVirtual Memory. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Virtual Memory Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Precise Definition of Virtual Memory Virtual memory is a mechanism for translating logical
More informationIntel released new technology call P6P
P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new
More informationA superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle.
CS 320 Ch. 16 SuperScalar Machines A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle. A superpipelined machine is one in which a
More informationItanium 2 Processor Microarchitecture Overview
Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs
More informationReducing Miss Penalty: Read Priority over Write on Miss. Improving Cache Performance. Non-blocking Caches to reduce stalls on misses
Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the. Reducing Miss Penalty: Read Priority over Write on Miss Write buffers may offer RAW
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationVirtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1
Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L16-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:
More informationJump Over ASLR: Attacking Branch Predictors to Bypass ASLR
Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR Presentation by Eric Newberry and Youssef Tobah Paper by Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh 1 Motivation Buffer overflow
More informationModern Processor Architectures. L25: Modern Compiler Design
Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions
More informationControl Hazards. Branch Prediction
Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional
More informationVirtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
More informationCS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction
CS 61C: Great Ideas in Computer Architecture Multiple Instruction Issue, Virtual Memory Introduction Instructor: Justin Hsia 7/26/2012 Summer 2012 Lecture #23 1 Parallel Requests Assigned to computer e.g.
More informationComputer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)
18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures
More informationPhoto David Wright STEVEN R. BAGLEY PIPELINES AND ILP
Photo David Wright https://www.flickr.com/photos/dhwright/3312563248 STEVEN R. BAGLEY PIPELINES AND ILP INTRODUCTION Been considering what makes the CPU run at a particular speed Spent the last two weeks
More informationStatic, multiple-issue (superscaler) pipelines
Static, multiple-issue (superscaler) pipelines Start more than one instruction in the same cycle Instruction Register file EX + MEM + WB PC Instruction Register file EX + MEM + WB 79 A static two-issue
More informationChapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs
Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple
More informationCPU models after Spectre & Meltdown. Paolo Bonzini Red Hat, Inc. KVM Forum 2018
CPU models after Spectre & Meltdown Paolo Bonzini Red Hat, Inc. KVM Forum 2018 Can this guest run on that machine? It depends! Host processor Microcode version Kernel version QEMU Machine type 2 How can
More informationAdvanced processor designs
Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The
More informationJavaScript Zero. Real JavaScript and Zero Side-Channel Attacks. Michael Schwarz, Moritz Lipp, Daniel Gruss
JavaScript Zero Real JavaScript and Zero Side-Channel Attacks Michael Schwarz, Moritz Lipp, Daniel Gruss 20.02.2018 www.iaik.tugraz.at 1 Michael Schwarz, Moritz Lipp, Daniel Gruss www.iaik.tugraz.at Outline
More informationReduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction
ISA Support Needed By CPU Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with control hazards in instruction pipelines by: 1 2 3 4 Assuming that the branch
More informationVirtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1
Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L20-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:
More informationCache Side Channel Attacks on Intel SGX
Cache Side Channel Attacks on Intel SGX Princeton University Technical Report CE-L2017-001 January 2017 Zecheng He Ruby B. Lee {zechengh, rblee}@princeton.edu Department of Electrical Engineering Princeton
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationHardware-Based Speculation
Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register
More informationLast time: forwarding/stalls. CS 6354: Branch Prediction (con t) / Multiple Issue. Why bimodal: loops. Last time: scheduling to avoid stalls
CS 6354: Branch Prediction (con t) / Multiple Issue 14 September 2016 Last time: scheduling to avoid stalls 1 Last time: forwarding/stalls add $a0, $a2, $a3 ; zero or more instructions sub $t0, $a0, $a1
More informationBreaking Kernel Address Space Layout Randomization (KASLR) with Intel TSX. Yeongjin Jang, Sangho Lee, and Taesoo Kim Georgia Institute of Technology
Breaking Kernel Address Space Layout Randomization (KASLR) with Intel TSX Yeongjin Jang, Sangho Lee, and Taesoo Kim Georgia Institute of Technology Kernel Address Space Layout Randomization (KASLR) A statistical
More informationComplex Pipelines and Branch Prediction
Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle
More informationCS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.
CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in
More informationChapter 5 B. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 B Large and Fast: Exploiting Memory Hierarchy Dependability 5.5 Dependable Memory Hierarchy Chapter 6 Storage and Other I/O Topics 2 Dependability Service accomplishment Service delivered as
More informationSuperscalar Processor Design
Superscalar Processor Design Superscalar Organization Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 26 SE-273: Processor Design Super-scalar Organization Fetch Instruction
More informationCMSC22200 Computer Architecture Lecture 8: Out-of-Order Execution. Prof. Yanjing Li University of Chicago
CMSC22200 Computer Architecture Lecture 8: Out-of-Order Execution Prof. Yanjing Li University of Chicago Administrative Stuff! Lab2 due tomorrow " 2 free late days! Lab3 is out " Start early!! My office
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Instruction Commit The End of the Road (um Pipe) Commit is typically the last stage of the pipeline Anything an insn. does at this point is irrevocable Only actions following
More informationWide Instruction Fetch
Wide Instruction Fetch Fall 2007 Prof. Thomas Wenisch http://www.eecs.umich.edu/courses/eecs470 edu/courses/eecs470 block_ids Trace Table pre-collapse trace_id History Br. Hash hist. Rename Fill Table
More informationPerformance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.
Performance of Computer Systems CSE 586 Computer Architecture Review Jean-Loup Baer http://www.cs.washington.edu/education/courses/586/00sp Performance metrics Use (weighted) arithmetic means for execution
More informationForeshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution
Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution Jo Van Bulck, imec-distrinet, KU Leuven; Marina Minkin, Technion; Ofir Weisse, Daniel Genkin, and Baris Kasikci,
More information15-740/ Computer Architecture Lecture 22: Superscalar Processing (II) Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 22: Superscalar Processing (II) Prof. Onur Mutlu Carnegie Mellon University Announcements Project Milestone 2 Due Today Homework 4 Out today Due November 15
More informationECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 15 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html
More informationMeltdown Attack Lab. 1 Introduction. SEED Labs Meltdown Attack Lab 1
SEED Labs Meltdown Attack Lab 1 Meltdown Attack Lab Copyright 2018 Wenliang Du, Syracuse University. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
More informationRetpoline: A Branch Target Injection Mitigation
Retpoline: A Branch Target Injection Mitigation White Paper Revision 003 June, 2018 Any future revisions to this content can be found at https://software.intel.com/security-software-guidance/ when new
More informationVirtual Memory. Physical Addressing. Problem 2: Capacity. Problem 1: Memory Management 11/20/15
Memory Addressing Motivation: why not direct physical memory access? Address translation with pages Optimizing translation: translation lookaside buffer Extra benefits: sharing and protection Memory as
More informationA Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures
A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures W.M. Roshan Weerasuriya and D.N. Ranasinghe University of Colombo School of Computing A Comparative
More informationLow-power Architecture. By: Jonathan Herbst Scott Duntley
Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media
More informationEE 4980 Modern Electronic Systems. Processor Advanced
EE 4980 Modern Electronic Systems Processor Advanced Architecture General Purpose Processor User Programmable Intended to run end user selected programs Application Independent PowerPoint, Chrome, Twitter,
More informationLECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 9: Limits of ILP, Case Studies Lecture Outline Speculative Execution Implementing Precise Interrupts
More informationModule 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R A case study in modern microarchitecture.
Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R10000 A case study in modern microarchitecture Overview Stage 1: Fetch Stage 2: Decode/Rename Branch prediction Branch
More informationEN164: Design of Computing Systems Lecture 24: Processor / ILP 5
EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationCOSC 6385 Computer Architecture. - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Fall 2008 Cache Performance Avg. memory access time = Hit time + Miss rate x Miss penalty with Hit time: time to access a data item which is available
More informationMobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for Security and Energy Efficiency
Mobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for Security and Energy Efficiency Mohammadkazem Taram, Ashish Venkat, Dean Tullsen University of California, San Diego The Tension between
More informationPutting it all Together: Modern Computer Architecture
Putting it all Together: Modern Computer Architecture Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. May 10, 2018 L23-1 Administrivia Quiz 3 tonight on room 50-340 (Walker Gym) Quiz
More informationSandboxing Untrusted Code: Software-Based Fault Isolation (SFI)
Sandboxing Untrusted Code: Software-Based Fault Isolation (SFI) Brad Karp UCL Computer Science CS GZ03 / M030 9 th December 2011 Motivation: Vulnerabilities in C Seen dangers of vulnerabilities: injection
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 12: Hardware Assisted Software ILP and IA64/Itanium Case Study Lecture Outline Review of Global Scheduling,
More informationCOSC 6385 Computer Architecture - Memory Hierarchy Design (III)
COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses
More informationVersion:1.1. Overview of speculation-based cache timing side-channels
Author: Richard Grisenthwaite Date: January 2018 Version 1.1 Introduction This whitepaper looks at the susceptibility of Arm implementations following recent research findings from security researchers
More informationE0-243: Computer Architecture
E0-243: Computer Architecture L1 ILP Processors RG:E0243:L1-ILP Processors 1 ILP Architectures Superscalar Architecture VLIW Architecture EPIC, Subword Parallelism, RG:E0243:L1-ILP Processors 2 Motivation
More informationMicro-Architectural Attacks and Countermeasures
Micro-Architectural Attacks and Countermeasures Çetin Kaya Koç koc@cs.ucsb.edu Çetin Kaya Koç http://koclab.org Winter 2017 1 / 25 Contents Micro-Architectural Attacks Cache Attacks Branch Prediction Attack
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationHandout 2 ILP: Part B
Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP
More informationCS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines
CS450/650 Notes Winter 2013 A Morton Superscalar Pipelines 1 Scalar Pipeline Limitations (Shen + Lipasti 4.1) 1. Bounded Performance P = 1 T = IC CPI 1 cycletime = IPC frequency IC IPC = instructions per
More information