Performance is awesome!

Size: px

Start display at page:

Download "Performance is awesome!"

Hugh Willis
5 years ago
Views:

2 Acknowledgements I Background music for the choir song kindly provided by Kerbo-Kev. Cooking photos kindly provided by Becca Lee (ladyfaceblog). Santa Clause images by Some picture components are included from Mickey s Christmas Carol under fair use. 1

3 Acknowledgements II We want to thank our collaborators: Anders Fogh, Benjamin von Berg, Daniel Genkin, Dmitry Evtyushkin, Frank Piessens, Jann Horn, Jo Van Bulck, Mike Hamburg, Paul Kocher, Philipp Ortner, Stefan Mangard, Thomas Prescher, Werner Haas, and Yuval Yarom. The research behind this talk was partially funded by a generous gift from ARM and a generous gift from Intel. Any opinions, findings, and conclusions or recommendations are those of the authors and do not necessarily reflect the views of the funding parties. 2

4 3 Performance is awesome!

5 3 Performance is awesome!

6 Performance is awesome!

7 Performance is awesome! MHz 3

8 Performance is awesome! MHz RISC emulating CISC 3

9 Performance is awesome! MHz RISC emulating CISC 256 KB L2 cache integrated! 3

10 Performance is awesome! MHz RISC emulating CISC 256 KB L2 cache integrated! branch prediction 3

11 Performance is awesome! MHz RISC emulating CISC 256 KB L2 cache integrated! branch prediction out-of-order execution 3

12 More and More Performance The future is going to be fast: 4

13 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches 4

14 Spectres of The Past

15 Side-Channel Attacks Bug-free software does not mean safe execution 5

16 Side-Channel Attacks Bug-free software does not mean safe execution Information leaks due to underlying hardware 5

17 Side-Channel Attacks Bug-free software does not mean safe execution Information leaks due to underlying hardware Exploit leakage through side-effects 5

18 Side-Channel Attacks Bug-free software does not mean safe execution Information leaks due to underlying hardware Exploit leakage through side-effects Power consumption Execution time CPU caches 5

19 Architecture and Microarchitecture Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC,... ) 6

20 Architecture and Microarchitecture Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC,... ) Interface between hardware and software 6

21 Architecture and Microarchitecture Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC,... ) Interface between hardware and software Microarchitecture is an ISA implementation 6

22 Architecture and Microarchitecture Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC,... ) Interface between hardware and software Microarchitecture is an ISA implementation 6

23 Microarchitectural Elements Modern CPUs contain multiple microarchitectural elements 7

24 Microarchitectural Elements Modern CPUs contain multiple microarchitectural elements Caches and buffer Predictor 7

25 Microarchitectural Elements Modern CPUs contain multiple microarchitectural elements Caches and buffer Predictor Transparent for the programmer 7

26 Microarchitectural Elements Modern CPUs contain multiple microarchitectural elements Caches and buffer Predictor Transparent for the programmer Timing optimizations side-channel leakage 7

27 8 Food Cache

28 8 Food Cache

29 8 Food Cache

30 8 Food Cache

31 8 Food Cache

32 8 Food Cache

33 8 Food Cache

35 Revolutionary concept! Store your food at home, never go to the grocery store during cooking. Can store ALL kinds of food. ONLY TODAY INSTEAD OF $1,300 ORDER VIA PHONE:

38 CPU Cache printf("%d", i); printf("%d", i); 9

39 CPU Cache printf("%d", i); printf("%d", i); Cache miss 9

40 CPU Cache printf("%d", i); Cache miss Request printf("%d", i); 9

41 CPU Cache printf("%d", i); printf("%d", i); Cache miss Request Response 9

42 CPU Cache printf("%d", i); printf("%d", i); Cache miss i Request Response 9

43 CPU Cache printf("%d", i); printf("%d", i); Cache miss Cache hit i Request Response 9

44 CPU Cache DRAM access, slow printf("%d", i); printf("%d", i); Cache miss Cache hit i Request Response 9

45 CPU Cache DRAM access, slow printf("%d", i); printf("%d", i); Cache miss Cache hit No DRAM access, much faster i Request Response 9

46 Caching speeds up Memory Accesses Cache Hits Number of accesses Access time [CPU cycles] 10

47 Caching speeds up Memory Accesses Cache Hits Cache Misses Number of accesses Access time [CPU cycles] 10

48 Flush+Reload Attacker flush access Shared Memory Victim access 11

49 Flush+Reload Attacker flush access cached Shared Memory Shared Memory cached Victim access 11

50 Flush+Reload Attacker flush access Shared Memory Shared Memory Victim access 11

51 Flush+Reload Attacker flush access Shared Memory Victim access 11

52 Flush+Reload Attacker flush access Shared Memory Victim access 11

53 Flush+Reload Attacker flush access Shared Memory Shared Memory Victim access 11

54 Flush+Reload Attacker flush access Shared Memory Shared Memory Victim access 11

55 Flush+Reload Attacker flush access Shared Memory Shared Memory Victim access vs 11 Victim accessed (fast) Victim did not access (slow)

56 Attacks Just by looking at cache hits/misses, we can... 12

57 Attacks Just by looking at cache hits/misses, we can... Leak AES keys from the cache 12

58 Attacks Just by looking at cache hits/misses, we can... Leak AES keys from the cache Leak keystroke timings via the cache 12

59 Attacks Just by looking at cache hits/misses, we can... Leak AES keys from the cache Leak keystroke timings via the cache Covertly send data through the cache 12

60 Attacks Just by looking at cache hits/misses, we can... Leak AES keys from the cache Leak keystroke timings via the cache Covertly send data through the cache Browser, Cloud, TEEs,... 12

62 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches 13

63 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches Intel more out-of-order parallelism 13

74 Spectres of The Present

75 Out-of-Order Execution int width = 10, height = 5; float diagonal = sqrt ( width * width + height * height ); int area = width * height ; printf (" Area %d x %d = %d\n", width, height, area ); 14

76 Out-of-Order Execution Dependency int width = 10, height = 5; float diagonal = sqrt ( width * width + height * height ); int area = width * height ; Parallelize printf (" Area %d x %d = %d\n", width, height, area ); 14

77 Out-of-Order Execution L1 Instruction Cache ITLB Frontend Branch Predictor µop Cache µops Instruction Fetch & PreDecode Instruction Queue 4-Way Decode µop µop µop µop CDB Execution Engine MUX Allocation Queue µop µop µop µop Reorder buffer µop µop µop µop µop µop µop µop Scheduler µop µop µop µop µop µop µop µop ALU, AES,... ALU, FMA,... ALU, Vect,... ALU, Branch Load data Load data Store data AGU Execution Units Instructions are fetched and decoded in the front-end Memory Subsystem Load Buffer Store Buffer DTLB L1 Data Cache STLB L2 Cache 15

78 Out-of-Order Execution L1 Instruction Cache ITLB Frontend Branch Predictor µop Cache µops Instruction Fetch & PreDecode Instruction Queue 4-Way Decode µop µop µop µop CDB Execution Engine MUX Allocation Queue µop µop µop µop Reorder buffer µop µop µop µop µop µop µop µop Scheduler µop µop µop µop µop µop µop µop ALU, AES,... ALU, FMA,... ALU, Vect,... ALU, Branch Load data Load data Store data AGU Execution Units Instructions are fetched and decoded in the front-end dispatched to the backend Memory Subsystem Load Buffer Store Buffer DTLB L1 Data Cache STLB L2 Cache 15

79 Out-of-Order Execution L1 Instruction Cache ITLB Frontend Branch Predictor µop Cache µops Instruction Fetch & PreDecode Instruction Queue 4-Way Decode µop µop µop µop CDB Execution Engine MUX Allocation Queue µop µop µop µop Reorder buffer µop µop µop µop µop µop µop µop Scheduler µop µop µop µop µop µop µop µop ALU, AES,... ALU, FMA,... ALU, Vect,... ALU, Branch Load data Load data Store data AGU Execution Units Instructions are fetched and decoded in the front-end dispatched to the backend processed by individual execution units Memory Subsystem Load Buffer Store Buffer DTLB L1 Data Cache STLB L2 Cache 15

80 Building the Code An experiment *( volatile char *) 0; array [84 * 4096] = 0; 16

81 Building the Code An experiment *( volatile char *) 0; array [84 * 4096] = 0; volatile because compiler was not happy warning: statement with no effect [- Wunused - value ] *( char *) 0; 16

82 Building the Code An experiment *( volatile char *) 0; array [84 * 4096] = 0; volatile because compiler was not happy warning: statement with no effect [- Wunused - value ] *( char *) 0; Static code analyzer is still not happy warning: Dereference of null pointer *( volatile char *) 0; 16

83 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page 17

84 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page Unreachable code line was actually executed 17

85 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page Unreachable code line was actually executed Exception was only thrown afterwards 17

86 Building the Code Out-of-order instructions leave microarchitectural traces 18

87 Building the Code Out-of-order instructions leave microarchitectural traces We can see them for example in the cache 18

88 Building the Code Out-of-order instructions leave microarchitectural traces We can see them for example in the cache We call them transient instructions 18

89 Building the Code Out-of-order instructions leave microarchitectural traces We can see them for example in the cache We call them transient instructions Execution indirectly observable 18

90 19 Loading an address

91 19 Loading an address

92 19 Loading an address

93 19 Loading an address

94 19 Loading an address

95 19 Loading an address

96 Building the Code Add another layer of indirection to test char data = *( char *) 0 xffffffff81a000e0 ; array [ data * 4096] = 0; 20

97 Building the Code Add another layer of indirection to test char data = *( char *) 0 xffffffff81a000e0 ; array [ data * 4096] = 0; Then check if any part of array is cached 20

98 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page Index of cache hit reveals data 21

99 Building the Code Flush+Reload over all pages of the array Access time [cycles] Page Index of cache hit reveals data Permission check fails sometimes 21

100

101

102 Meltdown Mitigation Kernel addresses in user space are a problem 23

103 Meltdown Mitigation Kernel addresses in user space are a problem Why don t we take the kernel addresses... 23

104 Meltdown Mitigation...and remove them if not needed? 24

105 Meltdown Mitigation...and remove them if not needed? User accessible check in hardware is not reliable 24

106 Meltdown-US Mitigation Unmap the kernel in user space 25

107 Meltdown-US Mitigation Unmap the kernel in user space Kernel addresses are then no longer present 25

108 Meltdown-US Mitigation Unmap the kernel in user space Kernel addresses are then no longer present Memory which is not mapped cannot be accessed at all 25

109 KAISER Userspace Kernelspace Applications Operating Memory System 26

110 KAISER Kernel View Userspace Applications User View Kernelspace Operating Memory System Userspace Applications context switch 27 Kernelspace

111 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 L1 Cache 28

112 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 present L1 Cache 28

113 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 present Guest Physical to Host Physical L1 Cache 28

114 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 present Guest Physical to Host Physical Physical Page L1 Cache L1 lookup with physical address 28

115 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 not present L1 Cache 28

116 Meltdown-P (aka Foreshadow-NG) Page Table PTE 0 PTE 1 PTE #PTI PTE 511 not present L1 lookup with virtual address L1 Cache 28

117 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches Intel more ports, more parallelism, larger reorder buffer 29

118 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches Intel more ports, more parallelism, larger reorder buffer AMD perceptron-based prediction mechanisms 29

119

120 Speculative Execution Let us get rid of bottlenecks 31

121 Speculative Execution Use the naughty/nice list of last year 31

122 Speculative Execution Finally, check predictions with list of this year 31

123 Speculative Execution Throwing away wrongly manufactured presents 31

124 Speculative Execution Correct predictions result in free time 31

125

126 Spectre-PHT (aka Spectre Variant 1) LUT index = 0; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

127 Spectre-PHT (aka Spectre Variant 1) LUT index = 0; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

128 Spectre-PHT (aka Spectre Variant 1) LUT index = 0; char* data = "textkey"; if (index < 4) el se en h t Speculate Prediction LUT[data[index] * 4096] 33 0

129 Spectre-PHT (aka Spectre Variant 1) LUT index = 0; char* data = "textkey"; if (index < 4) Execute Index 't' el se en h t Prediction LUT[data[index] * 4096] 33 0

130 Spectre-PHT (aka Spectre Variant 1) LUT index = 1; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

131 Spectre-PHT (aka Spectre Variant 1) LUT index = 1; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

132 Spectre-PHT (aka Spectre Variant 1) LUT index = 1; char* data = "textkey"; if (index < 4) Index 'e' Speculate el en h t se Prediction LUT[data[index] * 4096] 33 0

133 Spectre-PHT (aka Spectre Variant 1) LUT index = 1; char* data = "textkey"; if (index < 4) Index e then Prediction else 33 LUT[data[index] * 4096] 0

134 Spectre-PHT (aka Spectre Variant 1) LUT index = 2; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

135 Spectre-PHT (aka Spectre Variant 1) LUT index = 2; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

136 Spectre-PHT (aka Spectre Variant 1) LUT index = 2; char* data = "textkey"; if (index < 4) Speculate el se en h t Index 'x' Prediction LUT[data[index] * 4096] 33 0

137 Spectre-PHT (aka Spectre Variant 1) LUT index = 2; char* data = "textkey"; if (index < 4) 33 Index x then Prediction else LUT[data[index] * 4096] 0

138 Spectre-PHT (aka Spectre Variant 1) LUT index = 3; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

139 Spectre-PHT (aka Spectre Variant 1) LUT index = 3; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

140 Spectre-PHT (aka Spectre Variant 1) LUT index = 3; char* data = "textkey"; if (index < 4) Speculate Index 't' el se en h t Prediction LUT[data[index] * 4096] 33 0

141 Spectre-PHT (aka Spectre Variant 1) LUT index = 3; char* data = "textkey"; if (index < 4) Index t then Prediction else 33 LUT[data[index] * 4096] 0

142 Spectre-PHT (aka Spectre Variant 1) LUT index = 4; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

143 Spectre-PHT (aka Spectre Variant 1) LUT index = 4; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

144 Spectre-PHT (aka Spectre Variant 1) LUT Index 'K' index = 4; char* data = "textkey"; if (index < 4) Speculate el en h t se Prediction LUT[data[index] * 4096] 33 0

145 Spectre-PHT (aka Spectre Variant 1) LUT Index 'K' index = 4; char* data = "textkey"; if (index < 4) Execute el n he se t Prediction LUT[data[index] * 4096] 33 0

146 Spectre-PHT (aka Spectre Variant 1) LUT index = 5; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

147 Spectre-PHT (aka Spectre Variant 1) LUT index = 5; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

148 Spectre-PHT (aka Spectre Variant 1) LUT Index 'E' index = 5; char* data = "textkey"; if (index < 4) Speculate el en h t se Prediction LUT[data[index] * 4096] 33 0

149 Spectre-PHT (aka Spectre Variant 1) LUT Index 'E' index = 5; char* data = "textkey"; if (index < 4) Execute el en h t se Prediction LUT[data[index] * 4096] 33 0

150 Spectre-PHT (aka Spectre Variant 1) LUT index = 6; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

151 Spectre-PHT (aka Spectre Variant 1) LUT index = 6; char* data = "textkey"; if (index < 4) then Prediction else 33 LUT[data[index] * 4096] 0

152 Spectre-PHT (aka Spectre Variant 1) LUT Index 'Y' index = 6; char* data = "textkey"; if (index < 4) Speculate el en h t se Prediction LUT[data[index] * 4096] 33 0

153 Spectre-PHT (aka Spectre Variant 1) LUT Index 'Y' index = 6; char* data = "textkey"; if (index < 4) Execute el en h t se Prediction LUT[data[index] * 4096] 33 0

154 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() fly() swim() swim() Prediction LUT[data[index] * 4096] 0 34

155 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() ) y( l f swim() Speculate sw im () Prediction LUT[data[index] * 4096] 34 0

156 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() fly() swim() swim() Prediction LUT[data[index] * 4096] 0 34

157 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() Execute ) y( swim() fl sw im () Prediction LUT[data[index] * 4096] 34 0

158 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() fly() fly() swim() Prediction LUT[data[index] * 4096] 0 34

159 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() Speculate ) y( fly() fl sw im () Prediction LUT[data[index] * 4096] 34 0

160 Spectre-BTB (aka Spectre Variant 2) Animal* a = bird; a->move() fly() fly() swim() Prediction LUT[data[index] * 4096] 0 34

161 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() fly() fly() swim() Prediction LUT[data[index] * 4096] 0 34

162 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() Speculate ) y( l f fly() sw im () Prediction LUT[data[index] * 4096] 34 0

163 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() fly() fly() swim() Prediction LUT[data[index] * 4096] 0 34

164 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() ) y( l f fly() Execute sw im () Prediction LUT[data[index] * 4096] 34 0

165 Spectre-BTB (aka Spectre Variant 2) Animal* a = fish; a->move() fly() swim() swim() Prediction LUT[data[index] * 4096] 0 34

166 Spectre-RSB function()... Victim Attacker RSB 35

167 Spectre-RSB function()... reg = secret Victim reg = dummy Attacker RSB 35

168 Spectre-RSB function()... reg = secret Victim call function(short) reg = dummy Attacker RSB 35 &victim

169 Spectre-RSB function()... reg = secret Victim call function(short) Attacker reg = dummy call function(long) data[reg * 4096] RSB 35 &attacker &victim

170 Spectre-RSB function()... reg = secret Victim call function(short) Attacker reg = dummy call function(long) data[reg * 4096] RSB 35 &attacker &victim

171 Spectre-RSB function()... reg = secret Victim call function(short) Attacker reg = dummy call function(long) data[reg * 4096] RSB 35 &victim

172 Spectre-RSB function()... reg = secret Victim call function(short) Attacker reg = dummy call function(long) data[reg * 4096] RSB 35 &victim

173 Meltdown vs. Spectre operation #n time 36

174 Meltdown vs. Spectre operation #n prediction time 36

175 Meltdown vs. Spectre operation #n prediction predict CF/DF operation #n+2 time 36

176 Meltdown vs. Spectre operation #n prediction predict CF/DF possibly architectural operation #n+2 transient execution time 36

177 Meltdown vs. Spectre operation #n retire predict CF/DF possibly architectural prediction operation #n+2 transient execution time 36

178 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction predict CF/DF possibly architectural prediction operation #n+2 transient execution retire time 36

179 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction predict CF/DF possibly architectural prediction operation #n+2 transient execution retire retire time 36

180 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction operation #n prediction retire predict CF/DF possibly architectural operation #n+2 transient execution retire time time 36

181 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction operation #n prediction retire data predict CF/DF possibly architectural operation #n+2 transient execution retire time time 36

182 Meltdown vs. Spectre operation #n retire flush pipeline on wrong prediction operation #n prediction retire data predict CF/DF possibly architectural operation #n+2 transient execution retire time data dependency operation #n+2 time 36

183 Meltdown vs. Spectre operation #n predict CF/DF possibly architectural retire prediction operation #n+2 transient execution flush pipeline on wrong prediction retire retire time possibly architectural operation #n data exception retire data dependency operation #n+2 transient execution time 36

184 Meltdown vs. Spectre operation #n predict CF/DF possibly architectural retire prediction operation #n+2 transient execution flush pipeline on wrong prediction retire retire time possibly architectural operation #n data exception retire data dependency operation #n+2 transient execution time 36

185 Meltdown vs. Spectre operation #n predict CF/DF possibly architectural retire prediction operation #n+2 transient execution flush pipeline on wrong prediction retire retire time possibly architectural operation #n data exception retire data dependency Meltdown operation #n+2 transient execution time 36

186 Meltdown vs. Spectre operation #n predict CF/DF possibly architectural retire prediction operation #n+2 transient execution flush pipeline on wrong prediction retire retire time possibly architectural operation #n data exception retire data dependency Meltdown operation #n+2 transient execution raise retire time 36

187 More and More Performance The future is going to be fast: Apple A12 Bionic (iphone X): 16 KB pages 128 KB caches Intel more ports, more parallelism, larger reorder buffer AMD perceptron-based prediction mechanisms 37

188 Spectres of The Future

189 Intel MPK Protection key for a group of pages 38

190 Intel MPK Protection key for a group of pages 4 bits in PTE identify key for protected memory regions 38

191 Intel MPK Protection key for a group of pages 4 bits in PTE identify key for protected memory regions Quick update of access rights 38

192 Meltdown-PK Protection keys are lazily enforced 39

193 Meltdown-PK Protection keys are lazily enforced Protected value is forwarded to transient instructions 39

194 Meltdown-BR x86 provides dedicated instruction raising #BR exception if bound-range is exceeded 40

195 Meltdown-BR x86 provides dedicated instruction raising #BR exception if bound-range is exceeded Data used in transient execution 40

196 Meltdown-BR x86 provides dedicated instruction raising #BR exception if bound-range is exceeded Data used in transient execution Attacker determines accessed cache line using Flush+Reload 40

197 40 Meltdown-BR

198 40 Meltdown-BR

199 40 Meltdown-BR

200 Meltdown-BR x86 provides dedicated instruction raising #BR exception if bound-range is exceeded Data used in transient execution Attacker determines accessed cache line using Flush+Reload First Meltdown-type attack on AMD 40

201 41 Vulnerable Vendors Vendor Attack Intel ARM AMD Meltdown-US Meltdown-P Meltdown-GP Meltdown-NM Meltdown-RW Meltdown-PK Meltdown-BR Meltdown-DE Meltdown-AC Meltdown-UD Meltdown-SS Meltdown-XD Meltdown-SM

202 Meltdown Defense Categorization Meltdown defenses in 2 categories: 42

203 Meltdown Defense Categorization Meltdown defenses in 2 categories: D1 Architecturally inaccessible data is also microarchitecturally inaccessible 42

204 Meltdown Defense Categorization Meltdown defenses in 2 categories: D1 Architecturally inaccessible data is also microarchitecturally inaccessible D2 Preventing occurrence of faults 42

205 43 Meltdown-P Mitigation

206 Meltdown-P Mitigation Clear phyiscal address field of unmapped PTEs 43

207 Meltdown-P Mitigation Clear phyiscal address field of unmapped PTEs Flush L1 upon switching protection domains 43

208 Demo Foreshadow-NG

209 Transient Execution Attacks: Classification Transient cause? 44

210 Transient Execution Attacks: Classification Spectre-type prediction Transient cause? fault Meltdown-type 44

211 Transient Execution Attacks: Classification Spectre-PHT Spectre-BTB Spectre-type Spectre-RSB Spectre-STL prediction Transient cause? fault Meltdown-type 44

212 Spectre: Mistraining Strategies Victim same address space/ in place Spectrevulnerable branch 45

213 Spectre: Mistraining Strategies Victim same address space/ out of place Aliased branch Address collision same address space/ in place Spectrevulnerable branch 45

214 Spectre: Mistraining Strategies Victim same address space/ out of place Aliased branch Address collision same address space/ in place Spectrevulnerable branch Shared Branch Prediction State 45

215 Spectre: Mistraining Strategies Victim Attacker same address space/ out of place Aliased branch Address collision same address space/ in place Spectrevulnerable branch Shared Branch Prediction State 45

216 Spectre: Mistraining Strategies Victim Attacker same address space/ out of place Aliased branch Address collision same address space/ in place Spectrevulnerable branch Same address cross address space/ in place Shared Branch Prediction State 45

217 Spectre: Mistraining Strategies Victim Attacker same address space/ out of place Aliased branch Aliased address cross address space/ out of place Address collision Address collision same address space/ in place Spectrevulnerable branch Same address cross address space/ in place Shared Branch Prediction State 45

218 Spectre Mistraining: Vulnerable Vendors Method Attack Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL

219 Spectre Mistraining: Vulnerable Vendors Intel Method same-address-space cross-address-space Attack in-place out-of-place in-place out-of-place Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL

220 Spectre Mistraining: Vulnerable Vendors Intel ARM Method same-address-space cross-address-space same-address-space cross-address-space Attack in-place out-of-place in-place out-of-place in-place out-of-place in-place out-of-place Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL

221 Spectre Mistraining: Vulnerable Vendors Intel ARM AMD Method same-address-space cross-address-space same-address-space cross-address-space same-address-space cross-address-space Attack in-place out-of-place in-place out-of-place in-place out-of-place in-place out-of-place in-place out-of-place in-place out-of-place Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL 46

222 Super Effective Solution: Drilling template 47 Drilling template

223 Spectre Defense Categorization Spectre defenses in 3 categories: 48

224 Spectre Defense Categorization Spectre defenses in 3 categories: C1 Mitigate or reduce accuracy of covert channels 48

225 Spectre Defense Categorization Spectre defenses in 3 categories: C1 Mitigate or reduce accuracy of covert channels C2 Mitigate or abort speculation 48

226 Spectre Defense Categorization Spectre defenses in 3 categories: C1 Mitigate or reduce accuracy of covert channels C2 Mitigate or abort speculation C3 Ensure secret cannot be reached 48

227 Spectre Defenses: Microarchitectural Target Microarchitectural Element Defense Cache TLB BTB BHB PHT RSB AVX FPU Execution Ports InvisiSpec SafeSpec DAWG RSB Stuffing Retpoline SLH YSNB IBRS Taint Tracking Timer Reduction STIPB IBPB Serialization Sloth SSBD/SSBB Poison Value Index Masking Site Isolation Category: C1 C2 C3 49

228 Site Isolation Each site executed in its own process 50

229 Site Isolation Each site executed in its own process limits amount of data that is exposed 50

230 Site Isolation Each site executed in its own process limits amount of data that is exposed Chrome 67: default, Firefox: work in progress 50

231 Serialization Insert instructions stopping speculation 51

232 Serialization Insert instructions stopping speculation insert after every bounds check 51

233 Serialization Insert instructions stopping speculation insert after every bounds check x86: LFENCE, ARM: CSDB with conditional selects or moves 51

234 InvisiSpec Make transient loads invisible in the cache hierarchy 52

235 InvisiSpec Make transient loads invisible in the cache hierarchy all transient loads use a speculative buffer 52

236 InvisiSpec Make transient loads invisible in the cache hierarchy all transient loads use a speculative buffer Correct prediction: buffer content loaded into cache 52

237 InvisiSpec Make transient loads invisible in the cache hierarchy all transient loads use a speculative buffer Correct prediction: buffer content loaded into cache Wrong prediction: transient load is reverted 52

238 Spectre: Defense Analysis Attack Defense InvisiSpec DAWG SafeSpec RSB Stuffing Retpoline Poison Value Index Masking Site Isolation SLH YSNB IBRS STIPB IBPB Serialization Timer Reduction Sloth Taint Tracking SSBD/SSBB

239 Spectre: Defense Analysis Attack Defense InvisiSpec DAWG SafeSpec RSB Stuffing Retpoline Poison Value Index Masking Site Isolation SLH YSNB IBRS STIPB IBPB Serialization Timer Reduction Sloth Taint Tracking SSBD/SSBB Intel Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL

240 Spectre: Defense Analysis Attack Defense InvisiSpec DAWG SafeSpec RSB Stuffing Retpoline Poison Value Index Masking Site Isolation SLH YSNB IBRS STIPB IBPB Serialization Timer Reduction Sloth Taint Tracking SSBD/SSBB Intel ARM Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL

241 Spectre: Defense Analysis Attack Defense InvisiSpec DAWG SafeSpec RSB Stuffing Retpoline Poison Value Index Masking Site Isolation SLH YSNB IBRS STIPB IBPB Serialization Timer Reduction Sloth Taint Tracking SSBD/SSBB Intel ARM AMD Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL 53

242

243

244

245

246 Transient Execution Attacks: Classification Transient cause? 54

247 Transient Execution Attacks: Classification Spectre-type prediction Transient cause? fault Meltdown-type 54

248 Transient Execution Attacks: Classification Spectre-type microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL prediction Transient cause? fault Meltdown-type 54

249 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space Meltdown-type 54

250 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space in-place (IP) vs., out-of-place (OP) PHT-SA-IP BTB-CA-IP BTB-CA-OP RSB-CA-IP Meltdown-type RSB-CA-OP RSB-SA-IP RSB-SA-OP 54

251 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP Meltdown-type RSB-CA-OP RSB-SA-IP RSB-SA-OP 54

252 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault microarchitectural buffer Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP Meltdown-type Meltdown-PF RSB-CA-OP RSB-SA-IP RSB-SA-OP fault type Meltdown-BR 54 Meltdown-GP

253 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault Meltdown-type microarchitectural buffer fault type Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM Meltdown-AC. Meltdown-DE. Meltdown-PF Meltdown-UD. Meltdown-SS. Meltdown-BR mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP RSB-CA-OP RSB-SA-IP RSB-SA-OP 54 Meltdown-GP

254 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault Meltdown-type microarchitectural buffer fault type Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM Meltdown-AC. Meltdown-DE. Meltdown-PF Meltdown-UD. Meltdown-SS. mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space Meltdown-US Meltdown-P Meltdown-RW in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP RSB-CA-OP RSB-SA-IP RSB-SA-OP Meltdown-BR Meltdown-MPX 54 Meltdown-GP

255 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault Meltdown-type microarchitectural buffer fault type Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM Meltdown-AC. Meltdown-DE. Meltdown-PF Meltdown-UD. Meltdown-SS. mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space Meltdown-US Meltdown-P Meltdown-RW Meltdown-XD. Meltdown-SM. in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP RSB-CA-OP RSB-SA-IP RSB-SA-OP Meltdown-BR Meltdown-MPX 54 Meltdown-GP

256 Transient Execution Attacks: Classification Transient cause? Spectre-type prediction fault Meltdown-type microarchitectural buffer fault type Spectre-PHT Spectre-BTB Spectre-RSB Spectre-STL Meltdown-NM Meltdown-AC. Meltdown-DE. Meltdown-PF Meltdown-UD. Meltdown-SS. mistraining strategy Cross-address-space Same-address-space Cross-address-space Same-address-space Cross-address-space Same-address-space Meltdown-US Meltdown-P Meltdown-RW Meltdown-PK Meltdown-XD. Meltdown-SM. in-place (IP) vs., out-of-place (OP) PHT-CA-IP PHT-CA-OP PHT-SA-IP PHT-SA-OP BTB-CA-IP BTB-CA-OP BTB-SA-IP BTB-SA-OP RSB-CA-IP RSB-CA-OP RSB-SA-IP RSB-SA-OP 54 Meltdown-BR Meltdown-GP Meltdown-MPX Meltdown-BND

257 Lessons learned We have ignored microarchitectural attacks for many years: 55

258 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto 55

259 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed 55

260 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR 55

261 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway 55

262 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs 55

263 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs not within threat model 55

264 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs not within threat model Rowhammer 55

265 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs not within threat model Rowhammer only some cheap modules 55

266 Lessons learned We have ignored microarchitectural attacks for many years: attacks on crypto software should be fixed attacks on ASLR ASLR is broken anyway attacks on TEEs not within threat model Rowhammer only some cheap modules for years we solely optimized for performance 55

267 56 Conclusion

268 Conclusion Optimizations always come at a cost 56

269 Conclusion Optimizations always come at a cost Some mitigations cost more than gained by the feature they defend 56

270 Conclusion Optimizations always come at a cost Some mitigations cost more than gained by the feature they defend Transient-execution attacks will keep us busy for a while 56

271 Moritz Lipp Michael Schwarz Claudio Canella Daniel Gruss

Software-based Microarchitectural Attacks

Software-based Microarchitectural Attacks Daniel Gruss January 23, 2019 Graz University of Technology Frame Rate of Slide Deck: 11 fps 1 Daniel Gruss Graz University of Technology 1337 4242 Revolutionary