The Last Mile: An Empirical Study of Timing Channels on seL4
The Last Mile: An Empirical Study of Timing Channels on seL4
David Cock, Qian Ge, Toby Murray, Gernot Heiser
4 November 2014
NICTA Funding and Supporting Members and Partners
Outline
(The Last Mile, Copyright NICTA 2014, David Cock, Qian Ge, Toby Murray, Gernot Heiser)
seL4 is a verified, high-performance microkernel. We have: a proof of functional correctness, a proof of authority confinement, a proof of explicit information-flow control, and WCET analysis.
We don't have: a comprehensive hardware model.
Covert and Side Channels
[Diagram: RSA and AES computations contending via the cache and ALU.]
Unexpected channels invalidate info-flow control. We can't prove their absence: they depend heavily on undocumented chip internals, and are probabilistic.
Channels of Interest
Channels we consider can subvert our proof (timing channels), could be fixed with OS techniques (e.g. cache contention), and can be exploited in software.
Channels we don't consider: physical attacks (e.g. DPA), and channels already excluded by the proof (e.g. storage channels).
An Experimental Corpus
We exploit 2 principal hardware channels: the L2 cache channel, and the bus contention channel (not covered today). The data also suggests 3 more (2 as-yet-unrecognised).
We evaluate 2 cache-channel countermeasures: instruction-based scheduling, and cache colouring.
We also consider countermeasures against remote, algorithmic channels: Lucky-13 against OpenSSL is our example victim; scheduled delivery is our countermeasure.
Experimental Platforms
i.MX31 (ARM1136JF-S, ARMv6)
E6550 (Conroe, x86-64)
DM3730 (Cortex-A8, ARMv7)
AM3358 (Cortex-A8, ARMv7)
i.MX6 (Cortex-A9, ARMv7)
Exynos4412 (Cortex-A9, ARMv7)
7 years and 3 (ARM) core generations; a 32-fold range of L2 cache sizes.
Methodology
Integrated with the nightly regression test, which runs each channel with each countermeasure (54 combinations): 2,000 hours of data over 12 months, 4.3 GiB. Data and analysis tools are open source: timingchannels.pml
Outline
Cache Contention
[Diagram: cache-line conflict on domain switch.]
If you share a core, you share a cache, and hit vs. miss makes a big time difference. Many published attacks steal keys through the cache. We can control cache allocation (more shortly).
Exynos4412 Cache Channel
[Plot: lines evicted (/10^3) vs. lines touched (/10^3).]
32,768 cache lines, 1000 Hz sample rate (preemption). Bandwidth: 2400 b/s. This is our baseline for comparison.
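The prime-and-probe structure behind a measurement like this can be illustrated with a toy simulation. All parameters, the LRU policy, and the address layout below are illustrative assumptions; the real attack measures access latency under preemption, while this sketch counts misses directly:

```python
class SetAssocCache:
    """Toy physically-indexed, set-associative cache with LRU replacement."""
    def __init__(self, n_sets, ways, line_size=32):
        self.n_sets, self.ways, self.line = n_sets, ways, line_size
        self.sets = [[] for _ in range(n_sets)]  # each set: tags, LRU first

    def touch(self, addr):
        """Access one address; return True on hit, False on miss."""
        line_addr = addr // self.line
        idx = line_addr % self.n_sets          # set selector
        tag = line_addr // self.n_sets
        s = self.sets[idx]
        if tag in s:
            s.remove(tag)
            s.append(tag)                      # move to MRU position
            return True
        if len(s) == self.ways:
            s.pop(0)                           # evict LRU line
        s.append(tag)
        return False

# Geometry resembling a 1 MiB, 16-way L2 with 32 B lines (assumed).
cache = SetAssocCache(n_sets=2048, ways=16)
LINES = 2048 * 16                              # 32,768 lines fill the cache
RECEIVER, SENDER = 0x0, 0x4000000              # disjoint address ranges

def prime(base):
    """Receiver fills the whole cache with its own lines."""
    for i in range(LINES):
        cache.touch(base + i * 32)

def probe(base):
    """Count the receiver's misses; probe in reverse order so the
    receiver's own refills evict already-probed lines, not pending ones."""
    return sum(0 if cache.touch(base + i * 32) else 1
               for i in reversed(range(LINES)))

prime(RECEIVER)
secret = 5000                                  # sender encodes a value
for i in range(secret):                        # ...by touching that many lines
    cache.touch(SENDER + i * 32)
evictions = probe(RECEIVER)                    # receiver reads the value back
```

Under these assumptions the receiver recovers the sender's value exactly (`evictions == secret`); on real hardware the readout is noisy, which is why the slides report a bandwidth rather than a perfect count.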
What is Instruction-Based Scheduling (IBS)?
Our example attack used preemption as a clock. If preemption is tied to progress instead, the channel should vanish. This is a form of deterministic execution.
Advantages: applies to any channel; simple to implement (18 lines in seL4).
Disadvantages: restrictive (need to remove all clocks); performance counter accuracy is critical.
Works great on older chips (i.MX31), but...
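The core idea, preempting on a thread's own instruction count rather than on wall-clock time, can be sketched with a toy round-robin scheduler. Names and structure are illustrative, not seL4's implementation; each `yield` stands in for one retired instruction:

```python
def run_instruction_based(threads, budget):
    """Round-robin scheduler that preempts after `budget` 'instructions'
    (generator yields), so each thread's preemption point depends only
    on its own progress, never on elapsed time."""
    trace = []
    ready = list(threads.items())
    while ready:
        name, t = ready.pop(0)
        for _ in range(budget):
            try:
                next(t)                    # execute one instruction
                trace.append(name)
            except StopIteration:
                break                      # thread finished: drop it
        else:
            ready.append((name, t))        # budget exhausted: preempt

    return trace

def worker(n):
    for _ in range(n):
        yield                              # one 'instruction'

trace = run_instruction_based({"A": worker(5), "B": worker(3)}, budget=2)
# A runs 2, B runs 2, A runs 2, B runs 1 (done), A runs 1 (done)
```

The schedule is a pure function of instruction counts, which is why the attack's preemption clock stops ticking; the slides' point is that real performance-counter interrupts are not this precise.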
Exynos4412 Cache Channel with IBS
[Plot: lines evicted (/10^3) vs. lines touched.]
Preempt after 10^5 instructions. Bandwidth: 400 b/s, a 6x reduction: a very poor result. Event delivery is imprecise thanks to speculation, and the attacker can modulate it!
What is Cache Colouring?
[Diagram: frame number and frame offset vs. set selector and line index bits.]
Caches are divided into sets by physical address: the set selector bits of the address choose the set. These bits (usually) overlap the frame number. If these colour bits differ, the frames cannot collide, and the frame number (physical address) is under OS control.
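The bit arithmetic above can be made concrete. A sketch, assuming power-of-two geometry throughout; the 1 MiB, 16-way, 32 B-line example is chosen to resemble a Cortex-A9-class L2, not a statement about any specific chip:

```python
def num_colours(cache_bytes, ways, line_bytes, page_bytes=4096):
    """Number of page colours: 2^(set-selector bits above the page offset).
    Assumes all sizes are powers of two."""
    sets = cache_bytes // (ways * line_bytes)
    set_index_bits = sets.bit_length() - 1        # log2(sets)
    offset_bits = line_bytes.bit_length() - 1     # log2(line size)
    page_offset_bits = page_bytes.bit_length() - 1
    # Set selector = set index + line offset bits; colour bits are the
    # part of the selector the OS controls via frame allocation.
    colour_bits = max(0, set_index_bits + offset_bits - page_offset_bits)
    return 2 ** colour_bits

# 1 MiB, 16-way, 32 B lines -> 2048 sets; the selector covers physical
# address bits [15:5], the 4 KiB page offset covers [11:0], so bits
# [15:12] are colour bits: 16 colours.
print(num_colours(1 << 20, 16, 32))
```

With 16 colours, the initial task can split RAM into 16 pools that provably never contend for the same L2 sets, which is exactly the delegation story on the next slide.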
Colouring in seL4
In seL4, resource allocation is securely delegated: the initial task splits RAM into coloured pools.
Advantages: very few kernel changes; low overhead.
Disadvantages: only applies to the cache channel; relies on internal details of cache operation; doesn't work with large pages; hashed caches break everything.
Exynos4412 Cache Channel, Partitioned
[Plot: lines evicted (/10^3) vs. lines touched (/10^3).]
Bandwidth: 15 b/s, a 160x reduction. Much better, but not perfect. Where does that 15 b/s come from?
Exynos4412 TLB Channel
[Plots: lines touched and stalls per line, each with and without a TLB flush.]
The average rate correlates with TLB misses. Flushing the TLB on switch removes the signal.
Misprediction and the Cycle Counter
[Plot: HPT ticks vs. branch mispredicts and lines evicted.]
The cycle counter is affected by invisible mispredicts: a new (and unexpected) channel. Event delivery is precise; the cycle counter is wrong.
Outline
The Lucky-13 Attack on TLS
AlFardan & Paterson, IEEE S&P 2013. Exploits non-constant-time MAC calculation: an algorithmic side channel, remotely exploitable. We reproduce the distinguishing attack.
OpenSSL 1.0.1c Response Times
[Plot: probability vs. response time (µs) for messages M0 and M1, vanilla OpenSSL.]
Nearby attacker (crossover cable); modified vs. unmodified packet. Distinguishable with 100% probability.
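The distinguishing test itself is simple: collect response times for the two message classes and ask how well a single threshold separates the two empirical distributions. A sketch on synthetic timings; the Gaussian parameters are stand-ins for measured data, not the talk's numbers:

```python
import random

random.seed(1)

def best_threshold_accuracy(a, b):
    """Best accuracy of the classifier 'time < t => class A', maximised
    over thresholds t drawn from the observed samples."""
    best = 0.5
    for t in sorted(a + b):
        acc = (sum(x < t for x in a) + sum(x >= t for x in b)) \
              / (len(a) + len(b))
        best = max(best, acc, 1 - acc)   # either polarity of the rule
    return best

# Synthetic response times (µs): a 30 µs mean shift, small jitter,
# standing in for M0 vs. M1 against the vulnerable implementation.
m0 = [random.gauss(700.0, 2.0) for _ in range(300)]
m1 = [random.gauss(730.0, 2.0) for _ in range(300)]
print(best_threshold_accuracy(m0, m1))   # ~1.0: fully distinguishable
```

An accuracy near 1.0 corresponds to the "distinguishable with 100% probability" result; the later slides report this same statistic shrinking toward the 50% coin-flip floor as countermeasures are applied.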
The Upstream Fix: Constant-Time Code
Fixed in OpenSSL version 1.0.1e: a constant-time padding/MAC check. Now secure (on x86); we tested on ARM, where a small channel remains.
We present an OS-level solution with better performance: lower latency, lower CPU overhead, and no modification of OpenSSL.
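The principle behind a constant-time check (an illustration of the idea, not OpenSSL's actual patch) is that the comparison must touch every byte regardless of where a mismatch occurs, so its running time carries no information about the data:

```python
import hmac

def naive_equal(a, b):
    """Early-exit comparison: running time leaks the position of the
    first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False          # early exit: the timing leak
    return True

def ct_equal(a, b):
    """Constant-time comparison: time depends only on the length."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y             # accumulate all differences
    return diff == 0

# In practice, prefer the library primitive over rolling your own:
assert ct_equal(b"mac", b"mac") == hmac.compare_digest(b"mac", b"mac")
```

As the slides note, even code written this way can retain a residual channel on some microarchitectures, which motivates the OS-level approach that follows.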
OpenSSL 1.0.1e Response Times
[Plot: probability vs. response time (µs) for M0 and M1, vanilla and constant-time.]
Constant-time implementation: better, but still 62% distinguishable, with a 60 µs (8.1%) latency penalty.
45 What is? in SSL_read handler out SSL_write server Separate OpenSSL and application. Announce packets over IPC. The Last Mile Copyright NICTA 2014 David Cock, Qian Ge, Toby Murray, Gernot Heiser 25/34
46 What is? in 1 SSL_read handler out SSL_write server Separate OpenSSL and application. Announce packets over IPC. Record arrival time. The Last Mile Copyright NICTA 2014 David Cock, Qian Ge, Toby Murray, Gernot Heiser 25/34
47 What is? in 1 SSL_read handler out SSL_write C 2 server Separate OpenSSL and application. Announce packets over IPC. Record arrival time. The Last Mile Copyright NICTA 2014 David Cock, Qian Ge, Toby Murray, Gernot Heiser 25/34
48 What is? in out 1 SSL_read SSL_write 3 handler C 2 server Separate OpenSSL and application. Announce packets over IPC. Record arrival time. The Last Mile Copyright NICTA 2014 David Cock, Qian Ge, Toby Murray, Gernot Heiser 25/34
49 What is? in out 1 SSL_read SSL_write 3 handler 4 2 server Separate OpenSSL and application. C Announce packets over IPC. Record arrival time. Block response on timer. R The Last Mile Copyright NICTA 2014 David Cock, Qian Ge, Toby Murray, Gernot Heiser 25/34
50 What is? in out 1 SSL_read SSL_write 3 handler 4 server Separate OpenSSL and application. C Announce packets over IPC. Record arrival time. Block response on timer. 2 R The Last Mile Copyright NICTA 2014 David Cock, Qian Ge, Toby Murray, Gernot Heiser 25/34
51 What is? in out 1 SSL_read SSL_write 3 handler 4 C 2 server Separate OpenSSL and application. Announce packets over IPC. Record arrival time. Block response on timer. 5 R The Last Mile Copyright NICTA 2014 David Cock, Qian Ge, Toby Murray, Gernot Heiser 25/34
Scheduled Delivery on seL4
We use real-time scheduling to precisely delay messages. This provides an efficient mechanism to enforce a delay policy: see Askarov et al., CCS 2010.
Advantages: uses existing IPC controls, no seL4 modifications; fast and effective (see next slide).
Disadvantages: specific to remote/network attacks; need to wrap (but not modify) the vulnerable component.
Security of Scheduled Delivery
[Plot: probability vs. response time (µs) for vanilla, constant-time, and scheduled-delivery runs.]
More secure: 57% distinguishable. Faster: a 10 µs (1.4%) latency penalty.
Performance of Scheduled Delivery
[Plot: CPU load vs. ingress rate (packets/s) for 1.0.1c, 1.0.1e, and 1.0.1c-sd.]
Lower overhead (2% vs. 11%), but earlier saturation. This is a worst-case benchmark: no server work.
Outline
These channels are real: we managed to exploit every channel we tried, and the bandwidth is high, and growing.
They're getting worse: countermeasures get less effective on newer chips, and new hardware channels appear, e.g. the branch predictor.
But there is hope: resource partitioning (e.g. colouring) is effective, hardware QoS features can be repurposed, and OS-level techniques can help.
An ongoing project at NICTA (Qian Ge) is developing a fully cache-coloured seL4. We use these tools to evaluate the result.
Questions?
Data and tools are open source: TS/timingchannels.pml
Capacity vs. Injected Noise
[Plot: capacity (b) vs. injected noise (b), for uncorrelated and anticorrelated noise.]
Noise injection doesn't scale; increasing determinism is the way to go.
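A back-of-envelope model suggests why noise injection scales so badly. Treating the channel as signal of range S plus injected noise of scale N, with an AWGN-style capacity of roughly log2(1 + S/N) bits per sample (an approximation used purely as an illustration, not the talk's measured curves), the cost of noise grows linearly while the payoff shrinks only logarithmically:

```python
import math

def capacity(signal, noise):
    """AWGN-style capacity approximation, bits per sample."""
    return math.log2(1 + signal / noise)

S = 1000.0
for N in [1, 10, 100, 1000]:
    print(N, round(capacity(S, N), 3))
# Tenfold more noise buys only a few bits each time:
# 1 -> ~10.0 b, 10 -> ~6.7 b, 100 -> ~3.5 b, 1000 -> 1.0 b.
```

Driving capacity toward zero therefore requires noise far larger than the signal itself, whereas removing the timing variation at the source (determinism) eliminates the term entirely.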
1 CIS371 Computer Organization and Design Final Exam Prof. Martin Wednesday, May 2nd, 2012 This exam is an individual-work exam. Write your answers on these pages. Additional pages may be attached (with
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major
More informationImproving Cache Performance
Improving Cache Performance Tuesday 27 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class Memory hierarchy
More informationCPS 104 Computer Organization and Programming Lecture 20: Virtual Memory
CPS 104 Computer Organization and Programming Lecture 20: Virtual Nov. 10, 1999 Dietolf (Dee) Ramm http://www.cs.duke.edu/~dr/cps104.html CPS 104 Lecture 20.1 Outline of Today s Lecture O Virtual. 6 Paged
More informationCS510 Operating System Foundations. Jonathan Walpole
CS510 Operating System Foundations Jonathan Walpole Page Replacement Page Replacement Assume a normal page table (e.g., BLITZ) User-program is executing A PageInvalidFault occurs! - The page needed is
More informationMemory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar
More informationCSE 120. Translation Lookaside Buffer (TLB) Implemented in Hardware. July 18, Day 5 Memory. Instructor: Neil Rhodes. Software TLB Management
CSE 120 July 18, 2006 Day 5 Memory Instructor: Neil Rhodes Translation Lookaside Buffer (TLB) Implemented in Hardware Cache to map virtual page numbers to page frame Associative memory: HW looks up in
More informationTopic 18: Virtual Memory
Topic 18: Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level of indirection
More informationCS 6354: Memory Hierarchy I. 29 August 2016
1 CS 6354: Memory Hierarchy I 29 August 2016 Survey results 2 Processor/Memory Gap Figure 2.2 Starting with 1980 performance as a baseline, the gap in performance, measured as the difference in the time
More informationShadow: Real Applications, Simulated Networks. Dr. Rob Jansen U.S. Naval Research Laboratory Center for High Assurance Computer Systems
Shadow: Real Applications, Simulated Networks Dr. Rob Jansen Center for High Assurance Computer Systems Cyber Modeling and Simulation Technical Working Group Mark Center, Alexandria, VA October 25 th,
More informationAccelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh
Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary
More informationEfficient Memory Integrity Verification and Encryption for Secure Processors
Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas Massachusetts Institute of Technology New Security
More informationMemory Management. Dr. Yingwu Zhu
Memory Management Dr. Yingwu Zhu Big picture Main memory is a resource A process/thread is being executing, the instructions & data must be in memory Assumption: Main memory is infinite Allocation of memory
More informationPick a time window size w. In time span w, are there, Multiple References, to nearby addresses: Spatial Locality
Pick a time window size w. In time span w, are there, Multiple References, to nearby addresses: Spatial Locality Repeated References, to a set of locations: Temporal Locality Take advantage of behavior
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationAs the amount of ILP to exploit grows, control dependences rapidly become the limiting factor.
Hiroaki Kobayashi // As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor. Branches will arrive up to n times faster in an n-issue processor, and providing an instruction
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More informationIntroduction to Operating Systems
Introduction to Operating Systems Lecture 6: Memory Management MING GAO SE@ecnu (for course related communications) mgao@sei.ecnu.edu.cn Apr. 22, 2015 Outline 1 Issues of main memory 2 Main memory management
More informationVirtual Memory #2 Feb. 21, 2018
15-410...The mysterious TLB... Virtual Memory #2 Feb. 21, 2018 Dave Eckhardt Brian Railing 1 L16_VM2 Last Time Mapping problem: logical vs. physical addresses Contiguous memory mapping (base, limit) Swapping
More informationDepartment of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW
Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW 2 SO FAR talked about in-kernel building blocks: threads memory IPC drivers
More informationCSE Computer Architecture I Fall 2011 Homework 07 Memory Hierarchies Assigned: November 8, 2011, Due: November 22, 2011, Total Points: 100
CSE 30321 Computer Architecture I Fall 2011 Homework 07 Memory Hierarchies Assigned: November 8, 2011, Due: November 22, 2011, Total Points: 100 Problem 1: (30 points) Background: One possible organization
More informationVirtual Memory Design and Implementation
Virtual Memory Design and Implementation To do q Page replacement algorithms q Design and implementation issues q Next: Last on virtualization VMMs Loading pages When should the OS load pages? On demand
More informationVirtual Memory: Mechanisms. CS439: Principles of Computer Systems February 28, 2018
Virtual Memory: Mechanisms CS439: Principles of Computer Systems February 28, 2018 Last Time Physical addresses in physical memory Virtual/logical addresses in process address space Relocation Algorithms
More informationInstruction Level Parallelism (Branch Prediction)
Instruction Level Parallelism (Branch Prediction) Branch Types Type Direction at fetch time Number of possible next fetch addresses? When is next fetch address resolved? Conditional Unknown 2 Execution
More informationMove back and forth between memory and disk. Memory Hierarchy. Two Classes. Don t
Memory Management Ch. 3 Memory Hierarchy Cache RAM Disk Compromise between speed and cost. Hardware manages the cache. OS has to manage disk. Memory Manager Memory Hierarchy Cache CPU Main Swap Area Memory
More informationMemory Management Ch. 3
Memory Management Ch. 3 Ë ¾¾ Ì Ï ÒÒØ Å ÔÔ ÓÐÐ 1 Memory Hierarchy Cache RAM Disk Compromise between speed and cost. Hardware manages the cache. OS has to manage disk. Memory Manager Ë ¾¾ Ì Ï ÒÒØ Å ÔÔ ÓÐÐ
More informationChapter 6 Caches. Computer System. Alpha Chip Photo. Topics. Memory Hierarchy Locality of Reference SRAM Caches Direct Mapped Associative
Chapter 6 s Topics Memory Hierarchy Locality of Reference SRAM s Direct Mapped Associative Computer System Processor interrupt On-chip cache s s Memory-I/O bus bus Net cache Row cache Disk cache Memory
More informationComplex Pipelines and Branch Prediction
Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle
More informationChapter 8: Memory-Management Strategies
Chapter 8: Memory-Management Strategies Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and
More informationVirtual Memory. Motivations for VM Address translation Accelerating translation with TLBs
Virtual Memory Today Motivations for VM Address translation Accelerating translation with TLBs Fabián Chris E. Bustamante, Riesbeck, Fall Spring 2007 2007 A system with physical memory only Addresses generated
More informationRecall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory
Recall: Address Space Map 13: Memory Management Biggest Virtual Address Stack (Space for local variables etc. For each nested procedure call) Sometimes Reserved for OS Stack Pointer Last Modified: 6/21/2004
More informationReplacement Policy: Which block to replace from the set?
Replacement Policy: Which block to replace from the set? Direct mapped: no choice Associative: evict least recently used (LRU) difficult/costly with increasing associativity Alternative: random replacement
More informationReducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off
More informationPage 1. Goals for Today" TLB organization" CS162 Operating Systems and Systems Programming Lecture 11. Page Allocation and Replacement"
Goals for Today" CS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement" Finish discussion on TLBs! Page Replacement Policies! FIFO, LRU! Clock Algorithm!! Working Set/Thrashing!
More informationPage 1 ILP. ILP Basics & Branch Prediction. Smarter Schedule. Basic Block Problems. Parallelism independent enough
ILP ILP Basics & Branch Prediction Today s topics: Compiler hazard mitigation loop unrolling SW pipelining Branch Prediction Parallelism independent enough e.g. avoid s» control correctly predict decision
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2018 L20 Virtual Memory Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Questions from last time Page
More informationOPERATING SYSTEMS CS136
OPERATING SYSTEMS CS136 Jialiang LU Jialiang.lu@sjtu.edu.cn Based on Lecture Notes of Tanenbaum, Modern Operating Systems 3 e, 1 Chapter 5 INPUT/OUTPUT 2 Overview o OS controls I/O devices => o Issue commands,
More informationTopics to be covered. EEC 581 Computer Architecture. Virtual Memory. Memory Hierarchy Design (II)
EEC 581 Computer Architecture Memory Hierarchy Design (II) Department of Electrical Engineering and Computer Science Cleveland State University Topics to be covered Cache Penalty Reduction Techniques Victim
More informationSlide 6-1. Processes. Operating Systems: A Modern Perspective, Chapter 6. Copyright 2004 Pearson Education, Inc.
Slide 6-1 6 es Announcements Slide 6-2 Extension til Friday 11 am for HW #1 Previous lectures online Program Assignment #1 online later today, due 2 weeks from today Homework Set #2 online later today,
More informationCS 153 Design of Operating Systems Winter 2016
CS 153 Design of Operating Systems Winter 2016 Lecture 18: Page Replacement Terminology in Paging A virtual page corresponds to physical page/frame Segment should not be used anywhere Page out = Page eviction
More informationChapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs
Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple
More informationMemory Hierarchy. Maurizio Palesi. Maurizio Palesi 1
Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio
More informationMo Money, No Problems: Caches #2...
Mo Money, No Problems: Caches #2... 1 Reminder: Cache Terms... Cache: A small and fast memory used to increase the performance of accessing a big and slow memory Uses temporal locality: The tendency to
More informationCOSC 6385 Computer Architecture - Memory Hierarchy Design (III)
COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses
More informationMemory management, part 2: outline
Memory management, part 2: outline Page replacement algorithms Modeling PR algorithms o Working-set model and algorithms Virtual memory implementation issues 1 Page Replacement Algorithms Page fault forces
More information