Silent Shredder: Zero-Cost Shredding For Secure Non-Volatile Main Memory Controllers

Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers. ASPLOS 2016, 2-6 April. Amro Awad (NC State University), Pratyusa Manadhata (Hewlett Packard Labs), Yan Solihin (NC State University), Stuart Haber (Hewlett Packard Labs), William Horne (Hewlett Packard Labs).

Outline: Background, Related Work, Goal, Design, Evaluation, Summary.

Emerging NVMs: Emerging NVMs are promising replacements for DRAM. They are fast (comparable to DRAM), dense, and non-volatile (persistent memory, no refresh power). Examples: Phase-Change Memory (PCM), Memristor. Source: http://www.techweekeurope.co.uk/

Emerging NVMs: NVMs have their drawbacks: limited endurance (e.g., PCM endures ~10^8 writes per cell), slow writes (e.g., PCM has ~150ns write latency), and data remanence attacks are easier. Requirements for using NVMs: encrypt data, and reduce the number of writes, e.g., with DCW and Flip-N-Write. However, encryption reduces the efficiency of DCW and Flip-N-Write.
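The interaction between encryption and these write-reduction schemes can be sketched: DCW (Data-Comparison Write) writes only the bits that differ, and Flip-N-Write stores the complement plus one flip bit whenever that changes fewer cells. Encrypted data looks random, so roughly half the bits differ on every write, which erodes both schemes' savings. A minimal sketch, with illustrative function names and assuming the old word was stored un-flipped:

```python
def dcw_bit_writes(old: int, new: int, width: int = 64) -> int:
    # Data-Comparison Write: only cells whose stored value changes are written.
    return bin((old ^ new) & ((1 << width) - 1)).count("1")

def flip_n_write_bit_writes(old: int, new: int, width: int = 64) -> int:
    # Flip-N-Write: store the complement (plus one flip bit) whenever that
    # flips fewer cells, bounding bit writes to width // 2 + 1.
    mask = (1 << width) - 1
    direct = dcw_bit_writes(old, new, width)
    flipped = dcw_bit_writes(old, ~new & mask, width) + 1  # +1 for the flip bit
    return min(direct, flipped)
```

With plaintext, small updates change few cells (e.g., `dcw_bit_writes(0b1010, 0b1000, 4)` is 1); with encrypted data, old and new ciphertexts are uncorrelated, so DCW saves little and Flip-N-Write hovers near its width/2 bound.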

Data Shredding: the operation of zeroing out memory to avoid data leaks. It prevents data leaks between processes or virtual machines. It is expensive: up to 40% of page-fault time can be spent zeroing pages, and for the tested graph analytics apps, about 41.9% of memory writes result from shredding.

Example of Data Shredding: 1- a process requests a page allocation from the OS; 2- the OS zeroes out the page. Likewise, 3- a VM requests an allocation from the hypervisor; 4- the hypervisor zeroes out the memory in NVM before handing it over.

How to implement shredding? The slide compares techniques on five criteria: no cache pollution, low processor time, no bus traffic, no memory writes, and persistence. Regular stores (persistent only indirectly), non-temporal stores, DMA-supported non-temporal bulk zeroing [Jiang, PACT 2009], and RowClone (DRAM-specific) [Seshadri, MICRO 2013] each satisfy only a subset. Can we shred without writing at all?

Threat Model: the attacker has physical access to the memory and can snoop the memory bus.

Encryption/Decryption Process: CTR-mode encryption/decryption. 1- A cache line misses in the last-level cache (LLC). 2- The memory controller retrieves the line's unique initialization vector (IV) from the secure area. 3- It generates a one-time pad (OTP) from the encryption key and the IV while submitting the read request to the NVM. 4- It receives the ciphertext from the NVM. 5- It XORs the ciphertext with the OTP and returns the decrypted line. The IV must change every time new data is encrypted. Key insight: the IV used for encryption must equal the IV used for decryption.
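The flow above can be sketched in a few lines. A real memory encryption engine expands the key and IV into the pad with AES; SHA-256 stands in here as the pseudorandom function so the sketch is self-contained, and all names are illustrative:

```python
import hashlib

def one_time_pad(key: bytes, iv: bytes, length: int = 64) -> bytes:
    # The OTP depends only on the key and IV, not on the data, so it can be
    # generated in parallel with the NVM read (step 3 above).
    pad = b""
    block = 0
    while len(pad) < length:
        pad += hashlib.sha256(key + iv + block.to_bytes(4, "big")).digest()
        block += 1
    return pad[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key, iv = b"encryption-key", b"per-line-iv"
plaintext = b"cache line data".ljust(64, b"\x00")  # one 64B cache line
ciphertext = xor_bytes(plaintext, one_time_pad(key, iv))
# Decryption works only because the same IV is used on both sides.
assert xor_bytes(ciphertext, one_time_pad(key, iv)) == plaintext
```

Decrypting with any other IV produces an unintelligible block, which is the property the rest of the talk builds on.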

Initialization Vectors: We use the Split-Counter scheme [C. Yan, ISCA 2006]. A 4KB page holds 64 cache lines of 512 bits each. Each page has a 64-bit major counter, and each cache line has a 7-bit minor counter. A line's IV is composed of the major counter (per page), the line's minor counter, the cache line address, and padding.
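A sketch of how the per-line IV might be packed from the split counters. The major and minor widths are from the slide; the address width and the 128-bit total are assumptions made only to complete the example:

```python
MAJOR_BITS, MINOR_BITS, ADDR_BITS = 64, 7, 42   # ADDR_BITS is assumed
PAD_BITS = 128 - (MAJOR_BITS + MINOR_BITS + ADDR_BITS)  # zero padding

def build_iv(major: int, minor: int, line_addr: int) -> int:
    # Pack major || minor || cache line address || padding into one IV.
    # Bumping either counter changes the IV, and hence the one-time pad.
    assert minor < (1 << MINOR_BITS) and line_addr < (1 << ADDR_BITS)
    iv = major & ((1 << MAJOR_BITS) - 1)
    iv = (iv << MINOR_BITS) | minor
    iv = (iv << ADDR_BITS) | line_addr
    return iv << PAD_BITS
```

Two lines of the same page share the major counter but differ in line address; incrementing the shared major counter changes every line's IV, and thus every line's pad, at once.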

Typical Shredding: With non-temporal bulk shredding of page X, the memory controller reads and updates the counters in the counter cache, encrypts the zero blocks, and writes the encrypted lines of page X to the NVM.

Our Proposal: Silent Shredder. Key idea: instead of zeroing a shredded page, make it unintelligible by changing the key or IV prior to decryption. Design options: (1) a key for every process - impractical: the memory controller would need to know the process ID, and shared data requires the same key; (2) increment all minor counters of a page - increases re-encryption frequency: minor counters overflow faster; (3) increment the major counter.

Software Compatibility: To achieve software compatibility, we would like new/shredded pages to read as zero cache lines. Shredding: increment the major counter and zero all minor counters. Zero-filled cache lines are returned for lines whose minor counter is zero. When a minor counter overflows, it restarts from 1, so zero unambiguously marks a shredded line.

Design (shredding path): 1. The processor sends "shred p" to the memory controller. 2. The cache and coherence controller invalidates page p's lines. 3. The memory controller increments the page's major counter M and resets the minor counters m1..m64 to 0 in the counter cache. 4. The counter cache acknowledges. 5. The memory controller signals completion to the processor.
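The counter-cache side of these steps can be modeled with a toy entry (class and field names are illustrative):

```python
class PageCounters:
    # One counter-cache entry: a per-page major counter plus 64 per-line
    # 7-bit minor counters, as in the split-counter layout.
    MINOR_MAX = (1 << 7) - 1

    def __init__(self):
        self.major = 0
        self.minors = [1] * 64   # non-zero: lines hold real encrypted data

    def shred(self):
        # The whole page is shredded with one counter-cache update;
        # no zeroes are ever written to the NVM.
        self.major += 1
        self.minors = [0] * 64   # zero minor => line reads back as zeros

    def write_line(self, line: int):
        # A store to a line bumps its minor counter; overflow wraps to 1,
        # never 0, so 0 unambiguously means "shredded/zero line".
        m = self.minors[line]
        self.minors[line] = 1 if m >= self.MINOR_MAX else m + 1
```

After `shred()`, every line of the page reads as zeros; the first subsequent write to a line moves its minor counter back to a non-zero value.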

Design (read path): 1. Block x misses in the LLC. 2. The memory controller reads the minor counter of block x from the counter cache. 3. If the minor counter is zero, a zero-filled block is produced without accessing the NVMM; otherwise, block x is fetched from the NVMM and decrypted. 4. The fetched (or zero-filled) block is returned.
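The read path reduces to a single branch in the memory controller. A sketch, with illustrative names and a lambda standing in for the NVMM fetch plus decryption:

```python
ZERO_BLOCK = bytes(64)

def read_block(minor_counter: int, fetch_and_decrypt):
    # Zero minor counter: the line was shredded, so return a zero-filled
    # block without touching the NVMM or the decryption pipeline.
    if minor_counter == 0:
        return ZERO_BLOCK
    # Otherwise: the normal path, fetch the ciphertext and XOR the OTP.
    return fetch_and_decrypt()

# Hypothetical usage: shredded lines cost no NVMM read at all.
assert read_block(0, lambda: b"x" * 64) == ZERO_BLOCK
assert read_block(3, lambda: b"x" * 64) == b"x" * 64
```

This short-circuit is where the read-traffic reduction in the evaluation comes from: reads to shredded lines never reach the NVMM.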

Evaluation Methodology: To evaluate our design, we use Gem5 to run a modified kernel, adding a shred command executed inside the kernel's clear_page function. The baseline uses non-temporal-store bulk zeroing. We use multi-programmed workloads from the SPEC 2006 and PowerGraph suites, warm up for 1B instructions, then run 500M instructions on each core (~4B overall) from the initialization and graph-construction phases. We assume a battery-backed counter cache.

Configurations:
Processor: 8 cores, x86-64, 2GHz clock.
L1 Cache: 2 cycles, 64KB, 8-way, LRU, 64B blocks.
L2 Cache: 8 cycles, 512KB, 8-way, LRU, 64B blocks.
L3 Cache: shared, 25 cycles, 8MB, 8-way, LRU, 64B blocks.
L4 Cache: shared, 35 cycles, 64MB, 8-way, LRU, 64B blocks.
Main Memory (NVM): 16GB capacity, 2 channels, 12.8 GB/s channel bandwidth, 75ns/150ns read/write latency.
IV Cache: 10 cycles, 4MB, 8-way, 64B blocks.
Operating System: Gentoo, kernel 3.4.91.

Characterization.

Results: 48.6% write reduction (44.6% under very high shredding); 50.3% read-traffic reduction (46.5% under very high shredding).

Results: 3.3x read speedup (2.8x under very high shredding); 6.4% IPC improvement (19.3% under very high shredding).

Other Use Cases: Bulk zeroing: Silent Shredder can be used to initialize large memory areas. Large-scale data isolation: fast data shredding for isolation across VMs or isolated nodes. Fast and efficient virtual disk provisioning when using byte-addressable NVM devices. Garbage collectors in managed programming languages.

Summary: We eliminate the writes caused by data shredding. Our scheme is based on manipulating IV values. Silent Shredder yields both a write reduction and a performance improvement, and it is applicable to other use cases.

Thanks! Questions?

Encryption Assumption: CTR-mode encryption. A one-time pad (OTP) is generated from the global key and the initialization vector (IV); the ciphertext is the plaintext XORed with the OTP. The same IV should never be reused for encryption. OTP generation doesn't need the data.

Security Concerns: Any IV-based encryption scheme needs to guarantee the following. Counter-cache persistency: counters must be kept persistent, either by battery backing, a write-through cache, or an NVM-based counter cache. IV and data integrity: IVs and data must be protected from tampering and replaying; authenticated encryption, e.g., a Bonsai Merkle Tree, can be used.

Backup slides.

Costs of Data Shredding: Shredding increases the overall number of main-memory writes; our experiments showed that up to 42% of main-memory writes can result from shredding.