PageVault: Securing Off-Chip Memory Using Page-Based Authentication
Blaise-Pascal Tine, Sudhakar Yalamanchili


Slide 2: Outline
- Background: Memory Security
- Motivation
- Proposed Solution
- Implementation
- Evaluation
- Conclusion

Slide 3: Background (Cloud Computing Threat Model)
- Compute as a service handles sensitive content: trading, banking, medical, legal, search, etc.
- Users offload both their data and their computation to the cloud.

Slide 4: Background (Cloud Computing Threat Model)
- Data encryption (SSL)
- Compute sandboxing (Intel SGX)
- Hardware security: can the administrator be trusted?

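Slide 5: Background (Memory Attacks)
- Snooping: passively reading data off the memory bus. Countered by encryption.
- Spoofing, splicing, and replay: actively tampering with off-chip data (injecting arbitrary values, relocating a block to another address, or restoring a stale value). Countered by authentication.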

Slide 6: Background (Hash Generation)
- Generate a unique seed
- Create a nonce from the seed
- Hash the block using the nonce
- Store the MAC for the integrity check
Where to store the MAC?
- CPU area is limited
- Store it in memory
- Still secure?
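The hash-generation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the PageVault hardware: the seed is assumed to be the block address concatenated with a per-block write counter and is used directly as the nonce, and the keyed hash is HMAC-SHA1 truncated to 128 bits (the evaluation lists HMAC-SHA1 and 128-bit MACs). `SECRET_KEY`, `make_seed`, `block_mac` and `check_block` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical on-chip secret key; real hardware would never expose it.
SECRET_KEY = b"example-on-chip-key"


def make_seed(block_addr: int, counter: int) -> bytes:
    """Assumed seed layout: block address (spatial) + write counter (temporal)."""
    return block_addr.to_bytes(8, "little") + counter.to_bytes(8, "little")


def block_mac(data: bytes, block_addr: int, counter: int) -> bytes:
    """HMAC-SHA1 over seed || data, truncated to a 128-bit MAC."""
    seed = make_seed(block_addr, counter)
    return hmac.new(SECRET_KEY, seed + data, hashlib.sha1).digest()[:16]


def check_block(data: bytes, block_addr: int, counter: int, stored_mac: bytes) -> bool:
    """Integrity check: recompute the MAC and compare in constant time."""
    return hmac.compare_digest(block_mac(data, block_addr, counter), stored_mac)


if __name__ == "__main__":
    block = bytes(64)
    mac = block_mac(block, 0x1000, 3)
    print(len(mac), check_block(block, 0x1000, 3, mac))  # 16 True
```

Binding the address and counter into the seed means the same data yields different MACs at different locations and after each write.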

Slide 7: Background (Merkle Tree Authentication)
- Generate a MAC for each block
- Build a binary hash tree over the MACs
- Store the tree nodes in memory
- Store the root on chip
Integrity check:
- Fetch the block
- Fetch the partial tree
- Recompute the root
Storage overhead is large:
- User Data = 8 * 64B = 512B
- Meta Data = 14 * 16B = 224B
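A small Python model of the tree in this example: eight 64B blocks, 128-bit nodes, root kept on chip. For brevity the leaf MACs here are plain truncated SHA-256 hashes of the blocks (an assumption; a real scheme keys them with a secret and a seed), and the number of leaves must be a power of two. `node_hash`, `build_tree`, `sibling_path` and `recompute_root` are hypothetical helpers.

```python
import hashlib

NODE_BYTES = 16  # 128-bit nodes, matching the 16B metadata entries above


def node_hash(data: bytes) -> bytes:
    """Hash used for every node (truncated SHA-256; an assumption)."""
    return hashlib.sha256(data).digest()[:NODE_BYTES]


def build_tree(leaf_macs):
    """Return a list of levels: levels[0] = leaf MACs, levels[-1] = [root]."""
    levels = [list(leaf_macs)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([node_hash(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def sibling_path(levels, index):
    """The partial tree fetched from memory: one sibling per level below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def recompute_root(leaf_mac, index, path):
    """Climb from the leaf to the root using the fetched siblings."""
    node = leaf_mac
    for sibling in path:
        node = node_hash(node + sibling) if index % 2 == 0 else node_hash(sibling + node)
        index //= 2
    return node


if __name__ == "__main__":
    blocks = [bytes([i]) * 64 for i in range(8)]        # 8 * 64B = 512B of user data
    levels = build_tree([node_hash(b) for b in blocks])
    root = levels[-1][0]                                 # kept on chip
    off_chip = sum(len(level) for level in levels[:-1])  # 8 + 4 + 2 nodes in memory
    print(off_chip, off_chip * NODE_BYTES)               # 14 224
    print(recompute_root(node_hash(blocks[5]), 5, sibling_path(levels, 5)) == root)  # True
```

Only the root needs on-chip storage; every other node lives in memory, which is where the 224B of metadata for 512B of data comes from.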

Slide 8: Motivation (Authentication Cost)
- Memory overhead: GMT 41%, BMT 23%, SGX 55%
- Runtime overhead: GMT [1] 150%, BMT [2] 13%, SGX [3] <5%
- Processor cost: 32KB counter cache

Storage breakdown (8GB DRAM, 128-bit MACs):
                 GMT   BMT   SGX
User blocks      59%   77%   45%
MACs              0%   20%    5%
Counters          1%    2%    0%
Hash tree        40%    1%   50%
Metadata total   41%   23%   55%

[1] C. Yan, D. Englender, M. Prvulovic, et al. ISCA 06
[2] B. Rogers, S. Chhabra, M. Prvulovic, et al. ISCA 07
[3] S. Gueron, Cryptology ePrint Archive, Report 2016/204

Slide 9: Proposed Solution (Key Insight)
- Access memory at a larger block granularity
Potential benefits:
- Reduce storage overhead
- Reduce memory traffic
- Reduce runtime overhead
Challenges:
- Maintain security
- Cache pollution?
Example: User Data = 8 * 64B = 512B; Meta Data (16B entries) = 64B, 32B, 160B

Slide 10: Proposed Solution (Aggregate Message Authentication [1])
- Use XOR to combine blocks: $\mathrm{MAC}_{agg} = \mathrm{MAC}_0 \oplus \mathrm{MAC}_1$
- The aggregate MAC is secure if both operands are unique:
  - MACs are unique spatially: the seed includes the block address
  - MACs are unique temporally: the seed includes the block counter
- A partition is a set of consecutive blocks protected by a single aggregate MAC
[1] J. Katz and Y. Lindell, CT-RSA 2008
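To see why the operands must be unique, here is a hedged Python sketch of XOR aggregation over one partition. It assumes the seed is the block address plus a counter and uses HMAC-SHA1 truncated to 128 bits; `block_mac`, `unseeded_mac` and `aggregate_mac` are hypothetical names, and a partition size of 4 is only an example. Because XOR is commutative, swapping two blocks inside a partition (splicing) leaves an aggregate of unseeded MACs unchanged, while address-seeded MACs expose the swap.

```python
import hashlib
import hmac
from functools import reduce

SECRET_KEY = b"example-on-chip-key"  # hypothetical on-chip key
BLOCK_BYTES = 64
MAC_BYTES = 16


def block_mac(data: bytes, addr: int, counter: int) -> bytes:
    """Per-block MAC whose seed carries the address (spatial) and counter (temporal)."""
    seed = addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hmac.new(SECRET_KEY, seed + data, hashlib.sha1).digest()[:MAC_BYTES]


def unseeded_mac(data: bytes) -> bytes:
    """Counter-example: a MAC over the data alone, without address or counter."""
    return hmac.new(SECRET_KEY, data, hashlib.sha1).digest()[:MAC_BYTES]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def aggregate_mac(macs) -> bytes:
    """MAC_agg = MAC_0 XOR MAC_1 XOR ...; all zeros for an empty set."""
    return reduce(xor_bytes, macs, bytes(MAC_BYTES))


if __name__ == "__main__":
    base, n = 0x2000, 4
    addrs = [base + i * BLOCK_BYTES for i in range(n)]
    data = [bytes([i + 1]) * BLOCK_BYTES for i in range(n)]
    spliced = [data[1], data[0]] + data[2:]  # attacker swaps blocks 0 and 1

    good = aggregate_mac(block_mac(d, a, 0) for d, a in zip(data, addrs))
    bad = aggregate_mac(block_mac(d, a, 0) for d, a in zip(spliced, addrs))
    print(good != bad)  # True: seeded MACs detect the swap

    weak = aggregate_mac(unseeded_mac(d) for d in data)
    weak_spliced = aggregate_mac(unseeded_mac(d) for d in spliced)
    print(weak == weak_spliced)  # True: unseeded MACs miss the swap
```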

Slide 11: Proposed Solution (Read Transaction)
- A single MAC protects all blocks in a page partition
- On a read access, fetch all blocks in the page partition
- Compute the MAC of each fetched block
- Compute the aggregate MAC
- Compare it with the cached MAC
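The read path can be modelled in a few lines of Python. This sketch assumes the whole partition is off chip (partial reads are a separate case), keeps memory and counters in plain dictionaries, and treats `cached_mac` as the value held in the on-chip MAC cache. The seeded HMAC-SHA1 MAC and all function names (`block_mac`, `aggregate_mac`, `read_partition`) are hypothetical.

```python
import hashlib
import hmac
from functools import reduce

SECRET_KEY = b"example-on-chip-key"  # hypothetical on-chip key
BLOCK_BYTES = 64
MAC_BYTES = 16


def block_mac(data: bytes, addr: int, counter: int) -> bytes:
    seed = addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hmac.new(SECRET_KEY, seed + data, hashlib.sha1).digest()[:MAC_BYTES]


def aggregate_mac(macs) -> bytes:
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), macs, bytes(MAC_BYTES))


def read_partition(memory, counters, cached_mac, base, n):
    """Fetch all n blocks, MAC each one, XOR the MACs, compare with the cached MAC."""
    addrs = [base + i * BLOCK_BYTES for i in range(n)]
    blocks = [memory[a] for a in addrs]  # the whole partition is fetched
    computed = aggregate_mac(block_mac(b, a, counters[a]) for b, a in zip(blocks, addrs))
    if not hmac.compare_digest(computed, cached_mac):
        raise ValueError(f"integrity check failed for partition at {base:#x}")
    return blocks
```

The extra blocks fetched here are what later shows up as prefetched-block reuse (or, when unused, as cache pollution).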

Slide 12: Proposed Solution (Write Transaction)
- Operates at block granularity to reduce memory traffic
- Clear the aggregate MAC after each read
- Compute the MAC of the dirty block
- Write back the dirty block
- Append the block's MAC to the aggregate MAC: $\mathrm{MAC}_{agg} \leftarrow \mathrm{MAC}_{agg} \oplus \mathrm{Hash}(\text{block})$
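A minimal Python sketch of the write path, assuming the aggregate MAC is a 128-bit value reset to zero after a read and that the block counter is incremented on every write-back (which keeps MACs temporally unique). `PartitionMac` and `write_back` are hypothetical names, and the seeded HMAC-SHA1 MAC is an assumption.

```python
import hashlib
import hmac

SECRET_KEY = b"example-on-chip-key"  # hypothetical on-chip key
MAC_BYTES = 16


def block_mac(data: bytes, addr: int, counter: int) -> bytes:
    seed = addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hmac.new(SECRET_KEY, seed + data, hashlib.sha1).digest()[:MAC_BYTES]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class PartitionMac:
    """Aggregate MAC of one partition, as held in a (hypothetical) on-chip MAC cache."""

    def __init__(self):
        self.value = bytes(MAC_BYTES)

    def clear(self):
        # After a read brings the partition on chip, no off-chip block needs protecting.
        self.value = bytes(MAC_BYTES)

    def append(self, mac: bytes):
        # MAC_agg <- MAC_agg XOR MAC(block)
        self.value = xor_bytes(self.value, mac)


def write_back(memory, counters, pmac, addr, data):
    """Write back one dirty block; the rest of the partition is not touched."""
    counters[addr] += 1  # assumption: the counter advances on every write-back
    pmac.append(block_mac(data, addr, counters[addr]))
    memory[addr] = data
```

Once every block of the partition has been written back, the aggregate equals the XOR of their MACs under the current counters, so a later full-partition read check passes, while a replayed stale block fails because its MAC is recomputed with the newer counter.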

Slide 13: Implementation (Handling Partial Read Requests)
- The aggregate MAC only protects off-chip blocks
- Need to track which blocks are off-chip
- Where to store the tracking info?
  - Use the counter cache (drawbacks: counter cache access latency, counter overflow)
  - Use an LLC lookup transaction:
    - Group blocks from the same partition into the same set
    - Shift the index region of the block address left
    - Return the full partition status mask (e.g., a 4-bit register for partition size 4)
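The LLC-lookup option comes down to address arithmetic. The sketch assumes 64B blocks, a power-of-two number of sets and a power-of-two partition size: shifting the index field left by log2(N) bits makes all N blocks of a partition land in the same set, so one set lookup can return the partition's status mask. `set_index`, `partition_set_index`, `partition_status_mask` and `offchip_mask` are hypothetical names, and the extra tag bits a real cache would need are not modelled.

```python
BLOCK_OFFSET_BITS = 6  # 64B blocks


def set_index(addr: int, num_sets: int) -> int:
    """Conventional set index: the bits just above the block offset."""
    return (addr >> BLOCK_OFFSET_BITS) % num_sets


def partition_set_index(addr: int, num_sets: int, n: int) -> int:
    """Index field shifted left by log2(n), so a partition's n blocks share a set."""
    shift = BLOCK_OFFSET_BITS + (n.bit_length() - 1)  # n assumed to be a power of two
    return (addr >> shift) % num_sets


def partition_status_mask(resident, addr: int, n: int) -> int:
    """Bit i is set if block i of addr's partition is currently in the LLC."""
    base = addr - addr % (n << BLOCK_OFFSET_BITS)
    mask = 0
    for i in range(n):
        if base + (i << BLOCK_OFFSET_BITS) in resident:
            mask |= 1 << i
    return mask


def offchip_mask(status_mask: int, n: int) -> int:
    """Blocks not on chip are the ones the aggregate MAC must cover on a read."""
    return ~status_mask & ((1 << n) - 1)
```

For example, with N = 4 the blocks at 0x1000, 0x1040, 0x1080 and 0x10C0 all map to one set, whereas the conventional index spreads them over four sets.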

Slide 14: Implementation (Handling Cache Eviction)
- Evicting a block in the currently accessed partition
  - Would invalidate the partition's on-chip status
  - Add lookup logic to pick a victim from the next partition in the set
- Evicting clean blocks
  - The aggregate MAC must still be updated
  - The block MAC must be recomputed
  - No need to send the block off-chip
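Both eviction rules can be sketched in Python. The sketch assumes an LRU-ordered list of candidate addresses for the set, per-partition aggregate MACs kept in a dictionary, and an address-and-counter-seeded HMAC-SHA1 MAC; `choose_victim`, `evict` and the other helpers are hypothetical names.

```python
import hashlib
import hmac

SECRET_KEY = b"example-on-chip-key"  # hypothetical on-chip key
BLOCK_BYTES = 64
MAC_BYTES = 16


def block_mac(data: bytes, addr: int, counter: int) -> bytes:
    seed = addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hmac.new(SECRET_KEY, seed + data, hashlib.sha1).digest()[:MAC_BYTES]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def partition_base(addr: int, n: int) -> int:
    return addr - addr % (n * BLOCK_BYTES)


def choose_victim(lru_order, current_addr, n):
    """Pick the LRU block outside the partition being accessed.

    lru_order lists the set's resident block addresses, least recently used first.
    """
    current = partition_base(current_addr, n)
    for addr in lru_order:
        if partition_base(addr, n) != current:
            return addr
    return lru_order[0]  # fallback: every candidate is in the current partition


def evict(addr, data, dirty, memory, counters, aggregates, n):
    """Fold the evicted block's MAC back into its partition's aggregate MAC."""
    if dirty:
        counters[addr] += 1  # a new version gets a new counter value
        memory[addr] = data  # only dirty blocks are sent off chip
    # Clean or dirty, the block is off chip again, so its MAC is recomputed and re-added.
    base = partition_base(addr, n)
    aggregates[base] = xor_bytes(aggregates.get(base, bytes(MAC_BYTES)),
                                 block_mac(data, addr, counters[addr]))
```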

Slide 15: Implementation (PageVault Architecture)
- Vault controller
- Counter cache
- MAC cache
- HMAC engine
- AES engine
- Command queue

Slide 16: Evaluation (Simulation)
- Manifold full-system simulator
- 3 GHz, 4 out-of-order cores, 32KB L1, 2MB L2
- DRAMSim2: 8GB DDR3 (1.25ns), 2 channels
- GCM-AES: 16 stages
- HMAC-SHA1: 8 cycles
- 128-bit MACs
- 8KB counter cache, 8KB MAC cache
- Benchmarks: Splash2, Parsec, GraphBIG
- Metrics: runtime and storage overhead

Slide 17: Evaluation (System Configurations)
- NOEA: baseline system with no protection
- GMT: Galois Merkle Tree (vanilla)
- BMT: Bonsai Merkle Tree (state of the art)
- SGX: Intel SGX (applied)
- PMT2: PageVault with 2 blocks per partition
- PMT4: PageVault with 4 blocks per partition
- PMT8: PageVault with 8 blocks per partition

Slide 18: Results (Memory Overhead)
- Metadata overhead drops from 23% (BMT) to 8% (PMT4), using 128-bit MACs to protect 8GB of user data
- It drops further to 5% with a larger partition size (PMT8)
- User data occupancy is above 90% (PMT4, PMT8)
- Why? MAC storage shrinks to 1/N, since one MAC protects an N-block partition

                 GMT   BMT   SGX   PMT2  PMT4  PMT8
User data        59%   77%   45%   86%   91%   94%
MACs              0%   20%    5%   11%    6%    3%
Counters          1%    2%    0%    2%    2%    2%
Hash tree        40%    1%   50%    1%    1%    1%
Metadata total   41%   23%   55%   15%    8%    5%
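The 1/N effect is simple arithmetic: with one 16B (128-bit) MAC per partition of N 64B blocks, MAC storage is 16/(64N) of the user data, i.e. 25%, 12.5%, 6.25% and 3.125% for N = 1, 2, 4, 8. The sketch applies this to 8GB of user data; counters and hash-tree nodes are deliberately not modelled, and `mac_storage_bytes` is a hypothetical helper.

```python
GIB = 1 << 30
MIB = 1 << 20


def mac_storage_bytes(user_bytes: int, n: int, block_bytes: int = 64, mac_bytes: int = 16) -> int:
    """MAC storage when one MAC protects each partition of n blocks."""
    partitions = user_bytes // (block_bytes * n)
    return partitions * mac_bytes


if __name__ == "__main__":
    for n in (1, 2, 4, 8):  # n = 1 means one MAC per block
        print(f"N={n}: {mac_storage_bytes(8 * GIB, n) // MIB} MiB of MACs")
    # N=1: 2048 MiB, N=2: 1024 MiB, N=4: 512 MiB, N=8: 256 MiB
```

Multiplying the user-data share by 16/(64N) reproduces the MAC row for PageVault: 86% × 1/8 ≈ 11% (PMT2), 91% × 1/16 ≈ 6% (PMT4), 94% × 1/32 ≈ 3% (PMT8).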

Slide 19: Results (Execution Time)
- Up to 10-12% improvement
- bodytrack and LU-c outperform NOEA
[Figure panels: GraphBIG; Parsec & Splash2]
Performance exploits (Parsec and Splash2):
- Reuse of prefetched blocks (accuracy above 85%)
- MAC cache efficiency (hit rate above 70%)
- Reduced hash processing time

Slide 20: Results (GraphBIG Prefetch Accuracy)
- Good prefetch accuracy in the LLC (~80%)
- DFS has a high cache miss rate due to synchronization variables

Slide 21: Results (GraphBIG Memory Traffic)
- Off-chip read traffic increases by 15%
- Write traffic shows a similar increase
  - Due to synchronization variables causing cache pollution

Slide 22: Results (GraphBIG Memory Traffic)
- Off-chip traffic increases by 15%
- The extra traffic comes from hash tree accesses for counters

Slide 23: Results (Reducing the Partition Size)
- Improves runtime by 8%
- Less cache pollution
- But: 2x memory overhead (from 8% to 15%)
Adaptive resizing:
- Compiler driven
- Hardware driven, using block counter history

Slide 24: Conclusion
- A cost-efficient memory protection scheme
- Exploits aggregate MAC (AMAC) properties
- Significantly reduces storage overhead
- Improves total execution runtime
- Increases the compute capacity of the secure system
- Adaptive compression scheme

Slide 25: Thank You!

Slide 26: Results (Counter Cache Hit Rate)

Slide 27: Results
- Runtime effect of an 8KB vs. 16KB MAC cache
- Runtime effect of partition size

Slide 28: Prior Work
- BMT [Brian 07]: counter-based encryption; the hash tree covers counters; MACs authenticate data. Runtime overhead ~13%, storage overhead ~53%.
- GMT [Chenyu 06]: counter-based encryption; the hash tree covers both data and counters. Runtime overhead ~151%, storage overhead ~134%.

Slide 29: Background (Encryption)
Basic direct encryption:
- Encrypt the block using the key
- AES is very slow!
Counter-mode encryption:
- Generate a unique seed
- Create a pad from the seed
- Encrypt the block using the pad
- Cache the pad for decryption
- The block can be fetched while the cache is being accessed
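A minimal Python sketch of counter-mode encryption, with one loudly flagged substitution: Python's standard library has no AES, so a keyed SHA-256 stands in as the pad generator purely for illustration and is not a secure replacement. The seed layout (address plus counter), `make_pad`, `encrypt_block` and `decrypt_block` are assumptions, not the deck's design.

```python
import hashlib

SECRET_KEY = b"example-on-chip-key"  # hypothetical on-chip key
BLOCK_BYTES = 64


def make_pad(addr: int, counter: int) -> bytes:
    """Pad = PRF(key, seed). SHA-256 stands in for AES here; it is NOT the real cipher."""
    seed = addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    # Two 32-byte digests cover one 64B block.
    return b"".join(hashlib.sha256(SECRET_KEY + seed + bytes([i])).digest() for i in range(2))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(plaintext: bytes, addr: int, counter: int) -> bytes:
    return xor_bytes(plaintext, make_pad(addr, counter))


def decrypt_block(ciphertext: bytes, addr: int, counter: int) -> bytes:
    # The pad depends only on (addr, counter), never on the data, so it can be
    # generated while the ciphertext is still being fetched; decryption is one XOR.
    return xor_bytes(ciphertext, make_pad(addr, counter))
```

This is why counter mode hides the cipher latency: only the final XOR sits on the critical path once the pad is ready or cached.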

Slide 30: Evaluation (Benchmarks)
- GraphBIG generates 100x more memory traffic than Splash2
[Figure panels: Parsec and Splash2; GraphBIG]

Slide 31: Results (Splash2/Parsec Prefetch Accuracy)
- Good prefetch accuracy in the LLC (~85%)
- Average miss rate reduction of 10%

Slide 32: Results (Splash2/Parsec Memory Traffic)
- Off-chip read traffic is reduced by 10%
- Write traffic shows a similar reduction

Slide 33: Background (Bonsai Merkle Tree)
- The tree covers counters only
- Counters are small, so the tree overhead is reduced
- MAC overhead is still large:
  - 128-bit MAC: > 25%
  - 160-bit MAC: > 31%
  - 192-bit MAC: > 38%
- Compression?
  - Bit shuffling
  - Hardware cost
- B. Rogers et al. ISCA 07
Example: User Data = 8 * 64B = 512B; Meta Data = 10 * 16B = 160B