Design and Implementation of the Ascend Secure Processor. Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas

Agenda
- Motivation
- Ascend Overview
- ORAM for obfuscation
- Ascend: Frontend
- Ascend: Backend
- ASIC Implementation
- Conclusion

Motivation

Motivation
- Computation outsourcing is becoming ubiquitous.
- It involves sharing private data with untrusted servers and applications.
- With physical access to the compute node, an adversary can observe memory access patterns.

Motivation
- State-of-the-art secure systems cannot stop an adversary who targets the memory access pattern.
- Encrypting data does not help.
- Solution: hardware support for obfuscating access patterns is necessary.

The dilemma of secure hardware architects
- A given program P on input x takes T(P, x) time.
- Two choices:
  1. Optimize the program based on the input data, which creates a security vulnerability.
  2. Make the running time T(P) oblivious to the input, which means worst-case performance.
- Ascend leans towards choice 2.
- What is the midpoint? Application-specific security features.

Ascend: Problem Statement
Given an arbitrary batch program P, a public length of time T, and two arbitrary inputs x and y to P: running P(x) for time T is indistinguishable from running P(y) for time T from the perspective of the Ascend chip's power and I/O pins.

The protocol overview

Threat Model
- The session key K is stored in a register that is not accessible to P.
- The Ascend chip is assumed to be tamper-resistant; it is the trusted computing base (TCB).
- The server can monitor the traffic and timing on the I/O pins.
- The server can tamper with the program or the external memory.
- Analog channels are not protected.

ORAM for Obfuscation

Oblivious RAM (ORAM) [Goldreich-Ostrovsky 96]
[Figure labels: On-chip, Chip pins, Cache miss, ORAM Controller, Shuffled]
- Provably removes all access-pattern leakage.

Basic ORAM Primitive
- Given two memory access sequences A and A' of the same length, each a mix of reads and writes, ORAM guarantees that the two are computationally indistinguishable.
- An adversary watching the accesses cannot tell:
  - whether the source is reading or writing,
  - where the access is going,
  - what data is accessed.

Path ORAM [CCS 13]
[Figure labels: Chip pins, The ORAM, ORAM Controller (on-chip), Read/writes]

Path ORAM Algorithm
1. Look up the PosMap for input address a to obtain the leaf label l. Generate a new random label l' and replace l with it.
2. Traverse the path to l, decrypting every block on it and storing the data in the stash.
3. Relabel block a in the stash with l'.
4. On a read, return the data; on a write, write the new data into the block.
5. Evict and encrypt as many blocks as possible from the stash to the path P(l) in the ORAM tree.
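
The five steps above can be sketched as a toy model. This is a minimal illustration, not the Ascend controller: encryption is omitted, the position map is a plain dictionary, the stash is unbounded, and the class and method names (`PathORAM`, `access`) are hypothetical.

```python
import random

class PathORAM:
    """Toy Path ORAM: no encryption, position map held in a plain dict."""

    def __init__(self, levels, bucket_size=4, seed=0):
        self.L = levels                      # leaves live at level L (root is level 0)
        self.Z = bucket_size                 # block slots per bucket
        self.rng = random.Random(seed)
        # tree[level][index] is a bucket: a list of (addr, leaf, data) tuples
        self.tree = [[[] for _ in range(1 << lv)] for lv in range(levels + 1)]
        self.posmap = {}                     # addr -> leaf label
        self.stash = {}                      # addr -> (leaf, data)

    def _path(self, leaf):
        """Bucket coordinates (level, index) on the path from the root to leaf."""
        return [(lv, leaf >> (self.L - lv)) for lv in range(self.L + 1)]

    def access(self, addr, new_data=None):
        # Step 1: look up the current leaf and remap to a fresh random leaf.
        leaf = self.posmap.get(addr, self.rng.randrange(1 << self.L))
        new_leaf = self.rng.randrange(1 << self.L)
        self.posmap[addr] = new_leaf
        # Step 2: read every bucket on the path into the stash.
        for lv, idx in self._path(leaf):
            for a, lf, d in self.tree[lv][idx]:
                self.stash[a] = (lf, d)
            self.tree[lv][idx] = []
        # Steps 3-4: relabel the block, then serve the read or write.
        _, data = self.stash.get(addr, (None, None))
        if new_data is not None:
            data = new_data
        self.stash[addr] = (new_leaf, data)
        # Step 5: evict, deepest bucket first, blocks whose own path
        # passes through that bucket (at most Z per bucket).
        for lv, idx in reversed(self._path(leaf)):
            fits = [a for a, (lf, _) in self.stash.items()
                    if lf >> (self.L - lv) == idx][: self.Z]
            self.tree[lv][idx] = [(a, *self.stash.pop(a)) for a in fits]
        return data
```

A block is only ever placed in a bucket that lies on the path to its current leaf, which is why a later lookup of that leaf is guaranteed to find it either on the path or in the stash.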

Bucket
- Each bucket is made up of Z(L + O(L) + B) bits.
- If Z=4, 50% of the DRAM holds actual data.
- The plaintext bucket is encrypted with AES-128 in counter mode.
- A monotonically increasing counter serves as the IV, and each 128-bit chunk is encrypted under it (the slide's formula is missing from this transcript).
- The IV (counter) value is stored alongside the bucket.
- A different key K is used for every run to avoid replay attacks.
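
The counter-mode structure described above can be illustrated as follows. This is a sketch under stated assumptions: the Python standard library has no AES, so HMAC-SHA256 truncated to 128 bits stands in for the AES-128 block function, and the way the counter and chunk index are combined is a guess, since the slide's formula did not survive. All function names are hypothetical.

```python
import hashlib
import hmac

CHUNK = 16  # bytes, i.e. one 128-bit chunk

def pad(key, counter, i):
    # Stand-in PRF (assumption): HMAC-SHA256 truncated to 128 bits.
    # A real design would use the AES-128 block cipher here.
    msg = counter.to_bytes(8, "big") + i.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:CHUNK]

def ctr_xor(key, counter, data):
    """XOR each 128-bit chunk with a pad unique to (counter, chunk index)."""
    out = bytearray()
    for i in range(0, len(data), CHUNK):
        p = pad(key, counter, i // CHUNK)
        out += bytes(x ^ y for x, y in zip(data[i:i + CHUNK], p))
    return bytes(out)

def encrypt_bucket(key, counter, plaintext):
    """Bump the counter, encrypt, and store the counter in the clear with the bucket."""
    counter += 1
    return counter, counter.to_bytes(8, "big") + ctr_xor(key, counter, plaintext)

def decrypt_bucket(key, stored):
    counter = int.from_bytes(stored[:8], "big")
    return ctr_xor(key, counter, stored[8:])
```

Because the counter never repeats under a given key, re-encrypting identical plaintext yields a different ciphertext each time, so an observer cannot tell whether a bucket's contents changed.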

Recursive ORAM
- The size of the PosMap grows linearly with the number of blocks in the ORAM.
- Keeping such a large memory on-chip is a bad idea.
- This is similar to classic virtual-to-physical (VA to PA) address translation.
- Solution: store the PosMap in a separate ORAM.
- There are effectively two types of ORAM: the Data ORAM and the PosMap ORAM.
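
To see how recursion shrinks the on-chip PosMap, the following sketch computes the number of entries at each level. It assumes each PosMap ORAM block packs X leaf labels, and the numbers in the test are illustrative rather than the chip's configuration; `posmap_levels` is a hypothetical helper.

```python
def posmap_levels(num_blocks, x, onchip_entries):
    """Entry counts of each PosMap, starting from the Data ORAM's PosMap.

    The last element is the PosMap small enough to keep on chip; every
    earlier element is stored in its own PosMap ORAM.
    """
    sizes = []
    entries = num_blocks              # one leaf label per data block
    while entries > onchip_entries:
        sizes.append(entries)
        entries = -(-entries // x)    # ceiling division: X labels per PosMap block
    sizes.append(entries)
    return sizes
```

Each level of recursion divides the PosMap size by X, so the number of PosMap ORAMs grows only logarithmically with the number of data blocks.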

ORAM in Hardware: Challenges
- Size of the PosMap: even with a recursive ORAM, a large share of the data accessed belongs to the PosMap ORAMs.
- Complex stash eviction logic.
- Using DRAM as ORAM: inefficiencies due to row misses (accesses that land in different rows).
[Figure: Percentage of bytes read from PosMap ORAMs for X=8 and Z=4.]

Frontend

PLB and Unified ORAM
- PosMap Lookaside Buffer (PLB): caches leaf information to avoid accesses to the PosMap ORAM.
- Security risk: PLB hits and misses change which ORAM is accessed, and that pattern depends on the program's addresses.
- Proposed solution: combine the PosMap and Data ORAMs into a Unified ORAM.
- The Unified ORAM stores both PosMap (path) information and data in the same tree.
- A PosMap access is as costly as a data access, but it is secure.
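
The PLB lookup can be sketched as a walk up the recursion until a cached PosMap block is found, followed by a walk back down. This is a simplified model under stated assumptions: the PLB is an unbounded set with no eviction, PosMap blocks are identified by `(level, index)` pairs, and `unified_accesses` is a hypothetical name.

```python
def unified_accesses(addr, x, depth, plb):
    """Return the Unified ORAM block ids fetched to serve data block `addr`.

    Block ids are (level, index): level 0 is data, and the level-i PosMap
    block covers X**i consecutive data addresses. `plb` is a set of cached
    PosMap block ids. The PosMap for level `depth` is assumed to be on chip.
    """
    # Walk up until a needed PosMap block is in the PLB
    # (or the on-chip PosMap at level `depth` is reached).
    level = 1
    while level < depth and (level, addr // x**level) not in plb:
        level += 1
    # Walk back down, fetching and caching each missing PosMap block.
    fetched = []
    for lv in range(level - 1, 0, -1):
        block = (lv, addr // x**lv)
        fetched.append(block)
        plb.add(block)
    fetched.append((0, addr))         # finally the data block itself
    return fetched
```

Every entry in the returned list is an access to the same tree, so an observer sees only a count of indistinguishable Unified ORAM accesses rather than which of them were PosMap lookups.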

Security problem
[Figure: ORAM-level access pattern over time across the Data ORAM and Map ORAM, without PLB and with PLB.]

[Figure: Data ORAM and Map ORAM combined into the Unified ORAM. Labels: Path, A, A+1.]

Unified ORAM
[Figure: ORAM-level access pattern over time for separate Data ORAM and Map ORAM versus the Unified ORAM.]

PMMAC - Memory Integrity Check
- Data read from external memory is passed through PMMAC to verify its authenticity.
- A plain MAC is susceptible to replay attacks, so the paper proposes using PosMap entries as non-repeating counters.
- The MAC is attached to the data block and is small relative to the data.
- The authors prove that breaking the PMMAC scheme is as hard as breaking the underlying MAC scheme.

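PMMAC - Memory Integrity Check
Consider block a with data d and access counter c.
- Data write: replace the PosMap entry of a with c, and generate the leaf l as a value reduced mod 2^L (the expression is missing from this transcript). The backend receives the data as (h, d), where h is the MAC tag (its formula is also missing).
- Data read: PMMAC receives (h*, d*) from the backend and verifies integrity with a check whose formula is missing from this transcript.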
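
The counter-based integrity check can be sketched as below. The slide's formulas are lost, so the exact inputs to the MAC and to the leaf derivation are assumptions, as is advancing the counter by one on each write. HMAC-SHA256 stands in for the chip's MAC and PRF, and all function names are hypothetical.

```python
import hashlib
import hmac

def mac(key, counter, addr, data):
    # Assumed MAC input: counter, address and data bound together.
    msg = counter.to_bytes(8, "big") + addr.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()

def leaf_for(key, addr, counter, levels):
    # Assumed leaf derivation: a keyed PRF of (address, counter) mod 2^L.
    msg = b"leaf" + addr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    prf = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(prf, "big") % (1 << levels)

def write_block(key, posmap, addr, data):
    counter = posmap.get(addr, 0) + 1      # assumption: counter advances per write
    posmap[addr] = counter                 # PosMap entry doubles as the counter
    return mac(key, counter, addr, data), data   # (h, d) handed to the backend

def verify_block(key, posmap, addr, h_star, d_star):
    expected = mac(key, posmap[addr], addr, d_star)
    return hmac.compare_digest(expected, h_star)
```

Because the trusted PosMap holds the latest counter, a stale (h, d) pair replayed by the server fails verification even though it was once valid.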

Backend

Stash Eviction Logic
- Eviction from the stash is one of the most crucial parts of Path ORAM and is generally the throughput bottleneck.
- The strategy must also keep the stash from overflowing.
- Push each evicted block as deep as possible along the path P(l).
- During the read/eviction there are two tasks:
  - Generate the path of access [PushBack()].
  - Push the data down the path [PushToLeaf()].
- The authors propose a single-cycle algorithm for PushBack().

Stash Eviction Logic
- Consider a block a that needs to be evicted from the stash; it was mapped to leaf l and has now been remapped to l'.
- The task is to find the best place in the tree to push block a.
- Information needed: current occupancy, the paths to l and l', and the stash.
- Algorithm:
    a_loc = PushBack(l, l', occupancy);  // called many times
    PushToLeaf(stash, l);                // clears the stash for the path l
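
One way to picture the placement decision is as a common-prefix computation on the two leaf labels followed by an occupancy check. The sketch below is a sequential software model, not the authors' single-cycle circuit; `deepest_common_level` and `push_back` are hypothetical names, and `occupancy` is assumed to be a per-level count for the path being written.

```python
def deepest_common_level(leaf_a, leaf_b, levels):
    """Deepest tree level (root = 0) shared by the paths to two leaves."""
    diff = leaf_a ^ leaf_b                 # set bits mark where the paths split
    return levels - diff.bit_length()      # length of the common label prefix

def push_back(path_leaf, block_leaf, occupancy, bucket_size):
    """Pick the deepest bucket on the path to `path_leaf` that can hold the block.

    `occupancy[lv]` counts blocks already placed in the level-`lv` bucket.
    Returns the chosen level, or None if the block must stay in the stash.
    """
    levels = len(occupancy) - 1
    start = deepest_common_level(path_leaf, block_leaf, levels)
    for lv in range(start, -1, -1):
        if occupancy[lv] < bucket_size:
            occupancy[lv] += 1
            return lv
    return None
```

The XOR plus leading-bit count is the kind of operation that maps naturally onto combinational logic, which is one reason a hardware version can be fast.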

Mapping ORAM to DRAM
- In a naive layout of the ORAM tree in DRAM, each bucket along the path to a leaf lands in a different DRAM row.
- Solution: group buckets into subtrees and map each subtree to a single DRAM row.
- This improves the achieved bandwidth to 90-95% of peak bandwidth.
[Figure: Example of a k=2 subtree.]
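
The subtree packing can be illustrated with an address function that keeps all buckets of a height-k subtree contiguous. This is a sketch of one possible layout, assuming one subtree fits exactly in one DRAM row; the function names and the ordering of subtrees are hypothetical.

```python
def bucket_address(level, index, k):
    """Linear bucket address when the tree is packed into height-k subtrees."""
    row, local = divmod(level, k)          # subtree layer, level inside the subtree
    root = index >> local                  # which subtree within this layer
    offset = (1 << local) - 1 + (index & ((1 << local) - 1))
    per_subtree = (1 << k) - 1             # buckets in one full subtree
    before = ((1 << (row * k)) - 1) // per_subtree   # subtrees in shallower layers
    return (before + root) * per_subtree + offset

def rows_touched(leaf, levels, k):
    """Distinct subtrees (one per DRAM row, by assumption) on the path to leaf."""
    per_subtree = (1 << k) - 1
    return len({bucket_address(lv, leaf >> (levels - lv), k) // per_subtree
                for lv in range(levels + 1)})
```

With k=1 this degenerates to the usual heap layout, where every bucket on a path is in its own row; with k=2 a path through six levels touches three rows instead of six.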

ASIC Implementation

Chip Specification
- 25 SPARC T1 cores (Princeton).
- LLC misses are handled by the ORAM controller.
- ORAM controller with L=23 and B=512.
- AES and SHA units (for ORAM and server communication).
- PMMAC support with 64-bit counters and SHA3-224.
- PosMap of size 8kb on chip (6 levels of recursion).

Power, Performance and Area
- Consumes 299 mW at 857 MHz with Vdd = 1.1 V (32 nm node).
- Completes an ORAM access of 512 bits in 1275 cycles.
- Average slowdown of around 4x on SPEC-Int-2006.
- Total area of 0.326 mm^2 for the ORAM controller.

Module      Dimensions (um)   Area (mm^2)
Frontend    636.7 x 218.7     0.139
Backend     346.6 x 364.5     0.126
Encryption  669.0 x 364.5     0.244
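
The cycle count and clock frequency quoted above translate into wall-clock latency as follows. This is back-of-the-envelope arithmetic that assumes accesses are issued one at a time with no overlap; the variable names are illustrative.

```python
CLOCK_HZ = 857e6           # reported clock frequency
CYCLES_PER_ACCESS = 1275   # reported cycles for one 512-bit ORAM access
BLOCK_BITS = 512

# Latency of one access in microseconds.
latency_us = CYCLES_PER_ACCESS / CLOCK_HZ * 1e6

# Useful data delivered per second if accesses run back to back (Mbit/s).
throughput_mbps = BLOCK_BITS / latency_us

print(f"{latency_us:.2f} us per access, {throughput_mbps:.0f} Mbit/s of useful data")
```

Under these assumptions one access takes roughly 1.5 microseconds.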

Challenges
- Significant storage space is spent on metadata.
- The design cannot support a large DRAM.
- The total number of memory accesses is not hidden.
- Only a single user is resident on the chip.
- The ORAM controller cannot be bypassed.
- Power consumption increases because of the many redundant accesses.

Conclusion
- This work presents the first silicon implementation of ORAM integrated into a system.
- It presents the entire execution model for running an untrusted program on sensitive user data.
- The implemented logic is only half the size of a single SPARC core.
- It adds an estimated performance overhead of 4x, a reasonable ask if security is a first-class citizen.

Questions?