Design and Implementation of the Ascend Secure Processor Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas
Agenda Motivation Ascend Overview ORAM for obfuscation Ascend: Frontend Ascend: Backend ASIC Implementation Conclusion
Motivation
Motivation Computation outsourcing is becoming ubiquitous. It involves sharing private data with untrusted servers and applications. With physical access to the compute node, an adversary can observe memory access patterns.
Motivation State-of-the-art secure systems cannot stop an adversary who targets the memory access pattern. Encrypting the data does not help, because the addresses are still visible. Solution: provide hardware support for obfuscating access patterns.
The dilemma of secure hardware architects
The dilemma of secure hardware architects Running a given program P on input x takes T(P, x) time. Two choices: 1. Let the running time depend on the input data: security vulnerability. 2. Make the running time T(P) oblivious to the input: worst-case performance. Ascend leans towards choice 2. What is the midpoint? Application-specific security features.
Ascend: Problem Statement Given an arbitrary batch program P, a public length of time T, and two arbitrary inputs x and y to P: running P(x) for time T is indistinguishable from running P(y) for time T from the perspective of the Ascend chip's power and I/O pins.
The protocol overview
Threat Model The session key K is stored in a register that is not accessible to P. The Ascend chip is assumed to be tamper-resistant and is the TCB. The server can monitor the traffic and timing on the I/O pins. The server can tamper with the program or the external memory. Analog channels are not protected.
ORAM for Obfuscation
Oblivious RAM (ORAM) [Goldreich-Ostrovsky 96] Provably removes all access-pattern leakage. [Figure labels: on-chip, chip pins, cache miss, ORAM controller, shuffled]
Basic ORAM Primitive Given two memory access sequences A and A' of the same length, where each access may be a read or a write, ORAM guarantees that the two are computationally indistinguishable. An adversary watching the accesses cannot tell: whether the access is a read or a write; where the access is going; what data is accessed.
Path ORAM [CCS 13] [Figure labels: chip pins, the ORAM, ORAM controller (on-chip), reads/writes]
Path ORAM Algorithm
1. Look up the PosMap for input address a to obtain the leaf label l. Generate a new random label l' and replace l with it.
2. Traverse the path to l, then decrypt and store all of its data in the stash.
3. Update block a in the stash to l'.
4. If not write, return the data else write the data to contents of
5. Evict and encrypt as many blocks as possible from the stash to P(l) in the ORAM tree.
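The five steps can be sketched in software as follows. This is a minimal toy model, not the Ascend hardware: the class name `PathORAM`, the heap-ordered bucket list, the dictionary stash, and the unbounded stash size are all assumptions made for illustration, and encryption is left out entirely.

```python
import random

class PathORAM:
    """Toy Path ORAM: tree of height L, Z slots per bucket, no encryption."""

    def __init__(self, L, Z=4, rng=None):
        self.L, self.Z = L, Z
        self.rng = rng or random.Random()
        # Buckets in heap order: bucket 0 is the root, the last 2**L are leaves.
        self.tree = [[] for _ in range(2 ** (L + 1) - 1)]
        self.posmap = {}   # address -> leaf label
        self.stash = {}    # address -> (leaf label, data)

    def path(self, leaf):
        """Heap indices of the buckets from the root down to `leaf`."""
        node = 2 ** self.L - 1 + leaf
        nodes = [node]
        while node > 0:
            node = (node - 1) // 2
            nodes.append(node)
        return nodes[::-1]

    def access(self, a, new_data=None):
        # Step 1: look up the leaf for a, then remap a to a fresh random leaf.
        leaf = self.posmap.get(a, self.rng.randrange(2 ** self.L))
        new_leaf = self.rng.randrange(2 ** self.L)
        self.posmap[a] = new_leaf
        # Step 2: read every bucket on the path into the stash.
        nodes = self.path(leaf)
        for n in nodes:
            for addr, lf, data in self.tree[n]:
                self.stash[addr] = (lf, data)
            self.tree[n] = []
        # Steps 3 and 4: relabel block a, then read it or overwrite it.
        old = self.stash.get(a, (None, None))[1]
        self.stash[a] = (new_leaf, old if new_data is None else new_data)
        # Step 5: evict greedily back onto the same path, deepest bucket first.
        for depth in range(self.L, -1, -1):
            n = nodes[depth]
            fits = [addr for addr, (lf, _) in self.stash.items()
                    if self.path(lf)[depth] == n][: self.Z]
            for addr in fits:
                lf, data = self.stash.pop(addr)
                self.tree[n].append((addr, lf, data))
        return old
```

Calling `access(a, value)` writes and `access(a)` reads; in both cases exactly one root-to-leaf path is read and rewritten, which is what makes reads and writes look alike from outside.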
Bucket Each bucket is made up of Z(L + O(L) + B) bits. If L=4, 50% of the DRAM holds actual data. AES-128 in counter mode is used to encrypt the plaintext bucket, with a monotonically increasing counter as the IV: Each 128-bit chunk is encrypted with The value of the IV (counter) is stored alongside the bucket. Use a different key K on every run to avoid a replay attack.
Recursive ORAM The size of the PosMap grows linearly with the number of data blocks. Keeping such a large memory on-chip is a bad idea. This is similar to classic VA-to-PA translation! Solution: store the PosMap in a separate ORAM. There are effectively two types of ORAM: Data ORAM and PosMap ORAM.
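The structure of counter-mode bucket encryption can be sketched as below. The per-chunk formula did not survive on the slide, so this assumes the common form "pad for chunk i is a keyed function of (IV, i)". Python's standard library has no AES, so SHA-256 is swapped in as a stand-in for the block cipher purely so the sketch runs; all function names are hypothetical.

```python
import hashlib

CHUNK = 16  # 128-bit chunks, as on the slide

def keystream_chunk(key: bytes, iv: int, i: int) -> bytes:
    # Stand-in for the block cipher applied to (IV, i). This is NOT AES.
    material = key + iv.to_bytes(8, "big") + i.to_bytes(8, "big")
    return hashlib.sha256(material).digest()[:CHUNK]

def xor_with_pad(key: bytes, iv: int, data: bytes) -> bytes:
    out = bytearray()
    for off in range(0, len(data), CHUNK):
        pad = keystream_chunk(key, iv, off // CHUNK)
        out += bytes(x ^ y for x, y in zip(data[off:off + CHUNK], pad))
    return bytes(out)

def encrypt_bucket(key: bytes, iv: int, plaintext: bytes):
    """Bump the counter, encrypt, and return (new_iv, ciphertext).

    The new IV is what would be stored in the clear next to the bucket.
    """
    new_iv = iv + 1
    return new_iv, xor_with_pad(key, new_iv, plaintext)

def decrypt_bucket(key: bytes, iv: int, ciphertext: bytes) -> bytes:
    return xor_with_pad(key, iv, ciphertext)
```

Because the counter increases on every write, re-encrypting an unchanged bucket yields a different ciphertext, so an observer cannot tell whether the contents changed.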
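A short calculation shows why recursion shrinks the on-chip PosMap. The sketch assumes each PosMap block packs a fixed number of leaf labels (`entries_per_block`, the X that appears later in the deck) and that recursion stops once the remaining map has at most `onchip_entries` entries. The function name and the example numbers are hypothetical, not the chip's parameters.

```python
def recursion_levels(num_blocks, entries_per_block, onchip_entries):
    """Return the number of PosMap entries needed at each recursion level.

    Entry 0 maps the data blocks; each later entry maps the blocks of the
    previous PosMap ORAM. The last entry is the map that stays on chip.
    """
    sizes = [num_blocks]
    while sizes[-1] > onchip_entries:
        sizes.append(-(-sizes[-1] // entries_per_block))  # ceiling division
    return sizes

# 2**20 data blocks, 8 labels per PosMap block, room for 1024 labels on chip.
print(recursion_levels(2 ** 20, 8, 1024))
```

Each level divides the map size by X, so the number of PosMap ORAMs grows only logarithmically with the number of data blocks.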
Recursive ORAM
ORAM in Hardware: Challenges Size of the PosMap: even with recursive ORAM, a large fraction of the bytes read goes to the PosMap ORAMs. Complex stash eviction logic. Using DRAM as ORAM storage: inefficiencies due to row misses (accesses land in different rows). [Figure: percentage of bytes read from PosMap ORAMs for X=8 and Z=4]
Frontend
PLB and Unified ORAM PosMap Lookaside Buffer (PLB): caches leaf information on-chip to avoid accesses to the PosMap ORAMs. Security risk: PLB hits and misses change which ORAM is accessed. Proposed solution: combine the PosMap and data ORAMs into a single Unified ORAM that holds both PosMap blocks and data blocks. A PosMap access is then as costly as a data access, but it is secure.
Security problem [Figure: ORAM-level access pattern over time for Data ORAM and Map ORAM, without PLB and with PLB]
[Figure labels: Data ORAM, Map ORAM, Unified ORAM, paths, A, A+1]
Unified ORAM [Figure: ORAM-level access pattern over time for Data ORAM, Map ORAM, and Unified ORAM]
PMMAC - Memory Integrity Check Data read from external memory is passed through PMMAC to verify its authenticity. A plain MAC is susceptible to replay attacks, so the paper proposes using PosMap entries as non-repeating counters. The MAC is attached to the data block and is small relative to the data. The authors prove that breaking the PMMAC scheme is as hard as breaking the underlying MAC scheme.
PMMAC - Memory Integrity Check Consider block a with data d and access counter c. Data write: Replace the PosMap entry of a with c. Generate the leaf l as mod 2^L Backend receives the data as (h, d) where Data read: PMMAC receives data from the backend as (h*, d*). You can verify the integrity by
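The write and read paths can be sketched as below. The exact expressions are missing from the slide, so this assumes one plausible instantiation: the tag h is a MAC over (c, a, d), and the leaf is a keyed pseudorandom function of (a, c) reduced mod 2^L. HMAC-SHA-256 from the standard library stands in for the chip's MAC; the class and method names are hypothetical, and a dictionary stands in for the PosMap.

```python
import hashlib
import hmac

def mac(key: bytes, counter: int, addr: int, data: bytes) -> bytes:
    # Assumed tag layout: MAC over counter || address || data.
    msg = counter.to_bytes(8, "big") + addr.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()

class PMMAC:
    def __init__(self, key: bytes, L: int):
        self.key, self.L = key, L
        self.counters = {}  # stands in for the PosMap: address -> counter

    def leaf(self, addr: int, counter: int) -> int:
        # Assumed leaf derivation: keyed PRF of (address, counter) mod 2**L.
        msg = b"leaf" + addr.to_bytes(8, "big") + counter.to_bytes(8, "big")
        tag = hmac.new(self.key, msg, hashlib.sha256).digest()
        return int.from_bytes(tag, "big") % (2 ** self.L)

    def write(self, addr: int, data: bytes):
        """Bump the counter and return (leaf, (h, d)) for the backend."""
        c = self.counters.get(addr, 0) + 1
        self.counters[addr] = c
        return self.leaf(addr, c), (mac(self.key, c, addr, data), data)

    def verify(self, addr: int, h_star: bytes, d_star: bytes) -> bool:
        """Check (h*, d*) from the backend against the trusted counter."""
        c = self.counters[addr]
        return hmac.compare_digest(h_star, mac(self.key, c, addr, d_star))
```

Since the counter lives in the trusted PosMap and never repeats for a given address, replaying an old (h, d) pair fails verification even though the pair was once valid.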
Backend
Stash Eviction Logic Eviction from the stash is one of the most crucial parts of Path ORAM and is generally the throughput bottleneck. The strategy must also keep the stash from overflowing. Push each evicted block to the deepest possible bucket on the path P(l). During the read/eviction there are two tasks: decide where on the path each block goes [PushBack()] and push the data down the path [PushToLeaf()]. The authors propose a single-cycle algorithm for PushBack().
Stash Eviction Logic Consider a block a to be evicted from the stash, which resided at leaf l and has now moved to l'. You need to figure out the best place in the tree to push block a. Info necessary: current occupancy, the paths to l and l', and the stash. Algorithm:
a_loc = PushBack(l, l', occupancy); // Called many times
PushToLeaf(stash, l); // Clears the stash for the path l
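The core of PushBack() can be sketched in software as follows. The paths to two leaves share exactly the levels covered by the common prefix of their labels, so the deepest shared level follows from the XOR of the labels. This sketch is a sequential loop, not the authors' single-cycle hardware algorithm; the function names and the explicit `Z` parameter are assumptions added for illustration.

```python
def deepest_common_level(leaf_a: int, leaf_b: int, L: int) -> int:
    """Deepest tree level (root = 0, leaves = L) shared by both paths."""
    return L - (leaf_a ^ leaf_b).bit_length()

def push_back(path_leaf: int, block_leaf: int, occupancy: list, Z: int):
    """Pick the deepest level on the path to `path_leaf` that still has a
    free slot and that a block mapped to `block_leaf` may legally occupy.

    `occupancy[level]` counts the blocks already assigned to that level.
    Returns the chosen level, or None if the block must stay in the stash.
    """
    L = len(occupancy) - 1
    for level in range(deepest_common_level(path_leaf, block_leaf, L), -1, -1):
        if occupancy[level] < Z:
            occupancy[level] += 1
            return level
    return None
```

For example, with L = 3 the paths to leaves 0b000 and 0b001 share levels 0 through 2, while the paths to 0b000 and 0b100 share only the root.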
Mapping ORAM to DRAM In a naïve layout of the ORAM tree in DRAM, each bucket on a path falls in a different DRAM row. Solution: group buckets into subtrees and map each subtree to a single DRAM row. This improves the bandwidth to 90-95% of peak. [Figure: example of a k=2 subtree]
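One way to realize the subtree layout is the address mapping below. It assumes each DRAM row holds exactly one subtree of k levels and that subtrees are numbered layer by layer from the root. The function names and the (row, slot) output format are hypothetical, chosen only to illustrate the idea.

```python
def bucket_to_dram(level: int, index: int, k: int):
    """Map a bucket (tree level, index within that level) to (row, slot)
    when every DRAM row stores one subtree that is k levels tall."""
    layer, depth = divmod(level, k)      # subtree layer, depth inside subtree
    root_index = index >> depth          # which subtree within this layer
    # Subtrees in all shallower layers: 1 + 2**k + 2**(2k) + ...
    subtrees_above = ((1 << (k * layer)) - 1) // ((1 << k) - 1)
    row = subtrees_above + root_index
    # Heap-order position of the bucket inside its subtree.
    slot = (1 << depth) - 1 + (index & ((1 << depth) - 1))
    return row, slot

def rows_on_path(leaf: int, L: int, k: int):
    """Set of DRAM rows touched when reading the path to `leaf`."""
    return {bucket_to_dram(lev, leaf >> (L - lev), k)[0] for lev in range(L + 1)}
```

With L = 5 and k = 2, a path of six buckets touches three rows instead of six, so half of the row activations disappear.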
ASIC Implementation
Chip Specification 25 SPARC T1 cores (Princeton). LLC misses are handled by the ORAM controller. ORAM controller with L=23 and B=512. AES and SHA units (for ORAM and server communication). PMMAC support with 64-bit counters and SHA3-224. On-chip PosMap of size 8kb (6 levels of recursion).
ASIC Implementation
Power, Performance and Area Consumes 299 mW at 857 MHz with Vdd = 1.1 V (32 nm node). Completes an ORAM access of 512 bits in 1275 cycles. Average slowdown of around 4x on SPEC-Int-2006. Total area of 0.326 mm^2 for the ORAM controller.
Module       Dimensions (um)   Area (mm^2)
Frontend     636.7 x 218.7     0.139
Backend      346.6 x 364.5     0.126
Encryption   669.0 x 364.5     0.244
Challenges Significant storage space is spent on metadata. Cannot have a large DRAM. The total number of memory accesses is not hidden. Only a single user is resident on the chip. Cannot bypass the ORAM controller. Increased power consumption from multiple redundant accesses.
Conclusion This work presents the first silicon implementation of ORAM, integrated in a system. It presents the entire execution model for running an untrusted program on sensitive user data. The implemented logic is only half the size of a single SPARC core. It adds an estimated performance overhead of 4x, a reasonable ask if security is a first-class citizen.
Questions?