Making Searchable Encryption Scale to the Cloud. Ian Miers and Payman Mohassel

Similar documents
Searchable Encryption Using ORAM. Benny Pinkas

Main Memory and the CPU Cache

Usable PIR. Network Security and Applied. Cryptography Laboratory.

Protecting Private Data in the Cloud: A Path Oblivious RAM Protocol

Design and Implementation of the Ascend Secure Processor. Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas

Lectures 6+7: Zero-Leakage Solutions

VERIFIABLE SYMMETRIC SEARCHABLE ENCRYPTION

A HIGH-PERFORMANCE OBLIVIOUS RAM CONTROLLER ON THE CONVEY HC-2EX HETEROGENEOUS COMPUTING PLATFORM

Secure Remote Storage Using Oblivious RAM

Locality of Reference

CSC 5930/9010 Cloud S & P: Cloud Primitives

Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute

Learning to Play Well With Others

CSE 373 OCTOBER 23 RD MEMORY AND HARDWARE

L9: Storage Manager Physical Data Organization

Computer Memory. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

Data Representation. Types of data: Numbers Text Audio Images & Graphics Video

CS 245 Midterm Exam Winter 2014

Advanced Database Systems

B+ Tree Review. CSE332: Data Abstractions Lecture 10: More B Trees; Hashing. Can do a little better with insert. Adoption for insert

Purity: building fast, highly-available enterprise flash storage from commodity components

CSE 373 OCTOBER 25 TH B-TREES

Next-Generation Parallel Query

Learning to Play Well With Others

Efficient Private Information Retrieval

Chapter 8: Memory-Management Strategies

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES

Pharmacy college.. Assist.Prof. Dr. Abdullah A. Abdullah

CHAPTER 8: MEMORY MANAGEMENT. By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

CS3600 SYSTEMS AND NETWORKS

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Chapter 8: Main Memory. Operating System Concepts 9 th Edition

from circuits to RAM programs in malicious-2pc

Operating Systems. File Systems. Thomas Ropars.

CS 111. Operating Systems Peter Reiher

Searchable Symmetric Encryption: Optimal Locality in Linear Space via Two-Dimensional Balanced Allocations

Chapter 8: Main Memory

Main Memory (Part II)

Virtual Memory: From Address Translation to Demand Paging

Chapter 8: Main Memory

Column Stores vs. Row Stores How Different Are They Really?

Developing MapReduce Programs

DIY Hosting for Online Privacy

Physical Level of Databases: B+-Trees

DIY Hosting for Online Privacy. Shoumik Palkar and Matei Zaharia Stanford University

Design space exploration and optimization of path oblivious RAM in secure processors

Design Space Exploration and Optimization of Path Oblivious RAM in Secure Processors

Chapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition

PS2 out today. Lab 2 out today. Lab 1 due today - how was it?

CS307 Operating Systems Main Memory

Memory Defenses. The Elevation from Obscurity to Headlines. Rajeev Balasubramonian School of Computing, University of Utah

Information Retrieval and Organisation

File System Internals. Jo, Heeseung

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c

TWORAM: Efficient Oblivious RAM in Two Rounds with Applications to Searchable Encryption

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition

ECE 598 Advanced Operating Systems Lecture 22

Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts

Chapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition

Chapter 8: Memory- Management Strategies

Indexing and Search with

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Background: disk access vs. main memory access (1/2)

Computer Systems Laboratory Sungkyunkwan University

ECE 598 Advanced Operating Systems Lecture 10

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Chapter 14: File-System Implementation

DRS: Advanced Concepts, Best Practices and Future Directions

The Right Read Optimization is Actually Write Optimization. Leif Walsh

CS 245: Database System Principles

TFS: A Transparent File System for Contributory Storage

Disk Scheduling COMPSCI 386

M 2 R: Enabling Stronger Privacy in MapReduce Computa;on

Systems Architecture II

The Google File System

Efficient Oblivious Data Structures for Database Services on the Cloud

INDEX CONSTRUCTION 1

Operating Systems and Computer Networks. Memory Management. Dr.-Ing. Pascal A. Klein

RACS: Extended Version in Java Gary Zibrat gdz4

Background. Contiguous Memory Allocation

Motivation. Operating Systems. File Systems. Outline. Files: The User s Point of View. File System Concepts. Solution? Files!

Module 6: INPUT - OUTPUT (I/O)

This Unit: Main Memory. Building a Memory System. First Memory System Design. An Example Memory System

Virtual Memory. Chapter 8

CS162 - Operating Systems and Systems Programming. Address Translation => Paging"

SQL on Structurally-Encrypted Databases

CS307: Operating Systems

PROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter , Lecture 18

CS5460: Operating Systems Lecture 14: Memory Management (Chapter 8)

Outline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015

CS 525: Advanced Database Organization 04: Indexing

Chapter 12: Indexing and Hashing. Basic Concepts

External Sorting Implementing Relational Operators

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa

CS370 Operating Systems

Order-Revealing Encryption:

CLOUD-SCALE FILE SYSTEMS

Memory Management. Dr. Yingwu Zhu

Memory Management Algorithms on Distributed Systems. Katie Becker and David Rodgers CS425 April 15, 2005

Transcription:

Making Searchable Encryption Scale to the Cloud Ian Miers and Payman Mohassel

End to end Encryption No encryption Transport encryption End2End Encryption Service provider Service provider Service provider user user user user user user

E2E Encrypted Messaging Need Crypto Want Crypto

E2E Encrypted Messaging Ban Crypto Need Crypto Want Crypto

Deploying E2E encrypted messaging WhatsApp No feature loss Many users probably don t know they are using it imessage Same features as SMS WebRTC video chat

Search For some communication mechanisms, people expect search Email is the canonical example, but not the only one. Slack Any email replacement

I m not particularly thrilled with building an apartment building which has the biggest bars on every window Jeff Bonforte (Yahoo VP mail and messagin)

E2E Encrypted Messaging Need Crypto Want Crypto

Searchable Encryption

An index for search The f 1 f 5 f 9 f 52 f 67 f 72 f 99 Man f 7 f 11 f 21 f 42 f 67 Bites f 2 Dog f 21 f 27 f 31

Index on client, store on server The f 1 f 5 f 9 f 52 f 67 f 72 f 99 Man f 7 f 1 1 f 2 1 f 4 2 f 6 7 Bites f 2 Dog f 2 1 f 2 7 f 3 1

Index on client, store on server f 21 f 27 f 31 Dog

Index on client, store on server The f 1 f 5 f 9 f 52 f 67 f 72 f 99 f 21 f 27 f 31 Man f 7 f 1 1 f 2 1 f 4 2 f 6 7 Bites f 2 Dog Dog f 2 1 f 2 7 f 3 1

An index for search The f 1 f 5 f 9 f 52 f 67 f 72 f 99 Man f 7 f 11 f 21 f 42 f 67 Bites f 2 Dog f 21 f 27 f 31

A Naïvely Encrypted Index H(k keyword) E(k,list of files) 8afa2 1c35f dc4cf 9f126

Index on client, store on server 8afa2 1c35f dc4cf H(k Dog ) 9f126

A Naïvely Encrypted Index H(k keyword) E(k,list of files) 8afa2 1c35f dc4cf 9f126 Leaks term frequency 8afa2 is the most frequent keyword The is the most frequent English word

An Inefficient Encrypted index H(k keyword KeyWord_ctr) 8afa2 E(k, f i ) 1c35f 41bb a5l9 5r6n d4c1 For a given keyword, each file containing it is stored in a separate random location This hides keyword frequency in a space efficient way Very inefficient to search: Requires one random read per result Results in a ~25-50x increase in I/O usage Yahoo! Mail search is already IO bound!!! Not viable for a server supporting multiple users who are not paying for it

Search at Cloud Scale Many small indexes < 1GB each > 1 Billion accounts Cannot store in memory Must use disk storage IO Bound Fragmented index causes massive increase in io for search A search for one keyword returning N documents takes N times as many reads.

Good news Email search queries are fairly simple Typically single keyword Conjunctive search nice, but not necessary Most searches are on meta data Searches on mail content are rare ~250 searches a second across all users ~300 million monthly active users But we must solve the IO issue.

An Inefficient Encrypted index H(k keyword KeyWord_ctr) 8afa2 E(k, f i ) 1c35f 41bb 9f126 dc4cf d4c1

IO Efficient search for static indexes

Chunked Encrypted Index H(k keyword chunk_ctr) E(k, chunk of files) 8afa2 1c35f 41bb 9f126 Assume we have all documents initially We break up the list into chunks Way more efficient to search Can scale to terabytes Cash et al (Crypto 13, NDSS 14) dc4cf d4c1

Problem: updates H(k keyword chunk_ctr) E(k, chunk of files) 8afa2 1c35f 41bb : lost DOG Dog is 9f126 Need to add to Dog entry. But that leaks what we updated 9f126 dc4cf d4c1

Problem: updates H(k keyword chunk_ctr) E(k, chunk of files) 8afa2 1c35f 41bb : lost DOG Dog is 9f126 Need to add to Dog entry. But that leaks what we updated 9f126 dc4cf d4c1

Problem: updates 8afa2 1c35f 41bb 9f126 dc4cf d4c1 fe52

Problem: updates 8afa2 1c35f 41bb 9f126 dc4cf d4c1 fe52

IO-DSSE: Scaling Dynamic Searchable Encryption to Millions of Indexes By Improving Locality

Obliviously Updateable Index Standard search index 8afa2 1c35f 41bb 9f126 dc4cf fe52 d4c1

Obliviously Updateable Index 8afa2 1c35f 41bb 9f126 dc4cf fe52 d4c1

Obliviously Updateable Index 8afa2 1c35f 41bb 9f126 dc4cf fe52 d4c1

Obliviously Updateable Index 8afa2 1c35f 41bb 9f126 dc4cf fe52 d4c1

Obliviously Updateable Index 8afa2 1c35f 41bb 9f126 dc4cf d4c1 fe52

Obliviously Updateable Index?

Chunked Encrypted Index 8afa2 1c35f 41bb 9f126 dc4cf d4c1 fe52

Buffer locally, put full chunks on server Zipfs f 1 law f 7 f 11 really f 2 Keywords have a power law distribution: common ones are really frequent, others are sparse We will end up with too many partial buckets on the client We can t upload partial buckets kills f 2 this f 1 f 5 f 9 f 52 f 67 f 72 idea f 2

We need an obliviously updatable index overflow Local Device Data center k 1 k 7 f 3 f 4 f 1 k 1 k 2 f 4 f 4 f 9 k 1 k 4 f 1, f 5, f 8, f 12 f 4, f 1, f 3, f 9 keyword qeuries k 3 f 1 f 8 k 3 f 1 f 8 f 9 k 1 f 2, f 3, f 8, f 11 Obliviously Updatable Index Full Block Index

Oblivious RAM ORAM hides locations of access to memory (both reads and writes) How to build ORAM 1. Encrypt memory 2. Shuffle memory locations on reads or writes to hide locations In Path ORAM, shuffling has logarithmic overhead.

OUI from Path ORAM RAM 1. Read(for search) 2. Shuffle 3. Read (for search) 4. Shuffle 5. Read/write for update 6. Shuffle

Client side stash Path ORAM

Client side stash Path ORAM

Client side stash Path ORAM

Client side stash Path ORAM

Client side stash Path ORAM

Client side stash Path ORAM

From ORAM to an OUI ORAM allows you to write to a location in memory without revealing the location Can add to a partial chunk without revealing we did so. Bandwidth costs get worse was ORAM gets larger Requires you to read and write Log(N)*B bytes for a read of B bytes from an ORAM of size N For 16GB of ORAM, server needs 32.06 GB of space and reading 4KB takes 350KB read + 350KB write. Storing full index in ORAM requires too much bandwidth

From ORAM to an OUI ORAM hides both reads and writes Search explicitly leaks repeated reads Same files are returned each time. Same search token/hash used. No need to hide reads using ORAM Updates may happen in batches

OUI from Oblivious RAM 1. Read(for search) 2. Shuffle 3. Read (for search) 4. Shuffle 5. Read/write for update 6. Shuffle

Partial ORAM? 1. Read(for search) 2. Read (for search) 3. Read/write for update 4. Shuffle

OUI 1. Read(for search) 2. Shuffle 3. Read (for search) 4. Shuffle 5. Read/write for update 6. Shuffle

OUI 1. Read(for search) 2. Read (for search) 3. Shuffle + Shuffle 4. Read/write for update 5. Shuffle

OUI 1. Read(for search) 2. Read (for search) 3. Shuffle + Shuffle 4. Read/write for update 5. Shuffle

OUI 1. Read(for search) 2. Read (for search) 3. Shuffle + Shuffle 4. Read/write for update 5. Shuffle

Reads

Why not just read directly?

Leaks updates

OUI from ORAM Searching triggers a read and write of Log(n)*B data To avoid Log(N)*B read +write for each search Just read address for chunk for given keyword Defer read and write until later (i.e. when the phone is plugged in and on Wi-Fi) Search is constant bandwidth and has nice locality All updates must happen after deferred IO is done We get some savings from batching the IO together Multiple searches on the same keyword are free

OUI from ORAM Read directly from tree for search BUT Must complete full path read and write prior to any updates Call these deferred reads

Batched reads and writes Deferred (full reads ) reads and updates are not random events They will happen in groups either When an email comes in we get many updates We might update the non local index only once a day (if system is not multi client) Batched reads and writes reduce the amount of data read and written For n full reads/ writes, The root is only updated once instead of n times Its children once instead n/2 times, etc

Deferred Reads

Deferred Reads

Deferred Reads

Deferred Reads + Batching

Batched update

Performance IO Savings (percentage) vs Simple encrypted index( including all previous works under purely dynamic insertion) Savings just for search (ignoring updates) Oblivious index from path ORAM

Conclusion Searchable encryption might be feasible for cloud based messaging with effort It pays to examine problems in context You can always get better performance by relaxing security assumptions Sometimes the relaxation is inherent to the setting and free

8afa2 1c35f 41bb 9f126 dc4cf d4c1 Updates Query local, ORAM, and index with efficient access Update : Buffer locally, overflow to ORAM, then commit full chunks to index Defer ORAM I/O from queries until update period Requirements 40 to 250mb of client storage to store a list of keywords Client has fast internet sometimes Ideally, client has large local buffer Local buffer keyword data Query on keyword Obliviously updateable Index (OUI)

Client side stash Path ORAM

Client side stash Path ORAM

Client side stash Path ORAM

Client side stash Path ORAM

Client side stash Path ORAM

Client side stash Path ORAM

From ORAM to an OUI ORAM allows you to write to a location in memory without revealing the location Can add to a partial chunk without revealing we did so. Bandwidth costs get worse was ORAM gets larger Requires you to read and write Log(N)*B bytes for a read of B bytes from an ORAM of size N For 16GB of ORAM, server needs 32.06 GB of space and reading 4KB takes 350KB read + 350KB write. Storing full index in ORAM requires too much bandwidth

From ORAM to an OUI ORAM hides both reads and writes Search explicitly leaks repeated reads Same files are returned each time. Same search token/hash used. No need to hide reads using ORAM Updates may happen in batches

Reads

Why not just read directly?

Leaks updates

OUI from ORAM Read directly from tree for search BUT Must complete full path read and write prior to any updates Call these deferred reads

Batched reads and writes Deferred (full reads ) reads and updates are not random events They will happen in groups either When an email comes in we get many updates We might update the non local index only once a day (if system is not multi client) Batched reads and writes reduce the amount of data read and written For n full reads/ writes, The root is only updated once instead of n times Its children once instead n/2 times, etc

Deferred Reads

Deferred Reads

Deferred Reads

Deferred Reads + Batching

Batched update