Efficient Data Structures for Tamper-Evident Logging

Efficient Data Structures for Tamper-Evident Logging. Scott A. Crosby, Dan S. Wallach. Rice University.

Everyone has logs

Tamper-evident solutions. Current commercial solutions: write-only hardware appliances; security depends on correct operation. We would like cryptographic techniques, where the logger proves correct behavior. Existing approaches are too slow.

Our solution. History tree: logarithmic for all operations; benchmarks at >1,750 events/sec; benchmarks at >8,000 audits/sec. In addition: we propose a new threat model and demonstrate the importance of auditing.

Threat model. Forward integrity: events prior to a Byzantine failure are tamper-evident; we don't know when the logger becomes evil. Clients are trusted. Strong insider attacks: malicious administrator; evil logger; clients may be mostly evil, and are only trusted during the insertion protocol.

Limitations and assumptions. Limitations: we detect misbehaviour, not prevent it; we cannot prevent junk from being logged. Assumptions: privacy is outside our scope (data may be encrypted); crypto is secure.

System design. Logger: stores events; never trusted. Clients: little storage; create events to be logged; trusted only at time of event creation; send commitments to auditors. Auditors: verify correct operation; little storage; trusted, at least one is honest. (Diagram: clients, auditors, and the logger.)

This talk. Discuss the necessity of auditing. Describe the history tree. Evaluation. Scaling the log.

Tamper-evident log. Events come in; commitments go out. Each commitment commits to the entire past. (Diagram: the logger receives events X_{n-3}, X_{n-2}, X_{n-1} and returns commitments C_{n-3}, C_{n-2}, C_{n-1}.)

Hash chain log. Existing approach [Kelsey, Schneier]: C_n = H(C_{n-1} || X_n). The logger signs C_n. (Diagram: the chain grows by one event per step, yielding C_{n-3}, C_{n-2}, C_{n-1}.)
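
The recurrence above can be sketched in a few lines of Python. This is only an illustration: SHA-256, the all-zero starting value C_0, and the function name chain_commitments are assumptions made here, not details from the slides (which used SHA-1 and also sign each C_n).

```python
import hashlib

def chain_commitments(events, genesis=b"\x00" * 32):
    """Return [C_1, ..., C_n] with C_k = H(C_{k-1} || X_k).

    `genesis` is an assumed starting value C_0; the slides do not specify one.
    """
    commitments = []
    prev = genesis
    for event in events:
        prev = hashlib.sha256(prev + event).digest()
        commitments.append(prev)
    return commitments

honest = chain_commitments([b"login alice", b"sudo bob", b"logout alice"])
forged = chain_commitments([b"login alice", b"sudo mallory", b"logout alice"])

# Changing one event changes that commitment and every later one.
print(honest[0] == forged[0], honest[1] == forged[1], honest[2] == forged[2])
```

Because each commitment folds in the previous one, a commitment fixes the entire prefix of the log, which is the "commits to the entire past" property.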

Problem: we don't trust the logger! The logger signs and returns a stream of commitments (log heads), each corresponding to some log. Does C_n really contain the just-inserted X_n? Do C_{n-2} and C_{n-1} really commit to the same historical events? Is the event at index i in log C_n really X_i?

Solution: audit the logger. This is the only way to detect tampering. Auditors check the returned commitments for consistency (C_{n-2} with C_{n-1}) and for correct event lookup (X_{n-3} in C_{n-3}). Previously, auditing meant looking up historical events; it was assumed to be infrequent, and its performance was ignored.

Auditing is a frequent operation. If the logger knows that a commitment will not be audited for consistency with a later commitment, it can successfully tamper with a "tamper-evident" log. Auditing is therefore required in the forward-integrity threat model: every commitment must have a non-zero probability of being audited.

Forking the log. The logger rolls back the log and adds on different events. The attack requires that two commitments on different forks disagree on the contents of one event. If the system has historical integrity, audits must fail or be skipped.

New paradigm. Auditing cannot be avoided. Audits should occur on every event insertion and between commitments returned by the logger. How do we make inserts and audits cheap in CPU, communication complexity, and storage?

Two kinds of audits. Membership auditing (X_i in C_n): verify proper insertion; look up historical events. Incremental auditing (C_i with C_n): prove consistency between two commitments.

Membership auditing a hash chain. Is X_{n-5} in C_{n-3}? (Diagram: proof P over the chain from X_{n-5} to C_{n-3}.)

Incremental auditing a hash chain. Is C_{n-5} consistent with C_{n-1}? (Diagram: proof P over events X_{n-6} ... X_{n-1}, with commitments C_{n-5} and C_{n-1}.)
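
To see why this audit is linear in a hash chain, here is a hedged sketch of the check an auditor could run: replay every event between the two commitments and compare. SHA-256 and the names extend and verify_incremental are assumptions for illustration, not the authors' code.

```python
import hashlib

def extend(commitment, event):
    """One chain step: C_k = H(C_{k-1} || X_k)."""
    return hashlib.sha256(commitment + event).digest()

def verify_incremental(c_old, c_new, events_between):
    """Check that c_new extends c_old by exactly `events_between`.

    Cost is one hash per intervening event, i.e. O(j - i).
    """
    c = c_old
    for event in events_between:
        c = extend(c, event)
    return c == c_new

c0 = b"\x00" * 32                      # assumed starting value
log = [b"e1", b"e2", b"e3", b"e4", b"e5"]
heads = []
c = c0
for e in log:
    c = extend(c, e)
    heads.append(c)

print(verify_incremental(heads[0], heads[4], log[1:5]))          # consistent heads
print(verify_incremental(heads[0], heads[4], [b"e2", b"evil", b"e4", b"e5"]))
```

The proof itself is the list of intervening events, so both proof size and verification time grow with the distance between the two commitments.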

Existing tamper-evident log designs. Hash chain: auditing is linear time; historical lookups are very inefficient. Skip-list history [Maniatis, Baker]: auditing is still linear time; O(log n) historical lookups.

Our solution. History tree: O(log n) instead of O(n) for all operations. Variety of useful features: write-once append-only storage format; predicate queries and safe deletion. May probabilistically detect tampering by auditing a random subset of events, which is not beneficial for skip-lists or hash chains.

History tree. Merkle binary tree. Events are stored on the leaves. Logarithmic path length; random access. Permits reconstruction of past versions and past commitments.

History tree. (Diagrams: the tree grows one leaf at a time, with commitments C_2 through C_7 over events X_1 ... X_7.)
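
The growth shown in these diagrams can be approximated with a small versioned Merkle tree. The sketch below is a simplification made for illustration: the domain-separation bytes, the rule of hashing only the left child when the right subtree is still empty, SHA-256, and the names node_hash and commitment are all assumptions, not the authors' exact construction.

```python
import hashlib

def H(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

def node_hash(events, start, level, version):
    """Hash of the subtree of height `level` whose first leaf is `start`
    (0-based), as it looked when the log held `version` events."""
    if level == 0:
        return H(b"\x00", events[start])           # leaf
    half = 2 ** (level - 1)
    left = node_hash(events, start, level - 1, version)
    if start + half >= version:                    # right subtree still empty
        return H(b"\x01", left)
    right = node_hash(events, start + half, level - 1, version)
    return H(b"\x01", left, right)

def commitment(events, version):
    """Root hash C_version over the first `version` events (version >= 1)."""
    depth = (version - 1).bit_length()             # smallest depth with 2**depth >= version
    return node_hash(events, 0, depth, version)

events = [b"X1", b"X2", b"X3", b"X4", b"X5", b"X6", b"X7"]
c3_then = commitment(events[:3], 3)   # computed when only three events existed
c3_now = commitment(events, 3)        # reconstructed later from the seven-event log
print(c3_then == c3_now)
```

Passing an older version number is what lets the logger reconstruct past commitments from the current tree, as the "History tree" slide states.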

Incremental auditing. (Diagrams: the auditor holds C_3 while the log grows from X_3 to X_7 and the logger issues C_4 through C_7.)

Incremental proof that C_3 is consistent with C_7. The auditor holds C_3 and C_7 and receives a proof P. P is consistent with C_7. P is consistent with C_3. Therefore C_7 and C_3 are consistent.

Pruned subtrees. Although not sent to the auditor, they are fixed by the hashes above them. C_3 and C_7 fix the same (unknown) events.

Membership proof that X_3 is in C_7. Verify that C_7 has the same contents as the proof P, then read out event X_3.
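
A membership proof of this kind can be sketched as a standard Merkle path: the sibling hash at each level from the root down to the leaf. The tree layout, domain-separation bytes, SHA-256, and the names prove and verify are assumptions chosen for illustration rather than the authors' wire format.

```python
import hashlib

def H(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

def node_hash(events, start, level, version):
    if level == 0:
        return H(b"\x00", events[start])
    half = 2 ** (level - 1)
    left = node_hash(events, start, level - 1, version)
    if start + half >= version:                 # empty right subtree
        return H(b"\x01", left)
    return H(b"\x01", left, node_hash(events, start + half, level - 1, version))

def commitment(events, version):
    return node_hash(events, 0, (version - 1).bit_length(), version)

def prove(events, index, version):
    """Sibling hashes on the path from the root to leaf `index` (0-based)."""
    path, start = [], 0
    for level in range((version - 1).bit_length(), 0, -1):
        half = 2 ** (level - 1)
        if index < start + half:                # target is in the left child
            sibling = None
            if start + half < version:
                sibling = node_hash(events, start + half, level - 1, version)
            path.append(("R", sibling))
        else:                                   # target is in the right child
            path.append(("L", node_hash(events, start, level - 1, version)))
            start += half
    return path

def verify(event, path, root):
    """Recompute the root from the event and its path; O(log n) hashes."""
    h = H(b"\x00", event)
    for side, sibling in reversed(path):
        if side == "L":
            h = H(b"\x01", sibling, h)
        elif sibling is None:
            h = H(b"\x01", h)
        else:
            h = H(b"\x01", h, sibling)
    return h == root

events = [b"X1", b"X2", b"X3", b"X4", b"X5", b"X6", b"X7"]
c7 = commitment(events, 7)
proof = prove(events, 2, 7)                     # X_3 is at 0-based index 2
print(len(proof), verify(b"X3", proof, c7), verify(b"forged", proof, c7))
```

The pruned subtrees never travel to the auditor; only their hashes do, which is why the proof stays logarithmic in the log size.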

Merkle aggregation. Annotate events with attributes: $1 $8 $3 $2 $5 $2 $4.

Aggregate them up the tree with Max(). Root to leaves: $8; $8 $5; $8 $3 $5 $4; $1 $8 $3 $2 $5 $2 $4. The aggregates are included in the hashes and checked during audits.

Querying the tree. Find all transactions over $6. (Same Max() tree: $8; $8 $5; $8 $3 $5 $4; $1 $8 $3 $2 $5 $2 $4.)

Safe deletion. Authorized to delete all transactions under $4. (Same Max() tree: $8; $8 $5; $8 $3 $5 $4; $1 $8 $3 $2 $5 $2 $4.)

Merkle aggregation is flexible. Many ways to map events to attributes: any computable function. Many attributes: timestamps, dollar values, flags, tags. Many aggregation strategies: +, *, min(), max(), ranges, and/or, Bloom filters.

Generic aggregation. A scheme is a triple: the type of the attributes on each node in the history, an aggregation function, and a function that maps an event to its attributes. For any predicate P, as long as P(x) OR P(y) IMPLIES P(aggregate of x and y), we can query for events matching P and can safe-delete events not matching P.
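
The dollar-value example from the earlier slides fits this condition: with max() as the aggregation function and P(x) = "x is over $6", P(x) OR P(y) implies P(max(x, y)), so any subtree whose aggregate fails P can be skipped. The sketch below shows only the pruning logic; it omits the hashes that bind the aggregates in the real scheme, and the names build and query are made up for illustration.

```python
def build(values):
    """levels[0] holds the leaves, levels[-1] the root; each parent is the
    max of its (one or two) children."""
    levels = [list(values)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([max(cur[i:i + 2]) for i in range(0, len(cur), 2)])
    return levels

def query(levels, threshold):
    """Indices (0-based) of leaves with value > threshold, pruning every
    subtree whose aggregate is <= threshold."""
    hits = []
    stack = [(len(levels) - 1, 0)]
    while stack:
        level, i = stack.pop()
        if levels[level][i] <= threshold:
            continue                      # nothing below here can match
        if level == 0:
            hits.append(i)
            continue
        for child in (2 * i + 1, 2 * i):  # push right first so left pops first
            if child < len(levels[level - 1]):
                stack.append((level - 1, child))
    return hits

levels = build([1, 8, 3, 2, 5, 2, 4])     # leaf values from the slides
print(levels[1:], query(levels, 6))
```

Safe deletion uses the same test in reverse: a subtree whose aggregate fails the "must keep" predicate can be dropped, and its retained aggregate serves as evidence that nothing important was under it.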

Evaluating the history tree. Big-O performance. Syslog implementation.

Big-O performance
                                      Incremental proof   Membership proof   Insert
                                      (C_i with C_j)      (X_i in C_j)
History tree                          O(log n)            O(log n)           O(log n)
Hash chain                            O(j-i)              O(j-i)             O(1)
Skip-list history [Maniatis, Baker]   O(j-i) or O(n)      O(log n) or O(n)   O(1)

Skip-list history [Maniatis, Baker]. A hash chain with extra links. The extra links cannot be trusted without auditing. Checking them: best case, only the events since the last audit; worst case, examining the whole history. If the extra links are valid, using them for historical lookups takes O(log n) time and space.

Syslog implementation. We ran at the 80-bit security level: 1024-bit DSA signatures; 160-bit SHA-1 hash. We recommend the 112-bit security level: 224-bit ECDSA signatures (66% faster); SHA-224, i.e. truncated SHA-256 (33% slower). [NIST SP 800-57 Part 1, Recommendation for Key Management, Part 1: General (Revised 2007)]

Syslog implementation. Syslog trace from Rice CS departmental servers: 4M events, 11 hosts over 4 days, 5 attributes per event. Repeated 20 times to create an 80M-event trace.

Syslog implementation. Implementation: hybrid C++ and Python; single threaded; MMAP-based append-only write-once storage for the log; 1024-bit DSA signatures and 160-bit SHA-1 hashes. Machine: dual-core 2007 desktop machine with 4 GB RAM.

Performance. Insert performance: 1,750 events/sec (2.4% parse; 2.6% insert; 11.8% get commitment; 83.3% sign commitment). Auditing performance with locality (last 5M events): 10,000-18,000 incremental proofs/sec; 8,600 membership proofs/sec. Without locality: 30 membership proofs/sec. Self-contained proof size is under 4,000 bytes. Compression reduces performance and proof size by 50%.

Improving performance. Increasing audit throughput above 8,000 audits/sec. Increasing insert throughput above 1,750 inserts/sec.

Increasing audit throughput. Audits require only read-only access to the log, so they are trivially offloaded to additional cores. For infinite scalability, the log server may be replicated: the master assigns event indexes and the slaves build the history tree locally.

Increasing insert throughput. Public key signatures are slow: 83% of runtime. Three easy optimizations: sign only some commitments; use faster signatures; offload signing to other hosts. These increase throughput to 10k events/sec.
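
A back-of-envelope check, not taken from the slides, suggests the 10k figure is in line with the profile reported on the performance slide. It assumes that the 83.3% signing share could be removed from the insert path entirely and that nothing else changes:

```latex
\[
t_{\text{insert}} = \frac{1}{1750\ \text{events/s}} \approx 0.571\ \text{ms},
\qquad
t_{\text{no-sign}} = (1 - 0.833)\, t_{\text{insert}} \approx 0.095\ \text{ms},
\]
\[
\text{throughput}_{\text{no-sign}} = \frac{1750}{1 - 0.833} \approx 10{,}480\ \text{events/s}.
\]
```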

More concurrency with replication. Processing pipeline: (1) Inserting into the history tree, O(1): a serialization point and fundamental limit that must be done on each replica; 38,000 events/sec using only one core. (2) Commitment or proof generation, O(log n). (3) Signing commitments, O(1) but expensive. Stages 2 and 3 can run concurrently on other hosts.

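Storing on secondary storage. Nodes are frozen (they never change again) in post-order traversal. This is a static order, so nodes map directly into an array. (Diagram: tree over X_1 ... X_7 with nodes numbered in post-order; the leaves X_1 ... X_7 take slots 1, 2, 4, 5, 8, 9, 11 and the interior nodes take slots 3, 6, 7, 10.)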
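
The numbering in the diagram can be reproduced with a short sketch. The closed-form slot formula and the names leaf_slot and postorder_layout are derived here from the diagram's numbering and are assumptions, not something stated in the slides.

```python
def leaf_slot(i):
    """1-based array slot of the leaf with 0-based index i.

    Before leaf i is written, i leaves and i - popcount(i) interior nodes
    have already been frozen, giving 2*i - popcount(i) occupied slots.
    """
    return 2 * i - bin(i).count("1") + 1

def postorder_layout(n):
    """Append-only write order for n events: each leaf, followed by every
    interior node whose subtree that leaf completes."""
    layout = []
    for i in range(n):
        layout.append(("leaf", i + 1))
        level, j = 1, i
        while j & 1:                  # one frozen node per trailing 1-bit of i
            layout.append(("node", level))
            level += 1
            j >>= 1
    return layout

layout = postorder_layout(7)
leaf_slots = [k + 1 for k, entry in enumerate(layout) if entry[0] == "leaf"]
print(leaf_slots, [leaf_slot(i) for i in range(7)])
```

Because a node is written only once its subtree is complete, the file is strictly append-only, which matches the write-once storage format mentioned earlier.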

Partial proofs. Proofs can re-use node hashes from prior audits (e.g., the incremental proof from C_3 to C_4). (Diagram: C_4 and C_7 over events X_1 ... X_7.)

Conclusion. New paradigm: the importance of frequent auditing. History tree: efficient auditing; efficient predicate queries and safe deletion; scalable. Proofs of tamper-evidence will be in my PhD thesis.

Questions?

Historical integrity. (Diagrams: commitment C_{n-4} over events X_{n-5}, X_{n-4}, compared with a later commitment C_{n-1} over events X_{n-5} ... X_{n-1}.)

Defining historical integrity. A logging system is tamper-evident when: if there is a verified incremental proof between commitments C_j and C_k (j < k), then for all i < j and all verifiable membership proofs that event i in log C_j is X_i and event i in log C_k is X'_i, we must have X_i = X'_i.

Safe deletion. Unimportant events may be deleted. When an auditor requests a deleted event, the logger supplies a proof that an ancestor was not important. (Diagram: tree with root R over events X_1 ... X_7.)