Blockchain, Throughput, and Big Data Trent McConaghy

Similar documents
BigchainDB: A Scalable Blockchain Database. Trent McConaghy

Security (and finale) Dan Ports, CSEP 552

A Public Database for the Planet

Distributed Consensus Protocols

Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database.

AGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus

Recap. CSE 486/586 Distributed Systems Paxos. Paxos. Brief History. Brief History. Brief History C 1

Fundamentals Large-Scale Distributed System Design. (a.k.a. Distributed Systems 1)

BlockFin A Fork-Tolerant, Leaderless Consensus Protocol April

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

Practical Byzantine Fault Tolerance Consensus and A Simple Distributed Ledger Application Hao Xu Muyun Chen Xin Li

Failure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18

Distributed Consensus: Making Impossible Possible

Applications of Paxos Algorithm

Consistency. CS 475, Spring 2018 Concurrent & Distributed Systems

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database

A definition. Byzantine Generals Problem. Synchronous, Byzantine world

Blockchain for Enterprise: A Security & Privacy Perspective through Hyperledger/fabric

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017

Transactions Between Distributed Ledgers

Alternatives to Blockchains. Sarah Meiklejohn (University College London)

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.

Distributed Systems Intro and Course Overview

A Reliable Broadcast System

Paxos and Replication. Dan Ports, CSEP 552

Consensus and related problems

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

DATABASE SCALE WITHOUT LIMITS ON AWS

Bitcoin. CS6450: Distributed Systems Lecture 20 Ryan Stutsman

Alternative Consensus

Using Paxos For Distributed Agreement

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL

The power of Blockchain: Smart Contracts. Foteini Baldimtsi

Chapter 13. Digital Cash. Information Security/System Security p. 570/626

Byzantine Techniques

Integrity in Distributed Databases

BBc-1 : Beyond Blockchain One - An Architecture for Promise-Fixation Device in the Air -

NewSQL Databases. The reference Big Data stack

CS-580K/480K Advanced Topics in Cloud Computing. NoSQL Database

How to Scale Out MySQL on EC2 or RDS. Victoria Dudin, Director R&D, ScaleBase

Paxos provides a highly available, redundant log of events

Introduction to Cryptography in Blockchain Technology. December 23, 2018

Consensus, impossibility results and Paxos. Ken Birman

Bitcoin and Blockchain

Distributed Ledger Technology & Fintech Applications. Hart Montgomery, NFIC 2017

Consensus & Blockchain

CS5412: CONSENSUS AND THE FLP IMPOSSIBILITY RESULT

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases

CISC 7610 Lecture 2b The beginnings of NoSQL

10. Replication. Motivation

BIG DATA AND CONSISTENCY. Amy Babay

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

Practice: Large Systems Part 2, Chapter 2

Cluster-Level Google How we use Colossus to improve storage efficiency

Low-Latency Multi-Datacenter Databases using Replicated Commit

Don t Give Up on Serializability Just Yet. Neha Narula

BigTable. Chubby. BigTable. Chubby. Why Chubby? How to do consensus as a service

Unlimited Scalability in the Cloud A Case Study of Migration to Amazon DynamoDB

A Global In-memory Data System for MySQL Daniel Austin, PayPal Technical Staff

Fault-Tolerance & Paxos

Blockchain. CS 240: Computing Systems and Concurrency Lecture 20. Marco Canini

Lecture 3. Introduction to Cryptocurrencies

There Is More Consensus in Egalitarian Parliaments

Byzantine Failures. Nikola Knezevic. knl

PARALLEL CONSENSUS PROTOCOL

Practical Byzantine Fault Tolerance. Castro and Liskov SOSP 99

Distributed Systems Final Exam

Distributed Consensus: Making Impossible Possible

Blockchain Beyond Bitcoin. Mark O Connell

Distributed Systems COMP 212. Lecture 19 Othon Michail

Building Consistent Transactions with Inconsistent Replication

Consensus. Chapter Two Friends. 2.3 Impossibility of Consensus. 2.2 Consensus 16 CHAPTER 2. CONSENSUS

Leslie Lamport. April 20, Leslie Lamport. Jenny Tyrväinen. Introduction. Education and Career. Most important works.

Replications and Consensus

CPS 512 midterm exam #1, 10/7/2016

Real Life Web Development. Joseph Paul Cohen

Outline. More Security Protocols CS 239 Security for System Software April 22, Needham-Schroeder Key Exchange

Report to Brewer s original presentation of his CAP Theorem at the Symposium on Principles of Distributed Computing (PODC) 2000

Proof of Stake Made Simple with Casper

Byzantine Consensus. Definition

Distributed Systems 11. Consensus. Paul Krzyzanowski

Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases

Blockchain, cryptography, and consensus

BYZANTINE CONSENSUS THROUGH BITCOIN S PROOF- OF-WORK

Algorithms for and against the Cloud

Introduction to Graph Databases

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Outline More Security Protocols CS 239 Computer Security February 6, 2006

Comparison of Different Implementations of Paxos for Distributed Database Usage. Proposal By. Hanzi Li Shuang Su Xiaoyu Li

CS 655 Advanced Topics in Distributed Systems

Latest Trends in Database Technology NoSQL and Beyond

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

CSE 5306 Distributed Systems. Fault Tolerance

A Byzantine Fault-Tolerant Ordering Service for the Hyperledger Fabric Blockchain Platform

Agreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering

Workshop Report: ElaStraS - An Elastic Transactional Datastore in the Cloud

TWO-PHASE COMMIT ATTRIBUTION 5/11/2018. George Porter May 9 and 11, 2018

Transcription:

Blockchain, Throughput, and Big Data Trent McConaghy Bitcoin Startups Berlin Oct 28, 2014

Conclusion Outline Throughput numbers Big data Consensus algorithms ACID Blockchain Big data?

Throughput numbers Bitcoin typical 1 transactions per second Blockchain size is 25Gb. >10Gb / yr. Takes 1d to download 25 Gb 11 Gb Nov 13 Nov 14 [blockchain.info, retrieved Oct 29, 2014]

Conclusion Throughput numbers Bitcoin 1 tps (observed) Bitcoin 7 tps (theoretical max due to block size limit) VISA 2000 tps Twitter 5000 tps Twitter 150,000+ tps at peak [https://en.bitcoin.it/wiki/scalability] https://blog.twitter.com/2013/newtweets-per-second-record-and-how

Conclusion Throughput numbers What ifs on Bitcoin, all else equal: 2000 tps 10Gb*2000/7 = 1.42 Pb/yr = 3.9 Gb/day (grows faster than you can download!) 150,000 tps 214 Pb/yr You might say we only need unspent outputs. But there are many use cases where we want to see all past transactions (blockchain apps, transparency, auditing,.. almost anything beyond simple payment.)

Conclusion Throughput numbers What ifs on Bitcoin, all else equal: 2000 tps 10Gb*2000/7 = 1.42 Pb/yr = 3.9 Gb/day (grows faster than you can download!) 150,000 tps 214 Pb/yr Q: Why not just focus on unspent outputs? A: There are many use cases where we want to see all past transactions. Auditing, compliance, ownership history, other Blockchain apps -- maybe most things beyond a simple payment?)

How? Conclusion Scaling up blockchain? Is this the right question to ask? What does Big Data have to say?

Conclusion Block chain: what "A block chain is a transaction database shared by all nodes participating in a system based on the Bitcoin protocol. https://en.bitcoin.it/wiki/block_chain

From Big Data: Cassandra DB. Scale-up linearity! http://1.bp.blogspot.com/- ZFtW7MFMqZQ/TrG5ujuDGdI/AAAAAAAAAW w/heceemd50x4/s1600/scale.png

Cassandra DB: How it works http://stevenpoitras.com/the-nutanix-bible/ http://1.bp.blogspot.com/- Hsr6O8pwyzU/TrG5didDPjI/AAAAAAAAAWM/ A3vL3wMkQgw/s1600/global.png

Cassandra DB: Consensus Paxos Algorithm (more later)

Conclusion Block chain DB: Consensus Bitcoin uses the block chain algorithm to achieve distributed consensus on who owns what coins. https://en.bitcoin.it/wiki/alternative_chain

A walk down history lane: a Conclusion gauntlet was thrown down Can you implement a distributed database that can tolerate the failure of any number of its processes (possibly all of them) without losing consistency, and that will resume normal behavior when more than half the processes are again working properly? -Leslie Lamport to colleagues, 1980 http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html (referring to Paxos)

Gauntlet 2, Byzantine problem Before [1], it was generally assumed that a three-processor system could tolerate one faulty processor. This paper shows that "Byzantine" faults, in which a faulty processor sends inconsistent information to the other processors, can defeat any traditional three-processor algorithm. (The term Byzantine didn't appear until [46].) In general, 3n+1 processors are needed to tolerate n faults. However, if digital signatures are used, 2n+1 processors are enough. This paper introduced the problem of handling Byzantine faults. I think it also contains the first precise statement of the consensus problem. http://research.microsoft.com/enus/um/people/lamport/pubs/pubs.html -Lesley Lamport [1] L. Lamport et al, Reaching Agreement in the Presence of Faults, J. ACM 27(2), April 1980

Towards Practical ConclusionSolutions to Byzantine Generals Problem "A fault-tolerant file system called Echo was built at SRC in the late 80s. The builders claimed that it would maintain consistency despite any number of non-byzantine faults, and would make progress if any majority of the processors were working. As with most such systems, it was quite simple when nothing went wrong, but had a complicated algorithm for handling failures based on taking care of all the cases that the implementers could think of. I decided that what they were trying to do was impossible, and set out to prove it. http://research.microsoft.com/enus/um/people/lamport/pubs/pubs.html

The Practical Conclusion Solution to Byzantine Generals Problem "I decided that what they were trying to do was impossible, and set out to prove it. Instead, I discovered the Paxos algorithm... At the heart of the algorithm is a three-phase consensus protocol...to my knowledge, Paxos contains the first three-phase commit algorithm that is a real algorithm, with a clearly stated correctness condition and a proof of correctness.. http://research.microsoft.com/enus/um/people/lamport/pubs/pubs.html L. Lamport et al, "The Byzantine Generals Problem, ACM Transactions on Programming Languages and Systems 4 (3): 382 401, July 1982

Conclusion Extensions to Paxos Byzantine Paxos[8][10] adds an extra message (Verify) which acts to distribute knowledge and verify the actions of the other processors http://en.wikipedia.org/wiki/paxos_(computer_science)#byzantine_paxos [8] Lamport, Leslie (2005). "Fast Paxos". [10] Castro, Miguel (2001). "Practical Byzantine Fault Tolerance".

Conclusion Opinions on Paxos there is only one consensus protocol, and that s Paxos -Mike Burrows, inventor of Chubby service at Google all other approaches are just broken versions of Paxos. The Paxos protocol.. is famously subtle and a bit difficult to.. it s clear that a good consensus protocol is surprisingly hard to find. http://the-paper-trail.org/blog/consensusprotocols-two-phase-commit/

Conclusion Production Use of Paxos Google all products that use BigTable or Spanner (search, analytics, email,..) Clustrix - distributed SQL DB Apache Cassandra distributed NoSQL FoundationDB distributed NewSQL Neo4j HA graph DB (and most other modern distributed DBs!) Heroku / Salesforce, Microsoft, IBM, [wikipedia, more]

Explaining Conclusion Paxos: Two Phase Commit (2PC) (1) Do you? (2) I do! I do! Problem: inconsistent result if node failure after Do you? http://the-paper-trail.org/blog/consensus-protocolstwo-phase-commit/

Explaining Conclusion Paxos: Three Phase Commit (3PC) Break the I do part into two phases Prepare to commit. ( Don t listen to any more do-you s for now ) Commit. I do! Can still fail: one site is in prepared to commit when another is not. http://the-paper-trail.org/blog/consensusprotocols-three-phase-commit/

Conclusion Explaining Paxos: The Algorithm http://the-papertrail.org/blog/consensu s-protocols-paxos/

Explaining Paxos: The Algorithm http://the-papertrail.org/blog/consensusprotocols-paxos/

Conclusion Block chain: A design decision The block chain is broadcast to all nodes on the networking [sic] using a flood protocol [https://en.bitcoin.it/wiki/block_chain] (Very inefficient!)

My thoughts: Two approaches to scale up Big data-fy the blockchain Builds on man-decades of work Significant scalability hurdles? <or> Blockchain-ify big data Builds on man-centuries (millennia?) of work Scalability challenges already resolved Needs distributed control (which isn t easy either!)

Conclusion Questions I have Why doesn t the Bitcoin community talk more about Paxos? (Yet it does talk about Byzantine Generals) Why isn t the blockchain itself partially distributed? (in the sense that pieces of it are sharded throughout the network, rather than a full duplicate everywhere) Am I missing something? Is there something fundamentally wrong with blockchain-ifying big data?

Summary The blockchain is a DB, as are modern Big Data NoSQL and NewSQL DBs. They re all distributed. Distributing a DB by making a full copy on every node scales extremely poorly. Distributed DBs need a consensus algorithm. Posed as the Byzantine Generals problem (Lesley Lamport) Most modern Big Data DBs use the Paxos Algorithm. all other approaches are just broken versions of Paxos. Open Q s summary: Paxos <-> BTC relation? Can we blockchain-ify big data?