Data Consistency Now and Then

Data Consistency Now and Then Todd Schmitter JPMorgan Chase June 27, 2017 Room #208

Data consistency in real life: Social media
- Facebook post: January 22, 2017, at a political rally
- Comments displayed are from unrelated posts from December 2012, May 2014, and March 2015
How much would you care if this happened to you?

Data consistency in real life: Address change
If you updated your address on your bank's web site, how frustrated would you be if the same institution continued to mail your credit card statement to your old address?

Data consistency in real life: ATM withdrawal
If you withdraw cash or deposit checks at an ATM, do you expect to see your new balance reflected immediately on your mobile device?

Changing expectations
- 1960s and 70s: Batch systems operating on a single data set. Nightly updates; paper reports, statements, etc.
- 1980s and 90s: Concurrent batch and on-line at a single site; some site replication for business continuity. Real-time inquiry; some real-time updates; limited user interaction.
- 2000 and beyond: Distributed systems operating on distributed databases. Real-time updates; constant mobile and web user interaction.

Brewer's theorem (1/2)
Also known as the CAP theorem, which states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
- Consistency: Every read receives the most recent write or an error
- Availability: Every request receives a (non-error) response, without guarantee that it contains the most recent write
- Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
Source: https://en.wikipedia.org/wiki/cap_theorem

Brewer's theorem (2/2)
- A system can guarantee both Consistency and Availability as long as no Partition tolerance is required. Since non-distributed systems have no partitions by definition, they can be CA systems.
- As availability expectations have increased, a shift toward AP systems has occurred.
- AP systems cannot guarantee consistency (at least not immediately). This is appropriate when availability is more important for the business requirement.
- If immediate consistency is more important than availability, then a CP system is appropriate.
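
To make the CP versus AP choice concrete, here is a minimal sketch of a single replica that has been cut off from its peers. The `Replica` class, its `mode` flag, and the values "v1" and "v2" are hypothetical names invented for illustration; no real data store is modeled.

```python
class Replica:
    """Toy replica holding one value; mode is either "CP" or "AP"."""

    def __init__(self, mode):
        self.mode = mode
        self.value = "v1"          # last value this replica has seen
        self.partitioned = False   # True when cut off from its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: return an error rather than risk a stale answer.
            raise RuntimeError("unavailable during partition")
        # AP (or no partition): always answer, possibly with stale data.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
# Assume a write of "v2" happened on another node and never reached these two.
cp.partitioned = ap.partitioned = True

ap_answer = ap.read()              # available, but stale
try:
    cp.read()
    cp_answer = "answered"
except RuntimeError as err:
    cp_answer = f"error: {err}"    # consistent, but unavailable
print(ap_answer, "|", cp_answer)
```

The AP replica keeps serving the old value, while the CP replica refuses to answer until the partition heals. Which behavior is right depends on the business requirement, as the slide notes.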

Limits of consistency and concurrency
What factors drive us toward eventual consistency?

Year    Global internet traffic
1992    100 GB per day
1997    100 GB per hour
2002    100 GB per second
2007    2,000 GB per second
2016    26,600 GB per second
2021    105,800 GB per second
Source: The Cisco VNI forecast: historical internet context

- The volume of updates pushes the limitations of traditional RDBMS lock management in a shared-everything architecture
- Various isolation levels were introduced to improve concurrency but, to some extent, they were really ways to start compromising on consistency (for example, uncommitted read)

You can't always get what you want
Real business requirements drive us toward compromising on consistency. Although there are new techniques for managing it, the basic concept is not new:
- Paying a credit card from a checking account
- Withdrawing cash from an ATM when the host deposit system is unavailable
What happens when consistency cannot be achieved in a single transaction?

The atomic transaction in slow motion
[Figure: timeline from 1 to 30 milliseconds showing the steps below acting on the history table, the account table, and the database log]
- T1 Application 1: Insert to transaction history table
- T2 RDBMS 1: Write log record of change to history table
- T3 Application 1: Update balance on account table
- T4 RDBMS 1: Write log record of change to account table
- T5 RDBMS 1: Write log commit record for UOW completion
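
The unit of work in this timeline can be sketched in a few lines. This is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in for the RDBMS; the table layout, the `post` function, and the `CHECK` constraint are assumptions added for the example and are not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("CREATE TABLE history (account_id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()


def post(conn, account_id, amount):
    """Insert the history row and update the balance as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO history VALUES (?, ?)", (account_id, amount)
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = post(conn, 1, -30)          # both changes commit together
rejected = post(conn, 1, -500)   # CHECK fails, so the history insert is undone too
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
rows = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(ok, rejected, balance, rows)
```

The point to notice is the failed call: the history insert had already executed, yet it leaves no trace because the database rolled back the whole unit of work. The application never had to write undo logic.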

DB2 on z/OS data sharing: Achieving high concurrency and consistency
- Availability: multiple database nodes can handle simultaneous requests for the same row on separate OS images.
- Performance: updates only need to be persisted on the log.
- Consistency: the coupling facility manages changes to the same row across nodes.
- Partition tolerance: none; the data is still a single point of failure.

Eventual consistency in slow motion
[Figure: timeline marked at 500, 1000, and 1500 milliseconds showing the steps below acting on the history table, the account table, the history event log, and the account event log]
- T1 Application 1: Insert to transaction history table (and write local log record to whatever database is being used)
- T2 Application 1: Publish transaction history event to event log
- T3 Application 2: Read event from history event log through subscription. Auditor: Read event from history event log
- T4 Application 2: Update balance on account table in response to history event (and write to local database log)
- T5 Application 2: Publish account event to event log
- T6 Auditor: Read event from account event log, look for matching history event and take appropriate action
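
The same flow can be sketched with plain in-memory structures. This is an illustration under stated assumptions, not a real messaging system: the two event logs are Python lists, the subscription is a read offset, and all names (`app1_post`, `app2_poll`, `auditor_unmatched`, `acct-1`) are hypothetical.

```python
history_table = []                    # Application 1's local table
account_table = {"acct-1": 100}       # Application 2's local table
history_log, account_log = [], []     # append-only event logs
offsets = {"app2": 0}                 # Application 2's read position in history_log


def app1_post(txn_id, account, amount):
    history_table.append((txn_id, account, amount))                      # T1
    history_log.append(
        {"txn": txn_id, "account": account, "amount": amount}            # T2
    )


def app2_poll():
    """Apply every history event this subscriber has not seen yet."""
    for event in history_log[offsets["app2"]:]:                          # T3
        account_table[event["account"]] += event["amount"]               # T4
        account_log.append({"txn": event["txn"]})                        # T5
    offsets["app2"] = len(history_log)


def auditor_unmatched():
    """Return history events that have no matching account event yet."""
    applied = {event["txn"] for event in account_log}                    # T6
    return [e["txn"] for e in history_log if e["txn"] not in applied]


app1_post("t1", "acct-1", -30)
lag = auditor_unmatched()                 # history written, balance not yet updated
stale_balance = account_table["acct-1"]
app2_poll()
print(lag, stale_balance, account_table["acct-1"], auditor_unmatched())
```

Between `app1_post` and `app2_poll` the two tables disagree, and only the auditor can tell. That window is the cost of splitting the unit of work across two applications.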

The responsibilities don't go away; they just move
When we migrate away from a traditional RDBMS (like DB2), we don't eliminate the need to account for the things it was doing. We just move the responsibility for handling those things into the application. Those responsibilities include:
- Understanding the interested parties in a given change (that is, event)
- Ensuring that all interested parties arrive (eventually) at a consistent state
- Undoing changes to resolve inconsistencies resulting from failures
- Understanding when the sequence of changes is important
- Providing a means of recovering starting from a prior consistent state when the current state becomes corrupted
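
One of these responsibilities, undoing changes after a failure, might look like the following in application code. The sketch borrows the "paying a credit card from a checking account" scenario mentioned earlier in the talk; the functions, the balances, and the simulated `ConnectionError` are all hypothetical.

```python
balances = {"checking": 100, "card": -80}
applied = []   # completed steps, in order, so they can be undone in reverse


def debit_checking(amount):
    if balances["checking"] < amount:
        raise ValueError("insufficient funds")
    balances["checking"] -= amount
    applied.append(("checking", -amount))


def credit_card(amount, fail=False):
    if fail:
        raise ConnectionError("card system unavailable")  # simulated outage
    balances["card"] += amount
    applied.append(("card", amount))


def compensate():
    """Undo completed steps in reverse order (a compensating action)."""
    while applied:
        account, delta = applied.pop()
        balances[account] -= delta


def pay_card(amount, card_fails=False):
    try:
        debit_checking(amount)
        credit_card(amount, fail=card_fails)
        applied.clear()        # both steps done; nothing left to undo
        return True
    except (ValueError, ConnectionError):
        compensate()           # the application, not the database, rolls back
        return False


failed = pay_card(50, card_fails=True)
after_failure = dict(balances)
succeeded = pay_card(50)
print(failed, after_failure, succeeded, balances)
```

In the failed call the checking account was already debited when the card system went down, so the application has to put the money back itself. With a single RDBMS transaction that rollback would have been automatic.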

Use cases revisited
For each of the following, where would you place it on the CAP triangle and why?
- Social media
- Address change
- ATM withdrawal
Principles:
- Consider what is most important to the customer (or user)
- The benefit achieved should warrant the cost of the chosen solution; note that added complexity is effectively an increase in cost
- Consider the breadth of solutions available, not just the most recent products or tools on the scene

When eventual consistency isn't enough
The ATM cash withdrawal use case
- Traditionally, ATMs need both consistency and availability, so they will sacrifice partition tolerance under normal conditions
- If partition tolerance becomes necessary due to a network failure, availability is chosen over consistency
- However, in the end (eventually), consistency is important to the customer!
- Therefore, reconciliation techniques are built in to resolve any inconsistencies that might have resulted from providing availability during a partition (network outage)
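
A rough sketch of this behavior follows. The talk does not describe how real ATM networks implement stand-in processing, so everything here is assumed for illustration: the `Host` and `Atm` classes, the `OFFLINE_LIMIT` value, and the rule that reconciliation simply posts pending withdrawals and reports any overdraft.

```python
OFFLINE_LIMIT = 200   # hypothetical cap per withdrawal while the host is unreachable


class Host:
    """Stand-in for the host deposit system."""

    def __init__(self, balance):
        self.balance = balance
        self.online = True


class Atm:
    def __init__(self, host):
        self.host = host
        self.pending = []   # withdrawals approved while the host was unreachable

    def withdraw(self, amount):
        if self.host.online:
            # Normal conditions: consistent and available.
            if amount > self.host.balance:
                return False
            self.host.balance -= amount
            return True
        # Partition: stay available, accept the risk, remember what was paid out.
        if amount > OFFLINE_LIMIT:
            return False
        self.pending.append(amount)
        return True

    def reconcile(self):
        """Post pending withdrawals to the host; return any overdraft."""
        while self.pending:
            self.host.balance -= self.pending.pop(0)
        return max(0, -self.host.balance)


host = Host(balance=150)
atm = Atm(host)
host.online = False                       # network outage
results = [atm.withdraw(100), atm.withdraw(100), atm.withdraw(500)]
host.online = True                        # connectivity restored
overdraft = atm.reconcile()
print(results, host.balance, overdraft)
```

The offline limit bounds how much inconsistency the bank is willing to accept in exchange for availability, and the overdraft returned by `reconcile` is the inconsistency that now has to be resolved with the customer.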

Optimizing transactional boundaries
- Stepping back from the various use cases, we can see that almost any business interaction can include both atomic and eventually consistent components
- Completely atomic transactions tend to be simpler to manage and, so, are preferable, if the business objective can be met
- If the use case demands breaking the transaction apart, additional complexity is required
- Each use case should seek the optimal balance between atomic transactions and eventual consistency in light of the business requirement

Summary
Implications of eventual consistency and event-driven architecture on system design
- Data relationships must be understood and managed, regardless of design approach
- Referential integrity must be maintained through the design and execution of the application, not the database
- The approach for conflict detection and resolution must be understood and implemented as part of the application design
- Subscribers to events must understand their responsibility for handling failure and reconciliation
- Transactional boundaries should be optimized to fit the business need

Thank you!
Todd Schmitter
Data Consistency Now and Then
Track: Architecture, 10:00am, Room #208
Social: /in/toddschmitter
Email: todd.schmitter@gmail.com