Data Consistency Now and Then
Todd Schmitter, JPMorgan Chase
June 27, 2017, Room #208
Data consistency in real life: social media

Facebook post: January 22, 2017, at a political rally. The comments displayed are from unrelated posts dated December 2012, May 2014, and March 2015.

How much would you care if this happened to you?
Data consistency in real life: address change

If you updated your address on your bank's web site, how frustrated would you be if the same institution continued to mail your credit card statement to your old address?
Data consistency in real life: ATM withdrawal

If you withdraw cash or deposit checks at an ATM, do you expect to see your new balance reflected immediately on your mobile device?
Changing expectations

- 1960s and 70s: Batch systems operating on a single data set. Nightly updates; paper reports, statements, etc.
- 1980s and 90s: Concurrent batch and on-line at a single site; some site replication for business continuity. Real-time inquiry; some real-time updates; limited user interaction.
- 2000 and beyond: Distributed systems operating on distributed databases. Real-time updates; constant mobile and web user interaction.
Brewer's theorem (1/2)

Also known as the CAP theorem, which states that it is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees:

- Consistency: every read receives the most recent write or an error.
- Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

Source: https://en.wikipedia.org/wiki/CAP_theorem
Brewer's theorem (2/2)

- A system can guarantee both consistency and availability as long as no partition tolerance is required. Since non-distributed systems have no partitions by definition, they can be CA systems.
- As availability expectations have increased, a shift toward AP systems has occurred. AP systems cannot guarantee consistency (at least not immediately). This is appropriate when availability is more important to the business requirement.
- If immediate consistency is more important than availability, then a CP system is appropriate.
Limits of consistency and concurrency

What factors drive us toward eventual consistency?

Global internet traffic (source: the Cisco VNI forecast, historical internet context):

  Year    Global internet traffic
  1992    100 GB per day
  1997    100 GB per hour
  2002    100 GB per second
  2007    2,000 GB per second
  2016    26,600 GB per second
  2021    105,800 GB per second

- The volume of updates pushes the limits of traditional RDBMS lock management in a shared-everything architecture.
- Various isolation levels were introduced to improve concurrency, but to some extent they were really ways to start compromising on consistency (for example, uncommitted read).
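The compromise that uncommitted read makes can be illustrated with a small in-memory sketch. The `Account` class and its methods are hypothetical stand-ins, not any real database API: real databases expose this behavior through isolation-level settings, not application code. Transaction A updates a balance without committing; a reader at READ UNCOMMITTED sees the dirty value, and if A rolls back, the reader has acted on data that never officially existed.

```python
# Toy illustration of a dirty read under READ UNCOMMITTED.
# Account is a hypothetical stand-in for a database row.

class Account:
    def __init__(self, balance):
        self.committed = balance   # last committed value
        self.pending = None        # uncommitted in-flight change

    def update(self, new_balance):
        # Transaction A writes but has not committed yet
        self.pending = new_balance

    def read(self, isolation="READ COMMITTED"):
        if isolation == "READ UNCOMMITTED" and self.pending is not None:
            return self.pending    # dirty read: sees uncommitted data
        return self.committed

    def rollback(self):
        self.pending = None

acct = Account(100)
acct.update(40)                            # A debits 60, no commit yet
dirty = acct.read("READ UNCOMMITTED")      # reader sees 40
clean = acct.read("READ COMMITTED")        # reader sees 100
acct.rollback()                            # A fails; 40 never existed
print(dirty, clean, acct.read())           # 40 100 100
```

The higher isolation level pays for its accuracy with locking and reduced concurrency; the lower one gains concurrency by giving up consistency, which is the trade-off the slide describes.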
You can't always get what you want

Real business requirements drive us toward compromising on consistency. Although there are new techniques for managing it, the basic concept is not new:

- Paying a credit card from a checking account
- Withdrawing cash from an ATM when the host deposit system is unavailable

What happens when consistency cannot be achieved in a single transaction?
The atomic transaction in slow motion

Timeline (roughly 1 to 30 milliseconds), touching the history table, the account table, and the database log:

- T1 (Application 1): Insert to transaction history table
- T2 (RDBMS 1): Write log record of change to history table
- T3 (Application 1): Update balance on account table
- T4 (RDBMS 1): Write log record of change to account table
- T5 (RDBMS 1): Write log commit record for unit-of-work (UOW) completion
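A minimal runnable sketch of that unit of work, using SQLite rather than the DB2 system the slide describes (the table and column names are illustrative). The application issues the two writes (T1 and T3); the log records and the commit record (T2, T4, T5) are written by the database itself inside the transaction, so either both changes land or neither does.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, account TEXT, amount REAL)")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL)")
con.execute("INSERT INTO account VALUES ('checking', 100.0)")
con.commit()

try:
    with con:  # BEGIN ... COMMIT; the database writes its own log records
        # T1: insert to the transaction history table
        con.execute("INSERT INTO history (account, amount) VALUES ('checking', -60.0)")
        # T3: update the balance on the account table
        con.execute("UPDATE account SET balance = balance - 60.0 WHERE name = 'checking'")
except sqlite3.Error:
    pass  # on failure the whole unit of work rolls back: no partial state

balance = con.execute(
    "SELECT balance FROM account WHERE name = 'checking'").fetchone()[0]
print(balance)  # 40.0
```

If either statement inside the `with` block fails, both tables are left exactly as they were; that all-or-nothing guarantee is what the event-driven design on the next slide has to reproduce by hand.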
DB2 on z/OS data sharing: achieving high concurrency and consistency

- Availability: multiple database nodes can handle simultaneous requests for the same row on separate OS images.
- Performance: updates only need to be persisted on the log.
- Consistency: the coupling facility manages changes to the same row across nodes.
- Partition tolerance: none; the data is still a single point of failure.
Eventual consistency in slow motion

Timeline (roughly 500 to 1,500 milliseconds), touching the history table, the account table, the history event log, and the account event log:

- T1 (Application 1): Insert to transaction history table (and write a local log record to whatever database is being used)
- T2 (Application 1): Publish transaction history event to the event log
- T3 (Application 2): Read event from the history event log through subscription; the auditor also reads the event from the history event log
- T4 (Application 2): Update balance on the account table in response to the history event (and write to the local database log)
- T5 (Application 2): Publish account event to the event log
- T6 (Auditor): Read event from the account event log, look for a matching history event, and take appropriate action
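The same flow can be sketched in memory. The event "logs" here are plain Python lists and the names are illustrative, not a real broker API; in production each step would cross a network and could fail independently, which is exactly why the auditor exists.

```python
# In-memory sketch of the eventually consistent flow.
history_table = []
account_table = {"checking": 100.0}
history_events, account_events = [], []

# T1/T2: Application 1 records the transaction and publishes an event
txn = {"id": 1, "account": "checking", "amount": -60.0}
history_table.append(txn)
history_events.append(txn)

# T3/T4/T5: Application 2 consumes the event, updates the balance,
# and publishes its own account event
for event in history_events:
    account_table[event["account"]] += event["amount"]
    account_events.append({"txn_id": event["id"], "account": event["account"]})

# T6: the auditor checks that every account event matches a history event
known_ids = {t["id"] for t in history_table}
unmatched = [e for e in account_events if e["txn_id"] not in known_ids]
print(account_table["checking"], unmatched)  # 40.0 []
```

Between T2 and T4 the two tables disagree: the history shows the withdrawal but the balance does not yet reflect it. The system is consistent only eventually, and only because every consumer faithfully processes its events.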
The responsibilities don't go away; they just move

When we migrate away from a traditional RDBMS (like DB2), we don't eliminate the need to account for the things it was doing. We just move the responsibility for handling those things into the application. Those responsibilities include:

- Understanding the interested parties in a given change (that is, event)
- Ensuring that all interested parties arrive (eventually) at a consistent state
- Undoing changes to resolve inconsistencies resulting from failures
- Understanding when the sequence of changes is important
- Providing a means of recovering, starting from a prior consistent state, when the current state becomes corrupted
Use cases revisited

For each of the following, where would you place it on the CAP triangle, and why?

- Social media
- Address change
- ATM withdrawal

Principles:

- Consider what is most important to the customer (or user).
- The benefit achieved should warrant the cost of the chosen solution; note that added complexity is effectively an increase in cost.
- Consider the breadth of solutions available, not just the most recent products or tools on the scene.
When eventual consistency isn't enough: the ATM cash withdrawal use case

- Traditionally, ATMs need both consistency and availability, so they sacrifice partition tolerance under normal conditions.
- If partition tolerance becomes necessary due to a network failure, availability is chosen over consistency.
- However, in the end (eventually), consistency is important to the customer!
- Therefore, reconciliation techniques are built in to resolve any inconsistencies that might have resulted from providing availability during the partition (network outage).
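A minimal sketch of that reconciliation step, with illustrative names and record structures. While the network was partitioned, the ATM approved withdrawals from a local stand-in log; once connectivity returns, those records are replayed against the host balance and any resulting inconsistencies (here, an overdraft) are flagged for follow-up.

```python
# Post-outage reconciliation sketch (hypothetical data, not a real ATM protocol).
host_balance = 100.0
standin_log = [  # withdrawals approved while the host was unreachable
    {"txn": "atm-1", "amount": 40.0},
    {"txn": "atm-2", "amount": 80.0},
]

overdrafts = []
for rec in standin_log:
    host_balance -= rec["amount"]       # apply the offline withdrawal
    if host_balance < 0:
        overdrafts.append(rec["txn"])   # flag for follow-up action

print(host_balance, overdrafts)  # -20.0 ['atm-2']
```

The second withdrawal would have been declined had the host been reachable; availability was chosen during the partition, and reconciliation is the price of restoring consistency afterward.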
Optimizing transactional boundaries

- Stepping back from the various use cases, we can see that almost any business interaction can include both atomic and eventually consistent components.
- Completely atomic transactions tend to be simpler to manage and so are preferable if the business objective can be met.
- If the use case demands breaking the transaction apart, additional complexity is required.
- Each use case should seek the optimal balance between atomic transactions and eventual consistency in light of the business requirement.
Summary: implications of eventual consistency and event-driven architecture on system design

- Data relationships must be understood and managed, regardless of design approach.
- Referential integrity must be maintained through the design and execution of the application, not the database.
- The approach for conflict detection and resolution must be understood and implemented as part of the application design.
- Subscribers to events must understand their responsibility for handling failure and reconciliation.
- Transactional boundaries should be optimized to fit the business need.
Thank you!

Todd Schmitter
Data Consistency Now and Then
Track: Architecture, 10:00am, Room #208
Social: /in/toddschmitter
Email: todd.schmitter@gmail.com