
Philip A. Bernstein, Colin Reid, Ming Wu, Xinhao Yuan. Microsoft Corporation. March 7, 2012. Published at VLDB 2011: http://www.vldb.org/pvldb/vol4/p944-bernstein.pdf

A new algorithm for optimistic concurrency control (OCC) on tree-structured indices, called meld.

Scenario 1: A data-sharing system
- The log is the database. All servers can access it.
- Each transaction appends its after-images to the log.
- Each server runs meld to do OCC and roll forward the log.
[Diagram: several servers, each running an app, a DBMS cache, and meld, all sharing one log]

Scenario 2: A multi-core system
- The log is the database. All cores can access it.
- Each transaction appends its after-images to the log.
- One core runs meld to do OCC and roll forward the log.
[Diagram: several cores, each running an app and a DBMS cache, sharing one log; one core rolls the log forward]

Outline: Motivation, System architecture, Meld algorithm, Performance, Conclusion

Binary search tree: root G with children B and H; B has children A and C, C has child D, and H has child I. The tree is marshaled into the log as: A D C B I H G.

Copy on write: to update a node, replace the nodes on the path up to the root. Example: updating D's value creates new copies of D, C, B, and the root G; the untouched subtrees are shared with the old version.
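
To make the copy-on-write rule concrete, here is a minimal Python sketch of path copying in an immutable binary search tree. The Node class and cow_update function are illustrative names, not the paper's code; the sketch only shows that an update copies the nodes on the search path and shares every untouched subtree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: str
    payload: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def cow_update(root: Optional[Node], key: str, payload: object) -> Node:
    """Return a new root; only the nodes on the search path are copied."""
    if root is None:
        return Node(key, payload)                         # insert a new leaf
    if key == root.key:
        return Node(key, payload, root.left, root.right)  # replace the payload
    if key < root.key:
        return Node(root.key, root.payload,
                    cow_update(root.left, key, payload), root.right)
    return Node(root.key, root.payload,
                root.left, cow_update(root.right, key, payload))

# Updating D's value in the tree G(B(A, C(D)), H(I)) copies only D, C, B, G.
```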

Each server has a cache of the last committed DB state.
Transaction execution:
1. Get a pointer to a snapshot.
2. Generate updates locally.
3. Append an intention log record.
[Diagram: the server's DB cache, the log with its last-committed marker, and the transaction's snapshot]

Each server processes intention records in sequence. To process transaction T's intention record:
- Check whether T experienced a conflict, i.e., did a committed transaction in T's conflict zone write into T's readset or writeset?
- If not, T committed, so the server merges the intention into its last committed state.
All servers make the same commit/abort decisions.
[Diagram: transaction T's intention, its snapshot, and its conflict zone in the log]
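
The per-server roll-forward loop can be sketched as below. This is a simplified illustration, not the system's actual code: the conflicts and merge callables stand in for the meld machinery described on the following slides, and the commit rule is the one just stated (commit iff no committed transaction in T's conflict zone wrote into T's readset or writeset).

```python
def roll_forward(log, state, conflicts, merge, start=0):
    """Replay log[start:]; return the new last-committed state and the decisions."""
    decisions = []
    for intention in log[start:]:
        if conflicts(intention, state):
            decisions.append("abort")          # a committed writer hit T's read/write set
        else:
            state = merge(state, intention)    # T commits; fold its after-image in
            decisions.append("commit")
    # Deterministic inputs plus a deterministic test mean every server reaches
    # the same sequence of commit/abort decisions and states.
    return state, decisions
```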

1. Run transaction.
2. Broadcast intention.
3. Append intention to the log.
4. Send log location.
5. De-serialize intention.
6. Meld.

1. Broadcasting the intention
2. Appending the intention to the log
3. Optimistic concurrency control (OCC)
4. Meld
Technology will improve 1 and 2. For 3, application behavior drives OCC performance. But 4 depends on single-threaded processor performance, which isn't improving. Hence, it is important to optimize meld.

Compare transaction T's after-image to the last committed state, which is annotated with version and dependency metadata.
- Traverse T's intention, comparing versions to the last-committed state.
- Stop traversing when you reach an unchanged subtree: if version(x) = version(x'), then simply replace x by x'.
[Diagram: the log, the state when T executed, the last-committed state before T, and T's intention; x is a subtree T read, x' is its counterpart in the intention]
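
The traversal-truncation idea can be sketched as follows. This is a heavily simplified illustration: it assumes the two trees align node-for-node (the real algorithm handles non-aligned subtrees with asymmetric meld operations, mentioned later) and it omits payload merging and the conflict tests themselves. The field names (vn, snapshot_vn) are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class MeldNode:
    key: str
    vn: int                          # version of this subtree in the committed tree
    snapshot_vn: int                 # version this subtree had in T's snapshot
    left: Optional["MeldNode"] = None
    right: Optional["MeldNode"] = None

def meld_subtree(committed, intention):
    """Return the melded subtree, descending only where something changed under T."""
    if intention is None:
        return committed
    if committed is None or committed.vn == intention.snapshot_vn:
        # The committed subtree is exactly what T saw: splice in T's subtree
        # wholesale and stop traversing here.
        return intention
    # Something committed into this subtree after T's snapshot: keep descending
    # (the full algorithm also runs its conflict tests at this point).
    return replace(committed,
                   left=meld_subtree(committed.left, intention.left),
                   right=meld_subtree(committed.right, intention.right))
```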

Outline: Motivation, System architecture, Meld algorithm, Performance, Conclusion

Meld takes the latest version of the database state plus an intention record and produces a new version of the database state. It must be:
- computationally efficient
- deterministic
- able to produce the same sequence of states on all servers

Example:
- T1 creates keys B, C, D, E.
- T2 and T3 then execute concurrently, both based on the result of T1: T2 inserts A; T3 inserts F.
- T2 and T3 do not conflict, so the resulting melded state is A, B, C, D, E, F.
[Diagram: the trees produced by T1, T2, and T3, and the melded tree]

Node metadata: each node carries the version of its subtree and dependency info.
- Every node n has a unique version number (VN); VN(n) permanently identifies the exact content of n's subtree.
- Each node n in an intention T also stores metadata about T's snapshot: the version of n in T's snapshot, and dependency information.
- Each node's metadata compresses to about 30 bytes.
[Diagram: a small tree with each node carrying its metadata]
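
A plain-record sketch of that per-node metadata is shown below. The field names are illustrative; the paper's actual encoding is a compressed format of roughly 30 bytes per node.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeMetadata:
    vn: int                 # unique version number; permanently identifies the
                            # exact content of this node's subtree
    snapshot_version: int   # version of this node in the transaction's snapshot
    depends_on: bool        # dependency info: must this node be unchanged
                            # while the transaction executed?
```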

We need to avoid synchronization when assigning VNs.
- VNs are stored as offsets from the base location of their intention.
- The base location is assigned when the intention is logged.
Example: given that T0's root subtree has VN 50, the VN of each subtree S in T1 = 50 + S's offset.

  Node (T1):     C    B    E    D
  VN offset:    +1   +2   +3   +4
  Absolute VN:  51   52   53   54

(T0's root has absolute VN 50; D is T1's root, with VN 54.)
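
A two-line sketch of the offset scheme, using the numbers from the example above (the function name is illustrative):

```python
# Within an intention, nodes carry only small offsets; the absolute VN is
# resolved once the intention's base location in the log is known, so no
# cross-server coordination is needed while versions are generated.
def absolute_vn(base_location: int, offset: int) -> int:
    return base_location + offset

# T0's root has VN 50, so T1's nodes C, B, E, D (offsets +1..+4) get VNs 51..54.
offsets = {"C": 1, "B": 2, "E": 3, "D": 4}
assert {k: absolute_vn(50, o) for k, o in offsets.items()} == \
       {"C": 51, "B": 52, "E": 53, "D": 54}
```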

Subtree metadata also includes a source structure version (SSV). Intuitively, SSV(n) = version of n in transaction T's snapshot. DependsOn(n) = Y if T depends on n not having changed while T executed.

  Node (T1):     C    B    E    D
  VN offset:    +1   +2   +3   +4
  SSV:           0    0    0   50
  DependsOn:     N    N    N    Y
  Absolute VN:  51   52   53   54

T1's root subtree depends on the entire tree, version 50. Since SSV(D) = VN of the last-committed root (50), T1 becomes the last-committed state.

A serial intention is one whose source version is the last committed state. Meld is then trivial and needs to consider only the root node.
- T1 was serial. T2 is serial too, so meld makes T2 the last committed state.
- Thus, a meld of a serial intention executes in constant time.

  Node (T2):     A    B    D
  VN offset:    +1   +2   +3
  SSV:           0   52   54
  DependsOn:     N    N    N
  Absolute VN:  55   56   57

(T1's nodes C, B, E, D keep absolute VNs 51-54 as before.)
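
A minimal sketch of that constant-time fast path, using illustrative parameter names and the numbers from the example:

```python
def try_serial_meld(last_committed_root_vn, intention_root_ssv, intention_root_vn):
    """Return the new last-committed root VN if the intention is serial, else None."""
    if intention_root_ssv == last_committed_root_vn:
        # The intention was generated against exactly the current committed state:
        # no conflict is possible, so it commits after inspecting only the root.
        return intention_root_vn
    return None   # concurrent intention: fall back to the full meld traversal

# T2's root has SSV 54 and the last-committed root (T1's D) has VN 54, so T2
# commits in O(1) and its root VN 57 becomes the last-committed version.
assert try_serial_meld(54, 54, 57) == 57
```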

[Diagram repeating the running example: T1's tree (B, C, D, E), T2 inserting A, T3 inserting F, and the melded result containing A, B, C, D, E, F]

T3 is not serial, because the VN of D in T2 (= 57) ≠ SSV(D) in T3 (= 54).
- Meld checks whether T3 conflicts with a transaction in its conflict zone: it traverses T3, comparing T3's nodes to the last-committed state.
- If there are no conflicts, then since T3 is concurrent, meld creates an ephemeral intention to merge T3's state.

  Node (T3):     F    E    D
  VN offset:    +1   +2   +3
  SSV:           0   53   54
  DependsOn:     N    N    N
  Absolute VN:  58   59   60

A committed concurrent intention produces an ephemeral intention (e.g., M3).
- It is created deterministically in memory on all servers.
- It logically commits immediately after the intention it melds.
- To meld the concurrent intention T3 above, we need to consider metadata only on the root node D.

  Node (M3):     D
  VN offset:    +1
  SSV:          57
  DependsOn:     N
  Absolute VN:  61
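
The sketch below illustrates the idea of an ephemeral node; it is an assumption-laden simplification, not the paper's data structure. The point is that such a node is built deterministically in memory by every server, rather than read from a logged intention, and is tagged so it can later be trimmed or flushed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TreeNode:
    key: str
    vn: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None
    ephemeral: bool = False    # True: built in memory by meld, not backed by the log

def ephemeral_merge_root(template: TreeNode, left: TreeNode,
                         right: TreeNode, vn: int) -> TreeNode:
    # Deterministic: the result depends only on the melded inputs and the VN
    # assigned to it, so every server constructs an identical node.
    return TreeNode(template.key, vn, left, right, ephemeral=True)
```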

Most ephemeral intentions are automatically trimmed:
- Each committed intention I trims the previous ephemeral intention, with either a persisted node (if I is serial) or an ephemeral node (if I is concurrent).
- To track them, use an ephemeral flag (or count) on each node that has ephemeral descendants.
Periodically run a flush transaction:
- It copies a set of ephemeral nodes whose subtrees contain no other reachable ephemeral nodes.
- It makes the original ephemeral nodes unreachable in the new committed state.
- It has no dependencies, so it can't conflict.

- Phantom avoidance
- Asymmetric meld operations: necessary in the common case when subtrees do not align; uses a key range as a parameter to the top-down recursion
- Deletions: use tombstones in the intention header
- Checkpointing and recovery

Focus here is on meld throughput only; for latency, see the paper. We count both committed and aborted transactions.
Experiment setup: 128K keys, all in main memory. Keys and payloads are 8 bytes. Serializable isolation, so intentions contain readsets. Intentions are de-serialized on separate threads before meld.
Transaction size affects meld throughput, and so does conflict zone size ("concurrency degree"). As transaction size or concurrency degree increases, more concurrent transactions update keys with common ancestors, so meld has to traverse deeper into the tree.

[Results plot: r:w ratio is 1:1; con-d i = concurrency degree i]


[Results plot comparing meld with brute force, where brute force = traverse the whole tree]

There are lots of OCC papers, but none that give details of efficient conflict testing. By contrast, there is a huge literature on conflict testing for locking.
Oxenstored [Gazagnaire & Hanquez, ICFP '09] addresses a similar scenario: multiversion trees and OCC. However, it uses very coarse-grained conflict testing and none of our optimizations.

Conclusion:
- A new algorithm for OCC.
- Developed many optimizations to truncate the conflict checking early in the tree traversal.
- Implemented and measured it.
Future work:
- Apply it to other tree structures.
- Measure it on various storage devices.
- Compare it with locking and other OCC methods on multiversion trees.
- Try to apply it to physiological logging.


Consider node n in intention I.
- SCV(n) = VN of the node that first generated the payload in n's predecessor.
- Altered(n) = true if n's payload differs from its predecessor's.
- DependsOn(n) = true if I depends on n's predecessor's content.
- NCV(n) = if Altered(n) then VN(n) else SCV(n).
- n's content changed if SCV(n in I) ≠ NCV(n in the last committed state).
[Diagram: a version history v1, v2, ..., v14, v15 of a node; DependsOn = true together with a changed content version implies a conflict]
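
Those definitions translate into a small content-conflict predicate, sketched below. The record is an illustrative spelling of the metadata, not the paper's compressed encoding, and the conjunction with DependsOn follows the diagram's reading that a changed content version matters only when the intention depends on it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentMeta:
    vn: int            # version number of this node
    scv: int           # source content version: VN of the node that first
                       # generated the payload seen in the predecessor
    altered: bool      # payload differs from the predecessor's payload
    depends_on: bool   # the intention depends on the predecessor's content

def ncv(n: ContentMeta) -> int:
    """New content version: the node's own VN if it altered the payload, else its SCV."""
    return n.vn if n.altered else n.scv

def content_conflict(n_in_intention: ContentMeta, n_in_committed: ContentMeta) -> bool:
    # The content changed under the transaction iff the version it read (SCV)
    # no longer matches the committed node's NCV; that is a conflict only if
    # the intention actually depends on that content.
    changed = n_in_intention.scv != ncv(n_in_committed)
    return n_in_intention.depends_on and changed
```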

Again consider node n in intention I.
- DependsOnTree(n) = true if I depends on n's subtree not having changed.
- NSV(n) = oldest version of n whose subtree is exactly subtree(VN(n)).
- DependsOnTree(n) and NSV(n in the last committed state) ≠ SSV(n) implies a conflict.
- DependsOnTree can be extended with a DependencyRange of keys in the last committed state.
[Diagram: a version history v1, v8, v14, v15; DependsOnTree = true together with NSV ≠ SSV implies a conflict]
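
The structure test is a one-line predicate under the same illustrative conventions:

```python
def structure_conflict(depends_on_tree: bool,
                       nsv_in_committed: int,
                       ssv_in_intention: int) -> bool:
    # Conflict iff the intention depends on this whole subtree being unchanged
    # and the committed subtree's oldest-equivalent version (NSV) is no longer
    # the version the transaction read from (SSV).
    return depends_on_tree and nsv_in_committed != ssv_in_intention
```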

SubtreeIsOnlyReadDependent(n) = true iff none of n's descendants are updated in I (i.e., have Altered = true).
- Analogous to an intention-to-read lock.
- Avoids traversing the entire tree when a descendant of n has DependsOnTree = true and NSV(n in the last committed state) = SSV(n).
- It also enables computing NSV: NSV(n) = if SubtreeIsOnlyReadDependent(n) then SSV(n) else VN(n).
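
A sketch of those two rules, following the slide's wording (the check covers n's descendants); the node fields are illustrative, consistent with the earlier sketches.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SubtreeMeta:
    vn: int
    ssv: int
    altered: bool
    left: Optional["SubtreeMeta"] = None
    right: Optional["SubtreeMeta"] = None

def subtree_is_only_read_dependent(n: SubtreeMeta) -> bool:
    """True iff no descendant of n is updated (Altered) in the intention."""
    for child in (n.left, n.right):
        if child is not None and (child.altered
                                  or not subtree_is_only_read_dependent(child)):
            return False
    return True

def nsv(n: SubtreeMeta) -> int:
    # If nothing below n was altered, the subtree's structure matches the
    # snapshot version, so the NSV carries over as SSV; otherwise it is new.
    return n.ssv if subtree_is_only_read_dependent(n) else n.vn
```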