MASSACHUSETTS INSTITUTE OF TECHNOLOGY


Department of Electrical Engineering and Computer Science
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
6.814/6.830 Database Systems: Fall 2016
Quiz II

There are 14 questions and ?? pages in this quiz booklet. To receive credit for a question, answer it according to the instructions given. You can receive partial credit on questions. You have 80 minutes to answer the questions.

Write your name on this cover sheet AND at the bottom of each page of this booklet.

Some questions may be harder than others. Attack them in the order that allows you to make the most progress. If you find a question ambiguous, be sure to write down any assumptions you make. Be neat. If we can't understand your answer, we can't give you credit!

THIS IS AN OPEN BOOK, OPEN NOTES QUIZ. YOU MAY USE A LAPTOP OR CALCULATOR. YOU MAY NOT ACCESS THE INTERNET.

Do not write in the boxes below

    1-4 (xx/32)    5-8 (xx/26)    9-12 (xx/24)    13-14 (xx/18)    Total (xx/100)

6.814/6.830 Fall 2016, Quiz 2

I Distributed Query Processing

Consider a distributed database with a fact table t1 of size 600 MB, and a dimension table t2 of size 50 MB. Users frequently join the two tables on t1.fkey = t2.pkey, and then aggregate the results to produce a small output. Consider two partitionings:

A: Hash partition t1 on t1.fkey and t2 on t2.pkey
B: Round-robin partition t1, and replicate t2 on every node

1. [8 points]: You try both partitionings on a cluster of 3 nodes where each node has 100 MB of RAM, and you find partitioning B outperforms partitioning A. This is surprising because both configurations need to perform no network I/O for the join, and partitioning A reads strictly less of t2 on every node, since t2 is partitioned instead of replicated. Provide one explanation for how this could occur.

Answer: The distribution of t1.fkey is skewed, so hash partitioning in A puts far more of t1 on some nodes than on others, while round-robin partitioning in B spreads t1 evenly.

II Two Phase Commit

Ben Bitdiddle is running a distributed transaction processing system with two-phase commit (without presumed commit). Ben observes that both log writes and one-way network latencies in his system are about 10 ms, and he finds that the throughput of his system is very low, only about 20 transactions per second. He also notes that most of the transactions in his system do not conflict.

Remembering that group commit is a good way to improve transaction throughput, Ben decides to try adding group commit to the coordinator's write of the commit record. In his system, before writing the commit record, the coordinator waits for X milliseconds for other transactions to be ready to commit, and then writes them to the log as a batch.

2. [6 points]: Do you think this optimization will substantially (by a factor of 2 or more) improve throughput of Ben's system? Why or why not?

Answer: No. The forced log writes on the subordinates and the network latencies will dominate any improvement from batching the coordinator's commit-record write.
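One rough way to see the answer to question 2 is to add up the delays on a single transaction's commit path. The sketch below is a back-of-the-envelope model, not part of the quiz: it assumes transactions commit one at a time, that acknowledgments are off the critical path, and that the steps listed in the hypothetical `STEP_MS` table are the only ones that matter.

```python
# Assumed critical path for one committing transaction (all values in ms).
STEP_MS = [
    ("coordinator -> subordinate: PREPARE", 10),
    ("subordinate forces prepare record", 10),
    ("subordinate -> coordinator: YES vote", 10),
    ("coordinator forces commit record", 10),
    ("coordinator -> subordinate: COMMIT", 10),
]

def commit_path_ms(steps, amortized=()):
    """Sum the step latencies, treating steps named in `amortized` as free."""
    return sum(ms for name, ms in steps if name not in amortized)

baseline = commit_path_ms(STEP_MS)
# Best case for group commit: the coordinator's commit write costs
# nothing per transaction because it is shared by the whole batch.
grouped = commit_path_ms(STEP_MS, amortized={"coordinator forces commit record"})
speedup = baseline / grouped
print(baseline, grouped, speedup)
```

Under these assumptions the path shrinks from 50 ms to 40 ms at best, a speedup of 1.25, and the X ms wait for a batch adds latency back.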

III Locking and OCC

Consider the following transactions, with initial values of X=3 and Y=5. For writes we also indicate the values that are written (i.e., WY; Y=tx*2 means that a write to Y is done that sets its value to tx*2).

    T1            T2            T3
    ----------    ----------    --------
    tx = RX       ty = RY       WY; Y=0
    WY; Y=tx*2    WX; X=ty+1

3. [8 points]: Suppose you know that the first operation that runs is RY in T2. Assuming there are no deadlocks, list all possible values for X and Y that could result from an execution of these transactions with strict two-phase locking.

Answer: (1) X = 6, Y = 0. (2) X = 6, Y = 12.

4. [10 points]: Again, assuming the first operation that runs is RY in T2, list all possible values that could arise for X and Y from an execution of these transactions using snapshot isolation. Assume snapshot isolation behaves identically to serially-validated OCC, except that read sets are not tracked (so there is no need to check that the read set of a validating transaction intersects with the write set of any concurrent transaction).

Answer: (1) X = 6, Y = 0. (2) X = 6, Y = 12. (3) X = 6, Y = 6.
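The two answers can be checked by brute force. The sketch below is an illustration under stated assumptions, not the quiz's official derivation. For strict 2PL it assumes that, with T2's RY first and no deadlock, every execution is equivalent to a serial order starting with T2 (T1 and T3 both need an exclusive lock on Y, which T2 holds in shared mode). For snapshot isolation it assumes a transaction reads from the committed state at its start, aborts at commit if a transaction that committed after its start wrote the same item, and is not retried. The names `TXNS`, `strict_2pl_outcomes`, and `snapshot_outcomes` are hypothetical.

```python
from itertools import permutations

INITIAL = {"X": 3, "Y": 5}

# Each transaction maps the state it reads to the writes it performs.
TXNS = {
    "T1": lambda s: {"Y": s["X"] * 2},
    "T2": lambda s: {"X": s["Y"] + 1},
    "T3": lambda s: {"Y": 0},
}

def strict_2pl_outcomes():
    """Assumption: executions are equivalent to serial orders starting with T2."""
    outcomes = set()
    for order in permutations(TXNS):
        if order[0] != "T2":
            continue
        db = dict(INITIAL)
        for t in order:
            db.update(TXNS[t](db))
        outcomes.add((db["X"], db["Y"]))
    return outcomes

def snapshot_outcomes():
    """Enumerate start/commit orders; a committer aborts if a transaction
    that committed after its start wrote an overlapping item."""
    events = [(t, k) for t in TXNS for k in ("start", "commit")]
    outcomes = set()
    for sched in permutations(events):
        if sched[0] != ("T2", "start"):
            continue
        if any(sched.index((t, "start")) > sched.index((t, "commit")) for t in TXNS):
            continue
        db, snap, started, committed = dict(INITIAL), {}, {}, []
        for pos, (t, kind) in enumerate(sched):
            if kind == "start":
                snap[t], started[t] = dict(db), pos
                continue
            writes = TXNS[t](snap[t])
            clash = any(p > started[t] and keys & set(writes) for p, keys in committed)
            if not clash:  # aborted transactions are not retried (assumption)
                db.update(writes)
                committed.append((pos, set(writes)))
        outcomes.add((db["X"], db["Y"]))
    return outcomes

print(sorted(strict_2pl_outcomes()))
print(sorted(snapshot_outcomes()))
```

The extra snapshot-isolation outcome arises when T1 takes its snapshot before T2 commits, reads X=3, and is the last writer of Y.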

IV ARIES/logging

Consider three transactions that run concurrently, using strict two-phase locking:

    T1    T2    T3
    --    --    --
    RA    RA    RA
    WA    WA    RC
    RB    RB
          WB    WC

(This is not an interleaving of the operations, just a list of what each transaction does.)

Suppose the system crashes, and only T1 has committed. You are given the following partial log at the time of recovery.

    1 T1 BEGIN
    2 T2 BEGIN
    3 T3 BEGIN
    4 T1 UPDATE A
    5 CHECKPOINT
    6 T1 COMMIT
    7 T1 END
    8 T2 UPDATE A
    9 xxx

5. [8 points]: Assuming there are no additional checkpoints or other transactions running, what must the value of xxx be?

Answer: T2 UPDATE B

6. [8 points]: Assuming that there were no transactions running before LSN 1, if T1's update to A is flushed just after LSN 6, and there were no other flushes, at what log record would the REDO phase begin?

Answer: LSN 4
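The answer to question 6 follows from how the analysis pass rebuilds the dirty page table. The sketch below is a simplified illustration, not a full ARIES implementation: it treats each data item as a page, fills in LSN 9 from the answer to question 5, and assumes the checkpoint at LSN 5 recorded page A as dirty with recLSN 4 and T1, T2, T3 as active. The names `LOG`, `CHECKPOINT_DPT`, `CHECKPOINT_ACTIVE`, and `analysis` are hypothetical.

```python
# Log from the question, with LSN 9 filled in from the answer to question 5.
LOG = [
    (1, "T1", "BEGIN", None),
    (2, "T2", "BEGIN", None),
    (3, "T3", "BEGIN", None),
    (4, "T1", "UPDATE", "A"),
    (5, None, "CHECKPOINT", None),
    (6, "T1", "COMMIT", None),
    (7, "T1", "END", None),
    (8, "T2", "UPDATE", "A"),
    (9, "T2", "UPDATE", "B"),
]

# Assumed checkpoint contents: what a scan of LSNs 1-4 would have built.
CHECKPOINT_DPT = {"A": 4}
CHECKPOINT_ACTIVE = {"T1", "T2", "T3"}

def analysis(log, dirty_pages, active):
    """Scan forward from the checkpoint, updating both tables."""
    dirty_pages, active = dict(dirty_pages), set(active)
    seen_checkpoint = False
    for lsn, txn, kind, page in log:
        if kind == "CHECKPOINT":
            seen_checkpoint = True
            continue
        if not seen_checkpoint:
            continue
        if kind == "UPDATE":
            # recLSN is the first LSN that dirtied the page.
            dirty_pages.setdefault(page, lsn)
            active.add(txn)
        elif kind == "END":
            active.discard(txn)
    return dirty_pages, active

dirty_pages, losers = analysis(LOG, CHECKPOINT_DPT, CHECKPOINT_ACTIVE)
redo_start = min(dirty_pages.values())
print(dirty_pages, sorted(losers), redo_start)
```

The flush of A after LSN 6 leaves no log record, so analysis still sees A as dirty since LSN 4 and REDO starts there; whether the update is actually reapplied is decided later by comparing LSNs on the page.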

V Dynamo

In Dynamo, suppose you have three replicas A, B, and C of a data item X, you are running a (N=3, R=2, W=2) system but with sloppy quorums enabled, and with the next successor of A, B, C being D.

Suppose that before each write, the writing client does a (sloppy) quorum read of X, and then performs a write by sending it to a random node in the replica set, supplying one of the most recent versions from the quorum read. When a replica receives a write from a client, it adds or updates its vector clock value and propagates the write to N-1 other nodes (which may not all be in the replica set due to the use of sloppy quorums). The write only succeeds if W nodes acknowledge the write. A node receiving a write to an X that is not causally related to a previous write it received will store both versions of the data (and the write will succeed).

Suppose that replicas process the following sequence of writes, where (A,1) represents the fact that A's vector clock was at 1, and [(A,1),(B,1)] is an example vector clock for X.

1. A creates and successfully writes a new version [(A,1)]
2. B creates and successfully writes a new version [(A,1),(B,1)]
3. C creates and successfully writes a new version [(A,1),(C,1)]

7. [6 points]: Which sets of nodes could have processed write 1? (Circle True or False for each item below.)

A. True / False   A, B, and C
B. True / False   just A and B
C. True / False   just A and C
D. True / False   just A and D

Answer: All of the above.
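The three versions listed above can be reproduced with a small sketch of the vector clock update. It assumes a clock is a mapping from node name to counter and that the receiving replica bumps only its own entry on top of the version the client supplied; `coordinate_write` is a hypothetical helper name.

```python
def coordinate_write(context, node):
    """Return the clock for a new version written through `node`,
    given the clock of the version the client read (`context`)."""
    clock = dict(context)
    clock[node] = clock.get(node, 0) + 1
    return clock

v1 = coordinate_write({}, "A")   # write 1: no earlier version
v2 = coordinate_write(v1, "B")   # write 2: client read [(A,1)]
v3 = coordinate_write(v1, "C")   # write 3: client also read [(A,1)]
print(v1, v2, v3)
```

Write 3 carries [(A,1),(C,1)] rather than [(A,1),(B,1),(C,1)], which means the quorum read before it did not return write 2.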

8. [4 points]: Which sets of nodes could have processed write 2? (Circle True or False for each item below.)

A. True / False   A, B, and C
B. True / False   just A and B
C. True / False   just B and C
D. True / False   just B and D

Answer: B, C, D.

9. [4 points]: Which sets of nodes could have processed write 3? (Circle True or False for each item below.)

A. True / False   A, B, and C
B. True / False   just A and C
C. True / False   just B and C
D. True / False   just C and D

Answer: All of the above.

10. [4 points]: What causally unrelated versions exist in the system after write 3 completes? (Write your answer in the space below.)

Answer: [(A,1),(B,1)], [(A,1),(C,1)]
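The answer to question 10 can be checked by comparing vector clocks pairwise. This is a minimal sketch assuming clocks are dictionaries from node name to counter, with missing entries read as 0; `descends` and `siblings` are hypothetical helper names.

```python
def descends(a, b):
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def siblings(versions):
    """Keep the versions that no other version supersedes."""
    return [v for v in versions
            if not any(w != v and descends(w, v) for w in versions)]

v1 = {"A": 1}
v2 = {"A": 1, "B": 1}
v3 = {"A": 1, "C": 1}

remaining = siblings([v1, v2, v3])
print(remaining)
```

Version [(A,1)] is dropped because both later versions descend from it, while the other two survive because neither descends from the other.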

VI Spark

You want to analyze the popularity of each website in a huge collection of websites using PageRank. You plan to compute PageRank using Spark on a cluster of 10 machines. All the web pages are stored in HDFS and the following Spark program is run to compute PageRank:

    val links = spark.textFile(...).map(...).persist() // (URL, links)
    var ranks = // RDD of (URL, rank) pairs
    for (i <- 1 to ITERATIONS) {
      // Build an RDD of (url, (links, rank)) pairs
      val linksjoinranks = links.join(ranks);
      // Build an RDD of (targeturl, float) pairs
      // with the contributions sent by each page
      val contribs = linksjoinranks.flatMap {
        (url, (links, rank)) =>
          links.map(dest => (dest, rank/links.size))
      }
      // Sum contributions by URL
      val sumcontribs = contribs.reduceByKey((x,y) => x+y)
      // Get new ranks
      ranks = sumcontribs.mapValues(sum => a/n + (1-a)*sum)
    }
    // END OF PROGRAM

Suppose after the first iteration of the for loop, one machine crashes. Spark will have to recover the failed partitions from the crashed machine.

11. [8 points]: Considering linksjoinranks, contribs, sumcontribs and ranks, which RDDs require more work (computation and network communication) to recover? Why?

Answer: linksjoinranks and sumcontribs, because they are produced by wide transformations (join and reduceByKey): rebuilding a lost partition needs data from many parent partitions, whereas contribs and ranks come from narrow transformations (flatMap and mapValues).
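A toy model makes the difference between the four RDDs concrete. The sketch below is not Spark code and does not use any Spark API: it assumes one partition per machine, counts only the direct parent partitions consulted to rebuild a single lost partition, and treats the join as a wide dependency, as the quiz answer does (a join of co-partitioned inputs could behave differently). The names `LINEAGE` and `parent_partitions_read` are hypothetical.

```python
NUM_PARTITIONS = 10  # assumption: one partition per machine

# RDD -> (parent RDDs, kind of dependency), following the loop body.
LINEAGE = {
    "linksjoinranks": (["links", "ranks"], "wide"),   # join
    "contribs": (["linksjoinranks"], "narrow"),       # flatMap
    "sumcontribs": (["contribs"], "wide"),            # reduceByKey
    "ranks": (["sumcontribs"], "narrow"),             # mapValues
}

def parent_partitions_read(rdd):
    """Parent partitions consulted to rebuild one lost partition of `rdd`."""
    parents, kind = LINEAGE[rdd]
    per_parent = NUM_PARTITIONS if kind == "wide" else 1
    return per_parent * len(parents)

cost = {rdd: parent_partitions_read(rdd) for rdd in LINEAGE}
print(cost)
```

In this model a lost partition of contribs or ranks needs one parent partition, while a lost partition of linksjoinranks or sumcontribs touches every partition of each parent, most of which live on other machines.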

After writing several applications with Spark, you discover that it's possible to append values from an RDD to a file in HDFS. Suppose you replace the line //END OF PROGRAM in the previous code with

    val hdfsfile = // Open a file in append mode on HDFS
    ranks.map(x => hdfsfile.append(x))

12. [8 points]: You run this program several times (with ITERATIONS set to the same value each time), and you find that you sometimes get different numbers of (URL, rank) pairs in the output HDFS file. Assuming HDFS properly serializes concurrent appends to a file, describe in one sentence how this could happen.

Answer: If the computation of a partition of ranks fails, Spark re-executes it, and the HDFS appends inside the map are executed again, so some pairs are written more than once.
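The effect in question 12 can be imitated without Spark. The sketch below is a plain Python simulation, assuming a task that appends each record to a shared sink and is simply re-run from the start after a failure; `run_task`, `run_with_retry`, and the sample data are hypothetical.

```python
def run_task(records, sink, fail_at=None):
    """Append every record to `sink`, optionally failing part-way through."""
    for i, record in enumerate(records):
        if i == fail_at:
            raise RuntimeError("simulated task failure")
        sink.append(record)

def run_with_retry(records, sink, fail_at=None):
    """Re-execute the whole task after a failure, as lineage-based recovery would."""
    try:
        run_task(records, sink, fail_at)
    except RuntimeError:
        run_task(records, sink)  # appends made before the failure are repeated

ranks = [("a.com", 0.5), ("b.com", 0.3), ("c.com", 0.2)]

clean, retried = [], []
run_with_retry(ranks, clean)
run_with_retry(ranks, retried, fail_at=2)
print(len(clean), len(retried))
```

The retried run leaves five entries in the sink instead of three, because the two appends made before the failure are not undone.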

VII Locking

Last year, Ben Bitdiddle built LocksDB, a database with a new custom lock-based concurrency control mechanism. This mechanism extends the read/write locking you saw in class to include increment operations. The core idea of LocksDB is that while reads and writes are able to express increment (or decrement) operations, one can improve concurrency by treating increments differently than arbitrary writes. The key observation is that the state of the database is the same regardless of the order in which two concurrent increments run.

To get an idea of how transactions with increments work, we provide an example below. In this listing, RX stands for a Read of variable X, WX stands for a Write of some value to variable X, and IX means to increment X by some value. As in the model of writes presented in class, values used in increments may depend on values previously read. Assume numbers do not overflow and that individual increments happen atomically with respect to other increments/writes.

    T1            T2
    IX,5          local_x = RX
    COMMIT        WX,local_x + 5
                  COMMIT

Figure 1: Both T1 and T2 are doing the same operation on the data, but T1 uses the increment operation.

For LocksDB, the definition of conflicting operations needs to be extended to include increment operations. Recall that conflicting operations are those where the state of the system depends on the order in which a pair of operations are executed. In LocksDB, as in the two-phase locking protocol we studied in class, before a transaction can perform a particular R/W/I operation, it needs to hold the appropriate lock, and locks for conflicting operation types cannot be held simultaneously by two transactions.

LocksDB has three types of locks: read-only (R), exclusive (X), and increment (I). R and X locks behave as in a conventional 2PL system. I-locks are new. Below, we have provided the new lock compatibility table that we will use for this exercise. Read locks are compatible only with other read locks, and increment locks are compatible only with other increment locks. Exclusive locks are not compatible with anything else.

         R    X    I
    R    Y    N    N
    X    N    N    N
    I    N    N    Y

R and I locks are not compatible because if T1 reads a value and T2 increments it, and T1 re-reads it, it may not read the same value, violating the repeatable reads property.
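The compatibility table translates directly into a grant check. The sketch below is one possible encoding, assuming a lock manager that knows which modes other transactions currently hold on the item; `COMPATIBLE` and `can_grant` are hypothetical names.

```python
# The only compatible (held, requested) pairs in the table: R with R, I with I.
COMPATIBLE = {("R", "R"), ("I", "I")}

def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock other transactions hold."""
    return all((held, requested) in COMPATIBLE for held in held_by_others)

print(can_grant("I", ["I", "I"]))  # concurrent increments may proceed together
print(can_grant("I", ["R"]))       # an increment must wait for a reader
print(can_grant("X", []))          # no holders, so anything is granted
```

Storing only the compatible pairs keeps the default answer at "incompatible", which matches the table where most entries are N.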

13. [10 points]: Now that we have the lock compatibility table, we also need to design a modified version of strict two-phase locking (S2PL). Our modified version of S2PL works the following way: upon receiving a request to read, write or increment an item, the lock manager must decide what lock to acquire, or whether to upgrade an already held lock.

The following table has been partially filled in for you. The rows indicate the lock the transaction currently has on an item, and the columns indicate the lock the transaction should request to perform the operation in the column on the item. Fill in the rest of the entries. Think about what would happen if a transaction attempted to upgrade a lock and another transaction was currently holding the same lock (for example, if T1 and T2 both have R locks on object i, and T1 attempts to write it, T1 would need to acquire an X lock and would be blocked waiting for T2 to release the R lock).

Hint: R can never be promoted to I.

                R           W           I
    hasnone     acqrlock    acqxlock
    hasrlock    nothing     acqxlock
    hasxlock    nothing     nothing
    hasilock

Answer:

                R           W           I
    hasnone     acqrlock    acqxlock    acqilock
    hasrlock    nothing     acqxlock    acqxlock
    hasxlock    nothing     nothing     nothing
    hasilock    acqxlock    acqxlock    nothing
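The completed table can be encoded as a lookup and replayed against a sequence of operations. This is a sketch assuming a single transaction working on a single item, with `None` standing for "no lock held" in the row key and "nothing to acquire" in the entries; `LOCK_ACTION` and `locks_requested` are hypothetical names.

```python
# LOCK_ACTION[held][operation] -> lock to acquire, or None for "nothing".
LOCK_ACTION = {
    None: {"R": "R",  "W": "X",  "I": "I"},
    "R":  {"R": None, "W": "X",  "I": "X"},
    "X":  {"R": None, "W": None, "I": None},
    "I":  {"R": "X",  "W": "X",  "I": None},
}

def locks_requested(operations):
    """Replay one transaction's operations on one item and list its lock requests."""
    held, requests = None, []
    for op in operations:
        needed = LOCK_ACTION[held][op]
        if needed is not None:
            requests.append(needed)
            held = needed
    return requests

print(locks_requested(["I", "I"]))       # stays on the I lock
print(locks_requested(["R", "I"]))       # R cannot be promoted to I, so X
print(locks_requested(["I", "R", "W"]))  # reading after an increment needs X
```

Every mixed use of an item ends in an X lock, so the extra concurrency of I locks is only available to transactions that do nothing but increment that item.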

14. [8 points]: Ben wants to improve locking performance by allowing dual-granularity locking: a transaction may opt to lock the entire table in R, X or I mode, rather than locking many individual items, or it may lock individual data items. To do this, Ben decides that he will adapt the multilevel locking scheme from class based on intention locks. In this scheme, a transaction that wants to read/write/increment an individual data item will first acquire an R/X/I intention lock on the table, and then acquire the appropriate R/X/I lock on the data item.

Fill in the new entries in the lock compatibility table below. IR, IX, II are the intention locks for R, X, and I respectively. As an example, we've supplied the value for R/IR locks. Because a transaction that wants to read some data item inside the table (hence requiring an IR lock on the table) can run concurrently with a transaction that is reading the entire table, the value here is Y. (Write Y or N in each blank cell.)

          R    X    I    IR   IX   II
    R     Y    N    N    Y
    X     N    N    N
    I     N    N    Y
    IR    Y
    IX
    II

Answer:

          R    X    I    IR   IX   II
    R     Y    N    N    Y    N    N
    X     N    N    N    N    N    N
    I     N    N    Y    N    N    Y
    IR    Y    N    N    Y    Y    Y
    IX    N    N    N    Y    Y    Y
    II    N    N    Y    Y    Y    Y

End of Quiz II
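The answer table can be regenerated from the three-mode table with one rule. The rule below is an interpretation that happens to reproduce the answer, not a derivation stated in the quiz: two intention locks are always compatible because their conflicts are resolved on the individual items, and otherwise the pair is compatible exactly when the underlying R/X/I modes are. The names `BASE`, `MODES`, and `compatible` are hypothetical.

```python
# Compatible pairs among the plain modes: R with R, I with I.
BASE = {("R", "R"), ("I", "I")}
MODES = ["R", "X", "I", "IR", "IX", "II"]

def is_intention(mode):
    return len(mode) == 2          # IR, IX, II

def underlying(mode):
    return mode[-1]                # IR -> R, IX -> X, II -> I, R -> R, ...

def compatible(a, b):
    """Table-level compatibility of modes `a` and `b`."""
    if is_intention(a) and is_intention(b):
        return True                # conflicts are settled at the item level
    return (underlying(a), underlying(b)) in BASE

matrix = {a: "".join("Y" if compatible(a, b) else "N" for b in MODES)
          for a in MODES}
for mode in MODES:
    print(f"{mode:<3}{matrix[mode]}")
```

Each printed row lists Y/N for the columns R, X, I, IR, IX, II in that order.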