Torturing Databases for Fun and Profit

Size: px
Start display at page:

Download "Torturing Databases for Fun and Profit"

Transcription

1 Torturing Databases for Fun and Profit Mai Zheng, Joseph Tucek, Dachuan Huang, Feng Qin, Mark Lillibridge Elizabeth S Yang, Bill W Zhao, Shashank Singh The Ohio State University HP Labs

2 2

3 database 3

4 ACID: atomicity, consistency, isolation, and durability - even under failures 4

5 List of databases survived 5

6 Everything is broken under simulated power faults Database TokyoCabinet MariaDB LightningDB SQLite KVS-A SQL-A File System Workload W-1 W-2 W-3 W-4.1 W-4.2 W-4.3 ext3 D D D ACD ACD ACD XFS -- D D ACD D ACD ext3 D D D D D D XFS D D D D D D ext D XFS ext3 D D -- D D D XFS D D D D ext Hang XFS ext3 D D D D D D XFS D D D D D D ext3 D D CD CD CD CD SQL-B XFS CD D CD CD CD CD SQL-C NTFS D D D D D D 6

7 Power faults cannot happen nowadays, right? 7

8 2013: POWER OUTAGE during Super Bowl because a RELAY DEVICE MALFUNCTIONED. 8

9 2014: Internap Data Center OUTAGE Takes Down Livestream, StackExchange 2014: Data Center FIRE Leads to OUTAGE 2014: ELECTRICAL FIRE took down primary data center ALL POWER was OFF 2013: POWER OUTAGE knocks DreamHost customers offline 2013: A data center POWER OUTAGE is being blamed for Visa downtime 2013: A POWER OUTAGE at a key New Jersey data center 2013: POWER OUTAGE during Super Bowl because a RELAY DEVICE MALFUNCTIONED. 2012: HUNAM ERROR was responsible for a data center POWER OUTAGE 2012: POWER OUTAGE Hits London Data Center 2012: Amazon Data Center LOSES POWER During STORM 2011: Colocation provider Colo4 experienced a POWER OUTAGE 2010: About 3,000 servers at Montreal web host iweb experienced an OUTAGE 2010: CAR CRASH Triggers Amazon POWER OUTAGE 9

10 Database Torture 101

11 Fault Model: Clean termination of I/O stream database on-the-fly I/O blocks minimum atomic transfer unit (e.g., 512B/4KB) blocks transferred to durable media 11

12 Fault Model: Clean termination of I/O stream database on-the-fly I/O blocks blocks transferred to durable media blocks after the fault have no effect a fault happens blocks before the fault are NOT corrupted/ dropped/reordered 12

13 Why not introduce corruption/dropping/reordering? Unreasonable to require databases to handle arbitrary bad behavior introduced in the lower layers Simulated bad behavior w/o verification by real failures may be unrealistic - I/O path in kernel & device puts constraints on failure states 13

14 How do we test DBs on multiple OSes w/ high fidelity? No disturbance on thread scheduling database No disturbance on interactions among DB, memory manager, FS, volume manager, I/O scheduler, 14

15 How do we test DBs on multiple OSes w/ high fidelity? iscsi initiator decouple via iscsi iscsi target database SCSI commands over network torturing framework 15

16 How do we test DBs on multiple OSes w/ high fidelity? iscsi initiator decouple via iscsi iscsi target database SCSI commands over network torturing framework 16

17 Framework Overview Worker & Checker ❶ ❸ database SCSI cmds ❷ Record & Replayer 17

18 Framework Overview Worker & Checker ❶ ❸ database SCSI cmds ❷ Record & Replayer 18

19 Workload Example meta rows work rows key value THR-1-TXN-1 v-init-thr-1-txn-1 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 v-init-thr-2-txn-1 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-init-2 k-3 v-init-3 k-4 v-init-4 k-5 v-init-5 k-6 v-init-6 k-7 v-init-7 k-8 v-init-8 DB table 19

20 Workload Example key value two parts meta rows work rows THR-1-TXN-1 v-init-thr-1-txn-1 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 v-init-thr-2-txn-1 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-init-2 k-3 v-init-3 k-4 v-init-4 k-5 v-init-5 2 threads, 2 transactions per thread k-6 v-init-6 k-7 v-init-7 k-8 v-init-8 DB table 20

21 Has known initial state meta rows work rows key value THR-1-TXN-1 v-init-thr-1-txn-1 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 v-init-thr-2-txn-1 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-init-2 k-3 v-init-3 k-4 v-init-4 k-5 v-init-5 k-6 v-init-6 k-7 v-init-7 k-8 v-init-8 DB table 21

22 Each transaction updates N random work rows + 1 meta row meta rows work rows key value THR-1-TXN-1 v-init-thr-1-txn-1 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 v-init-thr-2-txn-1 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-init-2 k-3 v-init-3 k-4 v-init-4 k-5 v-init-5 k-6 v-init-6 k-7 v-init-7 k-8 v-init-8 DB table 22

23 Each transaction updates N random work rows + 1 meta row meta rows work rows key value THR-1-TXN-1 v-init-thr-1-txn-1 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 v-init-thr-2-txn-1 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-init-2 k-3 v-init-3 k-4 v-init-4 k-5 v-init-5 k-6 v-init-6 k-7 v-init-7 k-8 v-init-8 DB table THR-1-TXN-1 23

24 Each transaction updates N random work rows + 1 meta row meta rows work rows key value THR-1-TXN-1 v-init-thr-1-txn-1 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 v-init-thr-2-txn-1 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-init-3 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-init-6 k-7 v-init-7 k-8 v-init-8 DB table THR-1-TXN-1 save transaction ID 24

25 Each transaction updates N random work rows + 1 meta row meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 v-init-thr-2-txn-1 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-init-3 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-init-6 k-7 v-init-7 k-8 v-init-8 DB table save work-row keys & timestamp right before commit THR-1-TXN-1 save transaction ID 25

26 Fully exercise concurrency control meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 v-init-thr-2-txn-1 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-init-3 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-init-6 k-7 v-init-7 k-8 v-init-8 DB table THR-1-TXN-1 26

27 Fully exercise concurrency control meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 v-init-thr-1-txn-2 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-init-3 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-2-txn-1 k-7 v-thr-2-txn-1 k-8 v-init-8 DB table THR-2-TXN-1 27

28 Fully exercise concurrency control meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-init-3 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-1 k-8 v-thr-1-txn-2 DB table THR-1-TXN-2 28

29 Fully exercise concurrency control meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 k-3-k-7-ts-00:14 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-thr-2-txn-2 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-2 k-8 v-thr-1-txn-2 DB table THR-2-TXN-2 29

30 30

31 A power fault just happened during our workload 31

32 Is there any ACID violation after recovery? meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-thr-2-txn-2 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-2 k-8 v-thr-1-txn-2 recovered DB table 32

33 Is there any ACID violation after recovery? meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-thr-2-txn-2 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-2 k-8 v-thr-1-txn-2 recovered DB table 33

34 Is there any ACID violation after recovery? meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-thr-2-txn-2 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-2 k-8 v-thr-1-txn-2 recovered DB table 34

35 Is there any ACID violation after recovery? meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-thr-2-txn-2 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-2 k-8 v-thr-1-txn-2 recovered DB table 35

36 Is there any ACID violation after recovery? meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-thr-2-txn-2 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-2 k-8 v-thr-1-txn-2 recovered DB table Atomicity violation! should have been updated w/ work rows 36

37 Is there any ACID violation after recovery? meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-thr-2-txn-2 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-2 k-8 v-thr-1-txn-2 recovered DB table allow checking time & order related properties 37

38 Is there any ACID violation after recovery? meta rows work rows key value THR-1-TXN-1 k-2-k-5-ts-00:01 THR-1-TXN-2 k-6-k-8-ts-00:13 THR-2-TXN-1 k-7-k-6-ts-00:03 THR-2-TXN-2 v-init-thr-2-txn-2 k-1 v-init-1 k-2 v-thr-1-txn-1 k-3 v-thr-2-txn-2 k-4 v-init-4 k-5 v-thr-1-txn-1 k-6 v-thr-1-txn-2 k-7 v-thr-2-txn-2 k-8 v-thr-1-txn-2 recovered DB table More workloads & ACID checking in the paper 38

39 Framework Overview Worker & Checker ❶ ❸ database SCSI cmds ❷ Record & Replayer 39

40 Framework Overview Worker & Checker ❶ ❸ database SCSI cmds ❷ Record & Replayer 40

41 Capturing I/O trace without kernel modification target daemon Worker SCSI cmds file system backing store 41

42 Capturing I/O trace without kernel modification 4 target daemon 3 Worker SCSI Tracer Worker s block trace 2 1 file system SCSI cmds minimum atomic block-transfer operations (mini ops) backing store 42

43 Constructing a post-fault disk image 4 target daemon SCSI Tracer Worker s block trace SCSI cmds Replayer file system backing store clean image 43

44 Constructing a post-fault disk image 4 target daemon 3 SCSI Tracer Worker s block trace 2 1 fault point SCSI cmds Replayer file system backing store 1 failure image 44

45 Checking the post-fault DB 4 target daemon 3 Checker SCSI Tracer Worker s block trace 2 1 fault point SCSI cmds Replayer file system 1 failure image 45

46 Checking the post-fault DB 4 check log target daemon 3 Checker SCSI Tracer Worker s block trace 2 1 fault point autorecovery SCSI cmds Replayer fsck file system 1 failure image 46

47 Testing different fault points easily 4 target daemon 3 Checker SCSI Tracer Worker s block trace 2 1 fault point SCSI cmds Replayer file system backing store clean image 47

48 Testing different fault points easily 4 Checker target daemon SCSI Tracer Worker s block trace fault point SCSI cmds Replayer file system 1 2 backing store failure image 48

49 Testing different fault points easily target daemon 4 3 fault point Checker SCSI Tracer Worker s block trace 2 1 SCSI cmds Replayer file system backing store failure image 49

50 The framework is not good enough Sometimes need several days - too many mini operations, too many potential fault points 50

51 The framework is not good enough Sometimes need several days - too many mini operations, too many potential fault points We tried sampling - but only a few fault points trigger ACID violations 51

52 The framework is not good enough Sometimes need several days - too many mini operations, too many potential fault points We tried sampling - but only a few fault points trigger ACID violations Don t know why 52

53 Enhanced Design

54 Framework Overview Worker & Checker ❶ ❸ Multi-layer Tracer database SCSI cmds ❷ Record & Replayer 54

55 Framework Overview Worker & Checker ❶ ❸ Multi-layer Tracer database SCSI cmds ❷ Record & Replayer 55

56 Original trace provides little semantics op# content LBA 1 0a a bc target daemon Worker SCSI Tracer Worker s block trace SCSI file system backing store 56

57 Enhancing w/ more context op# content LBA timestamp SCSI cmd# file syscall 1 0a x.db msync(x.db) 2 0a x.log fsync(x.log) bc fs-j fsync(x.log) fs-j fsync(x.log) target daemon Worker SCSI Tracer Worker s multi-layer trace SCSI multi-layer tracer file system backing store 57

58 What makes some fault points special? Checking result Worker s multi-layer trace op# content LBA ts cmd# file syscall anything special? 58

59 5 patterns found from 2 databases MMAP p : unintended update to mmap ed blocks op# LBA file syscall x.db fsync(x.log) x.db msync(x.db) 59

60 5 patterns found from 2 databases MMAP p : unintended update to mmap ed blocks op# LBA file syscall x.db fsync(x.log) x.db msync(x.db) 60

61 5 patterns found from 2 databases MMAP p : unintended update to mmap ed blocks op# LBA file syscall x.db fsync(x.log) x.db msync(x.db) intended 61

62 5 patterns found from 2 databases MMAP p : unintended update to mmap ed blocks op# LBA file syscall x.db fsync(x.log) x.db msync(x.db) intended 62

63 5 patterns found from 2 databases MMAP p : unintended update to mmap ed blocks op# LBA file syscall x.db fsync(x.log) x.db msync(x.db) unintended intended implicit flush of dirty blocks by kernel or FS under heavy transactions 63

64 5 patterns found from 2 databases MMAP p : unintended update to mmap ed blocks op# LBA file syscall x.db fsync(x.log) x.db msync(x.db) unintended special! intended 64

65 5 patterns found from 2 databases MMAP p : unintended update to mmap ed blocks op# LBA file syscall x.db fsync(x.log) x.db msync(x.db) unintended special! intended Four more patterns: REP p, JUMP p, HEAD p, TRAN p 65

66 Add fault injection policy to determine where to inject faults check log Checker target daemon SCSI Tracer Worker s blk trace SCSI Replayer Fault Injection Policy file system backing store 66

67 Alternative to Exhaustive: Pattern-based Ranking Worker s multi-layer trace op# LBA cmd# file syscall x.db msyc(x.db) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.db fsync(x.log) fs-j fsync(x.log) Scoreboard MMAP p REP p JUMP p HEAD p TRAN p total score 67

68 Alternative to Exhaustive: Pattern-based Ranking Worker s multi-layer trace op# LBA cmd# file syscall x.db msyc(x.db) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.db fsync(x.log) fs-j fsync(x.log) MMAP p REP p JUMP p HEAD p TRAN p Scoreboard total score 68

69 Alternative to Exhaustive: Pattern-based Ranking Worker s multi-layer trace op# LBA cmd# file syscall x.db msyc(x.db) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.db fsync(x.log) fs-j fsync(x.log) MMAP p REP p JUMP p HEAD p TRAN p Scoreboard total score 69

70 Alternative to Exhaustive: Pattern-based Ranking Worker s multi-layer trace op# LBA cmd# file syscall x.db msyc(x.db) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.db fsync(x.log) fs-j fsync(x.log) MMAP p REP p JUMP p HEAD p TRAN p Scoreboard total score 70

71 Alternative to Exhaustive: Pattern-based Ranking Worker s multi-layer trace op# LBA cmd# file syscall x.db msyc(x.db) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.db fsync(x.log) fs-j fsync(x.log) Scoreboard MMAP p REP p JUMP p HEAD p TRAN p total score 71

72 Alternative to Exhaustive: Pattern-based Ranking Worker s multi-layer trace op# LBA cmd# file syscall x.db msyc(x.db) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.db fsync(x.log) fs-j fsync(x.log) Scoreboard MMAP p REP p JUMP p HEAD p TRAN p total score 72

73 Alternative to Exhaustive: Pattern-based Ranking Worker s multi-layer trace op# LBA cmd# file syscall x.db msyc(x.db) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.db fsync(x.log) fs-j fsync(x.log) Scoreboard MMAP p REP p JUMP p HEAD p TRAN p total score

74 Alternative to Exhaustive: Pattern-based Ranking Worker s multi-layer trace op# LBA cmd# file syscall x.db msyc(x.db) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.log fsync(x.log) x.db fsync(x.log) fs-j fsync(x.log) Scoreboard MMAP p REP p JUMP p HEAD p TRAN p total score st -rank: 2 nd -rank: 3 rd -rank: 4 th -rank: predicted most error-prone

75 Alternative to Exhaustive: Pattern-based Ranking st -rank: 2 nd -rank: 3 rd -rank: 4 th -rank: predicted most error-prone

76 Alternative to Exhaustive: Pattern-based Ranking st -rank: 2 nd -rank: 3 rd -rank: 4 th -rank: predicted most error-prone

77 Diagnosis Support

78 Worker s multi-layer trace helps understand what happened at fault time Worker SCSI target daemon SCSI Tracer multi-layer tracer Worker s multi-layer trace: op#, content, LBA, timestamp, SCSI cmd#, file, syscall, file system backing store 78

79 Add function-call tracing to disclose more semantics for diagnosis Worker file system SCSI target daemon SCSI Tracer backing store multi-layer tracer Worker s multi-layer trace: op#, content, LBA, timestamp, SCSI cmd#, file, syscall, function call 79

80 Enable same tracing during checking to see why recovery didn t work check log Checker target daemon SCSI Tracer Worker s block trace a fault point triggering ACID violation SCSI Replayer Checker s multi-layer trace: file system 1 2 multi-layer tracer op#, content, LBA, timestamp, SCSI cmd#, file, syscall, function call 80

81 Results

82 8 databases - Open-source: TokyoCabinet, MariaDB, LightningDB, SQLite - Commercial: KVS-A, SQL-A, SQL-B, SQL-C 4 workloads 3 file systems - ext3, XFS, NTFS Several operating systems - Linux: RHEL 6, Debian6, Ubuntu 12 LTS - Windows 7 Enterprise Experimental Environment 82

83 Not a single DB can survive all tests DB FS W-1 W-2 W-3 W-4.1 W-4.2 W-4.3 A C I D TokyoCabinet ext3 D D D ACD ACD ACD 0.15% 0.14% % XFS -- D D ACD D ACD <0.01% 0.01% % MariaDB ext3 D D D D D D % XFS D D D D D D % LightningDB ext D % XFS SQLite ext3 D D -- D D D % XFS D D D D % KVS-A ext Hang XFS SQL-A ext3 D D D D D D % XFS D D D D D D % SQL-B ext3 D D CD CD CD CD % % XFS CD D CD CD CD CD % % SQL-C NTFS D D D D D D % 83

84 Durability violation is most common DB FS W-1 W-2 W-3 W-4.1 W-4.2 W-4.3 A C I D TokyoCabinet ext3 D D D ACD ACD ACD 0.15% 0.14% % XFS -- D D ACD D ACD <0.01% 0.01% % MariaDB ext3 D D D D D D % XFS D D D D D D % LightningDB ext D % XFS SQLite ext3 D D -- D D D % XFS D D D D % KVS-A ext Hang XFS SQL-A ext3 D D D D D D % XFS D D D D D D % SQL-B ext3 D D CD CD CD CD % % XFS CD D CD CD CD CD % % SQL-C NTFS D D D D D D % 84

85 Some violations are difficult to trigger DB FS W-1 W-2 W-3 W-4.1 W-4.2 W-4.3 A C I D TokyoCabinet ext3 D D D ACD ACD ACD 0.15% 0.14% % XFS -- D D ACD D ACD <0.01% 0.01% % MariaDB ext3 D D D D D D % XFS D D D D D D % LightningDB ext D % XFS SQLite ext3 D D -- D D D % XFS D D D D % KVS-A ext Hang XFS SQL-A ext3 D D D D D D % XFS D D D D D D % SQL-B ext3 D D CD CD CD CD % % XFS CD D CD CD CD CD % % SQL-C NTFS D D D D D D % 85

86 Some violations are difficult to trigger DB FS W-1 W-2 W-3 W-4.1 W-4.2 W-4.3 A C I D TokyoCabinet ext3 D D D ACD ACD ACD 0.15% 0.14% % XFS -- D D ACD D ACD <0.01% 0.01% % MariaDB ext3 D D D D D D % XFS D D D D D D % LightningDB ext D % XFS SQLite ext3 D D -- D D D % XFS D D D D % KVS-A ext Hang XFS SQL-A ext3 D D D D D D % XFS D D D D D D % SQL-B ext3 D D CD CD CD CD % % XFS CD D CD CD CD CD % % SQL-C NTFS D D D D D D % 86

87 Case Study: A TokyoCabinet Bug Failure symptoms faults injected in a region of operations cause: - A violation: a transaction is partially committed - D violation: some rows are irretrievable - C violation: retrievable rows by range query and point queries are different 87

88 Case Study: A TokyoCabinet Bug Why recovery didn t work? tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb tcbdbget() Checker s trace when ACID violations were found tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget() Checker s trace when no violation was found 88 Delta Debugging [Zeller, SIGSOFT 02/FSE-10]

89 Case Study: A TokyoCabinet Bug Why recovery didn t work? tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb //no tchdbwalrestore() tcbdbget() Checker s trace when ACID violations were found tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget() Checker s trace when no violation was found 89 Delta Debugging [Zeller, SIGSOFT 02/FSE-10]

90 Case Study: A TokyoCabinet Bug Why recovery didn t work? tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb //no tchdbwalrestore() tcbdbget() Checker s trace when ACID violations were found tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget() Checker s trace when no violation was found 90 Delta Debugging [Zeller, SIGSOFT 02/FSE-10]

91 What happened at fault time? Worker s trace around the bug-triggering fault points op#30 #90 mmap(8192, x.tcb, 0) fsync(x.tcb.wal) //op#, LBA, content, file, Case Study: A TokyoCabinet Bug syscall op#26, 630,, x.tcb.wal, fsync(x.tcb.wal) op#27, 960,, fs-j, fsync(x.tcb.wal) op#28, 964,, fs-j, fsync(x.tcb.wal) op#29, 480, 100, x.tcb, fsync(x.tcb.wal) msync(x.tcb) //op#, LBA, content, file, syscall op#91, 480, 101, x.tcb, msync(x.tcb) tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb //no tchdbwalrestore() tcbdbget() 91 Why recovery didn t work? tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget()

92 What happened at fault time? Worker s trace around the bug-triggering fault points op#30 #90 mmap(8192, x.tcb, 0) fsync(x.tcb.wal) //op#, LBA, content, file, Case Study: A TokyoCabinet Bug syscall op#26, 630,, x.tcb.wal, fsync(x.tcb.wal) op#27, 960,, fs-j, fsync(x.tcb.wal) op#28, 964,, fs-j, fsync(x.tcb.wal) op#29, 480, 100, x.tcb, fsync(x.tcb.wal) msync(x.tcb) //op#, LBA, content, file, syscall op#91, 480, 101, x.tcb, msync(x.tcb) tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb //no tchdbwalrestore() tcbdbget() 92 Why recovery didn t work? tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget()

93 What happened at fault time? Worker s trace around the bug-triggering fault points op#30 #90 mmap(8192, x.tcb, 0) fsync(x.tcb.wal) //op#, LBA, content, file, Case Study: A TokyoCabinet Bug syscall op#26, 630,, x.tcb.wal, fsync(x.tcb.wal) op#27, 960,, fs-j, fsync(x.tcb.wal) op#28, 964,, fs-j, fsync(x.tcb.wal) op#29, 480, 100, x.tcb, fsync(x.tcb.wal) msync(x.tcb) //op#, LBA, content, file, syscall op#91, 480, 101, x.tcb, msync(x.tcb) tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb //no tchdbwalrestore() tcbdbget() 93 Why recovery didn t work? Intended update tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget()

94 What happened at fault time? Worker s trace around the bug-triggering fault points op#30 #90 mmap(8192, x.tcb, 0) fsync(x.tcb.wal) //op#, LBA, content, file, Case Study: A TokyoCabinet Bug syscall op#26, 630,, x.tcb.wal, fsync(x.tcb.wal) op#27, 960,, fs-j, fsync(x.tcb.wal) op#28, 964,, fs-j, fsync(x.tcb.wal) op#29, 480, 100, x.tcb, fsync(x.tcb.wal) msync(x.tcb) //op#, LBA, content, file, syscall op#91, 480, 101, x.tcb, msync(x.tcb) tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb //no tchdbwalrestore() tcbdbget() 94 Why recovery didn t work? Intended update tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget()

95 What happened at fault time? Worker s trace around the bug-triggering fault points op#30 #90 mmap(8192, x.tcb, 0) fsync(x.tcb.wal) //op#, LBA, content, file, Case Study: A TokyoCabinet Bug syscall op#26, 630,, x.tcb.wal, fsync(x.tcb.wal) op#27, 960,, fs-j, fsync(x.tcb.wal) op#28, 964,, fs-j, fsync(x.tcb.wal) op#29, 480, 100, x.tcb, fsync(x.tcb.wal) msync(x.tcb) //op#, LBA, content, file, syscall op#91, 480, 101, x.tcb, msync(x.tcb) tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb //no tchdbwalrestore() tcbdbget() 95 Why recovery didn t work? Unintended update! Intended update tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget()

96 What happened at fault time? Worker s trace around the bug-triggering fault points op#30 #90 mmap(8192, x.tcb, 0) fsync(x.tcb.wal) //op#, LBA, content, file, Case Study: A TokyoCabinet Bug syscall op#26, 630,, x.tcb.wal, fsync(x.tcb.wal) op#27, 960,, fs-j, fsync(x.tcb.wal) op#28, 964,, fs-j, fsync(x.tcb.wal) op#29, 480, 100, x.tcb, fsync(x.tcb.wal) msync(x.tcb) //op#, LBA, content, file, syscall op#91, 480, 101, x.tcb, msync(x.tcb) tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 100, x.tcb //no tchdbwalrestore() tcbdbget() 96 Why recovery didn t work? tchdbopenimpl(x.tcb) open(x.tcb) = 3 read(x.tcb) = 256 //op#, LBA, content, file op#1, 480, 101, x.tcb tchdbwalrestore() tcbdbget() One solution: Failure-atomic msync() [Park et.al., EuroSys 13]

97 Patterns reduce required test points greatly while achieving similar coverage SQL-C MariaDB KVS-A LightningDB SQL-B SQL-A SQLite TokyoCabinet reduction factor = 97

98 Patterns reduce required test points greatly while achieving similar coverage SQL-C MariaDB KVS-A LightningDB SQL-B SQL-A SQLite TokyoCabinet the 2 databases from which patterns are extracted reduction factor = 98

99 Patterns reduce required test points greatly while achieving similar coverage SQL-C 157 MariaDB 72 KVS-A 61 LightningDB SQL-B SQL-A SQLite reduce testing time from > 2 months to < 3 days! TokyoCabinet 6 reduction factor = 99

100 A wake-up call Conclusions & Future Work - traditional testing methodology may not be enough for today s complex storage systems - thorough testing requires purpose-built workloads and intelligent fault injection techniques 100

101 A wake-up call Conclusions & Future Work - traditional testing methodology may not be enough for today s complex storage systems - thorough testing requires purpose-built workloads and intelligent fault injection techniques Different layers in OS can help in different ways - iscsi: fault injection w/ high portability & high fidelity - LBA & syscall: generic behavior patterns - combined multi-layer info: clear whole picture of complicated scenarios 101

102 Conclusions & Future Work We should bridge the gaps of understanding/assumptions! between DB & OS between User & DB 102

103 Conclusions & Future Work We should bridge the gaps of understanding/assumptions! between DB & OS between User & DB Pop Quiz: true or false? mmap'ed files are not updated until msync() file-length update are persistent after fdatasync() durability is provided by the default configuration transactions are durable after COMMIT 103

104 Conclusions & Future Work We should bridge the gaps of understanding/assumptions! between DB & OS between User & DB Pop Quiz: true or false? mmap'ed files are not updated until msync() file-length update are persistent after fdatasync() durability is provided by the default configuration transactions are durable after COMMIT false 104

105 Conclusions & Future Work We should bridge the gaps of understanding/assumptions! between DB & OS between User & DB Pop Quiz: true or false? mmap'ed files are not updated until msync() file-length update are persistent after fdatasync() durability is provided by the default configuration transactions are durable after COMMIT false depends! depends! depends! 105

106 Conclusions & Future Work We should bridge the gaps of understanding/assumptions! between DB & OS between User & DB Pop Quiz: true or false? mmap'ed files are not updated until msync() file-length update are persistent after fdatasync() durability is provided by the default configuration transactions are durable after COMMIT false depends! depends! depends! Thank you! 106

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions Yige Hu, Zhiting Zhu, Ian Neal, Youngjin Kwon, Tianyu Chen, Vijay Chidambaram, Emmett Witchel The University of Texas at Austin

More information

CS 318 Principles of Operating Systems

CS 318 Principles of Operating Systems CS 318 Principles of Operating Systems Fall 2017 Lecture 17: File System Crash Consistency Ryan Huang Administrivia Lab 3 deadline Thursday Nov 9 th 11:59pm Thursday class cancelled, work on the lab Some

More information

Lightweight Application-Level Crash Consistency on Transactional Flash Storage

Lightweight Application-Level Crash Consistency on Transactional Flash Storage Lightweight Application-Level Crash Consistency on Transactional Flash Storage Changwoo Min, Woon-Hak Kang, Taesoo Kim, Sang-Won Lee, Young Ik Eom Georgia Institute of Technology Sungkyunkwan University

More information

FS Consistency & Journaling

FS Consistency & Journaling FS Consistency & Journaling Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Why Is Consistency Challenging? File system may perform several disk writes to serve a single request Caching

More information

EECS 482 Introduction to Operating Systems

EECS 482 Introduction to Operating Systems EECS 482 Introduction to Operating Systems Winter 2018 Harsha V. Madhyastha Multiple updates and reliability Data must survive crashes and power outages Assume: update of one block atomic and durable Challenge:

More information

CS122 Lecture 15 Winter Term,

CS122 Lecture 15 Winter Term, CS122 Lecture 15 Winter Term, 2017-2018 2 Transaction Processing Last time, introduced transaction processing ACID properties: Atomicity, consistency, isolation, durability Began talking about implementing

More information

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability Topics COS 318: Operating Systems File Performance and Reliability File buffer cache Disk failure and recovery tools Consistent updates Transactions and logging 2 File Buffer Cache for Performance What

More information

Crash Consistency: FSCK and Journaling. Dongkun Shin, SKKU

Crash Consistency: FSCK and Journaling. Dongkun Shin, SKKU Crash Consistency: FSCK and Journaling 1 Crash-consistency problem File system data structures must persist stored on HDD/SSD despite power loss or system crash Crash-consistency problem The system may

More information

CSE 530A ACID. Washington University Fall 2013

CSE 530A ACID. Washington University Fall 2013 CSE 530A ACID Washington University Fall 2013 Concurrency Enterprise-scale DBMSs are designed to host multiple databases and handle multiple concurrent connections Transactions are designed to enable Data

More information

Caching and reliability

Caching and reliability Caching and reliability Block cache Vs. Latency ~10 ns 1~ ms Access unit Byte (word) Sector Capacity Gigabytes Terabytes Price Expensive Cheap Caching disk contents in RAM Hit ratio h : probability of

More information

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Basic Timestamp Ordering Optimistic Concurrency Control Multi-Version Concurrency Control C. Faloutsos A. Pavlo Lecture#23:

More information

Towards Efficient, Portable Application-Level Consistency

Towards Efficient, Portable Application-Level Consistency Towards Efficient, Portable Application-Level Consistency Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Joo-Young Hwang, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau 1 File System Crash

More information

Actions are never left partially executed. Actions leave the DB in a consistent state. Actions are not affected by other concurrent actions

Actions are never left partially executed. Actions leave the DB in a consistent state. Actions are not affected by other concurrent actions Transaction Management Recovery (1) Review: ACID Properties Atomicity Actions are never left partially executed Consistency Actions leave the DB in a consistent state Isolation Actions are not affected

More information

Outline. Failure Types

Outline. Failure Types Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Unit 10 WS 2013/2014 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet. Nikolaus

More information

RECOVERY CHAPTER 21,23 (6/E) CHAPTER 17,19 (5/E)

RECOVERY CHAPTER 21,23 (6/E) CHAPTER 17,19 (5/E) RECOVERY CHAPTER 21,23 (6/E) CHAPTER 17,19 (5/E) 2 LECTURE OUTLINE Failures Recoverable schedules Transaction logs Recovery procedure 3 PURPOSE OF DATABASE RECOVERY To bring the database into the most

More information

W4118 Operating Systems. Instructor: Junfeng Yang

W4118 Operating Systems. Instructor: Junfeng Yang W4118 Operating Systems Instructor: Junfeng Yang File systems in Linux Linux Second Extended File System (Ext2) What is the EXT2 on-disk layout? What is the EXT2 directory structure? Linux Third Extended

More information

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 CSC 261/461 Database Systems Lecture 20 Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 Announcements Project 1 Milestone 3: Due tonight Project 2 Part 2 (Optional): Due on: 04/08 Project 3

More information

CrashMonkey: A Framework to Systematically Test File-System Crash Consistency. Ashlie Martinez Vijay Chidambaram University of Texas at Austin

CrashMonkey: A Framework to Systematically Test File-System Crash Consistency. Ashlie Martinez Vijay Chidambaram University of Texas at Austin CrashMonkey: A Framework to Systematically Test File-System Crash Consistency Ashlie Martinez Vijay Chidambaram University of Texas at Austin Crash Consistency File-system updates change multiple blocks

More information

Crash Recovery. Assignment 1 Posted Saturday

Crash Recovery. Assignment 1 Posted Saturday Crash Recovery Wyatt Lloyd Assignment 1 Posted Saturday On github, instructions in readme.md: https://github.com/usc657/username-assignment1 Posted later than I intended => You get lots of late days Please

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 22: Transactions I CSE 344 - Fall 2014 1 Announcements HW6 due tomorrow night Next webquiz and hw out by end of the week HW7: Some Java programming required

More information

EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors. Junfeng Yang, Can Sar, Dawson Engler Stanford University

EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors. Junfeng Yang, Can Sar, Dawson Engler Stanford University EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors Junfeng Yang, Can Sar, Dawson Engler Stanford University Why check storage systems? Storage system errors are among the

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Transactions - Definition A transaction is a sequence of data operations with the following properties: * A Atomic All

More information

The Dangers and Complexities of SQLite Benchmarking. Dhathri Purohith, Jayashree Mohan and Vijay Chidambaram

The Dangers and Complexities of SQLite Benchmarking. Dhathri Purohith, Jayashree Mohan and Vijay Chidambaram The Dangers and Complexities of SQLite Benchmarking Dhathri Purohith, Jayashree Mohan and Vijay Chidambaram 2 3 Benchmarking SQLite is Non-trivial! Benchmarking complex systems in a repeatable fashion

More information

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25 DATABASE TRANSACTIONS CS121: Relational Databases Fall 2017 Lecture 25 Database Transactions 2 Many situations where a sequence of database operations must be treated as a single unit A combination of

More information

Big Data Technology Incremental Processing using Distributed Transactions

Big Data Technology Incremental Processing using Distributed Transactions Big Data Technology Incremental Processing using Distributed Transactions Eshcar Hillel Yahoo! Ronny Lempel Outbrain *Based on slides by Edward Bortnikov and Ohad Shacham Roadmap Previous classes Stream

More information

Datacenter replication solution with quasardb

Datacenter replication solution with quasardb Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION

More information

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin)

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin) : LFS and Soft Updates Ken Birman (based on slides by Ben Atkin) Overview of talk Unix Fast File System Log-Structured System Soft Updates Conclusions 2 The Unix Fast File System Berkeley Unix (4.2BSD)

More information

Kubuntu Installation:

Kubuntu Installation: Kubuntu Installation: Kubuntu is a user friendly operating system based on KDE, the K Desktop Environment. With a predictable 6 month release cycle and part of the Ubuntu project, Kubuntu is the GNU/Linux

More information

CS3210: Crash consistency

CS3210: Crash consistency CS3210: Crash consistency Kyle Harrigan 1 / 45 Administrivia Lab 4 Part C Due Tomorrow Quiz #2. Lab3-4, Ch 3-6 (read "xv6 book") Open laptop/book, no Internet 2 / 45 Summary of cs3210 Power-on -> BIOS

More information

Database Management System

Database Management System Database Management System Lecture 10 Recovery * Some materials adapted from R. Ramakrishnan, J. Gehrke and Shawn Bowers Basic Database Architecture Database Management System 2 Recovery Which ACID properties

More information

CS5460: Operating Systems Lecture 20: File System Reliability

CS5460: Operating Systems Lecture 20: File System Reliability CS5460: Operating Systems Lecture 20: File System Reliability File System Optimizations Modern Historic Technique Disk buffer cache Aggregated disk I/O Prefetching Disk head scheduling Disk interleaving

More information

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 10: Transaction processing. November 14, Lecturer: Rasmus Pagh

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 10: Transaction processing. November 14, Lecturer: Rasmus Pagh Introduction to Databases, Fall 2005 IT University of Copenhagen Lecture 10: Transaction processing November 14, 2005 Lecturer: Rasmus Pagh Today s lecture Part I: Transaction processing Serializability

More information

*-Box (star-box) Towards Reliability and Consistency in Dropbox-like File Synchronization Services

*-Box (star-box) Towards Reliability and Consistency in Dropbox-like File Synchronization Services *-Box (star-box) Towards Reliability and Consistency in -like File Synchronization Services Yupu Zhang, Chris Dragga, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin - Madison 6/27/2013

More information

Motivating Example. Motivating Example. Transaction ROLLBACK. Transactions. CSE 444: Database Internals

Motivating Example. Motivating Example. Transaction ROLLBACK. Transactions. CSE 444: Database Internals CSE 444: Database Internals Client 1: SET money=money-100 WHERE pid = 1 Motivating Example Client 2: SELECT sum(money) FROM Budget Lectures 13 Transaction Schedules 1 SET money=money+60 WHERE pid = 2 SET

More information

The transaction. Defining properties of transactions. Failures in complex systems propagate. Concurrency Control, Locking, and Recovery

The transaction. Defining properties of transactions. Failures in complex systems propagate. Concurrency Control, Locking, and Recovery Failures in complex systems propagate Concurrency Control, Locking, and Recovery COS 418: Distributed Systems Lecture 17 Say one bit in a DRAM fails: flips a bit in a kernel memory write causes a kernel

More information

File Systems: Consistency Issues

File Systems: Consistency Issues File Systems: Consistency Issues File systems maintain many data structures Free list/bit vector Directories File headers and inode structures res Data blocks File Systems: Consistency Issues All data

More information

Announcements. Motivating Example. Transaction ROLLBACK. Motivating Example. CSE 444: Database Internals. Lab 2 extended until Monday

Announcements. Motivating Example. Transaction ROLLBACK. Motivating Example. CSE 444: Database Internals. Lab 2 extended until Monday Announcements CSE 444: Database Internals Lab 2 extended until Monday Lab 2 quiz moved to Wednesday Lectures 13 Transaction Schedules HW5 extended to Friday 544M: Paper 3 due next Friday as well CSE 444

More information

Lecture 21: Logging Schemes /645 Database Systems (Fall 2017) Carnegie Mellon University Prof. Andy Pavlo

Lecture 21: Logging Schemes /645 Database Systems (Fall 2017) Carnegie Mellon University Prof. Andy Pavlo Lecture 21: Logging Schemes 15-445/645 Database Systems (Fall 2017) Carnegie Mellon University Prof. Andy Pavlo Crash Recovery Recovery algorithms are techniques to ensure database consistency, transaction

More information

Why Transac'ons? Database systems are normally being accessed by many users or processes at the same 'me.

Why Transac'ons? Database systems are normally being accessed by many users or processes at the same 'me. Transac'ons 1 Why Transac'ons? Database systems are normally being accessed by many users or processes at the same 'me. Both queries and modifica'ons. Unlike opera'ng systems, which support interac'on

More information

Database Technology. Topic 11: Database Recovery

Database Technology. Topic 11: Database Recovery Topic 11: Database Recovery Olaf Hartig olaf.hartig@liu.se Types of Failures Database may become unavailable for use due to: Transaction failures e.g., incorrect input, deadlock, incorrect synchronization

More information

(Lightweight) Recoverable Virtual Memory. Robert Grimm New York University

(Lightweight) Recoverable Virtual Memory. Robert Grimm New York University (Lightweight) Recoverable Virtual Memory Robert Grimm New York University The Three Questions What is the problem? What is new or different? What are the contributions and limitations? The Goal Simplify

More information

Transaction Management & Concurrency Control. CS 377: Database Systems

Transaction Management & Concurrency Control. CS 377: Database Systems Transaction Management & Concurrency Control CS 377: Database Systems Review: Database Properties Scalability Concurrency Data storage, indexing & query optimization Today & next class Persistency Security

More information

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26 JOURNALING FILE SYSTEMS CS124 Operating Systems Winter 2015-2016, Lecture 26 2 File System Robustness The operating system keeps a cache of filesystem data Secondary storage devices are much slower than

More information

Reminder from last time

Reminder from last time Concurrent systems Lecture 7: Crash recovery, lock-free programming, and transactional memory DrRobert N. M. Watson 1 Reminder from last time History graphs; good (and bad) schedules Isolation vs. strict

More information

CS 445 Introduction to Database Systems

CS 445 Introduction to Database Systems CS 445 Introduction to Database Systems TTh 2:45-4:20pm Chadd Williams Pacific University 1 Overview Practical introduction to databases theory + hands on projects Topics Relational Model Relational Algebra/Calculus/

More information

File System Consistency. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

File System Consistency. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University File System Consistency Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Crash Consistency File system may perform several disk writes to complete

More information

1/9/13. + The Transaction Concept. Transaction Processing. Multiple online users: Gives rise to the concurrency problem.

1/9/13. + The Transaction Concept. Transaction Processing. Multiple online users: Gives rise to the concurrency problem. + Transaction Processing Enterprise Scale Data Management Divy Agrawal Department of Computer Science University of California at Santa Barbara + The Transaction Concept Multiple online users: Gives rise

More information

Recoverability. Kathleen Durant PhD CS3200

Recoverability. Kathleen Durant PhD CS3200 Recoverability Kathleen Durant PhD CS3200 1 Recovery Manager Recovery manager ensures the ACID principles of atomicity and durability Atomicity: either all actions in a transaction are done or none are

More information

Lab IV. Transaction Management. Database Laboratory

Lab IV. Transaction Management. Database Laboratory Lab IV Transaction Management Database Laboratory Objectives To work with transactions in ORACLE To study the properties of transactions in ORACLE Database integrity must be controlled when access operations

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 22: Transactions CSE 344 - Fall 2013 1 Announcements HW6 is due tonight Webquiz due next Monday HW7 is posted: Some Java programming required Plus connection

More information

CSE Database Management Systems. York University. Parke Godfrey. Winter CSE-4411M Database Management Systems Godfrey p.

CSE Database Management Systems. York University. Parke Godfrey. Winter CSE-4411M Database Management Systems Godfrey p. CSE-4411 Database Management Systems York University Parke Godfrey Winter 2014 CSE-4411M Database Management Systems Godfrey p. 1/16 CSE-3421 vs CSE-4411 CSE-4411 is a continuation of CSE-3421, right?

More information

Database Management Systems Introduction to DBMS

Database Management Systems Introduction to DBMS Database Management Systems Introduction to DBMS D B M G 1 Introduction to DBMS Data Base Management System (DBMS) A software package designed to store and manage databases We are interested in internal

More information

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases We talked about transactions and how to implement them in a single-node database. We ll now start looking into how to

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Last Class. Faloutsos/Pavlo CMU /615

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Last Class. Faloutsos/Pavlo CMU /615 Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#23: Crash Recovery Part 2 (R&G ch. 18) Administrivia HW8 is due Thurs April 24 th Faloutsos/Pavlo

More information

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory Dhananjoy Das, Sr. Systems Architect SanDisk Corp. 1 Agenda: Applications are KING! Storage landscape (Flash / NVM)

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class. Today s Class. Faloutsos/Pavlo CMU /615

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class. Today s Class. Faloutsos/Pavlo CMU /615 Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#23: Crash Recovery Part 1 (R&G ch. 18) Last Class Basic Timestamp Ordering Optimistic Concurrency

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

Amazon Aurora Relational databases reimagined.

Amazon Aurora Relational databases reimagined. Amazon Aurora Relational databases reimagined. Ronan Guilfoyle, Solutions Architect, AWS Brian Scanlan, Engineer, Intercom 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Current

More information

CSE 444: Database Internals. Lectures 13 Transaction Schedules

CSE 444: Database Internals. Lectures 13 Transaction Schedules CSE 444: Database Internals Lectures 13 Transaction Schedules CSE 444 - Winter 2018 1 About Lab 3 In lab 3, we implement transactions Focus on concurrency control Want to run many transactions at the same

More information

Lesson 9 Transcript: Backup and Recovery

Lesson 9 Transcript: Backup and Recovery Lesson 9 Transcript: Backup and Recovery Slide 1: Cover Welcome to lesson 9 of the DB2 on Campus Lecture Series. We are going to talk in this presentation about database logging and backup and recovery.

More information

6.830 Lecture Recovery 10/30/2017

6.830 Lecture Recovery 10/30/2017 6.830 Lecture 14 -- Recovery 10/30/2017 Have been talking about transactions Transactions -- what do they do? Awesomely powerful abstraction -- programmer can run arbitrary mixture of commands that read

More information

Database Security: Transactions, Access Control, and SQL Injection

Database Security: Transactions, Access Control, and SQL Injection .. Cal Poly Spring 2013 CPE/CSC 365 Introduction to Database Systems Eriq Augustine.. Transactions Database Security: Transactions, Access Control, and SQL Injection A transaction is a sequence of SQL

More information

ECE 650 Systems Programming & Engineering. Spring 2018

ECE 650 Systems Programming & Engineering. Spring 2018 ECE 650 Systems Programming & Engineering Spring 2018 Database Transaction Processing Tyler Bletsch Duke University Slides are adapted from Brian Rogers (Duke) Transaction Processing Systems Systems with

More information

File System Consistency

File System Consistency File System Consistency Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3052: Introduction to Operating Systems, Fall 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

In This Lecture. Transactions and Recovery. Transactions. Transactions. Isolation and Durability. Atomicity and Consistency. Transactions Recovery

In This Lecture. Transactions and Recovery. Transactions. Transactions. Isolation and Durability. Atomicity and Consistency. Transactions Recovery In This Lecture Database Systems Lecture 15 Natasha Alechina Transactions Recovery System and Media s Concurrency Concurrency problems For more information Connolly and Begg chapter 20 Ullmanand Widom8.6

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

Introduction to Transaction Management

Introduction to Transaction Management Introduction to Transaction Management CMPSCI 445 Fall 2008 Slide content adapted from Ramakrishnan & Gehrke, Zack Ives 1 Concurrency Control Concurrent execution of user programs is essential for good

More information

Testing Error Handling Code in Device Drivers Using Characteristic Fault Injection

Testing Error Handling Code in Device Drivers Using Characteristic Fault Injection 1 Testing Error Handling Code in Device Drivers Using Characteristic Fault Injection Jia-Ju Bai, Yu-Ping Wang, Jie Yin, Shi-Min Hu Department of Computer Science and Technology Tsinghua University Beijing,

More information

Tuesday, April 6, Inside SQL Server

Tuesday, April 6, Inside SQL Server Inside SQL Server Thank you Goals What happens when a query runs? What each component does How to observe what s going on Delicious SQL Cake Delicious SQL Cake Delicious SQL Cake Delicious SQL Cake Delicious

More information

PJFS (Partitioning, Journaling File System): For Embedded Systems. Ada-Europe 6-June-2006 Greg Gicca

PJFS (Partitioning, Journaling File System): For Embedded Systems. Ada-Europe 6-June-2006 Greg Gicca PJFS (Partitioning, Journaling File System): For Embedded Systems Ada-Europe 6-June-2006 Greg Gicca Why File Systems Are Important Most computer systems today, even deeply embedded, have ample storage

More information

Page 1. Quiz 18.1: Flow-Control" Goals for Today" Quiz 18.1: Flow-Control" CS162 Operating Systems and Systems Programming Lecture 18 Transactions"

Page 1. Quiz 18.1: Flow-Control Goals for Today Quiz 18.1: Flow-Control CS162 Operating Systems and Systems Programming Lecture 18 Transactions Quiz 18.1: Flow-Control" CS162 Operating Systems and Systems Programming Lecture 18 Transactions" April 8, 2013 Anthony D. Joseph http://inst.eecs.berkeley.edu/~cs162 Q1: True _ False _ Flow control is

More information

CMPT 354: Database System I. Lecture 11. Transaction Management

CMPT 354: Database System I. Lecture 11. Transaction Management CMPT 354: Database System I Lecture 11. Transaction Management 1 Why this lecture DB application developer What if crash occurs, power goes out, etc? Single user à Multiple users 2 Outline Transaction

More information

Chapter 20 Introduction to Transaction Processing Concepts and Theory

Chapter 20 Introduction to Transaction Processing Concepts and Theory Chapter 20 Introduction to Transaction Processing Concepts and Theory - Logical units of DB processing - Large database and hundreds of transactions - Ex. Stock market, super market, banking, etc - High

More information

Recovery and Logging

Recovery and Logging Recovery and Logging Computer Science E-66 Harvard University David G. Sullivan, Ph.D. Review: ACID Properties A transaction has the following ACID properties: Atomicity: either all of its changes take

More information

CSE 451: Operating Systems Winter Module 17 Journaling File Systems

CSE 451: Operating Systems Winter Module 17 Journaling File Systems CSE 451: Operating Systems Winter 2017 Module 17 Journaling File Systems Mark Zbikowski mzbik@cs.washington.edu Allen Center 476 2013 Gribble, Lazowska, Levy, Zahorjan In our most recent exciting episodes

More information

High availability with MariaDB TX: The definitive guide

High availability with MariaDB TX: The definitive guide High availability with MariaDB TX: The definitive guide MARCH 2018 Table of Contents Introduction - Concepts - Terminology MariaDB TX High availability - Master/slave replication - Multi-master clustering

More information

Testing Methodologies

Testing Methodologies Testing Methodologies Achieving Fewer Returns, Satisfied Customers and Richer Markets Michael Bellon, V.P. Engineering, Flexstar Technology August 2013 1 SSD Mortality Survey The following is an excerpt

More information

System Wide Tracing User Need

System Wide Tracing User Need System Wide Tracing User Need dominique toupin ericsson com April 2010 About me Developer Tool Manager at Ericsson, helping Ericsson sites to develop better software efficiently Background

More information

Workload Performance Comparison. Eden Kim, Calypso Systems, Inc.

Workload Performance Comparison. Eden Kim, Calypso Systems, Inc. Workload Performance Comparison Eden Kim, Calypso Systems, Inc. Agenda Part 1: Workload Comparisons: Part 2: NVMe SSD, & Mmap Part 3: DAX: Conclusions 2 Part 1: Workload Comparisons Synthetic Benchmark:

More information

COURSE 1. Database Management Systems

COURSE 1. Database Management Systems COURSE 1 Database Management Systems Assessment / Other Details Final grade 50% - laboratory activity / practical test 50% - written exam Course details (bibliography, course slides, seminars, lab descriptions

More information

Software Development Using Full System Simulation with Freescale QorIQ Communications Processors

Software Development Using Full System Simulation with Freescale QorIQ Communications Processors Patrick Keliher, Simics Field Application Engineer Software Development Using Full System Simulation with Freescale QorIQ Communications Processors 1 2013 Wind River. All Rights Reserved. Agenda Introduction

More information

DH2i s DxEnterprise for Linux Server A SQL Server Use Case

DH2i s DxEnterprise for Linux Server A SQL Server Use Case DH2i s DxEnterprise for Linux Server A SQL Server Use Case Introduction Linux is the world s fastest growing server operating system. Responding to this growth, Microsoft is shipping a version of SQL Server

More information

Deep Dive: InnoDB Transactions and Write Paths

Deep Dive: InnoDB Transactions and Write Paths Deep Dive: InnoDB Transactions and Write Paths From the client connection to physical storage Marko Mäkelä, Lead Developer InnoDB Michaël de Groot, MariaDB Consultant InnoDB Concepts Some terms that an

More information

SQL: Transactions. Announcements (October 2) Transactions. CPS 116 Introduction to Database Systems. Project milestone #1 due in 1½ weeks

SQL: Transactions. Announcements (October 2) Transactions. CPS 116 Introduction to Database Systems. Project milestone #1 due in 1½ weeks SQL: Transactions CPS 116 Introduction to Database Systems Announcements (October 2) 2 Project milestone #1 due in 1½ weeks Come to my office hours if you want to chat about project ideas Midterm in class

More information

Formal Technology in the Post Silicon lab

Formal Technology in the Post Silicon lab Formal Technology in the Post Silicon lab Real-Life Application Examples Haifa Verification Conference Jamil R. Mazzawi Lawrence Loh Jasper Design Automation Focus of This Presentation Finding bugs in

More information

Improving Linux Development with better tools. Andi Kleen. Oct 2013 Intel Corporation

Improving Linux Development with better tools. Andi Kleen. Oct 2013 Intel Corporation Improving Linux Development with better tools Andi Kleen Oct 2013 Intel Corporation ak@linux.intel.com Linux complexity growing Source lines in Linux kernel All source code 16.5 16 15.5 M-LOC 15 14.5 14

More information

Tolerating Hardware Device Failures in Software. Asim Kadav, Matthew J. Renzelmann, Michael M. Swift University of Wisconsin Madison

Tolerating Hardware Device Failures in Software. Asim Kadav, Matthew J. Renzelmann, Michael M. Swift University of Wisconsin Madison Tolerating Hardware Device Failures in Software Asim Kadav, Matthew J. Renzelmann, Michael M. Swift University of Wisconsin Madison Current state of OS hardware interaction Many device drivers assume device

More information

Deep Dive: InnoDB Transactions and Write Paths

Deep Dive: InnoDB Transactions and Write Paths Deep Dive: InnoDB Transactions and Write Paths From the client connection to physical storage Marko Mäkelä, Lead Developer InnoDB Michaël de Groot, MariaDB Consultant Second Edition, for MariaDB Developers

More information

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data

More information

Announcements. Transaction. Motivating Example. Motivating Example. Transactions. CSE 444: Database Internals

Announcements. Transaction. Motivating Example. Motivating Example. Transactions. CSE 444: Database Internals Announcements CSE 444: Database Internals Lab 2 is due TODAY Lab 3 will be released tomorrow, part 1 due next Monday Lectures 13 Transaction Schedules CSE 444 - Spring 2015 1 HW4 is due on Wednesday HW3

More information

Databases: transaction processing

Databases: transaction processing Databases: transaction processing P.A.Rounce Room 6.18 p.rounce@cs.ucl.ac.uk 1 ACID Database operation is processing of a set of transactions Required features of a database transaction should be Atomicity

More information

Virtual machines (e.g., VMware)

Virtual machines (e.g., VMware) Case studies : Introduction to operating systems principles Abstraction Management of shared resources Indirection Concurrency Atomicity Protection Naming Security Reliability Scheduling Fairness Performance

More information

6.830 Lecture Recovery 10/30/2017

6.830 Lecture Recovery 10/30/2017 6.830 Lecture 14 -- Recovery 10/30/2017 Have been talking about transactions Transactions -- what do they do? Awesomely powerful abstraction -- programmer can run arbitrary mixture of commands that read

More information

CSC 261/461 Database Systems Lecture 21 and 22. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 21 and 22. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 CSC 261/461 Database Systems Lecture 21 and 22 Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 Announcements Project 3 (MongoDB): Due on: 04/12 Work on Term Project and Project 1 The last (mini)

More information

Final Review. May 9, 2017

Final Review. May 9, 2017 Final Review May 9, 2017 1 SQL 2 A Basic SQL Query (optional) keyword indicating that the answer should not contain duplicates SELECT [DISTINCT] target-list A list of attributes of relations in relation-list

More information

Analysis of Derby Performance

Analysis of Derby Performance Analysis of Derby Performance Staff Engineer Olav Sandstå Senior Engineer Dyre Tjeldvoll Sun Microsystems Database Technology Group This is a draft version that is subject to change. The authors can be

More information

Background. Let s see what we prescribed.

Background. Let s see what we prescribed. Background Patient B s custom application had slowed down as their data grew. They d tried several different relief efforts over time, but performance issues kept popping up especially deadlocks. They

More information

Final Review. May 9, 2018 May 11, 2018

Final Review. May 9, 2018 May 11, 2018 Final Review May 9, 2018 May 11, 2018 1 SQL 2 A Basic SQL Query (optional) keyword indicating that the answer should not contain duplicates SELECT [DISTINCT] target-list A list of attributes of relations

More information

Improving Linux development with better tools

Improving Linux development with better tools Improving Linux development with better tools Andi Kleen Oct 2013 Intel Corporation ak@linux.intel.com Linux complexity growing Source lines in Linux kernel All source code 16.5 16 15.5 M-LOC 15 14.5 14

More information

Concurrency control 12/1/17

Concurrency control 12/1/17 Concurrency control 12/1/17 Bag of words... Isolation Linearizability Consistency Strict serializability Durability Snapshot isolation Conflict equivalence Serializability Atomicity Optimistic concurrency

More information